This web page presents the results of the 2nd International Competition on Software Verification (SV-COMP'13), held in 2013. The competition and its results are described in more detail in the competition report. The benchmarks and rules are also available from the competition web site.
In every table cell for competition results, we list the points followed, in parentheses, by the CPU time for successful runs, rounded to two significant digits. The background color is gold for the winner, silver for the second, and bronze for the third.
The entry '--' means that the competition candidate opted out of that category.
Competition candidate | BLAST 2.7.1 | CPAchecker-Explicit 1.1.10 | CPAchecker-SeqCom 1.1.10 | CSeq 2012-10-22 | ESBMC 1.20 | LLBMC 2012-10-23 | Predator 2012-10-20 | Symbiotic 2012-10-21 | Threader 0.92 | UFO 2012-10-22 | Ultimate 2012-10-25 |
Representing Jury Member | Vadim Mutilin | Stefan Löwe | Philipp Wendler | Bernd Fischer | Lucas Cordeiro | Carsten Sinz | Tomas Vojnar | Jiri Slaby | Andrey Rybalchenko | Arie Gurfinkel | Matthias Heizmann |
Affiliation | Moscow, Russia | Passau, Germany | Passau, Germany | Southampton, UK | Manaus, Brazil | Karlsruhe, Germany | Brno, Czechia | Brno, Czechia | Munich, Germany | Pittsburgh, USA | Freiburg, Germany |
BitVectors (32 files, max score: 60) | -- | 16 (86 s) | 17 (190 s) | -- | 24 (480 s) | 60 (36 s) | -75 (95 s) | -- | -- | -- | -- |
Concurrency (32 files, max score: 49) | -- | 0 (0 s) | 0 (0 s) | 17 (270 s) | 15 (1 400 s) | -- | 0 (0 s) | -- | 43 (570 s) | -- | -- |
ControlFlowInteger (94 files, max score: 146) | 93 (7 100 s) | 143 (1 200 s) | 141 (3 400 s) | -- | 90* (17 000 s) | -- | -27 (650 s) | -- | -- | 146 (450 s) | -- |
ControlFlowInteger-MemPrecise (48 files, max score: 78) | 52 (2 500 s) | 78 (260 s) | 78 (1 300 s) | -- | 69 (10 000 s) | 78 (70 s) | -28 (650 s) | 28 (34 s) | -- | 78 (170 s) | 63 (540 s) |
ControlFlowInteger-MemSimple (46 files, max score: 68) | 41 (4 600 s) | 65 (920 s) | 63 (2 100 s) | -- | 22 (6 300 s) | -- | 0 (0 s) | -- | -- | 68 (280 s) | -- |
DeviceDrivers64 (1237 files, max score: 2419) | 2 338 (2 400 s) | 2 340 (9 700 s) | 2 186 (30 000 s) | -- | 2 233 (46 000 s) | -- | 0 (0 s) | 870 (230 s) | -- | 2 408 (2 500 s) | -- |
FeatureChecks (118 files, max score: 206) | 130 (42 s) | 159 (180 s) | 159 (160 s) | -- | 132 (86 s) | 166 (250 s) | 166 (6.0 s) | 23 (11 s) | -- | 74 (46 s) | -- |
HeapManipulation (28 files, max score: 48) | -- | 22 (30 s) | 22 (29 s) | -- | -- | 32 (310 s) | 40 (2.3 s) | -- | -- | -- | -- |
Loops (79 files, max score: 122) | 35 (550 s) | 51 (370 s) | 50 (1 400 s) | -- | 94 (5 000 s) | 112 (540 s) | 36 (17 s) | -- | -- | 54 (750 s) | -- |
MemorySafety (36 files, max score: 54) | -- | 0 (0 s) | 0 (0 s) | -- | 3 (1 300 s) | 24 (38 s) | 52 (61 s) | -- | -- | -- | -- |
ProductLines (597 files, max score: 929) | 652 (16 000 s) | 655 (7 300 s) | 915 (3 100 s) | -- | 914 (1 200 s) | 926 (3 600 s) | 865 (7 500 s) | -- | -- | 929 (5 000 s) | -- |
SystemC (62 files, max score: 87) | 34 (2 600 s) | 61 (3 500 s) | 58 (1 800 s) | -- | 57 (8 500 s) | 49 (1 900 s) | -6 (1 400 s) | 0 (0 s) | -- | 65 (3 000 s) | 45 (4 800 s) |
Overall (2315 files, max score: 3791) | 80 (30 000 s) | 2 030 (22 000 s) | 2 090 (41 000 s) | -- | 1 919 (81 000 s) | -- | 799 (9 700 s) | -- | -- | -208 (12 000 s) | -- |
*) Note: The score in a meta-category is not the sum of the scores of its sub-categories, because the scores are normalized. The run time is the sum of the raw run times of the sub-categories, rounded to two significant digits.
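Both the times in the table cells and the summed meta-category times are rounded to two significant digits. A minimal sketch of such rounding (in Python; illustrative only, not part of the competition infrastructure):

    import math

    def round_to_two_significant_digits(seconds):
        """Round a CPU time in seconds to two significant digits,
        e.g. 17 345 s -> 17 000 s and 6.04 s -> 6.0 s."""
        if seconds == 0:
            return 0.0
        exponent = math.floor(math.log10(abs(seconds)))
        factor = 10.0 ** (exponent - 1)
        return round(seconds / factor) * factor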
The figures above illustrate the competition results for each category using score-based quantile functions over all benchmark verification tasks of the respective category. For a competition candidate, each data point (x, y) of such a function graph means that the n fastest correct verification runs each took at most y seconds of run time, where x is the accumulated score of all incorrect results plus those n correct results. A logarithmic scale is used for the time range from 1 s to 1000 s, and a linear scale for the range between 0 s and 1 s. The graphs are decorated with symbols at some data points. This visualization helps to analyze different aspects of verification quality; we illustrate a few of them below for the category Overall.
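To make this construction concrete, the following sketch (in Python; the per-task record format is an assumption for illustration, not the format used by the competition scripts) computes the data points of a score-based quantile function from a list of per-task results:

    def quantile_plot_points(runs):
        """Data points (x, y) of a score-based quantile function.

        'runs' is a list of (score, cpu_time, is_correct) records, one per
        verification task of the category.  The x-coordinates start at the
        accumulated score of all incorrect results and then grow by the
        scores of the correct results, taken in order of increasing run
        time; the y-coordinate is the run time of the n-th fastest correct
        run.
        """
        x = sum(score for score, _, correct in runs if not correct)
        correct_runs = sorted((r for r in runs if r[2]), key=lambda r: r[1])
        points = []
        for score, cpu_time, _ in correct_runs:
            x += score
            points.append((x, cpu_time))
        return points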
Amount of Successful Verification Work. Results for verification tasks have different values, depending on the "difficulty" of the verification task and on the correctness of the verification answer. This value is modeled by a community-agreed scoring schema (cf. rules page). The x-width of a graph above illustrates the value (amount) of successful verification work that the verifier has done. The verifier UFO 2012-10-22 is the best verification tool in this respect in the category Overall, because its quantile function has the largest x-width. This tool solved the most verification tasks, as also witnessed by the large score entries in the tables.
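For illustration, per-task scores along the lines of the scoring schema could be assigned as in the sketch below. The rewards (2 points for a correct result on a task without a bug, 1 point for a correct result on a task with a bug) are consistent with the max-score entries in the tables above; the verdict names and the penalty constants are placeholders only, the official values are defined on the rules page.

    # Illustrative scoring sketch; penalty constants are placeholders,
    # the official values are defined in the competition rules.
    POINTS_CORRECT_SAFE = 2     # correct answer for a task without a bug
    POINTS_CORRECT_UNSAFE = 1   # correct answer for a task with a bug
    PENALTY_MISSED_BUG = -4     # placeholder: claimed safe, but a bug exists
    PENALTY_FALSE_ALARM = -2    # placeholder: claimed a bug, but none exists

    def score(expected, answer):
        """Score a single run; 'expected' is 'safe' or 'unsafe',
        'answer' is 'safe', 'unsafe', or 'unknown'."""
        if answer == "unknown":
            return 0
        if answer == expected:
            return POINTS_CORRECT_SAFE if expected == "safe" else POINTS_CORRECT_UNSAFE
        return PENALTY_MISSED_BUG if answer == "safe" else PENALTY_FALSE_ALARM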
Amount of Incorrect Verification Work. The left-most data point of a graph yields the total negative score of a verifier (x-coordinate), i.e., the total score resulting from incorrect verification results. The further to the right a verifier's graph starts, the fewer incorrect results it has produced. The two CPAchecker-based candidates start with a very low negative score and thus lost the least value to incorrect results, as also witnessed by the entry for the category Overall in the detailed table: these verifiers reported wrong results for only 4 out of 2315 verification tasks.
Overall Quality Measured in Scores. The x-coordinate of the right-most data point of each graph represents the total score of the verification work (and thus, the total value) that was completed by the corresponding competition candidate. This measure identifies the winner of the category, as also reported in the tables above (the x-coordinates match the score values in the table).
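Using the same per-task record format as in the sketch above, both endpoints of a graph can be read off directly:

    def negative_score(runs):
        """Accumulated score of the incorrect results (where the graph starts)."""
        return sum(score for score, _, correct in runs if not correct)

    def total_score(runs):
        """Total score over all results (x-coordinate of the right-most point)."""
        return sum(score for score, _, _ in runs)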
Characteristics of the Verification Tools. The y-coordinate of the left-most data point indicates the verification time for the "easiest" verification task of the verifier, and the y-coordinate of the right-most data point indicates the maximal time that the verifier spent on a single successful task (mostly just below the time limit). The area below a graph is proportional to the accumulated run time for all successfully solved verification tasks. The shape of the graph can also give interesting insights: for example, the graphs for CPAchecker-SeqCom and ESBMC 1.20 show the characteristic bend that occurs if a verifier, after a certain period of time (100 s for CPAchecker-SeqCom and 450 s for ESBMC 1.20), performs a sequential restart with a different strategy.
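Such a bend can be pictured as a wrapper that restarts the analysis with a different strategy after an internal time limit. The sketch below is purely illustrative; the command lines, limits, and output conventions are assumptions, not the actual CPAchecker or ESBMC set-up.

    import subprocess

    def run_sequentially(task_file, configurations):
        """Run several verifier configurations one after another.

        'configurations' is a list of (command, internal_time_limit_in_s)
        pairs; the wrapper moves on to the next configuration when the
        current one exceeds its internal limit or gives no verdict.  The
        internal limit is what produces the characteristic bend in the
        quantile graph.
        """
        for command, limit in configurations:
            try:
                result = subprocess.run(command + [task_file], timeout=limit,
                                        capture_output=True, text=True)
            except subprocess.TimeoutExpired:
                continue  # internal limit reached: restart with the next strategy
            answer = result.stdout.strip()
            if answer in ("SAFE", "UNSAFE"):
                return answer
        return "UNKNOWN"

    # Hypothetical usage:
    # run_sequentially("task.c", [(["verifier", "--fast-config"], 100),
    #                             (["verifier", "--thorough-config"], 800)])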
The above text is adapted from the competition report [1].