Results of the Competition

This web page presents the results of the 2nd International Competition on Software Verification (SV-COMP'13), held in 2013. The competition and the results are described in more detail in the competition report. The benchmarks and rules are also available from the competition web site.

All Results

For each category, we list below the score (in points) and the CPU time (rounded to two significant digits) for successful runs of every competition candidate. The winner, second, and third of each category are listed in the section Ranking by Category below.

The entry '--' means that the competition candidate opted out of the category.

Competition candidates with their representing jury members and affiliations:

  BLAST 2.7.1: Vadim Mutilin (Moscow, Russia)
  CPAchecker-Explicit 1.1.10: Stefan Löwe (Passau, Germany)
  CPAchecker-SeqCom 1.1.10: Philipp Wendler (Passau, Germany)
  CSeq 2012-10-22: Bernd Fischer (Southampton, UK)
  ESBMC 1.20: Lucas Cordeiro (Manaus, Brazil)
  LLBMC 2012-10-23: Carsten Sinz (Karlsruhe, Germany)
  Predator 2012-10-20: Tomas Vojnar (Brno, Czechia)
  Symbiotic 2012-10-21: Jiri Slaby (Brno, Czechia)
  Threader 0.92: Andrey Rybalchenko (Munich, Germany)
  UFO 2012-10-22: Arie Gurfinkel (Pittsburgh, USA)
  Ultimate 2012-10-25: Matthias Heizmann (Freiburg, Germany)

BitVectors (32 files, max score: 60)
  BLAST 2.7.1: --
  CPAchecker-Explicit 1.1.10: 16 points, 86 s
  CPAchecker-SeqCom 1.1.10: 17 points, 190 s
  CSeq 2012-10-22: --
  ESBMC 1.20: 24 points, 480 s
  LLBMC 2012-10-23: 60 points, 36 s
  Predator 2012-10-20: -75 points, 95 s
  Symbiotic 2012-10-21: --
  Threader 0.92: --
  UFO 2012-10-22: --
  Ultimate 2012-10-25: --

Concurrency (32 files, max score: 49)
  BLAST 2.7.1: --
  CPAchecker-Explicit 1.1.10: 0 points, 0 s
  CPAchecker-SeqCom 1.1.10: 0 points, 0 s
  CSeq 2012-10-22: 17 points, 270 s
  ESBMC 1.20: 15 points, 1 400 s
  LLBMC 2012-10-23: --
  Predator 2012-10-20: 0 points, 0 s
  Symbiotic 2012-10-21: --
  Threader 0.92: 43 points, 570 s
  UFO 2012-10-22: --
  Ultimate 2012-10-25: --

ControlFlowInteger (94 files, max score: 146)
  BLAST 2.7.1: 93 points, 7 100 s
  CPAchecker-Explicit 1.1.10: 143 points, 1 200 s
  CPAchecker-SeqCom 1.1.10: 141 points, 3 400 s
  CSeq 2012-10-22: --
  ESBMC 1.20: 90 points *, 17 000 s
  LLBMC 2012-10-23: --
  Predator 2012-10-20: -27 points, 650 s
  Symbiotic 2012-10-21: --
  Threader 0.92: --
  UFO 2012-10-22: 146 points, 450 s
  Ultimate 2012-10-25: --

ControlFlowInteger-MemPrecise (48 files, max score: 78)
  BLAST 2.7.1: 52 points, 2 500 s
  CPAchecker-Explicit 1.1.10: 78 points, 260 s
  CPAchecker-SeqCom 1.1.10: 78 points, 1 300 s
  CSeq 2012-10-22: --
  ESBMC 1.20: 69 points, 10 000 s
  LLBMC 2012-10-23: 78 points, 70 s
  Predator 2012-10-20: -28 points, 650 s
  Symbiotic 2012-10-21: 28 points, 34 s
  Threader 0.92: --
  UFO 2012-10-22: 78 points, 170 s
  Ultimate 2012-10-25: 63 points, 540 s

ControlFlowInteger-MemSimple (46 files, max score: 68)
  BLAST 2.7.1: 41 points, 4 600 s
  CPAchecker-Explicit 1.1.10: 65 points, 920 s
  CPAchecker-SeqCom 1.1.10: 63 points, 2 100 s
  CSeq 2012-10-22: --
  ESBMC 1.20: 22 points, 6 300 s
  LLBMC 2012-10-23: --
  Predator 2012-10-20: 0 points, 0 s
  Symbiotic 2012-10-21: --
  Threader 0.92: --
  UFO 2012-10-22: 68 points, 280 s
  Ultimate 2012-10-25: --

DeviceDrivers64 (1237 files, max score: 2419)
  BLAST 2.7.1: 2 338 points, 2 400 s
  CPAchecker-Explicit 1.1.10: 2 340 points, 9 700 s
  CPAchecker-SeqCom 1.1.10: 2 186 points, 30 000 s
  CSeq 2012-10-22: --
  ESBMC 1.20: 2 233 points, 46 000 s
  LLBMC 2012-10-23: --
  Predator 2012-10-20: 0 points, 0 s
  Symbiotic 2012-10-21: 870 points, 230 s
  Threader 0.92: --
  UFO 2012-10-22: 2 408 points, 2 500 s
  Ultimate 2012-10-25: --

FeatureChecks (118 files, max score: 206)
  BLAST 2.7.1: 130 points, 42 s
  CPAchecker-Explicit 1.1.10: 159 points, 180 s
  CPAchecker-SeqCom 1.1.10: 159 points, 160 s
  CSeq 2012-10-22: --
  ESBMC 1.20: 132 points, 86 s
  LLBMC 2012-10-23: 166 points, 250 s
  Predator 2012-10-20: 166 points, 6.0 s
  Symbiotic 2012-10-21: 23 points, 11 s
  Threader 0.92: --
  UFO 2012-10-22: 74 points, 46 s
  Ultimate 2012-10-25: --

HeapManipulation (28 files, max score: 48)
  BLAST 2.7.1: --
  CPAchecker-Explicit 1.1.10: 22 points, 30 s
  CPAchecker-SeqCom 1.1.10: 22 points, 29 s
  CSeq 2012-10-22: --
  ESBMC 1.20: --
  LLBMC 2012-10-23: 32 points, 310 s
  Predator 2012-10-20: 40 points, 2.3 s
  Symbiotic 2012-10-21: --
  Threader 0.92: --
  UFO 2012-10-22: --
  Ultimate 2012-10-25: --

Loops (79 files, max score: 122)
  BLAST 2.7.1: 35 points, 550 s
  CPAchecker-Explicit 1.1.10: 51 points, 370 s
  CPAchecker-SeqCom 1.1.10: 50 points, 1 400 s
  CSeq 2012-10-22: --
  ESBMC 1.20: 94 points, 5 000 s
  LLBMC 2012-10-23: 112 points, 540 s
  Predator 2012-10-20: 36 points, 17 s
  Symbiotic 2012-10-21: --
  Threader 0.92: --
  UFO 2012-10-22: 54 points, 750 s
  Ultimate 2012-10-25: --

MemorySafety (36 files, max score: 54)
  BLAST 2.7.1: --
  CPAchecker-Explicit 1.1.10: 0 points, 0 s
  CPAchecker-SeqCom 1.1.10: 0 points, 0 s
  CSeq 2012-10-22: --
  ESBMC 1.20: 3 points, 1 300 s
  LLBMC 2012-10-23: 24 points, 38 s
  Predator 2012-10-20: 52 points, 61 s
  Symbiotic 2012-10-21: --
  Threader 0.92: --
  UFO 2012-10-22: --
  Ultimate 2012-10-25: --

ProductLines (597 files, max score: 929)
  BLAST 2.7.1: 652 points, 16 000 s
  CPAchecker-Explicit 1.1.10: 655 points, 7 300 s
  CPAchecker-SeqCom 1.1.10: 915 points, 3 100 s
  CSeq 2012-10-22: --
  ESBMC 1.20: 914 points, 1 200 s
  LLBMC 2012-10-23: 926 points, 3 600 s
  Predator 2012-10-20: 865 points, 7 500 s
  Symbiotic 2012-10-21: --
  Threader 0.92: --
  UFO 2012-10-22: 929 points, 5 000 s
  Ultimate 2012-10-25: --

SystemC (62 files, max score: 87)
  BLAST 2.7.1: 34 points, 2 600 s
  CPAchecker-Explicit 1.1.10: 61 points, 3 500 s
  CPAchecker-SeqCom 1.1.10: 58 points, 1 800 s
  CSeq 2012-10-22: --
  ESBMC 1.20: 57 points, 8 500 s
  LLBMC 2012-10-23: 49 points, 1 900 s
  Predator 2012-10-20: -6 points, 1 400 s
  Symbiotic 2012-10-21: 0 points, 0 s
  Threader 0.92: --
  UFO 2012-10-22: 65 points, 3 000 s
  Ultimate 2012-10-25: 45 points, 4 800 s

Overall (2315 files, max score: 3791)
  BLAST 2.7.1: 80 points, 30 000 s
  CPAchecker-Explicit 1.1.10: 2 030 points, 22 000 s
  CPAchecker-SeqCom 1.1.10: 2 090 points, 41 000 s
  CSeq 2012-10-22: --
  ESBMC 1.20: 1 919 points, 81 000 s
  LLBMC 2012-10-23: --
  Predator 2012-10-20: 799 points, 9 700 s
  Symbiotic 2012-10-21: --
  Threader 0.92: --
  UFO 2012-10-22: -208 points, 12 000 s
  Ultimate 2012-10-25: --

*) Note: The score in a meta-category is not the sum of the scores of its sub-categories (normalization is applied). The run time is the sum of the raw run times of the sub-categories, rounded to two significant digits.
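
As a small illustration of the rounding used for all run times in the tables, the following Python sketch rounds a summed raw run time to two significant digits. The function name and the example values are hypothetical; this is not the competition's evaluation script.

    import math

    def round_to_two_significant_digits(seconds):
        """Round a CPU time in seconds to two significant digits."""
        if seconds == 0:
            return 0.0
        digits = 1 - int(math.floor(math.log10(abs(seconds))))
        return round(seconds, digits)

    # Hypothetical raw run times of two sub-categories, summed for a
    # meta-category and then rounded as in the tables above:
    print(round_to_two_significant_digits(2513.7 + 4591.2))  # -> 7100.0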

Ranking by Category

BitVectors

1. LLBMC 2012-10-23
2. ESBMC 1.20
3. CPAchecker-SeqCom 1.1.10

Concurrency

1. Threader 0.92
2. CSeq 2012-10-22
3. ESBMC 1.20

ControlFlowInteger

1. UFO 2012-10-22
2. CPAchecker-Explicit 1.1.10
3. CPAchecker-SeqCom 1.1.10

DeviceDrivers64

1. UFO 2012-10-22
2. CPAchecker-Explicit 1.1.10
3. BLAST 2.7.1

FeatureChecks

1. Predator 2012-10-20
2. LLBMC 2012-10-23
3. CPAchecker-SeqCom 1.1.10

HeapManipulation

1. Predator 2012-10-20
2. LLBMC 2012-10-23
3. CPAchecker-SeqCom 1.1.10

Loops

1. LLBMC 2012-10-23
2. ESBMC 1.20
3. UFO 2012-10-22

MemorySafety

1. Predator 2012-10-20
2. LLBMC 2012-10-23
3. ESBMC 1.20

ProductLines

1. UFO 2012-10-22
2. LLBMC 2012-10-23
3. CPAchecker-SeqCom 1.1.10

SystemC

1. UFO 2012-10-22
2. CPAchecker-Explicit 1.1.10
3. CPAchecker-SeqCom 1.1.10

Overall

1. CPAchecker-SeqCom 1.1.10
2. CPAchecker-Explicit 1.1.10
3. ESBMC 1.20

Score-Based Quantile Functions for Quality Assessment

The figures illustrate, for each category, the competition results using score-based quantile functions over all benchmark verification tasks of the respective category. In such a graph for a competition candidate, a data point (x, y) means that the n fastest correct verification runs of the candidate each took at most y seconds of run time, where x is the accumulated score of all incorrect results together with those n correct results. A logarithmic scale is used for the time range from 1 s to 1000 s, and a linear scale is used for the time range between 0 s and 1 s. The graphs are decorated with symbols at some data points. This visualization is helpful for analyzing different aspects of verification quality, as outlined below; we illustrate a few of these aspects for the category Overall.
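
To make the construction concrete, here is a minimal Python sketch of how such a quantile function can be computed from a list of verification runs. The data format (score, CPU time, correctness flag) and the example values are assumptions made for illustration; this is not the plotting code used for the competition.

    def quantile_points(runs):
        """Data points (x, y) of a score-based quantile function.

        Each run is assumed to be a (score, cpu_time_s, correct) triple.
        """
        # The graph starts at the accumulated score of all incorrect results.
        x = sum(score for score, _, correct in runs if not correct)
        points = [(x, 0.0)]
        # Correct runs are added in order of increasing run time; each run
        # moves the graph right by its score and up to its run time, so the
        # point after n runs has x = incorrect score plus the score of the
        # n fastest correct runs and y = time of the n-th fastest correct run.
        for score, time_s, _ in sorted(
                (r for r in runs if r[2]), key=lambda r: r[1]):
            x += score
            points.append((x, time_s))
        return points

    # Tiny made-up example: two correct results and one incorrect result
    # (the scores are illustrative, not the SV-COMP scoring schema).
    example_runs = [(2, 12.5, True), (-4, 3.0, False), (1, 0.8, True)]
    print(quantile_points(example_runs))  # -> [(-4, 0.0), (-3, 0.8), (-1, 12.5)]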

Amount of Successful Verification Work. Results for verification tasks differ in value, depending on the "difficulty" of the verification task and on the correctness of the verification answer. This value is modeled by a community-agreed scoring schema (cf. the rules page). The x-width of a graph illustrates the value (amount) of successful verification work that the verifier has done. The verifier UFO 2012-10-22 is the best verification tool in this respect in category Overall, because its quantile function has the largest x-width; this tool solved the most verification tasks, as also witnessed by the large score entries in the tables.

Amount of Incorrect Verification Work. The x-coordinate of the left-most data point yields the total negative score of a verifier, i.e., the total score resulting from incorrect verification results. The further to the right a verifier's graph starts, the fewer incorrect results it has produced. The two CPAchecker-based candidates start with a very small negative score and thus lost the least value to incorrect results, as also witnessed by their entries for category Overall in the detailed table: these verifiers reported wrong results for only 4 out of 2315 verification tasks.

Overall Quality Measured in Scores. The x-coordinate of the right-most data point of each graph represents the total score of the verification work (and thus, the total value) that was completed by the corresponding competition candidate. This measure identifies the winner of the category, as also reported in the tables above (the x-coordinates match the score values in the table).

Characteristics of the Verification Tools. The y-coordinate of the left-most data point indicates the verification time for the "easiest" verification task of the verifier, and the y-coordinate of the right-most data point indicates the maximal time that the verifier spent on a single successful task (this is mostly just below the time limit). The area below a graph is proportional to the accumulated run time for all successfully solved verification tasks. The shape of the graph can also give interesting insights: for example, the graphs for CPAchecker-SeqCom 1.1.10 and ESBMC 1.20 show the characteristic bend that occurs if a verifier, after a certain period of time (100 s for CPAchecker-SeqCom, 450 s for ESBMC), performs a sequential restart with a different strategy.
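
The quantities discussed in this section can also be computed directly from the run results. The following sketch uses the same assumed run format and made-up data as the sketch above and is likewise not part of the competition tooling:

    def summary(runs):
        """Summary quantities of a verifier's runs, as discussed above."""
        incorrect_score = sum(s for s, _, c in runs if not c)
        correct = [(s, t) for s, t, c in runs if c]
        total_score = incorrect_score + sum(s for s, _ in correct)
        return {
            "negative_score": incorrect_score,         # left-most x of the graph
            "total_score": total_score,                # right-most x of the graph
            "x_width": total_score - incorrect_score,  # value of the correct work
            "max_successful_time_s": max((t for _, t in correct), default=0.0),
            "accumulated_time_s": sum(t for _, t in correct),
        }

    print(summary([(2, 12.5, True), (-4, 3.0, False), (1, 0.8, True)]))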

The above text is adapted from the competition report [1].

References

  1. D. Beyer. Second Competition on Software Verification (Summary of SV-COMP 2013). In Proc. TACAS, LNCS 7795, Springer, 2013.