Topics for SV-COMP Replicability

Q: How do I make sure that my verifier was correctly executed? What should the teams do with the results that were sent to them?
A: The first step is to make the following sanity checks:
- Did the verifier start properly?
- Is the log file accessible?
- Is the witness accessible?
- Did the verifier find all components that are required?

Q: Is the category opt-out used correctly, and are the options that my verifier requires set correctly?
A: It is the responsibility of the participants to make sure that the benchmark definition is correct. The benchmark definition that is used for VERIFIER is available in the repository:
https://gitlab.com/sosy-lab/sv-comp/bench-defs/-/blob/main/benchmark-defs/VERIFIER.xml

Q: Is my verifier correctly installed on the competition machines, and are the results correctly interpreted?
A: It is the responsibility of the participants to make sure that the tool-info module for BenchExec works correctly. This module specifies which paths of the verifier archive should be installed on the competition machines and how to parse and translate the results from the log output. The tool-info module that is used for VERIFIER is available in the BenchExec repository:
https://github.com/sosy-lab/benchexec/tree/main/benchexec/tools/VERIFIER.py
Please make sure to test your tool-info module according to the instructions here:
https://github.com/sosy-lab/benchexec/blob/main/doc/tool-integration.md#testing-the-tool-integration
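For orientation, here is a minimal sketch of such a tool-info module, based on BenchExec's BaseTool2 interface. The tool name 'myverifier' and the matched output strings are hypothetical placeholders that must be adapted to the actual verifier; the test instructions linked above are the authoritative reference.

    import benchexec.result as result
    from benchexec.tools.template import BaseTool2

    class Tool(BaseTool2):
        """Hypothetical tool-info module for a verifier 'myverifier'."""

        def executable(self, tool_locator):
            # Locate the main executable inside the installed archive.
            return tool_locator.find_executable("myverifier")

        def name(self):
            return "MyVerifier"

        def version(self, executable):
            # Query the tool for its version (assumes a '--version' flag).
            return self._version_from_tool(executable)

        def cmdline(self, executable, options, task, rlimits):
            # Assemble the command line for a single run.
            return [executable] + options + [task.single_input_file]

        def determine_result(self, run):
            # Translate the log output into a BenchExec result;
            # the matched strings are placeholders.
            for line in run.output:
                if "VERIFICATION SUCCESSFUL" in line:
                    return result.RESULT_TRUE_PROP
                if "VERIFICATION FAILED" in line:
                    return result.RESULT_FALSE_PROP
            return result.RESULT_UNKNOWN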
Q: Where do I find the verification tasks that are used in SV-COMP?
A: The verification tasks for SV-COMP are available in the open SV-COMP repository at:
https://gitlab.com/sosy-lab/benchmarking/sv-benchmarks/-/tree/main/c
The organizer and jury partition the verification tasks into categories. The categories are described here:
http://sv-comp.sosy-lab.org/2019/benchmarks.php
The current plan for the category assignment is available there.

Q: What directories does my verifier see on the competition machines?
A: A conceptual description is available here:
https://www.sosy-lab.org/research/pub/2017-STTT.Reliable_Benchmarking_Requirements_and_Solutions.pdf
Technical details are available on the help page ('runexec -h').

Here is an example call to BenchExec's runexecutor module, with a description below:

    python -m benchexec.runexecutor \
      --container \
      --read-only-dir / \
      --hidden-dir /home \
      --hidden-dir /sys/kernel/debug \
      --hidden-dir /var/lib/cloudy \
      --overlay-dir /etc \
      --overlay-dir /tmp/vcloud-vcloud-master/worker/working_dir_73f51db6-c032-4848-9c92-a94446513a4f \
      --dir /tmp/vcloud-vcloud-master/worker/working_dir_73f51db6-c032-4848-9c92-a94446513a4f/bin-2018/utaipan \
      --output /tmp/vcloud-vcloud-master/worker/working_dir_73f51db6-c032-4848-9c92-a94446513a4f/cloudBenchmarkOutput.txt \
      --output-directory /tmp/vcloud-vcloud-master/worker/working_dir_73f51db6-c032-4848-9c92-a94446513a4f/bin-2018/utaipan \
      --debug \
      --maxOutputSize 2000000 \
      --timelimit 60 \
      --memlimit 7000000000 \
      --memoryNodes 0 \
      --cores 3,7 \
      -- \
      ./Ultimate.py --spec ../../sv-benchmarks/c/Termination.prp --file ../../sv-benchmarks/c/product-lines/minepump_spec1_product06_true-unreach-call_false-termination.cil.c --full-output --architecture 32bit

Description:
- starts BenchExec's module 'runexecutor'
- in container mode [https://github.com/sosy-lab/benchexec/blob/main/doc/container.md]
- makes all files available in the container as read-only, with the exceptions below
- hides directory '/home' and makes a fresh copy available inside the container
- same for '/sys/kernel/debug'
- same for '/var/lib/cloudy'
- makes the contents of '/etc' available inside the container and redirects all writes to a fresh directory
- same for '/tmp/vcloud-vcloud-master/worker/working_dir_73f51db6-c032-4848-9c92-a94446513a4f'
- the working directory is set to the value given to '--dir'
- all output of the verifier (stdout and stderr) is written to the file specified with '--output'
- all files that are written to directory '.' (the default of option '--result-files') are copied to the directory specified with '--output-directory' on the host (i.e., made available to the user)
- debug info is produced
- the maximal output size is limited to 2 MB as per the SV-COMP rules
- the time limit is set (60 s was used for the pre-runs)
- the memory limit is set (7 GB was used for the pre-runs)
- memory node 0 is used
- cores with ids 3 and 7 are used (pre-runs were restricted to two cores)
- the rest is specific to the verifier
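The same kind of run can also be started programmatically. The following is a small, hedged sketch using BenchExec's Python API; the command and the limits are illustrative values taken from the example above, and container mode would need additional arguments:

    from benchexec.runexecutor import RunExecutor

    executor = RunExecutor()  # container options can be passed here if needed

    # Run the given command under the resource limits of the example above;
    # stdout and stderr of the tool are written to 'output.log'.
    run_result = executor.execute_run(
        args=["./Ultimate.py", "--version"],  # illustrative command
        output_filename="output.log",
        hardtimelimit=60,            # seconds
        memlimit=7000000000,         # bytes (7 GB)
        cores=[3, 7],
        memory_nodes=[0],
    )

    # The result is a dictionary of measured values.
    print(run_result["exitcode"], run_result["walltime"], run_result["memory"])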
Results

Q: Where can I find the output of my verifier?
A: The status column in the HTML tables contains links to the log files. If the results do not match the expectations, this is where all information that is necessary to track down the problem can be found. Ensure that your verifier prints enough detail into the log file to make the run result understandable.

Q: Which version of my verifier was used for the last verification run?
A: The exact version is contained in the header of each result table. The tool-info module is asked to produce a version string for the verifier. Also, a good verifier prints exact version information at the beginning, like "This is Verifier BEST version 3.14 compiled 2016-11-27T1725.", such that the version information is also available in the log files.

Q: Were all components that my verifier needs found correctly? Which version of component X was used?
A: Check the log output that the verifier produced. Any relevant exception should result in a line of output. Also, a verifier can print the exact version of each component (gcc, llvm, ...) in order to make it possible to find out later which components were used.

Q: What data is made available from the verification runs?
A: After each verification run, the participating team receives an e-mail with links to the results of their verifier. The XML files are the raw data that BenchExec produced. The HTML tables contain less information, for a convenient overview; if more details are needed, please fall back to the XML files. The XML files are easy to convert to customized HTML pages using BenchExec's table generator:
https://github.com/sosy-lab/benchexec/blob/main/doc/table-generator.md
The XML and HTML files with 'merged' in the name belong to those categories for which witness validation was done. For those categories for which witness validation is mandatory, the status (result) is corrected according to the result of the witness validation. There are additional columns about the witness-validation process. Witness validation was done with all available witness validators. For convenience, there are a couple of new features in the tables that should significantly support the inspection of results:
- the column with the name of the verification task contains links to the source code of the verification task,
- column 'wit' contains links to the witness that the verifier produced,
- column 'inspect' contains links to the witness-inspection pages,
- column 'status' contains links to the log file that the verifier produced, and
- column 'wit*_status' contains links to the log files that the witness validation produced.

Witness Validation

Q: How can I inspect information about my witnesses?
A: To inspect a witness, click on the 'inspect' link, which guides the user to a separate page that gives more information on the witness.

Q: Where can I find an example witness for a certain program?
A: The witness-inspection page contains a list of witnesses that other verifiers produced.

Q: How can I validate my witness myself?
A: The witness-inspection page contains a link via which you can use CPAchecker's online validation service to validate a witness. You can also download and install the available validators on your own machine.

Q: Was the witness that my verifier produced found and made available for validation? It seems the witness that my verifier produced was not found.
A: Column 'wit' contains a link to your witness. Click on it to find out whether your witness is available. Check that your witness file ends with .graphml and that the required fields are present. Important note: It is assumed that only one .graphml file is produced; this file is searched for and used. (A sketch for checking a witness file is given at the end of this document.)

Q: Since when is it required to produce verification witnesses at SV-COMP?
A: It was always required to generate witnesses for errors. Starting with SV-COMP 2015, the violation witnesses are required to be machine-readable and re-checkable, such that the organizer can validate whether a violation witness is a false positive or not. This was decided by the SV-COMP community and applied only to the result "FALSE", i.e., the answer "TRUE" was still possible without a witness. In SV-COMP 2014, there was a demo category on witness validation, where the format was experimented with and the first two witness validators were developed. Since SV-COMP 2017, it is required to also provide witnesses for the result "TRUE" (after this was tested in SV-COMP 2016 as a demo category).

Q: What is the penalty if my verifier produces a wrong (or no) witness, i.e., one that cannot be validated by an available witness validator?
A: Each answer to a verification task in SV-COMP receives a score (see the rules). If no witness validator is able to validate your witness, the result is deemed 'unconfirmed'. For correctness witnesses, unconfirmed results are assigned 1 point (cf. the scoring schema).

Q: How is the result of a verification run obtained and the score computed?
A: The execution platform that SV-COMP uses for reliable benchmarking is BenchExec:
https://github.com/sosy-lab/benchexec
Here is a discussion of the problems that occur with some other approaches and why this method should be used:
http://www.sosy-lab.org/~dbeyer/Publications/2015-SPIN.Benchmarking_and_Resource_Measurement.pdf
After the results and measurements have been obtained with BenchExec, the scoring is computed using the table under "Evaluation by Scores and Runtime" on the rules page:
http://sv-comp.sosy-lab.org/2017/rules.php
An illustration of the algorithm for computing the score of a single verification task is available there. The weighted score for meta categories is computed following the definition on the rules page. A sketch of the per-task scoring schema is shown below.
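For illustration, the per-task scoring schema can be written down as the following sketch; the constants reflect the table on the rules page, which remains the authoritative reference:

    def score(expected: bool, reported: str, confirmed: bool) -> int:
        """Score of a single verification task under the SV-COMP schema.

        expected:  whether the property holds for the task
        reported:  'true', 'false', or 'unknown', as reported by the verifier
        confirmed: whether a witness validator confirmed the witness
        """
        if reported == "unknown":
            return 0
        correct = (reported == "true") == expected
        if not correct:
            # Wrong answers are penalized heavily.
            return -32 if reported == "true" else -16
        if reported == "true":
            # Correct 'true': 2 points if confirmed, 1 point if unconfirmed.
            return 2 if confirmed else 1
        # Correct 'false': 1 point if confirmed, 0 points if unconfirmed.
        return 1 if confirmed else 0

For example, score(True, "true", False) yields 1, matching the statement above that unconfirmed correctness witnesses are assigned 1 point.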
Q: Can I contribute my own witness validator?
A: Absolutely. The SV-COMP community urgently needs more witness validators. If you have one, please send it to the organizer. If in the end we have more than two or three witness validators, the jury might make a selection or apply several witness validators.

Q: How many witness-based result validators are available?
A: There are three projects that currently work on result validators: CPAchecker, FShell, and Ultimate Automizer.

Q: Who proposed the witness format?
A: The witness format was developed by the people who developed the witness validators and who experimented in the demo category of SV-COMP 2014. Since then, several extensions and improvements were suggested and incorporated. Recently, there was a conference presentation about witness validation at ESEC/FSE by the developers of the witness validators of CPAchecker and Ultimate Automizer.

Q: Which categories participate in result validation?
A: Certain verification tasks are excluded from result validation. For the details, please visit:
https://sv-comp.sosy-lab.org/2019/rules.php

Q: The witness validator X does not succeed in re-verifying a verification task. Therefore, witness validation should not be done.
A: Witness validation will be done, for sure. This decision will not be reverted, in particular after the successful application in SV-COMP 2015 and 2016. Either use a different witness validator if your favorite witness validator does not work, or argue for the exclusion of the verification task from SV-COMP.

Q: My verifier wants to use a particular hint for guiding the witness validator. Can I use it?
A: Sure. Just make sure that one of the witness validators can consume it.
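Finally, as a practical aid for the witness questions above, here is a hedged sketch that checks whether a witness file parses as GraphML and which metadata entries it contains, so that missing fields can be spotted before submission. The set of expected fields below is an assumption based on common usage; the exchange-format specification is the authoritative reference.

    import sys
    import xml.etree.ElementTree as ET

    NS = "{http://graphml.graphdrawing.org/xmlns}"

    # Metadata keys that witnesses typically must carry; consult the
    # format specification for the authoritative list.
    EXPECTED = {"witness-type", "sourcecodelang", "producer", "specification",
                "programfile", "programhash", "architecture"}

    def check_witness(path):
        if not path.endswith(".graphml"):
            print("warning: file name does not end with .graphml")
        graph = ET.parse(path).getroot().find(NS + "graph")
        present = {data.get("key") for data in graph.findall(NS + "data")}
        print("metadata present:", sorted(present))
        missing = EXPECTED - present
        if missing:
            print("possibly missing:", sorted(missing))

    if __name__ == "__main__":
        check_witness(sys.argv[1])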