Participating in the RERS 2012 competition: reachability questions

Pascal Cuoq - 20th Aug 2012

This post is a follow-up to this one. It begins to describe our participation using Frama-C's value analysis in the RERS 2012 competition.

In theory

The competition features nine target programs of increasing difficulty. Some of the questions to be answered about the target programs are reachability questions. This is the case for all calls to assert(0); in the program: for each call the question is exactly whether it can be reached in an execution.

In addition some LTL formulas are offered about each target program. The goal here is to individually diagnose each LTL formula as true or false when applied to the program. Some of the LTL formulas actually express reachability properties. Consider the following property from the list that applies to problem 1 (small-easy):

(G ! oU) 
output U does never occur 

This formula holds if a program state in which U is output is never reached. It suffices to instrument the program a little bit to recognize when this happens and the problem is reduced to a reachability problem only in reverse: the property holds if the states are unreachable and does not hold if one of the states can be reached.

In practice

The good news is that the value analysis is at its core designed to answer reachability questions. More specifically it is designed to answer unreachability questions: it computes supersets of the values of variables that can reach each point in the target program and it warns if dangerous instructions can be reached. It can guarantee that an assert(0); call is not reached.

Preparing the program for analysis as described in the manual and many times on this blog may result for instance in the following changes: removing the headers defining a function assert() removing the printf() calls so that one does not need to provide the function.

--- Problem1.c~	2012-07-31 01:32:42.000000000 +0200 
+++ Problem1.c	2012-08-08 10:15:00.000000000 +0200 
@@ -1 5 +1 12 @@ 
-#include <stdio.h>  
-#include <assert.h> 
+void assert(int x) 
+  if (!x) 
+    { 
+      Frama_C_show_each_assert_reached(); 
+      /*@ assert \false; */ 
+    } 
 	// inputs 
 	int a= 1; 
@@ -582 14 +589 11 @@ 
         // read input 
         int input; 
-        scanf("%d"  &input);         
+	input = any_int(); 
+/*@ assert input == 1 || input == 2 || input == 3 || input == 4 || input == 5 || input == 6 ; */ 
         // operate eca engine 
         output = calculate_output(input); 
-        if(output == -2) 
-        	fprintf(stderr  "Invalid input: %d"  input); 
-        else if(output != -1) 
-			printf("%d"  output); 

We assume that an execution that reaches an assert(0); is interrupted and that input at each cycle is a number between 1 and 6. This corresponds to our understanding of the description of the programs in the competition.

As previously hinted the value analysis is designed to answer reachability questions negatively only. It is intended to guarantee that dangerous operations are unreachable (i.e. the program is safe). In doing so it computes whether any statement is reachable even non-dangerous ones and with what variable values each of these statement is reached.

In static analysis this is called “soundness”. The value analysis does not in general guarantee that any statement is reachable. However for the particular program above instrumented in this particular way with the right command-line options the value analysis is “complete” in addition to being sound: in addition to only diagnosing as unreachable statements that are actually unreachable it only diagnoses as reachable statements that are actually reachable.

Being complete in addition to being sound means that the value analysis is able to answer the question (G ! oU) negatively as well as positively. If the value analysis was only sound it could only guarantee that a particular assert(0); call is not reached or that U is never output. Being complete means that it can in addition guarantee that an assert(0); call is reached in some execution or that U is output in some execution.

More formulas to evaluate

There are more complex LTL formulas in the list that express reachability properties sometimes modulo trivial instrumentation. Below are two of them:

(! oZ WU (oZ WU (! oZ WU (oZ WU (G ! oZ))))) 
output Z occurs at most twice 
(! oU WU oZ) 
output U does never occur before output Z 

The second formula expresses that a state where U is emitted without Z ever having been emitted is not reached. To verify this property one simply needs to add to the target program a boolean variable that remembers whether Z has been emitted before.

The first is a complex LTL formula with a simple English interpretation. The program can be instrumented with a counter for the number of times Z is output and the property to check is whether a state where Z has been emitted three times is reached.

Do not make completeness claims about Frama-C's value analysis at home

You may have noticed that the value analysis' manual rarely offers precision guarantees and in particular never describes the conditions in which it is complete.

Therefore an independent participant in the competition who wasn't a value analysis developer would not be able to assert that the value analysis is behaving completely. Such a participant could only hope to answer reachability questions negatively finding that (G ! oU) is true (when it is actually true) or finding that a particular assertion is never reached (when it is actually unreachable). This is the direction that matters when trying to establish that error states are unreachable to verify the safety of software. The competition organizers implicitly acknowledge that this kind of property can be difficult to establish otherwise when they say about weighting answers:

The weighting scheme gives you a way to express your personal confidence in your results. e.g. if you have found a feasable path to some error this is safe and you should weight it 10. Lifeness properties or stating that certain errors are non-feasible is of course more risky.

Here we are using a sound analyzer to establish that certain errors are non-feasible so in our case there is little risk involved in attributing the maximum weight to these answers. There is in fact more risk when answering that some execution path leads to some error since the value analysis even when it happens to be complete is not designed to produce a trace that we could double-check. But we are confident for these errors too because of all the Csmith testing that we applied earlier in the year.

And by the way this blog post does not count as documentation: it is still undocumented in what circumstances Frama-C's value analysis is complete. The documentation is long enough trying to describe how to use the value analysis soundly; and this is despite soundness being its defining property. There is no point in enumerating the very specific circumstances in which it is complete. The conditions are unlikely to occur simultaneously in real life and if you think that this competition contradicts this statement have another look at the shape of the competition's target programs.

Pascal Cuoq
20th Aug 2012