Assertions

Pascal Cuoq - 4th Feb 2014

Jesse Ruderman on assertions and fuzzing

Jesse Ruderman has published a blog post on assertions and how they complement fuzzing. Key quote: “Fuzzers make things go wrong. Assertions make sure we find out.”

Readers of this blog are accustomed to me talking about differential testing where a reference result (say obtained by compiling a random C program with a quality compiler) is used to detect bugs in the target program (say a static analysis framework for C programs). Differential testing imposes constraints: the random input must have a definite meaning and a reference implementation needs to be available to compute this meaning. Often one finds bugs in the reference implementation together with bugs in the target program.

Besides there are deeply burrowed bugs that are difficult to reveal with such a black-box approach. Assertions simplify the problem: if the code being tested contains enough well-chosen assertions bugs can be caught without a reference implementation.

Sometimes an assertion is a reference implementation: some of the assertions in Frama-C are alternative computations of the same intermediate result followed by a comparison of the normally computed result with the alternately computed result. These assertions initially caught bugs in either computation. Since then they have caught many more bugs in hash-consing where the two results are structurally identical but are not shared.

Even when an assertion happens to contain a reference implementation for an intermediate result it saves a lot of time to write the assertion as opposed to producing a complete reference implementation for the whole problem (say interpreting C programs). The alternative implementation in the assertion does not have to be efficient: in Frama-C's value analysis it is considered acceptable to spend 20% of the analysis time executing inefficient reference implementations in assertions.

John Regehr on writing and using assertions

John Regehr just published a blog post too on the subject of assertions. What a coincidence! Key quote: “we have tools that are specifically designed to help us reason about assertions; my favorite one for C code is Frama-C”.

In most of his post John describes executable assertions written in the same language as the program being debugged. In the section quoted above he moves to specific annotation languages to write assertions in. The advantages of using the same language as the programming language for assertions require no elaboration: it's the same language! The programmer already knows it and the reviewer of the code already knows it.

But there is a spectrum of annotation languages for assertions. Eiffel stands its ground somewhere on this spectrum very close to the underlying programming language but with enough good intentions to be noted. I think that the JML annotation language for Java was initially mostly intended for run-time checking but ended up being very popular too as the annotation language used by static analyzers (I would be happy to be corrected if I am getting this wrong). Nearby JML lies E-ACSL an executable subset of ACSL and also a Frama-C plug-in to convert a C program annotated with /*@ ... */ assertions into an instrumented C program that detects at run-time violated assertions. SPARK 2014 aims at making both camps happy.

I should point out for the sake of scientific integrity that I am being slightly cheeky in the choice of the key quote to represent John's post. I recommend you read the whole thing of which the Frama-C endorsement is only a tiny fraction.

Taking advantage of C assertions with Frama-C

One can also use Frama-C in conjunction with existing executable assertions typically implemented with the standard function assert() in header assert.h. The function assert() takes a C expression representing a property that should hold.

One way to take advantage of assertions is to fail to establish that they always hold and warn the user that perhaps they don't. It is easy for anyone who wants this behavior to obtain it. One simply needs to specify assert() thus:

/*@ requires x != 0 ; */
void assert(int x);

With this specification any existing call to the C function assert() intended for the programmer to be executed at run-time is required to have an argument that demonstrably corresponds to a non-null expression. This specification creates a link of sorts between static verification and run-time checking (or expressions intended to be checked at run-time anyway).

Here is an example:

...
/*@ requires x >= 0 ;*/
double sqrt(double x);
int get_int(void);
double dist(double x  double y)
{
  double s = x * x + y * y;
  assert(s >= 0.0);
  return sqrt(s);
}
int main(int c  char **v){
  dist(get_int()  get_int());
}

When this program is analyzed the value analysis detects that the assertion s >= 0.0 helpfully inserted by the programmer as an argument to assert() may not hold. In this particular example the warning is a false positive but never mind that for the moment.

$ frama-c -val t.c
...
t.c:4:[value] Function assert: precondition got status unknown.
...
t.c:1:[value] Function sqrt: precondition got status unknown.

Even more irritating in the example above is that after producing a false positive for assert() the analyzer produces a false positive for the pre-condition of sqrt(). This brings us to another way a static checker could use C assertions to its advantage: it could take them as hints properties that can be assumed to hold in order to avoid warning for problems that would seem to arise later if they did not.

With the user-specified function assert() above the analyzer computes the truth value of the argument s >= 0.0. Because the set of values computed for s is approximated the set of values for the expression s >= 0.0 is {0; 1}. The analyzer can tell that the pre-condition for assert() may not hold but the relationship with the possible values of s is too indirect for it to decide that the forbidden situation corresponds to s negative and that only s positive is allowed.

There exists a better modelization of assert(). It was implemented as a value analysis built-in in order to offer the very functionality we are describing here:

$ frama-c -val -val-builtin assert:Frama_C_assert t.c
...
t.c:11:[value] warning: Frama_C_assert: unknown
...
t.c:1:[value] Function sqrt: precondition got status valid.
...
[value] Values at end of function dist:
  s ∈ [-0. .. 9.22337203685e+18]

On the same example the builtin version of assert detects that it may be passed a null expression and warns about that. These is no improvement there (annotating the example to convince the analyzer that s is always positive is left as an exercise to the reader). Subsequently and this is an improvement with respect to the previous analysis the builtin does its best to incorporate the information provided in the assertion so that it can tell that s is henceforth positive and that the pre-condition of sqrt() is satisfied.

Pascal Cuoq
4th Feb 2014