Test driving static analysis tools

Pascal Cuoq - 31st May 2011

I have recently added an article, "Test driving static analysis tools in search of C code vulnerabilities" by George Chatzieleftheriou and Panagiotis Katsaros, to the wiki page of publications relevant to Frama-C. This article benchmarks a number of static analysis tools, measuring both precision (warning about real issues only) and recall (warning about all issues). Frama-C's value analysis is one of the tools tested. The article can be found in electronic form by googling for its title.

As the reader of this blog probably knows, the value analysis, when used properly, is supposed to be sound. "Sound" is synonymous with 100% recall. Interestingly, the article does not find that recall is the forte of the value analysis: in figure 3, the value analysis is the most precise of all tested tools, but it does not have 100% recall. In fact, the two commercial offerings in the benchmark have better recall than Frama-C's value analysis.
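For concreteness, if TP denotes alarms emitted for real issues, FP alarms emitted for code that is in fact safe, and FN real issues for which no alarm is emitted, then precision = TP / (TP + FP) and recall = TP / (TP + FN). A sound analyzer by definition misses nothing (FN = 0), which is why soundness is the same thing as a recall of 1.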

So what about this recall then?

I maintain my claim that the value analysis is supposed to be sound when used properly. The tested version is Beryllium, released in 2009, which already had a strong implementation but may have been lacking in documentation. I don't think there was any Skein tutorial then, let alone a second tutorial using QuickLZ as an example. However, if we had made it clearer that the value analysis was primarily intended for the formal verification of embedded C code, with neither dynamic allocation nor, generally, any calls to standard functions that are part of a desktop programming environment, this might have discouraged the authors from including it in the comparison. So this is all for the best.

By the way, I expect that many of the cases where the value analysis failed to identify an issue can be traced to the use of standard functions, although I have not yet dug enough into the benchmark to be sure. Standard functions are natural suspects: the benchmark clearly uses them, and although we provide several imperfect modelizations for malloc(), they are so little documented that we can expect the authors of the benchmark used none. In that case, malloc() is treated by the analyzer like any unknown function that takes an integer and returns a pointer. Now that there are tutorials, you can see that they all start by listing the C functions that are missing from the analysis projects and by wondering, for each, how it can be provided.
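To give an idea of what "providing" such a function means, here is a minimal sketch of a hand-written malloc() model (this is not the model shipped with Frama-C, only an illustration of the principle): allocations are served from a fixed static arena, so that the analyzer sees pointers into a known block instead of a completely unknown return value.

/* Hypothetical, simplified malloc() model: allocations come from a
   static arena so that returned pointers have a known base.  Only a
   sketch; the models provided with Frama-C are more subtle. */
#include <stddef.h>

#define ARENA_SIZE 4096
static char arena[ARENA_SIZE];
static size_t next_free = 0;

void *malloc(size_t size)
{
        if (size > ARENA_SIZE - next_free)
                return (void *)0;        /* out of memory */
        void *p = arena + next_free;
        next_free += size;
        return p;
}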

The human factor

I was very happy to find this benchmark. I should explain: the 100% recall, or "soundness", claim is utterly boring to test for. The value analysis was designed to be sound from the beginning (when used properly, for its intended usage), and since then its authors have only been fighting for precision. As a result, and because they are only human, there are a lot of what I would call positive tests (checks that the value analysis succeeds at not warning about a complex but safe piece of code). There are not nearly enough negative tests (checks that the value analysis does warn about an unsafe piece of code), because there is no progress to measure there. The value analysis implementors have been guilty of a basic, very forgivable psychological error, but even if we forgive them, the situation must be fixed. This benchmark is a good start towards a collection of "negative tests" for the value analysis.
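For illustration, here is a made-up pair showing the two kinds of tests (these functions are not from the value analysis test suite; the names and code are only for the sake of the example):

/* "Positive" test: safe code; the analysis should emit no alarm. */
int positive(int i)
{
        int t[10] = {0};
        if (0 <= i && i < 10)
                t[i] = 1;       /* index provably in bounds */
        return t[0];
}

/* "Negative" test: unsafe code; the analysis must emit an alarm. */
int negative(int i)
{
        int t[10] = {0};
        t[i] = 1;               /* i is unconstrained: possible out-of-bounds access */
        return t[0];
}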

In fact, I have started looking at the qualitative tests from the benchmark [removed dead link], and it is rather simple to check the value analysis against expected results. Because I want to fix the lack of "negative tests", I focused on lines with the comment DANGER that the value analysis should find. Here is the test script I currently have:

#!/bin/sh
# Point FRAMACLIBC at the directory containing Frama-C's libc headers.
# export FRAMACLIBC=/usr/local/share/frama-c/libc
export FRAMACLIBC=/Users/pascal/ppc/share/libc

# Division-by-zero tests: keep only the relevant alarms in the output.
for F in general/division_zero/*.c
do
    ~/ppc/bin/toplevel.opt -val -cpp-command "gcc -C -E -nostdinc -I. -I$FRAMACLIBC" $F | grep "division by zero"
    # The extra /dev/null argument forces grep to print the file name.
    grep -n DANGER $F /dev/null
    echo
done

# Overflow tests: enable signed overflow alarms and keep the emitted assertions.
for F in integers/overflow/*.c
do
    ~/ppc/bin/toplevel.opt -val -cpp-command "gcc -C -E -nostdinc -I. -I$FRAMACLIBC" -val-signed-overflow-alarms $F | grep "assert"
    grep -n DANGER $F /dev/null
    echo
done

# Array tests: same options, keep all emitted assertions.
for F in arrays/*/*.c
do
    ~/ppc/bin/toplevel.opt -val -cpp-command "gcc -C -E -nostdinc -I. -I$FRAMACLIBC" -val-signed-overflow-alarms $F | grep "assert"
    grep -n DANGER $F /dev/null
    echo
done

It works rather well:

general/division_zero/divisionzero1.c:11:[kernel] warning: division by zero: assert 0 ≢ 0;
general/division_zero/divisionzero1.c:11:	d = 5 / 0;	/* DANGER */
general/division_zero/divisionzero1_alias1.c:21:[kernel] warning: division by zero: assert c2 ≢ 0;
general/division_zero/divisionzero1_alias1.c:21:	d = 10 / c2;	/* DANGER */
general/division_zero/divisionzero1_alias2.c:23:[kernel] warning: division by zero: assert c2 ≢ 0;
general/division_zero/divisionzero1_alias2.c:23:	d = 10 / c2;	/* DANGER */

In each of the two-line snippets above, the first line comes from grepping the analysis results and the second from grepping for DANGER in the program itself.

But it is not so easy to compare expected and obtained results for what the authors of the benchmark call context-sensitive tests: the alarm is reported at the faulty statement inside the called function, while the DANGER comment annotates the call site, so the line numbers no longer match. This is more complex than it seems at first sight, and I may need to dedicate an entire blog post to it:

general/division_zero/divisionzero1_context1.c:7:[kernel] warning: division by zero: assert val ≢ 0;
general/division_zero/divisionzero1_context1.c:18:	d = divide_values(c);	/* DANGER */

The file in question is:

/* Division by zero error */
/* Context sensitive example */
/* Different functions */
int divide_values(int val)
{
        return (10 / val); /* line 7 */
}
int main()
{
        int d;
        int b = 5;
        int c = 0;
        d = divide_values(b);   /* OK */
        d = divide_values(c);   /* DANGER */ /* line 18 */
        return 1;
}
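Until a better solution is found, one coarse way to handle these context-sensitive files is to stop matching line numbers altogether and simply check, per file, that an alarm is emitted if and only if the file contains a DANGER comment. Here is a sketch under the same assumptions (paths, options) as the script above; the *context* glob assumes the benchmark's naming convention seen in the file name above.

# Coarse per-file check for context-sensitive tests: ignore line numbers,
# only compare "some alarm was emitted" against "the file contains DANGER".
for F in general/division_zero/*context*.c
do
    ALARMS=$(~/ppc/bin/toplevel.opt -val -cpp-command "gcc -C -E -nostdinc -I. -I$FRAMACLIBC" $F | grep -c "division by zero")
    EXPECTED=$(grep -c DANGER $F)
    if [ "$ALARMS" -gt 0 ] && [ "$EXPECTED" -gt 0 ]; then
        echo "$F: ok (issue found)"
    elif [ "$ALARMS" -eq 0 ] && [ "$EXPECTED" -eq 0 ]; then
        echo "$F: ok (no issue)"
    else
        echo "$F: MISMATCH (alarms=$ALARMS, DANGER lines=$EXPECTED)"
    fi
done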

Finally, I have already fixed some bugs thanks to this benchmark, so if you try to reproduce these steps at home, you may want to use this file for __fc_machdep.h and this one for limits.h, in directory libc, in substitution for the files provided with Carbon.
