CompCert gets a safe interpreter mode

Pascal Cuoq - 29th Aug 2011

Safe C interpreters galore

The last release of CompCert includes a safe C interpreter based on the compiler's reference semantics. Like KCC and Frama-C's value analysis it detects a large class of undefined behaviors and some unspecified ones. Like them in order to remain useful it needs to make some difficult choices regarding what it accepts and what it rejects. C programs are seldom written entirely in the "strictly conforming" subset of the language even when their authors advertise them as "portable". I once found on the Internet a "portable memcpy()" that started by decrementing one of its pointer arguments (a definite undefined behavior if the pointer points to the beginning of a block). This is one case that CompCert's interpreter and Frama-C's value analysis both accept. To warn they both wait until you either dereference or compare the invalid pointer. I can't speak for Xavier but the reason why the value analysis waits is that too many programs do this and too many programmers do not even accept that it is wrong when you point it out to them (to be clear I am not talking about the industrial users interested in Frama-C for verification of critical C programs here; that would be cause for worry. I am talking about the random C programmer you may find anywhere else). It would be very easy to modify the value analysis to warn as soon as the invalid pointer is computed if someone desired strict enforcement of the standard.

While Xavier was implementing this C interpreter I was coincidentally working on an interpreter mode for the value analysis. This mode simply deactivates a number of features that are not useful for the step-by-step interpretation of deterministic C programs: recording memory states and detecting infinite loops. The option does not try to change the representation of memory states that are still sets of possible valuations for all the variables in memory — in interpreter mode the states just happen all to be singletons and singletons that the interpreter represents slightly inefficiently. In fact after it was done I realized that this option would double up as a useful tool for some of the brute-force numerical precision analyses that are possible with the value analysis so that right now I am struggling to find a name for the option. Since a benchmark was coming up I also de-activated all consistency checks inside the value analysis. Lastly I linked Frama-C with a miracle library that I need to tell you more about in another blog post.

Benchmark

And there it was: the benchmark. This is an informal little thing strictly for fun. There is no point in doing more serious measurements yet since Xavier may still improve the representation of memory states in CompCert's interpreter. I used the three files below:

SHA1 compression of a couple of small buffers with a final check of the result: sha1.c
a program generated by the Csmith random C program generator. The program is in the intersection of what both interpreters support: csmith.c
the program main(){ signed char c; for(c=1; c; c++); return c;}

For reference the commands used are (the Frama-C commands will not work with Carbon; you'll need to remove the unknown options; option -obviously-terminates can be approximated in Carbon with -slevel 999999999 -no-results):

$ time ~/ppc/bin/toplevel.opt -val -obviously-terminates \
   -cpp-command "gcc -C -E -Dprintf=Frama_C_show_each" sha1.c
...
[value] Called Frama_C_show_each({{ &"Test `%s': %s" }} {{ &test_input_1 }}
                                 {{ &"passed" }})
...
[value] Called Frama_C_show_each({{ &"Test `%s': %s" }} {{ &test_input_2 }}
                                 {{ &"passed" }})
...
$ time ccomp -interp sha1.c
...
Test `abc': passed
Test `abcdbcdecdefdefgefghfghighijhijkijkljklmklmnlmnomnopnopq': passed
...

$ time ~/ppc/bin/toplevel.opt -val -obviously-terminates -big-ints-hex 0 \
   -cpp-command "gcc -C -E -Dprintf=Frama_C_show_each -Iruntime" csmith.c
...
[value] Called Frama_C_show_each({{ &"checksum = %X" }} {0x3F65930A})
...
$ time ccomp -interp -Iruntime csmith.c
checksum = 3F65930A

$ time ~/ppc/bin/toplevel.opt -val -obviously-terminates c.c
$ time ccomp -interp c.c

Results

The times shown are the medians of three runs.

For the small program with the signed char ccomp -interp takes 0.016s and the value analysis 0.096s. I am tempted to attribute the value analysis' bad score to startup time: Frama-C is probably heavier than a compiler by now and it loads all its available plug-ins at start-up. I could have improved this by disabling plug-ins that are not necessary for interpreting C programs this way.

For the SHA1 mini-benchmark the value analysis takes 0.388s and ccomp -interp takes 0.296s.

For the Csmith random program:

Value analysis: 0.848s

CompCert interpreter: 0.616s

Conclusion

I think this is not bad at all for either of the two interpreters. I have examples where the value analysis does better than CompCert but it wouldn't be fair to include them at this point because they poke at a known weakness in CompCert's interpreter. On the other hand I do not think that the value analysis will ever do much better than the results above. If I stumble on improvements that look like they'll improve both the analyzer and the interpreter in the value analysis I'll implement these immediately but I do not think I should compromise the analyzer's performance to improve the interpreter. It is slow enough as it is. Just ask Boris.