We have a Csmith-proof framework
Pascal Cuoq - 30th Jul 2011Csmith that I mentioned earlier in this blog is a random generator of C programs. That much sounds easy but it generates only well-defined programs which two or more compilers have no excuse for compiling into executables that produce different results. And it generates varied and interesting enough C programs that it is possible to find lots of compiler bugs this way. These two constraints are obviously difficult to reconcile and the people behind Csmith have done very impressive work.
Frama-C is a collection of analyses and transformations for C programs. In analyzing and transforming it gives meaning to these programs. There are various ways of checking that the meaning given is consistent with the meaning implemented in a reference C compiler so that Frama-C bugs can be found with this powerful tool too.
Anne Pacalet Benjamin Monate Virgile Prevosto Boris Yakobowski and I have been fixing roughly 50 bugs in the framework's front-end pretty-printer and in the value analysis constant propagation and slicing plug-ins during the last 6 months. Fifty may seem like a large number but you have to consider the different functions the bugs were spread over. I am sure that Csmith found as many bugs in at least one open-source and widely respected compiler. As of today the web page claims "more than 350 previously-unknown compiler bugs" and the number is probably outdated since it increases weekly. The number does not include our own 50 as far as I know.
More importantly all the Frama-C plug-ins for which I found a way to take advantage of Csmith seem to be completely immune now excepting a couple of minor limitations that can be circumvented for additional Csmith testing until they are definitely fixed. The difficult design constraint for a Csmith-based Frama-C testing script is that it must produce no false positives: the script must be able to run day and night on tens of processors and only require human review when it has definitely found a bug in Frama-C (it is also acceptable to find bugs in the reference C compiler or in Csmith itself but if the script taps my shoulder for me to look at something there had better be a bug somewhere).
That does not mean that Frama-C's value analysis or slicing plug-ins have no bug left. There are many possible bugs that could not be found this way because of the limitations of the testing methodology (especially the "no false positive" constraint) and indeed in these plug-ins we were fixing bugs not found by Csmith during the same period. Still I think it is a noteworthy milestone.
Lastly Csmith-imperviousness is a moving target. There was a time when the front-end and value analysis plug-in had already reached Csmith-proofitude (it is better to do the layers from bottom to top so as to avoid cascading bug reports where for instance the slicing is wrong because the value analysis feeds it bad values because of a typing bug in the front-end). Two weeks later Xuejun Yang the main implementor of Csmith had improved it so that it generated new constructs and found new problems in these previously Csmith-robustified parts. I hear Xuejun is now writing his PhD. Hopefully this will allow us a few months to boast.