Exact Gap Computation for Code Coverage Metrics in ISO-C

Pascal Cuoq - 16th Oct 2012

Comparing static analysis tools is (still) difficult

Earlier in 2012, some of my colleagues and I took the opportunity to point out that, as a research community, we are not doing a satisfactory job of comparing static analysis tools. That article and the accompanying blog post were concerned with independent third-party comparisons. Another context in which each of us can be led to do comparisons, and to try to be as objective as possible, is the “related work” section of any traditional non-benchmark article.

“Related work” sections

Dirk Richter and Christian Berg have designed a technique that it apparently makes some sense to compare to what Frama-C's value analysis allows. Here is their summary of the comparison between their tool and the value analysis:

No sophisticated examples can be handled by Frama-C’s value analysis. Some of the examples tested even cause runtime-errors in Frama-C itself thus it is not reliable.

Richter and Berg's technique appears to tackle the problem of estimating how much code must actually be covered for all the code that can possibly be covered to have been covered; in other words, of computing the gap between full coverage and the coverage that is actually attainable.
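
To make the question concrete, here is a small, entirely hypothetical C example (not taken from Richter and Berg's article). A coverage tool reports the first branch of f as uncovered by the test in main; the interesting question, and the one an exact gap computation must answer, is whether any test could ever cover it. Here the answer is no, so 100% branch coverage is unattainable and the gap is exactly that one branch.

    #include <assert.h>

    /* Hypothetical example: the branch guarded by (x > 7) can never be
       taken, because f is only ever called with arguments 0..5. A coverage
       tool merely reports the branch as uncovered; deciding whether some
       test could cover it is the harder, "exact gap" question. */
    static int f(unsigned x)
    {
      if (x > 7)
        return -1;          /* uncoverable for the calls below */
      return (int)x;
    }

    int main(void)
    {
      int s = 0;
      for (unsigned i = 0; i <= 5; i++)
        s += f(i);
      assert(s == 15);      /* 0+1+2+3+4+5 */
      return 0;
    }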

Frankly, it seems to me that the researchers should be comparing their tool to existing test-generation tools, which are much closer in objectives and in the means deployed to reach these objectives. One such tool that I know of, because it is developed locally, is PathCrawler. I am sure there are plenty of others. I am not sure why they spend a paragraph on Frama-C's value analysis when it is absolutely not documented as trying to do this.

Possible comparisons

However, I am interested in tool comparisons, so this is a good opportunity to understand how such comparisons work and how we can make them better.


First, the claim “No sophisticated examples can be handled by Frama-C’s value analysis” is interesting, because of course what really kills advanced techniques in practice is that they do not scale to even moderately sized programs. The authors point out that their own work may only be “practical” for embedded systems. Coincidentally, Frama-C's value analysis was designed to work well on embedded code too, since we too felt this was a domain where practicality was within reach. This makes me curious as to what embedded code examples the authors experimented on before reaching their conclusion. If these examples reveal weaknesses in Frama-C's value analysis on embedded code, we would definitely like to hear about them.

I would summarize this into one additional item that we did not list in our “Benchmarking static analyzers” article (although it was implicit in other recommendations we did list): when doing comparisons, conditions should be made explicit and input programs should be made available. Here, this would mean providing the Frama-C version and command line in addition to the test programs themselves. For programs and command lines that “cause runtime-errors in Frama-C itself”, a good place to provide them is our Bug Tracking System; instructions for reporting a bug are displayed on crashes. Otherwise, a tarball with all the embedded C code examples the authors used would be fine, and a URL could be provided in the article.


Second, the authors claim that Frama-C is not reliable. This makes me want to investigate, because precisely in 2012 we published a peer-reviewed article on the subject: “Frama-C is reliable; here is how we used random program generation to make it so”. I am eager to get my paws on the authors' implementation so as to evaluate its reliability, since this is supposed to be one aspect where it does better. Unfortunately, the implementation does not appear to be available yet.


Third, it is always a difficult exercise to compare one's own tool to a different, unfamiliar one. One risks ending up with a slightly unfair conclusion, even if one tries one's hardest to be objective. I remarked on the difficulty of finding “fair” inputs for the comparison of static analyzers in a previous post. The difficulty for static analyzers is that inputs are programs: programming languages are so expressive that two programs, even in the same language, can differ widely.

There, as a remedy, I suggested that tool authors first try their own tool on the programs that the other analyzer works well on. For someone working on another technique and unfamiliar with the strengths and weaknesses of Frama-C's value analysis, Csmith-generated examples are one obvious possibility. Another possibility would be to use the programs in the 2012 RERS challenge. These programs contain assert(0); calls, and for each of them an objective is to tell whether it is reachable. This seems to me to be the same question as whether a test suite needs to cover the assert(0); call. The good news is that, both on Csmith-generated programs and in the RERS challenge, Frama-C's value analysis can be made to be both sound and complete, just as Richter and Berg claim their technique is. It should then be simple and informative to compare the results of their tool to the results obtained with the value analysis. Below are our solutions for problems 1-9.
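
To illustrate the kind of verdict involved, here is a minimal program in the spirit of the RERS problems (a hypothetical reduction written for this post, not one of the actual challenge files; read_input stands for the challenge's input function and is deliberately left external so that the analysis must consider every possible input). A tool that is both sound and complete on it must report error_reachable as reachable and error_unreachable as unreachable, which is exactly the verdict a test-generation tool needs in order to know which assert(0); calls a test suite has to cover. With a 2012-era Frama-C, one would run the value analysis with something like frama-c -val on the file and read off which assertions are flagged as possibly failing; as argued above, the exact version and command line should accompany any published comparison.

    #include <assert.h>

    /* Hypothetical stand-in for a RERS-style problem (not an actual
       challenge file). */
    extern int read_input(void);

    int main(void)
    {
      int state = 0;
      for (int step = 0; step < 2; step++) {
        int in = read_input();
        if (state == 0 && in == 2) {
          state = 1;
        } else if (state == 1 && in == 3) {
          error_reachable: assert(0);   /* reachable: inputs 2 then 3 lead here */
        } else if (state == 7) {
          error_unreachable: assert(0); /* unreachable: state only takes values 0 and 1 */
        }
      }
      return 0;
    }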

Reachable assertions in problems 1-9 of the RERS competition

Below is the list of reachable assert(0); calls for each problem, identified by their labels. The assert(0); calls that are not listed are unreachable.

Problem1.c

error_20: assert(0); 
error_47: assert(0); 
error_32: assert(0); 
error_37: assert(0); 
error_56: assert(0); 
error_33: assert(0); 
error_57: assert(0); 
error_50: assert(0); 
error_35: assert(0); 
error_15: assert(0); 
error_38: assert(0); 
error_21: assert(0); 
error_44: assert(0); 
globalError: assert(0);

Problem2.c

error_50: assert(0); 
error_45: assert(0); 
error_59: assert(0); 
globalError: assert(0); 
error_43: assert(0); 
error_13: assert(0); 
error_16: assert(0); 
error_44: assert(0);

Problem3.c

error_45: assert(0); 
error_35: assert(0); 
error_52: assert(0); 
error_39: assert(0); 
error_9: assert(0); 
error_37: assert(0); 
error_43: assert(0); 
error_31: assert(0); 
error_28: assert(0); 
error_27: assert(0); 
error_50: assert(0); 
error_13: assert(0); 
error_26: assert(0); 
globalError: assert(0);

Problem4.c

error_12: assert(0); 
error_19: assert(0); 
error_31: assert(0); 
error_39: assert(0); 
error_52: assert(0); 
error_6: assert(0); 
error_58: assert(0); 
error_40: assert(0); 
error_4: assert(0); 
error_38: assert(0); 
error_45: assert(0); 
error_11: assert(0); 
error_26: assert(0); 
globalError: assert(0); 
error_9: assert(0); 
error_17: assert(0); 
error_32: assert(0); 
error_35: assert(0); 
error_55: assert(0); 
error_36: assert(0); 
error_14: assert(0); 
error_18: assert(0); 
error_13: assert(0); 
error_15: assert(0); 
error_27: assert(0);

Problem5.c

error_0: assert(0); 
error_38: assert(0); 
error_57: assert(0); 
error_55: assert(0); 
error_58: assert(0); 
error_32: assert(0); 
error_13: assert(0); 
error_51: assert(0); 
error_33: assert(0); 
error_48: assert(0); 
error_18: assert(0); 
error_39: assert(0); 
error_1: assert(0); 
error_41: assert(0); 
error_37: assert(0); 
globalError: assert(0); 
error_11: assert(0); 
error_26: assert(0); 
error_15: assert(0); 
error_40: assert(0); 
error_36: assert(0); 
error_44: assert(0); 
error_30: assert(0); 
error_47: assert(0); 
error_24: assert(0);

Problem6.c

error_12: assert(0); 
error_21: assert(0); 
error_11: assert(0); 
error_44: assert(0); 
error_1: assert(0); 
error_36: assert(0); 
error_0: assert(0); 
error_2: assert(0); 
error_38: assert(0); 
error_48: assert(0); 
error_37: assert(0); 
error_4: assert(0); 
error_59: assert(0); 
error_10: assert(0); 
error_20: assert(0); 
error_5: assert(0); 
error_15: assert(0); 
error_27: assert(0); 
error_33: assert(0); 
error_9: assert(0); 
error_29: assert(0); 
error_47: assert(0); 
error_56: assert(0); 
error_24: assert(0); 
error_58: assert(0); 
globalError: assert(0);

Problem7.c

error_58: assert(0); 
error_47: assert(0); 
error_5: assert(0); 
error_48: assert(0); 
error_19: assert(0); 
error_39: assert(0); 
error_36: assert(0); 
error_40: assert(0); 
error_35: assert(0); 
error_31: assert(0); 
error_9: assert(0); 
error_42: assert(0); 
error_7: assert(0); 
globalError: assert(0); 
error_11: assert(0); 
error_20: assert(0); 
error_44: assert(0); 
error_46: assert(0); 
error_18: assert(0); 
error_6: assert(0); 
error_23: assert(0); 
error_30: assert(0); 
error_3: assert(0); 
error_37: assert(0); 
error_15: assert(0);

Problem8.c

error_48: assert(0); 
error_2: assert(0); 
error_49: assert(0); 
error_15: assert(0); 
error_7: assert(0); 
error_55: assert(0); 
error_51: assert(0); 
error_50: assert(0); 
error_43: assert(0); 
error_10: assert(0); 
error_29: assert(0); 
error_24: assert(0); 
error_1: assert(0); 
error_26: assert(0); 
error_6: assert(0); 
error_5: assert(0); 
error_46: assert(0); 
error_13: assert(0); 
error_4: assert(0); 
error_37: assert(0); 
globalError: assert(0); 
error_34: assert(0); 
error_25: assert(0); 
error_28: assert(0); 
error_59: assert(0);

Problem9.c

error_46: assert(0); 
error_57: assert(0); 
error_36: assert(0); 
error_19: assert(0); 
error_6: assert(0); 
error_10: assert(0); 
error_34: assert(0); 
error_15: assert(0); 
error_32: assert(0); 
error_41: assert(0); 
error_11: assert(0); 
error_35: assert(0); 
error_2: assert(0); 
error_20: assert(0); 
error_3: assert(0); 
globalError: assert(0); 
error_44: assert(0); 
error_38: assert(0); 
error_51: assert(0); 
error_54: assert(0); 
error_56: assert(0); 
error_53: assert(0); 
error_47: assert(0); 
error_59: assert(0); 
error_8: assert(0);
"