Unspecified behaviors and derived analyses

Pascal Cuoq - 17th Oct 2010

Prologue

The C standard(s) specifies a minimum of things that all C compilers must agree on. For the sake of efficiency, many syntactically correct constructs are left without an unambiguous meaning.

The worst way for a construct not to have an unambiguous meaning is to be undefined behavior. An example of undefined behavior is invalid pointer access. Not all platforms have the required MMU, and even when they do, the MMU does not catch all invalid memory uses. Accessing an invalid pointer is not guaranteed to crash the program cleanly. A program that does so may \crash cleanly" or continue executing in an inconsistent state. The program may crash a little bit later because of the inconsistency or it may appear to work fine and return a wrong result or it may reformat your harddrive. All these behaviors and others are encompassed by "undefined behavior".

Compilers and compilation-platform-specific static analysis

Recently C compilers have started to take advantage of undefinedness in the C99 standard for the sake of aggressive optimization. They apply program transformations that are only valid in the case of defined behaviors and leave the program doing strange unpredictable things in other cases. John Regehr has a very good explanation of C's undefined behaviors that spans three blog posts.

There are other more benign ways for a construct not to have an unambiguous meaning though. Implementation-defined behaviors must be documented and consistent for a given compiler. In the case of unspecified behaviors one of several plausible alternatives must happen. In Frama-C we have reclassified some unspecified and even some undefined behaviors as implementation-defined. For instance the value analysis assumes by default that signed arithmetic overflows have been put there on purpose by the programmer and that he intended 2's complement modulo results. In this default configuration the value analysis does not warn about these overflows and propagate 2's complement modulo results. John Regehr convinced us however that this was not satisfactory so there is now an option -val-signed-overflow-alarms for when the programmer is not supposed to rely on this behavior (signed arithmetic overflows are technically undefined behavior). The Jessie plug-in allows roughly the same distinction with the defaults reversed (by default you get alarms for integer overflows). In short a Frama-C analysis session is parameterized by a description of the compilation platform that includes genuine or reclassified implementation-defined choices.

That still leaves of course some unspecified and undefined behaviors that the value analysis or Jessie plug-ins do indeed warn about and prove the absence of when they do not warn about them. Each construct that may cause some kind of unspecifiedness or undefinedness is the object of a warning containing the word assert in the value analysis log. Terms and conditions apply.

This post is about the one use case where the user does not have to worry about the alarms So if you were tired of the ongoing thread about the verification of Skein-256 good news! This is something else.

Derived analyses

Well I say "one" use case but it's really a family of use cases. I am talking about all the analyses that are helping the user to comprehend the code and that can therefore not mandate that all the alarms have been eliminated through the kind of laborious inspection and refinement illustrated in the Skein verification thread. Instead the values and therefore the results of the analysis that relies on them are guaranteed to apply to well-defined executions only (executions that do not falsify any of the assertions that have been emitted). As long as this is kept in mind it may even not be necessary to look at these alarms. In fact it is standard for high-level analyzers of C programs to assume the absence of undefined behaviors or apply only to executions without undefined behaviors. Fast alias analyses intended to scale to millions of lines of C for instance assume that you do not transmit an address to pointer p by writing it in *(&q+1) where q is the address below p in memory. These analyses and any other analyses that rely on the first ones do not tell you whether you should have reasons to worry about this happening. If you should worry they do not give you a list of dangerous constructs in the target program. The value analysis provides such a list of alarms but you can ignore it if you wish and then you get results with similar caveats.

Well almost. We are finally arriving to the core of the problem.

What is a well-defined execution?

The value analysis is more precise and more aggressive than the typical large-scale alias analysis in its handling of what look like possible quirks in the analyzed program. The reason the -val-signed-overflow-alarms option mentioned above is not enabled by default is that several test programs became meaningless. The analysis was in effect saying "look this program is sure to produce an overflow here at the third line there is no point in going further until this has been resolved" and refused to analyze the rest of the program. The typical large-scale context-insensitive path-insensitive alias analysis does not have this problem but that's only because it is too imprecise to have it. It is still making assumptions such that you do not access p when you read or write from *(&q+1).

So does this mean that all high-level analyses of C programs are meaningless unless the absence of undefined behaviors in target program is also verified(1)? In the style of Dan Simmons' Hyperion cycle we are going to leave this crucial question for another volume that is a future blog post.

(1) which it rarely is in practice. As pointed out earlier most of them do not even tell you whether you should worry and what to worry about.

Pascal Cuoq
17th Oct 2010