Easy value analysis example: putnum()

Pascal Cuoq - 12th Aug 2011

If some of the posts in this blog ever get re-organized into a course-style document, this one will be near the beginning, because it's simple and self-contained.

The program

This example was offered on our bug-tracking system:

int putchar(int); 
void print(const char *ptr) 
{ 
  while (*ptr) 
    putchar(*ptr++); 
} 
void putnum(unsigned long n) 
{ 
  char buf[10], *ptr = buf + sizeof(buf); 
  *--ptr = 0; 
  do { 
    *--ptr = \0123456789ABCDEF"[n&15]; 
    n >>= 4; 
  } while(n); 
  print(ptr); 
} 

A few minutes of looking at the code should allow you to infer for yourself that function putnum() is for printing unsigned hexadecimal numbers.

The precision issue

The reporter was disappointed that the value analysis emitted a number of alarms for this code which seems trivial. This brings to mind the quip by Richard Feynman that "mathematicians can prove only trivial theorems because every theorem that's proved is trivial". Excuse the approximate quote: this is another book that I sop up in audio form and these things are even worse than paper for searching. But perhaps the case deserves to be called trivial indeed because in the original program as provided by the reporter in the BTS putnum() is called only with a single concrete value:

int main(void) 
{ 
  putnum(0x12345); 
  putchar('\ 
'); 
  return 0; 
} 

This should help the value analysis: it likes to analyze functions with only the values for arguments and globals that appear to be really passed to them at execution. However remember that most classic bug-finding static analyzers would analyze function putnum() in abstracto and tell you what they think pertinent to tell you regardless of whether the function is called only with 0x12345 or indeed never called at all — and therefore like the plane that always stays on the ground is extremely safe.

Back to the value analysis. What are these alarms that it finds?

$ frama-c -val putnum.c 
... 
putnum.c:16:[kernel] warning: out of bounds write. assert \valid(ptr); 
putnum.c:7:[kernel] warning: accessing uninitialized left-value *ptr: assert(Ook) 
putnum.c:8:[kernel] warning: accessing uninitialized left-value *tmp: assert(Ook) 
putnum.c:7:[kernel] warning: out of bounds read. assert \valid(ptr); 
putnum.c:8:[kernel] warning: out of bounds read. assert \valid(tmp); 
... 

Fair enough. The value analysis does emit a number of alarms. Some of them are nearly duplicates. For instance variable tmp is only in function print() the temporary introduced in the translation of *ptr++. It's by design that the analysis no longer even tries to avoid multiple warnings in this case. On the whole it does limit its study of function putnum() to the case where n is 0x12345. The imprecision comes from somewhere else: the loop inside putnum(). By default the analyzer summarizes the loop in an unfavorable way. Variable ptr is only inferred to range over {{ &buf + [-1..8] }} at the point where it's dereferenced so it is not guaranteed to be valid:

loop_summary.png

Since the analyzer doesn't know which buf[] element exactly is written to all it can say is that each element between 0 and 8 may be [48..70] or UNINITIALIZED:

loop_buf.png

Note that 48 is '0' and 70 is 'F' so it is not doing too bad really despite emitting all these alarms.

The solution

What's the step after checking what library functions are used in the program and thinking how to offer equivalent source code to the analyzer? Ah yes it is: "if the analysis is fast enough use option -slevel to make it more precise".

$ frama-c -val putnum.c -slevel 50 
... 
[value] Values for function putnum: 
          n ∈ {0; } 
          buf[0..3] ∈ UNINITIALIZED 
             [4] ∈ {49} 
             [5] ∈ {50} 
             [6] ∈ {51} 
             [7] ∈ {52} 
             [8] ∈ {53} 
             [9] ∈ {0} 
          ptr ∈ {{ &buf + {4} }} 

The alarms all disappear. We can make sure this is not caused by a freakish accident but because the value analysis understand exactly what goes on during execution of this program: the hexadecimal representation of 0x12345 in the contents of array buf[] in a snapshot taken at the end of putnum().

Generalizing

The exercise can be made more interesting by considering the safety of putnum() when called with an unknown argument:

unsigned long unknown(); 
int main(void) 
{ 
  putnum(unknown()); 
  putchar('\ 
'); 
  return 0; 
} 

I have to admit I made a mistake here when replying to the bug report: I did not try it. Biased by hours spent on difficult case studies I thought that the value analysis would need to have different execution paths (where the program is valid for slightly different reasons) manually separated in order to conclude that the program is safe. Something like the following hint placed in a strategic location according to the principles in this post.

/*@ assert n <= 15 || 16 <= n <= 65535 || 65536 <= n <= ... ; 

Such an annotation typical of those needed for a larger-scale value-analysis-based verification is not required here (but it does not hurt). Even without it the value analysis separates the cases where the number is one-digit two-digits etc. long by itself in few millions of processor cycles — that is in a fraction of a second:

$ frama-c -val putnum.c -slevel 500 
... 
[value] Values for function putnum: 
          n ∈ {0; } 
          buf[0] ∈ UNINITIALIZED 
             [1..7] ∈ [48..70] or UNINITIALIZED 
             [8] ∈ [48..70] 
             [9] ∈ {0} 
          ptr ∈ {{ &buf + [1..8] }} 

For an annotationless analysis a lot of information is contained in this snapshot of the program's state. The value analysis is strongly suggesting that it is safe to reduce the size of buf[] by one since ptr never goes lower than buf+1 and buf[0] remains uninitialized for all possible putnum() arguments. All numbers are printed with at least one digit (buf[8] is guaranteed to be initialized and in the range '0'..'F' and ptr points no higher than buf[8]). On the other hand elements 1 to 7 are not guaranteed to be written to.

But wait: the value analysis seems to be saying that it's safe to reduce the size of buf[] by one but perhaps this code is not intended only for 32-bit architectures? Perhaps the additional element serves to make the function safe on a 64-bit architecture. Let us try it out:

$ frama-c -val putnum.c -slevel 500 -machdep x86_64 
... 
putnum.c:16:[kernel] warning: out of bounds write. assert \valid(ptr); 
... 
[value] Values for function putnum: 
          n ∈ {0; } 
          buf[0..7] ∈ [48..70] or UNINITIALIZED 
             [8] ∈ [48..70] 
             [9] ∈ {0} 
          ptr ∈ {{ &buf + [0..8] }} 

On a 64-bit architecture all the elements of buf[] seem to be useful but there may be a problem with ptr at line 16. Determining whether this is a true or false alarm will be left as an exercise to the reader (download file putnum.c).

Zaynah Dargaye provided insights based on an early version of this post.

Pascal Cuoq
12th Aug 2011