Value analysis tutorial, part 2

Pascal Cuoq - 7th Oct 2010

In order to create a tradition of providing solutions to previous quizzes, this post is a partial answer to the question in this one about Frama-C's value analysis.

To recap: at the end of the Boron tutorial we arrive at the main function below. This function is useful as an analysis context for the verification of hashing a 80-char message with Skein-256:

#include "skein.h" 
#define HASHLEN (32) 
u08b_t msg[80]; 
void main(void) 
{ 
  int i; 
  u08b_t hash[HASHLEN]; 
  for (i=0; i<80; i++) 
    msg[i] = Frama_C_interval(0  255); 
  Skein_256_Ctxt_t skein_context;  
  Skein_256_Init(&skein_context  HASHLEN * 8); 
  Skein_256_Update(&skein_context  msg  80); 
  Skein_256_Final( &skein_context  hash); 
} 

The context analysis above allocates as a global array and initializes a message msg with arbitrary contents. It also allocates a structure skein_context as a local variable to be used by the library for ongoing computations.

Our goal is now to generalize the analysis to arbitrary numbers of Skein_256_Update calls. Let us replace the single Skein_256_Update call in the above context by the loop below. This loop uses Frama_C_interval(0 1) as its condition. The value analysis is guaranteed to take into account the possible results 0 and 1 for all these calls. This ensures that the results of the analysis apply to all the different execution paths we are now interested in although these paths differ in length.

  while (Frama_C_interval(0 1)) 
    { 
      for (i=0; i<80; i++) 
        msg[i] = Frama_C_interval(0  255); 
      Skein_256_Update(&skein_context  msg  80); 
    } 

Using the same command-line as in the tutorial generates warnings about possible undefined behaviors:

frama-c -val -slevel 100 *.c .../share/builtin.c > log

grep ssert log 
lib.c:10:[kernel] warning: out of bounds write. assert \valid(tmp); 
skein.c:53:[kernel] warning: accessing uninitialized left-value *(src + n): assert(TODO) 
skein.c:53:[kernel] warning: accessing uninitialized left-value *(src + (n + (size_t_0 )1)): assert(TODO) 
skein.c:53:[kernel] warning: accessing uninitialized left-value *(src + (n + (size_t_0 )2)): assert(TODO) 
skein.c:53:[kernel] warning: accessing uninitialized left-value *(src + (n + (size_t_0 )3)): assert(TODO) 
skein.c:53:[kernel] warning: accessing uninitialized left-value *(src + (n + (size_t_0 )4)): assert(TODO) 
skein.c:53:[kernel] warning: accessing uninitialized left-value *(src + (n + (size_t_0 )5)): assert(TODO) 
skein.c:53:[kernel] warning: accessing uninitialized left-value *(src + (n + (size_t_0 )6)): assert(TODO) 
skein.c:53:[kernel] warning: accessing uninitialized left-value *(src + (n + (size_t_0 )7)): assert(TODO) 

It is possible to increase the argument of option -slevel in order to unroll more iterations and postpone the moment in the analysis when precision is lost and alarms occur. However since the while loop is infinite there is no value for -slevel that lets this loop be analyzed with precision to its end.

In fact we do not want this while loop unrolled. We want the value analysis to do standard value analysis magic and extract properties that are true at all iterations of the loop without studying each iteration individually. In other words we want the value analysis not to unroll the while loop (but to continue unrolling the other finite loops so as to remain as precise as it was for the single Skein_256_Update call).

One way to obtain this result is to move the for loop outside function main so that the while loop is the only one that remains in main and to use the option -slevel-function introduced in Frama-C Boron. This option allows "semantic unrolling" to be tuned function by function. Here we can use it to tell the value analysis to unroll loops except inside main.

The complete analysis context becomes:

#include "skein.h" 
#define HASHLEN (32) 
u08b_t msg[80]; 
void arbitrarize_msg(void) 
{ 
  for (int i=0; i<80; i++) msg[i]=Frama_C_interval(0  255); 
} 
u08b_t hash[HASHLEN]; 
void main(void) 
{ 
  int i; 
  u08b_t hash[HASHLEN]; 
  Skein_256_Ctxt_t skein_context;  
  Skein_256_Init(&skein_context  HASHLEN * 8); 
  while (Frama_C_interval(0 1)) 
    { 
      arbitrarize_msg(); 
      Skein_256_Update(&skein_context  msg  80); 
    } 
  Skein_256_Final( &skein_context  hash); 
} 

The new command-line to use is:

frama-c -val -slevel 100 -slevel-function main:0 *.c > log

The analysis takes a minute or so.

grep ssert log 
skein.c:53:[kernel] warning: accessing uninitialized left-value *(src + n): assert(TODO) 
skein.c:53:[kernel] warning: accessing uninitialized left-value *(src + (n + (size_t )1)): assert(TODO) 
skein.c:53:[kernel] warning: accessing uninitialized left-value *(src + (n + (size_t )2)): assert(TODO) 
skein.c:53:[kernel] warning: accessing uninitialized left-value *(src + (n + (size_t )3)): assert(TODO) 
skein.c:53:[kernel] warning: accessing uninitialized left-value *(src + (n + (size_t )4)): assert(TODO) 
skein.c:53:[kernel] warning: accessing uninitialized left-value *(src + (n + (size_t )5)): assert(TODO) 
skein.c:53:[kernel] warning: accessing uninitialized left-value *(src + (n + (size_t )6)): assert(TODO) 
skein.c:53:[kernel] warning: accessing uninitialized left-value *(src + (n + (size_t )7)): assert(TODO) 
lib.c:10:[kernel] warning: out of bounds write. assert \valid(tmp); 
lib.c:18:[kernel] warning: out of bounds write. assert \valid(tmp); 

The warnings from the previous attempt are still there and they have a surprise guest at lib.c line 18 (in function memset). But since the only thing we have changed since the last attempt is to prevent a loop from being unrolled this was to be expected (you may have noticed that the analysis was a little bit faster also as a result of not unrolling the while loop).

Now let us try to get rid of these warnings for good. For this purpose we need information about what is causing them. Plunging in the code with frama-c-gui is a good way to obtain some of that but this approach lacks the ability to display values of variables for different calls to Skein_256_Update separately. Purely as an experiment in order to have a place to click to see the state at the first second ... fifth iteration let us change the while loop to:

      arbitrarize_msg(); 
      Skein_256_Update(&skein_context  msg  80); 
      arbitrarize_msg(); 
      Skein_256_Update(&skein_context  msg  80); 
      arbitrarize_msg(); 
      Skein_256_Update(&skein_context  msg  80); 
      arbitrarize_msg(); 
      Skein_256_Update(&skein_context  msg  80); 
      arbitrarize_msg(); 
      Skein_256_Update(&skein_context  msg  80); 
  while (Frama_C_interval(0 1)) 
    { 
      arbitrarize_msg(); 
      Skein_256_Update(&skein_context  msg  80); 
    } 

Using the command-line frama-c-gui -val -slevel 100 -slevel-function main:0 *.c the first surprise is that we have accidentally eliminated most of the alarms by hand-unrolling the while loop. Indeed only the following now remains:

lib.c:10:[kernel] warning: out of bounds write. assert \valid(tmp); 

We can stick to the initial plan and observe the states after each of the first five iterations to understand what happened. This is no time to celebrate victory yet. For starters the verification is not done until all alarms have been eliminated. Also our investigative instrumentation has eliminated some execution paths — the executions paths with less than five iterations — that would still need to be studied separately before the verification can be called finished.

Here are the contents of variable skein_context before the first second and third calls:

skein_context.h.hashBitLen ∈ {256; } 
             .h{.bCnt; .T[0]; } ∈ {0; } 
             .h.T[1] ∈ {8070450532247928832; } 
             .X[0] ∈ {4072681676153946182; } 
             .X[1] ∈ {5436642852965252865; } 
             .X[2] ∈ {2889783295839482789; } 
             .X[3] ∈ {6109786322067550404; } 
             .b[0..31] ∈ UNINITIALIZED 
skein_context.h.hashBitLen ∈ {256; } 
             .h.bCnt ∈ {16; } 
             .h.T[0] ∈ {64; } 
             .h.T[1] ∈ {3458764513820540928; } 
             {.X[0..3]; .b[0..15]; } ∈ [--..--] 
             .b[16..31] ∈ UNINITIALIZED 
skein_context.h.hashBitLen ∈ {256; } 
             .h.bCnt ∈ {32; } 
             .h.T[0] ∈ {128; } 
             .h.T[1] ∈ {3458764513820540928; } 
             {.X[0..3]; .b[0..31]; } ∈ [--..--] 

Based on this information it is now clear where the "accessing uninitialized left-value" warnings came from: there is inside the structure skein_context a buffer that only becomes initialized after the first two calls to Skein_256_Update. And these alarms are now gone because these iterations when the buffer is not yet completely initialized are handled separately from the latter iterations when the buffer is initialized and presumably read from at some point.

Quiz 1: continuing the study of the first five calls to Skein_256_Update what used to cause the warning below and what made it disappear?

lib.c:18:[kernel] warning: out of bounds write. assert \valid(tmp); 

Quiz 2: how should we get rid of the last alarm (that occurs in our file lib.c at line 10 function memcpy although it is caused by imprecision in the callers of that function)?

The files in their state at the end of this blog post are available here. To rewind to earlier stages of the verification process simply erase main.c which is the only file we have modified so far.

Pascal Cuoq
7th Oct 2010