Safe donut

Pascal Cuoq - 16th Sep 2011

This post documents the steps I followed to finish verifying the function compute(), picking up where the previous post left off.

Previously on this blog

In the last episode, we found that some subcubes of the search space appeared to lead to dangerous value sets for variable N. These sets were:

        N ∈ {10; 11; 12}
        N ∈ {6; 7; 8; 9; 10; 11; 12}
        N ∈ {7; 8; 9; 10; 11; 12}
        N ∈ {7; 8; 9; 10; 11; 12; 13}
        N ∈ {8; 9; 10; 11; 12}
        N ∈ {8; 9; 10; 11; 12; 13}
        N ∈ {9; 10; 11; 12}
        N ∈ {9; 10; 11; 12; 13}

There are only eight dangerous value sets for N, but they correspond to many values of the input variables An, Bn, in, jn in the analysis context we wrote for the occasion. Grepping for each of these eight lines in the original logs yields the corresponding values of An, Bn, in, jn. These are the values for which we have not yet concluded that compute() is safe:

for i in 1 2 3 4
do
    for line in \
	"N ∈ {10; 11; 12}" \
	"N ∈ {6; 7; 8; 9; 10; 11; 12}" \
	"N ∈ {7; 8; 9; 10; 11; 12}" \
	"N ∈ {7; 8; 9; 10; 11; 12; 13}" \
	"N ∈ {8; 9; 10; 11; 12}" \
	"N ∈ {8; 9; 10; 11; 12; 13}" \
	"N ∈ {9; 10; 11; 12}" \
	"N ∈ {9; 10; 11; 12; 13}"
    do
        grep "$line" log$i -A4 -B0 | grep -v "N "
        echo --
    done > pb$i
done
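To see what the two grep invocations do, here is a small sketch on a toy log. The file name toy_log and its contents are made up for illustration (in particular the second record), and the sketch assumes that in the first-pass logs each N line is immediately followed by the An, Bn, in, jn lines.

```shell
#!/bin/sh
# Toy illustration of the extraction step. toy_log is a made-up stand-in
# for log1 ... log4, assuming N is directly followed by An, Bn, in, jn.
tmp=$(mktemp -d)
cat > "$tmp/toy_log" <<'EOF'
        N ∈ {10; 11; 12}
        An ∈ {-26}
        Bn ∈ {-15}
        in ∈ {-26}
        jn ∈ {18}
        =END OF DUMP==
        N ∈ {10; 11}
        An ∈ {-26}
        Bn ∈ {-15}
        in ∈ {-26}
        jn ∈ {20}
        =END OF DUMP==
EOF

# -A4 keeps the four lines after each match; grep -v "N " then drops
# the matched N line itself, leaving only the input variables.
extracted=$(grep -A4 "N ∈ {10; 11; 12}" "$tmp/toy_log" | grep -v "N ")
printf '%s\n' "$extracted"
```

Only the first record is printed: the second N line holds a safe set, so neither it nor its four input variables are selected.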

The results are well distributed among the four quarters of the search space that I initially chose arbitrarily. That's good news, as it means things can be kept parallel by keeping these files separate. One file looks like:

        An ∈ {-26}
        Bn ∈ {-15}
        in ∈ {-26}
        jn ∈ {18}
--
        An ∈ {-26}
        Bn ∈ {-15}
        in ∈ {-26}
        jn ∈ {19}
--
        An ∈ {-26}
        Bn ∈ {-15}
        in ∈ {-3}
        jn ∈ {6}
--
...

How to re-analyze the problematic subcubes more precisely

Each file pb1 ... pb4 can be processed a little bit further to look like C code:

for i in 1 2 3 4
do
  sed -e s"/--/f();/" -e s"/∈ {/= /" -e s"/}/;/" < pb$i > pr$i.c
done

The above commands transform the files of interesting values into:

        An = -26;
        Bn = -15;
        in = -26;
        jn = 18;
f();
        An = -26;
        Bn = -15;
        in = -26;
        jn = 19;
f();
        An = -26;
        Bn = -15;
        in = -3;
        jn = 6;
f();
...
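The three substitutions can be tried in isolation on a single record. This is only a sketch: to_c is a hypothetical helper name, and the sample is the first record shown earlier.

```shell
#!/bin/sh
# to_c is a hypothetical helper wrapping the three sed substitutions:
#   "--"   becomes a call to f()
#   "∈ {"  becomes "= "
#   "}"    becomes ";"
to_c() {
  sed -e 's/--/f();/' -e 's/∈ {/= /' -e 's/}/;/'
}

sample='        An ∈ {-26}
        Bn ∈ {-15}
        in ∈ {-26}
        jn ∈ {18}
--'

printf '%s\n' "$sample" | to_c
```

Negative values such as -26 contain a single dash, so the first substitution only ever fires on the separator lines.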

Each of the four pieces of program weighs in at a bit less than 7 MB of C code. I plan to make good use of this tidbit the next time someone asks me what project sizes Frama-C's value analysis can handle.

$ ls -l pr?.c
-rw-r--r-- 1 cuoq cuoq 6788869 2011-09-17 18:52 pr1.c
-rw-r--r-- 1 cuoq cuoq 6551620 2011-09-17 18:52 pr2.c
-rw-r--r-- 1 cuoq cuoq 6655238 2011-09-17 18:52 pr3.c
-rw-r--r-- 1 cuoq cuoq 6486765 2011-09-17 18:52 pr4.c

The files pr1.c ... pr4.c are not complete C programs, but they work well with the following prolog:

...
int An, Bn, in, jn;
void f(void)
{
  int Ans, Bns, ins, jns;
  float A, B, i, j;
  for (Ans=4*An; Ans<4*(An+1); Ans++)
    {
      A = Frama_C_float_interval(Ans / 32., (Ans + 1) / 32.);
      for (Bns=4*Bn; Bns<4*(Bn+1); Bns++)
        {
          B = Frama_C_float_interval(Bns / 32., (Bns + 1) / 32.);
          for (ins=4*in; ins<4*(in+1); ins++)
            {
              i = Frama_C_float_interval(ins / 32., (ins + 1) / 32.);
              for (jns=4*jn; jns<4*(jn+1); jns++)
                {
                  j = Frama_C_float_interval(jns / 32., (jns + 1) / 32.);
                  compute(A, B, i, j);
                }
            }
        }
    }
}
main(){

A closing bracket } is also needed to make the whole thing a syntactically correct C program.
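The bounds that the prolog passes to Frama_C_float_interval() can be checked by hand. The sketch below uses a hypothetical helper, interval, to print the bounds ns/32 and (ns+1)/32 for a given sub-subcube index. For An = -26, the four values of Ans together cover [-3.25 .. -3.125], that is, [An/8 .. (An+1)/8].

```shell
#!/bin/sh
# interval is a hypothetical helper: given a sub-subcube index ns, print the
# bounds ns/32 and (ns+1)/32 that the prolog passes to Frama_C_float_interval().
interval() {
  awk -v ns="$1" 'BEGIN { printf "[%g .. %g]\n", ns / 32, (ns + 1) / 32 }'
}

# An = -26 is refined into Ans = -104, -103, -102, -101,
# i.e. 4*An up to 4*(An+1)-1, as in the outermost loop of f().
for ns in -104 -103 -102 -101; do
  interval "$ns"
done
```

Each dimension is thus cut in four, so one call to f() analyzes 4 × 4 × 4 × 4 = 256 sub-subcubes.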

Alas, a regrettable performance bug in Frama-C's front end prevents it from analyzing such huge generated C functions. We are a bit too close to the Nitrogen release to change the data structures that represent the AST, so this bug will probably remain for at least one release cycle. To circumvent the issue, I simply split the files into 182 reasonably-sized chunks (reasonably-sized here meaning 10000 lines, a more usual size for a function).

split -l 10000 pr1.c pr1.c.
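Since each chunk is later analyzed on its own, it seems worth checking that no chunk starts in the middle of a record. The following is a sketch of such a check on a toy file, assuming every record is exactly five lines (four assignments and one call to f()). The file name toy.c, the jn values, and the chunk size of 10 lines standing in for 10000 are all made up for illustration.

```shell
#!/bin/sh
# Toy check: do the chunks produced by split start at a record boundary?
tmp=$(mktemp -d)
cd "$tmp" || exit 1

# Build six hypothetical five-line records (the jn values are made up).
n=0
while [ "$n" -lt 6 ]; do
  printf '        An = -26;\n        Bn = -15;\n        in = -26;\n        jn = %d;\nf();\n' "$n"
  n=$((n + 1))
done > toy.c

# 10 lines per chunk means two whole records per chunk.
split -l 10 toy.c toy.c.

status=ok
for chunk in toy.c.??; do
  # The first line of every chunk should be an assignment to An.
  head -n 1 "$chunk" | grep -q 'An = ' || status="misaligned: $chunk"
done
echo "$status"
```

With a chunk size that is not a multiple of the record length, the same loop would report the first chunk that begins with a leftover assignment or a stray f() call.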

182 C files to analyze and 4 cores to analyze them with: this is an undreamed-of opportunity to make use of the xargs -n 1 -P 4 command.

ls pr?.c.* | xargs -n 1 -P 4 ./do.sh
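For readers who have not used these xargs options, here is a toy run. The script worker.sh is a made-up stand-in for do.sh that only leaves a marker file behind, and the chunk files are empty placeholders.

```shell
#!/bin/sh
# Toy run of the xargs invocation; worker.sh is a made-up stand-in for do.sh.
tmp=$(mktemp -d)
cd "$tmp" || exit 1

cat > worker.sh <<'EOF'
#!/bin/sh
# Pretend to analyze the chunk named by $1 and leave a marker file behind.
echo "analyzed $1" > "done_$1"
EOF

# Six empty stand-in chunks, named the way split names its output.
touch pr1.c.aa pr1.c.ab pr1.c.ac pr2.c.aa pr2.c.ab pr2.c.ac

# -n 1 passes one file name per invocation; -P 4 runs up to four at a time.
ls pr?.c.* | xargs -n 1 -P 4 sh worker.sh

# One marker file per chunk should now exist.
ls done_* | wc -l
```

Piping ls into xargs is fine here because the chunk names contain no whitespace.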

Here is the script do.sh for handling one program. It first catenates the prolog, the chunk, and a closing bracket, and then it launches the value analysis on the resulting C program:

#!/bin/sh
( cat prolog.c ; cat "$1" ; echo "}" ) > "t_$1.c"
frama-c -val share/builtin.c "t_$1.c" -obviously-terminates \
  -no-val-show-progress -all-rounding-modes > "log_pass2_$1" 2>&1
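The catenation step can be tried without Frama-C. In this sketch the two input files are tiny placeholders rather than the real prolog and chunk; only the grouping with parentheses is the same as in do.sh.

```shell
#!/bin/sh
# Toy files standing in for prolog.c and one chunk; contents are placeholders.
tmp=$(mktemp -d)
printf '%s\n' 'void f(void) { }' 'main(){' > "$tmp/prolog.c"
printf '%s\n' 'f();' 'f();' > "$tmp/chunk"

# Same grouping as in do.sh: the combined output of the three commands
# run in the subshell is redirected to a single file.
( cat "$tmp/prolog.c" ; cat "$tmp/chunk" ; echo "}" ) > "$tmp/t_chunk.c"
cat "$tmp/t_chunk.c"
```

The final echo supplies the brace that closes main(), which the prolog deliberately leaves open.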

Results

The above yada yada yada produces one log file for each of the 182 chunks:

$ ls -l log_pass2_pr*
-rw-r--r-- 1 cuoq cuoq 500913957 2011-09-17 20:50 log_pass2_pr1.c.aa
-rw-r--r-- 1 cuoq cuoq 502329593 2011-09-17 20:49 log_pass2_pr1.c.ab
-rw-r--r-- 1 cuoq cuoq 503146982 2011-09-17 20:51 log_pass2_pr1.c.ac
...
-rw-r--r-- 1 cuoq cuoq 502560543 2011-09-18 01:22 log_pass2_pr1.c.ay
-rw-r--r-- 1 cuoq cuoq 502283181 2011-09-18 01:23 log_pass2_pr1.c.az
-rw-r--r-- 1 cuoq cuoq 503974409 2011-09-18 01:30 log_pass2_pr1.c.ba
-rw-r--r-- 1 cuoq cuoq 501308298 2011-09-18 01:29 log_pass2_pr1.c.bb
...
-rw-r--r-- 1 cuoq cuoq 502932885 2011-09-18 05:20 log_pass2_pr1.c.bs
-rw-r--r-- 1 cuoq cuoq 422006804 2011-09-18 05:03 log_pass2_pr1.c.bt
-rw-r--r-- 1 cuoq cuoq 502353901 2011-09-18 05:19 log_pass2_pr2.c.aa
-rw-r--r-- 1 cuoq cuoq 502485241 2011-09-18 05:23 log_pass2_pr2.c.ab
-rw-r--r-- 1 cuoq cuoq 503562848 2011-09-18 05:57 log_pass2_pr2.c.ac
...
-rw-r--r-- 1 cuoq cuoq 184986900 2011-09-18 12:28 log_pass2_pr2.c.bs
-rw-r--r-- 1 cuoq cuoq 498627515 2011-09-18 13:11 log_pass2_pr3.c.aa
...
-rw-r--r-- 1 cuoq cuoq 263096852 2011-09-19 05:27 log_pass2_pr4.c.bs

Incidentally, it appears from the above listing that the second pass took about 36 hours (two nights and one day). The inside of one log file looks like:

[value] DUMPING STATE of file t_pr1.c.aa.c line 24
...
        N ∈ {10; 11}
        An ∈ {-26}
        Bn ∈ {-15}
        in ∈ {-26}
        jn ∈ {18}
        Ans ∈ {-104}
        Bns ∈ {-60}
        ins ∈ {-104}
        jns ∈ {72}
        A ∈ [-3.25 .. -3.21875]
        B ∈ [-1.875 .. -1.84375]
        i ∈ [-3.25 .. -3.21875]
        j ∈ [2.25 .. 2.28125]
        =END OF DUMP==
t_pr1.c.aa.c:15:[value] Function compute: postcondition got status valid.
[value] DUMPING STATE of file t_pr1.c.aa.c line 24
...
        N ∈ {10; 11}
        An ∈ {-26}
        Bn ∈ {-15}
        in ∈ {-26}
        jn ∈ {18}
        Ans ∈ {-104}
        Bns ∈ {-60}
        ins ∈ {-104}
        jns ∈ {73}
        A ∈ [-3.25 .. -3.21875]
        B ∈ [-1.875 .. -1.84375]
        i ∈ [-3.25 .. -3.21875]
        j ∈ [2.28125 .. 2.3125]
        =END OF DUMP==
...

Each value set for variable N is computed more precisely than in the first pass because the values of the floating-point variables A, B, i, j are known more precisely. Above, N is determined to be either 10 or 11 (both safe values) for two of the subsubcubes under examination. Below is the complete list of sets of values computed for N:

$ for i in log_pass2_pr* ; do grep "N " $i | sort -u ; done | sort -u
        N ∈ {10}
        N ∈ {10; 11}
        N ∈ {11}
        N ∈ {7; 8; 9}
        N ∈ {8; 9}
        N ∈ {8; 9; 10}
        N ∈ {9}
        N ∈ {9; 10}
        N ∈ {9; 10; 11}

Conclusion

I have already provided the "the function being analyzed is safe after all" conclusion in the previous post. This post only shows the "how", so that you can judge for yourself, for instance, how much cheating there was. There was quite a bit of C code written for the purpose of the verification: this code could itself have bugs. And there was a good amount of log processing with shell scripts. There could be bugs there too. Stéphane Duprat would however point out that if we had done the verification with tests, there would have been testing code and log-processing scripts too. The difference between the two approaches is that we used Frama_C_float_interval() in the places where we might have used a random number generator. It does not feel like a large conceptual leap, and we obtained deeper properties in exchange (would you trust the function not to return 12 if, like me, you didn't understand what it computes at all and had only tested it a million times? A billion times? How many tests would be enough for you to sit under a piano?).

The possibility that variable N in function compute() held a number larger than 11 was related to the last and most difficult of a series of alarms that we found in a mostly unmodified piece of obfuscated C code. Some of the other alarms disappeared simply by playing with the -slevel setting, which is the setting to think of immediately when the analysis is fast enough and some alarms remain. I explained where the other alarms came from (a call to memset() was not handled precisely enough, etc.), but in actual use you don't need to: just increase the argument to -slevel and see if the alarms go away, even before starting to think.

The method I have shown here is a little bit laborious, but one reason for us Frama-C developers to do these case studies is to find out about possible uses and make them more comfortable in future releases. This series of posts shows how to do this kind of verification now (well... it describes how to do it soon; the method shown will work with the Nitrogen release). This is absolutely not definitive. A plug-in could automate a lot of what we have done by hand here.

This post was improved by insights from Florent Kirchner.