Frama-C-discuss mailing list archives

This page gathers the archives of the old Frama-C-discuss archives, that was hosted by Inria's gforge before its demise at the end of 2020. To search for mails newer than September 2020, please visit the page of the new mailing list on Renater.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Frama-c-discuss] how padding are introduced by Frama-C

  • Subject: [Frama-c-discuss] how padding are introduced by Frama-C
  • From: Pascal.CUOQ at (CUOQ Pascal)
  • Date: Tue, 8 Dec 2009 01:21:40 +0100
  • References: <1260225203.19891.17.camel@valin>

> Here is an exemple where padding has
> a consequence on warnings.

Nice find. I nominate this question for the FAQ
in the next version of the value analysis' documentation.

> Could someone explain me what happens.

You were using a command-line such as:

frama-c -val t.c -lib-entry -main f1

Well, the short answer to your question first:

In -lib-entry mode, the value analysis assumes
padding bytes in structures to be uninitialized.
This way, any memory access that is sure to
read from a padding area is detected, warned about,
and excluded from further propagation,
while a memory access that may or may not read
from a padding area is detected, warned about
and the propagation continues with
the only values that make sense (that values that do *not*
come from the padding area).

If the warning in this latter case is a false alarm,
then there is no loss of precision for the rest of the analysis.
It could be argued that we should use a more explicit value
than "Uninitialized", perhaps "Padding".
Yes, that would be clearer. I am making a note of this.

Okay, here really comes the short answer now :

the array of structs tV1 does not contain any padding bytes :

        tV1[0..9] IN [--..--]

the array of structs tV2 contains some padding bytes. 

        tV2[0].c IN [--..--]
           [0].[bits 8 to 15] IN UNINITIALIZED
           {[0].t[0..1023]; [1].c; } IN [--..--]
           [1].[bits 8 to 15] IN UNINITIALIZED
           {[1].t[0..1023]; [2].c; } IN [--..--]
 ... you get the idea ...
           [9].t[0..1023] IN [--..--]

Ergo, any dumb analyzer could guarantee
that memory accesses within tV1 do not read from
padding bytes, whereas it is less obvious that a memory
access inside tV2 is not an undefined access to
padding bytes.

Long version: 

launch the GUI with

frama-c-gui -val t.c -lib-entry -main f1

Select the statement in the first (resp. second) loop
and right-click, in the contextual menu select
"evaluate expression", and enter "(char *)(tV1[id].t) + i"
(resp. "(char *)(tV2[id].t) + i") in the dialog.

You get respectively:
Before the selected statement, all the values taken by the expression (char *)(tV1[id].t) + i are contained in {‌{ &tV1 + [4..20519] ;}‌}
Before the selected statement, all the values taken by the expression (char *)(tV2[id].t) + i are contained in {‌{ &tV2 + [2..20499] ;}‌}

These offsets are expressed in bytes.
In the first line, the offset is at least 4 because
tV1[0].i is a 4-byte int followed immediately by tV1[0].t[0].
In the second line, tV2[0].i is a char, tV2[0].t[0]
needs to be 16-bit aligned and is therefore preceded by
one byte of padding, whence the lower bound of 2 for
the offset at which this memory access can take place.

The upper bounds for the offsets are slightly different
because the sizes of the structures are slightly different.
tV1 contains slightly more bytes than tV2.

The important thing to notice in these offsets
is what is not there: a congruence information that would
ensure that the program is not accessing any random
byte in tV1 or tV2. Indeed, this information cannot be 
represented in the value analysis' format r%m.
Consecutive bytes are accessed in tV1[id].t or
tV2[id].t, so there is no interesting remainder and modulo that
characterize the stride of the memory access here.
Therefore the analyzer is not able to infer that accessing
inside tV2 does not read from the padding (which
is undefined), and it is only able to reach this conclusion
for tV1 because there are no padding bytes there at all!

But there is more!

"How would one go about forcing the analyzer to be
more precise on this program, and reach the desired
conclusion that function f1 is safe from
padding-related undefinedness?", I hear you asking.

There are two ways to reason about this:

The first line of reasoning, which I recommend, is that there is
one big button to change the trade-off between precision and
resource consumption in the Value Analysis, and this button is
called "-slevel". Therefore one should use this option before even
starting to think about the problem.

The second possible line of reasoning is that
the imprecision comes from the fact that
the analysis tries to handle all iterations in each loop
simultaneously, so one should look for a way to make
the analyzer treat each iteration separately if possible.
This result can be achieved with the option "-slevel".
Each loop takes 2048 iterations (the loops iterate on
the bytes of arrays of 1024 shorts), and we don't want
to have to run the analysis again because we were
a little short, so we allow 2050 iterations to be
analyzed separately before starting to lose precision.

Manzana:~ pascal$ time frama-c -slevel 2050 -val t.c -lib-entry -main f1 >L 2>&1
real   0m18.997s
Manzana:~ pascal$ grep t\\.c:32 L
Manzana:~ pascal$ 

Note that unrolling each loop does not make the abstract value
for expression (char *)(tV2[id].t) + i a singleton.
There are still different values for this expression,
because variable id is not precisely known.

You cannot observe the values used to analyze each iteration
with -slevel, but you can try -ulevel instead, which
may or may  not produce an AST too big for analysis or
for displaying in the GUI. Or you can use one of the provided
built-in for printing abstract values during the analysis:

  /* loop 2 */
  for(i=0; i<sizeof(T_TAB); i++)
      Frama_C_show_each_address((char *)(tV2[id].t) + i);
      ((char*)v2.t)[i] = ((char*)tV2[id].t)[i] ;

Using the same command-line again produces in addition:

[value] Called Frama_C_show_each_address({‌{ &tV2 + [2..18452],2%2050 ;}‌})
[value] Called Frama_C_show_each_address({‌{ &tV2 + [3..18453],3%2050 ;}‌})
[value] Called Frama_C_show_each_address({‌{ &tV2 + [4..18454],4%2050 ;}‌})
[value] Called Frama_C_show_each_address({‌{ &tV2 + [5..18455],5%2050 ;}‌})
... you get the idea ...
[value] Called Frama_C_show_each_address({‌{ &tV2 + [2049..20499],2049%2050 ;}‌})

Each of the abstract values for the address 
(char *)(tV2[id].t) + i 
is not, because of the indeterminacy on id, a singleton.
Still, the indeterminacy that remains can be captured precisely
by the value analysis' r%m representation. At each iteration
it is able to capture the fact we are accessing inside a field t
(and therefore not inside padding), but in a different way
for each iteration.

If I had a blog, this would probably had been the second post there.