Frama-C-discuss mailing list archives

This page gathers the archives of the old Frama-C-discuss archives, that was hosted by Inria's gforge before its demise at the end of 2020. To search for mails newer than September 2020, please visit the page of the new mailing list on Renater.


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Frama-c-discuss] how padding are introduced by Frama-C



Hi Pascal,

Thanks a lot for your answers (the short one and the long one).
I didn't know the possibility of evaluating an expression in a chosen 
statement. It's great.
And I didn't know also that padding were considered as uninitialized 
segment in libentry mode.

You heard me asking
> "How would one go about forcing the analyzer to be
> more precise on this program, and reach the desired
> conclusion that function f1 is safe from
> padding-related undefinedness?", I hear you asking.
>   
but a this time, I was really sleeping!

My point of view, is that for critical development, we should explicitly 
declare padding fields.
For this exemple, we should declare :
typedef struct {
    char c;
    char padding1 ;
    T_TAB t
}

So, in this case, the false alarm is a good information for a lack of 
explicit padding (we just have to be aware of this).

thank you again,

St?phane


CUOQ Pascal a ?crit :
>> Here is an exemple where padding has
>> a consequence on warnings.
>>     
>
> Nice find. I nominate this question for the FAQ
> in the next version of the value analysis' documentation.
>
>   
>> Could someone explain me what happens.
>>     
>
> You were using a command-line such as:
>
> frama-c -val t.c -lib-entry -main f1
>
> Well, the short answer to your question first:
>
> In -lib-entry mode, the value analysis assumes
> padding bytes in structures to be uninitialized.
> This way, any memory access that is sure to
> read from a padding area is detected, warned about,
> and excluded from further propagation,
> while a memory access that may or may not read
> from a padding area is detected, warned about
> and the propagation continues with
> the only values that make sense (that values that do *not*
> come from the padding area).
>
> If the warning in this latter case is a false alarm,
> then there is no loss of precision for the rest of the analysis.
> It could be argued that we should use a more explicit value
> than "Uninitialized", perhaps "Padding".
> Yes, that would be clearer. I am making a note of this.
>
> Okay, here really comes the short answer now :
>
> the array of structs tV1 does not contain any padding bytes :
>
>         tV1[0..9] IN [--..--]
>
> the array of structs tV2 contains some padding bytes. 
>
>         tV2[0].c IN [--..--]
>            [0].[bits 8 to 15] IN UNINITIALIZED
>            {[0].t[0..1023]; [1].c; } IN [--..--]
>            [1].[bits 8 to 15] IN UNINITIALIZED
>            {[1].t[0..1023]; [2].c; } IN [--..--]
>  ... you get the idea ...
>            [9].t[0..1023] IN [--..--]
>
> Ergo, any dumb analyzer could guarantee
> that memory accesses within tV1 do not read from
> padding bytes, whereas it is less obvious that a memory
> access inside tV2 is not an undefined access to
> padding bytes.
>
> Long version: 
>
> launch the GUI with
>
> frama-c-gui -val t.c -lib-entry -main f1
>
> Select the statement in the first (resp. second) loop
> and right-click, in the contextual menu select
> "evaluate expression", and enter "(char *)(tV1[id].t) + i"
> (resp. "(char *)(tV2[id].t) + i") in the dialog.
>
> You get respectively:
> Before the selected statement, all the values taken by the expression (char *)(tV1[id].t) + i are contained in {‌{ &tV1 + [4..20519] ;}‌}
> Before the selected statement, all the values taken by the expression (char *)(tV2[id].t) + i are contained in {‌{ &tV2 + [2..20499] ;}‌}
>
> These offsets are expressed in bytes.
> In the first line, the offset is at least 4 because
> tV1[0].i is a 4-byte int followed immediately by tV1[0].t[0].
> In the second line, tV2[0].i is a char, tV2[0].t[0]
> needs to be 16-bit aligned and is therefore preceded by
> one byte of padding, whence the lower bound of 2 for
> the offset at which this memory access can take place.
>
> The upper bounds for the offsets are slightly different
> because the sizes of the structures are slightly different.
> tV1 contains slightly more bytes than tV2.
>
> The important thing to notice in these offsets
> is what is not there: a congruence information that would
> ensure that the program is not accessing any random
> byte in tV1 or tV2. Indeed, this information cannot be 
> represented in the value analysis' format r%m.
> Consecutive bytes are accessed in tV1[id].t or
> tV2[id].t, so there is no interesting remainder and modulo that
> characterize the stride of the memory access here.
> Therefore the analyzer is not able to infer that accessing
> inside tV2 does not read from the padding (which
> is undefined), and it is only able to reach this conclusion
> for tV1 because there are no padding bytes there at all!
>
> But there is more!
>
> "How would one go about forcing the analyzer to be
> more precise on this program, and reach the desired
> conclusion that function f1 is safe from
> padding-related undefinedness?", I hear you asking.
>
> There are two ways to reason about this:
>
> The first line of reasoning, which I recommend, is that there is
> one big button to change the trade-off between precision and
> resource consumption in the Value Analysis, and this button is
> called "-slevel". Therefore one should use this option before even
> starting to think about the problem.
>
> The second possible line of reasoning is that
> the imprecision comes from the fact that
> the analysis tries to handle all iterations in each loop
> simultaneously, so one should look for a way to make
> the analyzer treat each iteration separately if possible.
> This result can be achieved with the option "-slevel".
> Each loop takes 2048 iterations (the loops iterate on
> the bytes of arrays of 1024 shorts), and we don't want
> to have to run the analysis again because we were
> a little short, so we allow 2050 iterations to be
> analyzed separately before starting to lose precision.
>
> Manzana:~ pascal$ time frama-c -slevel 2050 -val t.c -lib-entry -main f1 >L 2>&1
> real   0m18.997s
> Manzana:~ pascal$ grep t\\.c:32 L
> Manzana:~ pascal$ 
>
> Note that unrolling each loop does not make the abstract value
> for expression (char *)(tV2[id].t) + i a singleton.
> There are still different values for this expression,
> because variable id is not precisely known.
>
> You cannot observe the values used to analyze each iteration
> with -slevel, but you can try -ulevel instead, which
> may or may  not produce an AST too big for analysis or
> for displaying in the GUI. Or you can use one of the provided
> built-in for printing abstract values during the analysis:
>
>   /* loop 2 */
>   for(i=0; i<sizeof(T_TAB); i++)
>     {
>       Frama_C_show_each_address((char *)(tV2[id].t) + i);
>       ((char*)v2.t)[i] = ((char*)tV2[id].t)[i] ;
>     }
>
> Using the same command-line again produces in addition:
>
> [value] Called Frama_C_show_each_address({‌{ &tV2 + [2..18452],2%2050 ;}‌})
> [value] Called Frama_C_show_each_address({‌{ &tV2 + [3..18453],3%2050 ;}‌})
> [value] Called Frama_C_show_each_address({‌{ &tV2 + [4..18454],4%2050 ;}‌})
> [value] Called Frama_C_show_each_address({‌{ &tV2 + [5..18455],5%2050 ;}‌})
> ... you get the idea ...
> [value] Called Frama_C_show_each_address({‌{ &tV2 + [2049..20499],2049%2050 ;}‌})
>
> Each of the abstract values for the address 
> (char *)(tV2[id].t) + i 
> is not, because of the indeterminacy on id, a singleton.
> Still, the indeterminacy that remains can be captured precisely
> by the value analysis' r%m representation. At each iteration
> it is able to capture the fact we are accessing inside a field t
> (and therefore not inside padding), but in a different way
> for each iteration.
>
> Pascal
> __
> If I had a blog, this would probably had been the second post there.
>