Frama-C-discuss mailing list archives
This page gathers the archives of the old Frama-C-discuss archives, that was hosted by Inria's gforge before its demise at the end of 2020. To search for mails newer than September 2020, please visit the page of the new mailing list on Renater.
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Frama-c-discuss] how padding are introduced by Frama-C
- Subject: [Frama-c-discuss] how padding are introduced by Frama-C
- From: Pascal.CUOQ at cea.fr (CUOQ Pascal)
- Date: Tue, 8 Dec 2009 01:21:40 +0100
- References: <1260225203.19891.17.camel@valin>
> Here is an exemple where padding has > a consequence on warnings. Nice find. I nominate this question for the FAQ in the next version of the value analysis' documentation. > Could someone explain me what happens. You were using a command-line such as: frama-c -val t.c -lib-entry -main f1 Well, the short answer to your question first: In -lib-entry mode, the value analysis assumes padding bytes in structures to be uninitialized. This way, any memory access that is sure to read from a padding area is detected, warned about, and excluded from further propagation, while a memory access that may or may not read from a padding area is detected, warned about and the propagation continues with the only values that make sense (that values that do *not* come from the padding area). If the warning in this latter case is a false alarm, then there is no loss of precision for the rest of the analysis. It could be argued that we should use a more explicit value than "Uninitialized", perhaps "Padding". Yes, that would be clearer. I am making a note of this. Okay, here really comes the short answer now : the array of structs tV1 does not contain any padding bytes : tV1[0..9] IN [--..--] the array of structs tV2 contains some padding bytes. tV2[0].c IN [--..--] [0].[bits 8 to 15] IN UNINITIALIZED {[0].t[0..1023]; [1].c; } IN [--..--] [1].[bits 8 to 15] IN UNINITIALIZED {[1].t[0..1023]; [2].c; } IN [--..--] ... you get the idea ... [9].t[0..1023] IN [--..--] Ergo, any dumb analyzer could guarantee that memory accesses within tV1 do not read from padding bytes, whereas it is less obvious that a memory access inside tV2 is not an undefined access to padding bytes. Long version: launch the GUI with frama-c-gui -val t.c -lib-entry -main f1 Select the statement in the first (resp. second) loop and right-click, in the contextual menu select "evaluate expression", and enter "(char *)(tV1[id].t) + i" (resp. "(char *)(tV2[id].t) + i") in the dialog. You get respectively: Before the selected statement, all the values taken by the expression (char *)(tV1[id].t) + i are contained in {{ &tV1 + [4..20519] ;}} Before the selected statement, all the values taken by the expression (char *)(tV2[id].t) + i are contained in {{ &tV2 + [2..20499] ;}} These offsets are expressed in bytes. In the first line, the offset is at least 4 because tV1[0].i is a 4-byte int followed immediately by tV1[0].t[0]. In the second line, tV2[0].i is a char, tV2[0].t[0] needs to be 16-bit aligned and is therefore preceded by one byte of padding, whence the lower bound of 2 for the offset at which this memory access can take place. The upper bounds for the offsets are slightly different because the sizes of the structures are slightly different. tV1 contains slightly more bytes than tV2. The important thing to notice in these offsets is what is not there: a congruence information that would ensure that the program is not accessing any random byte in tV1 or tV2. Indeed, this information cannot be represented in the value analysis' format r%m. Consecutive bytes are accessed in tV1[id].t or tV2[id].t, so there is no interesting remainder and modulo that characterize the stride of the memory access here. Therefore the analyzer is not able to infer that accessing inside tV2 does not read from the padding (which is undefined), and it is only able to reach this conclusion for tV1 because there are no padding bytes there at all! But there is more! "How would one go about forcing the analyzer to be more precise on this program, and reach the desired conclusion that function f1 is safe from padding-related undefinedness?", I hear you asking. There are two ways to reason about this: The first line of reasoning, which I recommend, is that there is one big button to change the trade-off between precision and resource consumption in the Value Analysis, and this button is called "-slevel". Therefore one should use this option before even starting to think about the problem. The second possible line of reasoning is that the imprecision comes from the fact that the analysis tries to handle all iterations in each loop simultaneously, so one should look for a way to make the analyzer treat each iteration separately if possible. This result can be achieved with the option "-slevel". Each loop takes 2048 iterations (the loops iterate on the bytes of arrays of 1024 shorts), and we don't want to have to run the analysis again because we were a little short, so we allow 2050 iterations to be analyzed separately before starting to lose precision. Manzana:~ pascal$ time frama-c -slevel 2050 -val t.c -lib-entry -main f1 >L 2>&1 real 0m18.997s Manzana:~ pascal$ grep t\\.c:32 L Manzana:~ pascal$ Note that unrolling each loop does not make the abstract value for expression (char *)(tV2[id].t) + i a singleton. There are still different values for this expression, because variable id is not precisely known. You cannot observe the values used to analyze each iteration with -slevel, but you can try -ulevel instead, which may or may not produce an AST too big for analysis or for displaying in the GUI. Or you can use one of the provided built-in for printing abstract values during the analysis: /* loop 2 */ for(i=0; i<sizeof(T_TAB); i++) { Frama_C_show_each_address((char *)(tV2[id].t) + i); ((char*)v2.t)[i] = ((char*)tV2[id].t)[i] ; } Using the same command-line again produces in addition: [value] Called Frama_C_show_each_address({{ &tV2 + [2..18452],2%2050 ;}}) [value] Called Frama_C_show_each_address({{ &tV2 + [3..18453],3%2050 ;}}) [value] Called Frama_C_show_each_address({{ &tV2 + [4..18454],4%2050 ;}}) [value] Called Frama_C_show_each_address({{ &tV2 + [5..18455],5%2050 ;}}) ... you get the idea ... [value] Called Frama_C_show_each_address({{ &tV2 + [2049..20499],2049%2050 ;}}) Each of the abstract values for the address (char *)(tV2[id].t) + i is not, because of the indeterminacy on id, a singleton. Still, the indeterminacy that remains can be captured precisely by the value analysis' r%m representation. At each iteration it is able to capture the fact we are accessing inside a field t (and therefore not inside padding), but in a different way for each iteration. Pascal __ If I had a blog, this would probably had been the second post there.
- Follow-Ups:
- [Frama-c-discuss] how padding are introduced by Frama-C
- From: stephane.duprat at atosorigin.com (Stephane DUPRAT)
- [Frama-c-discuss] how padding are introduced by Frama-C
- References:
- [Frama-c-discuss] how padding are introduced by Frama-C
- From: stephane.duprat at atosorigin.com (Stéphane Duprat)
- [Frama-c-discuss] how padding are introduced by Frama-C
- Prev by Date: [Frama-c-discuss] how padding are introduced by Frama-C
- Next by Date: [Frama-c-discuss] Discrepancy between Jessie gui and command line when using yices
- Previous by thread: [Frama-c-discuss] how padding are introduced by Frama-C
- Next by thread: [Frama-c-discuss] how padding are introduced by Frama-C
- Index(es):