Frama-C-discuss mailing list archives

This page gathers the archives of the old Frama-C-discuss archives, that was hosted by Inria's gforge before its demise at the end of 2020. To search for mails newer than September 2020, please visit the page of the new mailing list on Renater.


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[Frama-c-discuss] Adding some semantic meaning to a few (standard string) functions


  • Subject: [Frama-c-discuss] Adding some semantic meaning to a few (standard string) functions
  • From: Pascal.Cuoq at cea.fr (Pascal Cuoq)
  • Date: Mon, 15 Dec 2008 15:16:46 +0100
  • In-reply-to: <1229346661.3335.142.camel@supelec>
  • References: <1229346661.3335.142.camel@supelec>

On Dec 15, 2008, at 2:11 PM, Jonathan-Christofer Demay wrote:

> In the example code at the end of this message, at line 28, frama-c is
> able to tell that boo's value is 2. However, at line 23, it can't tell
> that *foo's value is "foobar".
>
> Note that I didn't wrote foo's value, but *foo's value. I know that
> frama-c is already analyzing foo's value as a pointer. But analyzing  
> (a
> bit) what it's poiting at might be useful. Of course, since  
> generally a
> pointer can point at anything, in most case, it might see nothing, but
> at least in the case of strings I think a few things might be  
> computed.
>
> So I was wondering if it would possible to add this kind of  
> knowledge to
> frama-c through a plugin.

I think that I see what you mean. In this particular case it may be  
easier
to patch the value analysis than to try to write a different plug-in.

> Of course, If I were to code, for example, my own strcmp-like  
> function,
> there's no way to automatically deduce this. But at least, for  
> standard
> string functions, and only for a few of them, we might want to
> statically give this knowledge to frama-c. For example, if strcmp  
> return
> 0, then it can be seen as ==, and so from the statement line 20, we  
> know
> that *foo can be anything but {""}.

Yes, it is possible to do this in principle.

There remains the problem of encoding the information that can
be extracted from the condition into the
existing representation for program states, or to define a
new representation for this additional information. The case
where strcmp(v, LITERAL)==0 can be encoded exactly in the
existing representation. At line 18 in your example,
the information from the if condition is equivalent to foo[0] == 0,
which can be represented.

In general, however, it is not always possible to represent
any kind of information, even information that is very obvious
for a programmer reading the code. For instance,
because the value analysis actually represents the value of each
char in v separately, the
information that strcmp(v, LITERAL)!=0 is
a kind of disjunction (the condition may be false because any
character in v is different from the literal string it is compared to),
which is generally speaking a bother to manipulate.

Don't forget that the analyzer will need to be able to compose
pieces of information that comes from different execution paths,
and to extrapolate fixpoints when it encounters loops. It is very
tempting to go all out and represent every little piece of
knowledge that you can think of but in practice, it is better
to limit yourself to the information for which you are able
to define good composition rules for the operations that
appears in C.

> We could also see strcpy as =, and
> so after the statement line 20, *bar is in {"foobar"}, and so is foo  
> at
> the statement line 23.

To go further in making the analyzer realize this, see the remark
at the end of this message.

> So any comment on this ? Bad/good idea ? Achievable or not ? (at least
> in some simple case like this example). I'm already reading the plugin
> manual ^^.

I would suggest thinking about the primitives (strcpy, ..) that you want
to provide, and how their respective effects can be encoded in the
existing representation of states (which you would have to become
familiar with). Next would be to list the syntactic patterns for
which it is possible to infer some information that can be represented
in the current states (strcmp (v, LITERAL)==0 is one).

Defining a new abstract domain for strings would come after this
in difficulty.

I should also point you to existing methods to infer loop invariants
automatically in presence of string manipulations in the Jessie plug-in.
See http://www.lri.fr/~moy/minix3.html
and the pointers there for more information.
If you want to play with this,
you need to have the APRON library installed
(http://apron.cri.ensmp.fr/library/). APRON in turn depends
on libraries such as GMP and MPFR. Hopefully your distribution
would contain packages for all these necessary libraries.

You may also be interested in reading this:
http://penjili.org/papers/AllamigeonGodardHymansSAS06.pdf

> Another question about something I noticed while writing this example.
> The statement at line 20 was originally written like this:
>
> if(bar != strcpy(bar, "foobar"))
> 	return -1;
>
> But it makes frama-c think this condition is always true and so all  
> code
> after this statement is considered as dead code. Since strcpy in most
> case (when everything goes right) will in fact return 'bar', I don't  
> see
> what is wrong here.

This is because your project did not include the code for
function strcpy. Since it is a library function whose return
type is a pointer, it was assumed to return a pointer to a
location without aliases to other variables in the program.
In these conditions bar!=strcpy(bar, "foobar")
is always true.

You can fix this by providing the code of the actual
function strcpy that will be executed when your program runs,
or a simplified equivalent version. Some common
functions are provided in share/libc.{c,h}.

Note the case of the first definition of memcpy:
void* memcpy(void* region1, const void* region2, size_t n)
{
   if (n > 0)
     Frama_C_memcpy(region1, region2, n);
   return region1;
}

This function calls a built-in function, which allows
to make any necessary transformation on the interpreter's
state, in a simpler way than making the analyzer look at one
possible implementation. Our built-ins are not perfect yet,
which is why an alternative is proposed for memcpy, but what I
am suggesting is that you look at this mechanism first.

Pascal

-------------- section suivante --------------
Une pi?ce jointe HTML a ?t? enlev?e...
URL: http://lists.gforge.inria.fr/pipermail/frama-c-discuss/attachments/20081215/7e861267/attachment.htm