Back to the drawing board

Pascal Cuoq - 21st Jul 2011

This post is a backward changelog that introduces actual new features. Beat that, Emacs 19 Antinews!

Shortly after the release of Carbon, I offered to my Frama-C co-developers the idea of a great clean-up of the value analysis for the next release. This would not immediately add new features (indeed, it would remove some of them), but it would make way for more exciting new ideas in the long term.

This has started. One thing is not working according to plan, hence this blog post. I was planning to have a two-stage removal where, in the next release, obsolete features would be undocumented but would still be there, perhaps with a boolean option to activate them. This turns out not to be convenient: some features are instead going to change in ways that make it difficult to choose between old and new behaviors, and it's just a pain in the backside to keep others lying around half-dead in the codebase. The transition is going to be faster than I initially thought. This blog post is partly to let everyone know that it is coming. If you use Frama-C productively, this is the time to contact the people listed on the contact page and talk about a maintenance contract. We can only preserve features that we know people are using.

Here is a list of changes that have already happened in the development version for Nitrogen:

In Carbon and before relational information could sometimes be kept between the values of the program's variables. This did not work very well anyway and made the implementation of other more useful ideas impossible so this is gone. There may be a little loss of precision on some programs but that should be very infrequent for embedded code or for clean system code.

One example regression is for the following program:

int t[10]  *p  x; 
main(int c  char **v){ 
  if (c) p=t; 
  Frama_C_show_each_before(p); 
  x = *p++; 
  Frama_C_show_each_after(p); 
} 
$ /usr/local/Frama-C_Carbon/bin/frama-c -val t.c  
... 
[value] Called Frama_C_show_each_before({{ &NULL ; &t ;}}) 
... 
t.c:6:[kernel] warning: out of bounds read. assert \valid(tmp); 
[value] Called Frama_C_show_each_after({{ &t + {4; } ;}}) 
... 
$ bin/toplevel.opt -val t.c 
... 
[value] Called Frama_C_show_each_before({{ &NULL ; &t }}) 
... 
t.c:6:[kernel] warning: out of bounds read. assert \valid(tmp); 
[value] Called Frama_C_show_each_after({{ &NULL + {4} ; &t + {4} }}) 
... 

Before the assignment at the call to Frama_C_show_each_before() the value of p is {{ &NULL ; &t }}. The pointer p is then dereferenced. This is dangerous as underscored by the alarm. Once the pointer has been dereferenced Carbon knows that it wasn't NULL in the first place whereas the newer development version doesn't and thinks it may now also be (int*)NULL+1.

The issue here comes from the fact that the variable actually dereferenced in the Abstract Syntax Tree is not p but a variable tmp related to p:

$ bin/toplevel.opt -print t.c 
... 
 Frama_C_show_each_before(p); 
  { /* ... */  tmp = p; p ++; x = *tmp; } 
  Frama_C_show_each_after(p); 
... 

To be clear the representation of the program did not change here between Carbon and the development version. What changed is the ability that Carbon had to remember that tmp was first equal to p and then later to p minus four bytes so that the fact that tmp had been dereferenced and had to be valid meant that p could only be t plus four bytes.

The treatment of volatile variables will probably change in the next release this time because of a change in the representation of the program's AST.

Option -mem-exec is gone. This option made a single modular analysis of a function's code and re-used the results at each call site (instanciating pointers in the generic initial state appropriately with addresses from the state at the call site if needed). You can see it in action on the following program:

int *p; 
int *q; 
void f(void) 
{ 
  *p = 4; 
  *q = 5; 
} 
int A B; 
main() 
{ 
  A = 10; 
  B = 11; 
  p = &A; 
  q = &B; 
  f(); 
  p = &A; 
  q = &A; 
  f(); 
  return 0; 
} 
$ /usr/local/Frama-C_Carbon/bin/frama-c -val -mem-exec f m.c 
... 
[value] computing for function f <-main. 
        Called from m.c:19. 
... Instanciation succeeded: {{ S_p -> {{ A -> {0; } ;}}; 
                                S_q -> {{ B -> {0; } ;}};  }} 
[value] Done for function f 
[value] computing for function f <-main. 
        Called from m.c:23. 
m.c:23:[value] Failed to see context as an instance of the generic context: 
           inlining call to f. 
... 

Function f is pre-analyzed in a generic state in which p points to some location S_p and q points to S_q. The particular state at the call site line 19 is found to be captured by the generic state with S_p being A and S_q being B so the pre-analysis can be reused (taking into account that the changes that happened to S_p in the pre-analysis happen to A here and so on).

On the other hand the state at the second call site (line 23) is not found to be an instance of the generic state because there is an aliasing relation between p and q that may completely change the meaning of the function. The analysis concludes that the generic analysis cannot be reused here and enters a specialized analysis of f same as if option -mem-exec had not been used.

I think this feature is a fine piece of engineering (yes I wrote that part myself. There are other parts I wrote myself I do not think are great at all). Nevertheless no-one used it and it had some long-standing bugs. The use case for this option was not clear. You can use instead -no-results-function -slevel-function or replace a complex function by a simpler stub written with value analysis primitives.

So this was a partial list of features that will go away in Nitrogen. There are many other bits that no-one is using and that only make new developments difficult. Many of these were never documented so if you use the value analysis in theory you shouldn't even notice that they are gone.

Pascal Cuoq
21st Jul 2011