Iterating over the AST
Virgile Prevosto - 21st May 2012Context
A facetious colleague who claims that he should be better writing his thesis but keeps committing Coq and OCaml files on Frama-C's repository, asked me the following question: Is there a function in Frama-C's kernel that can fold[1] a function over all types that appear in a C program?
As it turns out nothing is readily available for this exact purpose but another facetious colleague could have come up with a terse answer: There's a visitor for that!
Now we can be a bit more helpful and actually write that visitor. Our end goal is to come up with a function
fold_typ: (Cil_types.typ -> 'a -> 'a) -> 'a -> 'a
so that fold_typ f init
will be f t_1 (f t_2 (... (f t_n init)...))
where the t_i
are all the C types appearing in a given C program (in no particular order).
Frama-C's visitors
The visitor pattern is a well-known object-oriented design pattern that can be used to perform a given action over all nodes of a complex data structure (in our case the Abstract Syntax Tree -AST for short- of a C program). Frama-C provides a generic visitor mechanism built upon CIL's visitor whose entry points can be found in the aptly named src/kernel/visitor.mli
file. It is also documented in section 5.14 of the developer manual. In summary you first define a class that inherits from the generic visitor and overrides the methods corresponding to the nodes you're interested in (in our case this will be vtype
for visiting uses of Cil_types.typ
). Then you apply an object of this class to the function from the Visitor
module that correspond to the subpart of the AST that you want to inspect (in our case this will be the whole AST).
Basic version
The standard visitor does not provide anything to return a value outside of the AST. In fact all the entry points in Visitor
return a node of the same type that the node in which the visit starts (for visitors that don't perform code transformation this is in fact physically the same node). But if you accept to sneak in a little bit of imperative code into your development -and by using Frama-C you've already accepted that- there is an easy way out: pass to the visitor a reference that it can update and you just have to read the final value that reference is holding after the visit. The visitor then looks like the following:
class fold_typ_basic f acc = object inherit Visitor.frama_c_inplace method vtype ty = acc:= f ty !acc; Cil.DoChildren end
And that's it. Each time the visitor sees a type it will apply f
to ty
and the result of the previous computations stored in acc
. fold_typ
then just needs to call the visitor over the whole AST and give it a reference initialized with init
:
let fold_typ f init = let racc = ref init in let vis = new fold_typ_basic f racc in Visitor.visitFramacFileSameGlobals vis (Ast.get()); !racc
Don't do the same work twice
This first version that is barely more than 10 LoC works but we can do a little better. Indeed f
will be called each time a type is encountered in the AST. In most cases we want to call f
once for any given type. This can be done quite simply by memoizing in an instance variable of our visitor the set of types encountered thus far. Frama-C's Cil_datatype
module (cil/src/cil_datatype.mli
) provides all the needed functions for that:
class fold_typ f acc = object inherit Visitor.frama_c_inplace val mutable known_types = Cil_datatype.Typ.Set.empty method vtype ty = if Cil_datatype.Typ.Set.mem ty known_types then Cil.DoChildren else begin known_types <- Cil_datatype.Typ.Set.add ty known_types; acc:= f ty !acc; Cil.DoChildren end end
Testing the infrastructure
It is now time to test if everything works smoothly. The following function will print the name and size of the type who has the biggest size in the analyzed program.
let test () = let f ty (maxty maxsize as acc) = try let size = Cil.sizeOf_int ty in if size > maxsize then (ty size) else acc with Cil.SizeOfError _ -> acc in let (ty size) = fold_typ f (Cil.voidType 0) in Format.printf "Biggest type is %a @ with size %d@." !Ast_printer.d_type ty size
Since it is only a quick test we don't do anything special if Cil complains that it cannot compute the size of a given type: we just stick to the maximal value computed so far.
File fold_typ.ml provides the code for the visitor and the test function. frama-c -load-script fold_typ.ml file.c
should output something like
[kernel] preprocessing with "gcc -C -E -I. file.c" Biggest type is struct Ts with size 44
Exercises
- How can you use the
fold_typ
class to define aniter_typ
function that apply a functionf
returningunit
to each type of the AST (val iter_typ: (Cil_types.typ -> unit) -> unit
)? - Writing
fold_typ
is a bit overkill if you're going to apply it once in your plug-in. Write a specialized visitor that will do the same thing as thetest
function above.
Notes
[1] For those who are not familiar with functional programming: http://en.wikipedia.org/wiki/Fold_%...