More on the precise analysis of C programs for FLT_EVAL_METHOD==2

Pascal Cuoq - 15th Jul 2013

Introduction

It started innocently enough. My colleagues were talking of supporting target compilers with excess floating-point precision. We saw that if analyzing programs destined to be compiled with strict IEEE 754 compilers was a lovely Spring day at the beach, analyzing for compilers that allow excess precision was Normandy in June 1944. But we had not seen anything yet.

The notion of compile-time computation depends on the optimization level

A first obvious problem was that of constant expressions that are evaluated at compile-time following rules that differ from the run-time ones. And who is to say what is evaluated at compile-time and what at run-time? Why, it even depends, for one and the same compiler, on the optimization level:

#include <stdio.h>
int r1, r2;
double ten = 10.0;
int main(int c, char **v)
{
  ten = 10.0;
  r1 = 0.1 == (1.0 / ten);
  r2 = 0.1 == (1.0 / 10.0);
  printf("r1=%d r2=%d", r1, r2);
}

Note how, compared to last time, we make the vicious one-line change of assigning variable ten again inside function main().

$ gcc -v 
Target: x86_64-linux-gnu 
… 
gcc version 4.4.3 (Ubuntu 4.4.3-4ubuntu5.1)  
$ gcc -mno-sse2 -mfpmath=387 -std=c99 -O2  s.c && ./a.out  
r1=1 r2=1 
$ gcc -mno-sse2 -mfpmath=387 -std=c99  s.c && ./a.out  
r1=0 r2=1 

So the problem is not just that the static analyzer must be able to recognize the computations that are done at compile-time. A precise static analyzer that went down this path would in addition have to model each of the tens of optimization flags of the target compiler and their effects on the definition of a constant expression.

Fortunately for us, after many varied complaints from GCC users, Joseph S. Myers decided that 387 floating-point math in GCC was at least going to be predictable. That would not solve all the issues that had been marked as duplicates of the infamous bug 323 over its lifetime, but it would answer the valid ones.

A ray of hope

Joseph S. Myers provided a reasonable interpretation of the effects of FLT_EVAL_METHOD in the C99 standard. The comparatively old compiler we used in the previous post and in the first section of this one does not contain the patch from that discussion, but recent compilers do. The most recent GCC I have available is SVN snapshot 172652 from 2011. It includes the patch. With this version of GCC, we compile and execute the test program below.

#include <stdio.h>
int r1, r2, r3, r4, r5, r6, r7;
double ten = 10.0;
int main(int c, char **v)
{
  r1 = 0.1 == (1.0 / ten);
  r2 = 0.1 == (1.0 / 10.0);
  r3 = 0.1 == (double) (1.0 / ten);
  r4 = 0.1 == (double) (1.0 / 10.0);
  ten = 10.0;
  r5 = 0.1 == (1.0 / ten);
  r6 = 0.1 == (double) (1.0 / ten);
  r7 = ((double) 0.1) == (1.0 / 10.0);
  printf("r1=%d r2=%d r3=%d r4=%d r5=%d r6=%d r7=%d", r1, r2, r3, r4, r5, r6, r7);
}

We obtain the following results, which differ from those obtained with the earlier version of GCC but are independent of the optimization level and understandable (all computations are done with FLT_EVAL_METHOD==2 semantics):

$ ./gcc-172652/bin/gcc -mno-sse2 -mfpmath=387 -std=c99  t.c && ./a.out  
r1=1 r2=1 r3=0 r4=0 r5=1 r6=0 r7=0 
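
To see concretely why the explicit conversions make r3, r4, r6 and r7 false, one can reproduce the same comparisons with explicit long double arithmetic on a strict IEEE 754 target. The sketch below is only an illustration; it assumes an x86-style 80-bit long double and a compiler for which FLT_EVAL_METHOD is 0:

#include <stdio.h>

/* Sketch: emulate the FLT_EVAL_METHOD==2 evaluation by hand, using
   long double explicitly, on a platform where FLT_EVAL_METHOD==0.
   The literal 0.1 receives the value 0.1L; intermediate results stay
   in long double unless a cast or an assignment rounds them. */
double ten = 10.0;

int main(void)
{
  printf("%d\n", 0.1L == 1.0L / ten);            /* like r1: 1 */
  printf("%d\n", 0.1L == (double)(1.0L / ten));  /* like r3: 0, the cast rounds the quotient */
  printf("%d\n", (double)0.1L == 1.0L / 10.0L);  /* like r7: 0, the literal is rounded */
  return 0;
}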

As per the C99 standard, the choice was made to give the literal 0.1 the value of 0.1L. I am happy to report that this simple explanation for the values of r2 and r7 can be inferred directly from the assembly code. Indeed, the corresponding constant is declared in assembly as:

.LC1: 
	.long	3435973837 
	.long	3435973836 
	.long	16379 
	.long	0 

Quiz: why is it obvious in the above assembly code for a long double constant that the compiler used the long double approximation for 0.1 instead of the double one?
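
Readers who prefer to check empirically rather than decode the hexadecimal by hand can dump the two candidate constants. The following is just a helper sketch; it assumes x86's long double, whose 80 significant bits occupy the first ten bytes of the object, stored little-endian:

#include <stdio.h>
#include <string.h>

/* Print the ten significant bytes of an x86 80-bit long double,
   most significant byte first. */
static void dump(const char *label, long double x)
{
  unsigned char b[sizeof(long double)];
  memcpy(b, &x, sizeof b);
  printf("%-8s", label);
  for (int i = 9; i >= 0; i--)
    printf("%02x", b[i]);
  printf("\n");
}

int main(void)
{
  dump("0.1L:", 0.1L);  /* the long double approximation of 0.1 */
  dump("0.1:", 0.1);    /* the double approximation, widened to long double */
  return 0;
}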

As described, the semantics of C programs compiled with FLT_EVAL_METHOD==2 are just as deterministic as if they were compiled with FLT_EVAL_METHOD==0. They give different results from the latter, but always the same ones, regardless of optimization level, of interference from unrelated statements, and even of the particular compiler generating code with FLT_EVAL_METHOD==2. In the discussion that followed between Joseph Myers and Ian Lance Taylor, this is called “predictable semantics”, and it is a boon to anyone who wishes to tell what a program ought to do when executed (including, but not limited to, precise static analyzers).

Implementation detail: source-to-source transformation or architecture option?

Now that at least one C compiler can be said to have predictable behavior with respect to excess precision, the question arises of supporting FLT_EVAL_METHOD==2 in Frama-C. This could be one more of the architecture-dependent parameters, such as the size of type int and the endianness.

The rules are subtle, however, and rather than letting each Frama-C plug-in implement them and get them slightly wrong, it would be less error-prone to implement them once and for all as a source-to-source translation: from a program with FLT_EVAL_METHOD==2 semantics to a program that, when compiled or analyzed with FLT_EVAL_METHOD==0 semantics, computes the same thing as the first one.

The destination of the transformation can be a Frama-C AST

A translated program that, when compiled with strict IEEE 754 semantics, reproduces the FLT_EVAL_METHOD==2 semantics of an existing program can be represented as an AST in Frama-C. Here is how the translation would work on an example:

double interpol(double u1, double u2, double u3)
{
  return u2 * (1.0 - u1) + u1 * u3;
}

Function interpol() above can be compiled with either FLT_EVAL_METHOD==0 or FLT_EVAL_METHOD==2. In the second case it actually appears to have slightly better properties than in the first, but the differences are minor.

A source-to-source translation could transform the function into that below:

double interpol_80(double u1, double u2, double u3)
{
  return u2 * (1.0L - (long double)u1) + u1 * (long double)u3;
}

This transformed function interpol_80(), when compiled or analyzed with FLT_EVAL_METHOD==0, behaves exactly like function interpol() compiled or analyzed with FLT_EVAL_METHOD==2. I made an effort here to insert only the minimum number of explicit conversions, but a Frama-C transformation plug-in would not need to be so punctilious.
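
For completeness, here is how the same translation would have to handle an intermediate assignment, which under C99 rounds the excess-precision value to the nominal type. The function names and bodies are hypothetical; this is only a sketch of the rule, not the output of an actual plug-in:

/* Original, meant to be compiled with FLT_EVAL_METHOD==2. */
double discr(double a, double b, double c)
{
  double p = b * b;        /* product computed in long double, rounded by the assignment */
  return p - 4.0 * a * c;  /* rounded to double when returned */
}

/* Translated version with the same meaning under FLT_EVAL_METHOD==0. */
double discr_80(double a, double b, double c)
{
  double p = (double)((long double)b * b);
  return (double)(p - 4.0L * a * c);
}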

The source of the transformation cannot be a Frama-C AST

There is, however, a problem with implementing the transformation as a traditional Frama-C transformation plug-in. It turns out that the translation cannot use the normalized Frama-C AST as its source. Indeed, if we use a Frama-C command to print the AST of the previous example in textual form:

~ $ frama-c -print -kernel-debug 1 t.c 
… 
/* Generated by Frama-C */ 
int main(int c, char **v) 
{ 
  /* Locals: __retres */ 
  int __retres; 
  /* sid:18 */ 
  r1 = 0.1 == 1.0 / ten; 
  /* sid:19 */ 
  r2 = 0.1 == 1.0 / 10.0; 
  /* sid:20 */ 
  r3 = 0.1 == 1.0 / ten; 
  /* sid:21 */ 
  r4 = 0.1 == 1.0 / 10.0; 
  … 
} 

Explicit casts to a type that an expression already has, such as the casts to double in the assignments to variables r3 and r4, are erased by the Frama-C front-end as part of its normalization. For us this will not do: these casts, although they convert a double expression to double, change the meaning of the program, as shown by the differences between the values of r1 and r3, and respectively of r2 and r4, when one executes our example.

This setback would not be insurmountable, but it means complications. It also implies that FLT_EVAL_METHOD==2 semantics cannot be implemented by individual plug-ins, which looked like a possible alternative.

To conclude this section on a positive note: if the goal is to analyze a C program destined to be compiled to the thirty-year-old 8087 instructions with a recent GCC compiler, we can build a version of Frama-C that will produce results precise to the last bit. The amount of work is not inconsiderable, but it is possible.

But wait!

But what about a recent version of Clang? Let us see, using the same C program as previously:

#include <stdio.h>
int r1, r2, r3, r4, r5, r6, r7;
double ten = 10.0;
int main(int c, char **v)
{
  r1 = 0.1 == (1.0 / ten);
  r2 = 0.1 == (1.0 / 10.0);
  r3 = 0.1 == (double) (1.0 / ten);
  r4 = 0.1 == (double) (1.0 / 10.0);
  ten = 10.0;
  r5 = 0.1 == (1.0 / ten);
  r6 = 0.1 == (double) (1.0 / ten);
  r7 = ((double) 0.1) == (1.0 / 10.0);
  printf("r1=%d r2=%d r3=%d r4=%d r5=%d r6=%d r7=%d", r1, r2, r3, r4, r5, r6, r7);
}
$ clang -v 
Apple LLVM version 4.2 (clang-425.0.24) (based on LLVM 3.2svn) 
$ clang -mno-sse2 -std=c99  t.c && ./a.out 
r1=0 r2=1 r3=0 r4=1 r5=1 r6=0 r7=1 

Oh no! Everything is to be done again… Some expressions are evaluated at compile-time with different results than the run-time ones, as shown by the difference between r1 and r2. The explicit cast to double does not seem to have an effect for r3 and r4 as compared to r1 and r2. This is different from Joseph Myers's interpretation, but if it is because floating-point expressions are always converted to their nominal types before being compared, it may be a good astonishment-lowering move. The value of r5 differs from that of r1, pointing to a non-obvious demarcation line between compile-time evaluation and run-time evaluation. And the values of r5 and r6 differ, meaning either that our interpretation “explicit casts to double have no effect”, based on the comparison of the values of r1 and r3 on the one hand and of r2 and r4 on the other hand, is wrong, or that some other compilation pass can interfere.

What a mess! There is no way a precise static analyzer can be made for this recent version of Clang (with these unfashionable options). Plus the results depend on optimizations:

$ clang -mno-sse2 -std=c99 -O2  t.c && ./a.out 
r1=0 r2=1 r3=0 r4=1 r5=1 r6=1 r7=1 

FLT_EVAL_METHOD is not ready for precise static analysis

In conclusion, it would be possible, at the cost of quite a lot of hard work, to make a precise static analyzer for programs destined to be compiled to x87 instructions by a modern GCC. But for most other compilers, even including recent ones, it is simply impossible: the compiler gives floating-point operations a meaning that only it knows.

This is the sort of problem we tackled in the Hisseo project mentioned last time. One of the solutions we researched was “Just do not make a precise static analyzer”, and another was “Just analyze the generated assembly code, where the meaning of floating-point operations has been fixed”. A couple of years later, the third solution, “Just use a proper compiler”, is looking better and better. It could even be a certified one, although it does not have to be. Both Clang and GCC, when targeting the SSE2 instruction set, give perfect FLT_EVAL_METHOD==0 results. We should all enjoy this period of temporary sanity until x86-64 processors all sport a fused multiply-add instruction.

Two things I should point out as this conclusion's conclusion. First, with the introduction of SSE2, the IA-32 platform (and its x86-64 cousin) has gone from the worst platform still in existence for predictable floating-point results to the best. It has correctly rounded operations for the standard single-precision and double-precision formats, and it retains hardware support for an often convenient extended-precision format. Second, the fused multiply-add instruction is a great addition to the instruction set, and I for one cannot wait until I get my paws on a processor that supports it, but it is going to be misused by compilers to compile source-level multiplications and additions. Compilers have not become wiser. The SSE2 instruction set has only made it more costly for them to do the wrong thing than to do the right one. They will break predictability again as soon as the opportunity comes, and the opportunity is already in Intel's and AMD's product pipelines.
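
As an illustration of why contraction matters, C99's fma() lets one observe the difference between a fused multiply-add and the separately rounded operations. The sketch below (assuming strict IEEE 754 double precision, and linking with -lm) prints a zero residue for the plain expression and a nonzero one for the fused version:

#include <stdio.h>
#include <math.h>

int main(void)
{
  double a = 1.0 + 0x1p-52;             /* 1 + DBL_EPSILON */
  double p = a * a;                     /* multiplication rounded to double */
  printf("separate: %a\n", a * a - p);  /* 0x0p+0: the same rounding happens twice */
  /* fma computes a*a - p with a single final rounding, so it exposes
     the rounding error committed by the plain multiplication. */
  printf("fused:    %a\n", fma(a, a, -p)); /* 0x1p-104 */
  return 0;
}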
