The problem with differential testing is that at least one of the compilers must get it right
Pascal Cuoq - 25th Sep 2013
A long time ago, John Regehr wrote a blog post about a 3-3 split vote that occurred while he was finding bugs in C compilers through differential testing. John could have included Frama-C's value analysis in his set of C implementations, and then the vote would have been 4-3 for the correct interpretation (Frama-C's value analysis predicts the correct value on the particular C program that was the subject of the post). But self-congratulatory remarks are not the subject of today's post. Non-split votes in differential testing, where all compilers get it wrong, are.
A simple program to find double-rounding examples
The program below looks for examples of harmful double-rounding in floating-point multiplication. Harmful double-rounding occurs when the result of multiplying two double operands differs between the double-precision multiplication (the result is rounded directly to what fits the double format) and the extended-double multiplication (the mathematical result of multiplying two double numbers may not be representable exactly even with extended-double precision, so it is rounded to extended-double and then rounded again to double, which changes the result).
$ cat dr.c
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <float.h>
#include <limits.h>

int main(){
  /* header line: evaluation method and the ranges of double and long double */
  printf("%d %a %La\n", FLT_EVAL_METHOD, DBL_MAX, LDBL_MAX);
  while(1){
    /* random integer-valued operands built from three rand() calls */
    double d1 = ((unsigned long)rand()<<32) +
                ((unsigned long)rand()<<16) + rand() ;
    double d2 = ((unsigned long)rand()<<32) +
                ((unsigned long)rand()<<16) + rand() ;
    long double ld1 = d1;
    long double ld2 = d2;

    /* direct double product versus extended product rounded to double */
    if (d1 * d2 != (double)(ld1 * ld2))
      printf("%a*%a=%a but (double)((long double) %a * %a))=%a\n",
             d1, d2, d1*d2,
             d1, d2, (double)(ld1 * ld2));
  }
}
The program is platform-dependent, but if it starts by printing something like the line below, then a long list of double-rounding examples should immediately follow:
0 0x1.fffffffffffffp+1023 0xf.fffffffffffffffp+16380
Results
In my case what happened was:
$ gcc -v
Using built-in specs.
Target: i686-apple-darwin11
...
gcc version 4.2.1 (Based on Apple Inc. build 5658) (LLVM build 2336.11.00)
$ gcc -std=c99 -O2 -Wall dr.c && ./a.out
0 0x1.fffffffffffffp+1023 0xf.fffffffffffffffp+16380
^C
I immediately blamed myself for miscalculating the probability of easily finding such examples, getting a conversion wrong, or following while (1) with a semicolon. But it turned out I had not done any of those things. I turned to Clang for a second opinion:
$ clang -v
Apple clang version 4.1 (tags/Apple/clang-421.11.66) (based on LLVM 3.1svn)
Target: x86_64-apple-darwin12.4.0
Thread model: posix
$ clang -std=c99 -O2 -Wall dr.c && ./a.out
0 0x1.fffffffffffffp+1023 0xf.fffffffffffffffp+16380
^C
Conclusion
What had happened became clear when I looked at the assembly code:
$ clang -std=c99 -O2 -Wall -S dr.c && cat dr.s
...
mulsd %xmm4, %xmm5
ucomisd %xmm5, %xmm5
jnp LBB0_1
...
Clang had compiled the test for deciding whether to call printf() into if (xmm5 != xmm5) for some register xmm5.
$ gcc -std=c99 -O2 -Wall -S dr.c && cat dr.s
...
mulsd %xmm1, %xmm2
ucomisd %xmm2, %xmm2
jnp LBB1_1
...
And GCC had done the same. Although, to be fair, the two compilers appear to be using LLVM as back-end, so this could be the result of a single bug. But this would remove all the salt of the anecdote, so let us hope it isn't.
It is high time that someone used fuzz-testing to debug floating-point arithmetic in compilers. Hopefully one compiler will get it right sometimes and we can work from there.