Floating-point quiz

Pascal Cuoq - 8th Nov 2011

Here is a little quiz you can use to test your C floating-point expertise. I have tried to write the examples below so that the results depend as little as possible on the platform and compiler. Full independence is theoretically impossible, since C99 does not mandate IEEE 754 floating-point semantics (Annex F, which describes them, applies only to implementations that choose to define `__STDC_IEC_559__`), but let us assume that the compiler at least tries. It could be a recent GCC on 8087-class hardware, for instance.

Question 1

What's the return code of this program?

```c
int main(void) {
  return 0.1 == 0.1f;
}
```

Answer: the program returns `0`. Promotion and conversion rules mean that the comparison takes place between `double` numbers. The decimal number `0.1` is represented differently as a single-precision `float` and as a double-precision `double` (and neither of these two representations is exact). Promoting `0.1f` to `double` is exact, but it only appends zero bits; it cannot restore the digits lost when `0.1` was rounded to `float`. So the promoted value differs from the `double` representing `0.1`.

Question 2

```c
int main(void) {
  float f = 0.1;
  return f == 0.1f;
}
```

Answer: the program returns `1`. This time, the comparison takes place between `float` numbers. But first things first: variable `f` is initialized with the `double` representation of `0.1`, and this number has to be converted to `float` to fit in `f`. The conversion rounds the `double` to the nearest `float`, and in this case the result is the same `float` as the single-precision constant `0.1f`. When the contents of `f` are read back, they therefore compare exactly equal to `0.1f`.

Question 3

```c
int main(void) {
  float f = 0.1;
  double d = 0.1f;
  return f == d;
}
```

Answer: the program returns `1`. The comparison takes place between `double` numbers again. The left-hand side is the promotion to `double` of the single-precision representation of `0.1`. The right-hand side is the contents of the double-precision variable `d`, which was initialized with the conversion to `double` of the single-precision representation of `0.1`. Since every `float` value is exactly representable as a `double`, both conversions are exact, and the two sequences of operations produce the same result.

Question 4

```c
int main(void) {
  double d1 = 1.01161128282547f;
  double d2 = 1.01161128282547;
  return d1 == d2;
}
```

Answer: the program returns `0`. The decimal number `1.01161128282547` is no more exactly representable in binary than `0.1`, and again, its `double` representation in `d2` has more significant binary digits than its `float` representation converted to `double` in `d1`.

For a fractional number to be representable as a (base 2) floating-point number, its decimal expansion has to end in `5`, although the converse isn't true. Numbers `0.5` and `0.625` are representable as floating-point numbers, but `0.05`, `0.1` and `1.01161128282547` aren't. A number may also have the same representation as `float` and `double` even though neither representation is exact: for this to happen, it suffices that the 29 additional binary digits of the `double` significand (52 stored bits versus 23 for `float`) all be zeroes.

Question 5

```c
int main(void) {
  float f1 = 1.01161128282547f;
  float f2 = 1.01161128282547;
  return f1 == f2;
}
```

Answer: if this looks like a trick question, it's because it is. The program returns `0`. Variable `f1` is initialized with the single-precision representation of `1.01161128282547`. On the other hand, `f2` receives the conversion to `float` of the `double` representation of this number. In this particular case, the two are not the same: the number `1.01161128282547` is very close to the midpoint between two consecutive `float` numbers, and slightly above it. When it is first rounded to `double` (when initializing `f2`), it is rounded to the midpoint itself, which is representable as a `double`. When that `double` is rounded to a `float`, the result is an exact tie, and the default round-to-nearest-even rule sends it to the `float` with an even last bit, which lies on the opposite side of the midpoint from the original number. On the other hand, when initializing `f1`, the original number is rounded directly to the nearest `float`.

```
~1.01161122                    ~1.01161128                    ~1.01161134
     +------------------------------+------------------------------+
     f2                               ^                           f1
                               original number
```

I could make another series of questions, somewhat symmetrical to this one, where two different but standard-compliant compilers produce different results each time, but that wouldn't be as much fun. The examples here are relatively well defined: the rules that make them puzzling (or not) apply to most compilers, except those that do not even try to follow C99's recommendation of IEEE 754 arithmetic.
