← What functions does a function use: option -users Analyzing single-precision floating-point constants →

Floating-point quiz

Pascal Cuoq - 8th Nov 2011

Here is a little quiz you can use to test your C floating-point expertise. I have tried to write the examples below so that the results do not depend too much on the platform and compiler. This is theoretically impossible since C99 does not mandate IEEE 754 floating-point semantics, but let us assume that the compiler at least tries. It could be a recent GCC on 8087-class hardware, for instance.

Question 1

What's the return code of this program?

main(){ 
  return 0.1 == 0.1f; 
}

Answer: the program returns 0. Promotion and conversion rules mean that the comparison take place between double numbers. The decimal number 0.1 is represented differently as a single-precision float and as a double-precision double (and none of these two representations is exact), so when 0.1f is promoted to double, the result is quite a bit different from the double representing 0.1.

Question 2

main(){ 
  float f = 0.1; 
  return f == 0.1f; 
}

Answer: the program returns 1. This time, the comparison takes place between float numbers. But first things first: variable f is initialized with the double representation of 0.1, but this number has to be converted to float to fit f. As a result, f ends up containing only those digits of 0.1 that fit into a float mantissa. When the contents of f are read back, they compare exactly to the 0.1f single-precision constant.

Question 3

main(){ 
  float f = 0.1; 
  double d = 0.1f; 
  return f == d; 
}

Answer: the program returns 1. The comparison takes place between double numbers again. The left-hand side is the promotion to double of the single-precision representation of 0.1. The right-hand side is the contents of double-precision variable d, that has been initialized with the conversion to double of the single-precision representation of 0.1. The two sequences of operations produce the same result.

Question 4

main(){ 
  double d1 = 1.01161128282547f; 
  double d2 = 1.01161128282547; 
  return d1 == d2; 
}

Answer: the program returns 0. The decimal number 1.01161128282547 is no more representable than 0.1, and again, its double representation in d2 has more digits than its float representation converted to double in d1.

For a fractional number to be representable as a (base 2) floating-point number, its decimal expansion has to end in 5, although the converse isn't true. Numbers 0.5 and 0.625 are representable as floating-point numbers, but 0.05, 0.1 and 1.01161128282547 aren't. A number may also have the same representation as float and double although neither of these two representations is exact: for this to happen, it suffices that the 29 additional binary digits available in the double format be all zeroes.

Question 5

main(){ 
  float f1 = 1.01161128282547f; 
  float f2 = 1.01161128282547; 
  return f1 == f2; 
}

Answer: if this looks like a trick question, it's because it is. The program returns 0. Variable f1 is initialized with the single-precision representation of 1.01161128282547. On the other hand, f2 receives the conversion to float of the double representation of this number. In this particular case, the two are not the same: the number 1.01161128282547 is actually very close to the middle point of two successive floating-point numbers. When it is first rounded to double (when initializing f2), it is rounded to the middle point itself (which happens to be representable as a double). When that double is rounded to a float, applicable rounding rules send it to the float on the opposite side of the middle point we started from. On the other hand, when initializing f1, the original number is rounded directly to the nearest float.

       ~1.01161122                    ~1.01161128                     ~1.01161134 
            +------------------------------+------------------------------+ 
           f2                                ^                            f1 
                                       original number

I could make another series of questions, somewhat symmetrical to this one, where two different but standard-complicant compilers produce different results each time, but that wouldn't be as much fun. The examples here were relatively well defined. The rules that make them puzzling (or not) apply indiscriminately to most compilers. Unless they do not even try to follow C99's guideline that recommends IEEE 754 arithmetics.