Numerical functions are not merely twisted
Pascal Cuoq - 25th Feb 2011In a previous post there was a cosine function that I made up for the purpose of blogging about it. Let us now look at a real cosine function (OpenBSD's libm derived from widespread code written at Sun in the 1990s) to get a feeling how far off we were:
double __kernel_cos(double x double y) { double a hz z r qx; int32_t ix; GET_HIGH_WORD(ix x); ix &= 0x7fffffff; /* ix = |x|'s high word*/ if(ix<0x3e400000) { /* if x < 2**27 */ if(((int)x)==0) return one; /* generate inexact */ } z = x*x; r = z*(C1+z*(C2+z*(C3+z*(C4+z*(C5+z*C6))))); if(ix < 0x3FD33333) /* if |x| < 0.3 */ return one - (0.5*z - (z*r - x*y)); else { if(ix > 0x3fe90000) { /* x > 0.78125 */ qx = 0.28125; } else { INSERT_WORDS(qx ix-0x00200000 0); /* x/4 */ } hz = 0.5*z-qx; a = one-qx; return a - (hz - (z*r-x*y)); } }
Some remarks in decreasing order of obviousness:
- the line
if(ix<0x3e400000) { /* if x < 2**27 */
contains a clear bug: the comment should undoubtedly be "if x < 2**-27". - The function computes the cosine of
x+y
. Argumenty
is the "tail" ofx
and sincex
is adouble
in the range -π/4 .. π/4y
is really small and only needs to be taken into account in the lower-order terms of the polynomial approximation. - This code accesses bits of the
double
argumentx
with 32-bit integers and this is mostly for purposes of speed (only "mostly" here. In other functions from the same library the same trick is used and the only justification is speed). This made sense at a time but on a modern architecture with a sensible ABI this is very much misguided. Argumentx
would be passed in a floating-point register and would stay there for the duration of the function if this pessimization did not force it to be written to memory. Worse I am pretty sure that with some of the architectures implementing SSE the program is penalized by long stalls for accessing with general registers memory that has been written from SSE registers. - You may wonder if it's necessary to think about the difference between
doubles
and reals. Look at it this way: if you did not have to think about this difference the whole mess withqx
would not be part of this function because with reals it doesn't have any effect to subtract the same quantity from two arguments of the same subtraction. And yes this correction is only there to obtain precision at a scale much smaller than we were verifying using the value analysis. The problem is this modification makes the code more complex for everyone. If you only need a precision of one thousandth but you absolutely must be sure that you have it otherwise the plane/nuclear power plant/satellite may crash it is now less certain that you have this precision because the increased complexity makes it less obvious the function works properly. We even found a bug although that was only in a comment. You'd better verify it carefully if that's the function you have decided to use.
To be fair regarding the last point I chose a cosine implementation on purpose because its result gets closer to zero (making it easier to observe a possible imprecision) when its argument gets farther from zero (meaning less precision available for computations if you are not careful). The qx
trick actually serves to get computations closer to zero where more precision is (in absolute terms) available. The sine implementation does not need such a trick.