Extended Precision Problem.

Andy Jackson
Introduction

The simulation that hit this problem involved random-walking hard spheres until they just touch - i.e. the output of a floating-point computation (determining the distance between two sphere centres) was compared against the required separation (2r) at various stages of the code.

Examples

Comparing double-precision numbers that may be exactly equal (and in this case should be ==) is always a bit hairy, and best avoided. The code I inherited relied on such a comparison always being consistent, which need not be the case. A simple example would be (in C):

#include <math.h>
#include <stdio.h>

int main(void)
{
  double          x = sqrt(2.0);
  volatile double y = x;  /* volatile forces y to be stored to (and read back from) memory */
  if (x != y) printf("Uh oh!\n");
  return 0;
}

One would assume that this should never print ‘Uh oh!’, but when compiled with optimization under gcc for x86 [1] it can indeed find that x!=y. Similar problems can arise with code of the form:

if( (y = {something-nasty-mathsy-expression}) > x ) {
  // Do something fab...
  {something fab}
  // Check consistency:
  assert( y > x );
}

The assert line can fail, even though the if construct would seem to ensure that y is always greater than x here. As before, the failure only tends to happen under optimization.
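
As a concrete illustration of the pattern, here is a minimal self-contained sketch. The expression is only illustrative, and whether the assert actually fires depends on the compiler version, optimization level, and target:

#include <assert.h>
#include <math.h>
#include <stdio.h>

int main(void)
{
  double x = 2.0;
  double y;
  /* The comparison in the if may be performed at extended precision,
     while the assert may see y after it has been rounded to 64 bits. */
  if ((y = sqrt(2.0) * sqrt(2.0)) > x) {
    printf("y seems greater than x...\n");
    assert(y > x);  /* can fail once y has been stored and rounded */
  }
  return 0;
}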

Explanation

The issue is that under optimization, floating-point numbers are kept in registers more often (or rather, stored to memory less frequently, as storing things is slow), and the registers on these chips actually hold floating-point numbers using 80 bits (‘extended precision’) instead of 64 bits. If the numbers are rounded at every stage (e.g. by being stored to memory) then no inconsistencies arise, but if not, as is more common for optimized code, then FP variables utilised at different stages of the code may differ due to this variable degree of rounding. In the assertion example above, the first comparison happens on-chip just after y is computed, at extended precision, and finds y to be greater than x. Later on, after both variables have been rounded to 64 bits via a store-to-memory operation, y is no longer greater than x because at this level of precision y is in fact equal to x.
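
To see the rounding step in isolation, here is a minimal sketch, assuming a target where long double is the 80-bit extended type (as on x86); on such a target the message is printed:

#include <math.h>
#include <stdio.h>

int main(void)
{
  /* sqrtl gives the value roughly as an 80-bit register would hold it;
     the cast to double performs the store-to-memory rounding. */
  long double extended = sqrtl(2.0L);
  double      rounded  = (double)extended;
  if ((long double)rounded != extended)
    printf("rounding to 64 bits changed the value\n");
  return 0;
}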

This also explains another thing I hit with this code: adding a subroutine call (e.g. printf("")) at certain points can stop the unexpected behaviour. A function call forces all variables to be stored to memory (or at least onto the stack), unless the function is inlined.
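
A sketch of how this plays out in the earlier pattern; the helper functions here are my own illustrative stand-ins, and whether the compiler actually spills y around the call depends on optimization settings and inlining:

#include <assert.h>
#include <math.h>
#include <stdio.h>

/* Illustrative stand-ins: any non-trivial FP computation, and any
   function the compiler does not inline. */
double nasty_expression(void) { return sqrt(2.0) * sqrt(2.0); }
void   dummy_call(void)       { printf(""); }

int main(void)
{
  double x = 2.0;
  double y = nasty_expression();
  dummy_call();     /* may force y out of the 80-bit register and into memory */
  if (y > x) {
    assert(y > x);  /* both comparisons now see the same rounded value */
  }
  return 0;
}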

Fix

As noted in the references below, there may always be some residual problems due to this kind of extended-precision issue. However, under gcc on x86 you can get rid of a lot of them by using the -ffloat-store compiler flag (if I remember correctly, it only works when placed after the -O option, but this contradicts the manual and so I may be wrong about that). I have only observed this problem under gcc C/C++; it did not happen under Portland Group’s pgcc/pgCC, and I have no idea whether it can happen under other languages or compilers. For what it’s worth, I suspect it does not happen with FORTRAN - the standard is probably rather specific on this issue.
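
For example (the source file name here is just illustrative):

gcc -O2 -ffloat-store spheres.c -o spheres -lm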

The residual problems mentioned above arise in the temporaries used during evaluation - in short, everything you assign (e.g. the x in x = {something-mathsy}) will be rounded consistently, but the temporaries used in evaluating {something-mathsy} will often be computed at extended precision. For many simulation purposes this is fine (and frequently desirable), but for the purest forms of numerical analysis (where one is concerned with creating exactly reproducible algorithms that depend on, e.g., IEEE arithmetic and are independent of compiler- and processor-specific issues) it may be troublesome.
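
A minimal sketch of the residual issue; again, whether the message actually appears depends on compiler, flags, and target:

#include <stdio.h>

int main(void)
{
  double a = 1.0 / 3.0;
  double p = a * a;  /* with -ffloat-store, p is rounded to 64 bits on assignment */
  /* The temporary a * a below may still be held at extended precision,
     in which case it need not compare equal to the stored p. */
  if (a * a != p)
    printf("temporary and stored value differ\n");
  return 0;
}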

However, for me, the real problem was that the code I was working with was designed to use floating-point numbers when the entire thing could have been done using integer arithmetic. The spheres are made to random-walk on a lattice, and the discrete nature of this problem makes an integer algorithm possible and indeed preferable. Using floats under these conditions gives the algorithm more freedom than it requires, and provides just enough rope for it to hang itself. Alternatively, if one insists on using floats, even just comparing the squares of the separations between points (instead of taking the square root) would have led to a more stable algorithm, as the kind of numbers the code was dealing with would then never have been rounded at all.
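
As a sketch of the integer version (the names and the lattice representation are my own assumptions, not the original code's):

#include <stdbool.h>
#include <stdio.h>

/* Illustrative lattice coordinates. */
typedef struct { long x, y, z; } LatticePoint;

/* Two spheres of radius r (in lattice units) just touch when the squared
   centre-to-centre distance equals (2r)^2. Everything here is exact
   integer arithmetic, so there is nothing to round and the comparison
   is always consistent. */
bool spheres_touch(LatticePoint a, LatticePoint b, long r)
{
  long dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
  return dx*dx + dy*dy + dz*dz == 4*r*r;
}

int main(void)
{
  LatticePoint a = {0, 0, 0}, b = {0, 0, 6};
  printf("%s\n", spheres_touch(a, b, 3) ? "touching" : "not touching");
  return 0;
}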

Notes

[1] The level of optimization required to observe this behaviour depends on the particular version of gcc being used.

References