This isn't a new problem. The 80-bit internal precision of the 8087 FPU has always been a mismatch for the 64-bit IEEE double representation, even before SSE registers (which have no 80-bit mode) complicated things. x87 code has always been able to produce different results for the same source code, depending on when/whether/which intermediate results get spilled to memory. The Motorola 68k FPU had the same issue with internal precision higher than external, IIRC.
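A minimal sketch of the effect (adapted from the well-known x87-vs-SSE overflow example; the exact output depends on your compiler, flags, and optimization level -- try gcc -m32 -mfpmath=387 -O2 versus -mfpmath=sse -msse2):

    #include <stdio.h>

    int main(void) {
        volatile double v = 1e308;   /* volatile blocks constant folding */
        double x = (v * v) / v;
        /* The 80-bit format also has a wider exponent range, so v*v can
           stay finite in an x87 register and x comes out as 1e308.  If
           the intermediate goes through SSE, or is spilled and rounded
           to a 64-bit double, v*v overflows to +inf and so does x. */
        printf("%g\n", x);
        return 0;
    }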
This isn't a bug. Both code paths produce results of the highest representable precision of the hardware in question. It's just that there are multiple hardware units capable of giving you the answer, and Apple's toolchain picks a different one than whatever Tom is using elsewhere.
And as has been pointed out in FooBarWidget's comment -- any code that relies on bit-precise results from floating-point computation is almost certainly concealing precision bugs anyway. That's not the right way to architect floating-point code.
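For instance, a tolerance-based comparison sidesteps the whole class of last-bit differences (a sketch -- nearly_equal and its tolerance are illustrative choices, not a standard API):

    #include <float.h>
    #include <math.h>
    #include <stdbool.h>
    #include <stdio.h>

    /* Treat two doubles as equal if they agree to within a relative
       tolerance, with an absolute floor near zero, instead of
       comparing bit patterns with ==.  Pick a tolerance that matches
       the real accuracy of the computation. */
    static bool nearly_equal(double x, double y) {
        const double rel_tol = 1e-12;   /* assumed tolerance */
        double diff = fabs(x - y);
        return diff <= rel_tol * fmax(fabs(x), fabs(y)) || diff < DBL_MIN;
    }

    int main(void) {
        double a = 0.1 + 0.2;
        printf("%d\n", a == 0.3);             /* 0: bit-exact compare fails */
        printf("%d\n", nearly_equal(a, 0.3)); /* 1: tolerant compare passes */
        return 0;
    }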
For people unfamiliar with this, "spilled to memory" means rounded to 64 bits (from 80).
x87 does floating-point math at 80 bits, in registers. But double variables are stored in memory as 64 bits, so the results get rounded when they're stored.
The problem is when that rounding happens. And that can vary depending on whether a register is needed for something else, the optimization level, the order of the code, etc.
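A small sketch of how the spill decision becomes visible (assumes a 32-bit x87 build; what it prints depends on compiler and flags):

    #include <stdio.h>

    int main(void) {
        volatile double big = 1e16;     /* volatile defeats constant folding */
        volatile double small = 2.9999;
        double sum = big + small;
        /* A double's ULP at 1e16 is 2.0, so rounding the sum to 64 bits
           gives exactly 1e16 + 2, while an 80-bit register would still
           hold roughly 1e16 + 2.9999.  Which one 'sum' is here depends
           on whether the compiler spilled it; gcc on x87 historically
           needed -ffloat-store or -fexcess-precision=standard to force
           the rounding that the C standard requires at assignment. */
        printf("%.4f\n", sum - big);    /* 2.0000 if rounded, ~2.9999 if not */
        return 0;
    }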