Ahah, Apple does have different math (perl.org)
48 points by profquail on Jan 5, 2010 | 35 comments



This isn't a new problem. The 80-bit internal 8087 FPU precision has always been a mismatch for the 64-bit IEEE double representation, even before the presence of SSE registers (which don't have an 80-bit mode) complicated things. Intel x87 code has always been able to produce different results for the same source code, depending on when/whether/which intermediate results get spilled to memory. The Motorola 68k FPU had the same issue with higher internal than external precision, IIRC.

This isn't a bug. Both code paths produce results of the highest representable precision of the hardware in question. It's just that there are multiple hardware units capable of giving you the answer, and Apple's toolchain picks a different one than whatever Tom is using elsewhere.

And as has been pointed out in FooBarWidget's comment -- any code that relies on bit-precise results from floating point computation is almost certainly concealing precision bugs anyway. That's not the right way to approach floating point arithmetic.


For people unfamiliar with this, "spilled to memory" means rounded to 64 bits (from 80).

x87 does floating point math with 80 bits (in registers). But the variables are stored in memory locations with 64 bits. So the results are rounded.

The problem comes from when to do the rounding. And that can vary depending on whether a register is needed for something else, optimization levels, the order of the code, etc.
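
Here's a minimal sketch of what that looks like in C, assuming a 32-bit x87 build (say gcc -m32 -mfpmath=387 -O2); whether the two tests disagree depends entirely on the compiler's spilling decisions:

    /* Minimal sketch: whether an intermediate stays in an 80-bit x87
       register or gets spilled to a 64-bit memory slot can change the
       result. Behavior depends on compiler, flags, and optimization. */
    #include <stdio.h>

    int main(void) {
        volatile double a = 0.1, b = 10.0;   /* volatile blocks constant folding */

        double kept = a * b;                 /* may live in an 80-bit x87 register */
        volatile double spilled = a * b;     /* forced out to a 64-bit memory slot */

        /* Under x87 the first test can print 0: the 80-bit product of the
           double 0.1 and 10.0 is not exactly 1.0, but the same value rounded
           to 64 bits is. Compiled for SSE2, both tests print 1. */
        printf("kept    == 1.0 : %d\n", kept == 1.0);
        printf("spilled == 1.0 : %d\n", spilled == 1.0);
        return 0;
    }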


Tell PG: When I submitted this article, HN stripped the "!" characters out of the title...but I was then able to edit the story and put them back in there. I suppose that means there's a small bug in there somewhere...


You can also use Unicode in comments, which may or ʎɐɯ ʇou be a bug but did surprise me.


Why on earth would the ability to use unicode in comments be a bug? Seems like a basic feature of any modern site.


It's probably a feature not a bug, designed to automatically de-sensationalize sensationalist titles!!!


Good point (I don't like sensational titles either) -- but shouldn't HN stop you from editing the title to put them back in, then?


Few people would come back and add the sensationalism back. If they do, they must have thought about it and been more rational about it.


I think it's because editing doesn't apply the mysterious scrub rules; it might be intentional.


It has long been known that one cannot rely on floating point for precise calculations, e.g. monetary values. One should use integers or some other exact representation instead. But it does surprise me that even on the same CPU architecture you cannot expect every machine to produce the same results.
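
A tiny illustration of the money case (the exact double total depends on the evaluation mode, but with ordinary IEEE doubles it won't be 1.0):

    /* Summing ten dimes: the double total drifts, the integer-cents total
       doesn't. */
    #include <stdio.h>

    int main(void) {
        double total = 0.0;
        long cents = 0;

        for (int i = 0; i < 10; i++) {
            total += 0.10;   /* 0.10 is not exactly representable in binary */
            cents += 10;     /* exact integer arithmetic */
        }

        printf("double: %.17f  (== 1.0? %d)\n", total, total == 1.0);
        printf("cents:  %ld  (== 100? %d)\n", cents, cents == 100);
        return 0;
    }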


I concentrated in optimization and numerical analysis for my undergrad degree in Math, and I'm still surprised on a regular basis by how many programmers out there don't know about the inaccuracies and errors that come out of floating-point representations.


My thoughts exactly. Tom Christiansen brings these things up as if they were a huge surprise, but they aren't, or at least shouldn't be.

To put it another way, it isn't Apple doing something wrong. If your code depends on rounding errors in FP representations, your code is buggy.

That's also why good numerical algorithms are hard to design. You need to worry about losing precision. As a simple example, if you calculate a*c - b*c and a is almost equal to b, you're much better off computing (a-b)*c instead: the subtraction a - b is essentially exact, while the two separately rounded products leave rounding errors that the cancellation then magnifies. These tricks are important.
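
A small demo of that effect, assuming ordinary 64-bit IEEE evaluation (x87 extended intermediates can hide it):

    /* With a nearly equal to b, the factored form keeps the full answer;
       distributing the multiply lets the rounding error of the two big
       products survive the cancellation. */
    #include <stdio.h>

    int main(void) {
        double a = 1.0 + 1e-15;   /* almost equal to b */
        double b = 1.0;
        double c = 3.0;

        double factored    = (a - b) * c;    /* a - b is exact here; one rounding */
        double distributed = a * c - b * c;  /* two rounded products, then cancellation */

        printf("(a-b)*c   = %.17g\n", factored);     /* ~3.3307e-15, the exact answer */
        printf("a*c - b*c = %.17g\n", distributed);  /* ~3.5527e-15 on plain IEEE doubles */
        return 0;
    }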


Are you really that surprised? Crappy programmers abound, and many introductory programming classes have students write programs that calculate monetary values using floating point, so it's no wonder many people think it's OK.


As did I. I am a professional programmer now and didn't go through CS school (to all sophomores out there: we do exist). Do they really not teach this sort of thing in a formal CS program?


Most CS undergraduate programs in the US suck enough these days that numerical analysis isn't taught at the undergrad level. For example, at my school, undergraduates get to choose between the automata/grammars/computability class and a "numerical methods" class that is really watered down. There is a two-semester undergraduate numerical analysis class, but as I recall it can't count toward your degree except as a general elective. To get numerical analysis to count as a CS elective, you have to take the graduate level class.


For the same CPU architecture, compiler and FPU control word, you should expect the same results.

For example, on x86, many C / C++ compilers interpret long double as a 64-bit float while others interpret it as 80-bit extended precision.

And extended precision TBYTE (in Intel syntax) does actually take up 10 bytes. There's a speed loss for not aligning, but no correctness loss. On a possibly related note, however, Apple has bizarre and unnecessarily strict alignment policies for Mac OS on x86:

http://blogs.embarcadero.com/eboling/2009/05/20/5607


For the same CPU architecture, compiler and FPU control word, you should expect the same results.

That's a stretch. Just because it's called "GCC" doesn't mean it behaves exactly the same on every platform, even if the CPU architecture is the same.

Apple makes extensive modifications, has their own ABI (see below), etc.

In a possibly related note, however, Apple has bizarre and unnecessarily strict alignment policies for Mac OS on x86:

It's not "bizarre and unnecessarily strict" if you want to be able to rely on SSE2+. Apple had the advantage of being able to define their ABI without regard to most legacy concerns, and so they did.

The reason it's strictly enforced everywhere is that, since Apple's compilers use SSE2+, they must be able to assume that the stack is properly (16-byte) aligned at function entry.

I understand your pain -- I've had to update a JIT implementation to deal with this, along with quite a bit of assembly that assumed 4 byte alignment, but Apple's reasoning makes sense.

See also: http://stackoverflow.com/questions/612443/why-does-the-mac-a...


The stack is not actually aligned on function entry, because the return address is on top, so more alignment will be needed to avoid SSE2 locals being misaligned. It's not so hard for the callee side of the ABI to make sure the stack is aligned if it's going to use SSE2 and friends; it's rather more onerous to require every call site to make the alignments for the benefit of the callee.
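
For what it's worth, GCC does offer that callee-side option on platforms without the guarantee: -mstackrealign globally, or the force_align_arg_pointer attribute per function. A rough sketch (the function itself is invented for illustration):

    /* Callee-side alternative: realign the stack in the prologue of the one
       function that needs SSE, instead of requiring every caller to keep
       16-byte alignment. */
    #include <stddef.h>
    #include <xmmintrin.h>

    __attribute__((force_align_arg_pointer))
    void scale4(float *v, float s, size_t n) {
        /* Any __m128 temporaries the compiler spills need 16-byte stack
           slots; the realigned prologue guarantees that even if the caller
           only kept 4-byte alignment. Unaligned data still goes through
           loadu/storeu, and any tail elements are ignored here for brevity. */
        __m128 k = _mm_set1_ps(s);
        for (size_t i = 0; i + 4 <= n; i += 4)
            _mm_storeu_ps(v + i, _mm_mul_ps(_mm_loadu_ps(v + i), k));
    }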


The stack is not actually aligned on function entry, because the return address is on top, so more alignment will be needed to avoid SSE2 locals being misaligned.

The stack has known alignment on entry, which removes the need to compute alignment at runtime. Any other approach requires more instructions overall.

It's not so hard for the callee side of the ABI to make sure the stack is aligned if it's going to use SSE2 and friends; it's rather more onerous to require every call site to make the alignments for the benefit of the callee.

I disagree that it's onerous. It seems silly to increase the runtime costs in exchange for a minutely simplified compiler port. It's not as if non-4-byte aligned ABIs are unusual.


But instead of aligning the stack in one ___location, the callee, now it needs to be aligned everywhere. It's pretty probable that's more instructions everywhere.

And it's not a "minutely simplified compiler port". That statement is startlingly naive. Do you have any idea how much hand-coded inline assembly, both in the runtime library and in customer code, needs to be carefully reviewed and modified to port from a platform without this requirement to one with it? Particularly since almost every other platform targeting the same architecture doesn't have the requirement?


But instead of aligning the stack in one ___location, the callee, now it needs to be aligned everywhere. It's pretty probable that's more instructions everywhere.

SSE2 is used everywhere. That's unlikely.

And it's not a "minutely simplified compiler port". That statement is startlingly naive. Do you have any idea how much hand-coded inline assembly, both in the runtime library and in customer code, needs to be carefully reviewed and modified to port from a platform without this requirement to one with it? Particularly since almost every other platform targeting the same architecture doesn't have the requirement?

Do you have any idea what the advantages are of being able to use SSE2+ everywhere? I find your position to be startlingly naive, especially given the fact that the vast majority of the existing Mac OS X developer base did not have any hand-coded inline assembly targeted at x86-32.

Other than game developers, how many legacy x86-32 developers is Apple genuinely interested in courting? Even for game developers (or JIT authors, or otherwise) with an overabundance of x86 4-byte-alignment-assuming assembly, fixing stack alignment is an annoying issue, not an impossible one.


Ah yes, Apple doesn't want any more developers for its platform. I forgot about that.


No, Apple made a perfectly sane business and technical decision to optimize for their users and existing developer base rather than for the small subset of the non-Apple developer base who would have issues with 16-byte stack alignment.

The reasoning makes sense and I'd have done the same. I fixed our code and moved on.


The vote is clearly in, and the majority is siding with Apple. I haven't changed my position, though, and the more you write in this thread, the more convinced I am that you don't know what you're talking about. The technical reasons are not strong; SSE2 is primarily for floating point ops and SIMD vectorized ops. Most user code does not use floating point, and it's hard for compilers to extract latent parallelism to produce vectorized code.

However, if you put the technical reasons aside, and only focus on the business reasons for making a choice here, it's clear to me that the best way to go is to fall in line with the existing precedents for the platform. That way maximizes your business upside. There is no business reason for wanting 16-byte alignment, only business reasons for not wanting it.

The technical case would need to outweigh the business case in order for it to win. But I don't see the technical case as being that strong. Floating point code is rare. Outside of scalable vector-oriented UI libraries and ___domain-specific number crunching, it's hardly ever used. Many architectures survived for decades with only optional support for floating point, in a coprocessor. Many embedded architectures still use emulated FP, if it's needed at all.

And I really meant it about startlingly naive back there. It tells me everything I need to know about what you know about commercial compilers: that you think of them in the academic sense of being the bit that turns text into code. There's more to it than that in the real world.


> For the same CPU architecture, compiler and FPU control word, you should expect the same results.

Wait a minute, does that mean that floating point types don't have a stable ABI? If I have a library with a function foo(float bar) compiled with compiler A, then it's not safe to compile an app that uses that library with compiler B?


By results, I meant computational results. The ABI is more a product of what type declarations you have to use to get interoperability on any given platform, and the platform vendor's compiler sets the lead here.

For example, the following C statement:

    printf("%d\n", (int)sizeof(long double));
gives the following results on my box:

    Cygwin gcc:        12
    MSVC:               8
    Embarcadero bcc32: 10
So yes, if you are assuming that foo(long double bar) - that specific header definition - compiled with compiler A will be binary compatible when compiled with compiler B, then you are not necessarily safe.

Just `float`, though, is rather more solid.
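
One way to see the hazard concretely is struct layout, not just the bare size; a small sketch (the struct is invented for illustration), where both the field offset and the total size shift with the compiler's idea of long double:

    /* Why long double breaks binary compatibility across compilers: the
       size, the alignment, and therefore the padding inside structs all
       change. */
    #include <stdio.h>
    #include <stddef.h>

    struct sample {
        char        tag;
        long double value;
    };

    int main(void) {
        printf("sizeof(long double)     = %d\n", (int)sizeof(long double));
        printf("offsetof(sample, value) = %d\n", (int)offsetof(struct sample, value));
        printf("sizeof(struct sample)   = %d\n", (int)sizeof(struct sample));
        return 0;
    }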


What Every Computer Scientist Should Know About Floating-Point Arithmetic: http://docs.sun.com/source/806-3568/ncg_goldberg.html


This shouldn't surprise anyone and I can't say I understand the breathless post. For more information, I recommend Write Great Code: Understanding the Machine -- Chapter 4, Floating-Point Representation.


Apple doesn't use the x87 floating point unit. They've never needed to, because they can rely on the SSE2 unit: every Intel Mac ever created has one. That isn't necessarily the case on the PC, even though every modern one will have it as well. I'm pretty sure gcc compiles floating point math through the SSE2 code path by default on OS X.
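
One quick way to check what a given compiler is doing is C99's FLT_EVAL_METHOD: 0 means operations are evaluated in their own type (the SSE2-style behavior), 2 means intermediates are kept at long double precision (the classic x87 behavior):

    /* Report the compiler's floating point evaluation mode. */
    #include <float.h>
    #include <stdio.h>

    int main(void) {
        printf("FLT_EVAL_METHOD     = %d\n", (int)FLT_EVAL_METHOD);
        printf("sizeof(long double) = %d\n", (int)sizeof(long double));
        return 0;
    }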


One pedantic correction: SSE is the original 4-wide single-precision floating point instruction set. SSE2 added double-precision and integer instructions using the same registers.


This reminds me of a page on fast and accurate geometric primitives; the code requires you to flip the Intel FPU into a reduced precision mode to make the error easier to estimate.

http://www.cs.cmu.edu/~quake/robust.pc.html
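
The mode switch itself is only a few lines; here's a glibc-flavored sketch of the general idea (not the page's own code; MSVC spells this _controlfp, and it's moot if the compiler targets SSE2 instead of x87):

    /* Drop the x87 into 53-bit (double) precision mode. The _FPU_* macros
       are glibc's x86-specific interface from <fpu_control.h>. */
    #include <fpu_control.h>

    void set_x87_double_precision(void) {
        fpu_control_t cw;
        _FPU_GETCW(cw);
        cw &= ~_FPU_EXTENDED;   /* clear the precision-control field */
        cw |= _FPU_DOUBLE;      /* select 53-bit significands */
        _FPU_SETCW(cw);
    }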


Wasn't a very similar phenomenon a huge blow to Intel's PR some years back?


Sort of, but that had to do with a bug in the actual hardware of the original Pentium chip:

http://en.wikipedia.org/wiki/Pentium_FDIV_bug


No, those were not small rounding differences like the ones shown here, but errors of much more significant magnitude.


On Plan 9:

    Trying compiler constants first...

    f  is 0.12345, rounded to 0.1235, expanded to 0.123450003564357760000000000000
    d  is 0.12345, rounded to 0.1235, expanded to 0.123450000000000000000000000000
    ld is 0.12345, rounded to 0.1235, expanded to 0.123450000000000000000000000000

    Now trying derived values...

    f  is 0.12345, rounded to 0.1235, expanded to 0.123450003564357760000000000000
    d  is 0.12345, rounded to 0.1235, expanded to 0.123450000000000000000000000000
    ld is 0.12345, rounded to 0.1235, expanded to 0.123450000000000000000000000000
So don't blame the architecture.



