
There are three reasons why that approach tends to work well in scientific computing circles.

First off, individual computational errors are rarely important. Ex: simulating galactic evolution, as long as all the velocities stay reasonable, each individual calculation is fairly unimportant, and bounds checking is fairly inexpensive (there's a rough sketch of what I mean at the end of this comment).

Second, there is a minimal time constraint; losing 30 minutes of simulation time a day is a reasonable sacrifice for gaining efficiency in other areas.

Third, computational resources tend to be expensive, local, and fixed. Think Cray Jaguar, not all those spare CPU cycles running Folding@home.

However, if you're running VISA or World of Warcraft, then you get a different set of optimizations.
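
To make the bounds checking in the first point concrete, here is a rough C sketch of the kind of check I mean; the speed cap and the flat velocity arrays are made-up details for illustration, not anything from a particular code:

    #include <math.h>
    #include <stdio.h>

    /* Made-up cap: a few thousand km/s is already extreme for stars,
     * so anything above it (or non-finite) is treated as suspect. */
    #define MAX_SPEED_KMS 3.0e3

    /* Returns the number of particles whose speed is implausible. */
    size_t check_velocities(const double *vx, const double *vy,
                            const double *vz, size_t n) {
        size_t bad = 0;
        for (size_t i = 0; i < n; i++) {
            double speed = sqrt(vx[i]*vx[i] + vy[i]*vy[i] + vz[i]*vz[i]);
            if (!isfinite(speed) || speed > MAX_SPEED_KMS) {
                fprintf(stderr, "particle %zu: suspicious speed %g km/s\n",
                        i, speed);
                bad++;
            }
        }
        return bad;
    }

A pass like this touches each particle once per step, which is cheap next to the force calculation, and the caller can decide whether a flagged particle means redoing the step or rolling back.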




First off, individual computational errors are rarely important.

This is completely wrong. Arithmetic errors (or memory errors) tend not to just mess up insignificant bits. If one occurs in integer data, then your program will probably seg-fault (because integers are usually indices into arrays), and if it occurs in floating-point data, you are likely to produce either a tiny value (perhaps making a system singular) or a very large one. If you are solving an aerodynamic problem and compute a pressure of 10^80, then you might as well have a supernova on the leading edge of your wing. And flipping a minor bit in a structural dynamics simulation could easily be the difference between the building standing and falling.
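
The "very large value" failure mode is easy to demonstrate. Here is a small C sketch that flips a single exponent bit of an ordinary pressure value; the 1 atm starting value and the choice of which bit to flip are arbitrary, purely for illustration:

    #include <stdio.h>
    #include <string.h>
    #include <stdint.h>

    int main(void) {
        double pressure = 101325.0;             /* ~1 atm in pascals */
        uint64_t bits;
        memcpy(&bits, &pressure, sizeof bits);  /* view the IEEE-754 bits */
        bits ^= 1ULL << 60;                     /* flip one exponent bit */
        memcpy(&pressure, &bits, sizeof bits);
        printf("corrupted pressure: %e Pa\n", pressure);  /* ~1e82 Pa */
        return 0;
    }

One flipped bit turns atmospheric pressure into something on the order of 10^82 Pa, which is the kind of value that blows up the rest of the solve rather than hiding in the noise.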

I would argue that data mining is actually more tolerant of such undetected errors because they are more likely to remain local and may stand out as obviously erroneous. People are unlikely to die as the result of an arithmetic error in data mining.

Second, there is a minimal time constraint,

There is not usually a real-time requirement, though there are exceptions, e.g. http://spie.org/x30406.xml, or search for "real-time PDE-constrained optimization". But by and large, we are willing to wait for restart from a checkpoint rather than sacrifice a huge amount of performance to get continual uptime. If you need guaranteed uptime, then there is no choice but to run everything redundantly, and that still won't allow you to handle a nontrivial network partition gracefully. (It's not a database, but there is something like the CAP Theorem here.)
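
For what it's worth, the core of checkpoint/restart is simple; this is a hedged C sketch where the single-array state and raw binary layout are assumptions on my part (production codes use parallel I/O such as MPI-IO or HDF5, and write to a temporary file followed by an atomic rename so a crash mid-write cannot destroy the previous checkpoint):

    #include <stdio.h>
    #include <stdlib.h>

    typedef struct {
        long    step;   /* which timestep the state belongs to */
        size_t  n;      /* number of doubles in data */
        double *data;   /* the simulation state itself */
    } State;

    /* Write the whole state; returns 0 on success. */
    int write_checkpoint(const char *path, const State *s) {
        FILE *f = fopen(path, "wb");
        if (!f) return -1;
        int ok = fwrite(&s->step, sizeof s->step, 1, f) == 1
              && fwrite(&s->n,    sizeof s->n,    1, f) == 1
              && fwrite(s->data,  sizeof *s->data, s->n, f) == s->n;
        return (fclose(f) == 0 && ok) ? 0 : -1;
    }

    /* Load the saved state; returns 0 on success, -1 if there is
     * no usable checkpoint and the run must start from scratch. */
    int read_checkpoint(const char *path, State *s) {
        FILE *f = fopen(path, "rb");
        if (!f) return -1;
        int ok = fread(&s->step, sizeof s->step, 1, f) == 1
              && fread(&s->n,    sizeof s->n,    1, f) == 1;
        if (ok) {
            s->data = malloc(s->n * sizeof *s->data);
            ok = s->data && fread(s->data, sizeof *s->data, s->n, f) == s->n;
        }
        if (!ok) { free(s->data); s->data = NULL; }
        fclose(f);
        return ok ? 0 : -1;
    }

The checkpoint interval is then just a knob trading lost work on a failure against I/O overhead during normal running.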


For the most part you can keep things reasonable with bounds checking. If the pressure in some area is 10x that of the areas around it, then there was a mistake in the calculation. If you're simulating weather patterns on Earth over time, having a single square mile 10 degrees above what it should be is not going to do horrible things to the model. Clearly there are tipping points, but if it's that close to a tipping point the outcome is fairly random anyway.
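
That "10x its surroundings" test is cheap to write down too. Here is a hedged C sketch over a 1-D row of cells, where the factor of 10 is the rule of thumb from above and everything else is made up:

    #include <stdio.h>

    #define SUSPECT_RATIO 10.0

    /* Flag any interior cell whose pressure is more than SUSPECT_RATIO
     * times the mean of its two neighbours; returns how many were flagged. */
    size_t flag_pressure_outliers(const double *p, size_t n) {
        size_t flagged = 0;
        for (size_t i = 1; i + 1 < n; i++) {
            double neighbours = 0.5 * (p[i - 1] + p[i + 1]);
            if (neighbours > 0.0 && p[i] > SUSPECT_RATIO * neighbours) {
                fprintf(stderr, "cell %zu: %g vs neighbour mean %g\n",
                        i, p[i], neighbours);
                flagged++;
            }
        }
        return flagged;
    }

In a real weather or CFD code the same idea runs over a 2-D or 3-D grid, but it is still a single pass over data the solver was touching anyway.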

Anyway, if you could not do this and accuracy is important, then you really would need to double-check every calculation, because there is no other way to tell whether you had made a mistake.



