I have no doubt that people have had their lives ruined, or even died, as the result of flaws in system programming, but is anyone actually tracking this? Is it "countless?"
I'd really like to see an organization, when evaluating tools or practices, do an actual risk assessment even just once. Even a hand-wavy conversation where we make zeroth-order, number-of-zeros estimates of the quantities involved in such a decision would be something.
I'm a huge fan of thorough designs, engineering formalities, and quantifying application performance. But if we're building some fast-food ordering app that's expected to increase sales by 10 percent as soon as it's done, and there's no risk from the quality, availability, or performance of the application (other than losing the 10 percent extra sales; we expect people to walk in or drive through like normal if it under-performs on those metrics), then it's obvious the business value is in launching a minimal implementation as soon as possible, using tools that optimize for productivity, not safety or performance.
Too often I see the dogmatic pure-functional TDD line-of-business app, with continuous-deployment infrastructure, that could break for two weeks at a time without impacting anyone; or the copy-pasted-from-Google-searches business-critical application that loses money for every hour it's offline but has no software development lifecycle, monitoring, alerting, or anything.
It's not "countless," it's "uncountable," because we couldn't agree on what counts as a software-caused death.
In my CS program we had an ethics class that included stories of bad X-ray machine software that overdosed people. Bad, bad, bad. I don't think many people died as a direct result, but 10 years later there was probably a spike in cancer incidence. Did software kill people? Well, yeah... kinda.
In airplane systems, there have been a number of cases where bad alerting / warning systems basically either misled the pilot or lulled them into a false sense of security prior to events that caused crashes. Did software kill people? Uhm, yes, I think, sort of, but not directly?
It's only when we get fully sentient AI that arms itself and decides to clean up the human pestilence that we'll be able to draw a straight line there. :)
Errors in the software of a MIM-104 Patriot resulted in failure to locate and intercept an incoming missile and the death of 28 soldiers.
http://www.gao.gov/products/IMTEC-92-26
There was no error in the Toyota throttle control system. It was a pedal-entrapment problem with floor mats, and a separate pedal design error on other models [1], combined with operator error. If you're interested, I highly recommend Malcolm Gladwell's podcast Revisionist History, which did an episode on this [2]. Long story short, your brakes can easily overpower your engine at wide-open throttle and stop the car in not much more distance than braking without any throttle. Unfortunately, a foible of human behavior seems to lead us to get flustered and often not do the right thing in situations like these.
I agree that the pedal design, human error, and floor-mat problems are much more likely causes. I don't claim it was a software error for sure. My understanding is that indeed no specific software error was identified, but it was also never conclusively ruled out. Neither of the two links seems to contain anything to that effect either.
As I recall, the podcast lays out a fairly comprehensive argument that it was just human error rather than mechanical problems (as well as noting that the car computers in all the cases show the brake wasn't pushed), and backs it up with decades of research showing that this is a common failure mode, so that's about as close to definitive as you can get in this situation, IMHO.
> Errors in the software of the radiation therapy machine Therac-25…
Caused by unsigned 8-bit integer overflow combined with (presumably) treating 0 as falsy. Might or might not have been prevented by using Rust (probably not, because it sounds like a logic error in "clever" code).
> Errors in the software of a MIM-104 Patriot…
Caused by loss of precision in floating-point calculations. Would not have been prevented by using Rust.
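For illustration, here's a rough model of that arithmetic in Rust. This is not the actual Patriot code (which ran on 24-bit fixed-point registers); 209715/2097152 is the commonly cited value of 1/10 after chopping to the register width, and the scenario follows the GAO report's ~100 hours of uptime:

    fn main() {
        // 1/10 has no finite binary representation; chopping it to a
        // 24-bit register loses about 0.000000095 per 0.1 s clock tick.
        let chopped_tenth = 209_715.0 / 2_097_152.0;
        let ticks = (100 * 60 * 60 * 10) as f64; // 100 hours of 0.1 s ticks
        let drift = ticks * (0.1 - chopped_tenth);
        println!("clock error after 100 hours: {drift:.4} s"); // ~0.3433 s
        // At Scud closing speed, a third of a second shifts the range gate
        // by hundreds of meters, outside the tracked window.
    }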
The other two were more likely human or mechanical errors than (unspecified) software errors.
Integer overflow is defined to panic in debug builds, and to either do that or wrap via two's-complement overflow in release builds; the current implementation wraps. However, zero is not false.
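To make both points concrete, a quick sketch (u8 for the one-byte case under discussion; wrapping_add and checked_add are the explicit opt-ins):

    fn main() {
        let x: u8 = 255;
        // `x + 1` panics in a debug build ("attempt to add with overflow");
        // with overflow checks off (the release default), it wraps to 0.
        // The wrapping behavior can be requested explicitly and portably:
        let wrapped = x.wrapping_add(1); // 0
        let checked = x.checked_add(1);  // None; overflow made observable
        println!("{wrapped} {checked:?}");
        // And there is no integer-to-bool coercion, so this won't compile:
        // if wrapped { ... } // error[E0308]: expected `bool`, found `u8`
    }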
I assume that treating 0 as false was intentional in this case (it probably was written in PDP-11 ASM), and a direct translation of the Therac code therefore would be something like the sketch below (the counter name and structure are my guesses; the published accounts describe a one-byte flag that was incremented rather than set):
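    fn check_collimator_position() -> bool {
        // Stub for the hardware check the real machine was supposed to run.
        true
    }

    fn main() {
        // class3: one-byte "perform the safety check" flag that the real
        // code incremented on every pass through setup instead of setting
        // to a constant 1.
        let mut class3: u8 = 0;
        for pass in 1..=512u32 {
            class3 = class3.wrapping_add(1); // wraps to 0 every 256th pass
            if class3 != 0 {
                let _ok = check_collimator_position(); // normal path
            } else {
                // 0 reads as "no check needed," so the test is skipped
                println!("pass {pass}: safety check skipped");
            }
        }
    }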
How many people have died from terrible UIs in apps that are commonly used while driving? Obviously, personal responsibility is primary here, but I don't think the app designers and builders should be absolved of all guilt.
I was just thinking about this the other day when using Spotify in the car. Spotify's UI is pretty good, but after they lost all my saved songs, I've had to resort to only using playlists, which require 3x as many clicks to add a song.
Yes, as we move forward we should strive for more safety, but 'countless lives lost' reads as blaming software for those deaths.
I think realistically we need to ask: would things be better if no systems program were allowed to exist for lack of safety? Because I think humanity in general is better off with unsafe software existing than with no software existing at all unless it meets rigorous safety standards.
It was a dumb statement, unless he meant... given the context of software maintenance... lives "wasted" toiling away on C or C++ monoliths. There are indeed countless lives and billions of dollars sunk into that, and the situation might be improved by better tooling.
Even with safety solved, errors in software will still happen. The software errors cited are much more about bad engineering, something that can't be fixed by safety alone.