
Close to 100% of the SSD failures I've seen have not been from being left off for a few years, nor from actual flash media failure, but from firmware bugs or, occasionally, mechanical failure.

(I have a very small sample size of SSDs left unpowered for a few years, but the few I have powered up after years of sitting idle had all their data retained. Ask me again in another 15 years or so, though.)

Part of what you count as firmware bugs may have been cases of data/firmware corruption caused by unexpected power loss.


That's a firmware bug, then.

I've written a few flash storage systems (small ones, for consumer devices). If your data structures are such that you can't do an atomic commit, you shouldn't be in the marketplace.

Sudden loss of power is not an excuse.
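
To make that concrete, here's a toy sketch (mine, not from any shipping drive) of the usual two-slot "ping-pong" metadata commit: write the new copy into the spare slot with a bumped sequence number and a checksum, and only treat it as current once the whole record is down. A power cut mid-write just leaves a slot that fails its CRC, and boot falls back to the previous good copy. The slot layout, the RAM-backed "flash", and the FNV checksum are all placeholders.

    /*
     * Illustrative power-loss-safe ("atomic") metadata commit for flash firmware.
     * Never overwrite the live copy in place: program the other slot, then let
     * the higher sequence number make it current. A torn write fails its CRC
     * and is simply ignored at load time.
     */
    #include <stddef.h>
    #include <stdint.h>
    #include <stdio.h>

    typedef struct {
        uint32_t sequence;   /* monotonically increasing commit counter    */
        uint32_t map_root;   /* stand-in for whatever the FTL must persist */
        uint32_t crc;        /* checksum over the fields above             */
    } meta_t;

    #define SLOT_COUNT 2
    static meta_t slots[SLOT_COUNT];       /* RAM stand-in for two flash pages */

    /* Toy FNV-1a checksum standing in for a real CRC32. */
    static uint32_t checksum(const void *buf, size_t len)
    {
        const uint8_t *p = buf;
        uint32_t sum = 0x811c9dc5u;
        while (len--)
            sum = (sum ^ *p++) * 16777619u;
        return sum;
    }

    static uint32_t meta_crc(const meta_t *m)
    {
        return checksum(m, offsetof(meta_t, crc));
    }

    /* Pick the newest slot whose checksum is valid. A torn write fails the
     * check and is skipped, so we fall back to the previous commit. */
    static int meta_load(meta_t *out)
    {
        int best = -1;
        for (int s = 0; s < SLOT_COUNT; s++) {
            const meta_t *m = &slots[s];
            if (meta_crc(m) != m->crc)
                continue;                  /* blank or torn slot */
            if (best < 0 || m->sequence > out->sequence) {
                *out = *m;
                best = s;
            }
        }
        return best;                       /* slot index, or -1 if none valid */
    }

    /* Commit new state into the *other* slot; a real driver would erase and
     * program the spare flash page here instead of assigning to RAM. */
    static void meta_commit(const meta_t *cur, int cur_slot, uint32_t new_root)
    {
        meta_t next = { .sequence = cur->sequence + 1, .map_root = new_root };
        next.crc = meta_crc(&next);
        slots[(cur_slot + 1) % SLOT_COUNT] = next;
    }

    int main(void)
    {
        meta_t m = { .sequence = 1, .map_root = 100 };
        m.crc = meta_crc(&m);
        slots[0] = m;

        meta_commit(&m, 0, 200);           /* normal commit lands in slot 1   */
        slots[0].map_root = 0xdeadbeef;    /* simulate a torn write in slot 0 */

        meta_t cur;
        int slot = meta_load(&cur);
        printf("loaded slot %d, seq %u, root %u\n",
               slot, (unsigned)cur.sequence, (unsigned)cur.map_root);
        return 0;
    }

Running it loads slot 1 (sequence 2, root 200): the corrupted slot 0 is ignored because its checksum no longer matches, which is exactly what you want after a power cut.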


There was a widely circulated study a few years ago which found that most SSD models they tested had problems on power loss. Some even bricked. They didn't publish concrete names, unfortunately.

But yeah, I agree it's a bug and shouldn't happen. So what :)

edit:

There we go: https://www.usenix.org/system/files/conference/fast13/fast13...

Some more: http://lkcl.net/reports/ssd_analysis.html

Note that the 320 recommended by the second article had similar issues, supposedly fixed in firmware 4PC10362, but some users reported problems even on that version: https://communities.intel.com/thread/24339?start=15 (many of those complaints look like different problems, but not all).


I had corruption issues with an OCZ drive on a machine with unstable power. Since adding a UPS the drive has been rock solid.


We had corruption / sudden death issues with OCZ even on machines with redundant power supplies. I do not recommend them.


My "favorite" were the OCZ Vertex 1 drives which almost certainly had issues with the firmware eating paste and overwriting some internal metadata - periodically the drives would start throwing read/write errors on swathes of LBAs, and the only way to mitigate this would be to throw the jumper on to put the drive in "recovery" mode and do a destructive firmware reflash that blew away all the contents.

And then it would work perfectly well again...for a while.


I agree with the downstream commenter about this still qualifying as a FW bug, but to be more clear:

* there's the story I posted about the OCZ V1 drives eating paste periodically while behind redundant power.

* there was a fun problem with some OCZ Deneva 2 drives deciding that they would simply drop off the port and not come back without a cold power cycle, behind redundant power (and a related story about the drives being in such a state and the AHCI init code in the BIOS getting confused and hanging forever, which was what actually necessitated the cold cycle).

* there was the really fun case someone I worked with came up with where they found a way to reproducibly brick a particular family of SSDs using only about a TB of IO on a 480 GB SSD. (And by "brick", I mean "call the manufacturer, it's not coming back.")

* there was the SandForce bug once upon a time where the firmware reported an empty "version" field; all the other OSes tested were content with this, as was Windows when not using SATL, but Windows behind SATL would report the affected SSDs as "Unknown Device" rather than as disks.


> mechanical failure.

How does that happen?


I had my first computer fire last year. One of the contacts in an SSD power connector delaminated, causing a short. The SSD was toast. Or at least, the power connector had melted. The power supply was toast. But with a new power supply, the machine itself is fine :)


Crushing the SSD or the whole laptop :)


Connectors are the weak point?



