
Close to 100% of the SSD failures I've seen have not been from being left off for a few years, nor from actual flash media failure, but from firmware bugs or, occasionally, mechanical failure.

(I have a very small sample size of SSDs left unpowered for a few years, but the few I have powered up after years of sitting idle had all their data retained. Ask me again in another 15 years or so, though.)

Part of what you count as firmware bugs may have been cases of data/firmware corruption caused by unexpected power loss.


That's a firmware bug, then.

I've written a few flash storage systems (small ones, for consumer devices). If your data structures are such that you can't do an atomic commit, you shouldn't be in the marketplace.

Sudden loss of power is not an excuse.
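
To make that concrete, here's a toy sketch (mine, not from any shipping drive) of the usual two-slot "ping-pong" metadata commit: write the new copy into the spare slot with a bumped sequence number and a checksum, and only treat it as current once the whole record is down. A power cut mid-write just leaves a slot that fails its CRC, and boot falls back to the previous good copy. The slot layout, the RAM-backed "flash", and the FNV checksum are all placeholders.

    /*
     * Illustrative power-loss-safe ("atomic") metadata commit for flash firmware.
     * Never overwrite the live copy in place: program the other slot, then let
     * the higher sequence number make it current. A torn write fails its CRC
     * and is simply ignored at load time.
     */
    #include <stddef.h>
    #include <stdint.h>
    #include <stdio.h>

    typedef struct {
        uint32_t sequence;   /* monotonically increasing commit counter    */
        uint32_t map_root;   /* stand-in for whatever the FTL must persist */
        uint32_t crc;        /* checksum over the fields above             */
    } meta_t;

    #define SLOT_COUNT 2
    static meta_t slots[SLOT_COUNT];       /* RAM stand-in for two flash pages */

    /* Toy FNV-1a checksum standing in for a real CRC32. */
    static uint32_t checksum(const void *buf, size_t len)
    {
        const uint8_t *p = buf;
        uint32_t sum = 0x811c9dc5u;
        while (len--)
            sum = (sum ^ *p++) * 16777619u;
        return sum;
    }

    static uint32_t meta_crc(const meta_t *m)
    {
        return checksum(m, offsetof(meta_t, crc));
    }

    /* Pick the newest slot whose checksum is valid. A torn write fails the
     * check and is skipped, so we fall back to the previous commit. */
    static int meta_load(meta_t *out)
    {
        int best = -1;
        for (int s = 0; s < SLOT_COUNT; s++) {
            const meta_t *m = &slots[s];
            if (meta_crc(m) != m->crc)
                continue;                  /* blank or torn slot */
            if (best < 0 || m->sequence > out->sequence) {
                *out = *m;
                best = s;
            }
        }
        return best;                       /* slot index, or -1 if none valid */
    }

    /* Commit new state into the *other* slot; a real driver would erase and
     * program the spare flash page here instead of assigning to RAM. */
    static void meta_commit(const meta_t *cur, int cur_slot, uint32_t new_root)
    {
        meta_t next = { .sequence = cur->sequence + 1, .map_root = new_root };
        next.crc = meta_crc(&next);
        slots[(cur_slot + 1) % SLOT_COUNT] = next;
    }

    int main(void)
    {
        meta_t m = { .sequence = 1, .map_root = 100 };
        m.crc = meta_crc(&m);
        slots[0] = m;

        meta_commit(&m, 0, 200);           /* normal commit lands in slot 1   */
        slots[0].map_root = 0xdeadbeef;    /* simulate a torn write in slot 0 */

        meta_t cur;
        int slot = meta_load(&cur);
        printf("loaded slot %d, seq %u, root %u\n",
               slot, (unsigned)cur.sequence, (unsigned)cur.map_root);
        return 0;
    }

Running it loads slot 1 (sequence 2, root 200): the corrupted slot 0 is ignored because its checksum no longer matches, which is exactly what you want after a power cut.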


There was a widely circulated study a few years ago which found that most SSD models they tested had problems on power loss. Some even bricked. They didn't publish concrete names, unfortunately.

But yeah, I agree it's a bug and shouldn't happen. So what :)

edit:

There we go: https://www.usenix.org/system/files/conference/fast13/fast13...

Some more: http://lkcl.net/reports/ssd_analysis.html

Note that the 320 recommended by the second article had similar issues, supposedly fixed in firmware 4PC10362, but some users reported problems even on that version: https://communities.intel.com/thread/24339?start=15 (many of those complaints look like different problems, but not all).


I had corruption issues with an OCZ drive on a machine with unstable power. Since adding a UPS the drive has been rock solid.


We had corruption / sudden death issues with OCZ even on machines with redundant power supplies. I do not recommend them.


My "favorite" were the OCZ Vertex 1 drives which almost certainly had issues with the firmware eating paste and overwriting some internal metadata - periodically the drives would start throwing read/write errors on swathes of LBAs, and the only way to mitigate this would be to throw the jumper on to put the drive in "recovery" mode and do a destructive firmware reflash that blew away all the contents.

And then it would work perfectly well again...for a while.


I agree with the downstream commenter about this still qualifying as a FW bug, but to be more clear:

* there's the story I posted about the OCZ V1 drives eating paste periodically while behind redundant power.

* there was a fun problem with some OCZ Deneva 2 drives deciding that they would simply drop off the port and not come back without a cold power cycle, behind redundant power (and a related story about the drives being in such a state and the AHCI init code in the BIOS getting confused and hanging forever, which was what actually necessitated the cold cycle).

* there was the really fun case someone I worked with came up with where they found a way to reproducibly brick a particular family of SSDs using only about a TB of IO on a 480 GB SSD. (And by "brick", I mean "call the manufacturer, it's not coming back.")

* there was the SandForce bug once upon a time where the firmware reported an empty "version" field; all the other OSes tested were content with this, as was Windows when not using SATL, but Windows behind SATL would report the affected SSDs as "Unknown Device" rather than as disks.


> mechanical failure.

How does that happen?


I had my first computer fire last year. One of the contacts in an SSD power connector delaminated, causing a short. The SSD was toast. Or at least, the power connector had melted. The power supply was toast. But with a new power supply, the machine itself is fine :)


Crushing the SSD or the whole laptop :)


Connectors are the weak point?



