> This is because most of the non-technical retail customers don't value reliability too much. … Similarly, many users don't have much valuable, unique data on their computers. The most important stuff very often lives completely in the cloud, where it's properly backed up, etc.
There’s a contradiction here which I think is worth fully considering. People definitely do value their data – data recovery services have been getting revenue for decades from people who lost things they care about – but there’s been a market failure where operating system vendors left the users to fend for themselves, everyone agreed that was hard, and the majority of users quite rationally decided that cloud services were better than learning to be unpaid sysadmins.
What we should be asking is why this is all so hard. Local file systems with strong integrity checks and seamless replication onto attached storage should be so basic that people can just assume it’ll work without them having to spend time on it.
> but there’s been a market failure where operating system vendors left the users to fend for themselves
I don't think this is an accident. These companies want you to hand all your data over to them so they can mine it, increase your reliance on them, and/or charge you recurring fees to back up your precious files. They have a real financial incentive to make protecting your data without handing it over to them very difficult, just like hard drive manufacturers have an incentive to sell you garbage drives you'll need to replace frequently.
No contradiction. There are of course plenty of people who greatly value the data on their desktops and laptops. But they are not the majority. Even those who place some value on data they spent real effort producing would still rather take their chances than overcome the friction of setting up a reliable backup.
This is what makes automatic backups so important: people keep taking chances until something bad happens, and then they're very relieved to find out that their work has been backed up somewhere. The same applies to cloud apps: they relieve people of the burden of caring about their data's safety and integrity. The iCloud backup that lets you restore a lost phone down to the state of draft "unsaved" texts is the gold standard here.
> Local file systems with strong integrity checks and seamless replication onto attached storage should be so basic that people can just assume it’ll work without them having to spend time on it.
What you're asking for is basically a bug-free btrfs plus a bit of automation on top (unless you also want a smart caching layer, which we already know is a Hard Problem). That automation is simple enough that it would certainly exist by now if there were a bug-free btrfs for it to manage. But a bug-free btrfs turns out to be the hard part, so anyone who wants to run their system with automatic add, remove, and rebalance today is looking for trouble. (ZFS has a better reputation on the reliability front, but is too inflexible for consumer-grade pooling of arbitrary storage devices.)
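To make the "bit of automation" concrete: it might amount to little more than a daemon that reacts to device hotplug events by issuing btrfs-progs commands. A toy planner is sketched below – the `btrfs device add/remove`, `btrfs balance start`, and `btrfs scrub start` subcommands are real, but the planner itself, its event names, and the mountpoint are invented for illustration:

```python
# Hypothetical sketch of the "bit of automation" on top of a (bug-free) btrfs:
# given a device event, emit the btrfs-progs commands a daemon might run.
# Only the btrfs subcommands are real; the policy here is invented.

MOUNTPOINT = "/mnt/pool"  # assumed pool mountpoint

def plan(event: str, device: str) -> list[str]:
    if event == "added":
        return [
            f"btrfs device add {device} {MOUNTPOINT}",
            # Rebalancing spreads existing data onto the new device.
            f"btrfs balance start {MOUNTPOINT}",
        ]
    if event == "removing":
        # btrfs migrates the device's data away before releasing it.
        return [f"btrfs device remove {device} {MOUNTPOINT}"]
    if event == "scheduled-scrub":
        # A periodic scrub verifies checksums in the background.
        return [f"btrfs scrub start {MOUNTPOINT}"]
    return []

print(plan("added", "/dev/sdb"))
```

The commands are the easy part, which is the point: the hard part is a filesystem that survives having them run against it unattended.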
> What you're asking for is basically a bug-free btrfs plus a bit of automation on top
Apple was this close around Mac OS X 10.5. Time Machine is a more-or-less perfect backup solution for most people, and if Apple had ended up using ZFS rather than just testing it in a Server release, the average Mac user would have automatic block-level hashing and verification (presumably with background scanning and auto-recovery from Time Machine backups), which is good enough for the vast majority of people.
APFS is an improvement on HFS+, but ZFS would've been much better. FileVault (2) may have complicated matters, but who knows – maybe Apple's involvement would have sped up native encryption in ZFS.
ZFS was never going to have good support for adding and removing drives of arbitrary size. It's a filesystem for people who buy their drives by the dozen.
It works completely fine with a single drive - just as well as any other filesystem, if not better. As a single root/boot drive's fs, it's far better than any of the "standard" alternatives (HFS+, NTFS, EXT#, APFS). No, you can't pool arbitrarily sized drives, but you can't with any of those either. And yes, you lose data if your single drive fails, but you do with any of those too.
I feel like you're trying to get into an argument about which filesystem is better, rather than simply acknowledge that ZFS doesn't actually solve the problem under discussion.
I disagree. ZFS + backup autorecovery solves 99.9%+ of bitrot cases. Time Machine already is the easiest to use fully automatic backup software, ZFS already notices bitrot (and corrects for it in mirror/raidz configurations, but that’s neither here nor there). It’s not a hard logical (or technical) leap to assume that had Apple moved to ZFS, it would leverage Time Machine backups to restore corrupted files automatically. This does actually solve the problem under discussion (or at least, improves it for the average person).
Without ZFS (or some other automatic checksumming), Time Machine (or any other automatic backup solution of fixed size) backups with a good copy of data will eventually be updated with new corrupt copies of the data, and then aged out when the backup target runs out of space. The solution doesn’t have to be ZFS; my point was that it could’ve been, and very nearly was.
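The failure mode described above can be made concrete with a toy model (everything here – the class, the `modified` flag, the policy – is invented for illustration; only the general idea that checksums catch silent corruption corresponds to what ZFS does). A backup tool that blindly copies whatever the filesystem hands it will overwrite its good copy with a bit-rotted one; a tool that recorded a checksum can notice that a file's content changed even though nothing claims to have modified it:

```python
# Toy sketch: why backups without checksums eventually lose the good copy.
# Silent bitrot flips bits *without* marking the file as modified, so a
# naive backup tool happily replaces its verified copy with the corrupt one.
import hashlib

def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

class ChecksummedBackup:
    """Keeps one backup copy per file plus the hash recorded at backup time."""
    def __init__(self):
        self.store = {}  # path -> (data, checksum)

    def backup(self, path: str, data: bytes, modified: bool) -> str:
        """`modified` stands in for what the OS reports (mtime changed, etc.)."""
        if path not in self.store:
            self.store[path] = (data, sha256(data))
            return "stored"
        _, old_sum = self.store[path]
        if not modified and sha256(data) != old_sum:
            # Content changed but nothing touched the file: likely bitrot.
            return "corruption detected; keeping verified copy"
        self.store[path] = (data, sha256(data))
        return "updated"

    def restore(self, path: str) -> bytes:
        return self.store[path][0]

good = b"important thesis chapter"
rotten = b"important thesis chaptXr"  # one flipped byte, mtime unchanged

b = ChecksummedBackup()
b.backup("thesis.txt", good, modified=True)
result = b.backup("thesis.txt", rotten, modified=False)
print(result)                   # corruption is flagged instead of propagated
print(b.restore("thesis.txt"))  # the verified copy survives
```

Without the stored hash, the `rotten` copy would replace the good one on the next backup pass, and once the good copy aged out of a fixed-size rotation it would be gone for good.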
Right - when I heard the ZFS announcement at Usenix I assumed it'd become standard a decade later, once consumer hardware could handle the extra work, but it seems to have been sandbagged by the rise of cloud storage and by the switch to SSDs reducing how often people get corrupted data instead of all-or-nothing failures.
Apple claims that their hardware makes it unnecessary for APFS but I’m skeptical and the last time I looked the Time Machine situation was still pretty clunky. I haven’t checked Windows recently since I only use Linux and macOS.