It's kinda mind-blowing that we have (so-called) AI, quantum computing, 6K screens, M.2 NVMe, billions of networked devices, etc., but regular data *can only be expected to last about 5 years* due to the propensity of spinning disks to fail, SSD impermanence, bitrot, etc., and this is only overcome with great attention and significant cost (continually maintaining a JBOD or RAID or NAS, or painstakingly burning to M-Disc Blu-ray, etc.), or by handing it over to someone else to manage (cloud), or both. I mean, maybe you get lucky with a simple 3-2-1, but maybe you don't, and for larger archives of data that is not necessarily a walk in the park either.
I'd like to expand. What I find mindblowing about it is that, as a regular consumer:
* When you need more space you can't just plug in another disk or USB stick. You also have to choose on which device you want to use it, and you have to tell all your software to use it. And that may involve shuffling data around.
* As a corollary, you need to remember in which device you put which stuff.
* As an extra corollary, any data loss is catastrophic by default.
* File copy operations still fail, and when they fail, they do so without ACID-strong commit/fallback semantics.
* Backups don't happen by default, and are not transparent to the end user.
* Data corruption can be silent.
Bonus, but related:
* You can't share arbitrary files with people without going through a 3rd party.
This is because most of the non-technical retail customers don't value reliability too much. They want more capacity and good speed at the lowest price point. Hence the proliferation of QLC / TLC NVMe drives with a small SLC write cache. It's not very stable, and it can only keep the write speed high for small files, but hey, you can get a terabyte for $50, and it loads the latest game real fast!
Similarly, many users don't have much valuable, unique data on their computers. The most important stuff very often lives completely in the cloud, where it's properly backed up, etc.
Also, the most ubiquitous computing device now is a smartphone. It has all the automatic backup stuff built in, you can put an SD card into it, and it will transparently extend the free space, without hassle. Even on PCs, MS and Apple nudge the users very prominently to use OneDrive and iCloud for backing up their desktops / laptops. Past a certain size it costs money, though, hence many people opt out of that. Again, because most people value the lowest price, and just hope for the best; because what could ever happen?
Silent data corruption can still be an issue, but, frankly, malware is a much bigger threat for a typical non-technical user.
Technical people have no trouble setting up all that: mount your disks under LVM, run ZFS on top of it, set up multiple backups, set up their own "magic wormhole" to share files with strangers easily. But they know why they are doing that.
Educating users about IT hygiene is key for improving it, much like educating people about the dangers of not washing hands, or of eating unhealthy stuff, helped improve their health.
> This is because most of the non-technical retail customers don't value reliability too much. … Similarly, many users don't have much valuable, unique data on their computers. The most important stuff very often lives completely in the cloud, where it's properly backed up, etc.
There’s a contradiction here which I think is worth fully considering. People definitely do value their data – data recovery services have been getting revenue for decades from people who lost things they care about – but there’s been a market failure where operating system vendors left the users to fend for themselves, everyone agreed that was hard, and the majority of users quite rationally decided that cloud services were better than learning to be unpaid sysadmins.
What we should be asking is why this is all so hard. Local file systems with strong integrity checks and seamless replication onto attached storage should be so basic that people can just assume it’ll work without them having to spend time on it.
> but there’s been a market failure where operating system vendors left the users to fend for themselves
I don't think this is an accident. These companies want you to hand all your data over to them so they can mine it, increase your reliance on them, and/or charge you recurring fees to back up your precious files. They have a real financial incentive to make protecting your data without handing it over to them very difficult, just like hard drive manufacturers have an incentive to sell you garbage drives you'll need to replace frequently.
No contradiction. There of course are numerous people who value the data on their desktops and laptops a lot. But they are not the majority. Even those who sort of value their data which they spent some effort producing still would rather take chances than overcome the friction of setting up a reliable backup.
This is what makes automatic backups so important: people keep taking chances until something bad happens, and then they're very relieved when they find out that their work has been backed up somewhere. The same thing applies to cloud apps: they relieve people from the burden of caring about their data's safety and integrity. The iCloud backup that allows you to restore your lost phone down to the state of draft "unsaved" texts is the gold standard here.
> Local file systems with strong integrity checks and seamless replication onto attached storage should be so basic that people can just assume it’ll work without them having to spend time on it.
What you're asking for is basically a bug-free btrfs plus a bit of automation on top (unless you also want a smart caching layer, which we already know is a Hard Problem). That automation is simple enough that it would definitely exist by now if there was a bug-free btrfs for it to manage. But that turns out to be a hard problem, so anyone who wants to run their system with automatic add and remove and rebalance is looking for trouble. (ZFS has a better reputation on the reliability front, but is too inflexible for a consumer-grade pooling of arbitrary storage devices.)
> What you're asking for is basically a bug-free btrfs plus a bit of automation on top
Apple was this close around Mac OS X 10.5. Time Machine is more-or-less a perfect backup solution for most people, and if they had ended up using ZFS rather than just testing it in a Server release, your average Mac user would have automatic block-level file hashing and verification (with background scanning and auto-recovery from backups with Time Machine, presumably) which is good enough for the vast majority of people.
APFS is an improvement on HFS+ but ZFS would’ve been much better. FileVault (2) may have complicated matters but who knows, maybe we would’ve seen sped up native encryption in ZFS with support from Apple.
ZFS was never going to have good support for adding and removing drives of arbitrary size. It's a filesystem for people who buy their drives by the dozen.
It works completely fine with a single drive - just as well as any other filesystem, if not better. As a single root/boot drive's fs, it's far better than any of the "standard" alternatives (HFS+, NTFS, EXT#, APFS). No, you can't pool arbitrarily sized drives, but you can't with any of those either. And yes, you lose data if your single drive fails, but you also do with any of those.
I feel like you're trying to get into an argument about which filesystem is better, rather than simply acknowledge that ZFS doesn't actually solve the problem under discussion.
I disagree. ZFS + backup autorecovery solves 99.9%+ of bitrot cases. Time Machine already is the easiest to use fully automatic backup software, ZFS already notices bitrot (and corrects for it in mirror/raidz configurations, but that’s neither here nor there). It’s not a hard logical (or technical) leap to assume that had Apple moved to ZFS, it would leverage Time Machine backups to restore corrupted files automatically. This does actually solve the problem under discussion (or at least, improves it for the average person).
Without ZFS (or some other automatic checksumming), Time Machine (or any other automatic backup solution of fixed size) backups with a good copy of data will eventually be updated with new corrupt copies of the data, and then aged out when the backup target runs out of space. The solution doesn’t have to be ZFS; my point was that it could’ve been, and very nearly was.
Right - when I heard the ZFS announcement at Usenix I assumed it'd become standard a decade later when consumer hardware could handle the extra work, but it seems to have been sandbagged by the rise of cloud storage and the switch to SSDs reducing the frequency with which people get corrupted data rather than all-or-nothing failures.
Apple claims that their hardware makes it unnecessary for APFS but I’m skeptical and the last time I looked the Time Machine situation was still pretty clunky. I haven’t checked Windows recently since I only use Linux and macOS.
And without care. Everyone's precious photo albums will go with them to the grave. We live in a time where the books and data we write are likely to vanish far sooner than those of any previous period. We don't leave behind books, codices, scrolls, or stone tablets anymore.
Scholars will have more clues about life in the 4th century, via the Oxyrhynchus Papyri collection, than about 21st-century Terre Haute, Indiana.
> And without care. Everyone's precious photo albums will go with them to the grave.
If you're in the Apple ecosystem, please set up a recovery key so that your loved ones can access your photos and other things in case something unexpected should happen:
More generally: open Notepad (or whatever) and put in the URL, username, and password for your e-mail† account and bank/credit card accounts (and your phone PIN), print it out, put the paper in a sealed envelope, and show its ___location (on top of the fridge?) to a trusted friend/family member.
† If someone can access your e-mail / SMS, they can probably reset any other account(s).
This needs to be published by everyone who has a blog and shared with friends and family everywhere.
I know first-hand the mess from my friend who died of cancer at 38, leaving behind a wife and daughter with not only accounts being locked out but a sprawling home lab that they did not understand. Even things like smart lights became a stumbling block when they could not control them. The router was a virtual machine on a rack mounted server in the basement. Had to unwind a lot.
>Even things like smart lights became a stumbling block when they could not control them. The router was a virtual machine on a rack mounted server in the basement. Had to unwind a lot.
A possibly unpopular opinion is that people should think long and hard about whether they really need things like smart lights just so they can have cool mood lighting or whatever. I need to have a lot of electrical work redone because of a kitchen fire--the microwave offed itself in the middle of the night, scary--and I told my electrician "no smart stuff," which he was totally on board with. I honestly think it's pretty much a gimmick for most people.
Or even free with a checking account. Depends on the branch. I think the right strategy is to store the high level passwords that do not change much and then an annual backup archive of all the family photos, tax returns, and most important documents on a hard drive in the box.
> A safe deposit box isn't expensive and can store a hard copy of your passwords and other personal information.
It may be necessary for other people to get 'authorized' to get access to it.
If you have access to one, it's certainly useful in case something happens to your residence, but one should also probably have access to something more convenient (especially in the case of incapacitation).
Although fewer bank branches have them and passwords change/get added. And anything that requires manual effort to update often won't get done. Heck, updating my tech info at home has been on my to-do list for at least a year.
Absolutely! Cell phones are a great example. Most people love taking pics, but rarely give much thought to storing them for the future. Why? It's kind of difficult, from the angle of categorizing the subject matter. Most of us on this website, I'm guessing, enjoy this sort of activity. But I'm guessing most people would rather go to the dentist than sort through photos and decide the best place to put them, both folders and storage.
> But I'm guessing most people would rather go to the dentist than sort through photos and decide the best place to put them, both folders and storage.
Most people tend to love going through their old photos, thinking of the people and places depicted, and sorting them. For an extremely long time this was entirely the norm and it took place mostly using large cumbersome physical books where meta data was recorded by hand by writing on the back of the photo.
It's become much easier now that files are digital, only slightly offset by the fact that digital has enabled us to take many more photos than in the past. What I see more often are people who have no idea where the photos on their phones are actually stored at all (what's a file system?), let alone how to copy them off their phone and onto something more appropriate besides resorting to something like emailing photos to themselves, which has all the problems of adding and successfully transferring large file attachments.
It's a chore because companies really don't want people to have access to their own files unless it's on the company's terms and using their cloud servers.
I don't think that's true really. Yes most of the data recorded today will be lost but there is just so much being recorded that what remains is still much more than what we have of previous centuries. What we have of the 4th century is limited to a subset of what people specifically chose to write down. What will remain of this century includes that plus much more incidental information like the backdrop of a selfie that could turn out much more interesting to future historians than what you think is important now.
> Silent data corruption can still be an issue, but, frankly, malware is a much bigger threat for a typical non-technical user.
Complete BS. Cryptolockers and similar are not a concern for regular users while disk failure is something pretty much everyone gets to experience over their lifetime.
Well, professionally, tape is it - the technology is there and it lasts more than 5 years. Unfortunately, the market for tape has evolved such that it's not very friendly to the non-pros. Not impossible but not friendly. That probably has to do with the lack of perceived market for that among non-corporate users - or perhaps the impression that clown storage is where it's at for non-corporate.
To be fair some more, JBOD/RAID and hard drives do work pretty well. Past the 5-year horizon, to be sure.
Product mgt and corp finance have also fallen in love with subscriptions - and clown storage is such an awesome match for that! Who needs to sell long-term terabyte solutions when you can rent them out. Easy to argue against that logic of course, but not easy to fight.
I have an LTO-6 tape drive. It works fine, but it is a pain in the ass to set up on Linux. It only connects via SAS, you need to load a lot of arcane kernel modules, the logs are non-standardized and often misleading, and the entire interface is command line based.
I don't mind living in the command line and I don't even mind fighting to get everything up, but I don't see most people putting up with it. It's also a huge pain to get working with a laptop, since I don't think most laptops have a SAS connector, so you have to use an eGPU case with a Host Bus Adapter, which is its own share of headaches.
Hehe. If you're trying to live with just a laptop and an LTO drive, I can certainly see it! I'd expect most people who get into LTO drives have a massive set of hard drives, some convoluted desktop-and-up case(s) to run them, a couple laptops and random peripherals such as cameras, scanners and whatnot. USB all over the place. - So that for them both the drive and the command line interface are deeper in a technology map.
But that goes back to the minimal market for LTO among amateurs. They will scratch their itch and write software for it but it's not exactly critical mass.
To keep those drives happy and minimize wear, you need to push data at a constant rate so the tape can move continuously while writing. You'll need to push a sustained stream of up to 400 MBps (for LTO-9) to the drive (and, therefore, read that from the sources). A common pattern is to send files for archival to a disk volume and dump from there to tape. To fill an LTO-9 tape, expect to set up from 20 to 50TB of fast storage. I would assume a RAID-0 of large-ish HDDs with an SSD cache in front can handle it. I'd prefer a data-center grade SSD with the full size of the batch you want to record. Nothing forces you to fill the tape though.
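For anyone curious, the staging pattern usually boils down to something like this; the device path and buffer sizes here are just examples, not a recommendation:

    # Keep the drive streaming by feeding it through a RAM buffer.
    # Assumes the tape shows up as /dev/nst0 and ~4 GiB of RAM is spare.
    tar -cf - /staging/archive-batch \
      | mbuffer -m 4G -P 90 -s 512k -o /dev/nst0
    # -m 4G   in-memory buffer that smooths out bursty reads from the source
    # -P 90   don't start the tape moving until the buffer is 90% full
    # -s 512k block size written to the device

The buffer absorbs variation in the source read speed so the drive stops and repositions (shoe-shines) far less often.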
If your datasets are in the 100GB range, Blu-ray archival might be a better option. The discs are supposed to last 100 years if well stored and the drives are much, much, much cheaper.
Also, an LTO-9 tape can store up to 45 TB of compressed data (18 TB uncompressed), so you'd better have a fast storage volume ready with your data when you're writing your next tape.
The one thing worth remembering though is that you are not forced to write all of the 18TB of data, so your staging drive can be significantly smaller, but considering the tape alone costs about half of what a large hard disk costs (at least at my local office supplies provider), unless you are using more than 50% of the capacity, you'll be better off by writing it to a HDD (at least for backup purposes).
Long-term archival, however, is a different thing and tape is a much better medium for that.
It's a shame DOTS [1] doesn't seem to be going anywhere...
I'm seeing LTO-9 tapes (with 18TB of uncompressed capacity) starting at 85€ whereas 18TB HDDs start at 309€, so here the price difference is at a factor of 3.6x
I used to have a rack mounted server that I used for Jellyfin and as a NAS so it was easy to install a SAS card, but it was sucking up way too much power and costing me about $50/month to run, so I got rid of it, and replaced it with one of those tiny gaming PCs you can get on Amazon.
I've gotten it to work over thunderbolt with an eGPU case on both the mini computer and my laptop, so it can be done, at least with NixOS.
I still have the tapes, and I use them to archive really important stuff, but honestly what I mostly do now is have a pretty resilient ZFS RAID, which is 24 16TB drives, three RAID-Z2's chained together, so I can lose up to six drives (two from each RAID) before I risk losing data. I run a scrub once a month, so if a drive dies, hopefully I can catch it before I lose anything.
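For anyone wanting to replicate the monthly scrub, it's a one-line cron entry; the pool name "tank" is a placeholder:

    # /etc/cron.d/zfs-scrub -- run a scrub at 03:00 on the 1st of each month
    0 3 1 * * root /usr/sbin/zpool scrub tank

    # check progress and any repaired or unrecoverable errors afterwards
    zpool status -v tank

Some distros already ship an equivalent cron job or systemd timer with their ZFS packages, so it's worth checking before adding your own.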
Nobody needs to worry about storage density in the sense of how much shelf space you need for cold storage. Tape, NAND flash, and even high-capacity hard drives are all dense enough that the incremental cost of more shelf space is not the most important part of the problem.
And NAND flash is already pretty 3D, with hundreds of layers of memory cells fabricated on the surface of each wafer, and several such dies stacked in each BGA package, and it's not uncommon for U.2 drives to contain two PCBs each with NAND packages on each side.
Tape is a way to get volume. Through easily spooled layers. Not fast access but otherwise not unreasonable. Plus tapes stored outside of the drive.
Another way to look at it: "and yet here we are". Tape is still what has led to the highest density so far (depending how much storage you look at, and cost tradeoff). But also cost tradeoff of separating the drive from the medium. which hasn't really worked: LTO tapes are pro supplies - and so, expensive.
Yes, it is. I actually have been mulling over a fictional world set in the future where the period between the 20th and 25th centuries is a mysterious time that so little is known about. The story follows a professor who is obsessed with the "Bit Rot Era" and finding out just what happened to that civilization.
I have a prototype first chapter written that cold opens with an archeological dig '...John Li Wei looked up from his field journal where he had just written “No artifacts found in Basement Level 1, Site 46-012-0023”, wiping sweat from his brow. "Did you find something, Arnold?" he asked, his voice weary. "Three days in this godforsaken jungle, and we've got nothing but mud to show for it. Every book in this library’s long since turned to muck.”
Arnold gestured towards the section of the site he had been laboring for the last 30 minutes, digging through layer after layer of brown muck, with fragments of metal hardware that once supported shelving. A glint of metal caught the filtered light.
“Arnold, that’s just another computer case,” John sighed, his shoulders slumping slightly. He could already imagine the corroded metal and the disintegrated components inside. Useless. “Help me pull this out.”
The two men strained against the clinging earth, their boots sinking into the mud with each heave. As they finally wrestled the heavy, corroded metal case free, a piercing shriek cut through the jungle sounds – beep, beep, beep, beep....'
I suppose I can publish it on my blog and then keep on writing. We can discover this future together, along with anyone who wants to read it. This is my first serious attempt at fiction. I have only just begun.
This is another portion of later in the first chapter.
'...The transport, a ground-based vehicle, levitated silently outside. John glanced at his chronometer as he boarded. Jeg er ked af det, he thought, I'm late. The doors hissed shut, and the vehicle computer announced, “Destination: University. Estimated arrival: 25 minutes.” With a gentle hum, the vehicle glided smoothly along the elevated guideway. The air inside was cool and faintly metallic. Outside, the landscape was a patchwork of green fields, managed forests, and gleaming white research facilities. The transport’s progress was slow; the gentle sway and hum of the engines were a constant reminder of the NAU’s strict energy policies. John sighed, thinking about the upcoming lecture. How could he convey the importance of the Bit Rot Era when so many were focused on the pressing needs of the present?
At the University, John rushed into the classroom, three students already waiting. “Jeg er ked af det, I’m late,” John said, a slight Danish accent coloring his Danglish. The classroom was intimate, a very different design from the great lecture halls of antiquity, designed for perhaps twenty students, with a central platform surrounded by holographic cameras for remote attendees. Historical maps and timelines adorned the interactive displays lining the walls.
John quickly moved to the lectern and carefully removed six artifacts from his bag. “Velkommen to Ancient North America 1,” he began. “Welcome to Ancient North America 1. This is the class where you are going to learn about the past that was and the present that may yet be.” He gestured to a holographic timeline that appeared above the lectern. “In broad strokes, we consider the history of Ancient North America to consist of four periods: the pre-colonial, the rise of the nations – of Canada, the United States of America, and Mexico – the pre-Collapse, and the post-Collapse periods. You can take in-depth courses on most of these. Dr. Jones’ ‘Rise of the Nations’ is well worth your study to learn about the United States, its Constitution, and Canada. This very university is located in a place that was once called Newfoundland, Canada, and in ancient times the climate was quite harsh. Meget forskelligt from the lush green farmland, forests, and even nice beaches we have today. You can also take Dr. Pech’s history of the pre-colonial tribes and empires. That will teach you a great deal about the people who first inhabited the continent and about their history, culture, and foodstuffs, many of which are still considered extinct. Desværre.
“However,” John continued, his tone shifting, “the class you cannot take is for what we call the Bit Rot Era. A ‘bit’… imagine a light switch. On or off. That’s a bit. One or zero. The basic unit of digital information. The Bit Rot Era, from roughly the 20th to the 25th century, is a complete black box. En sort boks. What we do know comes from fragments of writing on paper. Millions of books were printed, but even those are often lost to time. We have fragments of ancient texts that talk about computers in every home and the ‘digitization’ of information and libraries. Digitization was the process of scanning physical media into computers. We’ve recovered millions of artifacts – disintegrated polycarbonates, silica, and bits of rare metals that were once these computers. But nothing within survived. Then something happened. The books vanished. The period from about the 21st to the 25th centuries was, as it were, expunged from history. Research indicates they went entirely digital, depending on system administrators to maintain the data… until the administrators stopped.”...'
I often think we're living in a dark age (an age that is characterized by little surviving cultural output)... Depends on how things will go, of course, but I ain't holding my breath.
An important reason to continue buying printed books and supporting print media.
Most web content I've consumed in my lifetime is already lost, many floppy disks and burned CD-ROMs I held onto are now unreadable, and in 200 years, the situation will only be worse.
But I can go to the British Library and read a 1000+ year old text without much difficulty.
FWIW does any of it remain on archive.org? They have some really great collections on there, and writing on digital archiving, like how to preserve CDs as best as possible (spoiler: the brand of CD is a large factor).
That's well and good for the _current_ ability to potentially find some old media, but regarding GP's point about digital data being fickle over long time-spans, Archive.org is only one hack/fire/solar-flare/DDOS/etc. away from being gone (potentially forever). They were recently hacked; thankfully the group that claimed responsibility wasn't trying to destroy data.
> A single copy of the Internet Archive library collection occupies 145+ Petabytes of server space -- and we store at least 2 copies of everything.
Thankfully they have a few backups around the world, but there are still plenty of plausible events which could render the data useless. In contrast, physical mediums like books, vinyl records or microfilm/microfiche can last for hundreds of years.
But notably, during the 1990s there was a concerted push to switch books to alkaline paper.
A lot of old printed books were done on acidic paper, causing them to disintegrate more easily. Disintegration of acidic books can be mitigated by brushing alkaline powder between each page, and by keeping humidity very low.
The switch to alkaline paper substantially changes the smell, much to some readers' dismay. But be sure to look for alkaline prints, when available.
In some ways it is surprising, but the examples you gave are only currently straightforward because of massive investment over many years by thousands of people. If you wanted to build chatGPT from scratch I'm sure it would be pretty hard, so it doesn't seem so unreasonable that you might pay someone if you care about keeping your data around for extended periods of time.
"Rather poor" is putting it mildly. This sent me down a sort rabbit hole. From a Stack Exchange discussion[0] it was a short trip to exceedingly technical discussion about using QAM encoding[1] to really beef-up the storage capability.
With the wacky QAM encoding it looks like maybe 20MB per C90 cassette (and 90 minutes to "read" it back).
It's interesting. I wouldn't dare to go beyond a 1500 baud signaling rate, but, then, audio tape is amenable to QAM, and that could multiply the transmission to 16 bits per token or more depending on the quality of the tape and recorder.
I would be careful with that, however. If you are archiving your data, it's because you like it and, if you like your data, you want it to be readable a long time from now. I'd suggest vinyl records rather than tapes, as they are very robust, and can be read without physical contact.
I have a hundred or so such tapes that contain Commodore PET programs from the early 1980s. Last time I tried to read them (about 10 years ago, when they were about 35 years old), I had...mixed results.
Part of that may be the tape drive (about 40 years old) but the reality is that consumer level cassette tapes aren't built to last: magnetic fields weaken, coating flakes off, tape stretches, and other factors prevent these from being storage solutions beyond 10-20 years (my guess), if that.
They might be a fun nostalgic diversion for listening to old music, where the audio degradation is part of the experience, but for data, they're a non-starter in my book.
Somewhat related, I think there were some projects to use VHS video cassettes for data storage too. It was much better than C cassettes, but still a very far cry from what one would consider worthwhile these days. IIRC a couple of GB per cassette?
LTO tape tech has gotten into pretty nutty territory - in order to achieve its density and speed. It wasn't "easy". So, so far away from C90 technology.
I share your disappointment. I explained it for myself with this: nobody cares if we the netizens have our data backed up. The corps want it for themselves and they face zero accountability if they lose it or share it illegally with others.
So it's up to us really. I have a fairly OK setup, one copy on a local machine and several encrypted compressed copies in the cloud. It's not bulletproof but has saved my neck twice now, so can't complain. It's also manual...
We the techies in general are dragging our feet on this though. We should have commoditized this stuff a decade ago because it's blindingly obvious the corps don't want to do it (not for free and with the quality we can do it anyway). Should have done app installers for all 3 major OS-es, zero-interaction-required unattended auto-updates -- make it so grandma would never know it's there and it's working. The only thing it asks is access to your cloud storage accounts and it decides automatically what goes where and how (kind of like disk RAID setups I suppose).
I think in cases like this "personal computer" is a blessing and a curse. It seems like most of the big parties involved in computing with the big levers to move things are mostly in it for themselves, and they pick and choose whether they shoulder responsibility for a certain feature, or which side of 'personal' it comes down on with regard to whether they hook it into their services and your privacy. What would the attitude be toward losing physical information or property in other parts of our lives, versus digital info/property, whether that's media, references, financial details, property deeds, old records, etc.?
While I do appreciate the generosity in how many projects make themselves available (free or otherwise), it does seem like they can have a narrow focus where they solve the challenge they had to solve, but aren't interested going past that point. There's logical reasons for why that happens, but there's unfulfilled potential to make personal computing a better environment there.
> it does seem like they can have a narrow focus where they solve the challenge they had to solve, but aren't interested going past that point. There's logical reasons for why that happens, but there's unfulfilled potential to make personal computing a better environment there.
Oh absolutely, I agree so much with this. I get it, we can get obsessed with UNIX philosophy at times, one tool does its job perfectly etc. but we don't even make an attempt to assemble several things into one good cohesive whole.
Sadly I am way too swamped with being out of a job and chronically ill but I just started becoming extremely bitter towards many more privileged techies who seem much more interested in farting out the 2489th LISP interpreter than solve a real world problem. :|
Since we already have excellent tools that follow the UNIX philosophy then I would opt for being a LEGO assembly guy by combining these:
- Management of backups themselves: borg / restic (or rustic) / duplicati / duplicacy etc. Last I evaluated them, borg offered the most in one package. I like restic a bit more, but it's a fair bit slower than borg, sadly. `rustic` aims to fix that, but it's a volunteer effort and the author has bursts of productivity that are not on a stable schedule. I use rustic as a backup of my main backup tool (borg) but for now dare not use it exclusively. (A rough borg sketch follows this list.)
- Management of keys / secrets / passwords: my semi-vulnerable setup is to just upload my borg repo key to my Linux server and also have it pinned in one of my private Telegram channels. That one could be done better but I haven't done research. Probably something close to gpg (many modern alternatives with less friction but I forget names; I have them bookmarked though) to make sure even your keys are properly protected. One possibility is a password vault like keepass[x] or Enpass or others.
- Management of storage: I use a local Linux server (EDIT: I have a very non-redundant and basic ZFS setup) and 5+ cloud storage services on the free tier. All of them have at least 5GB and my borg repository is barely 150MB I think. That includes ~20 past backups.
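To make the first bullet concrete, a borg setup along these lines might look roughly like the following; the repo ___location, paths, and retention numbers are placeholders, not my actual config:

    # one-time: create an encrypted repo on the local server
    borg init --encryption=repokey-blake2 backupbox:/srv/borg/repo

    # per run: back up, then thin out old archives
    borg create --stats --compression zstd \
        backupbox:/srv/borg/repo::'{hostname}-{now:%Y-%m-%d}' \
        ~/Documents ~/Photos
    borg prune --keep-daily 7 --keep-weekly 4 --keep-monthly 12 \
        backupbox:/srv/borg/repo

The encrypted repo can then be synced out to the cloud free tiers with whatever tool fits (rclone, the provider's own client, etc.).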
The way these could be combined is mostly by having a top-notch GUI and CLI (so we serve different kinds of people) that allows for granting access to cloud storage servers, asks optionally for a local storage server (NFS, Samba, WebDAV etc.) and just does everything else by itself. It's very doable.
Regarding the first two points, maybe Kopia [0] comes close. It has both a GUI and a CLI. For the GUI, it saves your backup key for you (although I have to admit I didn't check how securely it is stored), but you still have to keep a copy yourself in a password manager or similar in case you need to access your backup from some other machine. AFAIK, for the CLI you are completely on your own regarding secrets management. But it's also true that the average user doesn't have servers to back up, so the GUI would be fine.
Regarding the management of storage, what do you mean? Kopia already supports different "storage backends" out of the box which can be configured via the GUI. Do you mean that you would like it to be able to merge the storage of different storage providers, so that you can use multiple free tiers to get the storage space you need?
Now you just need to find a drive that still works in 40 years, and can connect to a computer you have in 40 years.
Recent LTO drives will only read media from one generation back (an LTO-9 drive will read LTO-8 tapes, but not LTO-7)
So you will need a 40 year old drive by then... good luck finding one that works.
If you really need to read data from a 40-year-old tape, you must also store a computer able to read the tape. Most companies with large LTO archives migrate tapes from generation to generation, sometimes skipping one. That is why LTO drives can write back one gen while (historically) being able to read back two. BUT LTO is the most common tape format by a LOT, so it is pretty likely that even 40 years from today there will be drives that can read LTO-7/8/9.
I've thought about the "hundreds of years" problem on and off for a while (for some yet to be determined future time capsule project), and I figure that about all we know for sure that will work is:
- engraved/stamped into a material (stone tablets, Edison cylinders, shellac 78s, vinyl, voyager golden record(maybe))
I actually looked into what it might take to "print" an archival grade microfilm somewhat recently - there might be a couple options to send out and have one made but 99.99% of all the results are to go the other way, scanning microfilm to make digital copies. This is all at the hobbyist grade cheapness scale mind you, but it seems weird that a pencil drawing I did in 2nd grade has a better chance of lasting a few hundred years than any of my digital stuff.
Not weird at all that a piece of paper you wrote 20 years ago which has like 5-10KB of info that you can decode without any tech can stand the test of time. Archives are hard because of scale, env factors etc.
you could buy 400 TB worth of hard drives. Overall I'd have more confidence in the produced-in-volume hard drives compared to LTO tapes, which have sometimes disappeared from the market because vendors were having patent wars. Personally I've also had really bad experiences with tapes, going back to my TRS-80 Color Computer, which was terribly unreliable; getting a Suntape with nothing but zeros on it when the computer center at NMT ended my account; the "successful" recovery of a lost configuration from a tape robot in 18 hours (reconstructed it manually long before then); ...
This was my day job (the company died a couple of weeks ago). We had > 100,000 LTO tapes in the end, with data archived from as far back as 2002 up to the present. We were still regularly restoring data. In our busiest years we were doing what averaged out to 177 restores per day (365 days a year). Barely any physically destroyed tapes.
I see a few articles citing robotic failures as a big issue, but really someone can just place a tape in the robot if critical recovery is needed and the robot has died.
Tape is reliable and suitable for long term archiving, but it still needs care and feeding.
Having some kind of parity data recorded so losing a single tape does not result in data loss, routine testing and replacement of failing tapes, and a plan to migrate to denser media every x years are all considerations.
Spinning rust just feels simple because the abstractions we use are built on top of a substrate that assumes individual drive (or shelf) failure. Everybody knows that if you use hard drives you'll need people to go around and replace failing hardware for the entire lifetime of the data.
There's a biiiiiiiiiig asterisk on all tape storage, about temperature and humidity. It's not like paper that you can leave in an attic for a century and still find readable.
People restoring old tapes right now have to do all sorts of exotic things with solvents to remove the mildew and baking the tapes to make the emulsion not immediately fall off the substrate, etc. I have to imagine that at today's density, any such treatment would be much worse for the data.
So those tapes are only as immortal as their HVAC. One hot humid summer in the wrong kind of warehouse may be it.
Similarly, I worked at a place where, before I joined, a system upgrade gone wrong had caused the retrieval of backup tapes stored in a metal safe, where the safe's temperature had been below the dew-point. Neither the tape cases nor the safe were sealable against moisture. This meant they had no backups of data they were required to retain for five years. And of course, the person who attempted the upgrade resigned.
That little note half way through this that said "The Svalbard archipelago is where I spent the summer of 1969 doing a geological survey" made me want to know more about the author - and WOW they have had a fascinating career: https://blog.dshr.org/p/blog-page.html
If you're using cloud storage for backups, don't forget to turn on Object Lock. This isn't as good as offline storage, but it's a lot better than R/W media.
At work we've been using restic to back up to B2. Restic does a deduplicating backup, every time, so there's no difference between a "full" and an "incremental" backup.
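The restic side is only a few commands; the bucket name and paths below are placeholders, and the Object Lock / retention settings live on the B2 bucket itself rather than in restic:

    export B2_ACCOUNT_ID=xxxxxxxx          # application key ID
    export B2_ACCOUNT_KEY=xxxxxxxx         # application key
    export RESTIC_PASSWORD_FILE=~/.restic-pass

    restic -r b2:example-backups:servers/web01 init
    restic -r b2:example-backups:servers/web01 backup /etc /var/www
    # periodically read back a sample of the actual data, not just the index
    restic -r b2:example-backups:servers/web01 check --read-data-subset=5%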
I wish tape archival was easier to get into. But because it's niche and mainly enterprise, drives usually start in the multiple-thousands-of-dollars range unless you go way down in capacity to less than a modern SSD.
I'm not sure you can distinguish those.
It is IBM, and IBM has a preference for who its customers are.
So do enterprises, who like the sound of "no one ever got fired for..."
And it's also because the market is pretty small (at least in terms of sites) - there's just not that much total accessible market for any competitor.
there are a couple tape makers, regardless of how many companies rebadge the product.
afaik, there are only 2-3 drive makers too.
but don't forget that tape doesn't make much sense (in its market) without the robotic library. there might be some off-brands that sell small libraries, but the big ones are, afaik, dominated by IBM.
I'm just talking about single-drive units you manually swap tapes on. That would still make an excellent long-term cold backup for me even if I did have to swap the tapes once a week, but they're still 4k+ unless I go all the way down to LTO-5 tapes, which are just ~1.5TB; that could be good enough for critical things but not really helpful for backing up everything.
The 3-2-1 data protection strategy recommends having three copies of your data, stored on two different types of media, with one copy kept off-site.
I keep critical data mirrored on SSDs because I don't trust spinning rust, then I have multiple Blu-ray copies of the most static data (pics/video). Everything is spread across multiple locations at family members.
The reason for Blu-ray is to protect against geomagnetic storms like the Carrington Event in 1859.
[Addendum]
On 23 July 2012, a "Carrington-class" solar superstorm (solar flare, CME, solar electromagnetic pulse) was observed, but its trajectory narrowly missed Earth.
3-2-1 has been updated to 3-2-1-1-0 by Veeam’s marketing at least.
At least 3 copies, in 2 different mediums, at least 1 off-site, at least 1 immutable, and 0 detected errors in the data written to the backup and during testing (you are testing your backups regularly?).
All the data is spread across more than 3 sites, both SSDs and Blu-ray (which is immutable). I don't test the SSDs because I trust Rclone; the Blu-ray is only tested after writing.
There is surely a risk of bit rot on the SSDs, but it's out of sight and out of mind for my use case.
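If I ever do want to spot-check the SSD copies for rot, rclone can compare two locations directly; paths here are made up:

    rclone check /mnt/ssd-primary /mnt/ssd-mirror             # compares sizes and hashes
    rclone check /mnt/ssd-primary /mnt/ssd-mirror --download  # byte-for-byte, slower but thorough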
I've been considering getting into Blu-ray backups for a while. Is there a good guide on how to organize your files in order to split them across multiple backup discs? And how to catalogue all your physical discs to keep track of them?
I remember about 20 years ago my friend had a huge catalogue of hundreds of discs with media (anime), and he used some kind of application to keep track of where each file was located across the hundreds of discs in his collection. I assume that software must have improved for that sort of thing?
I don't know about the best way to split things (I do it topically mostly, e.g. each website backup goes to a separate disc). But hashdeep is a great little tool for producing files full of checksums of all files that get written to the disc, and also for auditing those checksum files.
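In case it helps anyone, the two halves of that workflow look roughly like this (paths are invented):

    # before burning: record checksums of everything going onto the disc
    (cd ./disc-staging && hashdeep -r -l . > ~/checksums/disc-0042.hashdeep)

    # later: mount the disc and audit it against that manifest
    (cd /mnt/bluray && hashdeep -r -l -a -k ~/checksums/disc-0042.hashdeep -vv .)
    # -r recurse, -l relative paths, -a audit mode, -k known-hashes file, -vv per-file results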
Powering on the SSD does nothing. There is no mechanism for passively recharging a NAND flash memory cell. You need to actually read the data, forcing it to go through the SSD's error correction pipeline so it has a chance to notice a correctable error before it degrades into an uncorrectable error. You cannot rely on the drive to be doing background data scrubbing on its own in any predictable pattern, because that's all in the black box of the SSD firmware—your drive might be doing data scrubbing, but you don't know how long you need to let it sit idle before it starts, or how long it takes to finish scrubbing, or even if it will eventually check all the data.
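In practice, the "actually read the data" pass can be as blunt as forcing every block back through the controller; the device and mount point below are just examples:

    # read the whole device (also touches unallocated space)
    dd if=/dev/nvme0n1 of=/dev/null bs=4M status=progress

    # or read only the allocated files, at the filesystem level
    find /mnt/archive -type f -exec cat {} + > /dev/null

Either way, the point is just to push the data through the drive's ECC path; whether the firmware then refreshes weak cells is still up to the black box.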
Adding to this... Spinrite can re-write the bits so their charge doesn't diminish over time. There's a relevant Security Now and GRC article for those curious.
Re-writing data from the host system is quite wasteful of a drive's write endurance. It probably shouldn't be done more often than once a year. Reading the data and letting the drive decide if it needs to be rewritten should be done more often.
How about a background cron of diff -br copyX copyY, once per week, for each X and Y... if they are hot/cold-accessible?
Although, in my case, the original is evolving, and renaming a folder and a few files makes that diff go awry... needing manual intervention. Or maybe I need content-based naming: $ ln -f x123 /all/sha256-of-x123, then compare those /all directories.
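A crude sketch of that content-based-naming idea, assuming both copies live on the same filesystem (hardlinks can't cross filesystems) and with made-up paths:

    mkdir -p /all
    find /data/copyX -type f -print0 |
      while IFS= read -r -d '' f; do
        h=$(sha256sum "$f" | cut -d' ' -f1)
        ln -f "$f" "/all/$h"       # hardlink named purely by content
      done
    # build /all2 from copyY the same way; then renames no longer show up:
    diff <(ls /all) <(ls /all2)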
I've been reading a lot of eMMC datasheets and I see terms like "static data refresh" advertised quite a bit.
You're quite right that we have no visibility into this process, but that feels like something to bring up with the SFF Committee, who keeps the S.M.A.R.T. standard.
Might need to go through the NVMe consortium rather than SFF/SNIA. Consumer drives aren't really following any SFF standards these days, but they are still implementing non-optional NVMe features so they can claim compliance with the latest NVMe spec.
I've got files going back to 1991. They started on floppy and moved to various formats like hard drives, QIC-80 tape, PD optical media, CD-R, DVD-R, and now back to hard drives.
I don't depend on any media format (like tape) working forever. New LTO tape drives are so expensive, and used drives only support smaller-capacity tapes, so I stick with hard drives.
3-2-1 backup strategy, 3 copies, and 1 offsite.
Verify all the files by checksum twice a year.
You can over complicate it if you want but when you script things it just means a couple of commands once a week.
I have some going back to my first days with computers (~1997), but it's purely luck. I've certainly lost more files since then than I've kept.
Does that tear me up? Not one bit. And I guess that's the reason why people aren't clamouring for archival storage. We can deal with loss. It's a normal part of life.
It's nice when we do have old pictures etc. but maybe they're only nice because it's rare. If you could readily drop into archives and look at poorly lit pictures of people doing mundane things 50 years ago, how often would you do it?
I'm reminded of something one of my school teachers recognised 20+ years ago: you'd watch your favourite film every time it was on TV, but once you get it on DVD you never watch it again.
I think in general we find it very difficult to value things without scarcity. But maybe we just have to think about things differently. Food is already post-scarcity, so it isn't valued for being scarce. Instead I consider each meal valuable because I enjoy it, but I can only afford to eat two meals a day if I want to remain in shape. I struggle to think of an analogy for post-scarcity data, though.
What is your process for automating this checksum twice a year? Does it give you a text file dump with the absolute paths of all files that fail checksum for inspection? How often does this failure happen for you?
All my drives are Linux ext4 and I just run this program on every file in a for loop. It calculates a checksum and stores it along with a timestamp as extended attribute metadata. Run it again and it compares the values and reports if something changed.
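For anyone who wants the same idea without a dedicated tool, a hand-rolled (and less robust) version with standard utilities looks something like this; the file path is just an example:

    f=/archive/video/tape42.mkv
    new=$(sha256sum "$f" | cut -d' ' -f1)
    old=$(getfattr -n user.sha256 --only-values "$f" 2>/dev/null)
    if [ -z "$old" ]; then
        setfattr -n user.sha256 -v "$new" "$f"   # first pass: record the checksum
    elif [ "$old" != "$new" ]; then
        echo "CHECKSUM MISMATCH: $f" >&2         # could be rot, or a legitimate edit
    fi

A proper tool also records the mtime so it can tell an intentional edit from silent corruption, which is one reason to prefer a dedicated tool (or zfs/btrfs scrubs) over a bare script like this.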
These days I would suggest people start with zfs or btrfs that has checksums and scrubbing built in.
Across 400TB of data I get a single failed checksum about every 2 years. So I get a file name and the fact that it failed, but since I have 3 copies of every file, I check the other 2 copies and overwrite the bad copy. This is after verifying that the hard drive's SMART data shows no errors.
> What is your process for automating this checksum twice a year?
Backup programs usually do that as a standard feature. Borg, for example, can do a simple checksum verification (for protection against bitrot) or a full repository verification (for protection against malicious modification).
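Concretely, with borg that's something like this (repo path is a placeholder):

    borg check --repository-only /srv/borg/repo   # fast structural/checksum pass over the repo
    borg check --verify-data /srv/borg/repo       # re-reads and verifies every data chunk (slow)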
This article touches on a lot of different topics and is a bit hard for me to get a single coherent takeaway, but the things I'd point out:
1. The article ends with a quote from the Backblaze CTO, "And thus that the moral of the story was 'design for failure and buy the cheapest components you can'". That absolutely makes sense for large enterprises (especially enterprises whose entire business is around providing data storage) that have employees and systems that constantly monitor the health of their storage.
2. I think that absolutely does not make sense for individuals or small companies, who want to write their data somewhere and ensure it will still be there many years later when they might want it, without constant monitoring. Personally, I have a lot of video that I want to archive (multiple terabytes). The easiest approach whose risk I'm comfortable with is (a) for backup, I just store on relatively cheap external 20TB Western Digital hard drives, and (b) for archival storage I write to M-DISC Blu-rays, which claim to have lifetimes of 1000 years.
I personally don't believe in archival storage, at least for personal use.
Data has to be living if it is to be kept alive, so keeping the data within reach, moving it to new media over time and keeping redundant copies seems like the best way to me.
Once things are put away, I fear the chances of recovering that data steadily reduce over time.
> Once things are put away, I fear the chances of recovering that data steadily reduce over time.
I’ve run into this a lot. You store a backup of some device without really thinking of it, then over time the backup gets migrated to another drive but the device it ran on is lost and can’t be replaced. I remember reading a post years ago where someone commented that you don’t need a better storage solution, you need fewer files in simpler formats. I never took his advice, but I think he might have been right.
Cloud, or variants thereof, is fine -- I use rsync.net for backup and archive. But needing to manually run a backup (say, onto a thumb drive) is not sustainable, and even though the author suggests that disks (spinning rust or optical) might actually have a reasonable lifespan, I don't trust myself to be able to recover data from them if I want it.
As the author says, the limiting factor isn't technical. For media, it's economic. For any archival system it's also going to be social. There's a reason that organisations that really need to keep their archives have professional archivists, and it's not because it's easy :).
Only 'online' data is live/surviving data... So I keep a raid5 array of (currently 4) disks running for my storage needs. This array has been migrated over the years from 4x1 TB, to 2TB, to 4TB, 8TB and now 4x 16TB disks. The raid array is tested monthly (automated). I do make (occasional, manual) offline backups to external HDDs (a stack of 4/5 TB Seagate 2.5" externals), but this is mostly to protect myself from accidental deletions, and not against bitrot/failing drives.
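For reference, the monthly test on Linux md RAID is usually just the kernel's built-in consistency check; the array name is assumed here, and Debian-based distros ship this as a monthly checkarray cron job:

    # as root: start a background consistency check of the array
    echo check > /sys/block/md0/md/sync_action
    cat /proc/mdstat                      # watch progress
    cat /sys/block/md0/md/mismatch_cnt    # non-zero after the check warrants investigation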
Tapes are way too slow/expensive for this (low) scale, optical discs are way too limited in capacity, topping out at 25/50GB, and then way too expensive to scale.
You don't need constant monitoring if you have extra disks. If your budget is at least a thousand dollars, you can set up 4 data disks and 4 parity disks and you'll be able to survive a ton of failure. That's easily inside small company range.
You don’t have to (though it can make sense, but I’d encrypt everything and have an additional backup). Another fairly straightforward solution (in that there is ready-made hard- and software for it and doesn’t need much maintenance) is to use a NAS with RAID 5/6, and to have a second NAS at another ___location (can be a friend or relative — and it can be their NAS) that you auto-backup the first one to over the internet.
This article is specifically about digital archival. That is, keeping bit-perfect copies of data for 100+ years. But I think for regular people this is not so obviously useful. People want to keep things like texts (books), photographs, videos etc. Analogue formats are a much better option for these things, for a couple of reasons:
* They gracefully degrade. You don't just start getting weird corruption or completely lose whole files when a bit gets flipped. They might just fade or get dog-eared, but won't become completely unusable,
* It's a more expensive outlay and uses scarce physical space, so you'll think more carefully about what to archive and therefore have a higher quality archive that you (and subsequent generations) are more likely to access.
The downside I guess is backups are far more difficult, but not impossible, and they will be slightly worse quality than the master copy. But if you lose a master copy of something, would it really be the end of the world? Sometimes we lose things. That's life.
I've backed up on just about everything going back to QIC-150s, but today I just use a set of 4Tb drives that I rsync A/B copies to and rotate offsite. That gives me several generations as well as physical redundancy.
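In case it's useful to anyone, that A/B rotation boils down to something like this (mount points invented):

    rsync -aHAX --delete --info=stats2 /data/ /mnt/backup-A/
    # next cycle, after swapping which drive is offsite:
    rsync -aHAX --delete --info=stats2 /data/ /mnt/backup-B/
    # -a archive, -H hardlinks, -A ACLs, -X xattrs; --delete mirrors deletions too,
    # which is exactly why you want at least two alternating generations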
The iteration before that, I made multiple sets of Blu-rays, which became unwieldy due to volume, but was write-once with multiple physical generations. I miss that, but at one point I needed to restore some files and, even though I used good Verbatim media, a backup from a couple months prior was unreadable. All copies had a mottled appearance and the drive that wrote it (and verified it) was unable to read it. I did finally find a drive that would read it, but that pushed me over the edge.
I wonder how the author's 18yo media will compare to modern 5yo media. It's been a long time since we have had the rock-solid Taiyo Yuden gold discs ...
This made me smile. I have a very similar configuration. Simple but effective. The only thing that worries me is that bitrot might get me. Then again, my body will bitrot, too. So no point worrying too much about some random data in some turbulence in time.
You may not even be able to get real MDiscs any more [0] and I'm always extremely dubious of 1000 year lifespans since they're effectively impossible to test.
> Hopefully this can put closure to the speculation. Our organization is a databank and is a big user of mdisc for archiving. We reached out to Verbatim last week about this Media Identification Code (MID) discrepancy. Here is their reply, in their own words ---- "The creator of the MDisc technology- Millenniata went out of business in 2017, they sold the technology to Mitsubishi, who until 2019 owned Verbatim. Due to this, the stamper ID changed, but the formula & the disc materials stayed the same. Mitsubishi sold Verbatim & all the technologies to CMC in December of 2020. Verbatim is the only company authorized to sell the original technology. Any Millenniata discs available were all produced before 2017 when the company shut down and any other brand is not the original technology." ----- So there it is, mdiscs with either the 'VERBAT' or 'MILLEN' prefix are fine. Just different production periods. Cheers.
There are 100 GB BDXL flavors of M-Disc, but yeah, definitely not enough for really large amounts of data, though large enough to store a good chunk of my photos, which is mostly what I'd want to keep around.
LTO tape is excellent for archival storage because that is what it was designed for. It uses a two layer error correction code that means it has an incredibly low bit error rate so you will still be able to read a tape that was stored correctly 40 years later. Just remember to also store a compatible drive!
When the HD DVD/Blu-ray wars were going on, China had its own implementations of optical storage, and they have been evolving ever since. Much of this is undocumented in languages other than Chinese.
Companies in China use these alternative optical discs, some of which store up to 1TB of data.
The only reference I can find to it on English Wikipedia is the CBHD
It's actually 128GB per disc (BDXL). I only know of Chinese companies' announcements of 500GB optical discs last year [1]; not sure if they are already deployed to some enterprise partners, but it's entirely possible. Their more theoretical research goes far beyond that. [2]
There are archival storage machines in the Chinese market, similar to tape-library robots, where you have hundreds of such discs in a single unit and 1PB+ per rack.
The quote of LTO tape being much less prone to read failures (10^-20) vaguely reminded me of an old article stating that something like 50% of tape backups fail. I'm not on that side of the industry so I can't really comment as to whether there is some missing nuance.
Last year my company read in excess of 20,000 tapes from just about every manufacturer and software vendor. For modern, LTO/3592/T10000 era tapes the failure rate we see is around 0.3%.
Most of these failures are due to:
- cartridges being dropped or otherwise physically deformed such that they no longer fit into the drives.
- cartridges getting stuck in drives and requiring destructive extraction.
- data never having been written correctly in the first place.
The only exception to this rule that we have seen is tapes written with LTFS. These tapes have a 20-fold higher incidence of failure, we believe because reading data back as if it were an HDD causes excessive wear.
Anyone claiming 50% failure rates on tapes has no idea what they are talking about, is reading back tapes from the 1970s/80s, or has a vested interest in getting people away from tape storage.
They're not saying the failure rate of tapes is 50%. They're saying if you survey attempts to do data restores from tape then 50% of the time not all the requested data is found.
I can't claim the same volumes you can but I did handle tape backups and recovery for a mid sized business for a few years. We only had one tape failure in my tenure but we had plenty of failed recoveries. We had issues like the user not knowing the name and ___location of the missing file with enough detail to find it, or they changed the file 6 times in one day and they need version 3 and the backup system isn't that granular.
Those are just the issues after the system was set up and working well. Plenty of people set a backup system running, never check the output, and are disappointed years later to learn the initial config was wrong.
Long story short, a 50% failure rate for tapes is ludicrous, but a 50% failure rate for recovery efforts is not.
The read failures are also attributed to other parts of the system, which for the end user still end up as failed reads. The author links to a sales PDF from Quantum.
E.g. the robot dies, the drive dies, the cartridge dies, the library bends, the humidity was too much... multiplied by each library, robot, drive, and cartridge your data is spread across.
Or, a fun little anecdote, the cleaner had access to the server room and turned off the AC of the server room, most disk drives failed, and the tapes melted inside the robots.
Yes, exactly. As a data hoarder myself I've been thinking 'what data is _really_ important to me?'. And the answer is: not that much of it. The work, mental space, time, and money you have to invest in storing your own data add up to so much effort that it's probably not worth it.
I don't consider myself to be a proper data hoarder since I only have tens of TiB at my disposal, but I managed to minimize the work, mental space, and time aspects by automating as much as possible. At first, I had a bunch of scripts running on a Raspberry Pi, but now I have the entire process managed by Home Assistant.
It's a Rube Goldberg machine involving MQTT, VMs, cheap VPSs, rsync, and wifi plugs, but it works. I only get notified if a daily backup fails, and I always get a full summary of the weeklies. I could probably automate the process of writing to DVD too, so the only manual thing I'd need to do would be to insert blank discs, but my quality of life has drastically improved.
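For flavor, the failure-notification leg boils down to something like the sketch below; this is a minimal illustration only, assuming rsync is on the PATH and an MQTT broker is reachable via paho-mqtt, with made-up paths, hostnames, and topics (the real setup with Home Assistant and the wifi plugs is obviously more involved):

    # daily_backup.py -- illustrative only; paths, hostnames, and topics are made up.
    import subprocess
    from datetime import datetime

    import paho.mqtt.publish as publish  # pip install paho-mqtt

    SRC = "/srv/data/"                       # hypothetical source directory
    DST = "backup@nas.local:/backups/data/"  # hypothetical rsync target
    BROKER = "homeassistant.local"           # hypothetical MQTT broker (e.g. Mosquitto)
    TOPIC = "backups/daily/status"           # hypothetical topic an automation listens on

    def main() -> None:
        # -a preserves metadata, --delete mirrors removals, --itemize-changes gives a summary.
        result = subprocess.run(
            ["rsync", "-a", "--delete", "--itemize-changes", SRC, DST],
            capture_output=True, text=True,
        )
        status = "ok" if result.returncode == 0 else f"FAILED rc={result.returncode}"
        # Publish the result; the automation can then push a phone notification only on failure.
        publish.single(
            TOPIC,
            payload=f"{datetime.now().isoformat(timespec='seconds')} {status}",
            hostname=BROKER,
        )

    if __name__ == "__main__":
        main()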
As for cost, I'm still working on it, and my storage needs are very predictable, so I can hunt for deals ahead of time - I still have unused HDDs from last year. It is common to find discounts down to about $12/TiB, which is cheap.
It helps that I enjoy coding, and that I deeply care about the data I'm preserving. I got burned after losing two years' worth of unbacked-up data scraped off Twitter right before they closed the API, and I'm never going to get that data back.
The backup automation evolved pretty organically, but slowly. I was happy when I finally was able to get the weekly backup process to start automatically.
I used to save everything by default and over the years (~20) my storage requirements started getting out of hand.
So I had a change in philosophy: I decided to throw everything into a "to delete" folder, start with a single flat folder structure, go through everything file by file, move what I wanted into a "keep" folder, and really evaluate whether I needed each item. As a result I ended up with about a 90% reduction, and I don't feel like I'm missing anything.
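For anyone wanting to copy the approach, the mechanical "flatten everything into one folder first" step is tiny; here's a rough sketch with made-up directory names (the actual keep-or-delete decisions obviously stay manual):

    # flatten.py -- illustrative sketch; directory names are made up.
    import shutil
    from pathlib import Path

    SRC = Path("archive")   # hypothetical nested mess accumulated over ~20 years
    DST = Path("to_sort")   # flat staging folder; everything here is "to delete" by default

    DST.mkdir(exist_ok=True)
    for f in SRC.rglob("*"):
        if f.is_file():
            target = DST / f.name
            # Avoid silently clobbering files that share a name.
            i = 1
            while target.exists():
                target = DST / f"{f.stem}_{i}{f.suffix}"
                i += 1
            shutil.move(str(f), str(target))
    # From here: review file by file and move keepers into a "keep" folder.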
Yeah, this. The data I am most concerned about is not even 1 GB after compression. That's all my $HOME configs and all the projects I am working on. Then I have some open datasets I like to fiddle with (mostly *.sql.zst compressed DB dumps), which I periodically dump on my Linux server (weekly, with rsync), and finally: video.
Video is obviously like 99.99% of everything, but I have made sure to store all the sources for it (mostly downloaded playlists from YouTube), and I have scripts that synchronize the videos from the net to my local folders. I've even tested that a few times and it worked pretty well.
So indeed, in the end, just find what's most valuable and archive that properly. In my case I have one copy in my server and 5+ copies on various cloud storage free tiers. All encrypted and compressed. Tested that setup several times as well, I have a one-line script to restore my dev $HOME, and it works beautifully.
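The "compressed and encrypted" part really is small. Here's a minimal sketch of the backup side, assuming tar, zstd, and gpg are installed, with made-up paths and a symmetric passphrase prompt (the uploads to the free tiers and the restore one-liner are left out):

    # home_snapshot.py -- illustrative sketch; paths are made up, assumes tar, zstd, gpg on PATH.
    import subprocess
    from datetime import date
    from pathlib import Path

    HOME = Path.home()
    # Only the genuinely irreplaceable stuff: dotfiles/configs and project sources.
    INCLUDE = [".config", ".ssh", "projects"]                      # hypothetical selection
    OUT = Path(f"/srv/backups/home-{date.today()}.tar.zst.gpg")    # hypothetical destination

    # tar -> zstd -> gpg --symmetric, chained via pipes so nothing unencrypted hits disk.
    tar = subprocess.Popen(
        ["tar", "-C", str(HOME), "-cf", "-", *INCLUDE], stdout=subprocess.PIPE)
    zstd = subprocess.Popen(
        ["zstd", "-19", "-T0"], stdin=tar.stdout, stdout=subprocess.PIPE)
    with OUT.open("wb") as out:
        # gpg prompts for the passphrase via pinentry; use a keyfile/agent for unattended runs.
        gpg = subprocess.Popen(
            ["gpg", "--symmetric", "--cipher-algo", "AES256", "-o", "-"],
            stdin=zstd.stdout, stdout=out)
        for p in (tar, zstd, gpg):
            p.wait()
    # The result is one small encrypted blob, easy to copy to several cloud free tiers.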
I have my whole Unraid server backed up using Backblaze's Windows backup tool / subscription.
There is a Docker container for running the backup app, so I believe it can be set up on a standard Linux desktop as well. You just need to mount the storage you want backed up into the container as another drive, and then you can configure the backup software in the fake Windows environment to back up those drives on "your PC" in the container.
A lot of things are stated as conclusions in this article where the state of the art has reversed or, in some cases, invalidated the conclusion. Unfortunately those newer results are not published, and will probably remain trade secrets for another decade.
The biggest invalidated conclusion is that your archival workload cannot be bin-packed with your hot workloads. With the ever-decreasing IO/byte of HDDs, this has radically changed where the bytes go.
Another example is the cost of IO when your backups are not perfectly coalesced. If you're striping writes across datacenters into objects that are sharded across clusters that encode across many drives, your backups get extremely fragmented.
The IO for your backup suddenly becomes huge compared to the size of the end object you wish to read. This makes things like tape nasty: sure, you can read at incredible linear speeds, but that's only worth it if you actually wanted to restore the exact TBs that are on the exact cartridge your drive & robot picked up.
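To make the fragmentation point concrete, here's a back-of-the-envelope model with entirely made-up numbers (fragment counts, mount/seek overhead, and drive speed are assumptions, not vendor figures):

    # Back-of-the-envelope: restoring one ~100 GB object whose backup fragments are
    # spread across many cartridges. Every number below is an assumption for illustration.
    object_gb = 100          # size of the thing you actually want back
    cartridges = 40          # cartridges its fragments landed on
    mount_seek_s = 90.0      # robot mount + load + position time per cartridge
    stream_mb_s = 300.0      # sustained linear read speed once positioned

    read_s_per_cart = (object_gb * 1000 / cartridges) / stream_mb_s   # ~8 s of useful reading
    total_s = cartridges * (mount_seek_s + read_s_per_cart)
    effective_mb_s = object_gb * 1000 / total_s

    print(f"wall clock: {total_s/60:.0f} min, effective: {effective_mb_s:.0f} MB/s "
          f"vs {stream_mb_s:.0f} MB/s linear")
    # => roughly an hour and ~25 MB/s with these numbers: the mounts and seeks, not the
    #    tape's linear speed, dominate once the backup is fragmented.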
My recipe for large files: 3 copies. Right now, the 1st copy is on external 8 to 16 TB NTFS desktop hard drives, and the 2nd copy is on 14 to 16 TB internal ext4 drives. These drives I power up only for copying, once a month or so. At present, my drives are 5 to 7 years old and still good.
Main working copies I keep on 4 to 8TB NTFS SSDs (mix of sata and nvme), plugged into a PC I'm using regularly, but intermittently.
Seems like a bad idea. Not only is it incredibly expensive to get there, but tremendously inconvenient when you need to restore from that backup. Also, lots of high energy interstellar particles bombarding your media. You'd be better off filling a chamber with an inert gas at the bottom of a mine and sealing it up. Still inconvenient as hell to access, but you're much better protected against most disasters.
I'm a bit surprised BluRay is not mentioned. It's relatively cheap cold archival storage. Of course the recovery latency is a bit bad because of manual steps, but hey, why not repurpose that old jukebox you have in the garage ;)
I can't find the original source, but back in that era I definitely read that the BD-R version was not rated. As a data hoarder I ended up not going with that tech due to that analysis. Even the main M-Disc site only says the BD is "based on" the same tech, not that it was also evaluated. All I can find right now is drowned out by marketing and bloggers repeating the not-well-supported 1000-year claim for the BD.
Anyway, from a bit of googling it appears you can't buy the BD version of M-Disc any more. The ones marketed as such are apparently the more "normal" HTL BD-Rs, which are good for 100 years.
While I get that there are use cases for physical media, as both a data hoarder and data paranoiac (good things in my line of work), I've moved on. It's the data that matters, not the media.
To that end, I have automated two separate, but complementary, processes for my personal data:
- Sync to multiple machines, at least one of which is offsite, in near realtime. This provides rapid access to data in the event of a disk/system/building/etc failure. Use whatever software you want (Dropbox, Resilio, rsync scripts, etc), but in the event of failure this solves 99% of my issues - I have another device with fast access to my most recent data, current to within seconds. This is especially important when bringing up a new, upgraded system - just sync over the LAN. (Currently this is 4 devices, 2 offsite, but it flexes up/down occasionally over time.)
- Backup to multiple cloud providers on a regular cadence (I do hourly encrypted incrementals). This protects me against data loss, corruption, malware attacks, my stupidity deleting something, etc. This solves the remaining 1% of my issues, enabling point-in-time recovery for any bit of data in my past, stretching back many years. I've outsourced the "media" issue to the cloud providers in this case, so they handle whatever is failing, and the cost is getting pretty absurdly cheap for consumers, and will continue to do so. My favorite software is Arq Backup, but there are lots of options. (Currently this is 4 discrete cloud providers, and since this is non-realtime typically, utilizes coldest storage options available).
Between these two complementary, fully automated approaches, I no longer have to worry about the mess of media failure, RAID failures, system failures, human error, getting a new system online, a cloud provider turning evil, etc etc.
Are you sure about that? Many ransomware attackers do recon for some time to find the backup systems and then render those unusable during the attack. In your case, your cloud credentials (with delete permissions?) must be present on your live source systems, rendering the cloud backups vulnerable to overwrite or deletion.
There are immutable options in the bigger cloud storage services but in my experience they are often unused, used incorrectly, or incompatible with tools that update backup metadata in-place.
I’ve encountered several tools/scripts that mark a file as immutable for 90 days the first time it is backed up, but don't extend that date correctly on the next incremental, leaving older but still critical data vulnerable to ransomware.
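For what it's worth, extending the retention window on each incremental is only an API call or two with S3-style Object Lock, which makes it all the more frustrating when tools skip it. A sketch using boto3 against a bucket that already has Object Lock enabled, with made-up bucket and key names:

    # object_lock_refresh.py -- illustrative sketch; bucket and key names are made up.
    # Assumes the bucket was created with Object Lock enabled (that can't be retrofitted).
    from datetime import datetime, timedelta, timezone

    import boto3

    s3 = boto3.client("s3")
    BUCKET = "my-backup-bucket"           # hypothetical
    KEY = "incrementals/2024-05-01.bin"   # hypothetical

    retain_until = datetime.now(timezone.utc) + timedelta(days=90)

    # Initial upload: write-once with a 90-day compliance-mode lock.
    s3.put_object(
        Bucket=BUCKET,
        Key=KEY,
        Body=b"...backup bytes...",
        ObjectLockMode="COMPLIANCE",
        ObjectLockRetainUntilDate=retain_until,
    )

    # On every later incremental that still depends on this object, push the retention
    # window out again (compliance retention can be extended, never shortened) -- this
    # is the step the flawed tools forget.
    s3.put_object_retention(
        Bucket=BUCKET,
        Key=KEY,
        Retention={"Mode": "COMPLIANCE", "RetainUntilDate": retain_until + timedelta(days=90)},
    )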
I discovered recently that Microsoft OneDrive will detect a ransomware attack and offer you the option to restore your data to a point before the attack!
MS need to advertise this feature more, because I'd never heard of it and assumed all the files on the PC were toast!
Of course, the fact that a script on Windows can be accidentally run and then quietly encrypt all the users files in the background is another matter entirely!
Actually, I think almost all of these malware worms are totally automated. The attacker knows nothing about your network and backups; it just encrypts and deletes absolutely everything it has write access to.
Having no delete credentials presents a cost issue when moving off a provider... I've accidentally left data behind after I thought I'd deleted it. Worth the risk, though, and I learned my lesson.
You don't set the lifecycle rule at runtime. You set it at environment-setup time. The credentials that put your objects don't have to have the power to set lifecycle rules.
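In AWS terms (other providers have analogues), that split looks roughly like the sketch below. It assumes boto3, a hypothetical "backup-admin" profile, and a hypothetical "backup-writer" user; it's an illustration of the separation, not a hardened reference setup:

    # setup_env.py -- run once, with admin credentials, at environment-setup time.
    # Bucket name, user name, prefix, and retention period are made-up examples.
    import json

    import boto3

    admin = boto3.Session(profile_name="backup-admin")   # hypothetical admin profile
    s3 = admin.client("s3")
    iam = admin.client("iam")

    BUCKET = "my-backup-bucket"

    # The lifecycle/retention policy lives with the environment, not with the app:
    s3.put_bucket_lifecycle_configuration(
        Bucket=BUCKET,
        LifecycleConfiguration={
            "Rules": [{
                "ID": "expire-old-incrementals",
                "Status": "Enabled",
                "Filter": {"Prefix": "incrementals/"},
                "Expiration": {"Days": 365},
            }],
        },
    )

    # The credential the backup job actually runs with can only add objects:
    iam.put_user_policy(
        UserName="backup-writer",            # hypothetical app user
        PolicyName="put-only",
        PolicyDocument=json.dumps({
            "Version": "2012-10-17",
            "Statement": [{
                "Effect": "Allow",
                "Action": ["s3:PutObject"],
                "Resource": f"arn:aws:s3:::{BUCKET}/*",
            }],
        }),
    )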
You obviously don't put your environment-setup user in your app. That would be a serious mistake.
And when you're moving providers you use your application credentials to do that? That makes no sense. This is nonsensical engineering. You'd use your environment credentials to alter the environment.
I'm not "engineering" anything - I'm just stopping a service. I close the account, or disable billing, or whatever that step requires. I don't even read the data back out or anything - just cancel. Doesn't really require "engineering".
You seem well placed to answer this one: how is the cost for this resilience? Compared to the cost of the storage itself? Including the cost of migrating from solutions that are withdrawn from the market?
The cost (I assume you're talking about "my time" cost?) is unbelievably low, mostly in part due to good software. It "just works".
Specifically, Arq Backup, for example, lets you simply add/remove providers at will. It's happened multiple times, Amazon Drive changed (or went away? I forget...), Google Drive changed their Enterprise policies, etc... No big deal, I just deleted the provider and added another one. I still had plenty of working provider backups, so I wasn't worried while it took a day or two to fill the next provider. (Good argument for having 2+, I'd argue 3+ providers...)
Using notifications from your sync apps/systems/scripts/whatever is essential, of course, in case something fails... but all the good software has that built-in (including email and other notifications, not just OS, which helps for remote systems).
At this point, it's nearly idiot proof. (Good for idiots like me ;)
I meant more monetary cost. Nominally cloud storage for one unit of storage and one unit of time is perfectly "fine". Except that it adds up. More storage, indefinitely held, multiple hosts. Data which needs to be copied from one to the other which incurs costs from both. Add to this routine retrieval costs - if you "live this way". And routine test retrieval costs if you know what's good for you.
So last time I looked, unit costs were low - sure. But all-included costs were high.
Certainly some of this simply comes down to "how valuable is my data?".
Currently, given the extremely low (and dropping YoY) cost of storing cold data at rest, the essentially free cost of ingest, and the high cost of retrieving cold data which I almost never have to do, the ROI is wildly positive. For me.
And since all of these things (how many providers, which providers, which storage classes, how long to retain the data, etc.) are fine-tunable, you can basically do your own ROI math and then pick the parameters that work for you.
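If you want to make that ROI math explicit, it fits in a few lines; every price below is a made-up placeholder, not a quote from any provider:

    # Rough annual cost model for the multi-provider cold-backup approach.
    # All prices are illustrative assumptions; plug in real ones from your providers.
    data_tb = 2.0                # backed-up data set size
    providers = 3                # discrete cloud providers holding a copy each
    storage_usd_tb_month = 2.0   # cold/archive tier price per TB-month (assumed)
    egress_usd_tb = 20.0         # retrieval + egress per TB (assumed)
    test_restores_tb_year = 0.2  # partial test restores per year, per provider (assumed)

    storage = data_tb * providers * storage_usd_tb_month * 12
    retrieval = providers * test_restores_tb_year * egress_usd_tb
    total = storage + retrieval

    print(f"storage ${storage:.0f}/yr + test restores ${retrieval:.0f}/yr = ${total:.0f}/yr")
    # With these placeholder numbers: $144 + $12 = $156/yr for three independent copies of
    # 2 TB -- cheap as long as full restores stay rare, which is the whole bet.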
I get some peace of mind (in both professional and business settings) from having my backups include a physically separable and 100% offline component. I like knowing an attacker would need to resort to kinetic means to completely destroy all copies of the data.
The physically separable component often lags behind the rest of the copies. It may only be independently verified on an air-gapped machine periodically. It's not the best copy, for sure.
I still take comfort in knowing attackers generally won't launch a kinetic attack.
Conceptually, though, I think my separation of "sync vs backup" and my separation across discrete providers (both software and supplier) accomplishes this same goal. It's not very different, or is possibly just a level up, from "online media vs archive media". At least, it seems that way to me.
Mr Meteorite can launch such an attack, but as long as you have two physically gapped backups at a distance greater than the likely blast radius, you'll be fine.
Backups and archival are different things, with similar requirements, but different priorities. A backup doesn't care about data you think you'll never need again.
Absolutely mindblowing.