> Google Cloud just killed all our servers > we were taking Postgres base backup...

scoot · on Jan 12, 2024

That's incorrect. You definitely do want backups in the same ___location as production if possible to enable rapid restore. You just don't want that to be your only copy.

The canonical strategy is the 3-2-1 rule: three copies, two different media, one offsite; but there are variations, so I'd consider this the minimum.

JeffSnazz · on Jan 12, 2024

> You definitely do want backups in the same ___location as production if possible to enable rapid restore.

This is a distant second priority to ensuring any reliable backup.

aembleton · on Jan 12, 2024

What other media should you store backups in? Tape? Paper print out?

scoot · on Jan 12, 2024

Historically tape, but in practice these days it means "not on the same storage as your production data". For example in addition to a snapshot on your production system (rapid point in time recovery if the data is hosed), a local copy on deduplicated storage (recovery if the production volume is hosed), and an offsite copy derived from replicated deltas (disaster recovery if your site is hosed).

The same principle can be applied to cloud hosted workloads.

tetha · on Jan 12, 2024

As an example, for postgres, we have:

Backups on a pgbackrest node directly next to the postgres cluster. This way, if the an application figures a good migration would include TRUNCATE and DROP TABLE or terrible UPDATEs, a restore can be done in some 30 - 60 minutes for the larger systems.

This dataset is pushed to an archive server at the same hoster. This way, if e.g. all our VMs die because someone made a bad change in terraform, we can relatively quickly restore the pgbackrest dataset from the morning of that day, usually in an hour or two .

And this archive server is mirrored by and is mirroring some archive servers at different hosters entirely, also geographically far apart. This way, even if a hoster cuts a contract right now without warning we'd lose at most 24 hours of archives, which can be up to 48 hours of data (excluding things like offsite replication for important data sets).

wongarsu · on Jan 12, 2024

In the original version that means tape, yes. It's the point most startups skip, but it has some merit. A hacker or smart ransomware might infect all your backup infrastructure, but most attackers can't touch the tapes sitting on a shelf somewhere. Well, unless they just wait until you overwrite them with a newer backup.

EvanAnderson · on Jan 12, 2024

Don't forget to test the tapes, ideally in an air-gapped tape drive. One attack scenario I posed in tabletop exercise was to silently alter the encryption keys on the tape backups, wait for a few weeks/months, then zero the encryption keys at the same time the production data was ransomed. If the tape testing is being done on the same servers where the backups are being taken you might never notice your keys have been altered.

(The particular Customer I was working with went so far as to send their tapes out to a third-party who restored them in and verified the output of reports to match production. It was part of a DR contract and was very expensive but, boy, the piece of mind was nice.)

shiroiuma · on Jan 12, 2024

Either papyrus or clay tablets if you want it to last.

More seriously, perhaps the "2 different media" means don't use, for instance, the same brand and/or model of hard drive for your multiple backups.

fatihpense · on Jan 12, 2024

Papyrus doesn't last :) You want clay tablets, buried in the ground. Looking at Sumerian tablets that would give you 5000 years.

shiroiuma · on Jan 12, 2024

I thought papyrus lasted a really long time, as long as you sealed it in huge stone tombs in the desert.

I think we should build a big library in a lava tube on the Moon to store all the most important data humanity has generated (important works of art and literature, Wikipedia, etc.). That's probably our best hope of really preserving so much knowledge.

tim333 · on Jan 12, 2024

Some lasted at least 3000 years https://www.britannica.com/topic/Ebers-papyrus

EvanAnderson · on Jan 12, 2024

Depending on the size of your data corpus a few USB disks w/ full disk encryption could be a cheap insurance policy. Use a rotating pool of disks and make sure only one set is connected at once.

Force the attacker to restore to kinetic means to completely wipe out your data.

ddorian43 · on Jan 12, 2024

The egrees fees will be bigger than your db cost.

silon42 · on Jan 12, 2024

Yes, maybe (some kind of diff/sync could maybe help), but this means using such a cloud is a bad IT practice.

theanirudh · on Jan 12, 2024

Yes, the egress fees on base backups alone were higher than the cost of the DB VMs. If we replicate the WAL also, it would be way higher. In the post, the example DB was 4.3 GB, but the WAL created was 77 GB.

sgarland · on Jan 12, 2024

The joys of WAL bloat [0]. UUIDv4s?

[0]: https://www.2ndquadrant.com/en/blog/on-the-impact-of-full-pa...