No, I'm not surprised either. But if you're operating at this kind of scale and rolling updates out this immediately, what I would expect is:
* A staggered process for the roll-out, so that machines that are updated check in with some metrics that say "this new version is OK" (aka "canary deployment"), and the update is paused/rolled back if they don't.
* Basic smoke testing of the files before they're pushed to any customers
* Validation that the file is OK before accepting an update (via a checksum or whatever, matched against the "this update works" automated test checksums)
* Fuzz tests to confirm that broken files don't brick the machine
Literally any of the above would have saved millions and millions of dollars today.
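
To make two of those concrete, here's a rough sketch of the checksum validation and the staggered roll-out, in Python. Everything in it is made up for illustration: `is_approved`, `staged_rollout`, the `deploy` callback and the 99% threshold are my own assumptions, not how any real vendor's updater works.

```python
import hashlib
from typing import Callable, Sequence


def is_approved(payload: bytes, approved_digests: set[str]) -> bool:
    """Accept an update only if its SHA-256 matches one that passed testing."""
    return hashlib.sha256(payload).hexdigest() in approved_digests


def staged_rollout(
    stages: Sequence[Sequence[str]],
    deploy: Callable[[str], bool],
    min_healthy_ratio: float = 0.99,  # hypothetical threshold
) -> tuple[bool, list[str]]:
    """Deploy one stage at a time and stop when a stage looks unhealthy.

    `deploy` is a hypothetical callback that updates one machine and
    returns True if that machine checks in healthy afterwards.
    Returns (completed, machines_touched).
    """
    touched: list[str] = []
    for stage in stages:
        results = [deploy(machine) for machine in stage]
        touched.extend(stage)
        healthy = sum(results) / len(results) if results else 1.0
        if healthy < min_healthy_ratio:
            # Stop here; the caller would pause and roll back `touched`.
            return False, touched
    return True, touched
```

The point is that the first stage is tiny (a handful of canaries), so a bad file stops there instead of reaching every machine at once, and a file whose digest never went through the test pipeline is refused before it's loaded at all.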