I assume Crowdstrike is software you usually want to update quickly, given it is (ironically) designed to counter threats to your system.
Very easy for us to second-guess today, of course. But in another scenario, a manager is being torn a new one because they fell victim to a ransomware attack via a zero-day that systems were left vulnerable to because Crowdstrike wasn't updated in a timely manner.
Maybe, if there's a new major zero-day exploit that is spreading like wildfire. That's not the normal case. Most successful exploits and ransomware attacks use old vulnerabilities against unpatched and unprotected systems.
Mostly, if you are reasonably timely about keeping updates applied, you're fine.
> Maybe, if there's a new major zero-day exploit that is spreading like wildfire. That's not the normal case.
Sure. And Crowdstrike releasing an update that bricks machines is also not the normal case. We're debating between two edge cases here; the answers aren't simple. A zero-day spreading like wildfire is not normal, but if it were to happen it could be just as destructive as what we're seeing with Crowdstrike, if not more.
In the context of the GP, where they were actively treating a heart attack, the act of restarting the computer (let alone it never coming back) in and of itself seems like an issue.
I believe this update didn't restart the computer; it just loaded some new data into the kernel, which didn't crash anything the previous 1000 times. A successful background update could hurt performance, but machines where that's considered a problem probably just don't run a general-purpose multitasking OS?
Crowdstrike pushed a configuration change that was a malformed file, which was picked up by every computer running the agent (millions of computers across the globe). It's not like hospitals and IT teams are manually running this update and can roll it back.
As to why they didn't catch this during tests, or why they don't perform gradual change rollouts to hosts, your guess is as good as mine. I hope we get a public postmortem for this.
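For what it's worth, a gradual rollout doesn't have to be fancy. Here's a minimal Python sketch of the general idea (hypothetical names, not anything Crowdstrike actually ships): hash each host into a stable bucket and only widen the cohort once earlier stages look healthy.

    import hashlib

    ROLLOUT_STAGES = [0.01, 0.10, 0.50, 1.00]  # fraction of the fleet per stage

    def host_bucket(host_id: str) -> float:
        """Map a host ID to a stable value in [0, 1] so cohorts stay consistent across stages."""
        digest = hashlib.sha256(host_id.encode()).hexdigest()
        return int(digest[:8], 16) / 0xFFFFFFFF

    def should_receive_update(host_id: str, stage: int) -> bool:
        """Hosts whose bucket falls under the current stage's fraction get the new content."""
        return host_bucket(host_id) <= ROLLOUT_STAGES[stage]

    # Stage 0: roughly 1% of hosts pick up the new channel file; widen only if they stay healthy.
    print(should_receive_update("host-0042", stage=0))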
Considering Crowdstrike mentioned in their blog that systems that had their 'falcon sensor' installed weren't affected [1], and the update is Falcon content, I'm not sure it was a malformed file; it may just be software that requires this sensor to be installed. Perhaps their QA only checked whether the update broke systems with the sensor installed, and didn't do a regression check on Windows systems without it.
It says that if a system isn't “affected”, meaning it isn't rebooting in a loop, then the “protection” works and nothing needs to be done. That's because the Crowdstrike central systems, which the agents running on clients' machines rely on, are working well.
The “sensor” is what the clients actually install and run on their machines in order to “use Crowdstrike”.
The crash happened in a file named csagent.sys, which on my machine was something like a week old.
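If anyone wants to check the same thing on their own machine, a quick Python sketch (the driver path is an assumption based on the usual Crowdstrike install location; adjust it if yours differs):

    from datetime import datetime
    from pathlib import Path

    # Typical location of the Crowdstrike agent driver on Windows (assumption).
    driver = Path(r"C:\Windows\System32\drivers\CrowdStrike\csagent.sys")
    if driver.exists():
        mtime = datetime.fromtimestamp(driver.stat().st_mtime)
        print(f"{driver} last modified {mtime:%Y-%m-%d %H:%M}")
    else:
        print("csagent.sys not found at the expected path")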
Likely because staggered updates would harm their overall security service. I'm guessing this software shares telemetry across their clientele, and that gets hampered if you have a thousand different software versions in the field.
My guess is this was an auto-update pushed out by whatever central management server they use. Given CS is supposed to protect you from malware, IT may have staged and pushed the update in one go.