Outage caused by Crowdstrike update takes down computers around the world

If one is looking for a positive spin, it’s that Chipotle had issues, its stock declined, and its CEO resigned, but after about 3 to 3.5 years the stock had fully recovered and has done pretty well since. Equifax had a double-dip, but its stock also eventually recovered.

McAfee got bought a year after its debacle, but Crowdstrike is a stronger company today than McAfee was and harder to switch out.

I think what happens will depend on how the decision makers at Crowdstrike’s customers (CSOs, CIOs, etc.) react. Do they sympathize and give Kurtz and Crowdstrike a break as long as they make strong, fast moves, or are they so upset that they demand pounds of flesh and big discounts, and change how they choose and employ security vendor products moving forward? For instance, can any company single-source security anymore?

I don’t know the answers.

22 Likes

Industry-specific evaluations like MITRE seem to suggest S has a sufficiently good product and is the leader in some cases. I’m not smart enough to be able to evaluate the products myself so this is from a layman’s view. As to revenue growth, S has been balancing growth with “profitability” which is required in today’s market.

I haven’t seen anything to suggest S has an inferior product lineup. Maybe it isn’t as encompassing as CRWD’s…don’t really know.

S is a small position for me, but I have not been unhappy with their cadence since the ARR debacle was put to bed. I think the setup remains favorable and the CRWD mishap will only help.

15 Likes

I’m not sure people care about the technical details any more, but Microsoft published their own in-depth analysis in a blog post recently:

In their blog post, CrowdStrike describes the root cause as a memory safety issue—specifically a read out-of-bounds access violation in the CSagent driver.

Note that one new piece of information we got from this MS blog post is that far more than 8.5 million PCs were affected - those were just the PCs that were set up to report details of crashes. Microsoft didn’t say how many more, just that the 8.5M were a “subset” of the affected machines.

On the crash:

…we can see in the disassembly that there is a check for NULL before performing a read at the address specified in the R8 register…Our observations confirm CrowdStrike’s analysis that this was a read-out-of-bounds memory safety error in the CrowdStrike developed CSagent.sys driver.

So, not a NULL pointer (for which Crowdstrike did have a check in the code), but a bad pointer, which is somewhat harder to check for. But (unsaid in the blog), presumably Try/Catch-style exception handling would have kept the driver from actually crashing on errors such as these.
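
To make the NULL-versus-bad-pointer distinction concrete, here is a rough sketch in Python. It is not Crowdstrike’s code; the function names and the three-entry table are made up for illustration. Python also raises an exception on a bad index instead of reading stray memory the way a kernel driver would, so this only shows why a NULL check alone doesn’t cover an out-of-range access.

```python
from typing import Optional, Sequence


def read_null_check_only(table: Optional[Sequence[int]], index: int) -> int:
    """Hypothetical reader that only guards against a missing table."""
    if table is None:
        raise ValueError("table is missing")
    # The table exists, but nothing confirms that index is inside it.
    return table[index]


def read_with_bounds_check(table: Optional[Sequence[int]], index: int) -> Optional[int]:
    """Hypothetical reader that also rejects an out-of-range index."""
    if table is None:
        return None
    if not 0 <= index < len(table):
        return None  # refuse the read instead of touching memory past the end
    return table[index]


if __name__ == "__main__":
    table = [10, 20, 30]
    print(read_with_bounds_check(table, 1))  # valid index
    print(read_with_bounds_check(table, 3))  # one past the end, rejected
    try:
        read_null_check_only(table, 3)
    except IndexError as exc:
        print("NULL check passed, read still failed:", exc)
```

The second function passes the same NULL check as the first; the only difference is the extra range test before the read.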

Microsoft tamely and indirectly criticizes Crowdstrike’s approach:

It is possible today for security tools to balance security and reliability. For example, security vendors can use minimal sensors that run in kernel mode for data collection and enforcement limiting exposure to availability issues. The remainder of the key product functionality includes managing updates, parsing content, and other operations can occur isolated within user mode where recoverability is possible. This demonstrates the best practice of minimizing kernel usage while still maintaining a robust security posture and strong visibility.

IOW, Crowdstrike should have moved more stuff, like parsing content and updates, out of the kernel driver into regular user mode applications that are less likely to prevent systems from rebooting.
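
As a toy illustration of that isolation idea (again in Python, not how Crowdstrike or Windows actually does it), the sketch below parses untrusted content in a separate process so a parser failure cannot take the caller down with it. The content format (one count byte followed by that many field bytes), `PARSER_SOURCE`, and `parse_isolated` are all assumptions invented for the example.

```python
import subprocess
import sys
from typing import Optional

# Hypothetical parser, run as its own process. The first byte declares how
# many fields follow; reading past the end of short content raises IndexError
# and the child exits with a non-zero status.
PARSER_SOURCE = """
import sys
data = sys.stdin.buffer.read()
count = data[0]
fields = [data[1 + i] for i in range(count)]
print(sum(fields))
"""


def parse_isolated(content: bytes, timeout: float = 10.0) -> Optional[int]:
    """Parse content in a child process; return None if the parser fails."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", PARSER_SOURCE],
            input=content,
            capture_output=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return None  # a hung parser is treated like a failed one
    if result.returncode != 0:
        return None  # the parser died, but this process keeps running
    return int(result.stdout.strip())


if __name__ == "__main__":
    print(parse_isolated(bytes([3, 1, 2, 3])))  # well-formed content
    print(parse_isolated(bytes([5, 1, 2])))     # declares 5 fields, has 2
```

The malformed input still kills the parser, but the failure stays inside the child process and the caller can reject the update and carry on, which is the recoverability Microsoft is pointing to.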

The rest of Microsoft’s blog post is mostly about the security mechanisms already in Windows, including its Microsoft Defender product.

30 Likes