This is not the news that CRWD shareholders need. Apparently, Crowdstrike released a defective software update that has caused business disruption throughout the world. They also quickly released a fix, but not before “blue screens of death” on Windows computers across the globe grounded flights at airports, impacted medical records systems at hospitals, disrupted banks, etc. The software runs on millions of computers, and the episode is a reminder of how dependent the world is on a few dominant platforms.
Only 8 percent down right now and all that disruption. I am surprised it hasn’t dropped at least 25 percent. What a mess.
I work in several healthcare systems and many of them have had significant issues due to this outage today. It has impacted timely patient care and I’m sure some very serious discussions at the administrative level will be had. I really like $CRWD, but this is a major outage covering many industries.
There are court systems, public transit systems, airlines, 911 systems, banks, and many others affected by this outage.
Maybe this will prove to just be a buyable dip, but the widespread nature and significant economic damage attributable to this may fundamentally alter the company’s long-term growth prospects.
Elon Musk just said “We just deleted Crowdstrike from all our systems”
This isn’t a cybersec issue in that Crowdstrike didn’t let a hacker through. It’s a sloppy coding issue with an update and, although I won’t minimize the damage, these companies will still need to protect against hackers, and CRWD is the best at that. That being said, I hope Kurtz is humble (not his strong suit) and goes way way way out of his way to satisfy customers. If they jump ship, I really don’t see a comparable company to jump to. As for Mr. Schadenfreude, Elon Musk, I’m sure he will make hay with this since he’s probably envious of how much better CRWD has done recently, and how much better a CEO Kurtz is than him.
All in all, I blame myself because YESTERDAY I bought back all the shares I sold a few weeks ago at 380. So frustrating. I added a bit in the 280s this morning and sold to lower the cost basis of the shares I bought yesterday. Anyway, long term, I can’t see this impacting the company, but this will probably take some scratch to fix the problem, and fix all their customers’ problems. Oy.
You might say CRWD was the hacker. Like DoctorRob said, this was a major outage involving many different major entities. I don’t think anyone knows how this is going to move CRWD, but there are many other security providers besides CRWD, and I can’t believe how stupid and careless they were. This is unforgivable and makes me think PANW is going to be the leader going forward.
Andy
I sold half of my CRWD position today. They have rolled out an update with a critical error in it. This points to faulty quality processes, which is a big no-no for a security provider.
I fear investors will think this over through the weekend and decide to sell more next week. Today there was a lot of confusion in the news and the consequences have not settled in yet.
Crowdstrike has been in business for 13 years; they would not be here without a good QA process in place. Per their own CEO, there will be a formal post-mortem provided once their investigation has been completed.
I led the QA for the FireEye appliances for 8 years, and we never had a serious widespread outage. Besides rigorous testing, the final Go/No Go decision was mine, not the corporation’s or some executive’s. I was accountable. So what are their release stages, and how is their Go/No Go decision made? A safer approach might be rolling a new software update out to a small subset of customers before spreading the goodness (or puke) to the larger customer base. Just saying, without knowing any detail of their root cause.
-zane
Although CRWD is a distant memory in my portfolio as its amazing returns helped buy my new cottage, I am really struggling to believe that something of this magnitude was a simple (although devastating) mistake. Maybe I’m biased. But it would not surprise me if the investigation turned up a bad actor.
Of course that would be bad in its own way, for any kind of security company, whether said “bad actor” was inside or outside of the company. But there are lots of hackers who could only dream of the kind of disruption this caused. Maybe one didn’t just dream.
Anyone who has worked in the software industry in charge of releases knows that rolling out a new release can, in fact, crash everything without a bad actor anywhere in sight. It is always possible to miss a bug, no matter how great your QA is. You deal with this by having a limited rollout - first onto a staging machine where you hope to catch it - and excellent rollback processes.
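Roughly, that discipline looks like the sketch below - minimal Python, where the deploy, health-check, and rollback functions are placeholders I invented, not anyone's actual release tooling:

```python
# Minimal sketch of a staged release gate with rollback.
# deploy_to(), healthy(), and rollback() are invented placeholders,
# not CrowdStrike's (or anyone's) real pipeline.
STAGES = ["internal staging", "small canary group", "full fleet"]

def deploy_to(stage: str, build: str) -> None:
    print(f"deploying {build} to {stage}")      # stand-in for the real push

def healthy(stage: str) -> bool:
    # A real pipeline would watch crash telemetry, error rates, support tickets...
    return True                                 # stand-in for real monitoring

def rollback(stage: str, previous_build: str) -> None:
    print(f"rolling {stage} back to {previous_build}")

def release(build: str, previous_build: str) -> bool:
    for stage in STAGES:
        deploy_to(stage, build)
        # In practice each stage soaks for hours or days before the next one.
        if not healthy(stage):
            rollback(stage, previous_build)     # the missed bug stops here...
            return False                        # ...instead of reaching the whole fleet
    return True

release("update-2024-07-19", "update-2024-07-18")
```

The exact code doesn't matter; the point is that a bad build has to clear each gate before it can touch everything.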
I agree with rdgyy and lamnzane. Their QA process is likely good. Their rollout process is flawed. The problem as I read it is that they release onto Azure servers directly. How on earth can they do that globally without doing it in stages? They will learn never to do that again.
Having said all that, there will be people who wonder just like JabbokRiver42 does and if that catches fire, the stock will react next week.
It really was. Here is Crowdstrike’s statement on the technical details:
On July 19, 2024 at 04:09 UTC, as part of ongoing operations, CrowdStrike released a sensor configuration update to Windows systems…Updates to Channel Files are a normal part of the sensor’s operation and occur several times a day in response to novel tactics, techniques, and procedures discovered by CrowdStrike. …The update that occurred at 04:09 UTC was designed to target newly observed, malicious named pipes being used by common C2 frameworks in cyberattacks. The configuration update triggered a logic error that resulted in an operating system crash.
With such frequent updates (several times a day!), it’s likely they don’t do any manual testing. And it appears that even automated testing is non-existent, or not nearly extensive enough, at Crowdstrike for this kind of update. They probably considered these Channel File updates to be safe - and they probably don’t contain actual code. Note what Crowdstrike said: the update “triggered a logic error,” which I believe is their way of saying the logic error existed (exists) in other files, not the one being updated. It’s just that this update made that existing logic do something bad. So look for multiple fixes coming down the road.
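To make that "old logic error, new data" distinction concrete, here is a toy Python example I made up - it has nothing to do with CrowdStrike's actual code, it just shows how a data-only update can trip a bug that was already sitting in the code that reads it:

```python
# Toy example: the bug lives in the code that parses the content update,
# not in the update itself. It only surfaces when a new update happens
# to contain something the parser never handled.

def compile_rules(raw_rules):
    compiled = []
    for rule in raw_rules:
        # Latent logic error: blindly assumes every rule has a "pattern" field.
        compiled.append(rule["pattern"].lower())
    return compiled

old_channel_file = [{"pattern": "EvilNamedPipe"}, {"pattern": "C2Beacon"}]
new_channel_file = [{"pattern": "EvilNamedPipe"}, {"name": "new-style rule"}]  # no "pattern"

print(compile_rules(old_channel_file))  # has worked fine for years
print(compile_rules(new_channel_file))  # KeyError - old code, new data, crash
```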
Seems like Crowdstrike rolled out a kernel driver full of null characters.
Kernel drivers are pieces of machine code that typically operate a piece of hardware, like your graphics card, motherboard, USB or similar devices (sometimes the “devices” are not physical). Kernel drivers enable Windows to operate those devices. Because kernel drivers operate hardware, they in a sense run with the highest privileges. That makes them critical from a cybersecurity perspective.
Windows, like every computer system, represents characters as binary codes. For instance 01100001 stands for “a”. The null character, 00000000, is typically used to mark the end of a sequence, like the end of a string of characters being transmitted between processes.
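You can see those byte values for yourself in a couple of lines of Python (nothing CrowdStrike-specific here):

```python
# 01100001 (0x61) is the encoding of "a"; 00000000 is the null byte.
print(format(ord("a"), "08b"))   # 01100001
print(bytes([0b01100001, 0]))    # b'a\x00' - "a" followed by a null terminator

# A file "full of null characters" is just a run of zero bytes with no
# meaningful structure in it at all:
print(b"\x00" * 8)               # b'\x00\x00\x00\x00\x00\x00\x00\x00'
```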
Every Windows machine that attempted to load this “driver” crashed.
How can you get a file full of null characters? Through some kind of corruption, like a faulty hard drive or the application crashing just as it is attempting to write the file to disk. So the fact that this driver was full of null characters may have been just bad luck. I.e., everything may have tested fine and the file got corrupted afterwards.
However, there are tried and true rollout measures that prevent this kind of error from reaching the customer. For instance, you put a checksum on your files when you test them - a sort of “seal of approval”. If the content of the files changes afterwards, the checksum will no longer match and the updater should refuse to roll the files out automatically. This is standard fare for any update process. This is how Microsoft rolls out billions of updates per year and doesn’t crash anything.
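In Python terms, that "seal of approval" is only a few lines. hashlib is a real standard-library module; the file name and the surrounding update logic are made up for illustration:

```python
import hashlib

def sha256_of(path: str) -> str:
    # Digest of the file's exact bytes - any corruption afterwards changes it.
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

# At QA time, after the file has passed testing, record its digest:
#   sealed = sha256_of("content_update.bin")
#
# In the updater, just before rollout, verify the seal and refuse to ship
# if the bytes changed after testing (bad disk, truncated write, ...):
def safe_to_ship(path: str, sealed_digest: str) -> bool:
    return sha256_of(path) == sealed_digest
```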
The second standard measure is a staged rollout - you roll out to 1,000 devices, check that everything worked, then roll out to the rest of them.
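In sketch form (the fleet, the push call, and the crash telemetry are all placeholders I invented):

```python
# Sketch of a canary wave: update a small slice of the fleet first,
# watch the crash telemetry, and only then touch everyone else.
CANARY_SIZE = 1000

fleet = [f"host-{i}" for i in range(10_000)]   # pretend fleet

def push(host: str, update: str) -> None:
    pass                                       # placeholder for the real push

def crash_rate(hosts) -> float:
    return 0.0                                 # placeholder for real telemetry

def staged_rollout(update: str) -> str:
    canary, rest = fleet[:CANARY_SIZE], fleet[CANARY_SIZE:]
    for host in canary:
        push(host, update)
    if crash_rate(canary) > 0.001:             # any meaningful spike halts the release
        return "halted after the canary wave"
    for host in rest:
        push(host, update)
    return "rolled out to the whole fleet"

print(staged_rollout("sensor-content-update"))
```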
The fact that Crowdstrike doesn’t employ any of these standard measures is extremely concerning.
CRWD is my largest position (10-12%) and I plan on waiting for their official root cause analysis to complete. I’m definitely not going to make an investment decision based on random Twitter posts.
One point that I don’t see anyone addressing is that Crowdstrike is a security firm. They need to be super careful about what they disclose so that the information doesn’t get abused by malicious actors.
The CrowdStrike agent is over a decade old and quite sophisticated. If Crowdstrike is running and you try to delete the file in question, you will get access denied, and that will trigger an alert for suspicious activity. Obviously this is a sensitive area, and Crowdstrike is, imho, doing the right thing by disclosing as little as possible at this moment.
CRWD is my second largest position next to NVDA and I’m hoping they handle the PR component of this very delicately. I saw Kurtz on CNBC (and in Apocalypse Now) and he kind of took the bait that part of the responsibility belongs to Microsoft. He can’t do that, even though it’s no coincidence it only impacted companies that still run Windows. I guess my biggest concern is how the remedies are going to hit the bottom line and result in a re-rating. I literally added back to my position (I sold 20% when it hit 380) the afternoon before this hit, so I’m not super pleased. But CRWD is a company I had no concerns about (other than valuation), and I’m really shocked they allowed this to happen.
I was pleased with Matthew Prince not gloating here.
Obscurity is not a security solution. If Crowdstrike is being more careful with what they say about their flaw(s) than they obviously were about their actual updates, that’s a real problem.
What I suspect is happening is that Crowdstrike is doing a root cause analysis and will issue additional updates to correct the problem, as well as a technical note explaining what the bug was, how it is fixed, and what procedures the company will put in place to prevent anything like this from ever happening again. That’s not just more QA on updates (including running the update on internal computers first, duh), it’s potentially things like staged roll-outs for customers (perhaps based on system importance and redundancy, etc.).
When a security company bug unintentionally accomplishes what hackers dream of doing themselves, there’s going to be a lot of soul searching among security companies, and within companies buying security products/services.
Keep reading:
Security by obscurity alone is discouraged and not recommended by standards bodies. The National Institute of Standards and Technology (NIST) in the United States recommends against this practice: “System security should not depend on the secrecy of the implementation or its components.”[9] The Common Weakness Enumeration project lists “Reliance on Security Through Obscurity” as CWE-656.[10]
A large number of telecommunication and digital rights management cryptosystems use security through obscurity, but have ultimately been broken. These include components of GSM, GMR encryption, GPRS encryption, a number of RFID encryption schemes, and most recently Terrestrial Trunked Radio (TETRA).[11]
EDIT: Matter of fact, there’s a belief among many in the field that open-sourcing your security is a best practice. That gets a lot of eyeballs on the code, which can find potential problems sooner.
For instance, two of the best private email services, Proton Mail and TutaMail, open source their code precisely for that reason - to get eyeballs on it. Their security relies on solid public key cryptography, not obscurity.
There is zero chance a security company like CrowdStrike, which is used by the US government, is going to open source their product. There is a formal post from Crowdstrike saying they will disclose more on this issue as they go through their formal root cause analysis process. There are way too many interested parties here, and some of them are bad actors.
Smorg,
You are correct that security through obscurity is not a good practice. You can only defend against what you know; the rest you guess at. (CVE, NIST, OWASP, etc.) Transparency, certainly in western enterprises, is the accepted good-citizen practice and sometimes it is the law. But disclosures of new breaches are typically held confidentially for a few weeks until the breached party can make a patch available and distribute it. This allows vulnerable parties to take the necessary steps before public disclosure. This minimizes damage and is most respectful, even amongst competitors.
This CRWD issue is of course not a bad actor or a corrective action for an exploit. I empathize with CRWD having to test an application across millions of Windows machines with a near-infinite number of different configurations. And this signature/rules/code release happens daily, and is many times automated to go out within minutes or hours in response to a new exploit. This is why companies pay the big bucks to companies like CRWD. However, this CRWD issue was a self-inflicted, broad-spectrum failure where just a few simple automated checks would have blocked the release. As such, this is egregious and inexcusable. This is about as bad as it gets. I trust and have no doubt that CRWD will take corrective actions. CRWD will rise and be a better company. But the damage sustained will likely result in lingering financial impacts, both in sales and in damage liabilities.
It is hard to determine how long this stock will be down. Most times the news is forgotten within a few weeks. Many times the breach is not even the fault of the vendor, but rather a failure of enterprise practices. But this one, IMO, is a 10 out of 10 for badness.
-zane
Short technical correction to my description of the failure (it doesn’t change anything about the measures CRWD could have used, so my previous post is close enough):
The CRWD kernel driver consists of multiple files. Only one of them was corrupt and full of zeros. If this had been the file containing the main functionality of the driver, Windows would have recognized that it was not a valid driver, would not have loaded it, and would not have crashed. The main CRWD driver file was fine and was up and running. The CRWD driver then attempted to load the corrupted file. Unlike Windows, it apparently did not check whether the file was valid before loading it, and it crashed, taking the whole system with it.
The rollout measures I mentioned above would still have prevented the problem. Additionally, correct error state handling in the main driver would have solved this as well.
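Purely as illustration, here is what that defensive check might look like in user-mode Python - the real fix would live in C inside the kernel driver, and the "CHNL" magic header is something I invented, not the actual file format:

```python
# User-mode analogue of "validate the content file before acting on it".
# EXPECTED_MAGIC and the file layout are invented for illustration only.
EXPECTED_MAGIC = b"CHNL"

def load_content_file(path: str) -> bytes:
    with open(path, "rb") as f:
        data = f.read()
    if not data or data.count(0) == len(data):
        # A file full of null bytes, as described above, stops here.
        raise ValueError(f"{path} is empty or all nulls; refusing to load")
    if not data.startswith(EXPECTED_MAGIC):
        raise ValueError(f"{path} has no recognizable header; refusing to load")
    return data
```

The idea is simply that a failed check means "skip this file and log an error", not "crash the machine".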
However there is nothing Microsoft could have done here to prevent this.
Sorry for so much technical detail in here but I wanted to correct myself. I will not post any more technical details.