Lessons From The CrowdStrike Incident

7–22–2024 (Monday)

Hello and welcome to The Intentional Brief - your weekly video update on the one big thing in cybersecurity for growth stage companies, investors, and management teams.

I’m your host, Shay Colson, Managing Partner at Intentional Cybersecurity, and you can find us online at intentionalcyber.com.

Today is Monday, July 22, 2024, and here’s hoping that if you’re a CrowdStrike and Windows customer, you survived the response process and most of your computers are back to normal operation.

Instead of focusing on the event itself, we’re going to look this week at some lessons we can all learn from these events, whether we were impacted or not.

Lessons From the CrowdStrike Incident

By way of brief background, Endpoint Detection and Response (EDR) vendor CrowdStrike pushed an update on Friday morning that caused computers running Microsoft Windows to crash and be unable to boot back up without manual intervention.

This caused what is being called the largest IT outage in history - and rightly so. Because of the nature of the update, only Windows machines were impacted, so Linux and Mac users were spared, as were non-CrowdStrike customers. Technical root cause reports are already available, though they’re not much help to end users or system admins.

Microsoft, for their part, has tried to distance themselves from the event - despite the fact that it resulted in the traditional Blue Screen of Death - and noted that “this was not a Microsoft incident” and the incident impacted “less than one percent of all Windows machines” worldwide.

It’s also worth noting that CrowdStrike had another issue recently where their software caused computers to max out their processors - which was relatively quickly remedied, but does raise questions about their testing, release procedures, and Quality Assurance. Senators on the Committee on Commerce, Science, and Transportation have already called for an explanation, and I’m sure we’ll get dribs and drabs of news over the coming weeks.

So, given all this, what lessons should we take away from this incident?

First of all, I think it’s worth acknowledging that it was, in fact, an incident - and that many organizations had to invoke their Incident Response, Business Continuity, and Disaster Recovery plans. Microsoft used the word “incident” in their press release, and though the CrowdStrike CEO was sure to emphasize that it wasn’t a security incident - I’d disagree.

Traditional security theory offers what’s known as the CIA triad: confidentiality, integrity, and availability. This incident struck at the core of the availability pillar, and put many companies in a difficult position.

So, while this wasn’t a ransomware attack, and didn’t involve data loss, it absolutely had an impact on businesses and should be treated as such. I would encourage any business that had to invoke its contingency plans to perform a full after-action review and capture lessons learned for future events. Those future events might look like this one again, or they could be a storm or construction crew knocking out your Internet connection and taking entire offices down, or Microsoft or Google suffering a major outage that impacts email, and so on. Understanding how your business can continue to operate in a contingency mode is critical.

Speaking of future events, I know there are lots of companies second-guessing their choice to buy CrowdStrike, and other EDR vendors looking to come in and steal market share. I won’t make prognostications either way, but I will say that I have to imagine CrowdStrike is going to have by far the most effort, energy, and attention paid to their product deployments for the foreseeable future. That’s not to say they won’t have another incident, but I suspect they’ll do everything they can to avoid one.

Beyond that, I think it’s a good reminder of a couple of things that are happening - perhaps subtly - in this space. First of all, consolidation in the technology sector is going to leave us with fewer and fewer choices, which has the potential to exacerbate issues like this.

Secondly, the complexity and interoperability of these systems are difficult for individuals, and even teams, to understand and manage. A good piece in The Atlantic highlighted this, and is worth reading.

Finally, I think it’s worth remembering that everything in security is a tradeoff - and so while we do find ourselves in the uncomfortable position of having an enterprise security tool cause a global security incident, we need to balance that risk against the similar risk of running any other endpoint security tool, as well as the risk of running without these tools in place to defend against a myriad of threats.

Fundraising

From a fundraising perspective, a light week with only $4.1B in newly committed capital.

We’ll have to see whether the rumored Wiz acquisition is impacted by the CrowdStrike incident (antitrust, and such), but another big tech firm, Smartsheet, is reportedly considering a buyout to go private as they wrestle with continued losses from a cash flow perspective.

You can find links to all the articles we covered below, find back issues of these videos and the written transcripts at intentionalcyber.com, and we’ll see you next week for another edition of the Intentional Brief.

Links

https://techcrunch.com/2024/07/19/what-we-know-about-crowdstrikes-update-fail-thats-causing-global-outages-and-travel-chaos/

https://blogs.microsoft.com/blog/2024/07/20/helping-our-customers-through-the-crowdstrike-outage/

https://www.reuters.com/technology/collaboration-software-maker-smartsheet-fields-buyout-interest-sources-say-2024-07-18/

https://www.thestack.technology/crowdstrike-bug-maxes-out-100-of-cpu-requires-windows-reboots/

https://subscriber.politicopro.com/f/?id=00000190-d742-d2ff-a7de-d747ca600000

https://www.theatlantic.com/ideas/archive/2024/07/microsoft-outage-technological-systems-fail/679110/
