CrowdStrike’s faulty update crashed 8.5 million Windows devices, says Microsoft

BrikoX@lemmy.zip · 2 years ago


gravitas_deficiency@sh.itjust.works · 2 years ago

I feel like that’s not even close to what the real number is, considering the impact it had.

Godort@lemm.ee · 2 years ago

If this figure is accurate, the massive impact was likely due to collateral damage. If this took down every server at an enterprise but left most of the workstations online, those workstations were still basically paperweights.

Sami@lemmy.zip · edit-2 2 years ago

They have about 24,000 clients so that comes out to around 350 impacted machines per client which is reasonable. It only takes a few impacted machines for thousands of people to be impacted if they are important enough.
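For anyone who wants to check that estimate, here is a quick sketch of the arithmetic. It uses the 8.5 million figure from the headline and the rough 24,000 client count mentioned above; the variable names are just illustrative, and the result is an average, not a claim about any individual client.

```python
# Rough average of crashed machines per client, using the figures from this thread.
crashed_devices = 8_500_000  # Microsoft's estimate from the headline
clients = 24_000             # approximate client count quoted above

per_client = crashed_devices / clients
print(round(per_client))  # about 354, i.e. "around 350"
```

The average hides a lot: a handful of critical servers at one client can matter more than hundreds of idle workstations at another.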

SchmidtGenetics@lemmy.world · edit-2 2 years ago

My brother’s work uses VMs, so if the server is down there’s probably 50k computers right there. But it’s only 1 affected computer.

gravitas_deficiency@sh.itjust.works · 2 years ago

As far as I know, none of the OSes used for virtualization hosts at scale by any of the major cloud infra players are Windows.

Not to mention: any company that uses any AWS or Azure or GCP service is “using VMs” in one form or another (yes, I know I am hand-waving away the difference between VMs and containers). It’s basically what they build all of their other services on.

Godort@lemm.ee · 2 years ago

No, but Hyper-V is used extensively in the SMB space.

VMware is popular for a reason, but it’s also insanely expensive if you only need an AD server and a file share.

SchmidtGenetics@lemmy.world · 2 years ago

Banks use VMs, and banks were down because staff couldn’t access their systems to log in to the VM and work. They were bricked by extension.

gravitas_deficiency@sh.itjust.works · 2 years ago

No, the clients were bricked. The VMs themselves were probably fine - and in fact, probably auto-rollbacked the update to a working savepoint after the update failed (assuming the VM infrastructure was properly set up).

SchmidtGenetics@lemmy.world · edit-2 2 years ago

He couldn’t log in to the VM to access his work portals or emails. Call it what you will, but one bricked computer/server affected thousands.

It’s weird that you’re arguing, but asked how it was possible in the first place. VMs are the answer dude, argue all you want, but it’s making you look foolish for A not understanding, and B arguing against the answer. Also, why this one thread? Multiple other people told you the exact same thing. You just looking for an argument here or something?

ByteOnBikes@slrpnk.net · 2 years ago

I wonder if a large percentage of impact is internal facing systems.

And we won’t know until Monday.

biscuitswalrus@aussie.zone · 2 years ago

That’s how supply chains work. When a link in the chain is broken, the whole thing doesn’t work. Also, 10% of major companies being affected is still giant. But you’re here using online services, probably still buying bread, probably got fuel, probably playing video games. It was huge in the media, and it had massive effects, but there are heaps of things that just weren’t even touched. TV news networks, for example, seemingly kept going well enough to report on it non-stop, unaffected. Tbh though, any good continuity and disaster recovery plan should handle this: with impact, but with continuity.

remotelove@lemmy.ca · 2 years ago

The only companies I have seen with workable BCDR plans are banks, and that is because they handle money for rich people. It wouldn’t surprise me if many core banking systems are hyper-legacy as well.

I honestly think that a majority of our infrastructure didn’t collapse because of the lack of security controls and shitty patch management programs.

Sure. Compliance programs work for some aspects of business but since the advent of “the cloud”, BCDR plans have been a paperwork drill.

(There are probably some awesome places out there with quadruple-redundant networks with the ability to outlast a nuclear winter. I personally haven’t seen them though.)

biscuitswalrus@aussie.zone · 2 years ago

It’s impossible to tell and you’re probably closer to the truth than not.

One fact alone: BCDR isn’t an IT responsibility. Business continuity should be inclusive of things like: when your CNC machine no longer works, what do you do? Cause 1: power loss. Process: get the diesel generator backup running following that SOP. Cause 2: broken. Process: get the mechanic over, or get the warranty action item list. Rely on the SLA for maintenance. Cause 3: network connectivity. Process: use USB following SOP.
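To make the cause-and-process idea concrete, here is a minimal sketch of that CNC example as a lookup table. The names `CNC_RUNBOOK` and `response_for` are hypothetical, and the fallback message for an undocumented cause is my own assumption rather than part of any real plan.

```python
# Hypothetical runbook: maps a failure cause to its documented response.
CNC_RUNBOOK = {
    "power loss": "Get the diesel generator backup running following its SOP.",
    "broken": "Get the mechanic over, or work the warranty action item list; rely on the maintenance SLA.",
    "network connectivity": "Use USB following the SOP.",
}

# Assumed fallback for causes nobody has written a process for yet.
UNDOCUMENTED = "No documented process: escalate to the business owner."


def response_for(cause: str) -> str:
    """Return the documented process for a cause, or the fallback if none exists."""
    return CNC_RUNBOOK.get(cause.strip().lower(), UNDOCUMENTED)


print(response_for("Power loss"))
print(response_for("flood"))
```

The useful part of the exercise is the second call: every cause that lands on the fallback is a gap the business, not IT, needs to fill.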

I’ve been a part of a half dozen or more of these over time, which is not that many for over 200 companies I’ve supported.

I’ve even done simulations, round-table “Dungeons and Dragons” style, with a person running the simulation and different people having to follow the responsibilities in their documented process. Be it calling clients, customers and vendors, alerting their insurance, or posting to social media, all the way through to the warehouse manager using a Biro and a ruler to record incoming and outgoing stock by hand until systems are operational again.

So I only mention this because you talk about IT redundancy, but business continuity is not an IT responsibility, although it has a role. It’s a business responsibility.

That further kind of proves your point: anyone who’s worked a decade without being part of a simulation, or at least contributing to improving one, has probably worked at companies that don’t do them. Which isn’t their fault, but it’s an indicator of how fragile businesses are and how little they are held accountable for it.

remotelove@lemmy.ca · 2 years ago

You aren’t wrong about my description. My direct experience with compliance is limited to small/medium tech companies where IT is the business. As long as there is an alternate work location and tech redundancy, the business can chug along as usual. (Data centers are becoming more rare so cloud redundancy is more important than ever.) Of course, there is still quite a bit that needs to be done depending on the type of emergency, as you described: It’s just all IT, customer and partner centric.

Unfortunately, that does make compliance an IT function because a majority of the company is in some IT engineering function, less sales and marketing.

I can’t speak to companies in different industries, whereas you can. When physical products and manufacturing are at stake, that is way out of scope for what I could deal with.

biscuitswalrus@aussie.zone · 2 years ago

Hmm, yeah. Thanks for sharing. Because of 15 odd years of IT Managed Services, I only have non-technical companies on the brain and in my world view I hadn’t considered technology provider companies at all. They typically don’t need managed service providers (right or wrong :p).

remotelove@lemmy.ca · edit-2 2 years ago

It gets worse. Tech companies are service providers that typically work with a chain of other service providers. About 40%-50% of the controls for the last SOC2 audit I ran were carved out and deferred to our service providers. (Also, there are limited applicable frameworks: SOC2, PCI, ISO 27001, HIPAA and HITRUST are common for me, but usually related to cloud services.)

Yeah, I tend to break the brains of auditors that have never dealt with startups and have been used to Fortune 500 mega-companies. What’s funnier, is that I am just a lowly security engineer. A very experienced security engineer, but a lowly one nonetheless.

Auditor: So what is your documented process for this?

Me: Uhh, we don’t have one?

Auditor: What about when X or Y catastrophic issue happens?

Me: Anyone just pushes this button and activates that widget.

Auditor: Ok. Uh. Is that process documented?

Me: Nope. We probably do it about 2-3 times a week anyway.

biscuitswalrus@aussie.zone · 2 years ago

Yeah, we do a lot around frameworks at my current place, and previously we worked directly with customers on ISO and ACSC Essential Eight frameworks. For us, non-compliance = revenue opportunity. That means we are financially rewarded for aligning them and encouraged to do so. On that same note, I wrote up a checklist for “sysadmin best practices” aimed at driving reviews, checks and remedial opportunities for small businesses, useful in that space. I got an overwhelming amount of response in the MSP subreddit from people asking in DMs about it (not hundreds, just dozens, too many for me though). It’s quiet here on Lemmy. Happy to share my updated version of course, I just think if you’re dealing in your sector it’ll look like child’s play lol. But I kind of want to encourage a bit of community among professionals here. I just don’t want to spend time on it…

I feel you about the lowly experienced officer bit though. An account manager or business development manager, or even CTO won’t listen to me. I have a business degree, most of them don’t. I try to apply critical decision making in my solutions and risk advisory. But the words fall on deaf ears. I take a small but very guilty pleasure watching the very thing I warn against, happening both to clients and my employers. Especially when the prevention was trivial but all it needed was any amount of attention.

After nearly 20 years of IT and about 15 in MSP I’m so tired. I’m very much resonating with that “lowly engineer” comment.