This is an unpopular opinion, and I get why – people crave a scapegoat. CrowdStrike undeniably pushed a faulty update demanding a low-level fix (booting into recovery). However, this incident lays bare the fragility of corporate IT, particularly for companies entrusted with vast amounts of sensitive personal information.

Robust disaster recovery plans, including automated processes to remotely reboot and remediate thousands of machines, aren’t revolutionary. They’re basic hygiene, especially when considering the potential consequences of a breach. Yet, this incident highlights a systemic failure across many organizations. While CrowdStrike erred, the real culprit is a culture of shortcuts and misplaced priorities within corporate IT.

Too often, companies throw millions at vendor contracts, lured by flashy promises while neglecting the due diligence necessary to ensure those solutions truly fit their needs. This is exacerbated by a corporate culture where CEOs, vice presidents, and managers are often more easily swayed by vendor kickbacks, gifts, and lavish trips than by innovative ideas with measurable outcomes.

This misguided approach not only results in bloated IT budgets but also leaves companies vulnerable to precisely the kind of disruptions caused by the CrowdStrike incident. When decision-makers prioritize personal gain over the long-term health and security of their IT infrastructure, it’s ultimately the customers and their data that suffer.

  • breakingcups@lemmy.world
    English
    arrow-up
    6
    ·
    2 months ago

    Please, enlighten me how you’d remotely service a few thousand BitLocker-locked machines that won’t boot far enough to get an internet connection, with non-tech-savvy users behind them. Pray tell what common “basic hygiene” practices would’ve helped, especially with CrowdStrike reportedly ignoring and bypassing the rollout policies set by their customers.

    Not saying the rest of your post is wrong, but this stood out as easily glossed over.

    • LrdThndr@lemmy.world
      English
      arrow-up
      1
      ·
      edit-2
      2 months ago

      A decade ago I worked for a regional chain of gyms with locations in 4 states.

      I was in TN. When a system would go down in SC or NC, we originally had three options:

      1. (The most common) have them put it in a box and ship it to me.
      2. I go there and fix it (rare)
      3. I walk them through fixing it over the phone (fuck my life)

      I got sick of this. So I researched options and found an open source software solution called FOG. I ran a server in our office and shipped each club a little OptiPlex 160 running a software client. Then each machine at each club was configured to PXE boot from the FOG client.

      The server contained images of every machine we commonly used. I could tell FOG which locations used which models, and it would keep the images cached on the client machines.

      If everything was okay, it would chain the boot to the OS on the machine. But I could flag a machine for reimage, and at next boot the machine would check in with the local FOG client via PXE and get a complete reimage from premade images on the FOG server.

      The corporate office was physically connected to one of the clubs, so I trialed the software at our adjacent club, and when it worked great, I rolled it out company wide. It was a massive success.

      So yes, I could completely reimage a computer from hundreds of miles away by clicking a few checkboxes on my computer. Since it ran in PXE, the condition of the OS didn’t matter at all. It never loaded the OS when it was flagged for reimage. It would even join the computer to the domain and set up that location’s printers and everything. All I had to tell the low-tech gymbro sales guy on the phone to do was reboot it.

      This was free software. It saved us thousands in shipping fees alone. And brought our time to fix down from days to minutes.

      There ARE options out there.
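A rough sketch of the boot-time decision described above: every machine checks in over PXE, and the server either chains to the local OS or hands back an image to write. This is a toy model only, not FOG's actual API; `ImagingServer`, `Machine`, the action strings, and the image names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Machine:
    mac: str
    model: str
    location: str
    reimage_flagged: bool = False


@dataclass
class ImagingServer:
    """Toy stand-in for an imaging server (hypothetical, not FOG's real interface)."""
    images: dict[str, str]  # hardware model -> premade image name
    machines: dict[str, Machine] = field(default_factory=dict)

    def register(self, machine: Machine) -> None:
        self.machines[machine.mac] = machine

    def flag_for_reimage(self, mac: str) -> None:
        # The "click a checkbox" step done by the admin.
        self.machines[mac].reimage_flagged = True

    def on_pxe_checkin(self, mac: str) -> str:
        """Decide what a machine does at boot, before any local OS loads."""
        machine = self.machines.get(mac)
        if machine is None or not machine.reimage_flagged:
            # Normal case: hand control to the OS already on disk.
            return "chainload-local-os"
        image = self.images.get(machine.model)
        if image is None:
            # No premade image for this model; leave the flag set for follow-up.
            return f"no-image:{machine.model}"
        machine.reimage_flagged = False  # one-shot: clear the flag once served
        return f"reimage:{image}"
```

The point of the design is that the decision happens before the installed OS is touched, so a machine that cannot boot its OS can still be told to reimage on the next reboot.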

      • magikmw@lemm.ee
        English
        arrow-up
        0
        ·
        edit-2
        2 months ago

        This works great for stationary PCs and local servers, but does nothing for laptops in users’ hands that connect over the public internet.

        The only fix here is staggered and tested updates, and apparently this update bypassed even the deferred update settings that CrowdStrike themselves put into their software.

        The only winning move here was to not use CrowdStrike.

        • John Richard@lemmy.worldOP
          English
          arrow-up
          0
          ·
          2 months ago

          Almost all computers can be set to PXE boot, and work laptops usually have even more advanced remote management capabilities. You ask the employee to reboot the laptop and presto!

          • magikmw@lemm.ee
            English
            arrow-up
            1
            ·
            2 months ago

            I wonder how you’re supposed to get PXE boot to work securely over the internet. And how does that help when the affected disk is still encrypted and needs unusual intervention to fix, including admin access to system files?

            I’ve been doing this for a while, and I like creative solutions, so I wonder about those issues a lot. Not much comes to mind besides recalling all the laptops and fixing them one by one.

    • Riskable@programming.dev
      English
      arrow-up
      1
      ·
      edit-2
      2 months ago

      what common “basic hygiene” practices would’ve helped

      Not using a proprietary, unvetted, auto-updating, 3rd party kernel module in essential systems would be a good start.

      Back in the day, companies used to insist upon access to the source code for such things, along with regular 3rd party code audits, but these days companies are cheap and lazy and don’t care as much. They’d rather just invest in “security incident insurance” and hope for the best 🤷

      Sometimes they don’t even go that far and instead just insist upon useless indemnification clauses in software licenses. …and yes, they’re useless:

      https://www.nolo.com/legal-encyclopedia/indemnification-provisions-contracts.html#:~:text=Courts have commonly held that,knowledge of the relevant circumstances).

      (Important part indicating why they’re useless should be highlighted)

    • ramble81@lemm.ee
      English
      arrow-up
      0
      ·
      2 months ago

      You’d have to have something even lower level, like an OOB KVM on every workstation, which would be stupid expensive for the ROI, or something at the UEFI layer that could potentially introduce more security holes.

      • circuscritic@lemmy.ca
        English
        arrow-up
        1
        ·
        edit-2
        2 months ago

        …you don’t have OOBM on every single networked device and terminal? Have you never heard of the buddy system?

        You should probably start writing up an RFP. I’d suggest you also consider doubling up on the company issued phones per user.

        If they already have an ATT phone, get them a Verizon one as well, or vice versa.

        At my company we’re already way past that. We’re actually starting to import workers to provide human OOBM.

        You don’t answer my call? I’ll just text the migrant worker we chained to your leg to flick your ear until you pick up.

        Maybe that sounds extreme, but guess whose company wasn’t impacted by the CrowdStrike outage.

        • ramble81@lemm.ee
          English
          arrow-up
          1
          ·
          2 months ago

          I didn’t say it was, nor did I say UEFI was the problem. My point was that additional applications or extensions at the UEFI layer increase the attack footprint of a system. Just like vPro, you’re giving hackers a method that can compromise a system below the OS. Add that to laptops and computers that get plugged in at random places before VPNs and other security software are loaded, and you have a nice recipe for hidden spyware and such.

    • Howdy@lemmy.zip
      English
      arrow-up
      0
      ·
      edit-2
      2 months ago

      Was a Windows sysadmin for a decade. We had thousands of machines under endpoint management with BitLocker encryption. (I have since moved on to more cloud Kubernetes DevOps work.) There isn’t any basic “hygiene” solution on a remote endpoint that could fix this mess remotely and automatically. I guess Intel’s BIOS remote connection (forget the name) could in theory allow at least some poor tech to remote in, given there is an internet connection and the company paid the exorbitant price.

      All that to say, anything that stops end-user machines from booting is a nightmare. And with BitLocker it’s even more complicated. (Hope your BitLocker key synced… Lol).
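On the "hope your BitLocker key synced" point, one thing that can be checked ahead of time is whether every encrypted machine actually has a recovery key on file. A minimal sketch, assuming you can export two lists from your own tooling: an inventory of hosts with their encryption status, and the set of hosts with an escrowed key. The function name and data shapes are made up for illustration and do not correspond to any real management API.

```python
def missing_recovery_keys(inventory: dict[str, bool], escrowed: set[str]) -> list[str]:
    """Return encrypted hosts that have no recovery key on file.

    inventory: hostname -> True if the disk is encrypted (hypothetical export).
    escrowed:  hostnames whose recovery key was found in escrow (hypothetical export).
    """
    return sorted(
        host
        for host, encrypted in inventory.items()
        if encrypted and host not in escrowed
    )


if __name__ == "__main__":
    inventory = {"lt-001": True, "lt-002": True, "kiosk-01": False}
    escrowed = {"lt-001"}
    # Hosts listed here would need hands-on recovery if they stopped booting.
    print(missing_recovery_keys(inventory, escrowed))
```

This doesn't fix a machine that won't boot, but it tells you before an incident which ones can't be unlocked by a tech reading a key over the phone.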

      • LrdThndr@lemmy.world
        English
        arrow-up
        0
        ·
        2 months ago

        Bro. PXE boot image servers. You can remotely image machines from hundreds of miles away with a few clicks and all it takes on the other end is a reboot.

        • wizardbeard@lemmy.dbzer0.com
          English
          arrow-up
          0
          ·
          2 months ago

          With a few clicks and being connected to the company network. Leaving anyone not able to reach an office location SOL.

          • LrdThndr@lemmy.world
            English
            arrow-up
            1
            ·
            2 months ago

            Hey, it’s not perfect, but a fix that gets you 10% of the way there is still 10% you don’t have to do by hand. Don’t let perfect be the enemy of good, my man.

    • irotsoma@lemmy.world
      English
      arrow-up
      1
      ·
      2 months ago

      The bloat isn’t for workers, otherwise there’d be enough people to go reboot the machines and fix the issue manually in a reasonable amount of time. It’s only for executives, managers, and contracts with kickbacks. In fact, they usually buy software because it promises to cut the need for people, and it becomes an excuse for layoffs or for eliminating new hire positions.

    • GiveMemes@jlai.lu
      English
      arrow-up
      1
      ·
      2 months ago

      As the post was stating, they get bloated by relying on vendors rather than in-house IT/Security.

      My grandfather works IT for my state government tho and it’s a pretty good gig according to him

  • AnAmericanPotato@programming.dev
    English
    arrow-up
    1
    ·
    2 months ago

    This doesn’t seem to be a problem with disaster recovery plans. It is perfectly reasonable for disaster recovery to take several hours, or even days. As far as DR goes, this was easy. It did not generally require rebuilding systems from backups.

    In a sane world, no single party would even have the technical capability of causing a global disaster like this. But executives have been tripping over themselves for the past decade to outsource all their shit to centralized third parties so they can lay off expensive IT staff. They have no control over their infrastructure, their data, or, by extension, their business.

  • r00ty@kbin.life
    arrow-up
    0
    ·
    2 months ago

    I think it’s most likely a little of both. The fact that most systems failed at around the same time suggests that this was the default automatic upgrade/deployment option.

    So, for sure the default option should have had upgrades staggered within an organisation. But at the same time organisations should have been ensuring they aren’t upgrading everything at once.

    As it is, the way the upgrade was deployed made the software a single point of failure that completely negated redundancies and in many cases hobbled disaster recovery plans.
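For what "staggered within an organisation" could look like in practice, here is a minimal sketch: hosts are hashed into stable rings, and the rollout stops at the first ring that reports an unhealthy host. The `deploy` and `healthy` callbacks are placeholders you would supply; nothing here reflects how CrowdStrike or any other vendor actually ships updates.

```python
import hashlib
from typing import Callable, Iterable


def assign_ring(hostname: str, ring_count: int = 4) -> int:
    """Map a hostname to a stable ring index in [0, ring_count)."""
    digest = hashlib.sha256(hostname.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % ring_count


def staged_rollout(
    hosts: Iterable[str],
    deploy: Callable[[str], None],
    healthy: Callable[[str], bool],
    ring_count: int = 4,
) -> list[str]:
    """Deploy ring by ring and halt after the first ring with an unhealthy host.

    Returns the hosts that received the update.
    """
    rings: list[list[str]] = [[] for _ in range(ring_count)]
    for host in sorted(hosts):
        rings[assign_ring(host, ring_count)].append(host)

    updated: list[str] = []
    for ring in rings:
        for host in ring:
            deploy(host)
            updated.append(host)
        # Gate: later rings only proceed if this whole ring came back healthy.
        if not all(healthy(host) for host in ring):
            break
    return updated
```

With a gate like this, a bad update takes out roughly one ring rather than everything at once, which is the redundancy the single simultaneous push removed.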

    • DesertCreosote@lemm.ee
      English
      arrow-up
      1
      ·
      2 months ago

      Speaking as someone who manages CrowdStrike in my company, we do stagger updates and turn off all the automatic things we can.

      This channel file update wasn’t something we can turn off or control. It’s handled by CrowdStrike themselves, and we confirmed that in discussions with our TAM and account manager at CrowdStrike while we were working on remediation.

  • TechNerdWizard42@lemmy.world
    English
    arrow-up
    0
    arrow-down
    1
    ·
    2 months ago

    Issue is definitely corporate greed outsourcing issues to a mega monolith IT company.

    Most IT departments are idiots now. Even 15 years ago, those were the smartest nerds in most buildings. They had to know how to do it all. Now it’s just installing the corporate overlord software and the bullshit spyware. When something goes wrong, you call the vendor’s support line. That’s not IT, you’ve just outsourced all your brains to a monolith that can go down at any time.

    None of my servers running Windows went down. None of my infrastructure. None of the infrastructure I manage as side hustles.