• MagicShel@programming.dev
    arrow-up
    100
    ·
    edit-2
    4 months ago

    Respect? That’s sheer terror. If you’ve rebooted and the problem remains, you have an actual problem they need to troubleshoot.

    “Have you rebooted twice? Have you rebooted fifteen times?”

    - CrowdStrike Technical Support

      • Farid@startrek.website
        arrow-up
        14
        ·
        4 months ago

        As a tech enthusiast, this doesn’t surprise me one bit; one of my most used repair techniques is to do nothing and wait. A lot of problems just go away by themselves. But I’m still very curious how the 16th reboot was fixing that bluescreen.

        • mrlavallee@lemmy.world
          English
          arrow-up
          10
          ·
          4 months ago

          I think the reasoning was that there was a race condition between the code causing the bluescreen and the code updating to avoid the bluescreen, so rebooting 15 times would give the updater a lot of opportunities to win the race.
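
To put rough numbers on that reasoning, here is a minimal sketch. It assumes each boot is an independent race that the updater wins with some fixed probability, and that a single win is enough to fix the machine. The per-boot probabilities are made-up illustration values, not anything CrowdStrike published, and `chance_updater_wins` is a hypothetical helper name.

```python
def chance_updater_wins(p_per_boot: float, reboots: int) -> float:
    """Probability that the updater wins the race at least once.

    Assumes every boot is an independent trial with the same win
    probability, which is a simplification made for illustration.
    """
    if not 0.0 <= p_per_boot <= 1.0:
        raise ValueError("p_per_boot must be between 0 and 1")
    if reboots < 0:
        raise ValueError("reboots must be non-negative")
    # Chance of losing every single boot, subtracted from 1.
    return 1.0 - (1.0 - p_per_boot) ** reboots


if __name__ == "__main__":
    # Hypothetical per-boot win probabilities, chosen only to show the trend.
    for p in (0.05, 0.10, 0.25):
        print(f"p={p:.2f}: {chance_updater_wins(p, 15):.1%} after 15 reboots")
```

Under those assumptions, even a small per-boot chance adds up over 15 tries, which would be the logic behind asking for so many reboots.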

          • Farid@startrek.website
            arrow-up
            7
            ·
            4 months ago

            But if it was a race condition, then some computers would just boot normally. I didn’t see anyone report that the issue was happening selectively. And that wouldn’t even be a fix, just a one-off boot. Unless the file is removed, the issue will come back on the next reboot.

            • sugar_in_your_tea@sh.itjust.works
              arrow-up
              2
              arrow-down
              1
              ·
              4 months ago

              It’s probably one central server controlling access to the network or distributing images or something. So they need to reboot one machine in that cluster enough times and all of the machines in the cluster will work.

              The vulnerability broke every machine; the fix was the part that took multiple reboots to apply.

              • Farid@startrek.website
                arrow-up
                1
                ·
                4 months ago

                I’m not sure we are talking about the same issue. In the case of CrowdStrike, the update pushed a botched file that crashed the kernel on boot. Until the file was removed, the machine wouldn’t even boot to be patched.

                • sugar_in_your_tea@sh.itjust.works
                  arrow-up
                  2
                  ·
                  4 months ago

                  Yes, that’s what I’m talking about.

                  I’m saying that in production, the screens and whatnot probably aren’t fetching that file on boot; they’re probably pulling from some central server. So in the case of an airport, each of those screens is probably pulling images from a local server over PXE, and the server pulls the updates from CrowdStrike. So once you get the server and images patched, you just power cycle all of the devices on the network and they’re fixed.

                  So the impact would be a handful of servers in a local server rack, and then a remote power cycle. If they’re using PoE kiosks (which they should be using), it’s just a simple call to each of the switches to force them to re-PXE boot and pull down a new image. So you won’t see IT people running around the airport; they’ll be in the server room cycling servers and then sending power-cycle commands to each region of the airport.

        • variants
          English
          arrow-up
          4
          ·
          4 months ago

          The trick is to sleep. Have an issue? Take a nap and come back to find it working somehow.

      • sugar_in_your_tea@sh.itjust.works
        arrow-up
        2
        arrow-down
        1
        ·
        4 months ago

        Nah. The best solution is to shut down, unplug it, wait 10 seconds, plug it back in, then boot it back up. A good tech will say 30 seconds or even a minute, because they know you’re not going to count the full 10 seconds.

        The goal here is to clear the capacitors, most of which will drain within those 10 seconds.