Global Impact of a 1% Windows Outage: Insights from the CrowdStrike Issue and the Benefits of Managed Desktop as a Service

· 9 min read
Ruben Spruijt

In today's digitally interconnected world, even a seemingly small security update can have catastrophic consequences. The recent incident involving a problematic update from CrowdStrike, a well-known cybersecurity vendor, highlights this fact. The update affected less than one percent of Windows devices, yet the implications were massive and global.

In this blog post you will learn more about the CrowdStrike update incident, the lessons learned, and future precautions from Dizzion, a major Desktop as a Service (DaaS) vendor. We will explore why such an outage had a massive global impact and how DaaS solutions, including those with a full-service managed option like Dizzion's, can ensure faster recovery with minimal impact on operations.

The CrowdStrike Incident

Microsoft estimates that CrowdStrike's update affected 8.5 million Windows devices, or less than one percent of all Windows machines. Despite this seemingly small percentage, the impact was significant because so many business systems run Windows. Cybersecurity incidents like this one are often like icebergs: you see only the tip, while far more vulnerability hides beneath the surface. Many more customers experienced issues that were not publicly reported; companies often claim they are down for "maintenance" to soften the impact of this kind of problem and make it seem routine. For some, the CrowdStrike update led to the infamous Microsoft Blue Screen of Death (BSOD) on Windows PCs and servers, which often does not clear without manual intervention. These crashes disrupted operations in airlines, banks, retail and healthcare. The crash affected Windows only, not Mac or Linux PCs or servers. However, *nix systems are not impervious to unplanned crashes: similar issues were reported a couple of months earlier, affecting Debian and Rocky Linux machines.

While CrowdStrike issued an update shortly after the issue was discovered, mitigation was not always automatic and typically required manual intervention by IT professionals to recover from the BSOD and reboot the Windows machine. Companies managing CrowdStrike themselves relied on their own IT departments to mitigate the issue. For companies using a managed service, like Dizzion DaaS, the service provider took care of the issue, leveraging 24x7 operations to respond immediately. For example, for Dizzion Managed DaaS customers in the US, the issue was detected and resolved by Dizzion in the early morning hours, before most US businesses started their day on Friday, July 19.

Manually rebooting Windows at United baggage claim - SFO

Encryption, Modern Device Management and Device Proliferation

The proliferation of both BYOD and corporate devices, along with their varied locations, has significantly complicated the IT infrastructure landscape. Two trends have made this worse: the post-COVID-19 shift to remote work and the growing reliance on cloud services. Companies may have thousands of remote locations, each adding layers of device management complexity. Devices on home networks often aren't connected to the corporate domain and may require a VPN before IT can manage them, including managing device encryption solutions such as BitLocker. This lack of connectivity puts many devices at risk: end-users cannot access them if issues arise, and in some cases they can't even send an email to open an IT support ticket. Failures during device boot and subsequent boot loops are particularly challenging because they require physical intervention. Managing this complexity is key, and the need for skilled IT professionals is evident, especially with labor cuts and outsourcing. Modern workspace solutions such as virtualized applications and Desktop as a Service can address these challenges efficiently, since management is centralized and secure access to applications and desktops is device independent. Any device with a browser can securely access corporate applications.

Social Media Post: "all hands-on Deck... that's just 120 of 2000 laptops"

How can DaaS solutions address today's security and availability challenges?

Desktop as a Service is a cloud computing solution that delivers virtual desktops and applications to end-users over the internet, enabling centralized management and scalability. Dizzion DaaS solutions allow users to access their virtual desktops from any device with a browser, ensuring flexibility and efficiency in IT operations. DaaS solutions can significantly improve recovery efforts during disruptions like the recent CrowdStrike incident. Here are some of the key advantages of using DaaS:

Central Image Deployment and Management

DaaS enables centralized management and deployment of virtual desktops and applications, simplifying the maintenance of operating systems, applications and (3rd party) security solutions such as CrowdStrike and Microsoft Defender for Endpoint. This centralization reduces the time and effort required to address issues across multiple endpoints, ensuring a swift response to any OS, application or (3rd party) security application problems.

Rapid Deployment

DaaS facilitates the rapid deployment of virtual desktops and applications. This capability minimizes downtime and enables employees to resume work quickly, which is crucial during recovery periods.

Scalability and Multi-Cloud

Modern DaaS solutions such as Dizzion Frame allow organizations to scale resources up or down based on demand. This scalability ensures that critical services remain available even during peak recovery periods, helping businesses maintain operations without significant interruptions. Multiple regions and multiple cloud providers can be used to ensure availability and capacity.

Good Practices for Non-Persistent Images

Disabling automatic updates for non-persistent machines is a leading practice in DaaS environments. If end-users encounter issues on a non-persistent machine, rebooting it resets it to a known good state. This approach can be particularly useful in mitigating the effects of problematic updates like the one from CrowdStrike.

Diverse Endpoints

DaaS supports various endpoint devices, including Windows, macOS and Linux PCs, thin clients (HP, IGEL, Unicon, Stratodesk, 10ZiG), Chrome OS devices, or simply an HTML5 browser on any device. With no data stored on the endpoint, it simply serves as a gateway to virtual desktops and applications, adding an extra layer of resilience against Windows-specific corporate issues.

How Dizzion Ensures Swift Mitigation and Recovery

The primary lesson from this incident for businesses is the necessity for clear communications, robust preparedness and efficient, fast recovery mechanisms. IT disruptions of this scale underscore the importance of having powerful and proven recovery protocols in place to minimize downtime and maintain business continuity. Dizzion demonstrated the benefits of using a fully managed DaaS service during the CrowdStrike incident. Many of our customers use CrowdStrike in their Dizzion DaaS environments, in sectors including government, retail, technology, and education. For these customers, Dizzion's managed service offering ensured swift mitigation and recovery.

While CrowdStrike issued an update shortly after the problematic one, mitigation was not automatic and often required manual intervention. For our Managed customers using CrowdStrike, Dizzion was able to detect, manage, and correct the issue before the workday on Friday began. As a result, Dizzion's customers in the US did not experience significant issues, highlighting the value of a fully managed DaaS service.

Lessons worth sharing from Dizzion - a major DaaS provider

  • Controlled Updates: Avoid automatically approving and deploying Windows operating system or (3rd-party) application updates, such as those from CrowdStrike, without validation. Control and stability are crucial for every business, though this must be balanced with the need for timely updates to minimize security risks.

  • Backup, easy rollback and recovery: Implementing systematic backup and rollback capabilities can significantly aid in recovery during outages caused by problematic updates. Dizzion DaaS provides capabilities to manually or automatically schedule backups of non-persistent desktops, golden images, personal desktops and utility servers.

  • Robust Incident Response: When bad things happen, there needs to be a process implemented immediately that handles communication, remediation, and recovery in the least amount of time. Incident response teams are on standby, ready to quickly respond to any issue, ensuring coordinated efforts, identifying root causes, and prioritizing recovery to resume business as usual swiftly.

  • High Availability and Multi-Cloud Configurations: The Dizzion DaaS control plane is highly available with built-in platform redundancy, and customers can run workload VMs across regions and infrastructures, including Azure, AWS, GCP, IBM Cloud and Nutanix AHV, to maintain accessibility during hardware or software failures.

  • Global Managed Service: Dizzion customers and partners benefit from a global support organization with extensive experience in handling enterprise clients and international deployments. For example, when issues arose in EU deployments, lessons learned were swiftly applied to mitigate similar problems for US customers.
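
The "Controlled Updates" and "easy rollback" lessons above can be illustrated with a staged (ring-based) rollout. The Python below is a minimal sketch under stated assumptions, not Dizzion's or CrowdStrike's actual tooling: the Ring class, the staged_rollout function, the machine names, and the apply_update, is_healthy and rollback callbacks are all hypothetical placeholders for whatever deployment and monitoring tools an organization really uses.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Ring:
    """Hypothetical group of machines that receives an update together."""
    name: str
    machines: List[str]


def staged_rollout(
    rings: List[Ring],
    apply_update: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    rollback: Callable[[str], None],
    max_failure_rate: float = 0.0,
) -> Dict[str, str]:
    """Deploy ring by ring; roll back and stop at the first ring that fails.

    Returns one status per ring: "deployed", "rolled_back" or "skipped".
    """
    status = {ring.name: "skipped" for ring in rings}
    for ring in rings:
        for machine in ring.machines:
            apply_update(machine)
        # Validate the whole ring before touching the next, larger one.
        failed = [m for m in ring.machines if not is_healthy(m)]
        failure_rate = len(failed) / len(ring.machines) if ring.machines else 0.0
        if failure_rate > max_failure_rate:
            for machine in ring.machines:
                rollback(machine)
            status[ring.name] = "rolled_back"
            break  # later rings never receive the bad update
        status[ring.name] = "deployed"
    return status


def demo() -> Dict[str, str]:
    """Simulate an update that crashes one machine in the pilot ring."""
    updated = set()
    crashing = {"vm-04"}  # assumed: this machine fails its health check
    rings = [
        Ring("canary", ["vm-01", "vm-02"]),
        Ring("pilot", ["vm-03", "vm-04"]),
        Ring("broad", ["vm-05", "vm-06", "vm-07"]),
    ]
    return staged_rollout(
        rings,
        apply_update=updated.add,
        is_healthy=lambda machine: machine not in crashing,
        rollback=updated.discard,
    )


if __name__ == "__main__":
    print(demo())
```

In this simulation the canary ring is deployed, the pilot ring is rolled back, and the broad ring is never touched. The trade-off is the one named in the first bullet: each validation step delays the update, so the ring sizes and failure threshold have to be weighed against the need for timely security fixes.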

Conclusion

The recent CrowdStrike update incident serves as a powerful reminder of the critical need for robust IT recovery protocols, balanced update procedures and clear communication strategies. Even a small disruption, affecting less than one percent of Windows PCs and Servers, can have widespread consequences across various sectors. This underscores the importance of deploying robust systems and being prepared with efficient recovery mechanisms to minimize downtime and maintain business continuity.

DaaS solutions, such as those offered by Dizzion, play a crucial role in addressing these challenges. With customer-focused global support, mature incident response processes, industry expertise, centralized image management, rapid re-deployment, and scalable resources, Dizzion Managed ensures immediate mitigation and recovery during IT disruptions. Dizzion's managed service demonstrated its value by efficiently handling the CrowdStrike incident, ensuring that our customers experienced minimal impact.

Collective Wisdom: The Value of Sharing Information

The information below is helpful for CrowdStrike and mitigating potential security related issues:

Thanks for reading; if you have any questions, don't hesitate to get in touch with me!

Ruben Spruijt
Field CTO, Dizzion
ruben.spruijt@dizzion.com
@rspruijt

About the Author

Ruben Spruijt
Ruben Spruijt is an accomplished Field Chief Technology Officer (CTO) specializing in End User Computing (EUC). In this influential role, Ruben contributes to company and product strategy, alliances, analyzes EUC technology trends, provides product and industry insights to fellow (executive) colleagues, and establishes and leads vibrant communities of customers, partners, and ecosystem partners. Ruben is a Microsoft Most Valuable Professional (MVP), NVIDIA GRID Community Advisor, and was in the Citrix Technical Professional (CTP) program and VMware vExpert for many years. He is based in the Netherlands where he lives with his wife and three kids. This tough mudder travels the world spreading tokens of knowledge hidden in stroopwafel from the land of nether. Everywhere he travels, he shares information and sprouts understanding. He frames his experience in End User Computing so that others can learn the root of the technology, and what is most important in life.