Navigating IT outages: Lessons from CrowdStrike and Microsoft for ITSM excellence

AP Matthews
December 6, 2024

In today's digital age, the smooth operation of IT systems is paramount for businesses worldwide. An IT outage can bring operations to a grinding halt, impacting everything from travel to healthcare. The CrowdStrike and Microsoft outage of July 2024 underscored this reality: a faulty CrowdStrike software update disrupted IT systems globally, taking Windows machines offline and affecting critical infrastructure across industries.

The Ripple Effect of the CrowdStrike Outage

The incident, labeled "the largest IT outage in history," shut down critical infrastructure systems, forced the cancellation of more than 4,000 flights, and stalled financial and healthcare services. An estimated 8.5 million Windows devices were disabled, highlighting the severe consequences of downtime. According to Gartner, such downtime can cost companies up to $300,000 per hour, which is why robust ITSM strategies are needed to mitigate these impacts. Among companies surveyed, 98% were affected in some way, 86.75% experienced downtime, and 38.25% faced severe operational disruption lasting more than 24 hours.
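
To put the hourly figure in perspective, here is a minimal sketch of the arithmetic. It assumes the cost scales linearly with outage duration and treats the $300,000 rate as an upper bound; the function name and the sample durations are illustrative, not taken from any report.

```python
# Illustrative only: worst-case downtime cost from an hourly rate.
# The rate is the "up to" figure quoted above, so results are upper bounds.
HOURLY_COST_USD = 300_000


def downtime_cost_upper_bound(hours: float, hourly_cost: float = HOURLY_COST_USD) -> float:
    """Return the worst-case cost of an outage lasting `hours` (linear assumption)."""
    if hours < 0:
        raise ValueError("hours must be non-negative")
    return hours * hourly_cost


# Sample durations chosen for illustration.
for hours in (1, 8, 24):
    print(f"{hours:>2} h outage: up to ${downtime_cost_upper_bound(hours):,.0f}")
```

Under these assumptions, a disruption lasting a full 24 hours would come to as much as $7.2 million.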

Understanding the True Cost of Downtime

Downtime is not just a financial burden. It can damage a company's reputation, erode customer trust, and increase operational expenses. The CrowdStrike incident is a stark reminder of the importance of having safeguards in place to prevent such disruptions. Here are four major impacts of service disruption:
1. Negative Brand Perception: Service interruptions can frustrate customers, leading to a loss of trust and damaging the brand's integrity.
2. Decreased Customer Loyalty: Customers may switch to competitors, resulting in a loss of loyalty.
3. Increased Customer Service Demands: Major incidents create a backlog of complaints and queries, straining resources.
4. Operational Expenses: Service disruptions can lead to additional costs, such as compensating customers or paying for extra resources.

Key Strategies for Effective ITSM

1. Visibility, Insights, and Automation: IT teams need comprehensive visibility and actionable insights to manage and prevent outages. Monitoring tools provide full-stack observability, turning data into meaningful insights in real time.
2. Centralized, Real-Time Observability: Organizing recovery plans in a centralized location facilitates quick response and accountability, helping companies bounce back from outages efficiently.
3. Proactive Monitoring: While no system is immune to outages, proactive monitoring enables teams to perform preventative maintenance, addressing issues before they escalate.
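
As a rough illustration of what proactive monitoring can look like in practice, the sketch below tracks a rolling average of response times and raises a warning before the average reaches a critical level. The class name, window size, and thresholds are all hypothetical; real monitoring tools offer far richer signals.

```python
from collections import deque
from statistics import mean


class LatencyMonitor:
    """Hypothetical rolling-window monitor; thresholds are illustrative."""

    def __init__(self, window: int = 5, warn_ms: float = 500.0, critical_ms: float = 1000.0):
        self.samples = deque(maxlen=window)  # keeps only the most recent readings
        self.warn_ms = warn_ms
        self.critical_ms = critical_ms

    def record(self, latency_ms: float) -> str:
        """Add a reading and return 'ok', 'warn', or 'critical' for the current window."""
        self.samples.append(latency_ms)
        average = mean(self.samples)
        if average >= self.critical_ms:
            return "critical"
        if average >= self.warn_ms:
            return "warn"
        return "ok"


# Made-up readings that degrade over time.
monitor = LatencyMonitor(window=3)
readings = [200, 250, 700, 900, 1500, 1600]
statuses = [monitor.record(reading) for reading in readings]
print(statuses)
```

The "warn" state is the point of the exercise: it gives the team a window to investigate while the service is still degraded rather than down.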

Investments and Strategic Shifts Post-Outage

The CrowdStrike incident has catalyzed significant changes across organizations. Notably, 86% of enterprises plan to increase budgets for software development and new hires, with 27% increasing budgets by over 11%. The focus on quality assurance/testing (36.25%) and IT operations (34.25%) indicates a prioritization of operational resilience and quality control. Additionally, significant investments are planned in core technical capabilities, with hiring intentions for software developers (31.50%), DevOps engineers (31.00%), and security specialists (30.00%).
The incident also led to a reassessment of software update and patch management, with 29.75% implementing more rigorous testing and 29.5% adopting a more cautious approach. Furthermore, 83.25% of organizations are considering or implementing diversification of their software and service providers, reflecting a move towards reducing dependence on single vendors.
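
One way a more cautious approach to updates can be put into practice is a staged rollout, where an update reaches a small group of machines first and only proceeds if that group stays healthy. The sketch below is an assumption about how such a gate might work, not a description of what surveyed organizations actually do; the function, host names, and 5% failure limit are hypothetical.

```python
from typing import Callable, Sequence


def staged_rollout(
    rings: Sequence[Sequence[str]],
    apply_update: Callable[[str], bool],
    max_failure_rate: float = 0.05,
) -> list[str]:
    """Apply an update ring by ring and halt when a ring fails too often.

    Returns the hosts that were updated successfully.
    """
    updated: list[str] = []
    for ring in rings:
        if not ring:
            continue
        results = [(host, apply_update(host)) for host in ring]
        updated.extend(host for host, ok in results if ok)
        failure_rate = sum(1 for _, ok in results if not ok) / len(results)
        if failure_rate > max_failure_rate:
            break  # stop before the update reaches later, larger rings
    return updated


# Hypothetical hosts; the stand-in updater fails on one pilot machine.
rings = [["canary-1"], ["pilot-1", "pilot-2"], ["prod-1", "prod-2", "prod-3"]]
updated = staged_rollout(rings, apply_update=lambda host: host != "pilot-2")
print(updated)
```

In this example the failure in the pilot ring halts the rollout, so none of the production hosts receive the update.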

Leveraging Atlassian Tools for ITSM Excellence

Adaptavist offers solutions to minimize service disruption impacts using tools like Atlassian’s Statuspage and Jira Service Management (JSM). These tools support seamless communication during incidents, reducing downtime and enhancing service delivery. Statuspage provides real-time status updates, helping maintain customer trust, while JSM integrates with Jira to streamline incident management and prioritize critical issues.

Conclusion: Charting a Resilient Future

The CrowdStrike outage serves as a powerful lesson in the importance of robust ITSM practices. By adopting proactive strategies and leveraging the right tools, businesses can mitigate the impacts of service disruptions and build a resilient future. The incident has also driven substantial improvements: 80.75% of organizations report a positive impact on software development practices and 80% on cybersecurity awareness, underscoring the potential for positive transformation in response to challenges.
For more information on managing service outages effectively and integrating JSM with Jira, get in touch with our experts today.
Written by
AP Matthews
Principal Marketing Manager, Strategic Solutions