
 

 

July 23, 2024

The CrowdStrike crash and after

Crisil's services and preventive measures for clients

by Arul Yagappan, Head of Non-Financial Risk Solutions, Crisil Integral IQ

 

On July 19, 2024, a flawed software update by CrowdStrike Holdings, Inc., led to significant information technology system outages across organisations worldwide that rely on the company’s Falcon endpoint security products.

 

Companies scrambled through the day to restore operations and secure data.

 

The snafu underscores critical vulnerabilities in the deployment and management of cybersecurity solutions, and serves as a stark reminder of the complexities and risks associated with rolling out updates to essential security systems.

 

We analyse the event and its implications here:


1. Importance of rigorous testing

 

The failure highlights the need for extensive pre-deployment testing. Organisations must ensure updates are rigorously evaluated in controlled environments that simulate production settings, so that potential issues are identified before they affect critical systems.


2. Phased rollouts and risk mitigation

 

Implementing phased rollouts can mitigate the risk of widespread outages. By deploying updates gradually and monitoring their impact at each stage, organisations can catch and address problems early, preventing a single faulty update from cascading into a larger outage.


3. Vendor communication and support

 

The incident emphasises the importance of robust communication channels between cybersecurity vendors and their clients. Prompt, clear and effective communication from vendors such as CrowdStrike is crucial during a crisis to guide customers through remediation steps and provide timely updates on fixes.


Recommendations and our services

 

To build resilience against similar incidents in the future, firms should adopt the following strategies. We can help organisations mitigate the impact of such disruptions through:

 

A. Rigorous testing of updates

 

  • Pre-deployment testing: Perform extensive pre-deployment testing in a controlled environment to identify potential issues before the updates are rolled out to production systems (Security Week¹).
  • Simulated environments: Use simulated environments that mirror the production environment to detect any adverse effects of the updates.

 

B. Phased rollouts

 

  • Gradual deployment: Implement phased rollouts of the updates, rather than deploying them on all systems simultaneously. This approach allows issues to be detected and resolved with minimal impact (CrowdStrike²).
  • Pilot groups: Use pilot groups to evaluate the updates on a small subset of systems before a full-scale deployment.
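
The gradual-deployment and pilot-group ideas above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of any vendor's actual update mechanism: the function name `phased_rollout`, the callbacks `apply_update` and `is_healthy`, the stage sizes and the failure threshold are all hypothetical placeholders that an organisation would replace with its own tooling and risk appetite.

```python
from typing import Callable, Dict, List, Sequence


def phased_rollout(
    hosts: Sequence[str],
    apply_update: Callable[[str], None],   # hypothetical: pushes the update to one host
    is_healthy: Callable[[str], bool],     # hypothetical: post-update health check
    stage_percents: Sequence[int] = (1, 10, 50, 100),  # assumed cumulative stage sizes
    max_failure_rate: float = 0.02,        # assumed tolerance per stage
) -> Dict[str, object]:
    """Update hosts in growing stages and halt if a stage looks unhealthy."""
    updated: List[str] = []
    for percent in stage_percents:
        # Cumulative number of hosts that should be updated once this stage ends.
        target = max(1, len(hosts) * percent // 100)
        batch = list(hosts[len(updated):target])
        if not batch:
            continue
        for host in batch:
            apply_update(host)
        updated.extend(batch)
        # Check only the hosts touched in this stage before widening the rollout.
        failed = [host for host in batch if not is_healthy(host)]
        if len(failed) / len(batch) > max_failure_rate:
            return {"halted": True, "updated": updated, "failed": failed}
    return {"halted": False, "updated": updated, "failed": []}
```

With 100 hosts and the default stages, the batches contain 1, 9, 40 and 50 hosts respectively. The first batch acts as the pilot group, and a failure detected there stops the rollout before the remaining 99 hosts are touched.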

 

C. Robust backup and recovery systems

 

  • Regular backups: Ensure regular backups of critical systems and data so they can be quickly restored in case of an outage.
  • Disaster recovery plans: Maintain comprehensive disaster recovery plans that outline procedures for restoring services quickly following an incident (CrowdStrike²).

 

D. Enhanced monitoring and alerting

 

  • Real-time monitoring: Implement real-time monitoring solutions to detect and respond to anomalies swiftly.
  • Automated alerts: Set up automated alerts for unusual system behaviours or failures to enable rapid response.

 

E. Vendor management and collaboration

 

  • Vendor communication: Maintain close communication with the vendor to stay informed about updates, patches, and potential issues.
  • Service level agreements (SLAs): Ensure SLAs with the vendor include provisions for timely support and issue resolution during outages (Security Week¹, CrowdStrike²).

 

F. Redundancy and high availability

 

  • Redundant systems: Design systems with redundancy to ensure that if one component fails, another can take over without service interruption.
  • High-availability configurations: Use high-availability configurations to minimise downtime and ensure continuous service availability.
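
The redundancy principle above, where a second component takes over when the first fails, can be shown with a small failover wrapper. This is a sketch under assumptions: `call_with_failover` and the replica callables are hypothetical names, and real high-availability setups typically rely on load balancers or cluster managers rather than application code alone.

```python
from typing import Callable, List, Sequence


def call_with_failover(replicas: Sequence[Callable[[str], str]], request: str) -> str:
    """Try each replica in order and return the first successful response."""
    errors: List[Exception] = []
    for replica in replicas:
        try:
            return replica(request)
        except Exception as exc:  # any failure moves on to the next replica
            errors.append(exc)
    # Every replica failed; surface the last error as the cause, if there was one.
    raise RuntimeError(f"all {len(replicas)} replicas failed") from (
        errors[-1] if errors else None
    )
```

Ordering the replicas from primary to standby keeps normal traffic on the primary and uses the standby only when the primary raises an error.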

 

G. User training and awareness

  • Training programmes: Conduct regular training programmes for IT staff and end-users to ensure they are aware of the latest cybersecurity threats and best practices.
  • Incident response drills: Perform regular incident response drills to ensure the team is prepared to manage real-world disruptions effectively.

 

H. Comprehensive documentation

  • Detailed documentation: Maintain detailed documentation of all systems, processes, and procedures, including update protocols, rollback procedures and incident response plans.

Sources:
¹ Security Firm Discloses CrowdStrike Issue After ‘Ridiculous Disclosure Process’
² CVE-2024-3400: What You Need to Know About the Critical PAN-OS Zero-Day