The 2024 Microsoft outage is an unscheduled reminder to use best practices in technology change.
The repercussions of the CrowdStrike update have us shook. The 2024 Microsoft outage is a timely reminder that rushed or mismanaged system changes can lead to chaos. How can you avoid the pitfalls of poorly planned technology changes?
Invite folks to the table.
First, consider the scope and implications of your change. Then ensure the right people are involved in planning, testing, and adoption. Identify and engage every group who might be affected by the change; solicit their input, identify impacts, and make sure you’re aligned.
- System Interdependencies: What other systems are involved with the system that is changing? Consider both upstream and downstream applications.
- Stakeholder Impact: How will this change affect the lives of employees, partners, and customers? How will they react? Ask questions, investigate thoroughly, and avoid making assumptions.
Consider the scope and implications of your change.
Test early, often, and thoroughly.
Experimenting and evaluating are crucial components of any change implementation. Test early, when it’s less painful to fix things. Test often, so you catch errors at each stage. Test thoroughly, so there are no surprises. Here are some essential testing strategies:
- Functionality Tests: Ensure the program/system is functioning as designed within the Sandbox environment.
- Platform Tests: Verify that the published program/system operates correctly in various live environments.
- Blind Individual Tests: Have individuals outside of the project team test the program/system to ensure usability and functionality from an unbiased perspective. Studies show that the closer you are to something, the less objective you are. Your brain automatically sees it the “right” way rather than the way it actually is.
- Stress Tests: Conduct stress tests to catch any defects early and minimize their impact on the organization and other stakeholders.
These protocols help identify and rectify defects early, ensuring a smooth and reliable implementation with minimal disruptions to your operations..
Timing is everything.
There is a long-running adage in the programming world – “Don’t deploy on Friday.” Some view it as a joke, others a jinx, but many consider it a must.
Risk management exists for a reason; no system is perfect, no team is perfect, so it’s prudent to plan accordingly. Not only do Friday deployments reduce the margin for error to fix an issue, but key stakeholders are often less available. This has implications for your team, but also your stakeholders, who might not see any communications or troubleshooting materials you share on a Saturday. Finding the right time to launch your change mitigates risk and improves adoption.
By incorporating these practices, organizations can better manage changes and prevent the type of widespread issues that resulted from the recent CrowdStrike update.