Disaster Planning & Recovery
You’ve been hacked! The building fell into a sinkhole! Aliens came and took your servers! Sharknado hit the business park! (Hey, if we’re going for disasters, let’s talk about fun ones!) What do you do after disaster strikes? Curl up in a corner and cry or get things back up and going? Welcome to the role of IT Warrior. We don’t stop, we quickly get things going again! Let’s chat about how to handle the disaster and recover.
Before the Disaster
A huge part of recovery is the planning before the disaster happens. This involves meeting with the entire organization to determine what the policies and priorities are, and how to legally handle things. You may need to discuss your plans with outsiders as well to provide you with an external point of view for things you may have missed. We’ll go through the basics to at least get you started down the discovery road. This isn’t meant to be all-inclusive. Your plan may or may not have all the information we discuss below.
What types of disaster are you planning for? Sharknado and aliens might be far-fetched for most people. Everyone thinks about ransomware or hackers. What about losing power or internet connectivity to your building? Fire? Earthquake? Tsunami? Gas leak? Mothra coming out of the center of the earth? OK, maybe that last one is far-fetched again. Some disasters may not apply to your location, and different disasters require different responses. Losing power to the building is going to require different or additional steps compared to a ransomware attack. Based on the types of disasters you anticipate, create a plan for each one. List these disasters in your documentation so you don’t forget to plan for something. As you plan, you’ll find steps that several scenarios have in common. Note these common steps and structure the documentation so that each one lives in a single place. You’ll want to be able to update one spot and have every plan that shares those steps be updated. Otherwise, you’ll need to update multiple spots, and that’s error-prone, not error-proof.
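One way to get that "update one spot" behavior is to keep the shared steps in a single table and have each scenario refer to them by ID. Here is a minimal sketch of the idea in Python. The step IDs, step text, and scenario names are all made up for illustration, so swap in your own.

```python
# Hypothetical step and scenario names, for illustration only.
# Each shared step is written exactly once.
SHARED_STEPS = {
    "notify_management": "Notify management and start the communication plan",
    "document_damage": "Take photos or screenshots and note the time",
    "restore_backups": "Restore servers from the most recent clean backup",
}

# Each scenario lists step IDs, not step text, so an edit to
# SHARED_STEPS shows up in every plan that uses that step.
SCENARIOS = {
    "power_loss": ["notify_management", "document_damage"],
    "ransomware": ["notify_management", "document_damage", "restore_backups"],
}


def render_plan(scenario: str) -> list[str]:
    """Expand a scenario's step IDs into the shared step text."""
    return [SHARED_STEPS[step_id] for step_id in SCENARIOS[scenario]]


def scenarios_using(step_id: str) -> list[str]:
    """List every scenario that a change to this step would affect."""
    return sorted(name for name, steps in SCENARIOS.items() if step_id in steps)
```

Calling `scenarios_using("restore_backups")` tells you which plans to re-read after you change how restores work. The same idea works in a wiki with included pages or in a document tool with reusable snippets; the code is just one way to picture it.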
Your company is going to have policies about what must be recovered in what order so the company can be back up and running most efficiently. Account for these policies in your plan. Don’t think just about IT. What about phones, security systems, power, climate, the CNC machines, the loading dock? Anything that contributes to what the company produces or does for business could be affected and needs to be part of the plan. You may need to reroute your phones to another call center or to personal cell phones. You may need to switch to a cloud server in a different geographic location. How will your remote team members be affected by these changes? How will your customers and suppliers be affected? Where will deliveries arrive, and how will you get shipments out? Work with the stakeholders to determine the proper priorities: what must be back in service in what order, and what can be relocated so that it is brought back into service faster.
Part of your plan should include communication. Who tells your team members what happened and what expectations are for the team during the recovery process? Who communicates with law enforcement? What (and when) do you tell your customers and vendors?
For every team member involved, ensure there is a backup member who is cross trained to handle the situation. If disaster strikes, Betty may be down in the Bahamas on vacation and won’t be able to get back in a timely fashion. Someone else will need to take on her responsibilities. Again, this is part of being prepared.
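If your roster lives in a spreadsheet or a script, a quick check can flag roles that have no real backup. This is only a sketch: the role names and everyone except Betty are invented for the example.

```python
# Hypothetical roster: role -> (primary, backup).
# None means nobody has been cross-trained for that role yet.
ROSTER = {
    "server_restore": ("Betty", "Sam"),
    "phone_rerouting": ("Alex", None),
    "customer_communication": ("Dana", "Dana"),
}


def roles_without_backup(roster):
    """Return roles whose backup is missing or is the same person as the primary."""
    return sorted(
        role
        for role, (primary, backup) in roster.items()
        if backup is None or backup == primary
    )
```

With the sample roster, `roles_without_backup(ROSTER)` reports `customer_communication` and `phone_rerouting`, which are the two roles that would stall if the primary were on a beach somewhere.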
Being IT, we’ll easily come up with the plan for the server room. Restore things from backups, migrate to the cloud instance that has been on standby, magically spin up virtual machines to handle the rollovers. These are things that we’re used to. But we also need to help with the planning for the rest of the organization. Help your entire team so that the whole organization succeeds.
Review the Plan
Now that you have a plan, have others outside the team review it. Have management review the plan so they are aware of the steps and expectations. Have management sign off on the plan!! This makes sure they have read it and support it. Or at least if they say, “I didn’t know we would need x,” you can respond, “It was in the document you said you read.” When disaster happens, you need their support to help pull it off. They are going to be providing communication while you work the plan. Make sure they know what is expected of them.
Have your legal team review your plan. If you get hacked or hit with ransomware, what are the appropriate steps for involving legal, insurance, and law enforcement? Depending on the type of attack and where it comes from, you may want or need to report the attack to different agencies. Local police, the local fire department, the FBI, and CISA are examples of agencies that may be interested in the attack. Your legal team can help provide guidance as to which resource to involve in what circumstances, and when to involve them during the plan. Disturbing a crime scene removes traces that law enforcement may want so they can find the culprit. How does this fit into your plan to get things back to normal as quickly as possible when the servers are the evidence and you also need to restore those same servers?
Review your plan with your insurance provider. Do they notice anything you are missing in your plan? Do they provide coverage that you don’t already have (and want) for any of your scenarios? Because you have a plan, will they give you a discount? Insurance companies want to minimize their risk and exposure by minimizing your risk and exposure. Spending a few hours going over your plan can save them significantly if disaster does happen.
Regularly review the plan. Have you thought of a new scenario? Has something changed with your environment that would alter the plan (new machines, removing hardware, you acquired a new business)? This will help minimize confusion and last-minute thinking when disaster does happen.
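If you record when each part of the plan was last reviewed, a few lines of Python can tell you what is overdue. This is a sketch under assumptions: the 90-day interval stands in for "about a quarter," and the section names and dates are invented.

```python
from datetime import date, timedelta

# Assumption: roughly one quarter between reviews. Pick your own interval.
REVIEW_INTERVAL = timedelta(days=90)


def overdue_sections(last_reviewed, today):
    """Return plan sections whose last review is older than the interval."""
    return sorted(
        name
        for name, reviewed in last_reviewed.items()
        if today - reviewed > REVIEW_INTERVAL
    )


# Hypothetical review dates for two scenario plans.
last_reviewed = {
    "ransomware": date(2024, 9, 1),
    "power_loss": date(2025, 1, 15),
}
print(overdue_sections(last_reviewed, date(2025, 2, 4)))
```

Passing `today` in as an argument (instead of calling `date.today()` inside the function) keeps the check easy to test.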
Recovery
Now it’s time for all that planning to pay off! Get out that plan and start working your way through it! You’re going to be stressed while you try to get things going and minimize the damage (if it is ongoing). Having the plan in place gives you the steps without you scrambling and thinking on the spot…ok, at least not thinking on the spot as much. You can’t plan for every scenario, but if you can get most of the steps down and tested, it’s going to go a long way when you get the notification at 3:30am that you have sharks in the lobby.
Get management involved quickly. They won’t like the surprise, but most people would rather hear about a problem early than wait 10 hours and then find out there’s a bigger crisis. You documented their duties in the plan; let them do them. They should be helping with communication and acting as a firewall between you and other people affected so that you can focus on the recovery.
It’s going to be rough. Take a deep breath and focus. If you panic, everyone else will too. Just like in an emergency room, the people doing the work have to remain calm and focused. If they get flustered, everyone around them will join in and create more chaos and confusion. Now is not the time for extra confusion.
Document everything you can about the disaster. Take pictures or screenshots. Note times. The great thing about modern cameras is that they tag each image with the time and location. This will help provide documentation for your insurance provider…you did pay their bill, right? It also helps law enforcement if they need to pursue an investigation.
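For the notes you type yourself, it helps if the timestamp gets added for you. Below is a minimal sketch of an append-only incident log. The function names, field names, and file layout are my own assumptions, not a standard format.

```python
import json
from datetime import datetime, timezone


def log_entry(note, evidence=None, now=None):
    """Build one timestamped log record; `evidence` is an optional file name."""
    # Use UTC so entries from people in different time zones line up.
    stamp = (now or datetime.now(timezone.utc)).isoformat()
    return {"time_utc": stamp, "note": note, "evidence": evidence}


def append_to_log(path, entry):
    """Append the record as one JSON line so earlier entries are never rewritten."""
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(json.dumps(entry) + "\n")
```

A call like `append_to_log("incident.log", log_entry("Sharks in the lobby", "lobby.jpg"))` adds one line to the file (the file name here is just an example). Because each record is appended and nothing is edited in place, the log reads as a timeline later when insurance or law enforcement asks what happened when.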
If the disaster involves an intrusion (hacker, physical break-in, ransomware), realize that you may need to isolate equipment until you can certify it is clean. Hackers and ransomware crews love to leave time bombs and back doors around so that they can get in again. If you’re not sure that things are clean, you may want to set up a separate isolated network and, as you verify that each device is clean, add it to the new network. Some endpoint protection will allow you to isolate machines and force a rescan of devices en masse. This way you’re not walking around scanning devices one by one. Look for ways that your planning can help you perform a task faster or more efficiently to save you time and effort. The goal is to get everything back to normal as quickly as possible. If a tool isn’t going to help with that, perhaps it’s time to find a better tool.
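Keeping track of which devices are still quarantined can be as simple as two lists. Here is a small sketch of that bookkeeping. The class and the device names are hypothetical, and it does not talk to any real endpoint protection product; it only tracks the state you tell it about.

```python
class QuarantineTracker:
    """Track which devices are still isolated and which are verified clean."""

    def __init__(self, devices):
        self.isolated = set(devices)  # everything starts out untrusted
        self.clean_network = set()    # devices allowed onto the rebuilt network

    def mark_clean(self, device):
        """Move a device to the clean network after it passes verification."""
        if device not in self.isolated:
            raise ValueError(f"{device} is not in quarantine")
        self.isolated.remove(device)
        self.clean_network.add(device)

    def remaining(self):
        """Devices that still need to be scanned or rebuilt."""
        return sorted(self.isolated)


# Hypothetical device names.
tracker = QuarantineTracker(["laptop-01", "server-01"])
tracker.mark_clean("server-01")
print(tracker.remaining())
```

The point of the design is the default: a device is untrusted until someone explicitly marks it clean, never the other way around.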
Post Disaster
Review what happened and what could change in how it was handled. Review the ENTIRE plan so that you aren’t fixing just one scenario. If you learned something that affects your other scenarios or perhaps discovered another scenario that needs documenting, update the plan. Do this review as quickly as you can after the disaster while the information is fresh in your mind.
Once things are stable, take some time off. You worked hard, probably outside normal hours, and it was stressful. Sometimes a disaster can be quickly handled; sometimes it takes multiple days. Your body and brain will need a break to reset. Perhaps this should be part of your plan as well so that management gets behind the idea.
Closing
Yes, planning this out is a huge process and requires input from several people. But if you don’t plan it out, when disaster happens, you’re going to be scrambling to deal with the issues. That will slow you down, create chaos, and generally upset a lot of people.
Review and update your plan regularly. Every week? No. Every quarter or year, yes. Get those environment changes documented so you are prepared.
If possible, test your plan regularly. That doesn’t mean you have to cut the power lines to your building. But do a simulation. Make sure your team knows what to do when. Perhaps the simulation is talking through things instead of doing them. Yes, it’s difficult to build an isolated network for testing, but do you know all the steps, and do you have the necessary gear to do this? If you’re prepared, the situation will get handled as smoothly as possible. It won’t be fun, but it also doesn’t have to be a nightmare.
- last updated February 4, 2025