Disaster recovery
From MissionTechWiki
The Plan
It is wise to have plans in place in case a disaster strikes the place where your people work. A good backup strategy is not enough. If your office burnt to the ground, how long would it take before everything was up and running again, and would you have somewhere to restore your backups to? There are many stories of such disasters within missions. Are you being "wise as serpents" as you plan for the future?
Things to consider are:
- How long would you be out of action if you didn't take the measures suggested below, and is that length of time unacceptable? There is usually a trade-off between time and money: spending more on the plan often means less downtime when a disaster strikes.
- Would all your passwords still be accessible to you if your data repository disappeared? Do you have a backup person with access to those passwords in case the disaster is that your IT person disappears?
- Having standby servers in another location with the same software/operating system installed as your main servers, and with compatible tape drives if you are using tape backup.
- Having an agreement with a nearby sympathetic organization that you could use one of their servers if you needed to, or they keep one of your standby servers in their office.
- If the email system went down, do you have a fallback method, such as web access to email? Do you have a backup email server so that your emails queue up somewhere until things are working again? (We have seen email servers take up to three days to get going again because of unexpected hardware issues.)
- Similarly for any phone equipment.
- If you have staff working in potentially dangerous countries, how do they contact you if your phones or email go down? If they use Skype, what if the disaster is that someone accidentally digs through your buried Internet connection and it takes a few days to get the service back?
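The time-versus-money question in the list above can be made concrete with some rough arithmetic. The following is a minimal sketch only: the step names, the hour estimates, and the 24-hour target are all hypothetical placeholders to be replaced with your own figures, and it assumes the recovery steps happen one after another rather than in parallel.

```python
from dataclasses import dataclass


@dataclass
class RecoveryStep:
    name: str
    hours: float  # estimated elapsed time; a guess, not a measurement


def total_downtime_hours(steps):
    """Sum the estimated hours, assuming the steps run one after another."""
    return sum(step.hours for step in steps)


def meets_target(steps, max_acceptable_hours):
    """Return True if the estimated downtime is within the acceptable limit."""
    return total_downtime_hours(steps) <= max_acceptable_hours


# Hypothetical plan with no preparation beyond backups.
without_standby = [
    RecoveryStep("find temporary premises", 48),
    RecoveryStep("buy and install a replacement server", 72),
    RecoveryStep("restore backups", 12),
]

# Hypothetical plan with a standby server kept at another location.
with_standby = [
    RecoveryStep("travel to the standby location", 4),
    RecoveryStep("restore backups onto the standby server", 12),
]

if __name__ == "__main__":
    target = 24  # hypothetical maximum acceptable downtime, in hours
    for label, plan in [("no standby", without_standby),
                        ("with standby", with_standby)]:
        hours = total_downtime_hours(plan)
        verdict = "acceptable" if meets_target(plan, target) else "too long"
        print(f"{label}: about {hours} hours ({verdict})")
```

Writing the steps down this way is mainly useful for spotting which single step dominates the downtime, since that is usually where extra money buys the most time.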
Why have a plan?
Stories
- One mission on the ICCM list had a rainstorm, and a pool of water formed on their flat roof. The drainage system failed, as did part of the roof, dumping gallons of water straight onto their server rack. All their servers were instantly fried; most of their disks were toast. This was in the USA. How much damage to your hardware does your plan take into account?
- One organization had their techie taken away from them. Actually, what happened was the director died of a heart-attack, and the next-in-line for the directorship was the techie. At the same time as the transition was happening, their email-server crashed. Does your disaster plan take into consideration that your techie might not be there to do the work?