On Disaster Planning

Last week, I was teaching the Security Class for php|architect and talked not only about protecting your applications from security vulnerabilities but also about what to do after you've found (or have been notified of) one.

Unfortunately, I have some bad news for you: it's not a question of “if” you'll have a problem, but a question of “when”.

Depending on the type and scope of the vulnerability, this can range from “doh” to “call the lawyers and update your resume”.  Either way, it can be a disaster.

And we all know there are two times to come up with a disaster recovery plan:

Before it happens OR

While it's happening… oh wait, that's not planning.

So yes, the only time to come up with a plan is before you need it.  If you wait until something happens, that's not a plan, that's playing it by ear.  For small problems and/or some organizations, that may be workable.  If you're dealing with sensitive company, health-related, financial, legal, or a variety of other types of information, you need to plan.

Depending on your industry and customers, there are a variety of things that need to be included, but there are three things that are common to every technology disaster plan:

Who is in charge of what? – Who on the team gets called first?  Who communicates with customers? Who has access to the full audit/system logs?  Who diagnoses the problem? Who restores the backups? Who makes the decision to take the system offline or restore from backup?

What is the backup/restoration procedure? – How often are the backups made? Where are the backups stored? Has the restoration process been tested recently?

Who is told what when? – When is the rest of the team called?  When are customers notified and how much are they told? When is The Boss called?  If the Press calls, who talks to them? If the Press is notified, who does it and when?
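
One way to keep those three questions honest is to write the answers down as data and check them for holes. What follows is only a rough sketch, not a recommended tool: the role names, the field names, and the 90-day restore-test threshold are all made up for illustration, and your own plan will need different ones.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from typing import Optional

# Hypothetical role names, loosely based on the questions above.
REQUIRED_ROLES = (
    "first_call",
    "customer_communication",
    "log_access",
    "diagnosis",
    "backup_restore",
    "offline_decision",
    "press_contact",
)


@dataclass
class DisasterPlan:
    # Maps a role name to the person responsible for it.
    owners: dict = field(default_factory=dict)
    # Where the backups are stored; an empty string means "not recorded".
    backup_location: str = ""
    # Date of the last successful restore test, if there has been one.
    last_restore_test: Optional[date] = None


def find_gaps(plan, today, max_test_age_days=90):
    """Return a list of human-readable gaps in the plan."""
    gaps = [
        f"no owner for {role}"
        for role in REQUIRED_ROLES
        if not plan.owners.get(role)
    ]
    if not plan.backup_location:
        gaps.append("backup location not recorded")
    if plan.last_restore_test is None:
        gaps.append("restore has never been tested")
    elif today - plan.last_restore_test > timedelta(days=max_test_age_days):
        gaps.append("restore test is out of date")
    return gaps
```

Calling `find_gaps(plan, today=date.today())` on a half-finished plan returns the list of things nobody owns yet, which is a much better thing to discover on a quiet afternoon than in the middle of an incident.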

Years ago I worked at a company with a miserable technology stack, little desire or funding to improve it, and a boss who made Tom Cruise look well-adjusted.  The point is that when we had problems – often weekly – the policy was that we had to come up with a solution before we involved him.  If we involved him and then researched the problem, we'd be harassed and belittled until we finally figured something out… which is hard to do when someone is screaming at you.

When a problem was found, our policy – determined by the CTO – was to immediately freeze all order processing, dive into debugging and testing, and attempt to write a patch before the boss was told.  Once the boss was told, we had to throw in the line “and we have a patch to take care of it”.