Disaster Recovery System Testing BEFORE Going to Production

Your organization needs a robust disaster recovery program in place. The days of falling back on manual workarounds when core critical systems are unavailable are fading fast for many time-sensitive processes. In fact, the people who used to do the work manually have often retired or moved on to other positions. If critical systems are down for an unacceptable period of time, it can severely imperil your company.

Information Technology (IT) should have direct responsibility for disaster recovery. You should work with them to map the current availability of systems, expressed as recovery time objective (RTO) and recovery point objective (RPO), to the business requirements
All systems must have a run-book
All systems must be tested per your company’s policies (more on testing below)

System information, run-books and disaster recovery test results should all be maintained in your automated BCM tool, rather than in spreadsheets. Maintaining this type of data in spreadsheets has many drawbacks. A few include:

Spreadsheets are difficult to keep up-to-date
Spreadsheets have a limited audit trail
Spreadsheets tend to become siloed
Spreadsheets make it more difficult and time-consuming to analyze application data against upstream and downstream applications and business processes

Using a BCM tool makes all of those negatives go away. You can maintain data in structured fields and use attachments, such as network diagrams, if additional information in spreadsheets or word processing documents is required.

Having all this information in a central BCM repository along with your business continuity requirements will empower you to do some really nice automated analysis, including real-time gap pulse reports. For instance, if a change is made to any system in your enterprise, it can be analyzed in real time and the risk profile updated. A robust BCM tool can use rules and workflow to deliver alerts to the right people at the right time!

I get excited just thinking about it. I have set up these types of rules, triggers and workflows on many occasions and it is great!
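As a rough illustration of the gap analysis described above, the following Python sketch compares what IT can currently deliver against what the business requires. The field names (it_rto_hours, business_rto_hours and so on) and the sample systems are hypothetical; they are not taken from any particular BCM tool, which would normally do this with its own rules and workflow.

```python
from dataclasses import dataclass


@dataclass
class SystemRecovery:
    """Recovery figures for one system. All field names are illustrative."""
    name: str
    it_rto_hours: float        # recovery time IT can currently achieve
    it_rpo_hours: float        # data loss window IT can currently achieve
    business_rto_hours: float  # recovery time the business requires
    business_rpo_hours: float  # data loss window the business can tolerate


def find_gaps(systems):
    """Return (name, metric, shortfall_hours) for every unmet requirement."""
    gaps = []
    for s in systems:
        if s.it_rto_hours > s.business_rto_hours:
            gaps.append((s.name, "RTO", s.it_rto_hours - s.business_rto_hours))
        if s.it_rpo_hours > s.business_rpo_hours:
            gaps.append((s.name, "RPO", s.it_rpo_hours - s.business_rpo_hours))
    return gaps


# Made-up sample data for illustration only.
systems = [
    SystemRecovery("Payroll", 24, 4, 8, 4),
    SystemRecovery("Order Entry", 4, 1, 4, 2),
]

for name, metric, shortfall in find_gaps(systems):
    print(f"{name}: {metric} gap of {shortfall} hours")
```

Re-running a check like this whenever a system record changes is one simple way to keep a gap report current.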

It is important that you capture the information that will enable you to do the analysis and produce the reports and metrics. As a longtime, successful software developer, I learned to start at the end. I realize you will need ad hoc reports as you mature, but for now try to determine your near-term needs. Mock up some reports. Think about reports and alerts, then work backwards to understand what information you need to capture to make your dreams a reality.

Do I sound excited? Well darn it, I am excited!!!

Some of the information you might want to capture in your system will include the following. Be sure to add more to meet your needs:

Application name
Description
Application owner and contact info
Purpose of application
Processes that use the application
Vendor name
Vendor representative (name/email)
Critical (Yes/No)
IT System RTO (hours)
IT System RPO (hours)
Run-book completed? (Yes/No)
Disaster recovery plan completed? (Yes/No)
Was last DR test successful? (Yes/No)
List of issues from the last DR test
Production data center
Backup data center
If vendor hosted, was a SAS-70 review (or its successor, a SOC report) or a data center walk-through completed? (Yes/No)
If hosted locally – must it be local?
Number of users
Number of servers
Type of server (dedicated, virtual, operating system…)
Contract / license expiration date
Owner of equipment
Fail-over tested (Yes/No)
Is the application being recovered in the primary data center? (you would be surprised)
Full backup frequency
Incremental backup frequency
Type of backup (digital tape, vaulting…)
If backup is tape, where is it stored?
If backup is tape, who transports it?
Systems dependencies – input
Systems dependencies – output
Comments, concerns, results
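To show how a list like this maps onto structured fields rather than spreadsheet columns, here is a minimal Python sketch covering a handful of the items above. The class name, field names and sample values are assumptions made for illustration; a real BCM tool will have its own schema.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class ApplicationRecord:
    """A subset of the fields listed above; names are illustrative."""
    application_name: str
    owner: Optional[str] = None
    critical: Optional[bool] = None
    it_rto_hours: Optional[float] = None
    it_rpo_hours: Optional[float] = None
    runbook_completed: Optional[bool] = None
    dr_plan_completed: Optional[bool] = None
    last_dr_test_successful: Optional[bool] = None
    production_data_center: Optional[str] = None
    backup_data_center: Optional[str] = None


def missing_fields(record):
    """List the field names that have not been captured yet (still None)."""
    return [f.name for f in fields(record) if getattr(record, f.name) is None]


# Hypothetical example: IT has supplied only part of the information so far.
record = ApplicationRecord(
    application_name="Payroll",
    owner="J. Smith",
    critical=True,
    it_rto_hours=24,
)

print(missing_fields(record))
```

Leaving a field empty until IT supplies the value, instead of guessing, makes it easy to report which information is still outstanding for each application.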

I know it is a lot, and IT may not have it all now, but it is important to get the ball rolling. Engage them and find out where they are. This is critical stuff.

You MUST do your disaster recovery tests ‘before the baby is born’! (By the way, I love that line. Unfortunately, my editor would not let me use it in the title of this post.)

Your policy should be that prior to any new system going into production a disaster recovery test be completed with the business actively participating. The results must be signed off by the business owner. I will repeat one more time – there must be a written, tested and approved disaster recovery plan in place – PRIOR to going live.

If your organization has a Project Management Office (PMO), they should make disaster recovery testing and sign-off a ‘toll-gate’ for every new system implementation. If the user has not signed off on the Disaster Recovery User Acceptance Test (UAT), the system cannot go into production until that sign-off is completed.
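The toll-gate idea can be sketched as a simple check that a release process runs before go-live. This is only an illustration in Python: the GoLiveChecklist class and its three flags are hypothetical names standing in for whatever your PMO actually tracks.

```python
from dataclasses import dataclass


@dataclass
class GoLiveChecklist:
    """Hypothetical toll-gate inputs for one new system."""
    dr_plan_written: bool
    dr_test_passed: bool
    business_signed_off: bool


def blocking_items(checklist):
    """Return the reasons a system may not go live; empty means the gate is open."""
    reasons = []
    if not checklist.dr_plan_written:
        reasons.append("No written disaster recovery plan")
    if not checklist.dr_test_passed:
        reasons.append("Disaster recovery test not passed")
    if not checklist.business_signed_off:
        reasons.append("Business owner has not signed off on the DR UAT")
    return reasons


def may_go_live(checklist):
    """The system may go into production only when nothing is blocking."""
    return not blocking_items(checklist)


# Example: the test passed, but the business owner has not signed off yet.
checklist = GoLiveChecklist(
    dr_plan_written=True,
    dr_test_passed=True,
    business_signed_off=False,
)

print(may_go_live(checklist))
for reason in blocking_items(checklist):
    print("BLOCKED:", reason)
```

Returning the list of reasons, rather than a bare yes or no, gives the project team a written record of exactly what is holding up the release.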

In my experience as both an IT and BC professional, once a critical system has gone into production, the urgency and incentive to complete a disaster recovery test drop to ‘someday’ or ‘when we have time’… which often never comes. The DR testing will be pushed back indefinitely, or more likely forgotten, as teams move on to the next critical project.

Unfortunately, your butt will be on the line when the untested system goes down or a virus hits and there is no backup, the tape backup is incompatible, the RTO/RPO does not meet the business requirements, or maybe all of the above!

So, I strongly advise you not to wait until ‘after the baby is born’. The moment the production system goes down is NOT the time to test, or to think ‘woulda, coulda, shoulda’.

Next step tips regarding disaster recovery testing:

Tip – Ask IT when the last disaster recovery test was done and where the results are stored. Review the results for your critical systems, looking for gaps and issues. Were they corrected and tested again? Did the business sign off on the corrections?

Tip – Ask IT if disaster recovery testing is on the new system development roadmap. If it is not, it must be added ASAP.

Tip – Make all requests in writing and keep a copy of the email trail and the final decision. Otherwise, your butt will be on the line when issues arise.

Tip – Partner with IT and management to develop the enterprise ‘before the baby is born’ disaster recovery testing policy and get it signed off on.

Tip – If a policy is subsequently put in place, it is a significant accomplishment to highlight in your next annual job review.

Tip – If a policy does not get put in place, the email trail just might save your job when a critical system is not available during a disruption and the inevitable finger pointing begins. I have seen this play out, so please be forewarned.

Published by Marty Fox on September 28, 2020
