Assessing business continuity plans in real time
This information is from my new book, The Ultimate Business Continuity Success Guide: How to Build Real-World Resilience and Unleash Exciting New Value Streams. I hope you enjoy the chapter and my book.
Developing a real-time understanding of every facet of a resilience program is a passion of mine! In this chapter I will share some ideas on the value it has provided in empowering me to do real-time assessment analysis with minimal manual effort. A major goal of this chapter is to get you thinking about how you can get to real-time and why it is so valuable to you and your organization.
I would just like to mention that my door is always open to you. After reading this chapter if you get excited but are not sure about next steps on automating your program, please email me. I would be happy to ‘talk shop’ with you.
The criticality of assessing your plans:
Plans are ‘living documents’. They MUST be kept current. Plan owners or coordinators must maintain (update) plans on a regular basis during officially scheduled plan maintenance periods. Many organizations schedule plan maintenance and assessments once or twice a year. In my opinion, this is not adequate.
In addition to the scheduled maintenance periods, plan owners and coordinators must update their plans whenever there is a change to their process requirements. The plan MUST reflect the current state of the process. This is the only way to ensure that we can analyze the process requirements against the capabilities to recover. If plans are outdated we are ‘flying blind’ and there will be unpleasant surprises at time of disruption. I have seen it get ugly.
I suggest you reinforce the importance of keeping plans accurate during tabletops and recovery exercises. Also, create awareness using all of the tools we will discuss in the awareness part of the book.
To properly assess a plan you must:
- Review critical fields in the plan and underlying BIA
- Compare dependency requirements against capabilities
- Score the ‘recoverability’ of the plan against a set of rules and thresholds you create with your business partners. Some fields will carry more ‘weight’ than others when defining criticality. You will program this into the assessment formula.
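As a minimal sketch of what a weighted assessment formula could look like, the Python below adds up the weights of the checks a plan passes and maps the total to a color. The check names, weights and thresholds are all hypothetical placeholders; you would replace them with the rules you agree on with your business partners.

```python
# Hypothetical checks and weights (illustrative only; agree on your own with
# your business partners). Weights sum to 100 so the score reads as a percentage.
WEIGHTS = {
    "rto_met": 40,
    "alternate_site_listed": 30,
    "contacts_complete": 20,
    "reviewed_this_cycle": 10,
}

# Hypothetical status thresholds.
GREEN_MIN = 90
YELLOW_MIN = 70


def score_plan(checks):
    """Add up the weights of every check the plan passed (missing checks fail)."""
    return sum(weight for name, weight in WEIGHTS.items() if checks.get(name, False))


def status(score):
    """Map a 0-100 score to a Green / Yellow / Red status."""
    if score >= GREEN_MIN:
        return "Green"
    if score >= YELLOW_MIN:
        return "Yellow"
    return "Red"


# Example plan: everything passes except the alternate recovery site.
plan_checks = {
    "rto_met": True,
    "alternate_site_listed": False,
    "contacts_complete": True,
    "reviewed_this_cycle": True,
}
plan_score = score_plan(plan_checks)
print(plan_score, status(plan_score))
```

Keeping the weights in one table means the business can change what counts as critical without anyone touching the scoring logic.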
There are many moving parts in your organization and the underlying plans. Risks and requirements can change on a daily or weekly basis. Here are just a few critical elements that must be analyzed as part of a process plan assessment:
- RTO and RPO
- Recovery seats and location requirements
- Staffing requirements
- Upstream and downstream systems requirements
- Telecom requirements
- Equipment requirements
- Skillset requirements
- Supplier requirements
- Vendor requirements
In addition to the above, you should add any other dependencies that are important to your organization.
But wait, there is more to the assessment process… Prior to the ‘official’ assessment analysis you must email process owners to remind them to review and update their plans. Ideally, this should include a series of reminder communications as the official start date gets closer. The email and process updates are often separated out into the maintenance phase, but for the purpose of this chapter we will group it into automating the assessment process.
After the assessment analysis you will want to provide the process owners with a scoring of their plans as far as the ability to recover the process. You will customize the scoring rules for your organization. Critical gaps, such as the omission of phone contact numbers or the indication of only a primary recovery location instead of the required primary and alternate recovery location, may require the process owners to make updates to their plan. In that case there would be additional emails to be sent back and forth.
The problems with assessing plans manually:
As you can see from the paragraphs above, it can be very time consuming to get process owners to update their plans. It also takes a great deal of time for you to do the analysis against many fields in the plans and to subsequently contact the process owners to report the probability of recovering their plans. In many cases, depending on the scoring, the cycle will continue. The process owners will have to do additional updates to their plans to make them ‘recoverable’ or they may have to get a waiver to accept the risk of a delayed recovery. Then you must re-assess the plan and send additional emails. It can be two or three iterations to get to a ‘Green’ recovery probability. Honestly, this process gets me tired just thinking about it.
Tip – Whether you do assessments manually or through automation, when the process attains a ‘Green’ or ‘Recoverable’ status send the process owners a nicely formatted digital ‘Certificate of Success’. They worked hard. They deserve it and they will appreciate it. I have passed many offices where process owners framed and proudly displayed their certificates.
When you factor in the need to report progress and gaps to management and other stakeholders, the reality is that if you are a mid-size to enterprise-size organization, there is no cost-effective, real-time way to properly assess all of the information in your program manually. As they say at the race track, ‘you are leaving serious money on the table’.
I have consulted with organizations that have devoted hundreds of hours to manually assessing their plans on a semi-annual or annual basis. It is a tedious and error prone process. BRBC teams are lean-and-mean. Can you really afford to dedicate highly skilled professionals to this task when there is an alternative that makes this pain go away?
It is troubling that at any time during the six-month or annual assessment window your plans may become ‘unrecoverable’ due to changing dependencies. Unfortunately, you would not even realize the risk until the next round of assessments, which might be too late. It would keep me up at night.
Fortunately, we can make all this pain and inefficiency go away. In doing so we will also save our organization money. Finally, management will be very happy.
Automating your program can make all the difference in quality and getting your life back:
A well-designed assessment system can assess, score and report metrics on hundreds or thousands of BIAs and plans in less than 5 minutes. The BCM system, or your mass notification system, can send notifications leading up to the assessment and after the analysis. Your new end-to-end automation will provide tremendous return on investment.
Tip – If you set up your pre-assessment and post assessment email notifications in your BCM or mass notification system, it makes life very easy for you and provides value to the recipients.
Your automated system should have the capability to extend upstream and downstream to identify hidden gaps that can impact your organization. This will enable you to map and analyze internal and external threats in real-time. For example, your customer service process may have an RTO of <4 hours but may be dependent on an upstream process that has identified its RTO as 48 hours. This type of gap is easy to flush out in real-time with an automated system. Some off-the-shelf systems provide this capability as part of their base offering. You can build it in other systems using rules and workflows. You can also develop it in-house if you have the programming resources and time. Trying to do this type of analysis manually would be impossible.
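To make the upstream gap check concrete, here is a small Python sketch, assuming you can export each process's RTO and its list of upstream dependencies. The process names and the dictionary layout are hypothetical, and the customer service RTO of <4 hours is simplified to 4 for the comparison. The function walks the dependencies transitively and reports every upstream process whose RTO is longer than that of the process relying on it.

```python
# Hypothetical export: RTO in hours per process, and the upstream processes
# each one depends on. Names and structure are illustrative assumptions.
rto_hours = {
    "customer_service": 4,
    "order_management": 48,
    "billing": 24,
}
depends_on = {
    "customer_service": ["order_management"],
    "billing": ["order_management"],
    "order_management": [],
}


def find_rto_gaps(rto_hours, depends_on):
    """Return (process, upstream, process_rto, upstream_rto) for each upstream
    process, direct or indirect, that recovers more slowly than the process
    depending on it."""
    gaps = []
    for process, rto in rto_hours.items():
        seen = set()
        stack = list(depends_on.get(process, []))
        while stack:
            upstream = stack.pop()
            if upstream in seen:
                continue  # already checked; also protects against cycles
            seen.add(upstream)
            upstream_rto = rto_hours[upstream]
            if upstream_rto > rto:
                gaps.append((process, upstream, rto, upstream_rto))
            stack.extend(depends_on.get(upstream, []))
    return gaps


gaps = find_rto_gaps(rto_hours, depends_on)
for process, upstream, rto, upstream_rto in gaps:
    print(f"{process} (RTO {rto}h) depends on {upstream} (RTO {upstream_rto}h)")
```

The same walk works for RPO, or for any other requirement where a dependency must be at least as strict as the process relying on it.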
Your automated assessment system will happily work 24×7 and it will never need a coffee break. In my opinion, it does not get any better!
Automation will allow you to identify changes that impact your organization from end-to-end. You can build simple or complex rules (algorithms) and workflows to trigger events and alert the right people at the right time. Generally, building these algorithms does not require programming.
Tip – Include the supply chain in your end-to-end analysis. Critical gaps often lurk in tiers 1, 2 and 3.
The impact/risk insight you will derive from the system allows you to automatically report details to process owners and to provide a summary to middle and upper management in a dynamic colorful high level dashboard. You will be serving each audience with exactly what they need to do their job.
Imagine a cool real-time dynamic graph that changes color to alert the right people of risks and opportunities – green, yellow and red. It will provide them with a holistic real-time vision of the organization. You can also develop critical reports that are automatically emailed to the proper people.
The key is having all of your information in a central repository along with your business continuity requirements which will empower you to do some really nice analysis.
Do I sound excited? Well, darn it I am excited!!!
Tips on building your new automated system:
The best way to begin building your system is slowly. As BRBC professionals, we preach the value of preparation. When designing a software system, preparation is key to building a tool that will meet your business requirements. The biggest mistake, and the primary reason most new systems fail in organizations, is lack of preparation. New programmers sit down and start coding without understanding and documenting the business requirements. It is critical that the business drives the system requirements, not IT.
Earlier in the book, in the BIA and plan creation sections, I stressed that you should ensure you are capturing the proper information that will enable you to do the analysis and produce the reports and metrics that you require. I realize you will need ad-hoc reports as you mature but for now try to determine your near-term needs. What sort of metrics and reports do you want the system to produce? What are your program and plan requirements? What makes a plan recoverable or not recoverable? Mock up what you will need from your new system and you will have a clear understanding of what needs to be captured. Think world-class for output and insight. Don’t hold back. Think about reports, alerts…then work backwards.
Examine, in detail, how you are currently doing your manual assessments. List each step. You may be able to automate the steps you are currently using and/or you may find opportunities for improvements to extract additional value from the process. If another teammate is doing the manual assessments, sit with that person and document the steps. I am sure he or she will be thrilled that you are trying to automate this tedious job. There are more interesting things they can be doing.
Depending on the size of your organization you may want to engage a project manager to help you build the assessment process automation requirements analysis. If you are a small company you can build a requirements document on your own. The requirements document will clearly spell out what you need out of the system.
You set the rules – the system does the work!
Here are a few examples of the types of criteria that may be important to you when assessing the viability of your plans. Each of these, and most every other field in your system, is an opportunity for automation, predictive analytics, real-time metrics and custom notifications. These are just examples that you must set to your own criteria:
- All team members must have two or more contact numbers
- All processes must have a primary and secondary recovery location listed
- All process plan recovery locations must be 10, 25, 50… miles from the production location
- All underlying IT systems must have the capability to deliver the appropriate RTO and RPO required by the process
- All IT systems and processes upstream and downstream must have RTOs and RPOs to meet business requirements
- I also like to add interesting and fun analysis that I can follow up on, such as:
- Comparing normal staffing to recovery staffing requirements. For example, if the sales process has 100 employees and requires 100 third party recovery seats, I would want to question that. I cover recovery seat analysis below as an opportunity for automation. I build a threshold rule once for each process and the system does the work.
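Several of the criteria above can be expressed as simple rules. The Python sketch below is one hedged illustration: the `Plan` fields, the 25 mile distance and the 0.8 seat-to-staff ratio are hypothetical values chosen for the example, not recommendations. Each rule adds a plain-language finding that could feed a score or a notification.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical thresholds; set these to your own organization's criteria.
MIN_CONTACT_NUMBERS = 2
MIN_SITE_DISTANCE_MILES = 25
MAX_SEAT_TO_STAFF_RATIO = 0.8


@dataclass
class Plan:
    """Illustrative subset of the plan fields the rules need."""
    process: str
    contacts: dict  # team member name -> list of phone numbers
    primary_site: Optional[str] = None
    secondary_site: Optional[str] = None
    site_distance_miles: Optional[float] = None
    normal_staff: int = 0
    recovery_seats: int = 0


def check_plan(plan):
    """Return a list of findings; an empty list means every rule passed."""
    findings = []
    for name, numbers in plan.contacts.items():
        if len(numbers) < MIN_CONTACT_NUMBERS:
            findings.append(f"{name} has fewer than {MIN_CONTACT_NUMBERS} contact numbers")
    if not (plan.primary_site and plan.secondary_site):
        findings.append("primary and secondary recovery locations are required")
    if (plan.site_distance_miles is not None
            and plan.site_distance_miles < MIN_SITE_DISTANCE_MILES):
        findings.append(f"recovery site is under {MIN_SITE_DISTANCE_MILES} miles from production")
    if plan.normal_staff and plan.recovery_seats / plan.normal_staff > MAX_SEAT_TO_STAFF_RATIO:
        findings.append("recovery seats are close to normal staffing; question the requirement")
    return findings


sales = Plan(
    process="sales",
    contacts={"A. Lee": ["555-0100"], "B. Cruz": ["555-0101", "555-0102"]},
    primary_site="DEF",
    secondary_site=None,
    site_distance_miles=12,
    normal_staff=100,
    recovery_seats=100,
)
findings = check_plan(sales)
for finding in findings:
    print(f"{sales.process}: {finding}")
```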
Tip – If a process has had zero updates to any data during the past couple of maintenance cycles (the past year or two), it may indicate that the process owner is not thoroughly reviewing the plan and is just checking off the ‘review complete’ checkbox while watching TV. The system can track this and email you an exception report so you can analyze further to ensure the process is being properly reviewed.
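An exception report like this can be sketched in a few lines of Python, assuming your system records each maintenance review along with how many fields changed. The `review_log` layout, the process names and the two-cycle window are hypothetical.

```python
from datetime import date

# Hypothetical review history: process -> list of (review date, fields changed).
review_log = {
    "payroll": [(date(2023, 1, 15), 3), (date(2023, 7, 15), 0), (date(2024, 1, 15), 2)],
    "sales": [(date(2023, 1, 15), 1), (date(2023, 7, 15), 0), (date(2024, 1, 15), 0)],
}


def unchanged_processes(review_log, cycles=2):
    """Flag processes whose last `cycles` reviews changed nothing at all."""
    flagged = []
    for process, reviews in review_log.items():
        recent = sorted(reviews)[-cycles:]  # most recent reviews by date
        if len(recent) == cycles and all(changed == 0 for _, changed in recent):
            flagged.append(process)
    return flagged


exceptions = unchanged_processes(review_log)
print(exceptions)
```

A flagged process is not proof of a rubber-stamped review, since some processes really are stable. It is only a prompt for a follow-up conversation.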
Build or Buy?:
You have two choices to make your new automated system a reality. You can build it in-house or purchase a third party Business Continuity Management System. The system requirements you and your IT team document will help you make a decision to either build or buy. Although I am an experienced professional programmer, I have found ‘buy before build’ often makes sense. Of course, even when I buy I am very demanding that my team has the ability to dig into the system and customize to our heart’s content. You will see that come through in the following chapters on selecting BCM and mass notification systems. The interesting thing is that good vendors love my questions and know I am ‘pushing them’ to help make their products better.
Tip – If you have an organization with scarce database programming skills or a full slate of projects in the pipeline it might make sense to buy.
Tip – If you have an organization with good in-house database programmers that are not fully dedicated to current projects, then building a system in-house might make sense. You will also want to factor in future upgrades and system maintenance in your decision to build or buy.
Whether you build your system in-house or buy a solution, it is critical that it not only acts as a repository for data and builds simple reports, but also lets you easily design rules, triggers and workflows, enabling you to analyze thousands of inter-dependencies in minutes.
Before we finish this chapter please allow me to share one example of how automating a previously time consuming, error prone analysis can provide great benefits for many years to come. While reading this use-case please use your imagination and think of all the complex analysis you would like to do that is currently impractical or impossible using manual means. Through automation anything is possible! When someone tells me a process cannot be automated that is when I get super excited and I prove them super wrong. You will be able to analyze all of your data in real-time if you think it out and execute the development of your new system properly.
In this example we will work through a Recovery Seat Gap Analysis I created. Over the years, I have matured the analysis from a laborious, error prone manual endeavor to an automated real-time gap and metrics powerhouse.
In this example, the term ‘recovery seats’ bundles recovery people, desks, desk-phones and computers. I describe the process from end-to-end. The general approach described can apply to other types of analysis.
You will see that once you build the logic you can often reuse it and extend it. In the tech world that is called using extensible objects. There is no need to go there in this chapter though.
We will follow the basic implementation process of preparing and keeping it simple which I described above and in other parts of the book.
Decide on the output you want the system to deliver (working backwards as described above):
The output of the recovery seat analysis is universally useful, whether you are using internal recovery, external recovery or a combination of the two. Understanding your recovery seat requirements can mean money in your pocket. You do not want to overbuy, where you would have too many seats and waste money. You do not want to underbuy, and not have the necessary number of seats. On many occasions, I have significantly reduced expenditures using these real-time metrics and trend-analysis reports.
I like to produce detailed reports listing each recovery site and the number of seats subscribed. If it is oversubscribed, the system will automatically highlight the gap red and trigger the appropriate notifications.
I also like to do summary dashboard metrics for management that change in real-time (including the colors if the location is oversubscribed). When I do a demo of the system for management I show off the real-time changes as they watch in awe. Often they comment it reminds them of a real-time stock chart. Again, this is not hard to do.
Capture the right data (described above and in the BIA and plan development chapters):
Tip – During plan building make sure you discuss recovery requirements with process owners. Consider implementing primary, secondary and tertiary recovery contingencies. This provides you with great resilience to activate depending on the scope of the disruptive event you are encountering.
Process owners will describe their seat requirements and all other dependencies. You will capture and document the information they provide. During subsequent maintenance cycles they will update their recovery seat requirements in the BCM system without your involvement. Analytics will automatically be triggered and communicated when they complete their updates.
As information is captured you must rationalize if you have adequate seats at each recovery location to support the business. For example, if Production site ABC has 13 processes with 200 recovery staff and they want to recover at local internal site DEF which has 35 available seats or a 3rd party vendor site which has 75 shared seats, then you and the process owner must understand the gaps. The only way you will know this is to analyze the data. You can then buy more seats or recover at another location. It will depend on your assets, resources and the impact of the process not recovering in the required time frame as documented in the BIA.
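The arithmetic behind the site ABC example is simple enough to sketch in Python. The numbers come from the example above. The combined figure assumes both sites could be used together and that every shared vendor seat is actually available to you at time of disruption, which a shared-seat contract may not guarantee.

```python
# Figures from the example: 200 recovery staff across the 13 processes at
# production site ABC, 35 seats at internal site DEF, 75 shared vendor seats.
REQUIRED_SEATS = 200
site_capacity = {"DEF (internal)": 35, "Vendor (shared)": 75}


def shortfall(required, available):
    """Seats still missing after using everything available (never negative)."""
    return max(0, required - available)


# Gap if each site were used on its own.
per_site = {site: shortfall(REQUIRED_SEATS, seats) for site, seats in site_capacity.items()}

# Gap if both sites were used together (an assumption; see the note above).
combined = shortfall(REQUIRED_SEATS, sum(site_capacity.values()))

for site, gap in per_site.items():
    print(f"{site}: short by {gap} seats")
print(f"Both sites combined: short by {combined} seats")
```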
You will want to map the employee recovery requirements against the available internal and external recovery locations. Break it down by primary, secondary and tertiary requirements and by time frame buckets. I typically use <4 hours, <24 hours, <72 hours, <96 hours, <168 hours (1 week) and >168 hours. I recommend using the hourly buckets so you can ‘apples-to-apples’ easily analyze trends and requirements. These results will give you instant insight into your gaps and risks. Depending on the results of the analysis your system will perform specific actions including scoring the ability to recover the process and sending the appropriate emails to the process owner and management.
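One way to sketch the time frame buckets in Python is with the standard library's `bisect` module. The bucket limits are the ones listed above. Placing exactly 168 hours in the last bucket is my assumption, since the list leaves that boundary open, and the sample requirements are made up for illustration.

```python
from bisect import bisect_right
from collections import Counter

# Upper limits (in hours) of the buckets, with one more label than limits
# for everything at or beyond the last limit.
BUCKET_LIMITS = [4, 24, 72, 96, 168]
BUCKET_LABELS = ["<4h", "<24h", "<72h", "<96h", "<168h", ">=168h"]


def bucket_for(hours):
    """Return the label of the first bucket whose limit is greater than `hours`."""
    return BUCKET_LABELS[bisect_right(BUCKET_LIMITS, hours)]


def seats_by_bucket(requirements):
    """Total the seats required in each time frame bucket."""
    totals = Counter()
    for _process, hours, seats in requirements:
        totals[bucket_for(hours)] += seats
    return dict(totals)


# Hypothetical requirements: (process, hours until seats are needed, seats).
requirements = [
    ("customer_service", 2, 40),
    ("billing", 20, 15),
    ("payroll", 48, 10),
    ("marketing", 200, 5),
]
totals = seats_by_bucket(requirements)
print(totals)
```

Run once per recovery location and per primary, secondary or tertiary tier, the same totals give the ‘apples-to-apples’ comparison described above.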
Your powerful 3rd party or in-house developed BCM relational database solution should allow you to build rules and workflows to compare the seat requirements of the process versus the capabilities of each recovery location. The matching process is pretty simple. The building of the algorithms (rules) should also be easy with the right system. A key feature of a system which I value, and I discuss in the chapter on BCM tool selection criteria, is the ability to build rules using simple boolean logic (and, or, between) and conditional branching (if, while…). Boolean logic + conditional branching = Success!
Done manually, this type of analysis would be a hassle at the very least, and impossible to do every time a process owner changed a seat requirement in their plan. I know first-hand, as I used to do it manually and it was awful, inefficient and error prone! There was serious ‘lag-time’. Unfortunately, disruptive events could not wait for me to manually review hundreds of plans and recovery locations to identify gaps. Fortunately, my automated system takes milliseconds to complete the analysis for as many plans as required, with no manual effort.
The new recovery seat requirement is automatically matched against the recovery location capability table record and analyzed with a relatively simple algorithm I cobbled together. The right people are immediately alerted if there is a gap. For example, imagine the sister recovery site has a total of 30 available seats. The process in question had originally indicated 25 seats. The process has grown through success and the new recovery requirement is 40 seats. Obviously, we do not currently have enough seats so there is a gap that must be addressed. An email is automatically generated and sent to the authorized people and, of course, to you. Decisions can then be made on perhaps increasing the recovery seats at the sister site or using another site, mobile trailer, etc. The last thing you want is critical recovery staff fighting over too few recovery seats at time of disruption.
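The sister site example can be sketched as a check that runs whenever a process owner saves a new seat requirement. This Python is an illustration, not the algorithm used in my system. The class and function names are hypothetical, and the email step is left as a printed message because how you send it depends on your BCM or mass notification tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SeatGapAlert:
    """Details needed to notify the process owner and the BRBC team."""
    process: str
    site: str
    required: int
    available: int

    @property
    def gap(self) -> int:
        return self.required - self.available

    def message(self) -> str:
        return (f"{self.process} now requires {self.required} seats at {self.site}, "
                f"but only {self.available} are available (gap of {self.gap}).")


def check_seat_requirement(process, site, required, available) -> Optional[SeatGapAlert]:
    """Return an alert if the site cannot cover the requirement, else None."""
    if required > available:
        return SeatGapAlert(process, site, required, available)
    return None


# Original requirement of 25 seats fits within the 30 available: no alert.
before = check_seat_requirement("sales", "sister site", 25, 30)

# The requirement grows to 40 seats: a gap of 10 triggers an alert.
after = check_seat_requirement("sales", "sister site", 40, 30)
if after is not None:
    print(after.message())  # hand this text to your notification system
```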
This is a simplistic example of easily creating value through automation. The point is, no matter how complex the issue is, it is possible for your system to analyze everything in real-time, provided it has the data and the capabilities. True real-time automation is a beautiful thing and so much fun to build!
I hope this chapter has whetted your appetite for automation and real-time analytics. Going from manual to automated is like going from a horse-and-buggy to a Ferrari! Have fun and let me know if you need any suggestions. Now, let us continue on and learn about the tools and technologies that will enable you to assess your business continuity plans in real-time.