The Infrastruggle

DRaaS: A Beginner's Guide, Part 2

Whether or not you outsource your disaster recovery, it's still up to you to properly plan.

In the first part of this series, we looked at what Disaster Recovery as a Service (DRaaS) means as a cloud-based service offering. Specifically, we examined some of the gaps in the DRaaS business value narrative. These range from the limited first-hand knowledge of disaster recovery (DR) planning requirements among many would-be service providers, to the technical constraints of passing large quantities of data to a remote location across a network (latency and jitter), to the complexity created by current IT memes (hypervisor computing, for one), which makes it increasingly difficult to come up with an easy-to-design-and-deploy recovery strategy.

These are not insurmountable barriers to effective DRaaS, of course. If you (and your DRaaS vendor) have the commitment and the budget, you might actually be able to develop some workarounds to many of these issues.

The point is, you need to know why you are interested in DRaaS in the first place, and how far you are willing to go to devise a workable cloud DR strategy.

The Sales Pitch
Why consider DRaaS? The pat answers are available on just about any service provider's Web site or whitepaper. They're usually presented in one of several ways:

  • You lack the skills and knowledge, or perhaps the budget or personnel, to take on the disaster recovery/business continuity project yourself. Outsourcing is much easier.

I actually hear this a lot from otherwise intelligent business IT folk who are pedaling furiously just to stay abreast of the new cloud concepts and products that seem to inundate everyone on a daily basis.

The fact is that there are no experts in disaster recovery planning. There never have been. Those who have survived a disaster and helped their organizations live to fight another day should have the street cred to call themselves experts, but they rarely do. Disasters have a humbling effect. In most cases, practitioners who have dealt with them echo what an old planner said to me after a hurricane: "If we had a few more of these, I might actually get to be pretty good at this."  Experienced planners rarely if ever manifest a "guru" mindset.

  • You lack knowledge of how to plan or an understanding of what a plan should look like. Better to use templated services from a cloud service vendor.

This is the same argument folks used to use to justify "canned plans" -- templates that one could fill in, then pass to the internal auditor to demonstrate that a DR plan existed. Truth is, there is no fill-in-the-blanks plan worth anything. And all plans are imperfect responses to events that, in the best of circumstances, will never need to be used.

The best plans comprise a series of modular procedures that can either be implemented as a whole in response to a large disaster impacting an entire facility or a large geography, or partially in response to lesser interruptions such as the failure of a disk drive or RAID array, a software glitch, or virtual machine failure.
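One way to picture the modular idea is as a table of small procedures that can be combined per event. The following is only a minimal sketch: the procedure names, the event names, and the `PLAN` table are hypothetical and are not taken from any real plan or product.

```python
from typing import Callable, Dict, List

# Hypothetical recovery procedures. In a real plan each one would be a
# documented, tested runbook rather than a one-line function.
def restore_storage() -> str:
    return "storage restored"

def restart_virtual_machines() -> str:
    return "virtual machines restarted"

def reroute_network() -> str:
    return "network rerouted"

def relocate_users() -> str:
    return "users relocated"

# Hypothetical mapping of interruption events to the modules they need.
# A lesser interruption uses one module; a facility loss uses all of them.
PLAN: Dict[str, List[Callable[[], str]]] = {
    "raid_array_failure": [restore_storage],
    "vm_failure": [restart_virtual_machines],
    "facility_loss": [
        reroute_network,
        restore_storage,
        restart_virtual_machines,
        relocate_users,
    ],
}

def respond(event: str) -> List[str]:
    """Run, in order, only the procedures listed for the given event."""
    return [procedure() for procedure in PLAN[event]]
```

Calling `respond("raid_array_failure")` exercises a single module, while `respond("facility_loss")` runs the whole set, which mirrors the partial versus whole-plan response described above.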

The best strategy is to write the plan as best you can, using your own vision for how a response should proceed in the face of an interruption event. You can modify the details after testing to correct for your errors.

There is no one-size-fits-all template, so even if you go with a cloud service, you are likely going to need considerable customization. Realizing this fact may force you to reconsider the value of the DRaaS provider.

  • Use our Disaster Recovery/Business Continuity so you can focus on more strategic initiatives.

I get it. Properly done, DR is hard work. It would be nice to simply outsource it to a service provider who has the time to dedicate to doing it right. And, of course, virtually every DRaaS provider offers a Chinese menu of services that may include backup, replication, failover clustering over distance (or "geo-clustering"), fast virtual machine re-hosting, and maybe some network recovery capabilities and user work locations. All these services sound pretty good, and some may actually work as advertised. But they do not constitute, in and of themselves, a workable DR strategy.

The problem is that outsourcing tends to work well for routine tasks and processes, not for challenging ones. To outsource DR, you need to figure out what part of the process you are actually outsourcing.

Business Impact Analysis
Real DR planning begins with something that consultants call a business impact analysis. Big term for a simple thing. To plan effectively, you need to start by taking inventory of your business processes, then identifying the applications, data and infrastructure that support each process. Business processes can be classified in terms of criticality: processes that will create real havoc for the company if interrupted, whether in tangible terms like lost revenue or intangible terms like lost reputation, may be termed "critical." Their applications, support infrastructure, and data inherit this business process criticality like so much DNA. Without knowing the business process served, data is just a bunch of ones and zeros.
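The inheritance idea can be sketched in a few lines. This is an illustration under assumptions, not a prescribed method: the three criticality levels, the process and asset names, and the rule that a shared asset takes the highest criticality among the processes it supports are all hypothetical choices made for the example.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Dict, List

class Criticality(IntEnum):
    # Hypothetical three-level scale; use whatever scale fits your business.
    DEFERRABLE = 1
    IMPORTANT = 2
    CRITICAL = 3

@dataclass
class BusinessProcess:
    name: str
    criticality: Criticality
    # Applications, data and infrastructure that support this process.
    assets: List[str] = field(default_factory=list)

def inherited_criticality(processes: List[BusinessProcess]) -> Dict[str, Criticality]:
    """Give each asset the highest criticality of any process it supports.

    The "highest wins" rule is an assumption made for this sketch.
    """
    levels: Dict[str, Criticality] = {}
    for process in processes:
        for asset in process.assets:
            current = levels.get(asset)
            if current is None or process.criticality > current:
                levels[asset] = process.criticality
    return levels

# Hypothetical inventory, for illustration only.
processes = [
    BusinessProcess("order entry", Criticality.CRITICAL, ["orders-db", "web-frontend"]),
    BusinessProcess("monthly reporting", Criticality.DEFERRABLE, ["orders-db", "report-server"]),
]
asset_levels = inherited_criticality(processes)
```

Here the shared `orders-db` asset ends up critical because order entry depends on it, even though monthly reporting alone would not justify that level.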

Once business processes have been examined and both their apps/data/infrastructure dependencies and interdependencies with other business processes are known, the next step is to build objectives for their restoration in the event of an outage.

Objectives cover how quickly after an interruption event the data needs to be made available to apps and end users. Zero downtime objectives require a very different kind of recovery plan than do objectives that allow for a couple of weeks of access interruption before pain sets in. Not all business processes are the same, and "one-size-fits-most" DR strategies generally fit no one's needs very well. The best DR strategies build on the data from the impact analysis and the objectives formulated based on recovery priorities and boundary conditions such as available budget and testing efficacy (more on this later).
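To make the link between objectives and strategy concrete, here is a rough sketch that maps a tolerable downtime to a broad family of recovery techniques. The one-day cut-off, the process names, and the three-way split are assumptions invented for illustration. Real thresholds would come out of your own impact analysis, budget and testing.

```python
from datetime import timedelta

# Hypothetical cut-off between "replicate" and "restore from backup".
ONE_DAY = timedelta(days=1)

def suggest_strategy(tolerable_downtime: timedelta) -> str:
    """Map a tolerable downtime to a broad strategy family (illustrative only)."""
    if tolerable_downtime <= timedelta(0):
        # Zero downtime objectives call for some form of always-on failover.
        return "failover clustering"
    if tolerable_downtime <= ONE_DAY:
        return "replication"
    # Processes that can wait days or weeks can usually live with restore.
    return "backup and restore"

# Hypothetical objectives for three business processes.
objectives = {
    "order entry": timedelta(0),
    "payroll": timedelta(hours=8),
    "monthly reporting": timedelta(weeks=2),
}
strategies = {name: suggest_strategy(limit) for name, limit in objectives.items()}
```

The point of the sketch is only that different objectives land in different buckets, so a single replicate-everything answer would overserve some processes and possibly underserve others.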

Only, most DRaaS providers don't offer anything like business impact analysis or recovery objective setting services. They assume that you have already performed this "heavy lift" yourself (most firms haven't), and you are just using their service as part of a strategy that you have already devised.

All About the Benjamins
Alternatively, some service providers don't concern themselves with how prepared you are to use their services in an intelligent way. To be honest, many vendors would rather have a customer purchase a premium support agreement and replicate everything to their cloud than have the customer do the homework: classify apps and data intelligently, determine the right service requirements based on criticality and priority of recovery, and use external service providers only to augment what the customer can't do alone. Frankly, the zeros on the check are much more numerous if the customer defaults to a replicate-everything mindset.

A "top of the line" DRaaS provider should offer business impact analysis guidance as a precursor to selling any other services. This is a simple recognition of the reality of disaster recovery: rarely does everything need to be back up and running at the same time or at the same level of alacrity that customers enjoy during normal operations. Very few business processes require "always on" zero downtime service, and many can be restored to 10 percent or 20 percent of normal processing efficiency without creating an issue -- especially given the fact that they may only be used by a "skeleton crew" of operators during the emergency period.

One final thought on this subject: outsourcing is not easier than developing a plan in-house. The primary purpose of do-it-yourself planning is not only to rationalize the priority of process restoration, but also to facilitate testing. Testing isn't done only to validate the recovery strategy; its other (and probably more important) function is to rehearse the staff who will be involved in recovery. That is about the only way to ensure that there is a cadre of personnel who can keep their heads in an emergency when everyone else is losing theirs.

  • You have been told that DR is simpler now that hypervisor-based application hosting is highly available. DRaaS is a simple add-in to virtual infrastructure and cloud computing.  

In Part 1, we noted that hypervisor vendors are now fielding their own software-defined storage (SDS) stacks and in a few cases software-defined networking (SDN) stacks, basically in an effort to be like IBM circa 1977. That means that a VMware hosting environment doesn't readily share its data or storage or other resources with any other hypervisor computing stack, or with non-virtualized workloads and data.

This silo-ing of IT is getting worse all the time, and it has now extended into the cloud services arena. What was once heralded as a vendor-neutral, standards-based, agile IT outsourcing service represented by pictures of fluffy clouds with beautiful birds and hot air balloons and rainbows and sunflowers has become "battle clouds" with different service providers embracing different hypervisor software and hardware stacks in a ceaseless battle for market share.

As a consequence, just finding a DRaaS provider with the resources and competence to handle the recovery requirements for the multiplicity of hypervisor software stacks in your shop may be a daunting exercise. Truth be told, you may need to engage multiple service providers, each specializing in a particular hypervisor model or application or database. That can turn a "simple outsourcing" arrangement into a hellish exercise in multi-vendor management and integration. There is nothing simple about DRaaS.

Bare Minimums
There is nothing inherently wrong with the concept behind DRaaS. One vendor recently argued that his company's DRaaS offerings are aimed at small to medium businesses that lack the resources to do an effective job of the DR planning task, and whose infrastructure is generally pretty simple (one vendor's hardware or hypervisor stack). These customers like the elasticity model of DRaaS -- the "only pay for what the service provider needs to stand up when you have an outage" model -- in which the only ongoing cost is the storage of block change data snapshots fired across a WAN or MAN link at given intervals.

Truth be told, this is the bare minimum service required for a snowball's chance at recovering from a serious disaster with a facility-wide or broader footprint. The vendor said (paraphrasing) that big disasters weren't really in the recovery plan. If a small or medium-sized firm is whacked with a superstorm, it will probably never be un-whacked, and its owners understand that. He said that large firms had the resources to do their DR without using a DRaaS provider, and that they were better served by hot site vendors or by building their own redundant facilities.

No Excuse for Failing To Plan
While the above perspective might provoke a hot debate at a conference, the reality it exposes is simple: DRaaS is a set of services that, like any other DR-related technologies and techniques, should be used judiciously. It isn't a substitute for effective DR plan development, but instead functions as a set of options for possible use in building a business-savvy strategy for continuing your business computing efforts in the face of unplanned interruption events.

In the final analysis, DRaaS is just another technique on a spectrum of techniques that have always run the gamut from laissez-faire (do nothing but take a backup) to full redundancy (replicate everything in your data center at least 80 kilometers away). It still requires planning to use the right services at the right costs for the desired outcomes.   

