The Infrastruggle
Disaster Recovery-as-a-Service: A Beginner's Guide, Part 1
Why you still need disaster recovery, despite what some vendors are telling you.
Recent events, ranging from terrorist attacks to weather emergencies, have many business and IT planners concerned about the readiness of their facilities and processes for unplanned interruptions (AKA disasters). While hypervisor vendors and their paid analysts have contended that high availability trumps DR, that failover clusters eliminate the need for disaster recovery planning, more sensible counsel says otherwise. Enter Disaster Recovery as a Service (DRaaS).
A Growth Area
DRaaS has seen considerable growth in the past two years, with industry analyst forecasts of 50 percent increases in market share annually through 2020 becoming increasingly commonplace. This is despite the fact that there is no agreement on what DRaaS actually means, a gap that hasn't stopped numerous companies, most of which have no background in actual business continuity or disaster recovery services, from hanging out a shingle and declaring themselves experts.
On the one hand, the lack of a common definition or set of standards for evaluating competitive offerings finds a receptive customer base, especially among the preponderance of firms that would simply not undertake continuity planning at all were it not for legal or regulatory mandates compelling them to do so. But for firms that are sincere in their desire to protect their business processes (and automation support) from prolonged unplanned interruptions or outages, lack of evaluative criteria is problematic.
Some worry that, as was the case in the 1980s when "hot sites" (shared commercial data center recovery facilities) gained popularity, service providers might appear who lack the requisite skills and knowledge to deliver the promised recovery services. Some may not even possess the infrastructure described in online brochures, a situation that many unwitting clients may not discover until a disaster actually occurs!
Enterprise Players
Some vendors that have entered the "enterprise" part of the market -- providing services catering to larger firms -- may enjoy a warm reception simply by virtue of their recognized brand names. These include IBM, longtime hot site vendor Sungard, VMware (with vCloud Air Disaster Recovery), Amazon Web Services, Google and Microsoft. The latter three are actually the leading "industrial farmers" of the cloud, and many other smaller service providers nest their services under these cloud umbrellas.
In addition to enterprise technology and industrial cloud providers, other DRaaS offerings have been announced by well-established data protection software vendors. Unitrends, Symantec, Arcserve, Veeam, Acronis and others have been actively building capabilities, or working with surrogates in the managed hosting services market, to roll out their own DRaaS models that leverage familiar products and brands. This approach benefits the hosting provider by adding another service to its menu of offerings.
DRaaS Provider Considerations
Unfortunately, whether helmed by large recognizable service organizations or software providers-turned-DRaaS providers, many of these offerings are hampered by one or more of the following issues:
- The service provider has never before supported disaster recovery planning, testing, or implementation requirements. Because of this, it may not have the skills, knowledge or procedures to support customer needs beyond those requirements coupled to a specific hypervisor technology or backup/replication software product. The first rule of DR planning is that there is no "one-size-fits-most" strategy for DR, yet this is precisely the model embraced by a broad range of service providers in the market today.
- Service providers lack the capability to support the specialized infrastructure or workload diversity of the client. Each hypervisor vendor seems bent on creating a silo of hardware and software technology that will support only the operation and availability of workloads tailored to that proprietary stack. Many DRaaS providers have elected to partner with a specific hypervisor vendor and support only the continuity of workloads hosted on that infrastructure. However, as IDC and Gartner have pointed out, by the end of 2016, roughly 70 to 75 percent of workloads will be virtualized, running on a small subset of servers (about 21 percent) in a given business. The balance, some 25 to 30 percent of workloads, will not be virtualized at all and will run instead "natively" on traditional operating systems (about 79 percent of existing servers) so as not to compromise the high performance requirements of transaction processing systems and other big revenue-generating applications (see Figure 1).
At a minimum, the typical shop will have at least two recovery targets -- hypervisor-based and physical workloads -- but in most companies, the recovery targets will be even more diverse. Since the beginning of this year, surveys of IT decision-makers have suggested that firms are diversifying the number and type of hypervisors they elect to use. If each is an isolated island of hardware and software, then the requirements of a DRaaS provider will include support for a variety of workload and infrastructure types. Clearly, this is a problem for many of the newer DRaaS providers.
- While some nascent DRaaS providers may have a clue about the basic support requirements for data protection and/or application re-hosting -- two of the three basic components of DR -- they are generally hamstrung by the third requirement for successful recovery: network reconnection and/or redirection (Figure 2).
Typically, the service provider must outsource network services to a voice/data network provider: a core carrier network service provider, an independent or competitive network provider, an Internet service provider, a metro area network service provider, or some combination of these. In point of fact, even if the DRaaS provider delivers its service in accordance with the most demanding SLAs, such noteworthy performance may be completely irrelevant if the network services on which DRaaS service delivery depends are not as reliable, available or predictable.
- Related to the above is the issue of the distance that data must travel. Core to the DRaaS value proposition is the ability to safely store a redundant copy of mission-critical data that can be accessed/restored promptly in the wake of a problem with the original data or an interruption in access. Increasingly, the data that is being protected/replicated includes virtual machines (files that contain the descriptions of machine configuration, operating system, application software and data), so the definition of data restoration has bled over into the definition of application re-hosting in many cases. Whether the data being protected is simply records used by an application or an entire VM, replication across a wire over distances greater than 80 kilometers is subject to latency and jitter that show up as differences between the state of data at the production site and at the recovery site. The greater the distance, the greater the delta.
Many DRaaS providers reassure their customers that "data deltas" are not a problem for them, given the bandwidth of MPLS, SONET or other metro area network services provisioned for cloud access, and/or the short distances that data will travel.
Bandwidth does nothing to reduce propagation latency, but travel distance certainly adds to it. So, while it may be true that latency will not be problematic in a short transmission "across town" using the network facility of a single vendor, such a configuration usually translates to greater customer risk: the customer's redundant data is close enough to its production data that both may be consumed by the same disaster, especially one with a large regional footprint, like many natural disasters (hurricane, earthquake, ice storm, flood, wildfire and so on) and many more man-made hazards (nuclear accident, terrorism, chemical spill and so on).
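The relationship between distance and latency can be made concrete with a rough calculation. The sketch below is an illustration, not a formula drawn from any provider's documentation. It assumes that signals in optical fiber travel at about two-thirds the speed of light in a vacuum (a common rule of thumb), and the function name and sample distances are hypothetical. It estimates the minimum round-trip delay imposed by distance alone. Note that link bandwidth does not appear anywhere in the calculation.

```python
# Rough lower bound on round-trip delay over fiber, from distance alone.
# Assumption: light in fiber travels at ~2/3 of its speed in a vacuum.
SPEED_OF_LIGHT_KM_PER_S = 299_792.458
FIBER_VELOCITY_FACTOR = 2 / 3  # assumed rule-of-thumb value


def min_round_trip_ms(distance_km: float) -> float:
    """Return the minimum round-trip propagation delay in milliseconds.

    Ignores switching, queuing and protocol overhead, and assumes the
    cable route is as short as the stated distance, so real delays
    will be higher.
    """
    one_way_s = distance_km / (SPEED_OF_LIGHT_KM_PER_S * FIBER_VELOCITY_FACTOR)
    return 2 * one_way_s * 1000


if __name__ == "__main__":
    # Hypothetical distances: across town, the 80 km mark, regional, long haul.
    for km in (10, 80, 500, 2000):
        print(f"{km:>5} km: at least {min_round_trip_ms(km):.2f} ms per round trip")
```

Under these assumptions, 80 kilometers costs roughly 0.8 milliseconds per round trip before any equipment delay is counted, and the figure grows in direct proportion to distance. Where replication must wait for the remote site to acknowledge each write, that delay is paid on every write, no matter how much bandwidth has been provisioned.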
So, while many DRaaS providers may offer adequate protection against localized disasters -- failed equipment or media or facility-wide outages -- few can also claim to provide protection from milieu or regional outages. For the most part, the DRaaS industry is following the lead of certain hypervisor vendors who insist that high availability failover clustering with ongoing data replication is the "new" disaster recovery planning, noting that upwards of 90 percent of downtime is related to localized hardware faults, software glitches, operator errors or malware attacks, and that only 10 percent of downtime is due to events with a regional scope. Such thinking is inherently flawed.
These are only a few of the challenges confronting the delivery of a true disaster recovery service using cloud and virtual server technology. This, however, has not deterred some companies from pursuing such a strategy simply as a means to satisfy auditors and regulators that a continuity strategy is in place. If a disaster happens that cannot be handled using the established DRaaS program, the cynical view is that the business will likely not be around to impose career-limiting penalties on the decision-makers who took shortcuts with continuity planning.
On the other hand, some companies are seeking cloud services to support only part of their overall DR strategy. For example, firms with small IT staffs may lack the local skills or resources to undertake DR planning projects successfully, making a service provider model very attractive.
This is an idea that several data protection service providers have seized upon: why not give a smaller firm access to "enterprise class" data protection software by hosting that software for them and customizing it for their needs? Such an idea has merit, especially in organizations (large or small) with remote and branch office networks whose data needs to be protected. The right service providers can enable a centrally managed and highly automated backup program, with data copies relayed to either centralized or decentralized collection points that can be accessed rapidly in the event of an outage.
Moreover, many DRaaS companies are beginning to offer archive services in addition to standard data backup or replication services. Truth be told, upwards of 70 percent of the data stored on every hard disk in a business is rarely re-referenced and never changed. It is a mix of inert, archive-candidate data; orphan data; contraband data; and data copies that, if eliminated from disk and flash infrastructure, could enable a substantial amount of expensive production storage infrastructure to be returned to productive use, bending the storage capacity demand (and cost) curve. The savings from implementing a hosted archive could well pay for the entire data protection and data preservation program.
Wanted: A Straight Shooter
Bottom line: DRaaS is poorly defined and in too many cases fraught with logical and practical flaws in terms of strategy and service delivery. That means the sector is ripe for a straight-talking vendor with a practical knowledge of data protection and business continuity requirements to step in with a set of solutions that can be priced affordably for the consumer.
This is the first of a series of columns to help business and IT professionals sort the wheat from the chaff when evaluating DRaaS offerings. It is also hoped that vendors of these services will read these columns carefully and perhaps agree to participate in completing a Request for Information questionnaire that will be used to create a free DRaaS buyer's guide in the near future.
About the Author
Jon Toigo is a 30-year veteran of IT, and the Managing Partner of Toigo Partners International, an IT industry watchdog and consumer advocacy organization. He is also the chairman of the Data Management Institute, which focuses on the development of data management as a professional discipline. Toigo has written 15 books on business and IT and published more than 3,000 articles in the technology trade press. He is currently working on several book projects, including The Infrastruggle (for which this blog is named), which he is developing as a blook.