Virtual Architect

Strategies for Disaster Recovery

The move to virtualization bestows some revolutionary benefits. Affordable DR is one of them.

In my experience, few organizations that make the move to virtualization do so initially for disaster recovery (DR) reasons. Server consolidation, data center cooling considerations and power cost savings are all more common drivers, and the hard cost savings they offer are what prompt an organization's first jump into virtualizing the IT environment.

These drivers are compelling, and their direct return to the business' bottom line makes virtualization an excellent sell to the corporate bean counters. But they only tell part of the story. Once in place, the virtualization infrastructure itself immediately enables some new and powerful mechanisms for administering an IT infrastructure. Two of these are virtualization's twin augmentations to server backups and disaster recovery.

A Brand New Ballgame
Let's consider first how virtualization changes the ballgame in terms of server backups. In the physical world, backups are in many ways an exercise in blind trust. With thousands or tens of thousands of files and folders on any particular server, the job of server backups is to capture as many of them as possible. In some cases, due to their state or activity, backup software may be unable to grab certain files or folders. In others, a server's files can be spread across multiple instances of backup media, with the loss of any one file compromising the success of a restoration. This inherent complexity makes restoration difficult after a server failure.

Conversely, within a virtualization environment, all of those thousands of files and folders that make up each disk are usually "squished" down into a single file. VMware has its .VMDK files. Microsoft and Citrix XenServer use .VHD. All of these are single-file representations of a server disk partition. Wrapping the server's state into this single file simplifies the backup process because one file equals one server.

Under this model, adding DR options to virtual backups gets a lot easier and a lot less expensive. In the physical world, DR is both expensive and an operational nightmare. In many environments, the only true way to manage active failover DR is to create and administer a secondary site. That site is provisioned with what amounts to a duplicate environment, mirroring the servers and configurations of the production site. Got six database servers? You'll need six more at your DR site. Making a configuration change to "\\dbserver1" in the production network? You'll need to make a similar change to its pair on the DR side. Without tedious attention to detail, configurations between production and DR sites can easily drift out of sync, making the DR site useless for a failover event. Due to these kinds of challenges, disaster recovery in all but the world's biggest and most important IT environments rarely grows out of the planning stages. Though the consequences of a disaster are dramatic, the cost of true active failover is simply too high for most environments.

Once an environment moves to virtualization, however, all of these complexities melt away.

Figure 1

With virtualization, it's possible to consider disaster recovery as little more than adding "off-site replication" to "backups." This sounds overly simplistic, but at a very high level, adding DR to your virtualization environment comes down to transferring a copy of those single-file backups over the network to an alternate location.

Numerous products exist that aid in the management and administration of this replication, but the central theme in each is a mechanism for storing snapshots in another facility for a worst-case scenario. The greatest benefit of all of these products is that, depending on your budget, the quantity and type of servers stored at the backup facility, and your recovery time objective, there's a solution that can work for your environment.

Let's take a look at three high-level strategies, starting first with the lowest cost and slowest return-to-operations metrics:

Lower cost, slow restore. All virtualization solutions natively expose the management tools needed to take an image-level backup of a virtual machine (VM). With some careful scripting or low- or no-cost utilities, these tools can be used inexpensively to transfer server disk states to removable media for off-site rotation.

The servers and storage needed to host the restored VMs after a disaster event don't need to be physically present at the backup site during normal operations. Instead, consider contracting with a distributor to deliver them on 24-hour notice after a disaster. Your restoration process will require receiving and installing the server infrastructure at your DR site and then restoring the server images from backup media. The process won't be quick, but it will be inexpensive.
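As a rough illustration of the "careful scripting" mentioned above, the sketch below copies one exported disk image to a destination folder (for example, a mounted removable drive) and verifies the copy with a checksum. It assumes the image has already been exported or snapshotted with your hypervisor's own tools, since copying the disk file of a running VM directly is unlikely to produce a consistent image. The function names and paths are hypothetical, not part of any vendor's toolset.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in chunks so large disk images never have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_image_offsite(image: Path, destination_dir: Path) -> Path:
    """Copy one exported disk image (hypothetical helper) and verify the copy."""
    destination_dir.mkdir(parents=True, exist_ok=True)
    target = destination_dir / image.name
    shutil.copy2(image, target)  # copies contents and file timestamps
    if sha256_of(image) != sha256_of(target):
        target.unlink()  # do not leave a corrupt copy in the rotation
        raise IOError(f"checksum mismatch while copying {image.name}")
    return target
```

A call such as `copy_image_offsite(Path("exports/dbserver1.vhd"), Path("/mnt/offsite"))` would be run once per exported image. The checksum step matters here because, with one file equal to one server, a single damaged copy means a server that cannot be restored.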

Medium cost, medium restore. Moving up a notch in the cost structure, if you need a faster restore process, consider purchasing cold-spare servers and data storage to reside at the disaster recovery site. This equipment need not be running, but it does need to be preconfigured and ready to support restored VMs.

What's useful about post-disaster operations in some industries is that a complete restore may not be necessary because not all servers are critically important. You can also leverage the power of consolidation to support greater numbers of VMs per host than you would normally run in production. Where production operations might support 5X VMs per host, during a disaster you may compress down to 10X per host. Performance will suffer, but employee demands on the environment are likely to be lower while in post-disaster operations.
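The sizing arithmetic behind this trade-off can be sketched in a few lines. The VM counts below are made-up figures for illustration, and the densities of 5 and 10 VMs per host simply take X = 1 in the ratios above; substitute your own numbers.

```python
import math


def hosts_needed(vm_count: int, vms_per_host: int) -> int:
    """Smallest number of hosts that can hold vm_count VMs at the given density."""
    if vms_per_host <= 0:
        raise ValueError("vms_per_host must be positive")
    return math.ceil(vm_count / vms_per_host)


# Hypothetical figures for illustration only.
production_vms = 60
critical_vms = 40  # VMs that must come back after a disaster

production_hosts = hosts_needed(production_vms, vms_per_host=5)
dr_hosts = hosts_needed(critical_vms, vms_per_host=10)
print(production_hosts, dr_hosts)  # prints "12 4"
```

Under these assumptions, restoring only the critical VMs at double the density cuts the cold-spare hardware from 12 hosts to 4, which is where most of the savings in this tier come from.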

Higher cost, immediate restore. For environments with little tolerance for downtime, today's third-party virtualization tools add immediate restore capabilities if you can tolerate the cost. To support this kind of environment, ensure that enough network connectivity is available to support the near-continuous transfer of server disk states from the production site to the DR site. Also, ensure that hot-spare or warm-spare servers are pre-positioned at the DR site and preconfigured to support powering on VMs at the moment of the disaster. Third-party tools like DoubleTake Software, DataCore SANMelody, and VizionCore's vRanger and vReplicator, among others, are all possible add-on toolsets that enable this functionality.
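To get a first feel for "enough network connectivity," one option is to estimate the average link speed needed to ship each day's changed data to the DR site. The sketch below is a back-of-the-envelope calculation only: the function name is hypothetical, the daily change figure is something you would have to measure in your own environment, and the result ignores protocol overhead, compression, and bursts of activity.

```python
def required_mbps(changed_gb_per_day: float, window_hours: float = 24.0) -> float:
    """Average link speed (megabits/s) needed to ship the daily changes in the window."""
    if window_hours <= 0:
        raise ValueError("window_hours must be positive")
    megabits = changed_gb_per_day * 8 * 1000  # 1 GB = 8,000 megabits (decimal units)
    return megabits / (window_hours * 3600)


# Hypothetical example: 100 GB of changed disk blocks per day.
print(round(required_mbps(100), 1))                  # spread over 24 hours: 9.3
print(round(required_mbps(100, window_hours=8), 1))  # squeezed into 8 hours: 27.8
```

Treat the output as a floor rather than a target. Replication traffic tends to follow production activity, so the peak rate can be well above the daily average.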

The majority of the cost for any of these solutions does not necessarily lie within the software, but instead within the duplicate hardware that lies unused at the DR site waiting for a disaster. Your decision about which capabilities you need and what cost you can absorb will relate most to your tolerance for downtime.

About the Author

Greg Shields is Author Evangelist with PluralSight, and is a globally-recognized expert on systems management, virtualization, and cloud technologies. A multiple-year recipient of the Microsoft MVP, VMware vExpert, and Citrix CTP awards, Greg is a contributing editor for Redmond Magazine and Virtualization Review Magazine, and is a frequent speaker at IT conferences worldwide. Reach him on Twitter at @concentratedgreg.