Cloud-Based Disaster Recovery: How To Avoid the 'Gotchas'

Be sure to familiarize yourself with the limitations and capabilities of cloud DR.

Backups and disaster recovery (DR) are the standard first step into the cloud for many organizations. As a result, it is where organizations are most likely to encounter -- and have to overcome -- the myriad of non-technical problems that accompany outsourcing IT. What are these cloud gotchas, and how do they manifest for those attempting backups and DR to the cloud?

Cloud-backed data protection is an attractive solution, so attractive that it has infiltrated the daily lives of most of the western world. Sync-and-share solutions such as Dropbox, OneDrive and Sync are often users' first exposure to the concept.

Sync-and-share solutions let individuals place files in a local folder and have those files uploaded to the cloud automatically. Once in the cloud, the files are versioned, offering a limited ability to search through a file's history in case a mistake was made and a previous version needs to be recovered. Sync-and-share solutions aren't what most IT practitioners would call a "proper" backup solution, but for the vast majority of personal users, they're the "better than nothing" solution that's most likely to get used.

Expectation Management
Sync-and-share solutions are a great opportunity to examine one of the key problems of cloud computing: Namely, that expectations rarely match reality. IT practitioners have a pretty good idea of how sync-and-share solutions work. As a result, it rarely occurs to us to sit users down and have the talk with them about the cloud not being magic. Unfortunately, many users encounter cloud-based solutions with erroneous preconceptions, and this "knowledge" leads to errors.

One common belief is that sync-and-share solutions adequately protect users against ransomware. They don't. Modern ransomware makes numerous changes to files over time, ultimately exhausting the number of file versions kept by the sync-and-share solution. There's even a little game of cat-and-mouse going on, in which some sync-and-share vendors try to add a level of ransomware detection based on access patterns, and the ransomware evolves new access patterns.
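To make the version-exhaustion problem concrete, here is a minimal Python sketch. It assumes a service that keeps only the five most recent versions of each file; real services set their own limits, some by count and some by age. `VersionedFile` and `MAX_VERSIONS` are hypothetical names used only for illustration.

```python
from collections import deque

# Hypothetical model of a sync-and-share service that keeps only the
# most recent MAX_VERSIONS copies of each file (the limit is an assumption).
MAX_VERSIONS = 5


class VersionedFile:
    def __init__(self, content: str, max_versions: int = MAX_VERSIONS):
        # A deque with maxlen silently discards the oldest entry when full.
        self.versions = deque([content], maxlen=max_versions)

    def save(self, content: str) -> None:
        self.versions.append(content)

    def has_clean_copy(self, marker: str = "ENCRYPTED") -> bool:
        # True if any retained version was not written by the ransomware.
        return any(not v.startswith(marker) for v in self.versions)


doc = VersionedFile("quarterly report")

# The ransomware rewrites the file repeatedly, creating one version per pass.
for i in range(MAX_VERSIONS):
    doc.save(f"ENCRYPTED pass {i}")

print(doc.has_clean_copy())  # False: the original version has been evicted
```

Four passes would still leave the clean original in the history; the fifth pushes it out. Malware only needs to outlast the retention limit, not defeat it.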

Proper backups -- solutions designed specifically for the purpose of backing up data -- take different approaches to the problem. They may make files read-only once they're initially uploaded, or keep unlimited versions combined with alerts that notify systems administrators of an infection.
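The two approaches described above can be combined in a short Python sketch. This is an illustrative model, not how any particular backup product works. Every upload is appended as a new version that is never overwritten, and a burst of changes inside a sliding time window raises an alert. `BackupStore`, the threshold and the window length are assumptions chosen for the example.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class BackupStore:
    """Hypothetical append-only store with change-rate alerting."""
    alert_threshold: int = 100       # max changes per window (assumed value)
    window_seconds: float = 60.0     # sliding window length (assumed value)
    versions: dict = field(default_factory=dict)
    recent_changes: list = field(default_factory=list)
    alerts: list = field(default_factory=list)

    def upload(self, path: str, content: bytes, now: Optional[float] = None) -> None:
        now = time.time() if now is None else now
        # Never overwrite: each upload becomes a new immutable (timestamp, bytes) entry.
        self.versions.setdefault(path, []).append((now, bytes(content)))
        # Keep only change timestamps that fall inside the sliding window.
        self.recent_changes = [t for t in self.recent_changes
                               if now - t < self.window_seconds]
        self.recent_changes.append(now)
        if len(self.recent_changes) > self.alert_threshold:
            self.alerts.append(
                f"possible ransomware: {len(self.recent_changes)} changes "
                f"in {self.window_seconds:.0f}s"
            )
```

Because nothing is ever evicted, the clean version survives however many times the malware rewrites a file. The cost is storage, which is why this model only works when the customer pays for what is kept. The alert threshold would need tuning against normal activity; a nightly batch job can look a lot like an infection.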

Sync-and-share solutions are only profitable because they sell users way more capacity than most of them would ever use. This oversubscription means that one good malware infection that increases file version numbers -- and thus storage usage -- could put a sync-and-share operation out of business.

Proper cloud backup solutions don't have this problem: they make the customer pay for usage. As a result, they can focus on data retention instead of data availability. These are two different approaches, yet many individuals use them to solve the same problem. One of the two -- sync-and-share -- is emphatically not suited to the task, but a decade of popular misconception means it will continue to be used this way.

All cloud solutions suffer from this issue to some extent. It makes expectation management one of the first -- and most difficult -- hurdles for new cloud administrators to overcome.

Disaster Recovery
If expectation management related to simple backups can get messy in a hurry, the issues surrounding DR quickly become downright dangerous. Unlike the nearly universal sync-and-share, cloud DR is not something systems administrators are widely familiar with, including its limitations and capabilities.

At first glance, cloud DR should be no different than DR to an organization-controlled second site. The No. 1 issue administrators are going to encounter in both cases is ensuring that workloads function on the DR site the same as they do on the primary.

Here, the most common problem is network configuration. On the production site, workloads are set up with specific network configurations that allow them to interoperate with other workloads without conflicts. This may include static IP addresses, and usually involves DNS and firewall rules.

Making workloads usable during a DR scenario involves setting them up with dynamic IP addresses, configuring other workloads to reach services by DNS name rather than by IP address, and configuring the rest of the network infrastructure to automatically -- and rapidly -- update DNS, firewall, intrusion detection, threat protection and other services to accommodate the shifted workload.
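The name-based approach can be modeled in a few lines of Python, with an in-memory dictionary standing in for a DNS zone. The hostnames, the port (5432) and the helper functions are hypothetical, and the addresses come from the RFC 5737 documentation ranges. Clients resolve a service name when they connect, and IP-based firewall rules are regenerated from a name-based policy whenever a record changes. Moving a workload to the DR site then becomes a record update rather than a reconfiguration of every client.

```python
# Hypothetical in-memory stand-in for a DNS zone (RFC 5737 example addresses).
dns_zone = {
    "app.corp.example": "192.0.2.20",
    "db.corp.example": "192.0.2.10",   # primary site
}

# Firewall policy is expressed in terms of names, not addresses.
policy = [("app.corp.example", "db.corp.example", 5432)]


def resolve(name: str) -> str:
    """Look up the current address for a service name."""
    return dns_zone[name]


def connection_target(service: str, port: int) -> tuple[str, int]:
    """Clients resolve at connect time instead of storing an IP address."""
    return (resolve(service), port)


def render_firewall_rules() -> list[tuple[str, str, int]]:
    """Translate the name-based policy into IP-based allow rules."""
    return [(resolve(src), resolve(dst), port) for src, dst, port in policy]


def fail_over(service: str, dr_address: str) -> list[tuple[str, str, int]]:
    """Point a service at its DR address and regenerate dependent rules."""
    dns_zone[service] = dr_address
    return render_firewall_rules()
```

After `fail_over("db.corp.example", "198.51.100.10")`, `connection_target("db.corp.example", 5432)` returns `("198.51.100.10", 5432)`, and the regenerated rule allows 192.0.2.20 to reach 198.51.100.10 on port 5432. A real deployment also has to account for DNS TTLs and client-side caching, which delay how quickly clients see the new record.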

This is the first step toward making workloads composable, and it's not easy. When an organization owns both sides of the DR solution, the infrastructure is a lot more forgiving of mistakes made in making workloads composable. If an admin forgets to switch a workload from a static IP address to a dynamic one, for example, administrators can simply log in to the virtual machine's console and change the IP address.

Not all cloud technology offers this ability to manually restore workloads to operation. For many cloud solutions, if the workload doesn't come up in such a manner that it can be remotely accessed using the operating system environment (OSE) native remote access tools, then the workload is irretrievable.
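One way to avoid discovering an unreachable workload mid-crisis is to check, after every test failover, that each workload answers on the port its native remote access tool uses (for example, 22 for SSH or 3389 for RDP). The Python sketch below relies only on the standard library's `socket.create_connection`. The helper names are hypothetical, as are any workload names and hostnames you pass in.

```python
import socket


def is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def check_workloads(workloads: dict[str, tuple[str, int]]) -> list[str]:
    """Return the names of workloads that could not be reached."""
    return [name for name, (host, port) in workloads.items()
            if not is_reachable(host, port)]
```

For example, `check_workloads({"web01 (SSH)": ("web01.dr.example", 22)})` returns `["web01 (SSH)"]` if that host can't be reached. An open port only shows that something is listening; a complete test would also log in and confirm the workload actually does its job.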

A crisis that triggers a failover to the cloud is not the time for systems administrators to discover that their DR solution is useless because key workloads aren't composable. Unfortunately, DR solutions are often sold using perfectly configured, best-case demos, which either create false expectations among administrators or simply never introduce them to potential problems in the first place.

Lack of time to thoroughly research the solution, lack of training in the intricacies of the offering, lack of budget to engage more fully managed cloud offerings, and above all a lack of testing combine to make this first step into the cloud a disastrous one for many organizations.

Cloud Business Practices
The problems involved in moving to the cloud are human as much as they are technical. They involve our susceptibility to marketing and sales fluff, a decade of preconceptions and overcoming the innate desire not to be bothered with difficult problems.

Outsourcing your IT to a cloud provider is supposed to make IT easier, but it can only be easier if you accept that the cloud isn't somehow a magic talisman, but is, in fact, nothing more than outsourcing. If you hired a custodial service to take care of the office and grounds after hours, you wouldn't simply click a few buttons on a Web site and never think about it again.

When outsourcing custodial services, there are practical issues to be concerned with: how keys will be made available, the insurance and reputation of the custodians, what to throw out and what not to throw out. So it's odd -- but somehow nearly universal -- that organizations don't have the same level of engagement or consideration when outsourcing their IT to a cloud provider. This is one place where appropriate business practices can make a big difference.

IT practitioners with a great deal of cloud knowledge and a lot of time to prepare on-premises workloads for the transition to the cloud can simply saunter on up to Amazon Web Services (AWS) with a credit card and make magic happen. Ten years after AWS was born, however, the number of IT practitioners with the skills to do this remains small.

There are, however, a number of smaller regional providers available that offer a more managed service, and it's worth mandating their use, at least in the beginning. Use of these solutions can solve a number of problems, starting with being more forgiving of mistakes with composability, but extending into other areas.

Smaller service providers are more likely to have VMware and Hyper-V hosts available, eliminating the added complication of converting workloads as part of the DR process. Smaller service providers also often have experts available to help with the really tough problems, like ensuring that bare metal workloads can effectively use a cloud solution for DR purposes.

Mandating rigorous and thorough testing is another business-level decision that must be made before embracing the cloud. Whether the cloud is being used for backups, DR or testing production workloads, the requirement is constant: test, and then test again.

It's all too tempting to treat the cloud like a fire-and-forget solution that magically solves problems with which you don't want to deal. Adapting business practices to the cloud needs to begin with countering exactly that tendency, and these business practices need to stay in place even as comfort and experience with cloud solutions grows.

The cloud doesn't come with a license to disengage your brain. Cloud services providers, however, do come with training wheels. And for all your cloud needs -- from backups to DR and beyond -- they're a great place to start. At least until you're ready to ride on your own.

About the Author

Trevor Pott is a full-time nerd from Edmonton, Alberta, Canada. He splits his time between systems administration, technology writing, and consulting. As a consultant he helps Silicon Valley startups better understand systems administrators and how to sell to them.

