Cloud-Based Disaster Recovery: How To Achieve Near-Zero Data Loss

With a mixture of backup methods, understanding the workloads to be protected and tackling technical debt, even the smallest organizations can achieve near-zero data loss.

Everyone wants a data protection solution that backs up all of our data, with no losses, and restores workloads to functionality instantly. In the real world, few can afford this. So what compromises can be made, and how does using the cloud as a data protection destination affect these choices?

Getting disaster recovery (DR) to work at all -- let alone reliably -- isn't simple. Networking issues are a common problem. You must be aware of backup consistency and automation concerns. Changing your business practices and embracing workload composability are often required. And how well you get a handle on continuous data protection (CDP) versus snapshots, data churn and bandwidth management will determine the viability of your DR attempt.

With data protection, there's a lot to learn, and this is hardly half of it.

In the real world, none of us get to ride a unicorn to work. Our servers aren't made out of magic, and our Internet connections don't defy physics. We need to back up our data, and protect our workloads, but reality says we can't afford the data protection we really want, and there are very few easy buttons that instantly solve all our woes.

As a result, we all have to make compromises when it comes to data protection. Most of those compromises mean that we have to accept a recovery point objective (RPO) that's non-zero. This means that, should a disaster occur, some data loss is inevitable.

Not all workloads need to experience data loss in the event of a disaster, and business needs -- not technical ease -- should be the determining factor in data protection decisions. So, bearing in mind that you can't have your cake and eat it, too, how should organizations make choices about data protection?

The starting point for data protection compromises should be determining which workloads are least affected by data loss. Consider as an example a workload that processes -- but doesn't store -- data locally. Such a workload could conceivably be backed up as little as once a month, and the only data loss in rolling back to the previous month's copy would be that some updates aren't applied, and that a month's worth of logs have been lost.

In the case of this processing-only workload, both the updates and the logs are solvable problems. If the workload is managed using a configuration management tool, then updates would be applied automatically. Reverting to a month-old backup copy would result in the backup copy updating and then rebooting, then coming online fully patched and ready to do its job.

Similarly, making use of log centralization -- a typical component of application and server monitoring solutions -- can ensure that log files aren't lost, even in the event of a workload rollback. In almost every instance applications can be separated from the data they operate on, as well as from the results they produce, including their logs.

It's the data that workloads operate on and produce that's important, not the workloads themselves, and this data can typically be centralized. Doing the hard work of separating application and operating system environment (OSE) configuration from the data -- known as making a workload fully composable -- frees organizations to concentrate resources on those workloads that cannot be separated from their data, or which are themselves centralized repositories.

Inseparable Workloads
Even if an organization puts the maximum amount of effort into making as many workloads composable as possible, there will always be a few workloads where such efforts will fail. Many databases are the back-end storage for applications that cannot afford to lose even a second's worth of data. These databases will need to be actively replicated between the production and backup sites.

Similarly, file storage that underpins workloads will also likely have to have a real-time replication capability. These are classic examples, but they're not the only ones, and it's the edge cases that usually matter the most.

It's easy to type some words into a blog and say: "If you want to save money on data protection, make your workloads composable." The truth is that doing so isn't easy. Each and every application is different. Many application developers are lazy, or are developing applications that follow poorly thought-out design standards from 20 or more years ago.

Once upon a time it was perfectly normal for applications to scatter libraries and other critical resources liberally around the OSE. It was perfectly normal to store some configuration data in the Windows registry, some in flat text files and some in small databases that were installed alongside the application within its OSE.

Applications might be hardcoded to look for the data upon which they act in a specific location, or they might only work with a database that's installed on the same OSE as that application. Applications do all sorts of weird things, are designed in all sorts of confusing ways, and in many cases the administrators responsible for managing and maintaining them are not experts in those applications.

For all of these reasons there will always be some applications that just cannot be made composable. Once these applications have been identified, then rational decisions about how to proceed with data protection efforts can begin.

Small Pipe, Big Data
Once the non-composable workloads are identified, the rate at which they generate writes to storage -- known as data churn -- must be measured. For data protection efforts to be successful, you must have an Internet or WAN connection with enough capacity to handle the data churn of all the workloads that cannot be made composable.

If the rate of data change exceeds your upload capacity, the entire effort is doomed before it begins. If, however, you can obtain connectivity to your chosen off-site data protection location that has more capacity than the data churn of non-composable workloads, the rest is reasonably simple.
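The comparison between data churn and upload capacity can be sketched in a few lines of Python. This is only a rough illustration: the workload names, the churn figures and the 70 percent "usable share" of the link are assumptions made up for the example, not measurements or recommendations, and decimal units (1 GB = 8,000 megabits) are assumed throughout.

```python
def required_mbps(churn_gb_per_hour):
    """Convert a write rate in GB/hour to megabits per second (decimal units)."""
    return churn_gb_per_hour * 8 * 1000 / 3600


def total_churn_mbps(workloads):
    """Sum the sustained bandwidth needed by every non-composable workload."""
    return sum(required_mbps(rate) for rate in workloads.values())


def fits_uplink(workloads, uplink_mbps, usable_fraction=0.7):
    """Return True if total churn fits within the usable share of the uplink.

    usable_fraction is a hypothetical safety margin that leaves headroom
    for ordinary traffic and protocol overhead.
    """
    return total_churn_mbps(workloads) <= uplink_mbps * usable_fraction


# Hypothetical measurements, in GB written per hour.
non_composable = {"erp-database": 4.5, "file-server": 9.0}

print(f"Churn: {total_churn_mbps(non_composable):.1f} Mbps")
print("Fits a 50 Mbps uplink:", fits_uplink(non_composable, 50))
print("Fits a 40 Mbps uplink:", fits_uplink(non_composable, 40))
```

Average churn hides bursts, so in practice the peak write rate measured over the busiest period is the safer number to feed into a check like this.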

Databases should be set up to replicate in real time to the off-site location. This ensures failover of those workloads without data loss. Snapshots of databases should be taken at the off-site location on a regular basis to allow for recovering deleted data, or data from before a change was made.

File servers, configuration management servers, log centralization servers and non-composable workloads should be backed up using CDP or near-CDP backup methods. This ensures that these workloads lose as little data as possible during a failover event.

Everything else is fungible. The composable workloads -- which hopefully make up the bulk of what you have to back up -- can be backed up infrequently, or not at all. How frequently they get backed up -- and how much they impinge upon your network capacity -- depends on how thoroughly you managed to make those workloads composable, as well as the support you have from your data protection destination provider.
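The triage described above can be written down as a simple decision rule. The following Python sketch is one possible encoding, assuming hypothetical workload labels ("database", "file_server" and so on) and treating every database as needing replication, which is a simplification; a real inventory would need finer distinctions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    name: str
    kind: str         # hypothetical labels: "database", "file_server",
                      # "config_mgmt", "log_server", "app"
    composable: bool  # can it be rebuilt from configuration alone?


# Central repositories that other workloads depend on for recovery.
CDP_KINDS = {"file_server", "config_mgmt", "log_server"}


def protection_method(workload):
    """Map a workload to the protection tier suggested in the text."""
    if workload.kind == "database":
        return "real-time replication plus off-site snapshots"
    if workload.kind in CDP_KINDS or not workload.composable:
        return "CDP or near-CDP"
    return "infrequent snapshot, or none"


inventory = [
    Workload("orders-db", "database", composable=False),
    Workload("puppet-master", "config_mgmt", composable=False),
    Workload("legacy-payroll", "app", composable=False),
    Workload("web-frontend", "app", composable=True),
]

for w in inventory:
    print(f"{w.name}: {protection_method(w)}")
```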

Managed Services
A fully composable workload can be reconstituted from a configuration file. Docker containers are often considered the ultimate expression of this concept. You can run a few commands on an appropriately configured server and shortly thereafter a container is created, an application is downloaded from the Internet and installed in that container, the application's configuration is injected and the application is connected to the data upon which it will operate.
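As a rough sketch of what "reconstituted from a configuration file" can look like, the Python below turns a small configuration dictionary into a docker run command line. The image name, environment variable and paths are invented for illustration, and the function only builds the argument list; it does not start a container. The flags used (-d, --name, -e, -v) are standard docker run options.

```python
import shlex
from typing import Dict, List


def docker_run_args(cfg: Dict) -> List[str]:
    """Build a 'docker run' argument list from a workload configuration.

    Expected keys (all hypothetical for this sketch):
      name    - container name
      image   - image reference to pull and run
      env     - mapping of environment variables (injected configuration)
      volumes - mapping of host path -> container path (the external data)
    """
    args = ["docker", "run", "-d", "--name", cfg["name"]]
    for key, value in sorted(cfg.get("env", {}).items()):
        args += ["-e", f"{key}={value}"]
    for host_path, container_path in sorted(cfg.get("volumes", {}).items()):
        args += ["-v", f"{host_path}:{container_path}"]
    args.append(cfg["image"])
    return args


# Everything needed to rebuild the workload lives in this small structure,
# so this is the only thing that has to be protected for it.
billing = {
    "name": "billing",
    "image": "example/billing:1.4",
    "env": {"DB_HOST": "db.internal"},
    "volumes": {"/srv/billing": "/data"},
}

print(shlex.join(docker_run_args(billing)))
```

The point of the exercise is that the configuration is tiny compared with a full image of the workload, which is what makes backing it up so cheap.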

Many cloud providers operate in a similar fashion, even if they're not using containers. Standardized workload templates often exist for common workloads. Depending on the infrastructure solution used by the cloud provider, configuration can be injected into a newly created workload, in a process known as adding context. Where context-capable workload templates exist on the cloud provider side, this can mean that only updated configuration files are required to provide full disaster recovery for that workload.

Currently, workloads with context-capable templates are rare, but their availability is growing. When context-capable templates aren't available, using traditional snapshot-based backups is a perfectly acceptable approach to backing up composable workloads. How frequently these backups occur is largely a matter of comfort level, but it's not unreasonable to limit them to once a week or even once a month.

It is fairly normal to schedule backups for these workloads to trigger during off hours, when data churn on non-composable workloads is at its lowest, so as to not impinge upon network connectivity for those more sensitive workloads.
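Picking that off-hours window can be done from measured churn rather than by guesswork. The Python sketch below assumes you have 24 hourly churn samples for the non-composable workloads (the sample figures here are invented) and finds the contiguous window, wrapping past midnight, with the least total churn.

```python
def quietest_window(hourly_churn, window_hours):
    """Return (start_hour, total_churn) for the quietest contiguous window.

    hourly_churn must hold 24 values, one per hour starting at midnight.
    The window is allowed to wrap around midnight. On a tie, the earliest
    start hour wins.
    """
    if len(hourly_churn) != 24:
        raise ValueError("expected 24 hourly samples")
    if not 1 <= window_hours <= 24:
        raise ValueError("window_hours must be between 1 and 24")

    best_start, best_total = 0, None
    for start in range(24):
        total = sum(hourly_churn[(start + i) % 24] for i in range(window_hours))
        if best_total is None or total < best_total:
            best_start, best_total = start, total
    return best_start, best_total


# Hypothetical churn in GB per hour: busy all day, quiet from 23:00 to 02:00.
samples = [5] * 24
samples[23] = samples[0] = samples[1] = 1

start, total = quietest_window(samples, window_hours=3)
print(f"Schedule composable-workload backups starting at {start:02d}:00 "
      f"(about {total} GB of competing churn)")
```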

Because of the complications discussed, it isn't recommended that any organization -- no matter how competent -- attempt to go it alone on data protection. It's worth engaging directly with the data protection software vendors, application vendors and especially the data protection destination supplier. Because organization-owned and -controlled DR sites are increasingly rare, this means working closely with a cloud provider, something that isn't always possible with the big four public cloud providers.

Regional services providers are worth considering as a data protection destination. Not only can they lend expertise in designing a data protection solution, but working with them might also allow an organization to substantially reduce costs.

For example, they may be able to create templates for some or all of your composable workloads, removing the need to back them up at all. They may also have lower cost solutions available for database replication, or be able to help put in place data efficiency solutions that help reduce network capacity demands.

Ultimately, successful data protection will require a mixture of backup methods, understanding the workloads to be protected and tackling technical debt. Every situation is different, but near-zero data loss in the face of disaster is becoming possible for even the smallest of organizations. You just have to know where to make the necessary compromises.

About the Author

Trevor Pott is a full-time nerd from Edmonton, Alberta, Canada. He splits his time between systems administration, technology writing, and consulting. As a consultant he helps Silicon Valley startups better understand systems administrators and how to sell to them.

