Disaster Recovery Best Practices for the Real World

No organization is immune to disaster, whether it's ransomware, cloud outages or unexpected system failures. That's why having a solid disaster recovery plan is critical -- especially for Microsoft environments. But what are the best ways to protect your systems and data?

Well, the answer can be boiled down to a set of real-world best practices for disaster recovery.

And that's exactly what data expert Karen Lopez delivered today at the half-day tech-education online summit "Disaster Recovery 101: Top Tips & Strategies for Microsoft IT Pros," now available for on-demand replay, in her session titled "Disaster Recovery Best Practices for the Real World."

Lopez, a senior data architect with an extensive background in development processes and data management, has all kinds of creds to justify that "expert" status: she's a Microsoft MVP (Data Platform), a Microsoft Certified Trainer, a vExpert and much more.

Her nearly one-hour session was jam-packed with information -- too much to cover in full here -- so we'll just focus on a few main best practices that she highlighted, and you can see her flesh out the others in the replay.

Let's take a look at just three real-world best practices for disaster recovery that Lopez shared in her session: Restores Are the Goal, Not Backups; Implement Immutable Storage and Enforce Separation of Duties; and Inventory and Classify Your Data -- Especially Dark Data.

Restores Are the Goal, Not Backups
Lopez emphasizes that the ultimate purpose of any backup strategy is successful recovery -- not the backup itself. She argues that many organizations focus heavily on backup routines, metrics, and dashboards, but fail to give equal or greater attention to testing and validating restores.

As she put it bluntly, "We don't need backups; we only need restores." While she acknowledged this is an exaggerated statement, it underscores her core message: disaster recovery success depends on your ability to get systems back online -- not just on whether your backup job ran. "I see all these dashboards and metrics of backup success and failures ... and when I ask them, where's the same dashboard for your recovery? They say, 'Well, we don't do that very often, so we don't need a dashboard for it.' I'm like -- absolutely you do."

Instead of relying on rigid backup routines or manual recovery tests, Lopez advocates for shifting both mindset and operational priorities to restoration-readiness. That means proactively testing restores -- not just periodically, but in a strategic, risk-based way.

"Testing restores is mandatory. Again, we don't need backups -- so this restore thinking is what I would like you to have."

Karen Lopez, @datachick

"Testing restores is mandatory. Again, we don't need backups -- so this restore thinking is what I would like you to have." To that end, she recommends using automation and AI to support restore testing through statistical sampling, which can broaden coverage without overloading teams. "Restore testing doesn't have to be done every day or for every backup. You can use statistical sampling -- based on business value, risk, or usage -- to guide which restores to test over time."

Lopez also cautions against the common practice of leaving backup and recovery responsibilities in the hands of application developers or other teams who may lack proper tools and oversight. "We should not have application-driven designs and implementations of backup and recovery. It's unfair to application teams who don't have the tools, monitoring, or time to ensure recovery works." Instead, she argues, restore planning must be intentional, centrally managed, and backed by proper observability.

Ultimately, what matters is not how many backups you have, but whether your data is intact and recoverable when you need it. As she summarized it: "Modern disaster recovery means: is the data where it's supposed to be? Has it been changed? Can we get it back and keep the business up and running?" And at the end of the day, she reminds us: "Business resumption planning is really the goal of disaster recovery."
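Lopez didn't prescribe tooling for that observability, but the "dashboard for your recovery" she asks about only requires that restore tests be recorded as data points somewhere. A minimal, hypothetical sketch -- the CSV target and field names are invented -- might look like this:

```python
import csv
from datetime import datetime, timezone
from pathlib import Path

LOG = Path("restore_tests.csv")  # hypothetical; feed this into your dashboard tool

def record_restore_test(system: str, success: bool, duration_minutes: float) -> None:
    """Append one restore-test result -- the recovery-side counterpart
    to all those backup success/failure metrics."""
    new_file = not LOG.exists()
    with LOG.open("a", newline="") as f:
        writer = csv.writer(f)
        if new_file:
            writer.writerow(["timestamp_utc", "system", "success", "duration_minutes"])
        writer.writerow([datetime.now(timezone.utc).isoformat(),
                         system, success, duration_minutes])

record_restore_test("sales-db", success=True, duration_minutes=42.5)
```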

Implement Immutable Storage and Enforce Separation of Duties
Lopez strongly advocates for the use of immutable storage as a core component of a resilient disaster recovery strategy. In her view, backup data must not only be stored, but also protected from tampering, deletion, or ransomware encryption. Immutable storage -- particularly when backed by hardware-enforced controls such as confidential computing -- ensures that backup data "cannot be changed," creating a critical line of defense against internal and external threats. As she put it, "One copy should be immutable and there should be zero restore errors."
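For readers in Azure environments, version-level immutability on Blob Storage is one concrete way to apply this. The sketch below is our illustration, not something from the session; it assumes the azure-storage-blob Python SDK (12.10 or later), a storage account with version-level immutability support enabled, and made-up names throughout:

```python
from datetime import datetime, timedelta, timezone
from azure.storage.blob import BlobClient, ImmutabilityPolicy

# Hypothetical connection string and names; the target container or account
# must have version-level immutability support enabled for this to succeed.
CONN_STR = "<your-storage-connection-string>"

blob = BlobClient.from_connection_string(
    CONN_STR, container_name="backups", blob_name="sales-db-2025-06-01.bak"
)

# Time-based retention: the blob can't be modified or deleted until expiry.
# "Unlocked" policies can still be shortened by privileged users; "Locked"
# ones cannot -- which is exactly why the separation of duties Lopez
# describes next matters.
policy = ImmutabilityPolicy(
    expiry_time=datetime.now(timezone.utc) + timedelta(days=30),
    policy_mode="Unlocked",
)
blob.set_immutability_policy(policy)
```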

But even the strongest technical solution can be undermined if it lacks the right operational guardrails. That's why Lopez pairs immutability with a call for strict separation of duties between the teams managing backups, the primary data, and the immutable stores themselves. Without this, attackers -- or even well-meaning admins -- may have the ability to subvert immutability protections. She shared a compelling example: "I actually have in one of my demos for ledger tables in SQL Server and Azure SQL DB, a way that I can mess with the immutable data ... but it's not because I changed the data in immutable storage. It's because I pointed to a different immutable storage. And that was because I had admin rights on both the data side and the immutable storage."
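Her demo used ledger tables, but the underlying control is auditable anywhere: no single identity should hold admin rights over both the primary data and the immutable store. A toy illustration of that check, with invented role assignments standing in for an Entra ID or IAM export:

```python
# Invented role assignments; in practice, export these from Entra ID/IAM.
data_admins = {"alice", "bob", "svc-sqlagent"}
immutable_store_admins = {"carol", "svc-vault"}
backup_operators = {"bob", "dave"}

def find_sod_violations(*role_sets: set[str]) -> set[str]:
    """Return identities that appear in more than one privileged role set."""
    seen: set[str] = set()
    violations: set[str] = set()
    for roles in role_sets:
        violations |= roles & seen
        seen |= roles
    return violations

offenders = find_sod_violations(data_admins, immutable_store_admins, backup_operators)
if offenders:
    print(f"Separation-of-duties violations: {sorted(offenders)}")  # ['bob']
else:
    print("No identity spans multiple privileged roles.")
```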

In addition to software-based protections, she notes that "confidential computing can be one version of that, which is ... protected by both hardware and software together." She also points out the growing importance of air-gapped backups and zero-trust architectures in complementing immutability and guarding against restore sabotage. Her emphasis is clear: trust in the recovery process depends on protecting the integrity of backup data at every level -- technical, procedural, and organizational. As she says, "We need to be able to trust that the data we're looking at is the right quality ... is the right granularity -- and without immutability somewhere in the process ..." trust is compromised.

Inventory and Classify Your Data -- Especially Dark Data
Lopez dedicates a substantial portion of her presentation to the often-overlooked but critical practice of data inventory and classification, with special focus on "dark data." She defines dark data as information that is collected but unused, forgotten, or unknown -- such as logs, metadata, configuration files, or even AI prompts stored within applications. As she explains, "All it really is is data that's collected and unused or unknown." This type of data may be invisible to backup systems, overlooked in compliance reviews, or worse, exposed unintentionally, creating significant security and operational risk.

Lopez warns that dark data can accumulate in unexpected places -- especially as systems grow more complex, distributed, and AI-enabled. "That could be logs, metadata ... configuration ... storage ... location data, old data, data that's forgotten about ... and then things that are new, like AI prompts." She provides a particularly striking example involving software-defined infrastructure where critical configuration files weren't included in the backup pipeline: "They were using DevOps and software-defined everything ... but they weren't backing up all the configurations ... they had to go scrounge for local copies of those configurations -- YAML files, ARM templates -- because they didn't have access to the ones that they weren't backing up."
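One lightweight way to start hunting that kind of dark data is to diff what's on disk against what your backup jobs actually claim to cover. Here's a hypothetical sketch; the manifest below stands in for an export from your backup tool:

```python
from pathlib import Path

# Hypothetical manifest of paths your backup jobs report covering. Anything
# config-like found outside it is a candidate for dark data.
BACKED_UP = {"infra/main.yaml", "infra/azuredeploy.json"}

CONFIG_PATTERNS = ("*.yaml", "*.yml", "*.json")  # YAML files, ARM templates, etc.

def find_unprotected_configs(root: Path) -> list[Path]:
    """List config-looking files under root that no backup job claims."""
    orphans = []
    for pattern in CONFIG_PATTERNS:
        for path in root.rglob(pattern):
            if path.relative_to(root).as_posix() not in BACKED_UP:
                orphans.append(path)
    return sorted(orphans)

if __name__ == "__main__":
    for orphan in find_unprotected_configs(Path.cwd()):
        print(f"Not covered by any backup job: {orphan}")
```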

Her broader message is clear: You can't protect or recover what you don't know exists. This applies not just to forgotten files, but also to poor practices like users saving sensitive data locally. "I worked with someone ... the first thing she does with any system is she tries to copy all the data to a spreadsheet on her local laptop .... She thinks she can manually keep her changes in sync. It's physically and mentally and emotionally impossible to do that." Lopez stresses that if IT leaders don't include all relevant data in disaster recovery planning, they're not just risking data loss -- they're creating exposure points for breaches and compliance violations.

Her recommendation is not simply to track everything indiscriminately, but to implement meaningful data classification and awareness practices that inform what gets backed up, how it's protected, and whether it's recoverable. "You can't back up and you can't recover what you don't know about ... You also can't secure what you don't know about." This foundational practice supports not just disaster recovery, but data governance, privacy, and operational resilience across modern environments.
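In practice, the payoff of classification is that it can drive protection policy mechanically. Here's a toy mapping with invented class names and retention values -- nothing from the session, just an illustration of classification informing what gets backed up and how:

```python
from dataclasses import dataclass

# Invented classification scheme; real tiers come from your governance program.
@dataclass
class ProtectionPolicy:
    backup_frequency_hours: int
    retention_days: int
    immutable_copy: bool
    restore_test_priority: int  # could feed the sampling weights sketched earlier

POLICY_BY_CLASS = {
    "regulated":    ProtectionPolicy(4,  2555, True,  10),  # ~7-year retention
    "confidential": ProtectionPolicy(12, 365,  True,  8),
    "internal":     ProtectionPolicy(24, 90,   False, 3),
    "public":       ProtectionPolicy(72, 30,   False, 1),
}

def policy_for(asset_class: str) -> ProtectionPolicy:
    """Unknown or unclassified data gets the strictest policy until triaged."""
    return POLICY_BY_CLASS.get(asset_class, POLICY_BY_CLASS["regulated"])

print(policy_for("internal"))
print(policy_for("dark"))  # unclassified/dark data defaults to strictest protection
```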

Extra Benefits
Of course, part of the value of attending such summits and webcasts live is the ability to ask questions of the presenters, providing a rare opportunity for attendees to get one-on-one advice from a bona fide subject matter expert (not to mention the chance to win a great prize -- in this case a DJI Mini 4 Pro drone from sponsors Zoho and Zerto).

With that in mind, here's a list of upcoming events.
