Disaster-Proofing Your Cloud: 5 Tactics that Actually Work
Veteran cloud architect Joey D'Antoni returned to the Virtualization & Cloud Review virtual stage for the June 25 summit titled Beyond Backups: Cloud Resilience in the Age of Breaches with a session that tackled a perennial topic -- disaster recovery -- through a sharper, more modern lens. In the webcast presentation, "Disaster-Proofing Your Cloud: Data-Loss Prevention Tactics That Actually Work," Joey reframed data protection as more than just backups, using real-world cases and cloud-native tooling to outline a five-point framework for true resilience. It's now available for on-demand replay.
"We don't get paid for backups. We get paid for restores."
Joey D'Antoni, Principal Cloud Architect at 3Cloud
"We don't get paid for backups," said Joey. "We get paid for restores." That guiding philosophy anchored the session, which moved beyond traditional DR to include policy enforcement, immutable infrastructure, cost-effective logging strategies, and automated recovery drills. Here's a breakdown of the five areas Joey argued matter most.
His nearly one-hour presentation was too full of expert advice and commentary to cover completely here, but toward the end Joey outlined a five-point plan for cloud teams to build disaster-proof architectures that go beyond snapshots and backups.
Figure: Five Things that Work (source: Joey D'Antoni).
While the session was packed with practical tips, real-world examples, and a focus on proactive, testable defenses, here's a summary of the key points he covered.
Secure by Policy: RBAC and Enforced Policies
Joey opened with a strong endorsement of cloud-native policy enforcement as a foundation for disaster-proof architectures. He urged attendees to treat policy as code using built-in tools like Azure Policy and AWS Service Control Policies (SCPs), emphasizing least privilege and automation-first thinking. "When we talk about bad security, we talk about not having granularity and how we can assign design privileges," he said.
He recommended using Role-Based Access Control (RBAC) to scope access permissions to the minimum necessary and highlighted Azure's Privileged Identity Management (PIM) as an effective tool to require elevation and auditing when higher-level access is needed. "It forces an audit trail when a user accepts a high-level role," he said.
To show the real-world impact of misconfigured access, he cited a Microsoft Graph incident, in which a developer's app in a development subscription was granted excessive permissions: "There was a breach last year where a developer built an application that's in a development Azure subscription that somehow got granted... permissions to read all of Microsoft Graph."
Prevent Deletion: Locks and Immutability
The session's second theme focused on measures to ensure critical resources and backups aren't accidentally or maliciously deleted. Joey highlighted two cloud features that support this: resource locks and immutable storage.
- Resource locks -- "There's a do not delete lock that simply prevents deletion of a resource, and then there's also a do not modify this resource lock," he said. These are especially useful in production environments, though he warned that improper use can hinder legitimate operations like disk expansion.
- Immutable storage -- "Microsoft and AWS provide immutable storage that you can immediately write to," he noted, adding that some customers even store these backups in a different subscription or tenant for added protection.
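The difference between the two lock types, and the disk-expansion caveat, can be illustrated with a toy model. This is only a sketch: the `Resource` class and the `delete` and `expand_disk` helpers are hypothetical and do not call any cloud API. The only real-world detail borrowed is Azure's naming of the two lock levels, CanNotDelete and ReadOnly.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Lock(Enum):
    """The two lock levels from the talk, using Azure's names for them."""
    CAN_NOT_DELETE = "CanNotDelete"
    READ_ONLY = "ReadOnly"


@dataclass
class Resource:
    # Hypothetical stand-in for a cloud resource such as a managed disk.
    name: str
    size_gb: int
    lock: Optional[Lock] = None


class LockedError(Exception):
    """Raised when a lock blocks the requested operation."""


def delete(resource: Resource) -> str:
    # Either lock level blocks deletion.
    if resource.lock is not None:
        raise LockedError(
            f"{resource.name}: delete blocked by {resource.lock.value} lock"
        )
    return f"{resource.name} deleted"


def expand_disk(resource: Resource, new_size_gb: int) -> None:
    # Only the read-only lock blocks modification, which is how it can
    # get in the way of legitimate work such as growing a disk.
    if resource.lock is Lock.READ_ONLY:
        raise LockedError(
            f"{resource.name}: resize blocked by {resource.lock.value} lock"
        )
    resource.size_gb = new_size_gb


# A delete lock still allows the disk to grow...
disk = Resource("prod-data-disk", 128, Lock.CAN_NOT_DELETE)
expand_disk(disk, 256)

# ...but refuses deletion.
try:
    delete(disk)
    delete_blocked = False
except LockedError:
    delete_blocked = True

# A read-only lock refuses the resize as well.
frozen_disk = Resource("prod-log-disk", 64, Lock.READ_ONLY)
try:
    expand_disk(frozen_disk, 128)
    resize_blocked = False
except LockedError:
    resize_blocked = True
```

In this model the delete lock is the lower-friction choice for resources that still need routine changes, while the read-only lock trades operational flexibility for stronger protection.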
He also referenced the upcoming support in SQL Server 2025 for backing up directly to immutable storage and discussed the risks of ransomware attackers targeting backup deletion: "Our biggest fear of ransomware attacks isn't just that... threat actors invade our systems. It's that they invade our systems and delete all of our backups."
Detect Threats: Logs and Alerts (Watch Out for the Cost)
For threat detection, Joey focused on cloud logging and anomaly detection. He advised organizations to structure logging strategies carefully to balance visibility and cost. "Logging can be more expensive than your compute," he warned.
He recommended segmenting log collection by application or resource group and using streaming queries to monitor thresholds and trigger alerts. "You can have what are called standing queries, and you're querying each row as it passes through," he said. For example, a spike in system temperature or resource consumption could be flagged automatically.
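The idea of a standing query, which inspects each row as it streams past and is scoped to a single resource group, can be sketched in a few lines. This is an illustration under stated assumptions rather than any vendor's query engine: the `LogRow` shape, the metric names, and the 90 percent threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class LogRow:
    # Hypothetical log record; real platforms carry many more fields.
    resource_group: str
    metric: str
    value: float


@dataclass(frozen=True)
class Alert:
    row: LogRow
    threshold: float


def standing_query(
    rows: Iterable[LogRow],
    resource_group: str,
    metric: str,
    threshold: float,
) -> Iterator[Alert]:
    """Check every row as it passes and yield an alert when one crosses the threshold."""
    for row in rows:
        # Segment by resource group and metric so unrelated logs are skipped.
        if row.resource_group != resource_group or row.metric != metric:
            continue
        if row.value > threshold:
            yield Alert(row, threshold)


# Sample stream (made-up values for illustration).
rows = [
    LogRow("payments-rg", "cpu_percent", 41.0),
    LogRow("payments-rg", "cpu_percent", 97.5),
    LogRow("analytics-rg", "cpu_percent", 99.0),
    LogRow("payments-rg", "temperature_c", 88.0),
]

alerts = list(standing_query(rows, "payments-rg", "cpu_percent", 90.0))
for alert in alerts:
    print(f"ALERT {alert.row.resource_group} {alert.row.metric}={alert.row.value}")
```

Because the function is a generator, it processes one row at a time and never holds the full log in memory, which mirrors the streaming behavior Joey described.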
Joey also suggested integrating log analysis with SIEM and SOAR platforms for automatic remediation workflows. Common use cases include detecting privilege escalations and unusual access patterns.
Plan for Chaos: Test and Simulate
"Backups are completely useless if you haven't tested them," Joey said. He emphasized the importance of automating disaster recovery drills to validate restore procedures and uncover hidden gaps in recovery workflows.
He described testing options that are now widely available in the cloud, such as Azure Site Recovery, AWS native tools, and third-party platforms. "You can do testing without impacting your production environment at all," he said.
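An automated restore drill boils down to three steps: take a backup, restore it somewhere other than production, and verify the restored copy. The sketch below shows that loop using SQLite from Python's standard library as a stand-in for a real database and backup service. The `restore_drill` function, the `orders` table, and the row-count check are assumptions made for illustration, not something shown in the session.

```python
import sqlite3
import tempfile
from pathlib import Path


def restore_drill(source: sqlite3.Connection, table: str, workdir: Path) -> bool:
    """Back up `source`, restore into a scratch database, and validate the copy."""
    backup_path = workdir / "backup.db"

    # 1. Take the backup (written to a file, like a real backup artifact).
    backup_conn = sqlite3.connect(backup_path)
    source.backup(backup_conn)
    backup_conn.close()

    # 2. Restore into a separate in-memory database, never into production.
    restored = sqlite3.connect(":memory:")
    from_backup = sqlite3.connect(backup_path)
    from_backup.backup(restored)
    from_backup.close()

    # 3. Validate: structural integrity plus a simple row-count comparison.
    #    `table` is a trusted, hard-coded name in this sketch.
    integrity = restored.execute("PRAGMA integrity_check").fetchone()[0]
    expected = source.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    actual = restored.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    restored.close()
    return integrity == "ok" and actual == expected


# Toy "production" database with made-up data.
prod = sqlite3.connect(":memory:")
prod.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
prod.executemany("INSERT INTO orders (total) VALUES (?)", [(9.5,), (20.0,), (3.25,)])
prod.commit()

with tempfile.TemporaryDirectory() as tmp:
    drill_passed = restore_drill(prod, "orders", Path(tmp))

print("restore drill passed:", drill_passed)
```

A scheduled job could run a check like this and raise an alert on failure, so a broken backup is discovered during a drill rather than during an incident.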
The session also touched on chaos engineering as a best practice for advanced teams. Joey referenced Netflix's Chaos Monkey and noted that Azure now offers a similar service. "What Chaos Monkey would do would go around and shut down services in the environment," he said, describing it as a way to ensure services fail gracefully.
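The core of a Chaos Monkey-style experiment is small: shut something down at random, then check that the service still responds sensibly. The following is a toy simulation of that idea, not Netflix's tool or the Azure service. The replica names and the `handle_request` and `chaos_round` functions are hypothetical.

```python
import random
from typing import Dict


def handle_request(replicas: Dict[str, bool]) -> str:
    """Serve from any healthy replica; degrade gracefully if none is up."""
    for name, healthy in replicas.items():
        if healthy:
            return f"200 served by {name}"
    return "503 temporarily unavailable"


def chaos_round(replicas: Dict[str, bool], rng: random.Random) -> str:
    """Shut down one randomly chosen healthy replica and return its name."""
    healthy = [name for name, up in replicas.items() if up]
    if not healthy:
        raise ValueError("no healthy replicas left to shut down")
    victim = rng.choice(healthy)
    replicas[victim] = False
    return victim


# Three replicas of a hypothetical web tier, all healthy to start.
replicas = {"web-1": True, "web-2": True, "web-3": True}
rng = random.Random(42)  # seeded so the experiment is repeatable

victim = chaos_round(replicas, rng)
response = handle_request(replicas)
print(f"shut down {victim}; service answered: {response}")

# Keep going until everything is down to confirm the failure mode is a
# clean 503 rather than an unhandled error.
while any(replicas.values()):
    chaos_round(replicas, rng)
final_response = handle_request(replicas)
```

The useful assertion is about the response, not the victim: the experiment passes if requests keep succeeding while capacity remains and fail in a controlled way once it is gone.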
Align People, Process, and Platform -- Not Just Tech
Joey closed by reminding attendees that tools alone aren't enough. "This is not just a technical problem," he said. "We have to align people and process and our platform." He argued that resilient systems depend on cross-functional coordination between devs, architects, operators, and business stakeholders.
He encouraged teams to apply tighter controls to production environments while offering developers greater flexibility in lower-tier environments. "I try to think about what my developers are going to need to do and anticipate their needs," he said, noting that locking down production doesn't mean stifling innovation across the board.
For teams just starting their cloud resilience journey, he recommended a simple first policy: "Start with a policy that requires all of your production servers and databases to have backups on them."
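Expressed as policy as code, that first rule is a simple audit over a resource inventory. The sketch below is a minimal illustration in plain Python rather than an Azure Policy or SCP definition. The `CloudResource` fields and the sample inventory are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CloudResource:
    # Hypothetical inventory record; field names are assumptions.
    name: str
    kind: str          # e.g. "server" or "database"
    environment: str   # e.g. "production" or "development"
    backup_enabled: bool


def backup_policy_violations(resources: List[CloudResource]) -> List[str]:
    """Return the names of production servers and databases without backups."""
    return [
        r.name
        for r in resources
        if r.environment == "production"
        and r.kind in {"server", "database"}
        and not r.backup_enabled
    ]


# Made-up inventory for illustration.
inventory = [
    CloudResource("prod-web-01", "server", "production", True),
    CloudResource("prod-sql-02", "database", "production", False),
    CloudResource("dev-sql-01", "database", "development", False),
]

violations = backup_policy_violations(inventory)
print("policy violations:", violations)
```

The development database is deliberately not flagged, which reflects the advice to hold production to tighter controls while leaving lower-tier environments more flexible.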
Joey's five-point plan offers a practical framework for cloud teams looking to move beyond snapshots and start building cloud-native defenses that are proactive, testable and sustainable.
And More
And, although replays are fine -- this was just today, after all, so timeliness isn't an issue -- there are benefits to attending such summits and webcasts from Virtualization & Cloud Review and sister sites live. Paramount among these is the ability to ask questions of the presenters, a rare chance to get one-on-one advice from bona fide subject matter experts (not to mention the chance to win free prizes -- in this case $5 Starbucks gift cards that were awarded to the first 300 attendees during a session by sponsor Rubrik, a leader in cloud data management and enterprise data protection that also presented at the summit).
About the Author
David Ramel is an editor and writer at Converge 360.