Dan's Take

Are You Ready for Your Datacenter To Fail?

Turbonomic offers advice on preparing for the worst.

Turbonomic's Eric Wright, Principal Solutions Engineer, reached out to me to weigh in on the topic of reliability and availability in on-premises, off-premises and hybrid computing environments. After reading my articles on cloud outages and how to form a productive response, Wright wanted to present his views on the following topics:

  • Designing for when the cloud goes down
  • Cost considerations in the cloud: how to decide between public, private and the hybrid cloud for your enterprise
  • Multi-cloud design: how operations can diversify a company's application assets
  • OpenStack deployments
  • Why the "lift and shift" approach to moving cloud workloads is not a good idea

Rather than trying to provide a detailed point-by-point analysis, I'll offer a few bullets to summarize the discussion.

  • It's wise for solution architects to remember that all components, regardless of whether they're hardware or software, fail. Unfortunately, we now live in a world in which business people with only a limited amount of technical skill are selecting Software-as-a-Service (SaaS) or Infrastructure-as-a-Service (IaaS) solutions.
  • If a resource is critical, multiple instances of it should be available so that work can be migrated when a failure -- or the conditions leading up to a failure -- is detected.
  • Regardless of where the processing and storage are located (on-premises, off-premises or a combination of the two), it's vital that enterprises have tools that: understand the available resources; can monitor them in real time; can make decisions about allocating those resources based upon the enterprise's policies; and can then implement those decisions.
  • It can be far better to switch access between available resources than to try to grab something from a rapidly failing environment and hope it can be relocated before the crash.
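The pattern those last two bullets describe -- probe each redundant instance and redirect work to one that's already healthy, rather than rescuing state from a dying one -- can be sketched roughly as follows. This is an illustrative sketch only; the resource names and health probes are hypothetical and have nothing to do with Turbonomic's actual API:

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Resource:
    """A compute resource (e.g., a host, cluster or region) with a health probe."""
    name: str
    healthy: Callable[[], bool]


def pick_healthy(resources: List[Resource]) -> Optional[Resource]:
    """Return the first resource whose health probe passes, or None.

    A probe that raises an exception is treated the same as one that
    reports unhealthy -- we never route work to a questionable target.
    """
    for r in resources:
        try:
            if r.healthy():
                return r
        except Exception:
            continue
    return None


# Example: the primary datacenter is failing, so work goes to the standby.
primary = Resource("primary-dc", healthy=lambda: False)
standby = Resource("standby-dc", healthy=lambda: True)

target = pick_healthy([primary, standby])
print(target.name)  # standby-dc
```

The point of the sketch is the ordering of concerns: the decision is made from health signals *before* any workload is touched, which is exactly why switching access beats a last-second evacuation.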

Although the technical details differ among vCenter, Hyper-V, OpenStack, AWS and your favorite environment, the concepts are the same.

Turbonomic claims that its "patented Autonomic Platform continuously analyzes application demand and automatically allocates shared resources in real-time." The key point is that its technology implements economic policies to monitor and manage IT resources.

Dan's Take: Outages are a Fact of Datacenter Life. Plan Accordingly
It's been quite a while since I last spoke with Wright. I can't remember if it was when he was serving as a VMware vExpert or Cisco Champion, or demonstrating his expertise in virtualization, OpenStack, business continuity, PowerShell scripting and systems automation. What has always been clear is that he knows his stuff and is worth listening to.

Although I expected to quibble about how enterprises should plan for outages, I found that Wright and I were singing the same tune throughout the entire conversation. I can't think of a single point that he made that I haven't made in presentations, reports, ebooks and articles I've published. It was quite refreshing.

While Turbonomic isn't the only company saying these things, I think it would be worth your time to learn more about what it's doing, how well its technology works for its clients (the company has a rather impressive customer list) and how it could benefit your organization.

About the Author

Daniel Kusnetzky, a reformed software engineer and product manager, founded Kusnetzky Group LLC in 2006. He's literally written the book on virtualization and often comments on cloud computing, mobility and systems software. He has been a business unit manager at a hardware company and head of corporate marketing and strategy at a software company.
