To Design for Public Cloud Infrastructure, Think Different
Lately, I've been having conversations with several customers about design principles and approaches to the public cloud. What I've noticed is that many IT pros have taken an overly simplistic approach to the public cloud, viewing it as a mere replica of their on-premises, highly virtualized environments.
Before digging deeper into that conversation, let me explain why and how cloud infrastructure is different from traditional enterprise IT infrastructure. It's all about the applications. Traditional client/server applications, developed on the x86 platform, constitute the majority of applications in the enterprise today, and those apps require a reliable infrastructure. In this configuration, the application is completely unaware of the infrastructure and doesn't interact with it at all. It's the engineering team's responsibility to ensure a reliable infrastructure platform.
Typically, availability is designed in at the hardware, operating system and hypervisor levels (and, in some cases, at the database level). At the operating system level, you can configure clustering, do network load balancing, write custom scripts that check on, and remedy, services in the event of a failure, and so on. At the hypervisor level you can enable high availability, configure fault tolerance and integrate with business continuity systems for failover to different sites. From a database perspective, you can configure replication and high availability between database instances. All of these solutions have one thing in common: the application is unaware of what the infrastructure is doing on its behalf. It simply expects a reliable infrastructure.
Cloud infrastructure, on the other hand, is a completely different ballgame. Applications that were built to run in the cloud are very much aware of, and in control of, the infrastructure. This awareness allows the application to provide availability services to itself. If, for example, it needs an additional Web server spun up due to heavy load or the loss of other Web servers, it has the authority to make those API calls and provision the necessary services. These types of applications are almost grid-like, in the sense that the failure of certain hardware components won't cause a service outage.
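To make the idea of an infrastructure-aware application concrete, here is a minimal Python sketch of the kind of logic such an app might run: work out how many Web servers the current load calls for, then provision replacements until the pool is back to that size. Everything in it is an assumption made for illustration. `FakeCloud` and `launch_web_server` are hypothetical stand-ins for a provider's provisioning API, not a real SDK, and the capacity figures are invented.

```python
import math
from typing import Callable, List


def servers_needed(requests_per_sec: float,
                   capacity_per_server: float,
                   minimum: int = 2) -> int:
    """Return how many Web servers the current load calls for.

    The minimum keeps a small pool running even when traffic is idle.
    """
    return max(minimum, math.ceil(requests_per_sec / capacity_per_server))


def reconcile(healthy: List[str],
              needed: int,
              provision: Callable[[], str]) -> List[str]:
    """Provision new servers until the healthy pool reaches the needed size."""
    pool = list(healthy)
    while len(pool) < needed:
        pool.append(provision())
    return pool


class FakeCloud:
    """Hypothetical in-memory stand-in for a provider's provisioning API."""

    def __init__(self) -> None:
        self.launched = 0

    def launch_web_server(self) -> str:
        self.launched += 1
        return f"web-new-{self.launched}"


if __name__ == "__main__":
    cloud = FakeCloud()
    # Two servers survived a failure; load is 950 req/s at 200 req/s each.
    needed = servers_needed(950, 200)
    pool = reconcile(["web-1", "web-2"], needed, cloud.launch_web_server)
    print(needed, pool)
```

In a real deployment the `provision` callable would wrap whatever API your provider exposes; the point is only that the application, not the infrastructure team, decides when to call it.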
But what if you want to take advantage of public cloud infrastructure, and your applications aren't designed to be infrastructure-aware? First, you should take some time to understand the design principles that were adopted at every layer, from storage to compute and so on. Next, you must design with the mindset that the infrastructure's unreliable. Following these principles will allow you to successfully design your migration to a public cloud.
Understanding the system design will tell you, for example, how many availability zones exist and what your service provider's service level agreement says about service outages. With Amazon Web Services (AWS), an outage is only declared if two availability zones in the same region fail. So if you place all of your workloads in the same availability zone and that zone fails, it's your fault; you didn't do your homework.
Designing for an unreliable infrastructure involves spending more time up front designing your workload placement and configuration in a way that takes advantage of the scale and options available. You also have to consider "what if?" scenarios: Do you cluster across two availability zones or three? Do you need to worry about an entire region going down? If so, how do you fail over to another region? You need to ask questions like these.
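One way to reason about the "two zones or three?" question is to model the placement and check what survives the loss of the busiest zone. The Python sketch below does that under simple assumptions: instances are interchangeable, they are spread round-robin, and only one zone fails at a time. The zone names and instance counts are hypothetical, and no provider API is involved.

```python
from collections import Counter
from typing import Dict, List


def place_round_robin(instance_count: int, zones: List[str]) -> Dict[str, str]:
    """Spread instances evenly across zones; returns instance name -> zone."""
    return {f"app-{i}": zones[i % len(zones)] for i in range(instance_count)}


def survives_single_zone_loss(placement: Dict[str, str], required: int) -> bool:
    """True if at least `required` instances remain after losing the fullest zone."""
    per_zone = Counter(placement.values())
    worst_loss = max(per_zone.values(), default=0)
    return len(placement) - worst_loss >= required


if __name__ == "__main__":
    # Assume the workload needs 4 of its 6 instances to carry the load.
    two = place_round_robin(6, ["zone-a", "zone-b"])
    three = place_round_robin(6, ["zone-a", "zone-b", "zone-c"])
    print(survives_single_zone_loss(two, required=4))    # False: only 3 remain
    print(survives_single_zone_loss(three, required=4))  # True: 4 remain
```

With the same six instances, two zones leave only three running after a zone failure, while three zones leave four. The same check can be repeated at the region level if an entire region going down is one of your "what if?" scenarios.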
The cloud isn't more complicated or less reliable than any other infrastructure; it's simply different. Since you didn't build it, you must learn and understand it before migrating workloads. Take the time to properly plan and place workloads in the cloud, and you'll find that the reliability and cost are manageable.
Posted by Elias Khnaser on 09/02/2014 at 11:23 AM