The Cranky Admin
Weighing the Cloud Decision
When to move, or whether to move at all.
Rolling your own cloud is hard. Whether your goal is an internal-facing or external-facing cloud, there are operational hurdles that are entirely separate from the technical challenges. The virtualization administrators will often be the ones asked to sort everything out. Doing so requires learning a new way to think about workloads.
The first challenge is to define the word "cloud." There are many tech marketers who would have you believe that standing up a virtualization cluster with HA and DRS is a "cloud." It runs VMs. The management tools automate some basic aspects of infrastructure management: clearly that's a cloud!
If that's all there was to standing up a cloud, the public cloud wouldn't exist. A recent Spiceworks survey has virtualization adoption at 76 percent for 2016, with another 9 percent expected in 2017. This is despite Gartner's predictions of market saturation and an increasing trend toward physicalization.
"Traditional" virtualization -- VMware or Hyper-V -- is the norm. If it could deliver what people wanted from the public cloud, where would the public cloud be getting customers from? Some might exclaim that the subscription model is the driver, but that's unlikely; it's easy enough to lease on-premises solutions if you like paying per month.
What makes a cloud a cloud must be more than virtualization. I would argue that it is the self-service nature of public cloud solutions, ranging from end-user-friendly GUIs to developer-friendly APIs, that makes a cloud a cloud. Resource allocation tracking and integrated billing are probably important to many as well.
Traditional virtualization administrators are used to babying individual workloads. We stand them up, install the operating systems, patch them and tweak their resource demands. We busy ourselves with "keeping the lights on," have absolute control over what goes on "our" infrastructure, and get paid for it. When you think about it, that's a pretty sweet gig.
Once we deploy a cloud infrastructure, we've moved into a different role. We're no longer gardeners doting on VMs individually; we're industrial farmers who automate templates and images, bury ourselves in monitoring solutions, fret about billing, supply chain management, economics and efficiencies of scale.
Instead of someone coming to us with a problem and our providing them with a solution, our job becomes creating a series of pre-canned solutions for people to choose from, and trying to proof those against the conceivable problems end-users might encounter. The end-users have control over their workloads, not the administrators. Having trained our whole lives for almost autocratic control, we find that change is in itself hard to cope with.
In order to track resource allocation and bill it out, clouds cost things out differently from the way a traditional in-house IT shop thinks about resources. For one thing, oversubscription is rarely baked into cloud software in quite the same way as it's practiced in traditional virtualization.
Every account that signs up to a cloud can create one or more virtual datacenters. These datacenters have x amount of RAM assigned to them, y amount of storage and z number of virtual CPUs. These are usually hard commitments. If, for example, 16 virtual CPUs are assigned to a virtual datacenter, then those 16 virtual CPUs are not available for use by other virtual datacenters.
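To make the hard-commitment idea concrete, here is a minimal Python sketch. It does not describe any particular cloud product: the Resources and Cloud classes, their method names and all of the capacity figures are invented for illustration, and the sketch assumes a strict no-oversubscription policy like the one described above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    """A bundle of vCPUs, RAM and storage (hypothetical type for this sketch)."""
    vcpus: int
    ram_gb: int
    storage_gb: int

    def fits_within(self, other):
        """True if every dimension of this bundle fits inside `other`."""
        return (self.vcpus <= other.vcpus
                and self.ram_gb <= other.ram_gb
                and self.storage_gb <= other.storage_gb)


class Cloud:
    """Hypothetical resource pool that only makes hard commitments."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.datacenters = {}  # virtual datacenter name -> committed Resources

    def available(self):
        """Capacity left after subtracting every existing commitment."""
        committed = self.datacenters.values()
        return Resources(
            self.capacity.vcpus - sum(d.vcpus for d in committed),
            self.capacity.ram_gb - sum(d.ram_gb for d in committed),
            self.capacity.storage_gb - sum(d.storage_gb for d in committed),
        )

    def create_datacenter(self, name, request):
        """Commit resources to a new virtual datacenter, or refuse the request."""
        if name in self.datacenters or not request.fits_within(self.available()):
            return False
        self.datacenters[name] = request
        return True


# Illustrative numbers only: a 32-vCPU cloud and two business units.
cloud = Cloud(Resources(vcpus=32, ram_gb=256, storage_gb=10_000))
sales_ok = cloud.create_datacenter("sales", Resources(16, 64, 2_000))
engineering_ok = cloud.create_datacenter("engineering", Resources(24, 64, 2_000))
```

In this sketch the sales request succeeds and the engineering request is refused, because 16 of the 32 virtual CPUs are already committed to sales whether or not sales is actually using them.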
While this makes resource allocation and billing easier, it often leads to less efficient utilization of resources. Human nature and bitter experience both tell us to over-allocate resources. If you think you need 2GB of RAM, assign 4GB "just in case." If you predict you need 500GB of storage space, allocate 1TB because your guesses might be off. We're rarely all that confident in our own predictive ability, and our aversion to conflict means we don't want to have to redo workloads down the road.
Without a cloud infrastructure, virtualization administrators can compensate for this by looking at the system as a whole. If CPU usage only ever spikes to 30 percent across the cluster, but RAM is at the redline, you can probably get more mileage out of that cluster by adding RAM, instead of adding more nodes with more CPU. If workloads are oversubscribing their storage by 50 percent, you can see great efficiencies from thin provisioning.
This change from a usage-based model to an allocation-based model can be hard on budgets and those who must explain them.
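The gap between the two models is easy to put in numbers. The following Python sketch is a rough illustration only: the workload names, the storage figures and the 20 percent headroom are assumptions made up for the example, not measurements from any real environment.

```python
# Hypothetical workloads as (name, allocated_gb, used_gb); figures are invented.
workloads = [
    ("web", 1000, 500),
    ("db", 2000, 1200),
    ("files", 4000, 1500),
]


def allocation_based_gb(items):
    """Storage that must exist if every allocation is a hard commitment."""
    return sum(allocated for _, allocated, _ in items)


def usage_based_gb(items, headroom_percent=20):
    """Storage needed if capacity is sized on actual use (thin provisioning),
    plus an assumed safety margin expressed as a whole-number percentage."""
    used = sum(used for _, _, used in items)
    return used * (100 + headroom_percent) // 100


allocated_total = allocation_based_gb(workloads)
usage_total = usage_based_gb(workloads)
stranded = allocated_total - usage_total  # capacity bought but sitting idle
```

With these made-up figures the allocation-based model needs 7,000GB of storage, while the usage-based model gets by on 3,840GB, including the headroom. The difference is capacity that somebody's budget has to cover.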
Many traditional virtualization setups allowed for the allocation of hardware to individual business units. Sales bought their equipment and licenses on their budget; IT just provided the datacenter it lived in. Moving everyone over to virtual datacenters that they allocate and grow on their own is a move away from servers, storage and networking that individual business units can wrap their arms around and hug.
From a licensing perspective, this often makes things cheaper and easier. One business unit -- IT -- does all the negotiating, and negotiates a single agreement with the lowest possible rates. Hardware can see similar efficiencies; IT no longer has to track multiple budget-firewalled deployments for growth. They just need to make sure the cloud as a whole doesn't run out of resources.
The challenge here is that cloud software has moved the overprovisioning from physical clusters to virtual datacenters. The old-school refresh model of overprovisioning at purchase time and then running for three to five years until the next refresh doesn't work with a cloud setup.
Instead, IT teams must move to a more continual growth analysis mode. Hardware and licenses will have to trickle in on an as-needed basis if there is to be any hope of controlling costs, let alone coming close to the costs that public cloud providers can offer. It is the end users of your cloud who will want to provision on a "fire and forget" basis, meaning overcapacity can no longer be IT's crutch.
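What continual growth analysis looks like will vary from shop to shop. As one minimal sketch, the Python below estimates how many months of capacity remain from a handful of monthly allocation totals. The function name, the sample figures and the choice of a simple average growth rate are all assumptions for illustration; a real forecast would want more data and a better model.

```python
def months_of_runway(capacity, monthly_allocated):
    """Estimate months until allocations reach capacity.

    Uses the average month-over-month growth across the samples, which is
    a deliberately naive assumption. Returns None if there is no growth.
    """
    if len(monthly_allocated) < 2:
        raise ValueError("need at least two monthly samples")
    intervals = len(monthly_allocated) - 1
    growth = (monthly_allocated[-1] - monthly_allocated[0]) / intervals
    if growth <= 0:
        return None  # flat or shrinking allocations: no exhaustion forecast
    remaining = max(capacity - monthly_allocated[-1], 0)
    return remaining / growth


# Invented example: GB of RAM committed to virtual datacenters each month.
ram_committed = [400, 450, 520, 580, 640]
runway = months_of_runway(capacity=1024, monthly_allocated=ram_committed)
```

Here allocations grow by an average of 60GB a month and 384GB remain, which works out to roughly six and a half months before more RAM has to be on the floor. That is the lead time purchasing has to work within.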
Cloud computing is all about amplifying the smallest effort. If you build your OS images with a Puppet, Chef, SaltStack or Ansible agent inside, plus provide the relevant infrastructure as part of every virtual datacenter, you can handle updates, desired configuration and so on in an automated fashion. The infrastructure VMs for each virtual datacenter are all spawned from a central image, with default configurations defined by administrators. So on and so forth.
Getting all of this right -- from individual images to costing to supply chain management -- takes a lot more effort per item for the virtualization administrators than individually herding VMs, but pays real benefits at scale. What each of us needs to answer is at what scale moving to cloud infrastructure provisioning makes sense, and whether changing everything about how we approach IT is worth it.
Trevor Pott is a full-time nerd from Edmonton, Alberta, Canada. He splits his time between systems administration, technology writing, and consulting. As a consultant he helps Silicon Valley startups better understand systems administrators and how to sell to them.