In-Depth
Avoiding the Pitfalls of Virtualization
Virtualization is rarely as simple to implement and manage as it has been made out to be. Here's what to look out for when planning your organization's next virtualization project.
No technology in recent memory has come with as many promises as server virtualization. As I'm sure you know, all of these promises can be broken down into one simple concept: Virtualization allows you to consolidate a bunch of underutilized servers into a single server, which allows the organization to save a bundle on maintenance costs.
So with server virtualization promising such a dramatic boost to an organization's return on investment (ROI), even in a bad economy, what's not to like? What many organizations are finding out is that in practice, virtualization is rarely as simple to implement and manage as it has been made out to be. In fact, there are numerous potential pitfalls associated with the virtualization process. In this article, I want to take a look at some of these pitfalls, and at how they can impact an organization.
Subpar Performance
While it's true that virtualizing your data center has the potential to make better use of server resources, any increase in ROI can quickly be consumed by decreased user productivity if virtual servers fail to perform as they did prior to being virtualized. In fact, it has been said that subpar performance is the kiss of death for a virtual data center.
So how do you make sure that your servers are going to perform as well as they do now when virtualized? One common solution is to work through some capacity planning estimates, and then attempt to virtualize the server in an isolated lab environment. But this approach will only get you so far. Lab environments do not experience the same loads as production environments, and while there are load simulation tools available, the reliability of these tools decreases dramatically when multiple virtual servers are being tested simultaneously.
While proper capacity planning and testing are important, you must be prepared to optimize your servers once they have been virtualized. Optimization means being aware of what hardware resources are being used by each virtual server, and making any necessary adjustments to the way hardware resources are distributed among the virtual machines (VMs) so that performance improves for all of the guest servers on a given host.
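To make that idea a bit more concrete, here's a rough Python sketch of the kind of bookkeeping I'm describing. It isn't tied to any particular hypervisor's management API; the host name, guest names and utilization figures are placeholder assumptions used purely for illustration.

# A minimal sketch of per-host resource bookkeeping. All names and figures
# below are illustrative assumptions, not measurements from a real environment.
from dataclasses import dataclass

@dataclass
class GuestVM:
    name: str
    vcpus: int
    memory_gb: int
    avg_cpu_pct: float   # observed average CPU utilization of the guest

@dataclass
class Host:
    name: str
    cores: int
    memory_gb: int
    guests: list

def rebalancing_report(host: Host) -> None:
    """Summarize how a host's resources are spread across its guests."""
    total_vcpus = sum(g.vcpus for g in host.guests)
    total_mem = sum(g.memory_gb for g in host.guests)
    busiest = max(host.guests, key=lambda g: g.avg_cpu_pct)

    print(f"Host {host.name}: {total_vcpus} vCPUs allocated on {host.cores} cores, "
          f"{total_mem} GB of {host.memory_gb} GB RAM committed")
    print(f"  Busiest guest: {busiest.name} at {busiest.avg_cpu_pct:.0f}% CPU")

    if total_mem > host.memory_gb:
        print("  WARNING: memory is over-committed; consider moving a guest.")
    if busiest.avg_cpu_pct > 80:
        print(f"  Consider giving {busiest.name} another vCPU or a quieter host.")

# Example run with assumed figures:
host = Host("HOST01", cores=16, memory_gb=64, guests=[
    GuestVM("FILE01", vcpus=2, memory_gb=8, avg_cpu_pct=15.0),
    GuestVM("SQL01", vcpus=4, memory_gb=32, avg_cpu_pct=85.0),
    GuestVM("WEB01", vcpus=2, memory_gb=8, avg_cpu_pct=30.0),
])
rebalancing_report(host)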
Network Management Difficulties
When organizations initially begin to virtualize their servers, they're often surprised by how difficult it can be to manage those virtual servers using their legacy network management software. While any decent network management application will perform application metering, compile a software inventory, and allow remote control sessions for both physical and virtual servers, there are some areas in which traditional network management software is not well-equipped to deal with VMs.
One example of such a problem is that most of the network management products on the market are designed to compile a hardware inventory of all managed computers. If such an application is not virtualization-aware, the hardware inventory will be misleading, because it records the emulated hardware presented to each VM rather than the physical hardware of the host that's actually running it.
Likewise, some of the network management applications on the market track server performance, but performance metrics can be greatly skewed in a virtual server environment. While the skewed data may not be a problem in and of itself, it is important to remember that some network management products contain elaborate alerting and automated remediation mechanisms that engage when certain performance problems are detected. These types of mechanisms can wreak havoc on virtual servers.
Finally, legacy network management software is not able to tell you on which host machine a virtual server is currently running. It also lacks the ability to move virtual servers between hosts. While most virtualization products come with their own management consoles, it's far more efficient to manage physical and virtual servers through a single console.
Virtual Server Sprawl
So far I have talked about various logistical and performance issues associated with managing a virtual data center. Believe it or not, though, a virtual server deployment that works a little too well can be just as big a problem. Some organizations find virtualization to be so effective that virtual server sprawl ends up becoming an issue.
One organization deployed so many virtual servers that it wound up with more server hardware than it had before it decided to consolidate its servers, completely undermining its stated goal of reducing hardware costs.
For other organizations, virtual machine sprawl has become a logistical nightmare, as virtual servers are created so rapidly that it becomes difficult to keep track of each one's purpose, and of which ones are currently in use.
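If you want to get ahead of that problem, even a crude inventory audit helps. The following Python sketch assumes you keep, or can export, a simple list of VMs with an owner, a documented purpose and a last-powered-on date; the field names and the 90-day idle threshold are my own illustrative assumptions.

# A minimal sprawl audit over an assumed VM inventory. Field names, sample
# records and the 90-day idle threshold are illustrative assumptions.
from datetime import date, timedelta

inventory = [
    {"name": "TEST-APP07", "owner": None, "purpose": None,
     "last_powered_on": date.today() - timedelta(days=200)},
    {"name": "MAIL01", "owner": "Messaging team", "purpose": "Exchange mailbox server",
     "last_powered_on": date.today()},
]

IDLE_THRESHOLD = timedelta(days=90)

def sprawl_candidates(vms):
    """Flag VMs with no recorded owner or purpose, or that appear abandoned."""
    for vm in vms:
        reasons = []
        if not vm["owner"]:
            reasons.append("no owner on record")
        if not vm["purpose"]:
            reasons.append("no documented purpose")
        if date.today() - vm["last_powered_on"] > IDLE_THRESHOLD:
            reasons.append("not powered on in over 90 days")
        if reasons:
            yield vm["name"], reasons

for name, reasons in sprawl_candidates(inventory):
    print(f"{name}: {', '.join(reasons)}")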
There are some key practices to help avoid virtual server sprawl. One of them is helping management and administrative staff to understand that there are costs associated with deploying virtual servers. Many people I have talked to think of virtual servers as being free because there are no direct hardware costs, and in some cases there's no cost for licensing the server's OS. However, most virtual servers do incur licensing costs in the form of anti-virus software, backup agents and network management software. These are in addition to the cost of the license for whatever application the virtual server is running. There are also indirect costs associated with things like system maintenance and hardware resource consumption.
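As a back-of-the-envelope illustration of that point, the short Python sketch below tallies the per-instance and indirect costs of a single "free" virtual server. Every figure in it is a placeholder assumption, not a real price quote.

# Rough cost tally for one virtual server. All dollar figures are placeholder
# assumptions used only to show that "free" VMs still carry real costs.
per_vm_costs = {
    "anti-virus agent": 35.0,
    "backup agent": 60.0,
    "network management agent": 45.0,
    "application license": 500.0,   # whatever the VM is actually running
}

indirect_annual = {
    "patching and maintenance (admin hours)": 240.0,
    "share of host hardware and power": 180.0,
}

total = sum(per_vm_costs.values()) + sum(indirect_annual.values())
print(f"Estimated first-year cost of one 'free' virtual server: ${total:,.2f}")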
Another way to reduce the potential for VM sprawl is educating the administrative staff on some of the dangers of excessive VM deployments. By its very nature, IT tends to be reactive. I have lost count of the number of times I have seen a virtual server quickly provisioned in response to a manager's demands. Such deployments tend to be performed in a haphazard manner because of the pressure to bring a new virtual server online quickly. These types of deployments can undermine security, and may impact an organization's regulatory compliance status.
Learning New Skills
One potential virtualization pitfall often overlooked is the requirement for the IT staff to learn new skills.
"Before deploying virtualization solutions, we encourage our customers to include storage and networking disciplines into the design process," says Bill Carovano, technical director for the Datacenter and Cloud Division at Citrix Systems Inc. "We've found that a majority of our support calls for XenServer tend to deal with storage and networking integration."
Virtualization administrators frequently find themselves having to learn about storage and networking technologies, such as Fibre Channel, that connect VMs to networked storage. The issue of learning new skill sets is particularly problematic in siloed organizations where there's a dedicated storage team, a dedicated networking team and a dedicated virtualization team.
One way Citrix is trying to help customers with such issues is through the introduction of a feature in XenServer Essentials called StorageLink. StorageLink is designed to reduce the degree to which virtualization and storage administrators must work together. It allows the storage admins to provide virtualization admins with disk space that can be sub-divided and used on an as-needed basis.
In spite of features such as StorageLink, administrators in siloed environments must frequently work together if an organization's virtualization initiative is to succeed. "A virtualization administrator with one of our customers was using XenServer with a Fibre Channel storage array, and was experiencing performance problems with some of the virtual machines," explains Carovano.
He continues: "After working with the storage admin, it turned out that the root of the problem was that the VMs were located on a LUN cut from relatively slow SATA disks. A virtualization administrator who just looked at an array as a 'black box' would have had more difficulty tracking down the root cause."
Underestimating the Required Number of Hosts
Part of the capacity planning process involves determining how many host servers are going to be required. However, administrators who are new to virtualization often fail to realize that hardware resources are not the only factor in determining the number of required host servers. There are some types of virtual servers that simply should not be grouped together. For example, I once saw an organization place all of its domain controllers (DCs) on a single host. If that host failed, there would be no DCs remaining on the network.
One of the more comical examples of poor planning that I have seen was an organization that created a virtual failover cluster. The problem was that all of the cluster nodes were on the same host, which meant that the cluster was not fault tolerant.
My point is that virtual server placement is an important part of the capacity planning process. It isn't enough to consider whether or not a host has the hardware resources to host a particular VM. You must also consider whether placing a virtual server on a given host eliminates any of the redundancy that has intentionally been built into the network.
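Here's a simple Python sketch of the kind of placement check I have in mind. The VM names, host assignments and redundancy groups are assumed example data; the point is simply to catch any redundancy group whose members all share one host.

# A minimal placement check over assumed example data: flag redundancy groups
# whose members all live on the same host.
placement = {
    "DC01": "HOST01",
    "DC02": "HOST01",     # both domain controllers on one host -- a problem
    "CLUSTER-NODE1": "HOST02",
    "CLUSTER-NODE2": "HOST03",
}

# VMs that exist specifically to provide redundancy for one another.
redundancy_groups = {
    "domain controllers": ["DC01", "DC02"],
    "file server cluster": ["CLUSTER-NODE1", "CLUSTER-NODE2"],
}

for group, members in redundancy_groups.items():
    hosts = {placement[vm] for vm in members}
    if len(hosts) == 1:
        print(f"WARNING: every member of '{group}' is on {hosts.pop()}; "
              f"a single host failure takes the whole group down.")
    else:
        print(f"OK: '{group}' is spread across {len(hosts)} hosts.")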
Multiple Eggs in a Single Basket
On a similar note, another common virtualization pitfall is the increasingly high-stakes game of server management. A server failure in a non-virtualized data center is inconvenient, but not typically catastrophic. The failure of a host server in a virtual data center can be a different issue altogether, because the failure of a single host can mean the unavailability of multiple virtual servers.
I'll concede that both VMware Inc. and Microsoft offer high-availability solutions for virtual data centers, but it's worth noting that not all organizations are taking advantage of these solutions. Besides, sometimes it's the virtualization software itself that ends up causing the problem. Take, for instance, a situation that recently faced Troy Thompson of the Department of Defense Education Activity division.
Thompson was running VMware ESX version 3.5, and decided to upgrade his host servers to version 4.0. While the upgrade itself went smoothly, there were nine patches that needed to be applied to the servers when the upgrade was complete. Unfortunately, the servers crashed after roughly a third of the patches had been applied. Although the virtual servers themselves were unharmed, the crash left the host servers in an unbootable state. Ultimately, VMware ESX 4.0 had to be reinstalled from scratch.
My point is that in this particular situation, a routine upgrade caused a crash that resulted in an extended amount of downtime for three virtual servers. In this case, all three of the virtual servers were running mission-critical applications: a Unity voice mail system and two Cisco CallManager servers. Granted, these servers were scheduled to be taken offline for maintenance, but because of the problems with the upgrade, the servers were offline for much longer than planned. This situation might have been avoided had the upgrade been tested in a lab.
Best Practice Recommendations
I do not claim to have all of the answers to creating a standardized set of best practices for virtualization. Even so, here are a few of my own recommended best practices.
Test Everything Ahead of Time
I've always been a big believer in testing upgrades and configuration changes in a lab environment prior to making modifications to production servers. Using this approach helps to spot potential problems ahead of time.
Although lab testing works more often than not, experience has shown me that sometimes lab servers do not behave identically to their production counterparts. There are several reasons why this occurs. Sometimes an earlier modification might have been made to a lab server, but not to a production box, or vice versa. Likewise, lab servers do not handle the same workload as a production server, and they usually run on less-powerful hardware.
When it comes to your virtual data center, though, there may be a better way of testing host server configuration changes. Most larger organizations today seem to think of virtualization hosts less as servers, and more as a pool of resources that can be allocated to VMs. As such, it's becoming increasingly common to have a few well-equipped but currently unused host servers online. These servers make excellent candidates for testing host-level configuration changes because they should be configured identically to the other host servers on the network, and are usually equipped with comparable hardware.
Some Servers Not Good for Virtualization
Recently, I've seen a couple of different organizations working toward virtualizing every server in their data centers. The idea behind this approach isn't so much about server consolidation as it is about fault tolerance and overall flexibility.
Consider, for example, a database server that typically carries a heavy workload. Such a server would not be a good candidate for consolidation, because its hardware is not underutilized. If such a server were virtualized, it would probably have to occupy an entire host all by itself in order to maintain the required level of performance. Even so, virtualizing the server may not be a bad idea, because doing so may allow it to be easily migrated to more powerful hardware as its workload increases in the future.
At the same time, there are some servers in the data center that are poor candidates for virtualization. For example, some software vendors copy-protect their applications by requiring USB-based hardware keys. Such keys typically won't work with a virtual server. Generally speaking, any server that makes use of specialized hardware is probably going to be a poor virtualization candidate. Likewise, servers with complex storage architecture requirements may also make poor virtualization candidates because moving such a server from one host to another may cause drive mapping problems.
Virtualization technology continues to improve, so I expect that in a few years fully virtualized data centers will be the norm. For right now, though, it's important to accept that some servers should not be virtualized.
Consider Replacing Your Network Management Software
As I stated earlier, legacy network management software is often ill-equipped to manage both physical and virtual servers. As such, virtual server-aware management software is usually a wise investment.
Avoid Over-Allocating Server Resources
It's important to keep in mind that each host server contains a finite set of hardware resources. Some of the virtualization products on the market will allow you to over-commit the host server's resources, but doing so is almost always a bad idea. Microsoft Hyper-V Server, for example, has a layer of abstraction between virtual CPUs and logical CPUs, which correspond to the physical processor cores (or hardware threads, when hyperthreading is enabled) in the server. Because of this abstraction, it's possible to allocate more virtual CPUs than the server has logical CPUs.
Choosing not to over-commit hardware resources is about more than just avoiding performance problems; it's about avoiding surprises. For example, imagine that a virtual server has been allocated two virtual CPUs, and that both of those virtual CPUs correspond to physical CPU cores. If you move that virtual server to a different host, you can be relatively sure that its performance will be similar to what it was on its previous host so long as the same hardware resources are available on the new server. Once moved, the virtual server might be a little bit faster or a little bit slower, but there shouldn't be a major difference in the way that it performs, assuming that the underlying hardware is comparable.
Now, imagine what would happen if you moved the virtual server to a host whose processor cores were already spoken for. The virtualization software would still allocate CPU resources to the recently moved server, but now its performance is directly tied to other virtual servers' workloads, making it impossible to predict how any of the virtual servers will perform at a given moment.
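A quick sanity check goes a long way here. The Python sketch below compares the number of virtual CPUs allocated on each host against that host's logical CPU count. The host names and figures are assumptions for illustration, and the 1:1 threshold reflects the advice in this article rather than any vendor requirement.

# Flag hosts where allocated vCPUs exceed logical CPUs. Host names and counts
# are assumed example data; a ratio above 1.0 means the host is over-committed.
hosts = {
    "HOST01": {"logical_cpus": 16, "vcpus_allocated": 14},
    "HOST02": {"logical_cpus": 16, "vcpus_allocated": 24},   # over-committed
}

for name, h in hosts.items():
    ratio = h["vcpus_allocated"] / h["logical_cpus"]
    status = "OK" if ratio <= 1.0 else "OVER-COMMITTED"
    print(f"{name}: {h['vcpus_allocated']} vCPUs on {h['logical_cpus']} "
          f"logical CPUs (ratio {ratio:.2f}) -> {status}")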
As you can see, there is much more to server virtualization than meets the eye. Virtualization is an inexact science with numerous potential pitfalls that can only be avoided through proper planning and testing.