In-Depth
Planning Primer: Pitfalls
Build a solid and long-lasting virtual infrastructure around virtualization's many moving parts, including servers, storage, network, security and management.
"Let's just get it virtualized, and then we'll worry about the rest ..." is a phrase I hear all too often from administrators in the field eager to virtualize. After all, tools such as PlateSpin PowerConvert and VMware Converter allow admins to convert physical computers to virtual machines (VMs) in as little as a few mouse clicks. Rushing into virtualization is always enticing.
Vendor return on investment (ROI) calculators show enormous and fast returns on the virtualization investment, so the
sooner an organization virtualizes, the sooner it can reap the financial and operational rewards of virtualization.
While this makes sense in theory, moving too quickly toward virtualization can often lead you into a series of
pitfalls, including:
- Infrastructure instability
- Failure to meet required service levels
- Poor performance
- Reduced infrastructure security
Many organizations have a few naysayers who don't want to virtualize, and falling into a virtualization migration pitfall may give them the ammo they need to set any planned virtualization project back months or even years. Don't worry. I'm not going to spend any more time shouting out
warnings against virtualizing. The bottom line is that with the right plan, virtualizing your IT infrastructure will
give you the benefits you expect, and probably additional benefits that come from looking at IT problems differently
once a virtual infrastructure is in place.
Memory Share and Share Alike?
Hypervisors that support memory sharing, such
as VMware's ESX, can improve consolidation ratios when similar applications and operating systems are located on the
same physical host. Memory sharing consolidates redundant read-only memory pages into single-instance read-only pages. The result can be an increase in consolidation density of as much as 40 percent.
Vendors that don't offer memory sharing will push back and tell you that such features hurt performance. There's some truth to that, and the trade-off is similar to deciding when transaction logs are committed to databases. Scanning memory pages for redundancy can be a resource-intensive task, but that utilization is not a big deal if the hypervisor waits for periods of low utilization to perform the memory-consolidation jobs, which is the case with ESX. So when you first consolidate, you may not see an immediate savings from shared-memory consolidation.
If you wait a week, you'll see a significant difference. You may ask,
"If I have to wait a week, what good is it?" Considering that physical servers can remain online without a reboot for
months to even years (depending on maintenance and patch requirements), you'll have plenty of time to enjoy the fruits
of memory sharing. Also, considering that you'll continually add VMs over time, memory sharing can allow you to stretch
your hardware investment further than you may be able to when running hypervisors that don't support memory sharing.
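To put numbers to that, here's a minimal Python sketch, using a hypothetical 32GB host and 2GB VMs, of how a given page-sharing savings rate stretches physical memory. Roughly 29 percent page savings corresponds to the 40 percent density gain cited above; actual savings depend on how similar the guests are.

# Toy model of transparent page sharing; all figures are hypothetical.

def effective_capacity_gb(physical_gb, sharing_savings):
    # Memory the host can effectively commit once redundant read-only
    # pages have been deduplicated into single instances.
    return physical_gb / (1.0 - sharing_savings)

host_ram_gb = 32   # hypothetical host
vm_size_gb = 2     # hypothetical per-VM allocation

for savings in (0.0, 0.15, 0.29):  # ~29% page savings ~= 40% density gain
    capacity = effective_capacity_gb(host_ram_gb, savings)
    vms = int(capacity // vm_size_gb)
    print(f"{savings:.0%} sharing: ~{capacity:.0f}GB effective, "
          f"~{vms} VMs of {vm_size_gb}GB")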
Shared resources will also place a strain on service-level requirements. Service levels are best assured by configuring resource pools, which let you group VMs and guarantee them specific service levels for resource consumption.
Resource Utilization and Performance
Capturing resource utilization is an integral part of consolidation
planning and is used to determine consolidated VM physical placement, as well as to disqualify some systems from being
virtualized.
As a rule of thumb, the following resource characteristics will prevent a server from being virtualized:
- Specialized hardware requirements unsupported by the virtualization engine
- Extremely high resource utilization on the bare metal
Specialized adapters such as video capture cards are not supported in VMs on server
virtualization platforms; however, if consolidation is a top priority, you could leverage an OS virtualization solution
such as Parallels Virtuozzo Containers to consolidate the system and still allow each virtual container to access the
specialized hardware device.
High resource utilization will often prevent a server from being virtualized during the
initial consolidation project. For me, high is 50 percent; if a host is using more than half its CPU resources on the
bare metal, holding off on virtualizing the server is generally a good practice.
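As a simple illustration, the following Python sketch, with an entirely hypothetical server inventory, applies both disqualifiers: unsupported specialized hardware and the 50 percent bare-metal CPU rule of thumb.

# Hypothetical candidate filter; server names and figures are invented.
CPU_THRESHOLD = 0.50  # more than half the bare-metal CPU: hold off

servers = [
    {"name": "web01",  "peak_cpu": 0.22, "special_hw": False},
    {"name": "db01",   "peak_cpu": 0.71, "special_hw": False},
    {"name": "video1", "peak_cpu": 0.35, "special_hw": True},  # capture card
]

for s in servers:
    if s["special_hw"]:
        verdict = "defer: specialized hardware (consider OS virtualization)"
    elif s["peak_cpu"] > CPU_THRESHOLD:
        verdict = "defer: bare-metal CPU utilization above 50 percent"
    else:
        verdict = "candidate for consolidation"
    print(f"{s['name']}: {verdict}")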
Following are guidelines to avoid
performance pitfalls in each core resource: CPU, Memory, Storage, Network.
CPU
Most consolidation projects involve moving
workloads from slower CPUs on older systems to faster CPUs on newer systems, so before-and-after CPU performance is
never an apples-to-apples comparison. Consolidation tools capable of performing what-if analysis against new hardware
(such as, "What will performance look like on an IBM x3650?") can take much of the guesswork out of before-and-after
CPU performance.
Organizations are often wary of virtualizing CPU-intensive workloads such as database servers, but it
is possible to virtualize these workloads as long as you don't oversubscribe the CPU. CPU oversubscription occurs when
the number of virtual CPUs (vCPUs) on a physical host is greater than the number of physical CPUs (pCPUs) on that host.
For example, eight two-way VMs (16 total vCPUs) could run on a four-way quad-core system (16 total physical cores)
without taxing the hypervisor's CPU scheduling workload. Of course, pCPUs are frequently oversubscribed for non-CPU-intensive workloads, so maintaining a 1:1 vCPU-to-pCPU ratio is only necessary for CPU-intensive apps.
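Here's that example as a quick Python sanity check, with the host and VM counts taken straight from the paragraph above.

# 8 two-way VMs on a four-way quad-core host: 16 vCPUs on 16 physical cores.
sockets, cores_per_socket = 4, 4
vms, vcpus_per_vm = 8, 2

pcpus = sockets * cores_per_socket   # 16 physical cores
vcpus = vms * vcpus_per_vm           # 16 virtual CPUs
ratio = vcpus / pcpus

print(f"{vcpus} vCPUs on {pcpus} pCPUs -> ratio {ratio:.2f}")
if ratio > 1.0:
    print("Oversubscribed: avoid this for CPU-intensive workloads")
else:
    print("At or under 1:1: safe even for CPU-intensive workloads")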
Interoperability
must also be considered when planning the compute architecture. Today, for example, you cannot live migrate a VM
between an Intel platform and an AMD platform. Even in the case of an offline migration, you still need to be careful
because hypervisors do not fully virtualize physical CPUs. This is important because applications with licensing or
activation bound to the CPU may require a reactivation once their associated VM is moved from an Intel host to an AMD
host.
One other consideration is symmetric multiprocessing (SMP). VMs should only be assigned more than one vCPU if the applications running inside the VM are multithreaded and can actually take advantage of multiple CPUs. Otherwise, the added CPU scheduling overhead associated with the extra vCPU could degrade performance.
Some migration tools
don't provide the option to select the number of vCPUs in a target VM. So if a two-way physical host was migrated to a
VM, the VM would automatically be configured with two vCPUs. If that happens and only one vCPU is needed, you should
remove the second vCPU before powering on the new VM for the first time. Once a single vCPU VM powers on, you should
verify that it has a uniprocessor HAL driver installed in Device Manager. While the VM will run fine with an SMP HAL
driver, CPU performance will be slightly degraded.
Memory Loss
Memory is frequently discussed in virtualization planning
because it's the most common bottleneck. This is because many of today's hypervisors (with ESX and VMware Server being
two exceptions) don't support memory overcommit.
The effect is that in a physical server with 16GB of physical memory,
the total memory allocated to all VMs running on the server cannot exceed 16GB. Because VMs are sized based on the
maximum amount of memory needed under their peak workload, hypervisors without memory-overcommit support will not
provide the same consolidation density as those that do. VMs on the same physical server rarely need all of their
memory at the same time, so memory overcommit allows the hypervisor to page some of a VM's memory to disk and allocate
physical memory to the VM as needed.
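A short sketch makes the constraint concrete; the 16GB host comes from the example above, while the VM sizes are hypothetical.

# Without overcommit, the sum of VM allocations must fit in physical memory.
physical_gb = 16
vm_allocations_gb = [4, 4, 4, 2, 2, 2]  # hypothetical peak-sized VMs

total = sum(vm_allocations_gb)
print(f"Allocated {total}GB of {physical_gb}GB physical memory")
if total > physical_gb:
    print(f"Overcommitted {total / physical_gb:.2f}x: requires a hypervisor "
          "that can page idle VM memory to disk")
else:
    print("Fits without overcommit")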
A common practice is to just throw more memory at a performance problem, but that
doesn't always work. Take Exchange Server 2003 for example, which is capable of using up to 4GB of RAM. Running
Exchange 2003 in a VM with more than 4GB of allocated memory would simply be a waste of resources. Be aware of the
memory maximums associated with each virtualized app and allocate memory accordingly.
Finally, you should also keep a
close eye on hardware-assisted memory virtualization, which is a technology currently shipping on quad-core Opteron
processors. Intel's hardware-assisted memory virtualization (named Extended Page Tables) should be available later this
year.
Hardware-assisted memory virtualization has shown substantial improvements in enterprise application performance
in virtualized environments, especially with multithreaded apps. The bottom line is that hardware-assisted memory virtualization allows VMs to manage their own page tables directly, removing the bottleneck of shadow-page-table-based memory virtualization. When selecting hardware and virtualization platforms, check to see that both support
hardware-assisted memory virtualization, especially if you have future plans to virtualize multithreaded enterprise
applications.
Storage
I personally only deploy virtualization solutions as part of high-availability (HA) clusters, so in
my opinion, shared storage is a must. When evaluating storage solutions, you need to be careful to ensure that adequate
bandwidth will be available for all VMs sharing a particular storage interface. For example, suppose an iSCSI array allows you to aggregate two 1Gb interfaces into a single 2Gb interface. While that's good, such a feature is only useful if you can perform similar aggregation on the other end of the data path, at the hypervisor. Not all software
iSCSI initiators support multipath, so if multipath and aggregated links are needed to meet I/O requirements, you'll
need to use a hardware iSCSI HBA, such as the QLogic QLE4062C.
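The following sketch, with hypothetical I/O figures, shows why both ends of the data path matter: usable bandwidth is whatever the narrower end supports.

# Aggregated array links only help if the initiator side can match them.
link_gbps = 1.0
array_links = 2            # two 1Gb ports aggregated on the array
initiator_links = 1        # e.g., a software initiator without multipath

usable_gbps = link_gbps * min(array_links, initiator_links)
vm_peak_demand_gbps = 1.4  # hypothetical combined VM peak I/O

print(f"Usable path: {usable_gbps:.1f}Gb, peak demand: {vm_peak_demand_gbps}Gb")
if vm_peak_demand_gbps > usable_gbps:
    print("Shortfall: add multipath support, such as a hardware iSCSI HBA")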
All I/O connections, network and storage alike, should be redundant, as a single point of failure will impact multiple VMs on a given host. So a substantial investment in
networked storage may be required, depending on what you already have in place.
When evaluating arrays, it's important
to look for features that improve VM storage scalability (such as thin provisioning) and also provide serverless backup
options such as snapshots.
Planning Checklist
Step 1: Conduct a thorough analysis of the IT infrastructure that includes both technical and non-technical constraints. Leverage the collected data to assess virtualization platform and hardware feasibility for the consolidated environment.
Step 2: Verify hardware compatibility (server, storage, network), software compatibility (operating systems, applications) and support for the prospective virtual environment.
Step 3: Re-architect existing business processes for compatibility with the virtualized environment before the consolidation project completes.
-C.W.
Network
Virtual networks allow multiple VMs to share physical network interfaces, and with that ability it's essential to ensure you're not violating any internal segmentation restrictions when architecting the virtual switch infrastructure. Some organizations still require physical isolation of security zones, which means that VMs will need to be isolated on different physical networks in some circumstances.
As with storage, your consolidation analysis should provide detailed information on average and peak network I/O
requirements for a given system. Keep in mind that if you plan to change your backup architecture following the
migration, your network I/O requirements may be much lower (assuming your current backup is LAN-based), which may reduce the number of physical resources you need for the consolidation project.
You should also plan for plenty of network ports.
Most organizations dedicate two ports as a network interface card (NIC) team for management and cluster heartbeat
traffic. So as a good practice, you should not consider the two onboard ports that come with a physical server as part
of the consolidation analysis. Instead, plan to leverage the server's available PCIe expansion slots for dividing up
both network and storage I/O. As with the management network, all production network ports should be teamed in order
to provide both load balancing and failover support. If availability is a requirement, you should avoid hypervisors
that don't support NIC teaming.
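As a rough planning aid, here's a hypothetical per-host port budget that follows the guidance above, with onboard ports reserved for the management team and teamed production and redundant storage ports on expansion cards. All of the counts are assumptions to adjust for your environment.

# Hypothetical per-host NIC port budget.
onboard_mgmt_team = 2      # reserved for management and cluster heartbeat
production_networks = 2    # hypothetical: e.g., VM traffic and live migration
ports_per_team = 2         # team every production network for failover
storage_paths = 2          # redundant storage I/O paths

pcie_ports = production_networks * ports_per_team + storage_paths
print(f"Onboard ports (management team): {onboard_mgmt_team}")
print(f"PCIe expansion ports required: {pcie_ports}")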
When purchasing NICs, it's always a good idea to offload as much CPU work to the NIC as possible, so look for NICs with TCP offload engine (TOE) support. TOE NICs improve VM network performance and reduce the hypervisor's CPU tax resulting from processing TCP overhead.
Beware: Sales Pitches
It's
always easy to take what a virtualization vendor says at face value, but experience has shown most of us that
significant differences usually exist between a marketing checkbox and a true product feature. For instance, there's a
huge difference between virtualization HA that relocates VMs to the next node in the cluster following a physical
server failure and one that invokes a fan-out failover that evenly distributes VMs across all remaining physical nodes
in the cluster.
For some virtualization products, VM failover occurs solely by logical node order. So if VMs are
running on node 1 and node 1 fails, every VM will try to start on node 2. The ones that can't start on node 2 will
move to node 3 and so on. With that system, failover can take significantly longer to complete. This is why when you
evaluate virtualization solutions you must do so on a three-node cluster at a minimum. This will let you validate the
failover behavior of the HA solution. Live migration behavior also varies by vendor, so it's equally important to
validate how live migration will work on a production workload.
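To see why the three-node minimum matters, here's a toy Python simulation contrasting the two failover behaviors; the node names, VM counts and per-node capacity are all hypothetical.

# Contrast node-order failover with fan-out failover after node 1 dies.

def node_order_failover(vms, nodes, capacity):
    # Every VM tries node 2 first; overflow spills to node 3, and so on.
    placement = {}
    for node in nodes:
        placement[node], vms = vms[:capacity], vms[capacity:]
    return placement

def fan_out_failover(vms, nodes):
    # Restarted VMs are distributed evenly across all surviving nodes.
    placement = {node: [] for node in nodes}
    for i, vm in enumerate(vms):
        placement[nodes[i % len(nodes)]].append(vm)
    return placement

failed_vms = [f"vm{i}" for i in range(1, 9)]  # 8 VMs from the failed node
survivors = ["node2", "node3", "node4"]

print("node order:", node_order_failover(failed_vms, survivors, capacity=4))
print("fan-out:   ", fan_out_failover(failed_vms, survivors))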
Virtualization solutions are not the same, so when
sizing up hypervisor or OS virtualization platforms, keep the following questions in mind:
- Operating system and application compatibility and support: Will my applications and OSes be supported on the virtualization platform?
- Performance: Will the virtualization platform outperform competing platforms?
- High availability: What are the available HA and failover options?
- Automation and management: How well does the virtualization platform integrate with my existing management infrastructure?
- Hardware compatibility: Can I use any of my existing server, storage and network hardware with the new solution?
A history of suspect, one-sided vendor benchmarks has caused most of us to take benchmarks with a grain of salt. Virtualization benchmarks aren't much different. For example, benchmarks with unrealistic workloads (such as running only one VM per physical server) or unrealistic configurations (such as those that use RAID 0 back-end storage) should be viewed with extreme caution, as you should not expect to see similar results in a real-world implementation consisting of multiple VMs per physical host and fault-tolerant back-end storage (for more on benchmarking considerations, see "Planning Primer: Benchmarking").
Hardware choice for the consolidated
solution is critical as well. Virtualization deployments often involve the purchase of new server and networked storage
platforms. When it comes to virtualization, scalability should be your primary concern. For example, some blade solutions will look really good for virtualization on the outside, but when you look a little closer you may see that a particular blade chassis supports only 18 physical I/O ports. Consider a chassis with 14 blades and 10 VMs per blade and you have a total of 140 VMs sharing 18 I/O ports. In most cases, the math just doesn't work.
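The arithmetic is easy to verify with the chassis figures above.

# Chassis I/O arithmetic from the example above.
blades, vms_per_blade, io_ports = 14, 10, 18

total_vms = blades * vms_per_blade   # 140 VMs
vms_per_port = total_vms / io_ports
print(f"{total_vms} VMs sharing {io_ports} I/O ports "
      f"-> {vms_per_port:.1f} VMs per port")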
Not all newer third-generation blade solutions have such chassis I/O restrictions, so by no means are blades
always a bad choice for virtualization. Some organizations prefer to go with very large servers in order to reap the
rewards of a high-consolidation density; however, you still need to consider the impact of physical system failure and
failover time.
In some cases, it's better to go with a lower consolidation density on 2U servers and have quicker
failover than to go with 4U servers and a prolonged failover. Of course, all of this falls into the proverbial "it
depends" IT assessment, as I've successfully deployed blade-, 1U-, 2U- and 4U-based servers in virtualization projects.
1U servers may not make sense for enterprise workloads, but are often a good fit for a few VMs running in a branch
office. Take a couple of 1U servers and Stratus Avance or Marathon FT, for example, and you have a very cost-effective
branch office virtualization solution.
Don't Panic! But Fear the Shortcut
There are plenty of tools that make it extremely easy to convert a physical system to a VM, but just because it's easy to get virtualized doesn't mean you should rush into it.
Instead, taking the time to properly plan around virtualization's technical and non-
technical pitfalls is the safest way to ensure you'll successfully realize the benefits of virtualization without any
ensuing panic.