In-Depth

Planning Primer: Benchmarking

An important part of your infrastructure planning is determining the impact of virtualization on hardware and software performance. Here's how to do it right.

Performance testing is an important aspect of planning your virtual infrastructure, but deciding what to test, or even how to test, is a significant challenge.

How significant? Consider, for example, the market leader. VMware Inc. has, from the beginning, refused to allow the posting of performance numbers related to its products. It has recently relented and allowed some carefully controlled benchmark results to be posted, but that's only a start.

The other issue is that even with good benchmarking, the numbers may be of limited value to your planning process, given the endless variables that differentiate one company's IT infrastructure from another's.

That's why virtualization should be tested individually; it's problematic to rely on numbers from vendors or other companies. They may post their results, but because no two companies have identical workloads for their virtual machines (VMs), those numbers may not be usable.

That's true even with similar applications, because your apps may not have a comparable workload. For example, a recent poster to the VMware Communities forum (for which I'm a moderator) stated that he was running HP Systems Insight Manager in a VM on top of Windows 2003 Enterprise Edition and experiencing performance issues. I run the same environment with no performance issues. Why? Because my workload is much different from the workload in question. Even though we run the same application and OS versions, our experiences differ.

It goes beyond workloads, too. For example, other VMs hosted by the server affect performance, because each VM shares the major resources of the virtualization server, including memory, disk I/O, network I/O and CPU.

Apples to Apples
It's important to make sure the VMs mimic the real world exactly. That means picking your application suite carefully and mimicking the physical servers as closely as possible. If the application uses Windows Server 2003 Enterprise, don't install Standard within the VMs. Nor should you reduce the amount of memory in the VM until you're sure the real-world workload doesn't use it.
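
It helps to capture the physical server's figures before building the VM so there's nothing to guess at later. Here's a minimal sketch, assuming Python and the psutil package are available on the physical server being virtualized; it simply records the numbers worth reproducing in the VM's configuration.

    # baseline_specs.py - capture the physical server's figures to mirror in the VM.
    # Assumes Python and the psutil package are installed on the physical server.
    import platform
    import psutil

    print("OS:           %s %s" % (platform.system(), platform.release()))
    print("Logical CPUs: %d" % psutil.cpu_count())
    print("Physical RAM: %d MB" % (psutil.virtual_memory().total // (1024 * 1024)))
    for part in psutil.disk_partitions(all=False):
        usage = psutil.disk_usage(part.mountpoint)
        print("Disk %-12s %d GB" % (part.mountpoint, usage.total // (1024 ** 3)))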

There's one caveat to that rule: multiple CPUs. It's advisable to use only one virtual CPU unless you're sure the application's load makes use of all the CPUs involved, because unused virtual CPUs degrade performance. If your app is multithreaded, like SQL Server, or uses many processes, like Citrix, multiple CPUs will make a difference; but in general, most apps don't make use of secondary CPUs or cores.
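
One quick way to check is to sample per-core utilization inside the guest while the application is under a representative load. The following is a minimal sketch, assuming Python and the psutil package are available in the guest; if most of the work lands on a single core, a second vCPU is likely wasted.

    # check_cores.py - rough check of whether a workload spreads across cores.
    # Run it while the application is under a representative load.
    import psutil

    SAMPLES = 30          # number of one-second samples to take
    BUSY_THRESHOLD = 50   # percent busy that counts as "in use"

    per_core_busy = [0] * psutil.cpu_count()

    for _ in range(SAMPLES):
        usage = psutil.cpu_percent(interval=1, percpu=True)
        for core, pct in enumerate(usage):
            if pct >= BUSY_THRESHOLD:
                per_core_busy[core] += 1

    for core, busy in enumerate(per_core_busy):
        print("core %d busy in %d of %d samples" % (core, busy, SAMPLES))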

On the Edge
After planning your workload and VM hardware setups, it's important to test some uncommon, or "edge," situations that occur in the virtual world. Start by testing the effect on a VM when a snapshot isn't committed and is allowed to grow. This is a critical test, as it tells you what issues may occur if a backup fails; knowing this in advance will help in problem solving.
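
To put numbers on that, it helps to watch how quickly the snapshot's redo files grow while the VM runs its normal workload. Here's a minimal sketch, assuming Python is available wherever the datastore is mounted and that the redo files use the common "*-delta.vmdk" naming; the datastore path is a placeholder.

    # snapshot_growth.py - log how fast an uncommitted snapshot's redo files grow.
    # The datastore path below is a placeholder; point it at the VM's directory.
    # Press Ctrl+C to stop.
    import glob
    import os
    import time

    VM_DIR = "/vmfs/volumes/datastore1/testvm"   # hypothetical path - adjust
    INTERVAL = 60                                # seconds between samples

    while True:
        total = sum(os.path.getsize(f)
                    for f in glob.glob(os.path.join(VM_DIR, "*-delta.vmdk")))
        print("%s  snapshot delta size: %d MB" %
              (time.strftime("%H:%M:%S"), total // (1024 * 1024)))
        time.sleep(INTERVAL)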

Second, determine the impact on performance when VMotion or Live Migration is used to move a VM from host to host. (Note that Microsoft's Quick Migration affects only the performance of the other VMs, because the VM being moved is taken offline for the move and brought back online on the destination host.)
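
A simple way to see the impact is to sample application response time continuously from a client while the migration runs and look for a spike or a gap. Here's a minimal sketch, assuming Python on a separate test client; the application endpoint is a placeholder.

    # migration_latency.py - sample application response time during a migration.
    # The host name and port are placeholders; point them at the VM under test.
    import socket
    import time

    TARGET = ("testvm.example.com", 80)   # hypothetical application endpoint
    SAMPLES = 120                         # one sample per second, for two minutes

    for _ in range(SAMPLES):
        stamp = time.strftime("%H:%M:%S")
        start = time.time()
        try:
            sock = socket.create_connection(TARGET, timeout=2)
            sock.close()
            print("%s  connect time %.1f ms" % (stamp, (time.time() - start) * 1000))
        except OSError:
            print("%s  connection failed" % stamp)
        time.sleep(1)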

Create a list of actions that can occur and test performance under each of these edge conditions, along with others that pertain specifically to your app. I find that what are often considered edge conditions happen quite regularly. It's also important to realize that the mix of conditions affects the performance of all the VMs on the system, so applying them to just one VM is insufficient for thorough testing.

Test Your Plan
Once you have a good test plan, it's time to run the tests and revise the plan. When testing, it's essential to keep the goal of your tests in mind: Are you trying to see how the VMs respond under normal and abnormal loads, for instance, or are you trying to determine how many identical VMs you can run on the system without a performance impact?

Tests using SQL Server and Exchange servers, to take other examples, may try to determine the maximum number of simultaneous transactions that can take place for each VM. Each of these needs some method of loading up the VMs in question.

That requires a data generator. Microsoft provides tools to do this for some of its products, such as Exchange, but in most cases you'll have to come up with your own. This takes some thought. Say, for instance, you want to generate a workload for a virtual file server. Would you constantly read the same file? Or would you create something that reads and modifies many different files to mimic a real-world load? Would you look at only read or write performance?
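
As a concrete illustration of the second approach, here's a minimal sketch of a home-grown file-server load generator in Python. The share path, file count and read/write mix are placeholder assumptions; you'd tune them to match your own observed workload.

    # fileserver_load.py - rough real-world-style file server load generator.
    # The share path is a placeholder; point it at the file server under test.
    # Press Ctrl+C to stop.
    import os
    import random
    import time

    SHARE = "/mnt/testshare"      # hypothetical mount of the virtual file server
    FILE_COUNT = 500              # number of distinct files to touch
    FILE_SIZE = 64 * 1024         # 64 KB per file
    READ_WRITE_RATIO = 0.7        # roughly 70 percent reads, 30 percent writes

    paths = [os.path.join(SHARE, "loadfile_%04d.dat" % i) for i in range(FILE_COUNT)]

    # seed the share with files once
    for p in paths:
        with open(p, "wb") as f:
            f.write(os.urandom(FILE_SIZE))

    while True:
        p = random.choice(paths)
        if random.random() < READ_WRITE_RATIO:
            with open(p, "rb") as f:
                f.read()
        else:
            with open(p, "r+b") as f:
                f.seek(random.randrange(FILE_SIZE))
                f.write(os.urandom(4096))
        time.sleep(0.01)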

Choosing the proper load-generation tool is critical to a successful performance measurement. In general, tools like Iozone and Iometer are good at applying extreme loads, but aren't very good at depicting real-world workloads.

Planning Checklist

1. Choose workload appropriately.

2. Choose data generators.

3. Consider virtualization-specific and other edge cases.

4. Document the plan carefully.

5. Run the test.

6. Interpret results and refine tests.

7. Re-run tests.

8. Write a plan for continual monitoring.

-E.H.

Fine-Tuning
After running the tests, it's time to fine-tune them for maximum usefulness. A first step is to change the number of virtual CPUs (vCPUs) involved so that there's no longer a one-to-one mapping between vCPUs and physical CPUs (pCPUs). For example, four single-vCPU VMs would map one vCPU to one core on a two-way, dual-core pCPU server. The ratio can go as high as 8:1 in some cases. Without this step, you won't know the impact on performance as the count of VMs per host increases.
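
The arithmetic is simple, but it's worth writing down for each test pass so the ratios are recorded alongside the results. A small sketch, using the two-way, dual-core example above and placeholder VM counts:

    # consolidation_ratio.py - work out the vCPU-to-pCPU ratio for a test pass.
    # All figures below are examples; substitute your own host and VM counts.
    PHYSICAL_SOCKETS = 2
    CORES_PER_SOCKET = 2
    VCPUS_PER_VM = 1

    physical_cores = PHYSICAL_SOCKETS * CORES_PER_SOCKET   # 4 cores in this example

    for vm_count in (4, 8, 16, 32):
        ratio = (vm_count * VCPUS_PER_VM) / physical_cores
        print("%2d VMs -> %.1f:1 vCPU:pCPU ratio" % (vm_count, ratio))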

Another fine-tuning step is to modify the amount of memory allocated to the VMs. In this case you're testing two conditions: the least amount of memory you can assign to the VM, and the effect when the memory on the host is fully overcommitted. Overcommitment occurs when more memory has been allocated to VMs than the host physically has available. That situation could arise in a VMware high-availability (VMware HA) scenario, where a host dies and the migrated VMs are started up on the other hosts in the cluster of virtualization servers.
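
Before running that test, it's worth tallying exactly how far over the line the host will be. A trivial sketch with placeholder figures:

    # memory_overcommit.py - check how far a host's memory would be overcommitted.
    # Figures are placeholders; plug in the host's RAM and the VMs' allocations.
    HOST_MEMORY_GB = 32
    VM_ALLOCATIONS_GB = [4, 4, 8, 8, 8, 8]   # memory assigned to each VM

    allocated = sum(VM_ALLOCATIONS_GB)
    ratio = allocated / HOST_MEMORY_GB
    print("allocated %d GB on a %d GB host -> %.0f%% of physical memory"
          % (allocated, HOST_MEMORY_GB, ratio * 100))
    if ratio > 1.0:
        print("host is overcommitted by %d GB" % (allocated - HOST_MEMORY_GB))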

Next, fine-tune the network tests. If your app is network-intensive, it's important to test what could happen when other VMs start sucking up more bandwidth. You may need to implement quality of service (QoS) limits on other VMs, or the one in question. Run a network packet generator from a VM to test this eventuality.
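
If you don't have a dedicated packet generator handy, even a crude sender run from a neighboring VM will show whether your application's throughput suffers. Here's a minimal sketch with a placeholder target address; pair it with a simple listener (even netcat) on the target so the traffic has somewhere to go.

    # traffic_generator.py - crude bandwidth hog to run from a neighboring VM.
    # The target address and duration are placeholders; point it at a throwaway listener.
    import socket
    import time

    TARGET = ("192.0.2.10", 9999)   # hypothetical test listener
    PACKET = b"x" * 1400            # stay under a typical MTU
    DURATION = 60                   # seconds to run

    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sent = 0
    end = time.time() + DURATION
    while time.time() < end:
        sock.sendto(PACKET, TARGET)
        sent += len(PACKET)
    print("sent %.1f MB in %d seconds" % (sent / (1024.0 * 1024.0), DURATION))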

The last fine-tuning step is to run your tests using different remote-storage technologies to determine which works best for your application load. You can choose from NFS, iSCSI, SAN or local VMFS storage, using any of a myriad of possible solutions, and you may be surprised by the results. For SAN solutions, you may need to modify the queue-length variables to run the different tests, or enable jumbo frames to get better performance.
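
One way to keep those comparisons honest is to run the identical, scripted I/O pass against each storage type and record the numbers side by side. The following is a minimal sketch with placeholder mount points; a simple sequential-write pass stands in here for your real workload.

    # storage_compare.py - time the same simple I/O pass against several datastores.
    # Mount points are placeholders; list wherever each storage type is presented.
    import os
    import time

    MOUNTS = {                         # hypothetical mount points - adjust
        "nfs":   "/mnt/nfs_datastore",
        "iscsi": "/mnt/iscsi_datastore",
        "local": "/mnt/local_vmfs",
    }
    SIZE = 256 * 1024 * 1024           # 256 MB test file
    CHUNK = os.urandom(1024 * 1024)    # 1 MB write chunk

    for name, path in MOUNTS.items():
        testfile = os.path.join(path, "io_test.tmp")
        start = time.time()
        with open(testfile, "wb") as f:
            for _ in range(SIZE // len(CHUNK)):
                f.write(CHUNK)
            f.flush()
            os.fsync(f.fileno())
        elapsed = time.time() - start
        print("%-6s wrote %d MB in %.1f s (%.1f MB/s)"
              % (name, SIZE // (1024 * 1024), elapsed,
                 SIZE / (1024 * 1024) / elapsed))
        os.remove(testfile)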

I/O Considerations
Because I/O often ends up being the major bottleneck, especially with quad-core CPUs changing the face of virtualization, you should run multiple tests to determine the proper settings for your VM workload. Make small changes to your environment and document each change as you make it, or your performance testing may result in a hodge-podge of difficult-to-interpret results. Workloads should include real-world disk I/O and network I/O tests of the applications to be run.

A good performance-testing plan is crucial to planning the proper tasks for implementing your virtual environment. There are three critical concepts to remember going forward:

  • The performance of any given VM depends on the actions taking place within the other VMs on the same system
  • No two organizations' application workloads are the same
  • Use real-world data-generation methods to feed the workloads you plan to virtualize

Post-Test: What Now?
VMmark is a good starting point for performance testing, but don't be fooled into thinking it's comprehensive; you can use it as a jumping-off point, but it's imperative to develop your own performance test plan.

In addition, remember that performance planning doesn't end with testing the environment prior to deployment. Testing must be continuous: Workloads may behave differently in production than they did in the test environment, workloads may change over time, and the underlying hardware may also change.

To that end, part of any design should include change processes. That includes how to balance workloads across all virtualization servers (if there's more than one, which is generally the case). Determine, for another example, the upper values for resource utilization that dictate the need for more hosts. These upper values need to be measured consistently, using the same tool every time.
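
However those ceilings are chosen, it helps to encode them once and evaluate every host against the same values. Here's a small sketch with placeholder thresholds and sample readings; in practice, the measurements would come from whatever standard tool your plan names.

    # capacity_check.py - flag hosts whose utilization crosses the planned ceilings.
    # The thresholds and sample data are placeholders from the planning document.
    THRESHOLDS = {"cpu_pct": 70, "memory_pct": 80, "disk_latency_ms": 20}

    hosts = {   # substitute real measurements gathered with your standard tool
        "esx01": {"cpu_pct": 62, "memory_pct": 85, "disk_latency_ms": 12},
        "esx02": {"cpu_pct": 48, "memory_pct": 60, "disk_latency_ms": 25},
    }

    for host, metrics in hosts.items():
        breaches = [m for m, v in metrics.items() if v > THRESHOLDS[m]]
        if breaches:
            print("%s exceeds planned ceilings: %s" % (host, ", ".join(breaches)))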

Know Thy Tools
It may sound obvious, but knowing your measurement tools is also a part of performance planning. If the tool uses a graph, for example, a good plan will outline how the graph should be read. In many cases this is obvious, but not always.

Take the VMware Infrastructure Client (VIC) and its performance graphs, for instance. If you don't have VirtualCenter (VC), these graphs live only for as long as the VIC stays connected to the host; in VC, they're around for as long as the VC database is available. The graphs within the VIC can also be cryptic or carry too much information. Several of the graphs actually have separate Y-axes, a left and a right, and if you read the values off the wrong side you get two different views and could miss critical data. These subtle confusions should be spelled out in any document about performance monitoring created during the performance-planning stage.

Finally, use the data from the implementation to create better benchmarks by continually monitoring performance. You will need to repeat the tests when new hardware or different applications are placed within your virtual infrastructure. You may want to go as far as requiring a performance test before a new VM is created. Again, try to get the workload as real world as possible.

Performance planning gives the virtual infrastructure admin insight for combating performance problems when they arise from inevitable spikes in activity, whether those spikes come from edge conditions or from a single VM with serious issues. A well laid-out plan for alleviating virtual bottlenecks is a very important part of this process, and most of those bottlenecks and their remedies should be identified during the performance-planning phase. Key items to consider are the Fibre Channel host bus adapter (FC HBA) queue length and the placement of VMs on hosts and storage.

Peak Performance
Planning for performance is more than just determining the tests to run and running them to get some numbers from the selected workloads. It's also about determining what to do when a performance problem occurs, deciding which monitoring tool (or tools) to use, how to interpret the data they present, and more. Keep in mind, however, that a good plan starts with a good testing workload.
