Lab Experiment: Hypervisors -- Virtualization Review

Is there really a performance difference between hypervisors from VMware, Microsoft and Citrix? We put them under a microscope and stress-tested them for days to find out the answer.

By Rick Vanover
03/02/2009

When it comes to hypervisors, there are many choices now. That's good news for consumers, who were much more limited even one year ago. Still, when admins go shopping, they look first at hypervisors from VMware Inc. (ESX), Microsoft (Hyper-V) and Citrix Systems Inc. (XenServer). Given that situation, we wanted to do a side-by-side, exact comparison under identical conditions to determine which of these is the best-performing product. After all, the hypervisor is still the engine that drives virtualization, even with all the management and third-party products built on top.

To that end, we put on our white lab coats, slipped on our pocket protectors, dusted off our calculators and poked, prodded and put the three challengers through a battery of tests to find out how they perform a variety of virtualization functions. So swipe your biometric authentication card at the door and join us in the hypervisor laboratory to see how each specimen stacked up against the competition.

Test Objectives
All the hypervisors offer essentially the same base functionality. In this series of tests, the objective was to put the same workloads on each one and see how they stack up. The types of workloads tested varied, to simulate a typical environment in which some virtual machines (VMs) are stressed, and some aren't. Each platform was subjected to the same test plan parameters, to give a fair accounting of each one's performance.

When examining the results, keep in mind that we tested and compared only the hypervisors. While the management layer is critically important to implementations of any size, the fundamental unit of virtualization, the hypervisor, needs to be stacked up against the competition to see how each performs under various workloads. This stems from business pressure to justify the platform selection. While there are cases for each platform, including the management and supporting tools available, hypervisor performance is the cornerstone of any virtualization solution.

In the current market, the pressure is on ESX as the prevailing market-share leader for server virtualization. Because of this, VMware has a bull's-eye on its back, and Microsoft and Citrix are holding the biggest shotguns. These offerings are most likely to be the top choices among companies, so it made sense to limit our testing to these three. (But stay tuned: we'll test other platforms in future issues.)

Comparison Parameters
Each platform was tested as a single, unmanaged installation of the hypervisor on the exact same equipment inventory. The guest VMs were a collection of systems that included one Windows Server 2003-based guest running SQL Server 2005. We also increased the number of guests running Windows Server 2003 to scale the workload upward. All guest operating systems were 32-bit, which gives an accurate picture of the current technology landscape in most virtualized data centers.

One of the more difficult things to quantify in virtualized environments is user experience. To help measure this important factor, one of the tests measured a task in terms of completion time. This response time was measured by the start and stop times of a database task. The tests also collected disk, CPU and memory performance.

For calculating memory, disk and CPU operations we used tools from PassMark Software. We would like to thank the company for providing the software for this comparison.

The key to making this cross-comparison of the hypervisors is ensuring a consistent environment. For each test, every hypervisor was installed on the same server using the same disk system, processor quantity and memory quantity. The same hardware was used for each test and all software was installed and configured the same way for each test.

Test Environment
The test plan called for three main tests, as seen in Table 1.

The same database system was used throughout all the tests and maintained the same workload. One important part of that workload was a SQL agent job that performed cleanup and re-indexing of a database while logging the start and stop times.

The last test step was to restore the database to the pristine state so that the workload was exactly repeatable. The start and stop times were logged on the database system into a separate database, then used to calculate an average completion time. The SQL job used CPU, memory and disk for its tasks.
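The averaging step can be sketched in a few lines of Python. This is a minimal illustration, not the tooling used in the lab: the function name `average_completion_time` is hypothetical, and the timestamps are made-up placeholders rather than measured results.

```python
from datetime import datetime, timedelta


def average_completion_time(runs):
    """Return the mean duration of a list of (start, finish) datetime pairs."""
    if not runs:
        raise ValueError("at least one completed run is required")
    # Sum the individual durations, starting from a zero-length timedelta.
    total = sum((finish - start for start, finish in runs), timedelta())
    return total / len(runs)


# Placeholder timestamps for illustration only; these are not test results.
runs = [
    (datetime(2009, 1, 5, 9, 0, 0), datetime(2009, 1, 5, 9, 12, 0)),
    (datetime(2009, 1, 5, 10, 0, 0), datetime(2009, 1, 5, 10, 14, 0)),
]
average = average_completion_time(runs)
print(average)  # 0:13:00
```

Because the database is restored before each run, every pair measures the same work, which is what makes a simple mean meaningful here.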

The measurable units in this test were recorded from the PassMark software and the SQL database job start and stop times. For disk, CPU and memory, a higher number of operations is better. For the SQL Agent job, a lower run time indicates quicker performance.
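When reading the result tables, the direction of "better" differs by metric. A small sketch of that rule follows, assuming results are held in a dictionary keyed by platform. The metric names, the platform labels and every number are hypothetical placeholders, not figures from our tests.

```python
# Direction of "better" for each metric, as described in the text:
# operation counts should be high, the SQL Agent job run time should be low.
HIGHER_IS_BETTER = {
    "cpu_ops": True,
    "disk_ops": True,
    "ram_ops": True,
    "sql_job_seconds": False,
}


def best_platform(results, metric):
    """Return the platform with the best value for the given metric."""
    pick = max if HIGHER_IS_BETTER[metric] else min
    return pick(results, key=lambda platform: results[platform][metric])


# Placeholder values for illustration only; these are not measured results.
results = {
    "platform_a": {"cpu_ops": 100, "disk_ops": 50, "ram_ops": 70, "sql_job_seconds": 300},
    "platform_b": {"cpu_ops": 120, "disk_ops": 40, "ram_ops": 90, "sql_job_seconds": 360},
}
winners = {metric: best_platform(results, metric) for metric in HIGHER_IS_BETTER}
print(winners)
```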

Test Caveats
This comparison focuses only on base hypervisor performance. We're not comparing management tools, though they had to be used on each platform to some degree, and they can make a difference in how VMs are provisioned or resources are managed. Each guest VM was provisioned and allocated resources for RAM and CPU according to the test plan, and those resources were managed only by the hypervisor engine. Specifically, no limits on MHz usage were placed on each guest's single assigned CPU. Further, the platforms manage these resources differently, so there's no entirely equal way to provision them.

Before we get too wrapped up in how these three products measured up on the same test, it's important to understand workloads and their varied behaviors. Not all workloads are created equal, and some may not be good virtualization candidates at all. Repeating this test or something similar may be a worthwhile investment in your own environment, especially if you can use capacity-planning software to model your expected results.

Now let's see how ESX, Hyper-V and XenServer handled the same workload.

Microsoft Hyper-V
Hyper-V was the first product compared, and it performed quite differently from expectations. Hyper-V has been a focus of Microsoft dev efforts, and it shows. Overall, Hyper-V did well in this comparison and proved itself a worthy product.

Setting up VMs through Hyper-V was easy. The Windows environment makes the use and administration of the host server intuitive, even on the stripped-down core edition. Most Windows admins will turn to Task Manager to gauge performance. Be careful, though: within Hyper-V, the guest VM workloads are not entirely reflected in the performance section of Task Manager. That's something administrators may need to get used to if Hyper-V lands in their data centers. (While not part of the comparison, Hyper-V Manager shows each VM's processor usage as a percentage of the host's. So, though Hyper-V is running on a Windows server, there are new tricks to learn in measuring host performance.)

In our tests, Hyper-V did well in all categories; it's a real, viable competitor. Table 2 shows Hyper-V's comparative performance.

Test Definitions

CPU operations are measured as the number of integer and floating-point mathematical operations per hour.

Disk usage operations are measured as the number of file read and write operations per hour, with a 32KB file block size, consuming up to 15 percent of the free space of the C: drive on the guest virtual machine.

RAM operations are the number of operations that populate and unpopulate the system memory in five sequences.

Citrix XenServer
XenServer may not have ESX's reputation or Microsoft's marketing muscle, but the hypervisor definitely held its own in this comparison. In fact, it did better than that: XenServer performed best over the largest range of categories in our tests, as you can see in Table 3.

XenServer's test results are impressive, but are they enough to justify a replacement of your current hypervisor? For environments with virtualized systems that have a large number of CPU- and memory-intensive workloads, it may be a good choice. The caution is that those high I/O workloads flirt with not being good virtualization candidates, so some administrators might instinctively place these workloads on physical systems. Make no mistake, however: XenServer did extremely well, posting excellent performance numbers.

VMware ESX 3.5
Because my core experience is VMware-based virtualization, one of my hidden test objectives was to determine if VMware is the right hypervisor in an environment with a large number of VMs and lighter workloads. This is typical for many virtualization environments: workloads that aren't saturated with heavy-hitting VMs that pound the memory and CPU all day long. This helped me determine if VMware is the best bang for the buck.

For the first two tests of heavy workloads, VMware underperformed both XenServer and Hyper-V. For the lighter workloads on the third test, the results were almost indistinguishable across the platforms, but ESX had the best results in three of the four categories. Table 4 shows the ESX results.

Lessons Learned
After comparing ESX with Hyper-V and XenServer, it's clear that at the hypervisor level, ESX is optimized for a large number of VMs with less-intensive workloads. For intensive workloads that may not be well suited to memory overcommit, Hyper-V and XenServer should definitely be considered, even if that means adding another hypervisor to the data center.

Does that mean ESX should be ripped out? Absolutely not. Remember that this is one specific type of test, and didn't take into account other critical aspects of a virtualized environment like management tools, advanced features like VMotion, or third-party application availability. All that goes into the decision-making process, and the factors should be weighted according to your organization's priorities.

Expert Guidance

To ensure the validity of our test results and testing environment, we enlisted the help of Stuart Yarost to formulate and validate the test plan. Yarost is an ASQ Certified Software Quality Engineer and Certified Quality Engineer with more than 22 years' experience in the software and quality fields. Yarost currently holds the position of Vice Chair of Programs for the ASQ Software Division. A special thanks to Yarost for his help.

Think You've Seen Enough?
There was no overall winner here; no way to say, "This is the best hypervisor for every situation." But some general conclusions can be drawn. For CPU- and memory-intensive applications, XenServer and Hyper-V are attractive and have proven their mettle. For a large number of light to moderate workloads, or if you decide that memory overcommit, for example, is important, ESX may be the answer. What is entirely clear, however, is that all three hypervisors are legitimate virtualization platforms, and that no single company has a monopoly on virtualization any longer.

More Information

Test Environment
We tested selected virtualization platforms for workload performance. Following are the parameters of the test environment and tested platforms:

Platform Guest Requirements

Support Windows Server 2003 R2 with Service Pack 2 (x86 edition) as a guest operating system
Permit the provisioning of guest virtual machines in one of two configurations:
Memory at 1024MB, 10GB local drive space
Memory at 2048MB, 20GB local drive space
No CPU limits in place

Platform Host Requirements

Support an installation on the three parallel systems of identical configuration: Dell PowerEdge 2950 2x2 @ 3.0GHz, 16GB RAM, 360GB local storage on one single array (RAID 5)
Permit use of local array for guest virtual machines

Test Environment Requirements

Permit the use of PassMark BurnInTest 5.3 Professional on the guest operating system
Permit remote console access for performance reporting
Load the two selected databases and configure as follows:
- Run a SQL Agent job that performs the following steps:

  -- Log the start time in the tracking database
  USE RWVTEST2
  INSERT INTO IterationLog (Start) VALUES (GETDATE())
  GO
  -- Restore the test database to its pristine state
  USE master
  RESTORE DATABASE [RWVTEST]
    FROM DISK = N'I:\MSSQL_BACKUPS\RWV-TEST.bak'
    WITH FILE = 1,
      MOVE N'vmWare_Infrastructure3_Data' TO N'F:\Microsoft SQL Server\MSSQL.1\MSSQL\Data\RWVTEST.mdf',
      MOVE N'vmWare_Infrastructure3_Log' TO N'G:\Microsoft SQL Server\MSSQL.1\MSSQL\Tlogs\RWVTEST_log.ldf',
      NOUNLOAD, REPLACE, STATS = 10
  GO
  -- Report fragmentation, then rebuild each table's indexes with a fill factor of 70
  USE RWVTEST
  DBCC SHOWCONTIG (VPX_HIST_STAT1)
  DBCC SHOWCONTIG (VPX_HIST_STAT2)
  DBCC SHOWCONTIG (VPX_HIST_STAT3)
  DBCC SHOWCONTIG (VPX_HIST_STAT4)
  DBCC DBREINDEX ('VPX_HIST_STAT2', '', 70)
  DBCC DBREINDEX ('VPX_HIST_STAT1', '', 70)
  DBCC DBREINDEX ('VPX_HIST_STAT3', '', 70)
  DBCC DBREINDEX ('VPX_HIST_STAT4', '', 70)
  -- Shrink the transaction log file to a 20MB target
  DBCC SHRINKFILE (vmWare_Infrastructure3_Log, 20)
  GO
  -- Log the finish time in the tracking database
  USE RWVTEST2
  INSERT INTO IterationLog (Finish) VALUES (GETDATE())
- Permit the retrieval of the start and stop times from the IterationLog table
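
Note that the job writes the start and finish times with two separate INSERT statements, so each run leaves one row holding only a Start value and another holding only a Finish value. The Python sketch below shows one way those rows might be paired once retrieved. It assumes the rows arrive in insertion order as (start, finish) tuples with None standing in for NULL; the `pair_iterations` helper and the sample timestamps are hypothetical, not part of the original test harness.

```python
from datetime import datetime


def pair_iterations(rows):
    """Pair each Start-only row with the next Finish-only row.

    rows: (start, finish) tuples in insertion order, with None for NULL.
    A Start that is followed by another Start before any Finish is treated
    as an interrupted run and dropped.
    """
    pairs = []
    pending_start = None
    for start, finish in rows:
        if start is not None:
            # A new run began; any unfinished earlier run is discarded.
            pending_start = start
        elif finish is not None and pending_start is not None:
            pairs.append((pending_start, finish))
            pending_start = None
    return pairs


# Placeholder rows for illustration only; these are not logged results.
rows = [
    (datetime(2009, 1, 5, 9, 0), None),
    (None, datetime(2009, 1, 5, 9, 12)),
    (datetime(2009, 1, 5, 10, 0), None),   # interrupted run, never finished
    (datetime(2009, 1, 5, 11, 0), None),
    (None, datetime(2009, 1, 5, 11, 15)),
]
pairs = pair_iterations(rows)
print(len(pairs))  # 2
```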

Platforms Compared
VMware ESX 3.5
Microsoft Hyper-V
Citrix XenServer 5

Base Workload
The base of the test had all platforms run the following configuration: 1 Microsoft SQL Server 2005 database server on Windows Server 2003 [System C]

The server ran two databases. Details of the databases:

DB Name: RWVTEST, 4GB (contained a large database)
DB Name: RWVTEST2, 4MB (time tracking of RWVTEST database)

Upward Scaling Workload
Workload pool collection (WLP). This collection was composed of the following:

2 CPU cycle test servers running Windows Server 2003
2 Memory test servers running Windows Server 2003
CPU cycle test servers ran PassMark BurnInTest 5.3 Professional for the processor test
Memory test servers ran PassMark BurnInTest 5.3 Professional for RAM cycles
Secondary WLP test included disk I/O tests

Each platform added one WLP at a time until failure.

Provisioning
Each platform provisioned the systems as follows:

System C:

2048MB RAM,
10GB OS
40GB DB

WLP Systems:

1024MB RAM,
10GB OS

All Systems: 1x CPU, single CPU awareness
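
For a rough sense of scale, the provisioning figures above allow a back-of-the-envelope memory budget. The Python sketch below assumes one WLP means four guests (two CPU test servers and two memory test servers) at 1,024MB each, and it ignores the hypervisor's own memory overhead unless a value is supplied. The `overhead_mb` parameter is a hypothetical allowance, not a figure from the test plan, and the result is an upper bound on memory alone, not a statement of how far any platform actually scaled.

```python
# Figures taken from the provisioning lists above.
HOST_RAM_MB = 16 * 1024      # 16GB host
SYSTEM_C_RAM_MB = 2048       # the SQL Server guest
WLP_GUEST_COUNT = 4          # 2 CPU test servers + 2 memory test servers (assumed)
WLP_GUEST_RAM_MB = 1024


def max_wlps_without_overcommit(overhead_mb=0):
    """Count the whole WLPs that fit in host RAM alongside System C.

    overhead_mb is a hypothetical allowance for the hypervisor itself;
    no figure is given in the test plan, so it defaults to zero.
    """
    wlp_ram_mb = WLP_GUEST_COUNT * WLP_GUEST_RAM_MB
    available_mb = HOST_RAM_MB - SYSTEM_C_RAM_MB - overhead_mb
    return max(available_mb // wlp_ram_mb, 0)


wlps = max_wlps_without_overcommit()
guests = 1 + wlps * WLP_GUEST_COUNT  # System C plus the WLP guests
print(wlps, guests)  # 3 13
```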


ESX and Memory Overcommit
VMware's memory overcommit technology puts ESX in a unique class among the hypervisors compared. For many administrators and managers, it's a key factor in deciding to use VMware for virtualization. Simply put, the memory overcommit feature allows greater guest-to-host ratios by allowing more memory to be assigned to guest VMs than is physically available on the host.

Because this is an important differentiator among hypervisors in our test, one extra test was performed with ESX only. The host was crammed with 19 VMs, each with 1GB RAM (with 16GB on the host, leaving it short of RAM and causing memory overcommit to kick in). The third test was repeated with the same memory, disk and CPU configuration. Results showed that while you can load more guests onto the host, there's no free lunch. There was a dip in performance and database response time, as seen in Table A.

The results show that there's a definite cost to stacking more VMs on the host and using memory overcommit; it should come as no surprise that the number of operations per hour goes down and response time increases. But it allows you to do things you cannot do with Hyper-V and XenServer; and because most systems sit idle most of the time, memory overcommit can be extremely valuable in the data center. Many administrators use it frequently, and approach overcommit ratios of 2-to-1 or 3-to-1 regularly.
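
To put the extra test in context, the overcommit ratio is simply the RAM assigned to guests divided by the RAM physically in the host. A minimal Python sketch of that arithmetic follows, using the 19 VMs, 1GB per VM and 16GB host from the test above; the function name is hypothetical, and the calculation ignores any memory the hypervisor reserves for itself.

```python
def overcommit_ratio(vm_count, ram_per_vm_gb, host_ram_gb):
    """Ratio of RAM assigned to guests versus RAM physically in the host."""
    if host_ram_gb <= 0:
        raise ValueError("host RAM must be positive")
    return (vm_count * ram_per_vm_gb) / host_ram_gb


# The extra ESX test: 19 VMs at 1GB each on a 16GB host.
ratio = overcommit_ratio(19, 1, 16)
print(f"{ratio:.2f}-to-1")  # 1.19-to-1

# Same-size guests needed on that host to reach a 2-to-1 ratio.
vms_at_2_to_1 = int(2 * 16 / 1)
print(vms_at_2_to_1)  # 32
```

By this measure the extra test was only lightly overcommitted compared with the 2-to-1 and 3-to-1 ratios mentioned above.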

Although XenServer and Hyper-V currently do not offer memory overcommit, we decided to include test information for the technology because it's a key core feature of ESX.
-- R.V.