Lab Experiment: Hypervisors

Is there really a performance difference between hypervisors from VMware, Microsoft and Citrix? We put them under a microscope and stress-tested them for days to find out the answer.

When it comes to hypervisors, there are many choices now. That's good news for consumers, who were much more limited even one year ago. Still, when admins go shopping, they look first at hypervisors from VMware Inc. (ESX), Microsoft (Hyper-V) and Citrix Systems Inc. (XenServer). Given that situation, we wanted to do a side-by-side, exact comparison under identical conditions to determine which of these is the best-performing product. After all, the hypervisor is still the engine that drives virtualization, even with all the management and third-party products built on top.

To that end, we put on our white lab coats, slipped on our pocket protectors, dusted off our calculators and poked, prodded and put the three challengers through a battery of tests to find out how they perform a variety of virtualization functions. So swipe your biometric authentication card at the door and join us in the hypervisor laboratory to see how each specimen stacked up against the competition.


Test Objectives
All the hypervisors offer essentially the same base functionality. In this series of tests, the objective was to put the same workloads on each one and see how they stack up. The types of workloads tested varied, to simulate a typical environment in which some virtual machines (VMs) are stressed, and some aren't. Each platform was subjected to the same test plan parameters, to give a fair accounting of their performance.

When examining the results, keep in mind that we only tested and compared the hypervisors. While the management layer is critically important to implementations of any size, the fundamental unit of virtualization, the hypervisor, needs to be stacked up against the competition to see how each performs under various workloads. This stems from business pressure to justify the platform selection. While there are cases for each platform, including the management and supporting tools available, hypervisor performance is the cornerstone of any virtualization solution.

In the current market, the pressure is on ESX as the prevailing market-share leader for server virtualization. Because of this, VMware has a bull's-eye on its back, and Microsoft and Citrix are holding the biggest shotguns. These offerings are most likely to be the top choices among companies, so it made sense to limit our testing to these three. (But stay tuned: we'll test other platforms in future issues.)

Comparison Parameters
Each platform was tested as a single, unmanaged installation of the hypervisor on the exact same equipment inventory. The guest VMs were a collection of systems that included one Windows Server 2003-based guest, running SQL Server 2005. We also increased the number of guests running Windows 2003, to scale the workload upward. All guest operating systems are 32-bit, which gives an accurate picture of the current technology landscape in most virtualized data centers.

One of the more difficult things to quantify in virtualized environments is user experience. To help measure this important factor, one of the tests measured a task in terms of completion time. This response time was measured by the start and stop times of a database task. The tests also collected disk, CPU and memory performance.

For calculating memory, disk and CPU operations we used tools from PassMark Software. We would like to thank the company for providing the software for this comparison.

The key to making this cross-comparison of the hypervisors is ensuring a consistent environment. For each test, every hypervisor was installed on the same server using the same disk system, processor quantity and memory quantity. The same hardware was used for each test and all software was installed and configured the same way for each test.

Test Environment
The test plan called for three main tests, as seen in Table 1.

Test Plan

The same database system was used throughout all the tests and maintained the same workload. One important part of that workload was a SQL agent job that performed cleanup and re-indexing of a database while logging the start and stop times.

The last test step was to restore the database to the pristine state so that the workload was exactly repeatable. The start and stop times were logged on the database system into a separate database, then used to calculate an average completion time. The SQL job used CPU, memory and disk for its tasks.

The measurable units in this test were the operation counts recorded by the PassMark software and the start and stop times of the SQL database job. For disk, CPU and memory, a higher number of operations is better. For the SQL agent job, a lower run time indicates quicker performance.
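
The average completion time described above can be worked out along these lines. This is a minimal sketch, not the script used in the lab: the function name and the sample timestamps are made up for illustration, and it assumes each run's start and finish times have already been paired up after being read from the log.

```python
from datetime import datetime
from statistics import mean


def average_run_seconds(runs):
    """Return the mean duration, in seconds, of (start, finish) datetime pairs."""
    durations = [(finish - start).total_seconds() for start, finish in runs]
    if not durations:
        raise ValueError("no completed runs to average")
    return mean(durations)


# Illustrative timestamps only; these are not measurements from the test.
sample_runs = [
    (datetime(2009, 1, 5, 9, 0, 0), datetime(2009, 1, 5, 9, 4, 30)),    # 270 s
    (datetime(2009, 1, 5, 10, 0, 0), datetime(2009, 1, 5, 10, 5, 30)),  # 330 s
]

print(average_run_seconds(sample_runs))  # prints 300.0
```

A lower average here corresponds to the quicker SQL job performance described above.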

Test Caveats
Note that this comparison focuses only on base hypervisor performance. We're not comparing management tools, though they had to be used on each platform to some degree. This can make a difference in how VMs are provisioned or resources managed. Each guest VM was provisioned and allocated RAM and CPU according to the test plan, and those resources were managed only by the hypervisor engine. Specifically, no CPU limits were placed on the guest VMs in regard to MHz usage for each guest's single assigned CPU. Further, the platforms manage this differently, so there's no entirely equal way to provision them.

Before we get too wrapped up in how these three products measured up to the same test, it's important to understand workloads and their varied behaviors. Not all workloads are created equal, and some may not be good virtualization candidates at all. Repeating this test or something similar may be a worthwhile investment in your own environment, especially if you can use capacity-planning software to model your expected results.

Now let's see how ESX, Hyper-V and XenServer handled the same workload.

Microsoft Hyper-V
Hyper-V was the first product compared, and it performed quite differently from expectations. Hyper-V has been a focus of Microsoft dev efforts, and it shows. Overall, Hyper-V did well in this comparison and proved itself a worthy product.

Setting up VMs through Hyper-V was easy. The Windows environment makes the use and administration of the host server intuitive, even on the stripped-down core edition. For Hyper-V, most Windows admins turn to Task Manager to gauge performance. Be careful, though: within Hyper-V, the guest VM workloads are not entirely reflected in the performance section of Task Manager. That's something administrators may need to get used to if Hyper-V lands in their data centers. (While not part of the comparison, Hyper-V Manager shows each VM's processor use as a percentage of the host's. So, though Hyper-V is running on a Windows server, there are new tricks to learn in measuring the host's performance.)

In our tests, Hyper-V did well in all categories; it's a real, viable competitor. Table 2 shows Hyper-V's comparative performance.

Microsoft Hyper-V

 

Test Definitions

CPU operations are measured as the number of integer and floating-point mathematical operations per hour.

Disk usage operations are measured as the number of file read and write operations per hour, using a 32KB file block size and consuming up to 15 percent of the free space on the C: drive of the guest virtual machine.

RAM operations are the number of operations that populate and unpopulate the system memory in five sequences.

Citrix XenServer
XenServer may not have ESX's reputation or Microsoft's marketing muscle, but the hypervisor definitely held its own in this comparison. In fact, it did better than that: XenServer performed best over the largest range of categories in our tests, as you can see in Table 3.

Citrix XenServer

XenServer's test results are impressive, but are they enough to justify a replacement of your current hypervisor? For environments with virtualized systems that have a large number of CPU- and memory-intensive workloads, it may be a good choice. The caution is that those high I/O workloads flirt with not being good virtualization candidates, so some administrators might instinctively place these workloads on physical systems. Make no mistake, however: XenServer did extremely well, posting excellent performance numbers.

VMware ESX 3.5
Because my core experience is VMware-based virtualization, one of my hidden test objectives was to determine if VMware is the right hypervisor in an environment with a large number of VMs and lighter workloads. This is typical for many virtualization environments: workloads that aren't saturated with heavy-hitting VMs that pound the memory and CPU all day long. This helped me determine if VMware is the best bang for the buck.

For the first two tests of heavy workloads, VMware underperformed both XenServer and Hyper-V. For the lighter workloads on the third test, the results were almost indistinguishable across the platforms, but ESX had the best results in three of the four categories. Table 4 shows the ESX results.

VMware ESX 3.5

Lessons Learned
After doing these comparisons of ESX to Hyper-V and XenServer, it's clear that at the hypervisor level, ESX is optimized for a large number of VMs with less-intensive workloads. For intensive workloads that may not be suited to memory overcommit, Hyper-V and XenServer should definitely be considered, even if that means adding another hypervisor into the data center.

Does that mean ESX should be ripped out? Absolutely not. Remember that this is one specific type of test, and didn't take into account other critical aspects of a virtualized environment like management tools, advanced features like VMotion, or third-party application availability. All that goes into the decision-making process, and the factors should be weighted according to your organization's priorities.

Expert Guidance

To ensure the validity of our test results and testing environment, we enlisted the help of Stuart Yarost to formulate and validate the test plan. Yarost is an ASQ Certified Software Quality Engineer and Certified Quality Engineer with more than 22 years' experience in the software and quality fields. Yarost currently holds the position of Vice Chair of Programs for the ASQ Software Division. A special thanks to Yarost for his help.

Think You've Seen Enough?
There was no overall winner here; no way to say, "This is the best hypervisor for every situation." But some general conclusions can be drawn. For CPU- and memory-intensive applications, XenServer and Hyper-V are attractive and have proven their mettle. For a large number of light to moderate workloads, or if you decide that memory overcommit, for example, is important, ESX may be the answer. What is entirely clear, however, is that all three hypervisors are legitimate virtualization platforms, and that no single company has a monopoly on virtualization any longer.

More Information

Test Environment
We tested selected virtualization platforms for workload performance. Following are the parameters of the test environment and tested platforms:

Platform Guest Requirements

  • Support Windows Server 2003 R2 with Service Pack 2 (x86 edition) as a guest operating system
  • Permit the provisioning of guest virtual machines in one of two configurations:
    • Memory at 1024MB, 10GB local drive space
    • Memory at 2048MB, 20GB local drive space
  • No CPU limits in place

Platform Host Requirements

  • Support an installation on the three parallel systems of identical configuration: Dell PowerEdge 2950 2x2 @ 3.0GHz, 16GB RAM, 360GB local storage on one single array (RAID 5)
  • Permit use of local array for guest virtual machines

Test Environment Requirements

  • Permit the use of PassMark BurnInTest 5.3 Professional on the guest operating system
  • Permit remote console access for performance reporting
  • Load the two selected databases and configure as follows:
    • Run the SQL Agent job to perform the following job:

      -- Log the start time of this iteration in the tracking database
      USE RWVTEST2
      INSERT INTO IterationLog (Start) VALUES (GETDATE())
      GO

      -- Restore the test database to its pristine state so each run is repeatable
      USE master
      RESTORE DATABASE [RWVTEST]
        FROM DISK = N'I:\MSSQL_BACKUPS\RWV-TEST.bak'
        WITH FILE = 1,
          MOVE N'vmWare_Infrastructure3_Data' TO N'F:\Microsoft SQL Server\MSSQL.1\MSSQL\Data\RWVTEST.mdf',
          MOVE N'vmWare_Infrastructure3_Log' TO N'G:\Microsoft SQL Server\MSSQL.1\MSSQL\Tlogs\RWVTEST_log.ldf',
          NOUNLOAD, REPLACE, STATS = 10
      GO

      -- Report fragmentation, then rebuild all indexes with a fill factor of 70
      USE RWVTEST
      DBCC SHOWCONTIG (VPX_HIST_STAT1)
      DBCC SHOWCONTIG (VPX_HIST_STAT2)
      DBCC SHOWCONTIG (VPX_HIST_STAT3)
      DBCC SHOWCONTIG (VPX_HIST_STAT4)
      DBCC DBREINDEX ('VPX_HIST_STAT2', '', 70)
      DBCC DBREINDEX ('VPX_HIST_STAT1', '', 70)
      DBCC DBREINDEX ('VPX_HIST_STAT3', '', 70)
      DBCC DBREINDEX ('VPX_HIST_STAT4', '', 70)

      -- Shrink the transaction log file to a target size of 20MB
      DBCC SHRINKFILE (vmWare_Infrastructure3_Log, 20)
      GO

      -- Log the finish time of this iteration
      USE RWVTEST2
      INSERT INTO IterationLog (Finish) VALUES (GETDATE())
    • Permit the retrieval of the start and stop times from the IterationLog table

Platforms Compared
VMware ESX 3.5
Microsoft Hyper-V
Citrix XenServer 5

Base Workload
The base of the test had all platforms run the following configuration: 1 Microsoft SQL Server 2005 database server on Windows Server 2003 [System C]

The server ran two databases. Details of the databases:

  • DB Name: RWVTEST, 4GB (contained a large database)
  • DB Name: RWVTEST2, 4MB (time tracking of RWVTEST database)

Upward Scaling Workload
Workload pool collection (WLP). This collection was composed of the following:

  • 2 CPU cycle test servers running Windows Server 2003
  • 2 Memory test servers running Windows Server 2003
  • CPU cycle test servers ran PassMark BurnInTest version 5.3 Professional for the processor test
  • Memory test servers ran PassMark BurnInTest version 5.3 Professional for RAM cycles
  • Secondary WLP test included disk I/O tests

Each platform added one WLP at a time, for as long as possible, until failure.

Provisioning
Each platform provisioned the systems as follows:

System C:

  • 2048MB RAM
  • 10GB OS
  • 40GB DB

WLP Systems:

  • 1024MB RAM
  • 10GB OS

All Systems: 1x CPU, single CPU awareness
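
As a rough back-of-the-envelope check on the provisioning above, the number of workload pools whose assigned RAM fits in the host can be estimated as follows. This is only a sketch under stated assumptions: it treats one WLP as the four 1024MB servers listed earlier, ignores hypervisor and per-VM memory overhead, and uses hypothetical constant and function names.

```python
HOST_RAM_MB = 16 * 1024   # 16GB host RAM, from the host requirements
SYSTEM_C_RAM_MB = 2048    # the SQL Server guest (System C)
WLP_VM_RAM_MB = 1024      # each WLP guest
VMS_PER_WLP = 4           # assumption: 2 CPU test servers + 2 memory test servers


def wlps_that_fit(host_mb=HOST_RAM_MB, base_mb=SYSTEM_C_RAM_MB,
                  vm_mb=WLP_VM_RAM_MB, vms_per_wlp=VMS_PER_WLP):
    """Count whole WLPs whose assigned RAM fits beside System C in host RAM."""
    remaining_mb = host_mb - base_mb
    if remaining_mb <= 0:
        return 0
    return remaining_mb // (vm_mb * vms_per_wlp)


print(wlps_that_fit())  # prints 3
```

Under these assumptions a fourth WLP would assign more RAM than the host physically has. Where each platform actually stops scaling depends on overhead this sketch leaves out.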

ESX and Memory Overcommit
VMware's memory overcommit technology puts ESX in a unique class among the hypervisors compared. For many administrators and managers, it's a key factor in deciding to use VMware for virtualization. Simply put, the memory overcommit feature allows greater guest-to-host ratios by allowing more memory to be assigned to guest VMs than is physically available on the host.

Because this is an important differentiator among hypervisors in our test, one extra test was performed with ESX only. The host was crammed with 19 VMs, each with 1GB RAM (with 16GB on the host, leaving it short of RAM and causing memory overcommit to kick in). The third test was repeated with the same memory, disk and CPU configuration. Results showed that while you can load more guests onto the host, there's no free lunch. There was a dip in performance and database response time, as seen in Table A.

Performance Dip

The results show that there's a definite cost to stacking more VMs on the host and using memory overcommit; it should come as no surprise that the number of operations per hour goes down and response time increases. But it allows you to do things you cannot do with Hyper-V and XenServer, and because most systems sit idle most of the time, memory overcommit can be extremely valuable in the data center. Many administrators use it frequently, and regularly approach overcommit ratios of 2-to-1 or 3-to-1.
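
The arithmetic behind the extra test can be sketched as follows. The function name is hypothetical, and the calculation counts only the RAM assigned to guests, ignoring any memory the hypervisor itself uses.

```python
def overcommit_ratio(vm_count, ram_per_vm_gb, host_ram_gb):
    """Return assigned guest RAM divided by physical host RAM."""
    if host_ram_gb <= 0:
        raise ValueError("host RAM must be positive")
    return (vm_count * ram_per_vm_gb) / host_ram_gb


# The extra ESX test: 19 VMs with 1GB each on a 16GB host.
ratio = overcommit_ratio(vm_count=19, ram_per_vm_gb=1, host_ram_gb=16)
print(f"{ratio:.2f}-to-1")  # prints 1.19-to-1
```

By this measure the extra test was a mild overcommit, well short of the 2-to-1 or 3-to-1 ratios mentioned above.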

Although XenServer and Hyper-V currently do not offer memory overcommit, we decided to include test information for the technology because it's a key core feature of ESX.
-- R.V.


Reader Comments:

Mon, Dec 21, 2009 Karthik Balaguru

Interesting comparison! The number of CPU operations per VM appears to be surprisingly double for VMware compared to XenServer or Hyper-V in the Test 3 results shown in the three tables. VMware provides lots of other tools and hence the cost factor is more or less balanced at this point in time. If VMware slashes its prices, it would become tough for both XenServer and Hyper-V.

Mon, Dec 14, 2009 LEXUS

Seems to me like hyper-v & xen server are better products than vmware.

Fri, Apr 10, 2009 Joe Germany

In the last test with 19 VMs and ESX, I see a certain misunderstanding of how memory overcommit is intended to be used.

No administrator should ever push all VMs on the same ESX box to their memory limits at the same time.

The results achieved in the last test are exactly what is to be expected under the given circumstances.

You can't multiply your RAM using memory overcommit. It's all about intelligent usage of the RAM.

Mon, Apr 6, 2009 Eric Wu Toronto

One other thing, were supporting drivers installed? VMware Tools and Xen PV drivers both help with IO, especially with Xen PV!

Mon, Apr 6, 2009 Eric Wu Toronto

I see two problems with this. First, only SQL testing? How about other services testing that could put more load in other areas of the system?

Second, we're to assume that only Windows would be used??? Saying one hypervisor is better based on windows guest performance testing can't reflect real world setups.

Mon, Mar 30, 2009 Armchair General London

Wow, confirmation of what I've been seeing for quite some time. ESX performance is lacking and the VMware marketing department have been less than honest all along. No more VMware for me!
AG

Thu, Mar 19, 2009 MattG Anonymous

Do Dells have the same ESX Battery Backup cache slowness issue (fast with the BBC, really slow without) as HP Smart Arrays? If so, was this accounted for?

Thu, Mar 19, 2009 ANon Anonymous

Could there be VMFS alignment issues?

Thu, Mar 19, 2009 Angelo Brazil

Would it be possible to post more details on how the tests were conducted?

It's a little weird for a processor to do almost twice as many operations and the same number of operations on the memory, while executing the "same" workload (Test 3, Citrix vs. ESX).

Wed, Mar 18, 2009 Anonymous Anonymous

Lameness. Use a benchmarking expert next time.

Tue, Mar 17, 2009 Anonymous Anonymous

Don't know about emulated drivers and if they take more CPU, but if they do, how come the test shows ESX doing more disk ops than Hyper-V? This stinks somehow but I can't even tell the config of the hardware they ran on so it's pretty well cloaked.

Ken

Mon, Mar 16, 2009 John Gilham San Diego

I would be interested to see 64-bit performance, as Hyper-V was written from the ground up for 64 bit. Next time perhaps. It seems a tie IMHO…with Hyper-V and Xen the slight winner if you take costs in to consideration.

I'm a Microsoft Partner without $$$ invested in VMWare...so I'm a bit biased.

@Chip Brady
VMWare uses emulated drivers which require more CPU overhead for some IO operations.

John G

Sun, Mar 15, 2009 Davud UK

It is good to see, even if it is unofficial, some stakes in the ground that are independent.

Could you confirm the results for test 3 for VMWare vs Citrix? It seems strange that the CPU operations are vastly different, but the other metrics are all identical, including the time taken. This would imply some very strange system behaviour.

Thanks

David

Thu, Mar 12, 2009 Anonymous Anonymous

These results make no sense at all.
While I believe ESX is a great product, test 3 shows it being nearly twice as good as XenServer in terms of CPU. Other tests are equally as unbelievable, although in the opposite direction.

Sorry guys, but I prefer benchmarks to come from industry recognised, standard tools - Not scripts that have been hacked together and have no relevance to the real world.

Thu, Mar 12, 2009 Anonymous Anonymous

Wow, amazing results from a completely objective source. Good job.

Wed, Mar 11, 2009 Anonymous Anonymous

print this

Tue, Mar 10, 2009 anonymuos Anonymous

Can you do another specific type of test with SLES 10 SP2 as the guest in hypervisor and paravirtualization (OS-aware and modified for virtualization) mode? All three hypervisors support these tests.

Mon, Mar 9, 2009 Doug Anonymous

I would like to see how these tests compare to this same server running stand-alone.

Sun, Mar 8, 2009 Chip Brady CPOS

I think that your test two results may contain an error. Did you notice that the SQL job completion time goes up from Hyper-V to XenServer to ESX? But the total amount of disk operations per VM is highest with ESX then XenServer then Hyper-V. It looks like the ESX system is doing the most work.
