ESXi: Embedded HYPErvisor?

VMware's lightweight version of ESX is a good product. But is it all it's cracked up to be? Our mythbuster does some debunking.

The worst-kept secret in the virtualization industry in fall 2007 was VMware's new embedded hypervisor, ESXi (formerly known as ESX 3i). Everyone knew that the technology was coming, but no one would comment on it officially. VMware Inc. announced ESX 3i on the first day of the festivities at its North American VMworld conference.

The company wasted no time in getting ESXi to market, as evidenced by deals to embed ESXi in servers from most major OEMs, including Dell Inc., Hewlett-Packard Co., IBM Corp. and Fujitsu Ltd. Other hypervisor makers, including Citrix Systems Inc. with XenServer, are following suit.

But does the hype match the reality? That's the question we try to answer.

ESXi is a good product with many benefits; other proclaimed advantages, however, are simply marketing ploys. We'll take a look at six areas where ESXi is being marketed without merit, divulge the real reason that VMware is pushing this model, and try to predict the legacy ESXi will leave on the virtualization industry.

ESXi promised several improvements over VMware's standard hypervisor, ESX Server (renamed simply "ESX"). Let's look at them in detail. First, we'll give the company spin on the feature; then we'll look at the reality.

SPIN: ESXi is small! Smaller is better!
REALITY: ESXi is small, but it doesn't necessarily follow that smaller is better.

The industry has gone gaga over the fact that VMware was able to strip ESX down to 32MB. That is an impressive accomplishment, but it has little to do with creating an embedded hypervisor. Reducing the size of the hypervisor may make it easier to "embed" the product, but it's certainly not necessary. Today's flash drives could easily hold the full ESX stack, including the service console that was removed from ESXi.

SPIN: ESXi is more secure!
REALITY: ESXi lacks the service console, but does that really make it more secure?

A lot of people are under the misconception that an embedded hypervisor automatically provides better security, but there's simply no evidence to suggest that this is true. The logic for this assertion falls into two camps:

  • An embedded hypervisor has a smaller code footprint and therefore coding mistakes are less likely to be present.
  • The absence of a service console/privileged domain/root partition means that the number of possible attack vectors has been decreased.

Let's take a look at both of those assertions, starting with smaller equals less code equals more secure. The idea that fewer lines of code (LOC) absolutely equals fewer bugs is a myth that has existed in IT for far too long.

For example, imagine you want to assign a value to an integer variable based upon a series of Boolean comparisons:

 1:  int x = b1 ? 1 : b2 ? 2: b3 ? 3 : b4 ? 4 : 5;

The above line of code says "assign x the value of 1 if b1 is true, otherwise assign x the value of 2 if b2 is true, otherwise assign x the value of 3 if b3 is true, otherwise assign x the value of 4 if b4 is true, otherwise assign x the value of 5." While the line of code is perfectly legal, it is not entirely straightforward. Let's take a look at the same logic written in a different manner:

 1:  int x;
 2:  if ( b1 )
 3:  {
 4:    x = 1;
 5:  }
 6:  else if ( b2 )
 7:  {
 8:    x = 2;
 9:  } 
10:  else if ( b3 )
11:  {
12:    x = 3;
13:  }
14:  else if ( b4 )
15:  {
16:    x = 4;
17:  }
18:  else
19:  {
20:    x = 5;
21:  }

Both code snippets accomplish the same task, the first snippet in one line and the second in twenty-one lines. While I agree that the first method is far more elegant, the logic is much easier to read in the second, and thus less likely to contain errors. It is also possible for very simple, very short, very straightforward statements to contain a mundane error that drastically changes an expected outcome. For example, what is the value of y after the following loop?

 1: int x = 1; int y = 1;
 2: for ( x = 1; x <= 100; ++x );
 3: {
 4:   y *= x;
 5: }

The value of y should be the factorial of 100; instead it is 101. Can you spot the error? I accidentally typed a trailing semicolon at the end of line No. 2, so the loop itself does nothing but count x up to 101, and the braced block then runs exactly once. As you can see, a simple piece of obvious code can easily be subject to a seemingly innocuous error that has great consequences.
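To make the effect of that stray semicolon concrete, here is a minimal C sketch that puts the loop as typed next to the loop as intended. The function names buggy_product and fixed_product are hypothetical, chosen only for this illustration. The intended version also takes its upper bound as a parameter and uses a 64-bit type, which is an assumption added here because the factorial of anything above 20 does not fit in 64 bits, let alone in an int.

```c
#include <assert.h>

/* The loop as typed in the article. The ';' right after the for header is
   an empty statement, so the loop has no body; the braces that follow are
   an ordinary block that executes exactly once. */
int buggy_product(void)
{
    int x = 1;
    int y = 1;
    for (x = 1; x <= 100; ++x);   /* empty loop: x counts up to 101 */
    {
        y *= x;                   /* runs once, with x == 101 */
    }
    return y;
}

/* The loop as intended: the braced block is the loop body.
   The bound n is a parameter (an assumption for this sketch) because
   n! overflows an unsigned 64-bit integer for n > 20. */
unsigned long long fixed_product(unsigned int n)
{
    assert(n <= 20);
    unsigned long long y = 1;
    for (unsigned int x = 1; x <= n; ++x)
    {
        y *= x;
    }
    return y;
}
```

Called from any main, buggy_product() returns 101, while fixed_product(5) returns 120. Some compilers can flag the empty loop body with a warning, but the code is legal and compiles either way.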

The reality is that reducing the number of LOC can sometimes add confusion; and even when it doesn't, as you can see, mistakes can still be made.
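The claim that the one-line and the twenty-one-line forms compute the same value does not have to be taken on faith. With only four Boolean inputs there are just 16 combinations, so the two forms can be compared exhaustively. The sketch below does that in C; the function names pick_ternary, pick_if_else and count_mismatches are hypothetical and exist only for this illustration.

```c
#include <stdbool.h>

/* The one-line form from the article. */
int pick_ternary(bool b1, bool b2, bool b3, bool b4)
{
    return b1 ? 1 : b2 ? 2 : b3 ? 3 : b4 ? 4 : 5;
}

/* The multi-line form from the article. */
int pick_if_else(bool b1, bool b2, bool b3, bool b4)
{
    int x;
    if (b1)      { x = 1; }
    else if (b2) { x = 2; }
    else if (b3) { x = 3; }
    else if (b4) { x = 4; }
    else         { x = 5; }
    return x;
}

/* Tries all 16 input combinations and returns how many of them make the
   two forms disagree. */
int count_mismatches(void)
{
    int mismatches = 0;
    for (int bits = 0; bits < 16; ++bits)
    {
        bool b1 = (bits & 1) != 0;
        bool b2 = (bits & 2) != 0;
        bool b3 = (bits & 4) != 0;
        bool b4 = (bits & 8) != 0;
        if (pick_ternary(b1, b2, b3, b4) != pick_if_else(b1, b2, b3, b4))
        {
            ++mismatches;
        }
    }
    return mismatches;
}
```

A result of zero from count_mismatches() confirms that the two forms agree on every input, which is the point: the difference between them is readability, not behavior.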

The second observation has problems as well. VMware and other companies assert that because embedded hypervisors, such as ESXi, do not have a privileged operating system (such as the ESX service console), security is improved by reducing the number of attack vectors the service console would otherwise present.

This is true to an extent, but the service console in any hypervisor solution should be used for management only, and thus should exist on an isolated management network.

Furthermore, it should be configured with the minimum number of open TCP or UDP ports, reducing its accessibility to would-be hackers. Embedded hypervisors may make it less likely for a penetration event to occur, but only by assuming that IT admins are not already configuring the previous products correctly.

In short, embedded hypervisors don't increase security; they simply save IT admins from shooting themselves in the foot.

SPIN: ESXi is more stable!
REALITY: The removal of hard disks does not remove the possibility of service failure.

In a data center environment, change can be bad. Or rather, too much change can be counter-productive to maintaining an ongoing computing infrastructure. Embedded hypervisors are supposed to help reduce change by enabling an anonymous virtualization infrastructure.

Figure 1. This table, recreated with information from Dell's Web site on April 4, 2008, proves you can get VMware's standard hypervisor ESX shipped on bare metal -- just like ESXi.

This is because they eliminate the need for servers to have direct-attached storage (DAS). Servers without disks have fewer moving parts; in turn, that means fewer drive failures and a more static configuration.

The answer to this is simple: So what? Hard disks are inexpensive, so if one fails, does it really matter? Because the actual virtual machine (VM) data is living on the network storage device rather than DAS, the only systems that are possibly affected are the actual virtual servers. IT administrators should be deploying virtual servers in farms with RAID 1, so service shouldn't be affected if one disk in one server fails.

While I agree that removing a failure dependency is a good thing, it's not as if ESXi totally protects the server from the possibility of going down. As I said at the beginning, while there's some truth to what is being said about ESXi, a lot of it is just blown out of proportion. For example ...

SPIN: ESXi is great because it ships on bare metal!
REALITY: So does the current version of ESX.

One of the most talked-about possibilities in regard to ESXi is how independent hardware vendors (IHVs) are going to ship the product on bare metal straight from the factory. Would someone please explain to me how this is any different from the option that IHVs already offer? Look at Figure 1, a table that contains information pulled verbatim from Dell's Web site.

What's different about IHVs shipping ESXi? Perhaps it's ESXi's ability to automatically configure itself for your data center (see the next item).

SPIN: ESXi's self-configuration capabilities rock!
REALITY: The same capabilities shouldn't be exclusive to ESXi.

ESXi automatically configures itself for your existing virtualization environment by discovering existing storage and automatically registering VMs on the new server.

I agree that this is an extremely nice feature. The problem, though, is that all versions of ESX, embedded or not, should include this capability. The fact that the original version of ESX remains devoid of this feature seems to be, in my opinion, an artificial decision designed to hasten the acceptance of ESXi.

Figure 2. Virtualization is part of the DNA of many of today's data centers, sitting at a low level.

ESXi only recently adopted the high-availability capabilities of its big brother with the release of VI 3.5.1. The reason for the late entrance of such an important part of ESX into ESXi's portfolio is that the HA component of ESX actually runs in the service console that was pulled out of ESXi.

The full version of ESX has the ability to run any software feature that appears in ESXi, so the fact that a feature built into ESXi isn't included with ESX means one of two things: Either the two products are completely separate code branches, or a conscious decision was made to not include all of the features of ESXi in ESX.

Now, the former may certainly be true; but it does open up VMware to the problem of competitors criticizing ESXi as a brand-new product without the proven history of ESX.

SPIN: ESXi is inexpensive!
REALITY: Not after the upsell.

VMware announced that Dell could make ESXi free, or available for a minimal charge. Presumably, other IHVs will follow suit in order to compete with Dell; in the end, however, the cost difference may be inconsequential.

Here's why: VMware may partner with IHVs to "give away" the hypervisor, but they know that customers will still have to pay another $5,750 (minimum) in order to get features like the VMware Distributed Resource Scheduler (DRS) and VMotion.

One of the benefits of ESXi is supposed to be its ability to automatically integrate itself into an existing VMware Infrastructure (VI) by discovering storage, networks and using DRS and VMotion to take ownership of existing VMs from other ESX servers (embedded or not). However, unless the IHVs are also going to absorb the $5,750 for every copy of ESXi shipped in order to give away DRS and VMotion as well, customers will not reap this advertised benefit of ESXi.

ESXi is certainly an evolution, but no revolution. So why then are IHVs in a mad rush to get onboard and behind ESXi?

The Real Reason Behind ESXi
ESXi is certainly a nice product -- even if some of its supposed value-adds are not all they're cracked up to be. Yet the manner in which the IHVs are tripping over each other to embed ESXi in their servers seems to indicate one of two things: either there are abilities hidden in ESXi that have yet to be discovered, or the IHVs have hopped onto the hype train along with everyone else. The former is not likely, and hopefully the latter is not true.

The most likely reason IHVs have jumped onboard the ESXi train is to avoid reselling another OS with their servers. Instead, IHVs are pushing ESXi so they can offer what is essentially a virtualization appliance. And this move on behalf of the IHVs is what will not only cement the legacy of ESXi, but perhaps that of virtualization as a data center infrastructure layer as well.

ESXi's Legacy
At VMworld 2007 in San Francisco, John Chambers, CEO of Cisco Systems Inc., gave a keynote address on the role of networking and virtualization in the data center. He spoke about how networking was the DNA of data centers, and how very soon virtualization would also form the building blocks that services are created upon.

He was right, but also late: It's already happening. Virtualization already permeates the data center DNA in everything from storage to networking, and x86 server virtualization has existed in the mainstream for several years now (see Figure 2). With ESXi, the DNA has evolved.

However, it's important to remember that ESXi isn't a mutation of data center DNA; it's merely an evolution. ESXi has some nice features, but don't get caught up in the hype that surrounds it and think it's a panacea that will solve all your data center woes.