How Closely Should You Monitor IOPs and Datastore Latency?

I believe virtualization moves in circles of sorts. When we got started with virtualization, we focused on squeezing in enough memory to consolidate the workloads we wanted; then new processors with more cores and better architectures had us designing around them; and now we have zeroed in on storage.

With storage, we definitely need to be aware of IOPs and datastore latency. But how much? You can turn to tools such as VKernel's free StorageVIEW to identify latency problems. But where do you really measure the latency? At the storage controller (if it provides that information), at the hypervisor with a tool like StorageVIEW, or inside the operating system or application? Your mileage will vary with each method, so it pays to be able to collect the measurement at each layer -- the results surely won't match exactly all of the time.

IOPs are something that can give administrators headaches. Basically, each drive is capable of a certain IOP rate, and those rates are pooled together across all of the hard drives in a group, whatever the technology may be. So, if you have 12 spinning drives each capable of 150 IOPs, you can extend 12 x 150 = 1,800 IOPs to that collection of disk resources. Many other factors are involved, and I recommend reading this post by Scott Drummonds on how IOPs impact virtualized workloads, as well as other material on the Pivot Point blog.
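As a back-of-the-napkin illustration of that pooling math, here is a minimal sketch (the drive count and per-drive figure are the same rough numbers used above; real-world ceilings will be lower once RAID write penalties and caching enter the picture):

```python
# Rough pooled IOPs estimate for a group of spindles. The per-drive
# figure is a rule-of-thumb number, not a measured value.
drives = 12
iops_per_drive = 150

pooled_iops = drives * iops_per_drive
print(f"Estimated pool ceiling: {pooled_iops} IOPs")  # 12 x 150 = 1800
```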

For the day-to-day administrator monitoring the steady-state run of a virtualized infrastructure, do you look at this all day long? I'll admit that I jump into the datastore latency tools only when there is a problem, and my biggest issue is that application owners can't tell me what IOPs their applications need. Getting information about read and write behavior is also a challenge.

How frequently do you monitor these two measures and how do you do it? Share your comments here.

Posted by Rick Vanover on 08/19/2010 at 12:47 PM


Citrix, Free Software and Virtualization

For virtualization administrators, there is a lot of free stuff out there today, but how much you can actually do with it is harder to see. Citrix takes a different approach, however. I have long thought that XenServer, the Citrix offering for virtualizing server infrastructures, has been the best of the free options from Microsoft, VMware and Citrix. I'll go out on a limb and say XenServer is easier to set up than Hyper-V; you can flame me later on that.

Free software is one of the most passionate topics I deal with as a blogger and in my community interaction. I constantly receive e-mails from, and talk with, administrators who want to get started with virtualization yet don't have the resources other organizations have (I've volunteered free help!). Inquiries frequently come from very small non-profit, educational or government shops run by people who have to balance everything on their own. In these situations, the question usually ends up being: which free product do I use?

When it comes to a server virtualization platform for most datacenter consolidation efforts, I usually end up recommending Hyper-V or XenServer simply because of the features offered. VMware isn't even in the conversation beyond me saying, "I really know how to do it this way, but you can get a lot more out of the free offering from either of these two."

Citrix continues this trend with the release candidate of XenClient. The XenClient Express product is a type 1 client hypervisor for laptops. It was released in May and the second release candidate will be out later in 2010.

Recently, I had a chance to chat with Simon Crosby on the Virtumania podcast, episode 23. Simon gave a great overview of some XenClient use cases as well as how each use case can roll in a number of different technologies. Simon also shed some light on the frequently confusing Microsoft relationship with Citrix. Their relationship has been unique for over a decade, and I've settled on calling it "co-opetition" between the two software giants.

What is clear is that Citrix is still a player in the virtualization space. Whether it be new innovations such as XenClient, a strong XenServer offering, or the robust HDX display protocol, Citrix will drop in solutions across the stack. For me, it means some lab time with XenClient.

Have you tinkered with or read about XenClient? If so, share your comments here.

Posted by Rick Vanover on 08/17/2010 at 12:47 PM


VMware is Not the Novell of Virtualization

The old saying that history repeats itself does indeed hold true in the IT realm. Recently, I read Brandon Riley's Virtual Insanity post where he asks: Is VMware the Novell of Virtualization?

Virtual Insanity is a blog I read frequently; regular contributors include a number of VMware and EMC employees. Now that the transparency topic is out of the way, let's talk about the whole idea that VMware is acting like Novell did years ago, when NetWare was pitted against Windows NT. A consultant working with me back in the mid-1990s, when I was mulling a move to Windows NT for file servers, asked me, "Why would you replace your file server with an inferior product?"

That was so true then, technically speaking. NetWare 3 and 4 back in the day offered file server functionality that still to this day is not matched by the Microsoft offering. Don't believe me? Look up the NetWare flag command and see if you can do all of that with a Windows file system.

When it comes to virtualization, will Hyper-V or some other virtualization platform supersede VMware, which currently has the superior product? My opinion is that it will not.

VMware innovates at a rate never seen from fallen heroes like NetWare and the other products that "lost" their battles with Microsoft. I will concede that innovation alone isn't enough, but I see the difference-maker as the greater ecosystem of infrastructure technologies moving at the same pace. Whether it be cloud solutions, virtual desktops, server virtualization or virtualized applications, the entire catalog becomes a set of solutions that make technology decisions easy for administrators. Couple that with the superior technology, and I think the case is made that VMware will not share Novell's fate.

Where do you stand on the fate of VMware, and what's your sense of the history of NetWare? Share your comments here.

Posted by Rick Vanover on 08/11/2010 at 12:47 PM


How Much Hype is Too Much for VMworld?

One of the most defining community aspects of virtualization is manifested annually at VMworld. This year in San Francisco, I am particularly intrigued about what is coming, partly because I am still surprised and perplexed that vSphere 4.1 was released so close to the upcoming VMworld conferences.

I am perplexed because VMware has historically striven to center the conferences on a major announcement. Since vSphere 4.1 was recently released, the major announcements will presumably be about "something else." Social media sites such as Twitter definitely heighten the anticipation; VMware CTO Steve Herrod recently tweeted, "Most announcements I will have ever done in keynote!" Coming into this conference, I don't have any direct information about what the announcements may be, which piques my curiosity.

What value are announcements anyway? At VMworld 2008, VMware announced and previewed vSphere, yet we patiently held on until May 2009 for the release. An announcement lacks immediate relevance for most of the virtualization community, but it definitely helps shape the future decision process for infrastructure administrators. Larger, more rigid organizations may keep to a "minus 1" level of version currency, while others may aggressively engage in pilot programs and stay on the cutting edge of the technology. The organizational tech climate truly varies from customer to customer among VMworld attendees. What I don't want to do with announcements -- or, more specifically, with the anticipation of announcements -- is build up expectations that are too high.

Make no mistake, I look forward to VMworld more than any other event that I participate in. What will VMworld be for me this year? Hopefully fun, informative and community-rich! Hope to see you there!

Posted by Rick Vanover on 08/10/2010 at 12:47 PM


Pay-per-VM Pricing

One aspect of the recent vSphere 4.1 release is the introduction of per-VM pricing for advanced features. My initial reaction to this is negative, as I'd like my VM deployments to be elastic, able to expand and contract features without cost increases. Grumblings aside, I decided it was important to chew on the details of per-VM pricing for vSphere.

The most important detail to note is that per-VM pricing is only for vCenter AppSpeed, Capacity IQ, Chargeback, and Site Recovery Manager virtual machines. Traditional VMs that are not monitored or managed by these add-ins are not (yet) in the scope of per-VM pricing. Other vSphere products that are not included in per-VM pricing: vCenter Heartbeat, Lab Manager, Lifecycle Manager, and vCenter Server.

For the products that do use per-VM pricing, licenses will be bundled in units of 25 virtual machines. Personally, I think this unit is too large. If the pricing is truly per-VM, it needs to be brought down, maybe to a 5-VM pack with the 25-VM licensing unit also available. Likewise, a 100-VM licensing unit might front-load the licensing better than a 25-VM chunk. Here's how it breaks down per 25-pack of per-VM licenses:

  • vCenter AppSpeed: $3,750
  • vCenter Capacity IQ (per-VM pricing available late 2010/early 2011): $1,875
  • vCenter Chargeback: $1,250
  • vCenter Site Recovery Manager: $11,250

(VMware states that "VMware vCenter CapacityIQ will be offered in a per VM model at the end of 2010/early 2011, and you can continue buying per processor licenses for vCenter CapacityIQ until then.")

Each product has a different per-VM cost, but that is expected as they each bring something different to the infrastructure administrator. Breaking each of these down to a per-VM price, the 25 packs equate to the following:

  • AppSpeed: $150 per VM
  • Capacity IQ: $75 per VM
  • Chargeback: $50 per VM
  • Site Recovery Manager: $450 per VM
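For what it's worth, the per-VM figures above are simply the 25-pack prices divided by 25. A trivial sketch of that arithmetic:

```python
# Per-VM cost derived from the 25-VM license packs listed above.
PACK_SIZE = 25
pack_prices = {
    "vCenter AppSpeed": 3750,
    "vCenter CapacityIQ": 1875,
    "vCenter Chargeback": 1250,
    "vCenter Site Recovery Manager": 11250,
}

for product, price in pack_prices.items():
    print(f"{product}: ${price / PACK_SIZE:,.0f} per VM")
```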

To be fair, I don't think these prices are too bad. Take a look at what you pay for operating systems, systems management packages or other key infrastructure items and you'll notice the per-VM figures are right in line. The issue I have is that most organizations still do not have a well-defined model for allocating infrastructure costs, and solving that will be the bigger challenge for the organizations in that situation.

What concerns me is whether these products, or future products that introduce per-VM pricing, will justify the "new" expense. Each of the VMs now subject to per-VM pricing ran fine yesterday; why is more money being asked for? On the per-VM pricing sheet, VMware explains that current customers will soon receive details on their specific situations. Simply speaking, though, if I am a vCenter Site Recovery Manager customer today, I've already invested what I need to provide that environment. If the SnS renewal for Site Recovery Manager comes due and those workloads are now in the per-VM model, I foresee a conversation about more money.

Like all administrators, I hate going back to the well for more money. I just hope it doesn't come to that.

Posted by Rick Vanover on 08/05/2010 at 12:47 PM


Religious Issue #7: Racks or Blades?

One of the most contentious issues that can come up in virtualization circles is the debate over whether to use blades or rack-mount servers for virtualization hosts. The topic did come up in a recent Virtumania podcast discussion. While I've worked with blades over the years, most of my virtualization practice has been with rack-mount chassis servers.

After discussing blades with a good mix of experts, I can say that blades are not for everybody and that my concerns are not unique. Blades have a higher cost of entry; it can take up to six blades before the hardware purchase becomes more attractive than the same number of rack-mount servers. The other concern I've always had with blades is that they add another shared component, the blade chassis, to the failure domain. Lastly, I don't find myself space-constrained, so why would I use blades?

Yet it turns out that blades are advancing quite well with today's infrastructure. In fact, they're progressing more swiftly than my beloved rack-mount chassis server. Today's blades offer superior management and provisioning, ultra-fast interconnects, and CPU and memory capabilities on par with their full-size counterparts. A common interconnect between blades is 10 Gigabit Ethernet, which, in virtualization circles, will lead the way for ultra-fast virtual machine migrations and can make Ethernet storage protocols more attractive.

So, why not just go for blades? Well, that depends. Too many times, other infrastructure components are in different phases of their lifecycles. The biggest culprit right now is 10 Gigabit Ethernet; I think that once this networking technology becomes more affordable, the biggest obstacle for blades will be removed. Cisco's Unified Computing System, for example, requires a 10 Gigabit uplink. Blades also allow administrators to take advantage of orchestration components that are not available on traditional servers. The Hitachi Data Systems Unified Compute Platform blade infrastructure, for instance, allows blades to be pooled together to function as one logical symmetric multiprocessing server.

Blades bring the features; that is for sure. Which platform do you use, and why? Are you considering changing to blades? Share your comments here.

Posted by Rick Vanover on 08/03/2010 at 12:47 PM


Storage for Virtualization on Flash? Yes.

Planning the storage arrangement for virtualization is one of the most critical steps in delivering the right performance level. Recently, I previewed a new storage solution that is a good fit for virtualization environments due to its use of flash media instead of traditional hard drives.

Nimbus Data's Sustainable Storage is a flash-based storage area network. Flash-based SANs have been available for a while, but Nimbus prices its offering competitively with traditional "spinning rust" disk solutions. The S-class storage systems start at $25,000 and are accessible via 10 Gb Ethernet.

Using flash-based storage systems in a virtualized environment is not new. In fact, Microsoft's Hyper-V demonstration earlier this year of more than one million IOPs through its iSCSI initiator was performed on a flash SAN.

The Nimbus solution offers a number of options for Ethernet-based storage protocols in and out of virtualization, including iSCSI, CIFS and NFS support. The iSCSI support is important because it lets VMware environments use the VMFS filesystem for block-based access. VMware folks will quickly run off to the hardware compatibility guide to look up this Nimbus product; that was the question I asked during a briefing on the product, and the answer is that Nimbus is working with VMware to become an officially supported storage configuration. Hyper-V support is also available for the S-class flash storage systems.

The advertised performance of flash-based storage is simply mind-boggling: Nimbus is capable of delivering 1.35 million IOPs and 41 Gb/s of throughput. With competitive pricing on this storage product, does it sound interesting for your virtualization environment? Share your comments here.

Posted by Rick Vanover on 05/27/2010 at 12:47 PM


P2V Tip: VMDK Pre-Build

I am always looking for ways to make a physical-to-virtual conversion go better. While I love the venerable VMware vCenter Converter for most workloads, I still find situations where I can't allocate the time to do a conversion this way.

In addressing some unstructured data systems, a new approach revealed itself: the VMDK pre-build. When I say unstructured data, I am simply referring to situations such as a very large number of very small files. While I'd rather deal with a database putting this content into a blob format, I'm often dealt the unstructured data card.

So, what do I mean by a VMDK pre-build? Basically, I deploy a generic virtual machine within vSphere and attach an additional VMDK disk. From that generic virtual machine, I launch a series of pre-load operations onto the additional VMDK disk. The pre-load copies the unstructured data to the VMDK ahead of time. This can be done with a number of tools, including the quick and easy Robocopy scripting options, the RichCopy graphical interface, and more advanced tools you can buy from companies like Acronis or DoubleTake. By using one of these tools to pre-populate the VMDK, you can save a bunch of time on the actual conversion.

Taking the VMDK pre-build route, of course, assumes that the Windows server has a C:\ drive that is separate from the collection of unstructured data. At conversion time, running VMware vCenter Converter against just the C:\ drive takes very little time. After the conversion, you simply remove the pre-loaded VMDK from the generic virtual machine and attach it to the newly converted virtual machine. There, you've saved a bunch of time.

The tools above can be tweaked with some critical options for the pre-load of the VMDK disk, including copying Windows NTFS permissions and re-running the task to catch newly added data. Robocopy, for example, proceeds much more quickly once the first pass is complete, picking up only the newly added or changed data.
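As a rough sketch of what a pre-load pass can look like (Python wrapping Robocopy here; the source share and target drive letter are made-up examples, and only commonly used Robocopy switches are shown):

```python
import subprocess

# Hypothetical source share and the drive letter of the extra VMDK
# attached to the generic helper VM -- substitute your own paths.
SOURCE = r"\\FILESRV01\D$\Shares"
TARGET = r"E:\Shares"

# /E copies subfolders (including empty ones), /SEC carries the NTFS
# permissions along, and /R:1 /W:1 keeps retries from stalling the job.
cmd = ["robocopy", SOURCE, TARGET, "/E", "/SEC", "/R:1", "/W:1"]

result = subprocess.run(cmd)
# Robocopy exit codes below 8 indicate success (e.g., 1 = files copied),
# so only treat 8 and above as a failure.
if result.returncode >= 8:
    raise SystemExit(f"robocopy reported errors (exit code {result.returncode})")
```

Re-running the same command just before cutover copies only the files added or changed since the first pass, which is the delta behavior described above.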

The tip I provide here isn't the solution for a SQL Server or Exchange Server, of course, but the approach can apply to anything with a large amount of data that would take a long time to convert via traditional methods.

Have you ever used this trick in performing your P2V conversions? If so, share your experience here.

Posted by Rick Vanover on 05/25/2010 at 12:47 PM


5 Big Considerations for vCPU Provisioning

When provisioning virtual machines, either from a new build or during a physical-to-virtual conversion, questions always come up about how many virtual processors to assign to each virtual machine. The standing rule is to provision only what is required, so that the hypervisor can adequately schedule resources for every virtual machine. The vCPU scheduler is effectively managing simultaneous streams of virtual machine work across all available physical CPU cores. A two-socket server with four cores per socket running a hypervisor provides up to eight cores at any given moment, and a number of scenarios could play out, including the following:

  • One virtual machine with eight vCPUs requests processor time, and it is permitted.
  • Four virtual machines, each with one vCPU, request processor time; all are permitted, leaving four cores idle.
  • Four virtual machines, each with four vCPUs, request processor time; if all four vCPUs of each virtual machine need cycles, two machines are permitted while the other two are put into a CPU-ready pattern in ESX or ESXi. If the sum of vCPUs needing cycles is eight or fewer, all of the virtual machines can be accommodated. (The sketch after this list walks through that arithmetic.)
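To make the third scenario concrete, here is a simplified sketch of the capacity check (it ignores the relaxed co-scheduling the real ESX scheduler uses and only shows the arithmetic):

```python
# Eight physical cores, as in the two-socket, four-core-per-socket example.
physical_cores = 8

# vCPU counts of the virtual machines requesting cycles at the same
# instant -- the third scenario above: four 4-vCPU virtual machines.
requests = [4, 4, 4, 4]

available = physical_cores
running, ready_queue = [], []
for vcpus in requests:
    if vcpus <= available:
        available -= vcpus
        running.append(vcpus)
    else:
        ready_queue.append(vcpus)  # these VMs accrue CPU-ready time

print(f"Scheduled now: {running}, waiting in CPU ready: {ready_queue}")
```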

The CPU-ready pattern (which should really be called wait) is not a desired outcome. Here are five points that I use in determining how to provision vCPUs for virtualized workloads:

  1. Start with application documentation. If the application says it needs one processor at 2.0 GHz or higher, then the virtual machine would have one vCPU.
  2. Do not apply MHz limits. The ultimate speed limit is that of the underlying physical core. I came across this great post by Duncan Epping explaining how setting frequency limits may actually slow down the virtual workload.
  3. Provision downward during P2V conversions. The P2V process is still alive and well in many organizations, and the physical hardware being converted may enumerate four, eight or more vCPUs during a conversion. Refer to the application requirements to step the count down during the P2V conversion. Also see my Advanced P2V Flowchart for a couple of extra tips on provisioning virtual machines during a P2V.
  4. Start low, then go higher if needed. If the application really needs a lot of processor capacity, it should be well-documented. If not, start low. The process to add processors is quite easy, but in many cases the system needs to be powered off. Hot-add functionality can mitigate this factor, but only a limited number of operating systems are ready to go with hot-add currently.
  5. Keep an eye on CPU ready. This is the indicator that virtual machines are not getting the processor cycles they are configured for, whether because over-provisioned virtual machines are crowding the scheduler or simply because too many vCPUs are stacked into the infrastructure. ESXTOP is the tool for getting this information interactively, and a third-party management product can centralize the data; a quick conversion from the raw CPU ready value to a percentage is sketched below.
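If you pull the CPU ready figure from a vCenter real-time chart rather than ESXTOP, it is reported as a millisecond summation per sample rather than a percentage. A minimal conversion sketch (assuming the default 20-second real-time sample interval; adjust if yours differs):

```python
def cpu_ready_percent(ready_ms: float, interval_seconds: int = 20) -> float:
    """Convert a CPU ready summation value (milliseconds per sample)
    into the percentage figure ESXTOP reports as %RDY."""
    return ready_ms / (interval_seconds * 1000) * 100

# Example: 1,000 ms of ready time in a 20-second sample is 5 percent,
# a level often treated as worth investigating.
print(f"{cpu_ready_percent(1000):.1f}% CPU ready")
```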

Provisioning vCPUs is one of the more artistic aspects of infrastructure technologies. The tools for hard facts are there, but how you craft your infrastructure's vCPU standards will shape the overall performance of the environment.

What tips do you have for provisioning vCPU? Share your comments here.

Posted by Rick Vanover on 05/20/2010 at 12:47 PM


Taking VMLogix LabManager CE for a Test Drive

In an earlier post, I mentioned that there are a few solutions available for cloud-based lab management. I have been kicking the tires on VMLogix's LabManager CE.

LabManager CE is a fully enterprise-class lab management solution hosted in the Amazon EC2 cloud. What's even more impressive is that the management interface is also in the EC2 cloud (see Fig. 1).

Figure 1. The LabManager CE interface is hosted on Amazon EC2, requiring no infrastructure footprint for cloud-based lab management.

LabManager CE is built on the Amazon Web Services API. EC2 instances are created from Amazon Machine Images (AMIs); an AMI is simply a virtual machine image within the EC2 infrastructure, which uses a Xen-based hypervisor to run them. AMIs can either be built individually in your own environment or pulled from the EC2 AMI repository. For this demo, I have a few AMIs pre-positioned for LabManager CE (see Fig. 2).

Figure 2. Various AMIs are available to deploy workloads in the cloud.

One of the things I like best about LabManager CE is that each AMI can be deployed with additional software titles installed. This means a base AMI can be built to a standard specification, with more software added as part of the deployment into the EC2 infrastructure. Fig. 3 shows an Apache Web server being added.

Figure 3. Software packages can be added to the cloud-based workload as it is deployed.
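Under the covers, deployments like this come down to calls against the EC2 API. For comparison, here is a rough sketch of launching an instance straight from an AMI with the Python boto library (outside of LabManager CE entirely; the AMI ID and instance type are placeholders, and credentials are assumed to live in your boto configuration):

```python
import boto

# Connect to EC2 using credentials from the environment or ~/.boto.
conn = boto.connect_ec2()

# Launch a single small instance from a placeholder AMI ID.
reservation = conn.run_instances("ami-12345678",
                                 min_count=1,
                                 max_count=1,
                                 instance_type="m1.small")
instance = reservation.instances[0]
print(instance.id, instance.state)
```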

Thus far, I've launched a virtual machine from an existing AMI and added a single software title to be installed once the virtual machine is deployed. But when it comes to lab management, there are going to be other users that will need infrastructure on demand.

LabManager CE's user profile management is sophisticated in that each user can be configured with a personal portal on EC2. Basic options include how many virtual machines the user is able to launch and how much RAM they can use. These map directly to Amazon Web Services infrastructure charges, so they effectively cap the expense per user. Fig. 4 shows a user being created with these options.

Figure 4. Users can be added with basic parameters, as well as extended options, within LabManager CE.

Future Options with Network Connectivity
When it comes to cloud-based lab management solutions, there is one clear obstacle: network connectivity. Enterprises simply don't want this traffic running over the Internet at large. The solution lies with the forthcoming features of Amazon's Virtual Private Cloud (VPC). In my discussions with VMLogix and other cloud solution providers, this is by far the number one topic being addressed with all available resources. VPC is still officially a beta; it provides an Internet-based virtual private network to cloud-based workloads. Once VPC is a finished product from Amazon Web Services, expect VMLogix and other cloud partners to deliver incremental updates that roll VPC into their products.

This has been a very quick tour of VMLogix's cloud-based lab management solution. When it comes to moving workloads to the cloud, I can't seriously take the solution forward until something like VPC is a refined product. Like many other administrators, I want to see how it will work from the technical and policy sides; how it would fit into everyday use for a typical enterprise is another discussion altogether. I will say this is much easier than launching EC2 AMI instances through tools like ElasticFox or the Web portal directly.

What do you think of cloud-based lab management? Share your comments here.

Posted by Rick Vanover on 05/18/2010 at 12:47 PM


Peripheral Virtualization over Ethernet

A post by Vladan Seget got me thinking about using virtual machines with various peripheral devices. Systems that require serial, parallel or USB connectivity are in many cases not considered virtualization candidates, but system administrators have a number of options for virtualizing peripheral I/O.

For serial connections, RS-232, RS-422 and RS-485 devices are by no means common anymore, but they are still in place for line-of-business solutions that interface with non-computer systems. A number of products are available for these applications; I have used both the Digi PortServer and the Comtrol DeviceMaster series. The two product lines are similar in that the serial ports are extended to the Ethernet network via a special driver provided by Digi or Comtrol, and each has management software for configuring the ports to run in RS-232, RS-422 or RS-485 emulation mode. Different models support different serial modes; some are RS-232-only, while others support all three.

For USB peripherals, the de facto product is the Digi AnywhereUSB Ethernet-attached USB hub. Like the serial port products, it extends USB ports to a server over the Ethernet network. AnywhereUSB devices now also couple RS-232 serial ports with USB ports on the same unit, which is a nice feature.

In both situations, using virtualized peripheral I/O comes with a couple of considerations. If the server (assuming Windows) was converted from a physical machine that had serial or USB ports, the virtual driver will install very easily. For a new-build virtual machine that has never had a USB or serial port installed, it takes a manual process to add the base serial or USB driver to Windows to enable the enhanced driver to work correctly.

It is important to note that virtualized peripheral I/O isn't going to be as fast as the directly attached alternative, so make sure the application in question can function correctly with these devices in use.

For parallel port applications, I'm not aware of any product that will extend a parallel port over the Ethernet network, though they may exist. I've had to provision USB and serial ports to virtual machines, but not yet a parallel port.

Have you ever had to utilize peripheral I/O for virtual machines? Share your comments here.

Posted by Rick Vanover on 05/13/2010 at 12:47 PM


vMotion Traffic Separation

I was explaining the vMotion process to someone for the first time, walking through how the migration technology transports a running virtual machine from one host to another. I love my time at the whiteboard, and I simply illustrate the process from the perspective of each of the virtual machine's core resources: CPU, disk, memory and network connectivity.

The vMotion event has always been impressive, but it doesn't come without design considerations. Chief among them: the traffic is not encrypted, for performance reasons. VMware's explanation essentially amounts to "that's the way it is, don't panic." While I'm not a security expert, I take heed of this fact and architect accordingly.

In my virtualization practice, I've implemented a layer-2 security zone for vMotion traffic. Simply put, it's a VLAN that contains the migration traffic. The TCP/IP address space is entirely private and is not routed or connected in any way to any other network. The vmkernel interfaces for vMotion on each host are given non-routable IP addresses in an address space that is not in use on any other private network. For example, if the hosts live in a private address space of 192.168.44.0/24, the vmkernel interfaces for vMotion are configured on a private VLAN with an address space of 172.16.98.0/24. Take the extra step of ensuring that your private VLAN address space, the 172.16.98.0/24 network in the example above, does not appear in the routing tables of the ESX or ESXi host.

The default gateway is assigned to the service console (ESX) or the management interface (ESXi). If the address space chosen for vMotion exists anywhere else in the private network and shows up in the routing tables, it can cause problems even if it is not actively in use. This is a good opportunity to check with your networking team and explain what you plan to do with this traffic segment. I've not done it, but IPv6 is also an option in this situation.
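As a quick sanity check on that point, here is a small sketch that tests whether a proposed vMotion subnet overlaps any network already known to the host (the example networks are the made-up ranges from above; the host routes would come from something like the output of esxcfg-route -l, keyed in by hand):

```python
from ipaddress import ip_network

# Proposed non-routed vMotion address space from the example above.
vmotion_net = ip_network("172.16.98.0/24")

# Networks taken from the host's routing table -- illustrative values.
host_routes = [
    ip_network("192.168.44.0/24"),  # management / service console network
    ip_network("10.0.0.0/8"),       # hypothetical corporate summary route
]

conflicts = [str(net) for net in host_routes if net.overlaps(vmotion_net)]
print("Overlapping routes:", conflicts if conflicts else "none")
```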

Each time I make any comment about security and virtualization, I imagine security expert Edward Haletky shaking his head or piping in with good commentary. Anticipating what Edward would say: for some security postures, layer-2-only separation is not adequate. There are two more secure options, according to Edward: use separate physical media for the vMotion traffic on a completely isolated switching infrastructure, or do not enable vMotion at all.

How do you segment virtual machine migration traffic? Share your comments here.

Posted by Rick Vanover on 05/11/2010 at 12:47 PM

