Virtualizing Demanding Applications, Part 2
Last time we covered the compute tweaks and configurations necessary when virtualizing tier-1 applications. Let's now examine the rest of the stack:
Memory
I am often asked, especially for VDI, whether buying faster memory will increase performance. My answer is a resounding "No!" I would much rather have more memory than less memory at a higher speed.
That being said, when configuring memory for a virtual machine make sure that you reserve the full memory allocated for that VM. Again, tier-1 applications require a bit more finesse in order to ensure quality of service and performance.
Network
Have you heard the phrase "it's never the network"? Network admins always joke that their stuff is rock-solid and it is never the network's fault. Well, that is true to some extent with tier-1 applications. Still, there are a few things you can tweak:
Virtual switch load balancing policies -- If you are wondering which vSwitch policy will give you the best performance, I am telling you right now that I have yet to see an environment where the default policy was not more than enough. Sure, you can always use the IP Hash policy, but that adds complexity, and you have to make sure EtherChannel is properly configured on your physical switches if you go down that path. My recommendation is to stay with the default on this one.
NetIOC -- Many admins have shied away from enabling Network I/O Control for demanding applications. I strongly urge you to enable that setting. Tier-1 applications require a certain level of quality of service that we can only guarantee if we enable it.
Separate IP Storage Network -- NFS is great and IP storage is becoming very popular, but I have to stress physical and logical separation of the IP storage network. Have dedicated physical NICs and dedicated physical switches, and design the network for performance, availability and resilience. Once again, remember that the objective here is tier-1 applications -- when those go down, you get calls from people you usually don't get calls from, so design it appropriately.
Watch out for vMotion Bandwidth -- vSphere 5 is now capable of handling monster VMs with up to 1TB of memory. That is a lot of memory. When designing vMotion, take into consideration those monster VMs that might not necessarily have 1TB of memory, but are larger than average. These VMs require a lot more bandwidth to move around, so don't do it over 1Gb networks; 10Gb is good, and multiple 10Gb links are better. If you ask me, InfiniBand is even better.
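To put rough numbers on that, here is a minimal back-of-the-envelope sketch in Python. It estimates how long a single pass over a VM's memory takes at a given link speed. The function name and the 80 percent link efficiency are my own assumptions for illustration, and the estimate ignores the extra passes vMotion needs for pages that change during the copy, so treat the result as a lower bound rather than a prediction.

```python
def vmotion_transfer_seconds(memory_gb, link_gbps, efficiency=0.8):
    """Rough lower bound on the time to copy a VM's memory once.

    memory_gb  -- VM memory in gigabytes (GB)
    link_gbps  -- aggregate vMotion link speed in gigabits per second (Gb/s)
    efficiency -- assumed fraction of the link usable for payload (a guess)
    """
    if memory_gb < 0 or link_gbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("memory must be >= 0, link > 0, efficiency in (0, 1]")
    memory_gigabits = memory_gb * 8  # bytes to bits
    return memory_gigabits / (link_gbps * efficiency)


if __name__ == "__main__":
    # Compare one 256GB VM across three hypothetical vMotion networks.
    for label, gbps in [("1Gb", 1), ("10Gb", 10), ("2 x 10Gb", 20)]:
        seconds = vmotion_transfer_seconds(256, gbps)
        print(f"{label:>9}: {seconds / 60:.1f} minutes")
```

Under those assumptions, a 256GB VM needs about 43 minutes for one memory pass over a single 1Gb link, versus a little over 4 minutes on 10Gb, which is why larger-than-average VMs belong on the faster network.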
Disable Interrupt Coalescing -- Using ethtool you can disable interrupt coalescing, which can reduce network latency quite a bit. Of course, when you do that you are lowering latency at the expense of more interrupts, and therefore higher CPU utilization. For demanding applications that are latency-sensitive, such as databases, mail servers, terminal servers and XenApp servers, it's a good setting to disable as long as your hosts can absorb the extra CPU overhead that results from shutting it off.
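As an illustration only, the sketch below builds the ethtool command lines involved and defaults to a dry run so nothing changes until you ask for it. `-c` (show coalescing settings) and `-C ... rx-usecs 0` are standard ethtool options, but whether a given NIC driver honors them varies, so check the driver's documentation first. The interface name `vmnic0` and the helper function names are placeholders I made up for the example.

```python
import subprocess


def show_coalescing_command(interface):
    """ethtool command that prints the current coalescing settings."""
    return ["ethtool", "-c", _checked(interface)]


def disable_coalescing_command(interface):
    """ethtool command that sets the receive coalescing delay to zero."""
    return ["ethtool", "-C", _checked(interface), "rx-usecs", "0"]


def _checked(interface):
    # Reject empty names and names with whitespace before building a command.
    if not interface or any(ch.isspace() for ch in interface):
        raise ValueError("invalid interface name: %r" % (interface,))
    return interface


def apply(interface, dry_run=True):
    """Return the command as text; run it only when dry_run is False."""
    command = disable_coalescing_command(interface)
    if not dry_run:
        subprocess.run(command, check=True)  # raises if ethtool reports failure
    return " ".join(command)


if __name__ == "__main__":
    # "vmnic0" is a placeholder; substitute the uplink you actually want to tune.
    print(" ".join(show_coalescing_command("vmnic0")))
    print(apply("vmnic0"))  # dry run: prints the command without executing it
```

Record the current settings before changing anything, and watch host CPU after the change so you can roll back if the interrupt load turns out to be more than the host can absorb.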
Storage
The big evil storage ... if not configured properly, storage can be the root cause of most of the issues you have in your virtualized environment. It's even more true when it comes to demanding applications. It's time to break down the barrier between you and your storage admin -- you're now friends and the cold war days of owning the stack are long gone. Storage admins need to know about virtualization as much as virtualization admins need to understand storage. vSphere offers significant storage capabilities as follows:
Alignment -- Older operating systems like Windows Server 2003 need their virtual disks aligned. Tom Hirt at TCPdump.com discusses this topic in great detail here. One thing to note is that if you are using Windows Server 2008 or later you do not need to worry about alignment at the operating system level.
Snapshots -- We all love snapshots. They make our lives easier, but don't take a snapshot and leave it forever. A snapshot should have a purpose; once it outlives its usefulness, delete it. Why? Because aging snapshots grow in size and can have a negative effect on your VM's performance.
RDM vs. VMDK -- Many blogs are torn on the issue of whether to use RDMs over VMDK and vice versa. From a performance perspective, you will not see a noticeable difference between them. From a manageability perspective, though, VMDKs are easier to manage. Where RDMs become a problem is when you start to creep up on the maximum LUNs per ESX host, which is 255.
You also want to chat with your storage admin, especially if you are using Fibre Channel storage, so that all these LUNs being attached don't overrun the port they are connected on. Queue depths come into play here as well. For optimized performance, I strongly recommend you get your storage admin involved when going down this route. I am not implying that you should never use RDMs; I actually like them and use them in most environments, especially when I want to present a large LUN to a VM or if I am using Microsoft Clustering Services.
Multi-pathing -- I cannot stress enough the need to have multi-pathing enabled and configured properly on your ESXi hosts, and I don't mean round-robin here -- I am talking about true multi-pathing.
SIOC -- Storage I/O Control should be enabled as well, which will allow you to maintain quality of service levels for those demanding applications.
VAAI -- vStorage APIs for Array Integration should be enabled and used wherever and whenever possible, especially with block storage (FC and iSCSI). That being said, VAAI is now also available for NFS with vSphere 5, so I strongly recommend that you check your array and, if it supports VAAI, use it. VAAI can significantly improve the performance of your tier-1 applications.
RAID Configuration -- Understanding the profile of the application will help you place the application on a datastore configured to handle its IO needs. Don't approach storage from a capacity perspective alone; understand the physical limitations and what you need to get more performance out of your storage. RAID levels are important, so place the application correctly.
Spindle Count -- If RAID is important, then your spindle count is even more important. A 600GB 15k SAS drive will give you the same IOPS as a 140GB 15k SAS drive; so, again, capacity is not the problem. The problem is making sure we have enough horsepower in the pool to match the requirements of the application. Otherwise, it will run slow, the user experience will be bad and you will hear voices saying "this application cannot be virtualized; let's put it back on physical hardware..." Let's avoid these conversations -- we can virtualize better than 95 percent of all applications if we do it right.
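The RAID and spindle count points combine into one simple sizing exercise, sketched here in Python. The write penalties are the commonly quoted rules of thumb for each RAID level, and the 175 IOPS per 15k drive figure is likewise a rule-of-thumb assumption rather than a measured value; the function names are my own. Substitute the numbers your storage vendor gives you.

```python
import math

# Commonly quoted back-end I/Os generated per front-end write (rule of thumb).
RAID_WRITE_PENALTY = {"RAID0": 1, "RAID1": 2, "RAID10": 2, "RAID5": 4, "RAID6": 6}


def backend_iops(frontend_iops, write_fraction, raid_level):
    """Disk-level IOPS needed once the RAID write penalty is applied."""
    if not 0 <= write_fraction <= 1:
        raise ValueError("write_fraction must be between 0 and 1")
    penalty = RAID_WRITE_PENALTY[raid_level]
    reads = frontend_iops * (1 - write_fraction)
    writes = frontend_iops * write_fraction
    return reads + writes * penalty


def spindles_needed(frontend_iops, write_fraction, raid_level, iops_per_disk=175):
    """Minimum drive count for performance; 175 IOPS/disk is an assumption."""
    return math.ceil(backend_iops(frontend_iops, write_fraction, raid_level) / iops_per_disk)


if __name__ == "__main__":
    # Hypothetical application profile: 5,000 IOPS, 30 percent writes.
    for level in ("RAID10", "RAID5", "RAID6"):
        print(level, spindles_needed(5000, 0.3, level))
```

For that hypothetical profile, the sketch asks for 38 drives on RAID 10 and 55 on RAID 5, no matter how large each drive is. That is the whole point: size the pool for IOPS first and capacity second.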
Next time, I will cover the virtual machine virtual hardware configuration and discuss some application specifics that can help you as you tackle virtualizing demanding applications.
Posted by Elias Khnaser on 02/29/2012 at 12:49 PM