How Closely Should You Monitor IOPs and Datastore Latency?
I believe virtualization moves in cycles of sorts. When we got started with virtualization, we were focused on squeezing in enough memory to consolidate the workloads we wanted. Then new processors came out with more cores and a better architecture, which caused us to architect around them. After that, we zeroed in on storage.
With storage, we definitely need to be aware of IOPs and datastore latency. But how much? You can turn to tools to identify problems with latency, such as VKernel's free StorageVIEW. But where do you really look for the latency? Do you look at the storage controller (if it provides this information), at the hypervisor with a tool like StorageVIEW, or in the operating system or application? With each method, your mileage may vary, so you may want to be able to collect latency numbers from each layer. Your results surely won't be exactly the same all of the time.
IOPs can give administrators headaches. Basically, each drive is capable of a certain IOP rate, and those rates add up across all of the hard drives in a pool, whatever the technology may be. So, if you have 12 spinning drives each capable of 150 IOPs, that collection of disk resources can deliver up to 12 x 150 = 1,800 IOPs. Many other factors are involved, and I recommend reading this post by Scott Drummonds on how IOPs impact virtualized workloads, as well as other material on the Pivot Point blog.
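To make the pooling arithmetic concrete, here is a minimal Python sketch of that back-of-the-envelope calculation. The function name pooled_iops is made up for illustration, and the result is only a naive ceiling: it assumes every drive contributes its full rate and ignores the other factors mentioned above, such as RAID write penalties, caching and the read/write mix.

```python
def pooled_iops(drive_count: int, iops_per_drive: int) -> int:
    """Naive upper bound on pool IOPs: per-drive rate times number of drives.

    Hypothetical helper for illustration only. It assumes identical drives
    and ignores RAID overhead, cache effects and read/write mix.
    """
    if drive_count < 0 or iops_per_drive < 0:
        raise ValueError("drive_count and iops_per_drive must be non-negative")
    return drive_count * iops_per_drive


if __name__ == "__main__":
    # The example from the post: 12 spinning drives at 150 IOPs each.
    print(pooled_iops(12, 150))  # 1800
```

Treat the number as a planning ceiling, not something you should expect to see on a busy datastore.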
For the day-to-day administrator concerned with monitoring the running state of a virtualized infrastructure, do you look at this all day long? I'll admit that I jump into the datastore latency tools when there is a problem. My biggest issue is that application owners can't give me information about the IOPs of their applications. Getting information about read and write behavior is also a challenge.
How frequently do you monitor these two measures, and how do you do it? Share your comments here.
Posted by Rick Vanover on 08/19/2010 at 12:47 PM