Lies, Damned Lies and Benchmarks (Part 1)
A lot of my spare time (which I define as "time not spent blogging") is dedicated to running performance experiments with storage configurations using complex computer systems and applications, which is actually rather fascinating. It may sound geeky, but I figure that if you've already gotten to the point of reading my blog, you won't complain about the geeky part.
Computer system performance is one of the most talked about but least understood concepts. Modern computers function as we expect them to, at least for the most part. The question is how fast they can perform those expected functions. Of course, at some point the quantitative difference in performance becomes a qualitative difference in functionality. If you don't believe that, try browsing your Facebook page (or almost any modern website) using a 10-year-old personal computer, or try watching Netflix online over a dial-up modem connection.
These are pretty obvious and easy examples of the performance difference and, like most things that are easy, they are not very interesting. A much more interesting and useful question is how to determine a practical real-world performance value of a given system before you have committed significant amounts of treasure to buying it and how to ensure that the ownership experience does not turn into an exercise in frustration. Or, putting it simpler, how do you know in advance that the system will perform according to your expectations?
First it would be nice to know what these performance expectations are. This is not as simple as it seems. We all can specify our performance expectations in terms of "I want it now!" but that is hardly a productive approach. We need some standard way to specify the performance requirements.
Often we know the performance data for the system components. For example, the new laptop you want your boss to buy you for Christmas has a CPU rated at 3GHz. This is wonderful, but how much do you as a user really care about that? Do you know how it would translate into the real-life experience you'd have with that laptop? Would you be able to watch your Netflix in HD or just in the standard resolution?
Now is the time to say the magic word: "benchmark." A benchmark is a set of procedures designed to simulate real usage and produce a quantitative result that is supposed to reflect the real-world performance of a given system. Benchmarks are often packaged as self-contained software tools to simplify their application.
The value of a benchmark is determined by how closely it approximates the real-world environment. Well, it should be determined by it, but often the benchmarks are designed to show off a specific technology. Because benchmarks produce numbers, marketing loves them and will go to great lengths to get a number that presents their product in the best light. These great lengths have been known to include specifically crafting benchmarks to show off the strengths of a specific vendor product. There is nothing wrong with it...as long as your concept of "real-world" coincides with that of the vendor. So it is up to you as a user to determine whether a given benchmark actually attempts to measure things important to you and your real world. This is a trivial task and is left as an exercise for the reader.
Second, running a benchmark is similar to running a scientific experiment. You really have to know what you are doing, which includes making sure that you use the right tools and understand the raw results of your measurements. It took 15 minutes to copy a movie image from one drive to another. Copying another image of the same size took 30 minutes. Do you understand why?
You have to account for the environment and various factors that may affect the result. If you are concerned about the bad quality of your Netflix online movies, for example, you should first check to see if your kid has installed and is running a torrent client. "Wait," you say, "she's only three months old!" Well, knowing kids these days, you should probably still check.
Jokes aside, storage performance benchmarking is probably the most obscure, least understood, and most misinterpreted area of computer technology. It is more of an art form, sometimes a black art. The main reason is that the absolute majority of storage systems are built with mechanical devices--disk drives. The huge gap in timing parameters of electronics and mechanical components leads to very tricky behaviors and difficulties in analyzing and predicting the performance.
Storage hardware vendors have developed most of the storage-specific benchmarks to provide a uniform way to compare their products. They could be quite sophisticated and in fact are not geared to put the best light on any particular vendor. They are, however, geared to put the best light on the storage hardware industry as a whole.
One thing to remember is that at the very basic level ALL products from storage hardware vendors can be described as a bunch (often a very large bunch) of commodity disk drives positioned behind (or inside) some kind of specialized computer. The disk drives are commodity products, they come from a very few well-known manufacturers, and all storage hardware vendors use the same drives. So the real differentiation and the value-add that storage vendors provide is in that specialized computer in front. It is important to understand this also accounts for the high price tag of storage systems: If you buy a single disk drive you'd expect to pay less than $100/TB, but if you buy a storage system the price is closer to $1,000/TB.
What is missing in the storage benchmarks results is the price factor. You can get the performance numbers as defined by the specific benchmark rules, but you can't get the price per unit of performance.
Next time, I'll discuss the details of storage benchmarks and how to interpret them in the world of virtualization.
Posted by Alex Miroshnichenko on 11/04/2010 at 12:48 PM