The Benchmark Obsession Must End -- Virtualization Review

The Benchmark Obsession Must End

Very little good -- and much bad -- comes of it.

By Trevor Pott
03/29/2017

Technologists are obsessed with benchmarks, and this obsession sometimes stands in the way of obtaining and implementing the right product for the job. Benchmark fetishism afflicts practitioners and vendors alike. When misinformation about benchmarks takes hold, either side can throw up roadblocks, and everyone loses.

Being a consultant can sometimes be hard. We're often portrayed as amoral, greedy and emotionally detached from both our customers and the companies that contract us. There are days I wish that were true: I find it hard not to invest emotionally in my work.

In the past week alone, I have watched both vendors and end users do themselves professional harm because they couldn't see past benchmarks. I have also had to deal with vendors who refuse to engage with me as a technology journalist because they fear benchmarks.

To say that I find this frustrating would be an understatement. I wouldn't be doing what I do if I didn't have some passion for technology. Part of me wants to see great technology deployed; to see it in action and to see everyone involved, vendor and customer alike, benefit from it. There's a satisfaction in that which I struggle to articulate.

Vendor Benchmark Terror
Storage vendors have always been a little bit touchy about benchmarks. Back in the days of spinning rust, having the best code, the most well-vetted hardware and firmware stacks, and the best controllers really did matter. A 2 percent performance difference could cost a sale. A 10 percent performance difference could wreck a product line.

Because of this, storage vendors learned to ruthlessly control the environments used for benchmarks and whittled their support down to the barest minimum of qualified equipment stacks. Quite rationally, they guaranteed nothing unless everything matched their lab configuration exactly. But as bad as the old disk array vendors were, hyperconverged vendors are worse.

These attitudes persist even though circumstances have changed. Flash, for example, changed everything. Spinning magnetic drives produced, at best, 250 IOPS per drive. Individual SSDs now exist that can produce more than 1,250,000 IOPS. Even the most pedestrian datacenter SSDs produce 40,000 IOPS per drive, which puts them in a completely different world from spinning disks.
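To put those per-drive figures in perspective, the short Python sketch below works out how many spinning drives it would take to match the raw IOPS of a single SSD. It is a back-of-the-envelope illustration only: it assumes IOPS aggregate perfectly linearly across drives and ignores RAID write penalties, controller and network limits, queue depth and latency. The function name `hdds_to_match` is hypothetical, chosen for this example.

```python
# Per-drive IOPS figures quoted above (approximate, best-case random I/O).
HDD_IOPS = 250                # best case for a spinning magnetic drive
SSD_PEDESTRIAN_IOPS = 40_000  # an ordinary datacenter SSD
SSD_TOP_IOPS = 1_250_000      # a top-end individual SSD


def hdds_to_match(ssd_iops: int, hdd_iops: int = HDD_IOPS) -> int:
    """Return how many spinning drives equal one SSD's IOPS, rounded up.

    Assumes perfectly linear aggregation across drives, which real arrays
    never achieve; treat the result as a lower bound on spindle count.
    """
    return -(-ssd_iops // hdd_iops)  # integer ceiling division


if __name__ == "__main__":
    for label, iops in [
        ("pedestrian SSD", SSD_PEDESTRIAN_IOPS),
        ("top-tier SSD", SSD_TOP_IOPS),
    ]:
        print(f"{label}: ~{hdds_to_match(iops)} spinning drives")
```

Under those assumptions, one pedestrian SSD stands in for roughly 160 of the best spinning drives, and a top-end SSD for roughly 5,000.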

For the overwhelming majority of customers, speed simply isn't a problem with storage in 2017. You have to be hitting your storage pretty hard to notice the difference between hybrid and all-flash solutions in the first place. And if you can identify a performance difference between all-flash arrays on real-world workloads, you're into some very niche stuff.

Another major change is that storage arrays and hyperconverged solutions are no longer only the province of the enterprise or government purchaser. Everyone buys them, including the commercial midmarket and even SMBs.

Most of these organizations wouldn't be able to tell the performance difference between a hybrid Tintri T-850 and a bleeding-edge, top-tier EMC DSSD array, because they wouldn't throw enough work at either unit for it to matter; SSDs are simply that much faster than the average organization's workloads demand.

Vendors, however, are terrified someone might benchmark their units in something other than the best possible light. They would love for you to benchmark their gear, but only if those benchmarks are done using the synthetic tests that make their equipment seem the best, and then only with precisely calibrated lab environments where the vendor is absolutely certain you'll get the highest possible results.

They're stuck in the mentality that they must lay claim to the performance crown, even for products not aimed at the niche of the niche of the niche where performance is still king. Real-world workloads, network environments and anything else be damned.

Nobody Else Exists
I don't benchmark that way. I have always been more interested in real-world results than artificially manufactured scenarios. Datacenter design and operations are about more than squeezing every last possible IOPS out of the storage; performance must be balanced against cost, legacy workloads, the realities of heterogeneous environments, mixed workload usage, internal company politics, supply chain issues and more.

Real datacenter architects weigh optimizing one subsystem against the effects that optimization would have on other subsystems, and try to build a stable and performant whole. Thus, when storage performance reached "who cares" levels, we turned our attention to other concerns, ranging from storage capacity to ease of use and automation.

In other words, there's a lot more to storage than just speed, and reviews of storage need to reflect this. To discuss anything other than speed is to move from the quantitative to the qualitative. Quantitative results, like benchmarks, are straightforward and easy to judge: you either got the same results as the vendor, or you didn't. Qualitative results are more subjective and can really only be judged by comparing a solution against others. And this is where storage vendors wet themselves in terror.

Vendors can't cope with the idea of comparative reviews, especially fair and thorough ones. Each vendor's marketing boss feels a need to crush the competition in every single aspect; any admission that someone else does better in some area gives the competition's sales teams something to call out during bidding processes.

I work with vendors. I understand their concerns. I also work with sysadmins and I know that this attitude is horribly, horribly broken.

Flubbing the Sale
After all that hypothetical mumbo-jumbo, a real-world example is called for. Over the past few years I have heard some very negative tales regarding the sales practices of a leading hyperconvergence provider. Almost all of it can be traced back to the marketing and executive layer's allergy to honest competitive comparison.

Customers would run competitive proofs of concept (POCs) and invite multiple vendors to submit solutions for consideration. Rather regularly, the solution submitted by this hyperconvergence provider would be substantially sub-par compared to competing solutions.

When the sales team was informed of this, they would either have the sales engineers configure the solution differently, or come back with a completely different class of units. This would occur two, three and sometimes even four times.

The result in virtually every case was that this vendor got thrown out of the POC. Predictably, the practitioners involved told the tale at conventions, in private community forums, in Slack channels and so on. Word spreads, to the point that more than 100 such incidents have thus far reached my ears.

I know many of the salespeople involved. Most of them aren't slimy, unethical or incompetent. Rather, they lack the tools to adequately and accurately size the solution to the customer's needs. They absolutely lack the tools to size against competitors, because information about how their solutions stack up is very hard to come by.

This creates a dilemma for both the sales team and the customer. The customer can test A against B using benchmarks, but has a much harder time gauging when giving up some performance is worth it for other benefits. The vendors that win are those that can walk in and say straight up, "This model is not going to be as fast in these scenarios as these competitors here, but based on your needs assessment we expect that won't matter. Instead we bring to the table lower prices/more features/better support/etc."

Despite this, vendors willing to do comparative analysis remain few and far between. This is as true for internal sales collateral and channel-facing content as it is for vendor-supplied external-facing content and officially-sanctioned third-party reviews. Any comparative analysis work done internally is often closely guarded, lest it leak to the tech press and someone find out that their widget isn't the best at all things.

Practitioner Confusion
The IT industry is in chaos. This is considered a fairly normal state of affairs for an industry constantly undergoing evolutionary pressure from all sides; but the notion of self-service solutions in the form of "clouds" has really thrown a wrench in the works for everyone.

Today, we can buy cloud-in-a-can solutions off the shelf that can completely transform every aspect of how we design, implement and provision IT to both internal and external customers. We can use public cloud solutions, private cloud solutions, service provider hosted solutions or any mix of the three.

Everyone has something to sell, and everyone is struggling to justify not only their next sale, but the very reason for their company to exist. Every datacenter refresh and every purchase is no longer merely an opportunity to compare one vendor against the next. IT practitioners must now ask themselves whether they should keep doing things the way they have been doing them at all, or burn it all down and build it back up into something radically different, even if that would be game-changing for their employer.

Against this background, our collective obsession with benchmarks must end. As practitioners, we need to understand that many of the bottlenecks that existed when we were earning our stripes no longer apply. CPU is cheap. Storage performance is cheap. We can't decide whom and what to buy based solely on the numbers, so we need to look at our datacenters more holistically.

More importantly, vendors need to get over their paranoia about comparative analysis. Tech is now a cloud-inspired free-for-all. The storage array vendor isn't just competing against the array vendor next door, but against the very concept of arrays -- and even of local workloads -- existing at all.

About the Author

Trevor Pott is a full-time nerd from Edmonton, Alberta, Canada. He splits his time between systems administration, technology writing, and consulting. As a consultant he helps Silicon Valley startups better understand systems administrators and how to sell to them.