The Cranky Admin

vSAN 6.5: A Real-World Review

VMware's updated software-defined storage product mostly finds its mark.

November saw VMware release both vSphere 6.5 and its companion, vSAN 6.5. As growth flattens for vSphere, VMware is counting on vSAN, NSX and other projects to keep revenues increasing, Wall Street happy, and stock prices rising. vSAN is VMware's hyper-converged infrastructure (HCI) offering, and I wanted to see if it's ready for prime time.

Launched in early 2014, vSAN is a relative newcomer to the HCI market. Market leaders Nutanix, SimpliVity and Scale Computing were all founded in 2009, with products moving rapidly to market. By the time vSAN came out, these three companies dominated the HCI market between them, with dozens of other players ranging from Maxta to EMC dipping an oar in.

There is good reason for VMware to have invested in vSAN. Despite (or perhaps because of) the diversity and competition in the field, HCI is arguably the fastest-growing market in storage. In most instances, HCI vendors focus on ease of use, removing the need for a dedicated storage administrator and putting the power in the hands of the virtualization team.

Thanks to VMware's global sales apparatus, vSAN is handily matching its competition on customer uptake, despite being late to the HCI party. With 6.5, vSAN can claim relative feature parity with the other major players: the critical enterprise HCI features are all there. While VMware lags behind in some places, it steals a march in others.

New Features, Improvements
vSAN 6.5 does not change much when compared to its immediate predecessor. A great start for those looking to explore the totality of vSAN's features is Tom Fenton's excellent deep dive into vSAN 6.2. Despite the incremental nature of the update, there are some welcome new features with the current release.

In a grudging acknowledgment that pushing vSAN for primary storage means storage-heavy nodes will be important to some customers, VMware has added support for 512e drives to vSAN. This move allows for some modern capacity-oriented drives to be used in vSAN clusters. Sadly, support for the increasingly common and often less expensive 4K native drives remains absent.

A new 2-node Direct Connect deployment mode allows vSAN deployments with only 2 nodes. There are many 1U 2-node solutions on the market, allowing for a highly-available VMware cluster in SMB and ROBO scenarios where cost or node count is a factor. This requires a witness VM located on either a separate local node or at the central office. It requires fiddling around in the command line to enable.

The vSphere Docker Volume Service allows containers to work with vSAN in an integrated fashion, while the new iSCSI target allows vSAN to export storage to workloads not part of the vSAN cluster proper.

As is normal with a VMware release, improvements have been made to PowerCLI. PowerCLI can now be used to monitor the health of a vSAN cluster and to remediate it. Sadly, dedupe and compression are still only available to all-flash configurations; however, you can now deploy all-flash configurations (without said dedupe and compression) using a Standard vSAN license. If you really want that data efficiency, though, you still have to pay for an Advanced license.

Shortcomings
This last is billed by VMware as customer-friendly, and represents my strongest point of disagreement with the virtualization giant regarding the latest iteration of their storage software. VMware claims officially that the reason for limiting data efficiency technologies to flash is "performance," and treats the availability of all-flash clusters as part of the standard license as some sort of gift they are bestowing.

I vehemently disagree. I believe the choice regarding performance should rest with the customer, not the vendor. Who is VMware to say what a customer's performance vs. capacity needs are? Especially when its competitors don't have the same restrictions.

"Deep and cheap" matters. While it's something that all HCI players try very hard to avoid, VMware's vSAN team is particularly allergic to it. They see vSAN as a storage solution to run primary workloads such as databases, virtual desktop infrastructure (VDI) or Exchange servers. They don't really envision people virtualizing 100TB file servers on top of vSAN or trying to build cheap, highly-available archival storage clusters from it.

vSAN can do amazing things with high-end hardware. I have easily gotten 150,000 IOPS out of a single node. I have built clusters that break various IOPS barriers: 1 million IOPS, then 2 million, 4 million, and recently 10 million IOPS. Without question, vSAN 6.5 is a speed demon that crushes all competitors on the market at raw speed.

Porsches vs. Hyundais
I can't help feeling, however, that the vSAN team is building a race car because it's there that the excitement – and the margin – lies. But it's not where the utility to most customers lies. VMware would like us all to politely ignore the part where most people drive their cars in cities with prescribed speed limits, or that much of the commercial traffic is about shifting goods in bulk. Having recently worked on a survey for a client that involved talking to a number of administrators around the world, I found that the majority of us are seeing total cluster demand for storage of between 40,000 and 50,000 IOPS across a 4-node cluster.

Spread across four nodes, that works out to at least 10K IOPS per node, which is more than is polite to ask of magnetic disks. Even for these mixed-use clusters, the days of all-magnetic storage are clearly over. In addition, most companies have the odd cluster of workloads that hammer storage, and all-flash increasingly makes sense to cover these edge cases.
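
For readers who want to check the back-of-the-envelope math, here is a minimal Python sketch of the per-node arithmetic. The cluster demand figures and the 4-node count come from the survey numbers above; the figure of 80 IOPS per magnetic disk is purely an assumed rule of thumb for illustration, not a measurement from this review, and the function names are hypothetical.

```python
import math

# ASSUMPTION: rough rule-of-thumb random IOPS for one 7.2K RPM magnetic disk.
# This number is illustrative only and does not come from the review.
ASSUMED_IOPS_PER_MAGNETIC_DISK = 80

NODES = 4  # cluster size quoted in the survey figures


def per_node_iops(cluster_iops: int, nodes: int) -> float:
    """Evenly divide total cluster demand across the nodes."""
    return cluster_iops / nodes


def disks_needed(node_iops: float, iops_per_disk: int) -> int:
    """Spindles required per node to serve the demand from disk alone."""
    return math.ceil(node_iops / iops_per_disk)


for cluster_iops in (40_000, 50_000):
    node = per_node_iops(cluster_iops, NODES)
    disks = disks_needed(node, ASSUMED_IOPS_PER_MAGNETIC_DISK)
    print(f"{cluster_iops} cluster IOPS: {node:.0f} per node, "
          f"about {disks} magnetic disks per node")
```

Under that assumption, a node would need well over a hundred spindles to serve the load from magnetic disks alone, which is the point of the paragraph above. Change the assumed per-disk figure to suit your own hardware.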

What's important to me, however, is that my own survey work correlates with others that say up to 70 percent of data in today's businesses is cold, infrequently accessed and thus functionally archival. As I see it, VMware's vSAN team exists in a very prescribed bubble of enterprise-class storage, enterprise use cases and enterprise niches.

In the real world, businesses use HCI clusters for mixed workloads. They do away with both NAS and SAN and start putting bulk storage onto HCI. As yet, VMware has not satisfactorily addressed this usage model.

vSAN 6.5 In Practice
Bellyaching about VMware's narrow focus aside, vSAN is actually a great product. As previously mentioned, it's king of the hill in terms of speed. What's more important, however, is how it stands up to use, abuse and misuse.

I performed the usual battery of tests on the vSAN cluster: yanking power cords out of individual nodes, pulling network cables and shutting the whole cluster down at the same time without warning. I misconfigured the network inside the vSphere client to isolate nodes, and I played "pull out the hard drive" more than once.

Survival Skills
Within reason, vSAN survived. There are some things vSAN really doesn't like. Don't pull drives out of two nodes and then plug them back into the wrong node. In fact, if you're going to pull a drive out of a node, just be kind and wipe the drive before putting it back in.

vSphere 6.5 comes with a lovely (if primitive) disk partitioning tool to help you clear disks in order to prepare them for use in vSAN. For the most part this works well, and is a critical tool. Drives cannot be consumed by vSAN unless they are blank.

Be careful with drives that were previously formatted as VMFS. If your ESXi host boots from a USB drive and one of the drives you plug in has a VMFS partition, ESXi will set about placing its scratch config on that drive. That's traditional behavior for ESXi; the bug is that the vSphere 6.5 UI doesn't allow you to change this in either the HTML5 client or the Flex client.

Ultimately, I was only able to resolve this issue in tech support mode, using the information here. I placed the scratch on an NFS volume and rebooted. I was then able to clear the disks and add them into vSAN, moving scratch back to the vSAN volume after creation. It would be great if VMware addressed this in the vSphere UI.

The vSAN UI is excellent, despite being hindered by reliance on the Flex client. The UI is simple and easy to understand without having to read any sort of instructions. From adding disks to creating stretch clusters across multiple sites, vSAN 6.5 proved to be remarkably straightforward.

vSAN stood up well under all sorts of workloads, including databases, Exchange, VDI, general file servers and vast quantities of Web servers lit up as containers. I ran piles of benchmarks to try to tip it over, even a bunch of GPU-backed VDI instances in which I had the AIs play video games against each other while running recording software, just to see if I could trip it up with big spikes in usage. vSAN didn't even blink.

Bottom Line: Proper Enterprise Storage
To the doubters and the HCI haters, it's time to stop shaking your brooms at the future, come down off the front porch and join the kids on the lawn. vSAN 6.5 is proper enterprise storage. It's resilient and performant, and the bugs have been beaten out.

Yes, there are still things to grouse at VMware about, but these are business-related discussions, not technical ones. Whatever philosophical differences may or may not exist between customer and vendor, vSAN 6.5 is a technically sound product.

About the Author

Trevor Pott is a full-time nerd from Edmonton, Alberta, Canada. He splits his time between systems administration, technology writing, and consulting. As a consultant he helps Silicon Valley startups better understand systems administrators and how to sell to them.
