In-Depth
The Rise of Hyper-Converged Infrastructure
What HCI means to your data storage plan.
Many of the big-name hyper-converged infrastructure (HCI) vendors were founded in 2009, meaning we're coming up on a decade of HCI. Everyone knows that it's storage and compute in a single system, but what does that actually mean? How does HCI fit within a messy, non-optimal, brownfield environment, and has it lived up to the hype?
To understand HCI, you need to understand a bit about the evolution of storage -- or at least storage buzzwords -- over the past decade. When HCI emerged in 2009, the buzzword of the day was "software-defined storage" (SDS).
SDS is the perfect buzzword because it's approximately as meaningless as saying "human-built structure." Yes, human-built structure narrows the field a little. We're clearly not talking about a bowerbird's bower, or a termite mound, but "human-built structure" is still so broad a definition that it has very little real-world utility.
Almost all data storage today is software-defined. Even a lot of physical data storage is software-defined: I myself have implemented filing solutions for hard-copy papers that use barcodes, QR codes or NFC to track where documents are stored, and even help automate lifecycle management of those documents.
What SDS was originally supposed to mean was "storage software that commoditizes storage vendors in the same way that VMware commoditized server vendors." It eventually came to be used to mean "any storage that isn't a traditional storage array," which is meaningless.
A storage array in 2018 is an entirely different animal from a storage array from 2009, but it won't get called SDS by most IT practitioners because of its lineage. An HCI solution from a vendor that requires customers to use only a narrow set of nodes supplied by that same HCI vendor, however, will usually be considered SDS.
To be clearer about this: SDS is a buzzword that, in reality, was coined by startups as a polite way of saying "EMC, NetApp, IBM, HP and so forth are too expensive, so buy storage from us instead." It has no other real-world meaning.
Data Fabrics
What SDS was supposed to stand for -- the commoditization of storage -- is still a relevant concept. For a time, the term "storage virtualization" was used. This was intended to draw a parallel to VMware's commoditization of server vendors, as well as the idea that storage could be moved between storage solutions as easily as one might move virtual machines between hosts in a cluster.
This never took off in large part because VMware was better at actual storage virtualization than most startups. vSphere has a feature called "storage vMotion" that makes moving storage between different solutions simple, assuming your workloads are virtualized with VMware. There's also VVOLs, which is supposed to make managing LUNs easier, the success (or failure) of which is a discussion best left for another day.
The original concept of SDS has re-emerged in the form of a data fabric. A data fabric is a distributed application that combines all storage donated to it into a single pool of storage. The software then cuts the storage up according to whatever parameters it's given and serves it up for consumption.
With open data fabrics, storage can be anything: cloud storage, whitebox storage, local server storage, storage-area networks (SANs), network-attached storage (NAS), you name it. That storage can then be made available in any manner that the data fabric vendor chooses to allow: iSCSI, Fibre Channel, SMB, NFS, Object or even appearing to nodes as local attached storage. A truly open data fabric can consume any storage, anywhere, and present that storage for consumption as any kind of storage desired.
While most data fabrics could be as open as described earlier, most aren't. There's nothing preventing any data fabric vendor from incorporating any storage source they choose, or emitting storage in any format they wish. The hard parts of building data fabric software are the bits that choose where to put different data blocks and that apply enterprise storage features such as compression, deduplication, snapshotting, rapid cloning, encryption and so forth. Compared to that, consuming a new type of storage, or adding an emission shim so storage can be consumed in a different format, is easy.
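For illustration only, here's a minimal sketch of that layering in Python. The Backend, DataFabric and ObjectShim classes are hypothetical stand-ins, not any vendor's API; the point is that the placement policy (and, in a real product, the enterprise features) lives in the middle layer, while consuming a new backend or adding a presentation shim is comparatively thin glue.

```python
# Minimal sketch of the data fabric idea: heterogeneous backends are pooled,
# a placement policy decides where blocks land, and "shims" present the pool
# over whatever protocol the consumer wants. All names here are hypothetical.

import hashlib
from dataclasses import dataclass, field


@dataclass
class Backend:
    """Any storage the fabric can consume: local disk, SAN, NAS, cloud bucket."""
    name: str
    kind: str                      # e.g. "local-ssd", "san", "cloud"
    blocks: dict = field(default_factory=dict)

    def write(self, block_id: str, data: bytes) -> None:
        self.blocks[block_id] = data

    def read(self, block_id: str) -> bytes:
        return self.blocks[block_id]


class DataFabric:
    """Pools donated backends and places blocks according to a simple policy."""

    def __init__(self, backends: list[Backend]):
        self.backends = backends

    def _place(self, block_id: str) -> Backend:
        # Toy placement: hash the block ID across backends. A real fabric
        # weighs capacity, performance tier, locality and replication here,
        # along with compression, dedupe, snapshots and the rest.
        index = int(hashlib.sha256(block_id.encode()).hexdigest(), 16)
        return self.backends[index % len(self.backends)]

    def write(self, block_id: str, data: bytes) -> None:
        self._place(block_id).write(block_id, data)

    def read(self, block_id: str) -> bytes:
        return self._place(block_id).read(block_id)


class ObjectShim:
    """One of many possible presentation layers (iSCSI, NFS, SMB, object...)."""

    def __init__(self, fabric: DataFabric):
        self.fabric = fabric

    def put(self, key: str, value: bytes) -> None:
        self.fabric.write(key, value)

    def get(self, key: str) -> bytes:
        return self.fabric.read(key)


if __name__ == "__main__":
    fabric = DataFabric([
        Backend("node1-local", "local-ssd"),
        Backend("old-san", "san"),
        Backend("s3-archive", "cloud"),
    ])
    objects = ObjectShim(fabric)
    objects.put("report.pdf", b"...")
    print(objects.get("report.pdf"))
```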
Vendors, however, restrict the capabilities of their data fabrics for all the same reasons that traditional storage vendors engaged in drive locking and pushed for regular three-year refreshes. This brings us to HCI.
HCI-Plus
The vendor of your data fabric software matters, because the vendor determines how flexible that data fabric will be. Let's consider HCI. HCI is essentially a data fabric consisting of servers that have local storage, which donate that storage to the fabric, and which use their spare compute capacity to run workloads.
While there's no reason that most HCI solutions couldn't incorporate other storage sources into their data fabric, many HCI solutions are restrictive. Customers are only allowed to add a narrow class of nodes to a cluster, cluster sizes are constrained to a small number of nodes and the nodes themselves are often not particularly customizable.
More open data fabrics allow for an "HCI-Plus" approach. With open data fabrics customers can buy servers filled with storage and build clusters that behave exactly like HCI. They can also add their old storage arrays or cloud storage to the mix, using them to provide cold or archival storage, snapshot storage, and so forth.
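To make that HCI-Plus mix concrete, here's a toy description of such a fabric in Python; the member names, tiers and capacities are invented for illustration and don't correspond to any vendor's configuration format.

```python
# Hypothetical cluster description illustrating the "HCI-Plus" idea: standard
# hyper-converged nodes provide the hot tier, while an old array and cloud
# storage are folded into the same fabric for cold and snapshot data.
# Names, tiers and numbers are invented for the example.

from collections import defaultdict

FABRIC_MEMBERS = [
    {"name": "hci-node-01", "type": "server", "tier": "hot",      "capacity_tb": 20},
    {"name": "hci-node-02", "type": "server", "tier": "hot",      "capacity_tb": 20},
    {"name": "legacy-san",  "type": "array",  "tier": "cold",     "capacity_tb": 100},
    {"name": "cloud-bucket", "type": "cloud", "tier": "snapshot", "capacity_tb": 500},
]


def capacity_by_tier(members):
    """Sum raw capacity per tier so the fabric can report a single pool view."""
    totals = defaultdict(int)
    for member in members:
        totals[member["tier"]] += member["capacity_tb"]
    return dict(totals)


if __name__ == "__main__":
    print(capacity_by_tier(FABRIC_MEMBERS))
    # {'hot': 40, 'cold': 100, 'snapshot': 500}
```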
Real-World Data Fabric Use
It's reasonable to wonder why an HCI-Plus approach to data fabrics is even entertained. It sounds great on paper, but if you're going to build a data fabric, why not just build it all out of whitebox servers, use a traditional HCI approach and be done with it? Maybe tack on some cloud storage as an off-site destination for snapshots or cold storage, but is there really a point behind welding on anything else?
The answer to this is complicated. If you're designing a brand-new deployment, then an HCI-Plus approach has very limited appeal. Traditional HCI does the job; even basic cloud replication and backup capabilities have seen tepid uptake from HCI customers.
Greenfield deployments, however, don't represent the majority of environments. Most organizations are a messy mix of IT. Different tiers or classes of infrastructure from different departments all refresh at different times. Mergers and acquisitions can often mean organizations are running multiple datacenters, each with their own entirely distinct design and approach to IT.
Many organizations aren't yet ready to embrace HCI -- let alone open data fabrics -- for mission-critical workloads. For these organizations, a decade isn't enough time for a technology to meet their standards of reliability; they'll stick with the tried, true and expensive for years to come.
Even in traditionalist organizations, however, there's usually a push to find efficiency wherever possible. Non-mission-critical workloads, dev and test, as well as archival storage are all areas where administrators are more open to trying "new" technologies. This is where data fabrics -- and more specifically the HCI-Plus approach -- are gaining the most traction.
The Problem with Classic HCI
The problem with classic HCI solutions is that they're very restrictive regarding cluster composition. The ratio of storage capacity to number of CPU cores to RAM isn't particularly flexible. Additionally, individual nodes within the cluster can only have so many SSDs (and so many NVMe SSDs), limiting the performance of any given node's storage.
So long as you know exactly what you want to put on a given cluster before you buy it, you're probably good. You can specify the cluster's capacity and performance to meet those exact needs. In the real world, however, priorities change, new workloads are introduced and the workloads assigned to most clusters at their retirement look nothing like the original design.
A common reaction to this scenario is for administrators to overprovision their clusters. Overprovisioning is a time-honored tradition, but it sort of defeats the purpose of using HCI in the first place. HCI is usually sold on the basis that you can grow your cluster as needed, when needed, saving money as you go.
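A back-of-the-envelope calculation, using an invented fixed node spec, shows why that overprovisioning creeps in:

```python
# Back-of-the-envelope illustration of the fixed-ratio problem: if every node
# ships with the same storage-to-CPU-to-RAM mix, growing one resource drags
# the others along. The node spec and workload numbers are invented.

import math

NODE = {"storage_tb": 20, "cores": 32, "ram_gb": 512}   # hypothetical fixed node


def nodes_needed(required: dict) -> int:
    """Smallest node count that satisfies every resource requirement."""
    return max(math.ceil(required[k] / NODE[k]) for k in NODE)


# Workload grows to need lots of capacity but little extra compute.
required = {"storage_tb": 200, "cores": 64, "ram_gb": 1024}

count = nodes_needed(required)
print(f"nodes required: {count}")
print(f"cores purchased: {count * NODE['cores']} (needed {required['cores']})")
print(f"RAM purchased:   {count * NODE['ram_gb']} GB (needed {required['ram_gb']} GB)")
# With this spec, 10 nodes are needed for the storage alone, which also buys
# 320 cores and 5 TB of RAM -- five times the compute the workload asked for.
```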
Unlike more open data fabrics, however, classic HCI solutions don't let you just add a pair of all-NVMe nodes for performance, or a great big box of spinning rust for capacity. This is where HCI-Plus comes in.
Not only will an HCI-Plus approach let you create HCI clusters with diverse node compositions, it also lets you take the old storage arrays kicked loose from the mission-critical tier and give them a second life in the non-mission-critical tier's data fabric.
Buzzword Bingo
Like it or not, nomenclature matters. Few (if any) vendors that bill themselves as HCI vendors take an HCI-Plus approach. Vendors that do offer a more open data fabric than classic HCI haven't yet collectively settled on a means to describe what they do, in part because there's a lot of room for variation between classic HCI and a completely open data fabric.
The inability of vendors to settle on nomenclature has made educating customers about what vendors do -- and how they differ from their competitors -- rather more difficult than it really needs to be. This, in turn, makes it difficult for customers to find the best fit for their needs. In part, this is why the storage market is such an unruly mess.
What's important to bear in mind is that, while many HCI vendors bill themselves as "the new hotness," pointing to traditional storage arrays as an archaic solution, HCI is already almost a decade old. It's not new, and in that decade many HCI vendors have proven just as prone to lock-in, overcharging and needless restrictions as the vendors they sought to displace.
Open data fabrics offer the same promise as the storage revolution we were promised a decade ago; they're what HCI should have been. It remains to be seen whether existing HCI vendors will open up their products to at least offer an HCI-Plus approach, or whether they'll fade away, ceding territory to a new generation of open data fabric vendors eager to define their own buzzwords and make their own mark on IT history.
About the Author
Trevor Pott is a full-time nerd from Edmonton, Alberta, Canada. He splits his time between systems administration, technology writing, and consulting. As a consultant he helps Silicon Valley startups better understand systems administrators and how to sell to them.