Storage: Virtualized vs. Software-Defined
Is there a difference between these two seemingly similar storage concepts? Yes, and it has to do with how that storage is managed.
One of the things I've learned in the decades I've worked in IT is that IT has more buzzwords than just about any other industry. Sometimes these buzzwords get so overhyped and misused that they become almost meaningless. Take the term cloud as an example. It used to refer to a service running on the Internet. Today, it's possible to have multiple private clouds running in your own datacenter. The term has become so ambiguous that I even heard a radio commercial for a Canadian car dealership that referred to Bluetooth smartphone connectivity as cloud technology.
Another ambiguous IT buzzword (or buzz phrase) is Big Data. It seems to most often refer to large sets of unstructured data, but I've had people tell me the definition is incorrect and that Big Data refers solely to compressed video files.
Certainly, cloud and Big Data aren't the first IT buzzwords to be overhyped, misused and abused, nor will they be the last. Right now the buzz phrase of the moment seems to be software-defined. As far as I can tell, VMware Inc. kick-started the overuse of this phrase a couple of years ago with its software-defined datacenter concept, but the software-defined trend has rapidly proliferated to other vendors and technologies.
I don't want to get off on a rant, but like so many other buzzwords and buzz phrases, there's a fair amount of confusion about the software-defined trend, especially when it comes to software-defined storage. The phrase itself, software-defined storage (often shortened to SDS), seems to have a number of different (and sometimes almost contradictory) meanings. So what is SDS, and how does it differ from storage virtualization?
I think the best way to answer this question is to take an objective look at storage virtualization, and what it entails. After all, storage virtualization is a well-defined term with a meaning that seems to be generally accepted. After a crash course in storage virtualization, I will talk about SDS and how it might differ from storage virtualization.
First, What Storage Virtualization Is
Storage virtualization typically refers to storage abstraction. In other words, a software layer sits between the server (usually, but not always, a VM) and the physical storage, and that layer presents the server with a different view of the storage than what actually exists in the physical world. This allows the full storage capacity to be combined or subdivided on an as-needed basis.
Suppose, for example, that a physical host contained five 1TB drives. A storage virtualization component might make it appear as though the host contains a single 5TB volume, rather than five separate disks.
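To make the abstraction concrete, here is a minimal Python sketch of the five-disk example. It models a pool as nothing more than a list of disks whose capacities are summed, and from which volumes of any size can be carved. The names `PhysicalDisk`, `StoragePool` and `carve` are hypothetical and don't come from any vendor's API, and capacities are expressed in decimal gigabytes (1TB = 1,000GB) for simplicity.

```python
from dataclasses import dataclass


@dataclass
class PhysicalDisk:
    name: str
    size_gb: int


@dataclass
class StoragePool:
    """Toy abstraction layer: callers see one pool of capacity, not individual disks."""
    disks: list[PhysicalDisk]
    allocated_gb: int = 0

    @property
    def capacity_gb(self) -> int:
        # Total raw capacity is simply the sum of every member disk.
        return sum(d.size_gb for d in self.disks)

    @property
    def free_gb(self) -> int:
        return self.capacity_gb - self.allocated_gb

    def carve(self, size_gb: int) -> int:
        """Allocate a volume of any size from the pool, regardless of disk boundaries."""
        if size_gb > self.free_gb:
            raise ValueError(f"requested {size_gb} GB, only {self.free_gb} GB free")
        self.allocated_gb += size_gb
        return size_gb


# Five 1TB disks appear as a single 5,000GB pool.
pool = StoragePool([PhysicalDisk(f"Disk{i}", 1000) for i in range(1, 6)])
print(pool.capacity_gb)  # 5000
```

The caller never needs to know which physical disk holds which part of a volume; keeping track of that is the virtualization layer's job.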
OK, I realize some of you are screaming, "Come on, Posey. You don't need storage virtualization for that. You can accomplish the same thing with a RAID controller." That's very true. A RAID controller can combine disks so they appear as one. Keep in mind, however, that disk consolidation is only one example of how storage virtualization can be used.
Before I get into some of the other things that can be done with storage virtualization, I want to point out that storage virtualization and RAID controllers can sometimes be at odds with one another. Typically, the job of storage virtualization is to create a pool of physical resources that can be used on an as-needed basis, with tremendous flexibility. A RAID controller, on the other hand, links disks together at the hardware level, so the virtualization layer sees only the resulting array rather than the individual disks.
Recently, VMware released a new feature called Virtual SAN (vSAN). Despite its name, vSAN is really a storage virtualization feature. Interestingly, VMware recommends that RAID controllers be configured to provide JBOD storage so that vSAN can control each disk independently, rather than provisioning storage at the hardware RAID level.
To give you a more concrete example of how storage virtualization works, think about how Microsoft has implemented storage virtualization in Windows Server 2012 R2. Windows Server 2012 R2 provides storage virtualization support, even without Hyper-V being installed.
The Windows Server 2012 R2 storage virtualization feature is exposed through Server Manager. You can access it by clicking File and Storage Services, followed by Storage Pools. Incidentally, the File and Storage Services role is installed by default when you deploy Windows Server 2012 R2.
Storage Spaces allows you to create a collection of storage pools. A storage pool is really nothing more than a logical grouping of physical storage devices. For example, in Figure 1 you can see that I've created a storage pool called MyPool, which contains four physical drives.
There are a few things worth paying attention to in the figure. First, even though the screen capture only contains a single storage pool, Windows allows you to create multiple storage pools. You can create different pools for different purposes. For instance, you might create a pool of high-speed disks for use with applications that require a high rate of disk I/O. Similarly, you could create a pool of commodity storage for use in situations where a lot of capacity is needed, but not a lot of performance.
It's also important to keep in mind each physical disk can only belong to a single storage pool. In the case of the server shown, I would not be able to create another storage pool because all of the disks within the server have already been assigned to the existing storage pool.
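The one-disk, one-pool rule amounts to a simple ownership check. The sketch below is hypothetical (`PoolRegistry` is not a Windows API, and the disk names are made up); it just records which pool claims each disk and refuses a second claim.

```python
class PoolRegistry:
    """Tracks which storage pool owns each physical disk (hypothetical helper)."""

    def __init__(self):
        self._owner = {}  # disk name -> pool name

    def add_to_pool(self, disk: str, pool: str) -> None:
        owner = self._owner.get(disk)
        if owner is not None and owner != pool:
            # A disk can belong to only one pool at a time.
            raise ValueError(f"{disk} already belongs to pool {owner!r}")
        self._owner[disk] = pool

    def unpooled(self, all_disks: list[str]) -> list[str]:
        """Disks not yet claimed by any pool; only these could seed a new pool."""
        return [d for d in all_disks if d not in self._owner]


disks = [f"PhysicalDisk{i}" for i in range(1, 5)]
registry = PoolRegistry()
for d in disks:
    registry.add_to_pool(d, "MyPool")
print(registry.unpooled(disks))  # [] -> nothing left to build a second pool from
```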
Something else I want to point out in Figure 1 is that the disks installed in the system match one another. However, they don't have to. You can mix and match disks of different sizes and capabilities within a storage pool. In fact, it's quite common for a single storage pool to contain a combination of traditional hard disks and solid-state disks. I'll talk more about that later.
You can mix and match disks of various capacities within a single storage pool because of the way Windows uses the disks. Remember, Windows Storage Spaces is a storage virtualization feature. This means that, with the exception of a few relatively obscure Windows PowerShell cmdlets, Storage Spaces is the only place in the entire OS where the pooled physical disks are exposed. (Disks that you choose not to include in a storage pool remain visible throughout the system.)
Rather than making direct use of the disks included within the storage pool, Windows requires the administrator to create virtual disks on top of the storage pool. It's the virtual disks that serve as a layer of abstraction between the physical storage and the rest of the OS.
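Part of what that abstraction layer does is translate the logical block addresses the OS sees into locations on physical disks. As an illustration, assuming a plain striped layout (a stripe set, one of the structures Windows offers), the mapping can be computed arithmetically. This is a generic striping formula, not Storage Spaces' actual on-disk allocation, which also involves settings such as the number of columns and the interleave size; `stripe_map` and `stripe_blocks` are hypothetical names.

```python
def stripe_map(lba: int, num_disks: int, stripe_blocks: int = 1) -> tuple[int, int]:
    """Map a virtual-disk logical block to (disk index, block offset on that disk).

    Consecutive runs of `stripe_blocks` logical blocks are dealt out to the disks
    in round-robin order, which is how a simple stripe set spreads I/O.
    """
    stripe = lba // stripe_blocks   # which stripe unit the block falls in
    within = lba % stripe_blocks    # position inside that stripe unit
    disk = stripe % num_disks       # stripe units rotate across the disks
    row = stripe // num_disks       # how many full rotations came before
    return disk, row * stripe_blocks + within


print(stripe_map(5, num_disks=4))  # (1, 1)
```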
Throughout much of Windows' history, virtual disks have been associated with VMs. Although there's a way to associate a virtual disk with a Hyper-V VM, the virtual disks created through Windows Storage Spaces aren't specifically intended for use with VMs. In fact, if you look at Figure 1, you'll notice that a drive letter has been assigned to the existing virtual disk. If you open Windows Explorer, you'll see that the rest of the OS treats this virtual disk as a physical disk. Outside the interface shown, it's very difficult to tell a virtual disk from a physical one.
To show you what I mean, take a look at Figure 2. Here, Windows Explorer displays all of the disks within the system. Windows Explorer makes no distinction between physical and virtual disks. In the figure, disk C: is a physical disk and F: is a virtual disk, and yet they look and behave similarly to one another.
So, why does Windows use virtual disks? As the figures show, Windows allows you to combine the capacity of multiple physical hard disks into a single virtual hard disk (or into a collection of virtual hard disks). There's more to it than that, however. When you create a virtual hard disk through the interface shown in Figure 1, you're able to define the virtual hard disk structure at the software level. Windows allows you to create virtual disks that use underlying stripe sets, two-way mirrors, three-way mirrors or parity sets.
When you create a virtual disk, Windows provides options for the underlying virtual disk structure based on the disks present within the storage pool. For example, if the storage pool contains only two physical disks, you wouldn't be able to create a parity set, because parity requires at least three disks, but you could create a two-way mirror.
Similarly, if a storage pool contains three disks of varying capacities, then it would be possible to create a parity set, but the size of the parity set would be limited by the size of the smallest disk within the storage pool. The remaining capacity on the larger disks wouldn't be wasted, however. You could create additional virtual hard disks within the remaining capacity.
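The rules in the last two paragraphs can be approximated with a little arithmetic. The following sketch assumes, as the text does, that a virtual disk takes an equal-sized extent from every disk in the pool, so its raw footprint is capped by the smallest disk. The minimum disk counts are the bare theoretical ones (one disk per mirror copy, three for parity); Storage Spaces itself may require more disks for some layouts, so treat them as assumptions. All function names are hypothetical.

```python
# Minimum disks per layout in this toy model (assumed, not Storage Spaces' real limits).
MIN_DISKS = {"stripe": 1, "two-way mirror": 2, "three-way mirror": 3, "parity": 3}


def available_layouts(disk_sizes_gb: list[int]) -> list[str]:
    """Layouts that the pool has enough disks to support."""
    return [name for name, needed in MIN_DISKS.items() if len(disk_sizes_gb) >= needed]


def usable_capacity_gb(layout: str, disk_sizes_gb: list[int]) -> float:
    """Largest virtual disk of the given layout that spans every disk equally."""
    n = len(disk_sizes_gb)
    if n < MIN_DISKS[layout]:
        raise ValueError(f"{layout} needs at least {MIN_DISKS[layout]} disks; pool has {n}")
    raw = n * min(disk_sizes_gb)   # equal extent per disk, capped by the smallest disk
    if layout == "stripe":
        return float(raw)          # no redundancy: all raw space is usable
    if layout == "two-way mirror":
        return raw / 2             # every block is stored twice
    if layout == "three-way mirror":
        return raw / 3             # every block is stored three times
    return raw * (n - 1) / n       # parity: one disk's worth of each stripe holds parity


def leftover_gb(disk_sizes_gb: list[int]) -> int:
    """Raw space left on the larger disks, available for additional virtual disks."""
    return sum(disk_sizes_gb) - len(disk_sizes_gb) * min(disk_sizes_gb)


sizes = [1000, 2000, 3000]
for layout in available_layouts(sizes):
    print(layout, usable_capacity_gb(layout, sizes))
print("left for other virtual disks:", leftover_gb(sizes))
```

For a pool of 1,000GB, 2,000GB and 3,000GB disks, a parity virtual disk offers 2,000GB of usable space in this model, and 3,000GB of raw capacity remains on the two larger disks for additional virtual disks.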
Earlier I mentioned that storage pools can contain a mixture of traditional hard disks and solid-state disks. Microsoft allows this because Windows Server 2012 R2 supports the creation of storage tiers. Tiered storage automatically places the most frequently read storage blocks on solid-state storage so that those blocks can be read with maximum efficiency, while less frequently accessed blocks remain on traditional storage. Storage tiers also reserve a portion of the solid-state capacity for use as a write-back cache. The write-back cache smooths out write operations by allowing data to be written first to high-speed storage and then copied to the slower, higher-capacity storage when the I/O load decreases.
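A toy model helps show how the two mechanisms fit together. In this hypothetical `TieredVolume` class, reads are counted and a periodic `optimize()` pass promotes the hottest blocks to a fixed-size SSD tier, while writes land in a small write-back cache and are later destaged to the capacity tier. The periodic pass, the block-level granularity and all of the names are assumptions made for illustration; this is not how Windows implements tiering internally.

```python
from collections import Counter, deque
from typing import Optional


class TieredVolume:
    """Toy model of a two-tier volume with an SSD write-back cache."""

    def __init__(self, ssd_tier_blocks: int, write_cache_blocks: int):
        self.ssd_tier_blocks = ssd_tier_blocks        # how many blocks fit on the SSD tier
        self.write_cache_blocks = write_cache_blocks  # SSD capacity reserved for write-back
        self.access_counts = Counter()                # block id -> number of reads observed
        self.ssd_tier = set()                         # blocks currently placed on SSD
        self.write_cache = deque()                    # pending (block, data) writes, oldest first
        self.capacity_tier = {}                       # block id -> data that has been destaged

    def read_tier(self, block: int) -> str:
        """Record a read and report which tier would serve it."""
        self.access_counts[block] += 1
        return "ssd" if block in self.ssd_tier else "hdd"

    def optimize(self) -> None:
        """Periodic pass: promote the most frequently read blocks to the SSD tier."""
        hottest = self.access_counts.most_common(self.ssd_tier_blocks)
        self.ssd_tier = {block for block, _ in hottest}

    def write(self, block: int, data: str) -> None:
        """Land the write in the fast cache; destage the oldest entry if the cache is full."""
        if len(self.write_cache) >= self.write_cache_blocks:
            self.destage(1)
        self.write_cache.append((block, data))

    def destage(self, max_entries: Optional[int] = None) -> None:
        """Copy cached writes to slower capacity storage, e.g. when the I/O load drops."""
        pending = len(self.write_cache)
        count = pending if max_entries is None else min(max_entries, pending)
        for _ in range(count):
            block, data = self.write_cache.popleft()
            self.capacity_tier[block] = data
```

Writes return as soon as they reach the cache; the slower copy to the capacity tier happens later, which is what smooths out bursts of write activity.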
What Makes Software-Defined Storage So Special?
Now that I've spent a considerable amount of time talking about storage virtualization, let's talk about SDS. Let me say right off the bat that SDS sometimes refers to storage virtualization. The two terms are often used interchangeably. Even though storage virtualization tends to have a fairly narrow definition, SDS does not. The term has been used to describe a wide variety of approaches to storage management.
I'm not even going to try to compile an exhaustive list of every technology that has been referred to as SDS. The term has been used in so many different ways that building such a list would probably be futile. Even so, I want to tell you about some of the more common ways the term has been used.
Although I personally disagree with this one, I've heard the term SDS used to refer to virtual disks or to VMware virtual disk volumes. I assume the reasoning is that virtual disks create a layer of abstraction between a physical or virtual server and the underlying physical storage. Even so, referring to a virtual disk as SDS seems like a bit of a stretch.
The term SDS is also sometimes applied to clustered file systems. At first this might seem a little counterintuitive, because clustered file systems don't have much to do with virtualization in the traditional sense, and yet this definition still seems more plausible than simply referring to a standard virtual disk as SDS.
The rationale is that clustered file system technologies such as the Microsoft Distributed File System (DFS) present users with a completely different view of a file system than what exists in the physical world. The view the user sees might include files and folders scattered across a variety of resources. In some cases, the underlying servers might also keep redundant copies of data to provide fault tolerance or to improve performance through load balancing.
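Here's a minimal sketch of that namespace idea: a logical path maps to several physical targets, and each lookup rotates among the targets that are currently reachable. The paths, server names and round-robin policy are hypothetical; real DFS target selection follows its own rules and is more involved than this.

```python
import itertools


class Namespace:
    """Maps logical folder paths to one or more physical share targets (toy model)."""

    def __init__(self):
        self._targets = {}   # logical path -> list of physical targets
        self._rotation = {}  # logical path -> endless round-robin iterator over targets

    def add_folder(self, logical_path: str, targets: list[str]) -> None:
        self._targets[logical_path] = list(targets)
        self._rotation[logical_path] = itertools.cycle(self._targets[logical_path])

    def resolve(self, logical_path: str, online: set[str]) -> str:
        """Return the next reachable target, skipping any that are offline."""
        for _ in range(len(self._targets[logical_path])):
            target = next(self._rotation[logical_path])
            if target in online:
                return target
        raise LookupError(f"no online target for {logical_path}")


ns = Namespace()
ns.add_folder(r"\\corp\files\Sales", [r"\\srv1\sales", r"\\srv2\sales"])
online = {r"\\srv1\sales", r"\\srv2\sales"}
print(ns.resolve(r"\\corp\files\Sales", online))  # \\srv1\sales
print(ns.resolve(r"\\corp\files\Sales", online))  # \\srv2\sales
```

Users only ever see `\\corp\files\Sales`; which server actually answers is decided behind the scenes, which is the kind of abstraction that leads some people to call this SDS.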
I've also heard the use of technologies such as storage profiles or Storage QoS referred to as SDS. In case you aren't familiar with storage profiles, they're a mechanism offered by both VMware and Microsoft for classifying storage. The basic idea is that once storage has been classified, it becomes easier to place a VM on the most appropriate storage type. For example, in a vSphere environment, a feature called Profile-Driven Storage helps an administrator select a datastore based on a VM's storage requirements.
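The placement logic behind storage profiles boils down to matching a VM's requirements against classified datastores. The sketch below is a hypothetical model, not VMware's or Microsoft's API: the `Datastore` and `StorageRequirement` fields, the "gold"/"bronze" labels and the tie-breaking rule (most free space wins) are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Datastore:
    name: str
    storage_class: str  # administrator-assigned classification, e.g. "gold" (hypothetical)
    free_gb: int
    replicated: bool


@dataclass(frozen=True)
class StorageRequirement:
    storage_class: str
    min_free_gb: int
    needs_replication: bool = False


def is_compliant(ds: Datastore, req: StorageRequirement) -> bool:
    """A datastore qualifies if it matches the class and meets every requirement."""
    return (
        ds.storage_class == req.storage_class
        and ds.free_gb >= req.min_free_gb
        and (ds.replicated or not req.needs_replication)
    )


def place_vm(req: StorageRequirement, datastores: list[Datastore]) -> Optional[Datastore]:
    """Pick the compliant datastore with the most free space, or None if none qualify."""
    candidates = [ds for ds in datastores if is_compliant(ds, req)]
    return max(candidates, key=lambda ds: ds.free_gb, default=None)
```

Returning `None` rather than silently falling back to a non-compliant datastore mirrors the point of classification: the administrator learns that no suitable storage exists instead of getting a poor placement.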
Storage QoS is a Microsoft feature that can be used to throttle storage I/O as a way of preventing a VM from consuming a disproportionate share of the hardware's IOPS capabilities. In my opinion, features such as storage profiles and storage QoS do not constitute SDS by themselves, although those features could conceivably be aspects of SDS.
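Throttling of this kind is commonly implemented with a token bucket, which is the technique this sketch uses; that choice is an assumption, not a description of how Hyper-V actually enforces Storage QoS. Each VM's bucket refills at `max_iops` tokens per second, each I/O spends one token, and requests are refused once the bucket is empty. `IopsLimiter` is a hypothetical name, each request counts as a single I/O regardless of size, and time is passed in explicitly so the behavior is deterministic.

```python
class IopsLimiter:
    """Token-bucket cap on one VM's I/O operations per second (illustrative only)."""

    def __init__(self, max_iops: int, start_time: float = 0.0):
        self.max_iops = max_iops
        self.tokens = float(max_iops)  # start full: permits a burst of up to one second's worth
        self.last = start_time

    def allow(self, now: float) -> bool:
        """Refill tokens for the elapsed time, then spend one if available."""
        elapsed = max(0.0, now - self.last)
        self.last = now
        self.tokens = min(float(self.max_iops), self.tokens + elapsed * self.max_iops)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # over the cap: the caller must queue or delay this I/O


limiter = IopsLimiter(max_iops=100)
granted = sum(limiter.allow(0.0) for _ in range(150))
print(granted)  # 100
```

A VM issuing 150 requests in the same instant gets 100 through; half a second later, 50 more tokens have accumulated, so the noisy VM can't starve its neighbors no matter how hard it pushes.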
As you can see, there's quite a bit of disagreement within the industry as to what SDS really means. I think some of the uses I've mentioned in this article get it partially right, but are too narrow in scope.
In my opinion (which you might disagree with), storage virtualization refers to the pooling of storage resources in a way that allows the capacity to be used on an as-needed basis. SDS, on the other hand, seems to be more about abstracting storage capabilities rather than storage capacity. As such, storage QoS, storage profiles and clustered file systems might be considered SDS features, but they're not the very definition of SDS.
It's All in the Capabilities
Although I've weighed in on the differences I perceive between storage virtualization and SDS, the IT industry has yet to adopt a solid definition for SDS. I expect that to change over time, although I think there will always be a degree of overlap between SDS and storage virtualization.
If you consider my definition in which storage virtualization refers to capacity, while SDS is more about storage capabilities, the overlap seems completely natural. After all, what good are storage capabilities without capacity? But over time, I think these two terms will become far less ambiguous.