Storage, The New Virtualization Frontier

If you're only virtualizing your servers or desktops, you're missing out on one of the best bets: virtual storage.

If 2007 was the year of server virtualization, then 2008 may be the year storage virtualization finds its way into many data centers.

Make no mistake about it -- server virtualization has left an indelible mark on data centers through consolidation, application mobility and higher availability. But as data centers begin to reap the benefits server virtualization has to offer, IT administrators will divert their attention toward another consolidation-rich target: storage.

Like its server virtualization brother, storage virtualization promises better utilization of data center storage resources, decreased power and space consumption, higher availability and greater storage infrastructure flexibility as a whole. But server and storage virtualization are vastly different in implementation. While server virtualization creates a virtual machine (VM) environment for operating systems and applications, storage virtualization is more of a functional layer that virtualizes the storage interface and data layout, obfuscating where the data is stored. This virtualization layer adds enhanced storage functionality that can increase storage utilization, agility and availability.

But before we get into what storage virtualization is, let's talk about the impetus for storage virtualization and its benefits.

Growth Issues
Statistics show that data growth is out of control; storage consumption is so voracious that capacity is used almost as fast as it can be provisioned. Indeed, according to research firm IDC, total capacity of disk systems shipped grew 49.4 percent to 1.3 exabytes in Q4 2007.

Given this situation, IT organizations must focus on mitigating the effects of rapid data growth. These effects include:

  • Rising costs. As data growth increases, storage-related capital expenditures often blow the IT budget. Fibre Channel storage array platforms -- and their associated disk drives -- are an expensive capital investment. Even more expensive are the HBAs and switches necessary to connect servers to these platforms.
  • Data availability. Although many storage platforms include high-availability features, such as redundant controller cards, multiple I/O ports and redundant power supplies, each platform still represents a single point of failure for the data stored within. Enterprise arrays -- which have the highest redundancy features -- offer the best protection, but also the highest cost, and cannot protect against data center failures such as power outages. But as data grows, many IT organizations cannot afford the high-availability features, such as replication, on enterprise arrays. Replication -- a key storage virtualization feature -- is required to increase data availability.
  • Storage inefficiency. Perhaps the greatest concern -- the real enemy -- is inefficiently stored data. In many data centers, storage is viewed as a "black box" that keeps data and little regard is given to how the storage is allocated or how data is managed. The result is overcommitted idle storage, massive data duplication (especially for repetitive data such as documents, operating systems and application installation files), and stale data that remain on disk until doomsday.

Advanced Platform Features
How does storage virtualization help to solve these issues? To understand the benefits of storage virtualization, we must first look at the advanced storage features that virtual storage platforms provide. Many storage virtualization products include advanced storage functionality, such as:

LUN aggregation (aka "meta-LUNs"): Can increase performance (more controllers working on the same data set) and can reduce risk if RAID algorithms are applied.

Advanced RAID: Previous RAID implementations assigned an entire disk to a RAID set. A three-disk set was capable of creating several RAID 5 LUNs. Using advanced RAID implementations, only a portion of each disk is assigned to a LUN.

For example, a portion of storage from three disks can create a RAID 5 LUN. Using this approach, storage is based on capacity, rather than hard-wired to a disk, increasing storage mobility and enabling features such as thin provisioning and advanced tiering.
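The capacity-based approach can be sketched in a few lines of Python. This is only an illustration, not any vendor's implementation: the function carve_raid5_lun, the disk names and the sizes are all made up for the example. The one fact it relies on is that RAID 5 spends one disk's worth of each stripe on parity, so usable capacity is the slice size multiplied by the number of disks minus one.

```python
def carve_raid5_lun(free_gb, disks, slice_gb):
    """Reserve slice_gb on each named disk and return the usable GB.

    Hypothetical helper: RAID 5 spends one slice's worth of capacity
    on parity, so usable capacity is (number of disks - 1) * slice_gb.
    """
    if len(disks) < 3:
        raise ValueError("RAID 5 needs at least three disks")
    for disk in disks:
        if free_gb[disk] < slice_gb:
            raise ValueError(f"not enough free space on {disk}")
    # Only a slice of each disk is consumed; the rest stays available.
    for disk in disks:
        free_gb[disk] -= slice_gb
    return (len(disks) - 1) * slice_gb


# Three 500GB disks (illustrative sizes) shared by two RAID 5 LUNs.
free_gb = {"disk0": 500, "disk1": 500, "disk2": 500}
lun_a = carve_raid5_lun(free_gb, ["disk0", "disk1", "disk2"], slice_gb=100)
lun_b = carve_raid5_lun(free_gb, ["disk0", "disk1", "disk2"], slice_gb=50)
```

Because each LUN claims only a slice of every disk, the remaining capacity on the same three disks stays free for other LUNs or for later growth.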

Thin provisioning: Reduces the amount of storage required to create a LUN or volume by allocating just enough capacity to store the current data set. For example, the storage administrator can create a 1TB LUN, but the storage device may provision only the 25GB required to store client data. When a client creates more data, the storage virtualization layer will dynamically provision additional capacity as needed.
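As a rough illustration of the idea, the Python sketch below models a thin-provisioned LUN that advertises its full size to the client but consumes capacity only when a block is first written. The class name ThinLun, the 4KB block size and the method names are assumptions made for this example; real products allocate in their own units and expose their own interfaces.

```python
class ThinLun:
    """Toy thin-provisioned LUN (hypothetical, for illustration only)."""

    BLOCK_SIZE = 4096  # bytes per block; an assumed value

    def __init__(self, advertised_bytes):
        self.advertised_bytes = advertised_bytes  # size the client sees
        self.blocks = {}  # block number -> data; only written blocks exist

    def write(self, block_no, data):
        if len(data) > self.BLOCK_SIZE:
            raise ValueError("data larger than one block")
        if block_no < 0 or (block_no + 1) * self.BLOCK_SIZE > self.advertised_bytes:
            raise ValueError("write outside the advertised LUN size")
        # Capacity is consumed here, on first write, not at LUN creation.
        self.blocks[block_no] = data.ljust(self.BLOCK_SIZE, b"\x00")

    def read(self, block_no):
        # Unwritten blocks read back as zeros, as if fully allocated.
        return self.blocks.get(block_no, b"\x00" * self.BLOCK_SIZE)

    def provisioned_bytes(self):
        return len(self.blocks) * self.BLOCK_SIZE


lun = ThinLun(advertised_bytes=1024**4)  # the client sees a 1TB LUN
lun.write(0, b"client data")
lun.write(7, b"more client data")
```

After the two writes, the client still sees a 1TB LUN, while provisioned_bytes() reports only the two blocks actually backed by storage.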

Continuous data protection: Uses snapshots to continuously record changes to the data set. Using these snapshots and a log file, the administrator is able to recover the data set back to a specific date and time.

Storage tiering: Moves data to different types of media based on some data attribute (or metadata). In many cases, the administrator uses policies to manage storage tiering. For example, the administrator may create a policy specifying that all data that hasn't been accessed for three months is to be migrated from faster, more expensive drives to slower, cheaper ones. Using storage tiering, an administrator can more effectively manage the available storage.
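The three-month policy described above might look something like the following Python sketch. The Volume record, the tier names "fast" and "slow", the sample volumes and the 90-day threshold (standing in for "three months") are all assumptions for illustration; actual products define policies through their own management tools.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Volume:
    """Hypothetical record of where a data set lives and when it was last used."""
    name: str
    last_accessed: datetime
    tier: str  # "fast" or "slow"


def apply_tiering_policy(volumes, now, max_idle=timedelta(days=90)):
    """Move volumes idle longer than max_idle to the slow tier.

    Returns the names of the volumes that were migrated.
    """
    moved = []
    for volume in volumes:
        if volume.tier == "fast" and now - volume.last_accessed > max_idle:
            volume.tier = "slow"
            moved.append(volume.name)
    return moved


now = datetime(2008, 6, 1)
volumes = [
    Volume("payroll", datetime(2008, 5, 20), "fast"),
    Volume("archive-2006", datetime(2007, 11, 1), "fast"),
]
moved = apply_tiering_policy(volumes, now)
```

Here only the archive volume, untouched for about seven months, is migrated; the recently used payroll volume stays on the faster drives.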

Data deduplication: Reduces the amount of data stored to disk by preventing duplicate blocks or files from being stored by multiple clients. In many cases, the storage virtualization layer keeps a record of the clients that "own" the data and keeps the single instance until the last client deletes the data. Although data deduplication is more of a futuristic feature, block-based data deduplication may be a standard feature in the years to come.
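The ownership bookkeeping described above can be sketched in Python as follows. This is a simplified model under stated assumptions: the DedupStore class and client names are invented for the example, and identifying duplicates by a SHA-256 digest of the block contents is one common way to do it, not necessarily how any given product works.

```python
import hashlib


class DedupStore:
    """Toy single-instance block store (hypothetical, for illustration)."""

    def __init__(self):
        self.blocks = {}  # digest -> block data, stored once
        self.owners = {}  # digest -> set of clients that "own" the block

    def write(self, client, data):
        digest = hashlib.sha256(data).hexdigest()
        if digest not in self.blocks:
            self.blocks[digest] = data  # first copy is the only copy kept
        self.owners.setdefault(digest, set()).add(client)
        return digest

    def delete(self, client, digest):
        owners = self.owners.get(digest)
        if owners is None:
            return
        owners.discard(client)
        if not owners:
            # The last owner is gone, so the single instance can be freed.
            del self.owners[digest]
            del self.blocks[digest]

    def stored_bytes(self):
        return sum(len(block) for block in self.blocks.values())


store = DedupStore()
block = b"installer payload" * 100
ref_a = store.write("client-a", block)
ref_b = store.write("client-b", block)
stored_after_writes = store.stored_bytes()

store.delete("client-a", ref_a)
still_present = ref_b in store.blocks
store.delete("client-b", ref_b)
stored_after_deletes = store.stored_bytes()
```

Two clients write the same block but it is stored once, and it is released only when the second client deletes it.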

Other storage virtualization features include replication, snapshots, caching and quality of service. All of these features are used to increase storage utilization, reduce risk, raise data availability and increase storage performance. In turn, storage virtualization decreases storage energy and space consumption, lowers the budget spent on storage systems and drives down storage system prices through commoditization.

What Is Storage Virtualization?
Storage virtualization is an evolving term. Volume management, file systems, RAID, distributed file systems and zones of a SAN are all examples of virtual-storage constructs. More recently, the term has come to mean a block-based virtualization layer that resides somewhere in the I/O path between the client and the storage device. By residing in the I/O path, the storage virtualization layer can make intelligent decisions regarding what/where/how data should be written to disk.

There are three types of storage virtualization implementations, each with strengths and weaknesses.

3 Types of Storage Virtualization

Switch-Based Storage Virtualization
This is an application that resides in a server connected to an intelligent Fibre Channel SAN switch. The application implements storage virtualization functionality, such as replicating data on the various SAN ports within the switch.

The advantage to switch-based storage virtualization is speed. Although the switch must offload storage decisions to the virtualization server, this approach is faster than other types of virtualization, and it is not tied to a single storage platform. However, switch-based virtualization is very expensive and lacks some storage virtualization functionality, such as thin provisioning. Its price has kept it from catching on.

Appliance-Based Storage Virtualization
In this case, an appliance -- residing between the client and the storage array -- implements the storage virtualization layer and becomes the new storage target (rather than the storage device). When the client sends data to the appliance, the storage virtualization software running on the appliance can enable storage virtualization features such as data replication, thin provisioning or snapshots.

Advantages are versatility and cost. Storage virtualization appliances are often software apps running on a low-cost server platform such as an x86 server. This type of platform can aggregate any storage array behind the appliance, allowing the admin to purchase lower-end storage arrays. However, storage appliances sometimes incur additional management overhead because the storage virtualization software isn't integrated with the hardware or OS platform management software. Thus, in some cases, admins may need to employ three or more consoles to manage the appliance.

Array-Based Storage Virtualization
In array-based virtualization solutions, the storage virtualization layer lives entirely within a storage array. In fact, many storage arrays already include many virtualization features, such as thin provisioning and replication.

The advantage to this approach is better manageability (the storage array is sold as a single turnkey solution). Array-based virtualization is probably the best place for storage virtualization functionality because arrays have the right combination of price, performance and management. However, array-based storage virtualization has some versatility/interoperability drawbacks. Because the storage virtualization layer is implemented inside the array, some virtualization features are difficult to implement. Thus, array-based storage virtualization functionality usually resides within a single storage array or within similar arrays from the same vendor. For example, LUN aggregation across heterogeneous arrays -- a key cost-saving feature -- is not an array-based storage virtualization feature. By limiting storage choice, array-based virtualization loses some of the storage virtualization value proposition.

Market Players
So, who are the market players in storage virtualization? Well, almost every storage vendor has some type of storage virtualization platform in its product portfolio. For example, many of the storage arrays from EMC Corp., Hewlett-Packard Co., IBM Corp. and LSI Corp. enable storage virtualization features such as advanced RAID functionality, replication and snapshots. However, storage virtualization within these platforms varies to a large degree. Storage virtualization -- like server virtualization has done to Microsoft -- has caught many of these vendors off guard. They're late coming to the market with virtualization platforms outside of the array.

However, some of the larger vendors have storage virtualization platforms beyond the array. Among the larger storage vendors, IBM and Hitachi Data Systems Corp. are leading the way. IBM's SAN Volume Controller, an appliance-based solution, and the HDS Universal Volume Manager (part of the HDS Universal Storage Platform array) can utilize external heterogeneous array platforms as the back-end storage device. In addition, EMC has a storage virtualization platform (Invista). Invista is a switch-based solution that has seen lackluster adoption.

But perhaps the real innovation in the storage virtualization market is happening outside of the large vendors. The storage virtualization market is blanketed by specialist vendors such as 3PAR Inc. and Compellent (array-based solutions), as well as FalconStor Software, LeftHand Networks Inc. and DataCore Software Corp. (appliance-based solutions). Although small, these vendors have made a name for themselves in storage virtualization, much the way VMware Inc., Citrix Systems Inc., Virtual Iron Software Inc. and Parallels have in server virtualization. A couple of the larger vendors have taken notice. LSI has purchased StorAge and Dell Inc. has purchased EqualLogic, both storage virtualization platforms.

Note that some array vendors may not support a storage virtualization layer in the I/O path between the client and the array. Before buying a storage virtualization product, be sure your storage array vendor of choice will support the virtualization platform "in front" of the array. Storage virtualization is becoming increasingly popular. Larger array vendors feel the customer pressure to support storage virtualization and are increasingly doing so.

As the demand for storage virtualization grows, consolidation through market acquisition will begin to solve these support issues. Companies like 3PAR, LeftHand and DataCore will likely be scooped up by the larger companies. Thus, now is the time to evaluate storage virtualization. The benefits are immediate and the issues are diminishing.