What's So Special About VM-Aware Storage? -- Virtualization Review

VM-aware storage may very well be the next big thing in the evolution of enterprise-grade storage. Here's why.

By Saqib Jang
10/01/2013

One of the hot topics at VMworld 2013 San Francisco was the concept of storage becoming better aligned with virtualization. Simply put, virtualization vendors have recognized the need for what's being called VM-aware storage.

One popular topic was Virtual Volumes. VMware Virtual Volumes (vVOLs) is a proposed storage interface that delivers VM-granular provisioning, snapshots and cloning. While VM-aware storage encompasses more than just the vVOLs interface, VMware Inc.'s significant investment in and evangelism of vVOLs highlight the increasing importance of storage systems that are better aligned with virtualization.

VM-aware storage is the next evolutionary stage in enterprise storage -- storage specifically designed for virtualized environments. The consensus among IT organizations and experts is that virtualization has created a range of new requirements for storage. Legacy architectures fall short of these requirements because they're designed for the traditional I/O model generated by bare-metal workloads.

IT organizations want to virtualize more workloads, and to virtualize them more effectively. As was evident at VMworld, storage is what's holding most IT organizations back from deploying large-scale virtualization, including virtualizing I/O-intensive, tier-1 applications such as business-critical Online Transaction Processing (OLTP) workloads. So what are the requirements for an effective VM-aware storage approach?

Performance Matters
Optimum virtualization performance requires storage solutions that can address the unique challenges virtualization presents. The latest generation of servers can easily host dozens of virtual machines, each generating its own I/O stream. Even when each VM reads and writes sequentially, the hypervisor interleaves those streams before they reach shared storage, so the I/O patterns that virtual environments generate are far more random than those of applications running on bare-metal servers. This I/O blender effect causes severe performance degradation on traditional disk-based storage, where random I/O means constant seeking. As a result, IT organizations defer virtualizing I/O-intensive, tier-1 applications and resort to manually isolating workloads.
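
To illustrate the I/O blender effect, the following minimal Python sketch models several VMs that each read their own region of storage strictly sequentially, then interleaves their requests round-robin, roughly as a shared array would see them. The region layout, the eight-VM count and the strict round-robin interleaving are simplifying assumptions; real hypervisor scheduling is less regular, but the direction of the effect is the same.

```python
def sequential_fraction(lbas):
    """Fraction of requests whose block address immediately follows the previous one."""
    if len(lbas) < 2:
        return 1.0
    hits = sum(1 for prev, cur in zip(lbas, lbas[1:]) if cur == prev + 1)
    return hits / (len(lbas) - 1)


def vm_stream(vm_id, length, region_size=1_000_000):
    """A VM reading its own (hypothetical) disk region strictly sequentially."""
    start = vm_id * region_size
    return [start + i for i in range(length)]


def blend(streams):
    """Round-robin interleaving of per-VM streams, as the shared array sees them."""
    return [lba for batch in zip(*streams) for lba in batch]


streams = [vm_stream(vm, 100) for vm in range(8)]
print(sequential_fraction(streams[0]))      # one VM alone: 1.0
print(sequential_fraction(blend(streams)))  # eight VMs blended: 0.0
```

Each VM's stream is perfectly sequential on its own, yet none of the blended requests follows its predecessor, which is what turns sequential workloads into random I/O at the array.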

Solid-state drives (SSDs) and other flash-based storage can deliver latency and throughput that are orders of magnitude better than disk, but flash alone isn't the solution to the challenges of large-scale virtualization. The main obstacle is cost: flash can cost five to 10 times as much as equivalent disk capacity.

Deduplication and compression help minimize the capacity that virtualization requires, but on their own they can't close the cost gap between flash and disk. As a result, hybrid storage -- a combination of flash and disk -- has come to the fore to address the challenges presented by the I/O blender effect.

You can deploy flash in hybrid storage in a number of ways. Legacy storage systems often incorporate it into an existing disk-based architecture, using flash as a cache or bolt-on tier while disk I/O remains part of the basic data path.

A flash-based read cache is easier to implement, but it is challenging to deliver sustained write performance that scales to hundreds or thousands of VMs on a single system, especially without a complete understanding of the underlying VMs. Similarly, a flash tier in which active data is staged in flash and periodically migrated to disk still falls back to disk-level performance on a miss or when the flash tier is already full.

Another emerging hybrid architecture relies on detailed, dynamic profiling of VM I/O. Here, all active data and metadata are kept in flash, and only cold data is evicted to archive disks. Using in-line deduplication, compression and VM working set analysis, this approach lets you serve more than 99 percent of all I/O from flash, delivering high throughput and consistent sub-millisecond latency for both reads and writes.
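
Why the flash hit ratio matters so much follows from a weighted average: effective latency = hit ratio × flash latency + (1 - hit ratio) × disk latency. The Python sketch below uses assumed round-number latencies of 0.2 ms for flash and 8 ms for a disk access; they are illustrative values, not measurements of any product.

```python
FLASH_MS = 0.2  # assumed flash access latency, in milliseconds
DISK_MS = 8.0   # assumed disk access latency (seek plus rotation), in milliseconds


def effective_latency_ms(hit_ratio, flash_ms=FLASH_MS, disk_ms=DISK_MS):
    """Average latency when hit_ratio of I/O is served from flash and the rest from disk."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * flash_ms + (1.0 - hit_ratio) * disk_ms


for ratio in (0.80, 0.90, 0.99):
    print(f"{ratio:.0%} from flash -> {effective_latency_ms(ratio):.2f} ms average")
```

Under these assumptions the averages are about 1.76 ms, 0.98 ms and 0.28 ms. At 90 percent, the 10 percent of I/O that still goes to disk contributes 0.8 ms of the 0.98 ms average, and an average also hides that every individual miss still takes the full disk latency.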

Performance guarantees are also a concern if an enterprise wants to run tier-1 applications on virtualized infrastructure. Applications that share virtual infrastructure are prone to the noisy neighbor problem: an application's performance can suffer when another application experiences a spike in demand and draws more heavily on shared resources.

VM-aware storage systems must support a mixed workload of hundreds of VMs, each with its own I/O profile. As traffic ebbs and flows, a VM-aware storage system should analyze and track I/O for each VM, delivering consistent performance where it's needed most. VM quality of service (QoS) should also work transparently, without requiring any manual storage tuning.
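
One way to implement per-VM QoS is weighted max-min fairness ("water-filling"), a generic technique and not necessarily what any particular product uses. The Python sketch below assumes the array has a fixed IOPS budget and that each VM's current demand has already been measured; the function name, VM names and numbers are hypothetical. VMs that need less than their share get everything they ask for, and the remainder is split among the rest, so a noisy neighbor cannot starve the others.

```python
def fair_share_iops(capacity, demands, weights=None):
    """Weighted max-min fair ("water-filling") split of an IOPS budget across VMs.

    VMs whose demand fits within their weighted share get their full demand;
    whatever they leave unused is redistributed among the VMs that want more.
    """
    weights = weights or {vm: 1.0 for vm in demands}
    alloc = {vm: 0.0 for vm in demands}
    active = {vm for vm, d in demands.items() if d > 0}
    remaining = float(capacity)
    while active and remaining > 0:
        total_weight = sum(weights[vm] for vm in active)
        # Tentative share of the remaining budget, proportional to weight.
        share = {vm: remaining * weights[vm] / total_weight for vm in active}
        satisfied = {vm for vm in active if demands[vm] <= share[vm]}
        if not satisfied:
            # Every remaining VM wants more than its share: give each its share.
            for vm in active:
                alloc[vm] = share[vm]
            break
        for vm in satisfied:
            alloc[vm] = demands[vm]
            remaining -= demands[vm]
        active -= satisfied
    return alloc


alloc = fair_share_iops(10_000, {"oltp-db": 6_000, "web": 1_500, "batch-noisy": 20_000})
print({vm: round(v) for vm, v in alloc.items()})
# {'oltp-db': 4250, 'web': 1500, 'batch-noisy': 4250}
```

The burst from the hypothetical batch-noisy VM is capped at the same share as the database, and the web VM's light demand is met in full. Passing a weights dict (for example, giving a tier-1 VM a weight of 2) tilts the split toward higher-priority VMs; in a live system the demands would be re-measured and the allocation recomputed continuously.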

VM alignment poses real challenges. When a VM's partition doesn't start on a storage block boundary, a single guest I/O can straddle two blocks on the array, so misaligned VMs magnify I/O requests and consume extra IOPS. The impact snowballs as the environment grows and a single array supports hundreds of VMs.
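
The arithmetic behind misalignment is easy to check. The Python sketch below assumes a 4 KiB array block size (arrays differ) and compares 4 KiB guest I/Os in a partition starting at 1 MiB with one starting at the legacy MBR offset of sector 63. It counts only blocks touched; on a real array, partial-block writes can cost even more because they may require a read-modify-write.

```python
BLOCK = 4096  # assumed array block size, in bytes


def backend_blocks(offset, length, block=BLOCK):
    """Number of array blocks touched by a guest I/O of `length` bytes at byte `offset`."""
    first = offset // block
    last = (offset + length - 1) // block
    return last - first + 1


# Older partitioning tools started the first MBR partition at sector 63
# (63 * 512 = 32,256 bytes), which is not a multiple of 4 KiB.
PART_START_MISALIGNED = 63 * 512
PART_START_ALIGNED = 2048 * 512  # 1 MiB, the default in newer partitioning tools

for name, start in (("aligned", PART_START_ALIGNED), ("misaligned", PART_START_MISALIGNED)):
    # 1,000 consecutive 4 KiB guest I/Os from the start of the partition.
    total = sum(backend_blocks(start + i * BLOCK, BLOCK) for i in range(1000))
    print(f"{name}: {total} array blocks for 1000 guest I/Os")
```

In this model every misaligned 4 KiB I/O touches two array blocks (2,000 versus 1,000), doubling back-end work for the same guest activity.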

VM-aware storage requires VM auto-alignment. Storage should dynamically adapt to the VM layout and automatically align all VMs as they're created, migrated or cloned, so you get the performance gains without VM downtime or manual intervention.

VM Data Management
A VM-aware storage array should perform all critical data-management operations at the VM level. This lets you manage large-scale virtual environments with complete, granular control, down to the individual vDisk. The VMware vVOLs interface addresses part of this with provisioning, protection and cloning.

The onus is still on the underlying storage system to implement this functionality efficiently at scale. A VM-aware storage solution supporting 1,000 VMs needs to handle anywhere from tens of thousands to hundreds of thousands of vVOLs, because each VM requires multiple vVOLs: for the VM itself, for each of its vDisks and for their snapshots.
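
A rough count shows how the numbers add up. The Python sketch below assumes, hypothetically, that every VM has two fixed per-VM objects (for example, configuration and swap) plus one vVOL per vDisk and one per vDisk snapshot; the actual object types and counts depend on the implementation.

```python
def vvol_count(vms, vdisks_per_vm, snapshots_per_vdisk, extra_per_vm=2):
    """Rough vVOL count: fixed per-VM objects plus one per vDisk and per vDisk snapshot.

    extra_per_vm is an assumption (for example, a configuration and a swap object).
    """
    per_vm = extra_per_vm + vdisks_per_vm * (1 + snapshots_per_vdisk)
    return vms * per_vm


print(vvol_count(1_000, 2, 4))   # 12000
print(vvol_count(1_000, 4, 24))  # 102000
```

Under these hypothetical assumptions, 1,000 VMs already need 12,000 vVOLs with modest snapshot retention, and about 100,000 with four vDisks and 24 snapshots each.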

The cloning capabilities of traditional storage systems can vastly complicate VM deployment, cloning and management operations. A VM-aware storage system requires space-efficient cloning at the individual VM level, which eliminates the complex provisioning and management that legacy storage architectures necessitate.

VM-aware storage should build individual VM cloning on snapshots: you create a clone either by taking a new snapshot or by cloning an existing one. This way, you can create hundreds of clones in an instant, all of which are space-efficient and run at full performance. You can then quickly access, power on and put your cloned VMs into service. This makes use cases such as virtual desktop infrastructure (VDI), development and testing, business intelligence (BI), and database testing more efficient.
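
Snapshot-based cloning is space-efficient because clones share the snapshot's blocks and store only what they change. The toy Python model below illustrates that idea with redirect-on-write, a generic technique; the Disk class and its methods are hypothetical, and real arrays implement this in their metadata rather than in per-block dictionaries.

```python
class Disk:
    """Toy VM disk: a block map layered on an optional read-only parent snapshot."""

    def __init__(self, blocks=None, parent=None):
        self.own = dict(blocks or {})  # blocks written to this disk itself
        self.parent = parent           # shared snapshot this disk builds on

    def read(self, n):
        disk = self
        while disk is not None:  # walk down the chain until a layer holds block n
            if n in disk.own:
                return disk.own[n]
            disk = disk.parent
        return b"\0"  # never-written blocks read as zeros

    def write(self, n, data):
        self.own[n] = data  # redirect-on-write: shared parent blocks are never modified

    def snapshot(self):
        """Freeze current contents into a new parent layer and return it."""
        frozen = Disk(parent=self.parent)
        frozen.own = self.own  # hand over the existing block map; nothing is copied
        self.own, self.parent = {}, frozen
        return frozen


golden = Disk({i: b"os-image" for i in range(1000)})
base = golden.snapshot()
clones = [Disk(parent=base) for _ in range(100)]  # 100 clones, no blocks copied
clones[0].write(5, b"user-data")
print(clones[0].read(5), clones[1].read(5))  # b'user-data' b'os-image'
print(sum(len(c.own) for c in clones))       # 1
```

A hundred clones consume no additional block storage until they diverge; here, a single write to one clone adds exactly one private block, and the golden image and other clones are unaffected.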

VM-aware storage also requires efficient VM replication from a primary to a secondary array. VM-level replication lets you apply protection policies to individual VMs, rather than arbitrary units of storage such as volumes or LUNs. It also lets you easily establish a snapshot and replication policy for an individual VM or set of VMs.

VM-level replication requires replicating deduplicated and compressed VM snapshots from one storage system to another, so that only changed blocks or data missing from the target cross the network. As a result, VM replication should be highly WAN-efficient. It may also enable remote cloning, which would make distributing golden images for workloads such as VDI with multisite high availability (HA) efficient and simple.
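
The WAN efficiency comes from sending only data the target doesn't already have. This Python sketch replicates two successive snapshots, assuming content fingerprints (SHA-256 here) identify deduplicated blocks and that the source can see which fingerprints the target already holds; the replicate function and its data structures are hypothetical. Compression is not modeled, and in practice the fingerprints themselves also cross the network, though they are far smaller than the blocks.

```python
import hashlib


def fingerprint(block):
    """Content fingerprint used to recognize duplicate blocks."""
    return hashlib.sha256(block).hexdigest()


def replicate(snapshot_blocks, target_store):
    """Replicate a snapshot, sending only blocks the target doesn't already hold.

    snapshot_blocks: block contents of the VM snapshot, in order.
    target_store: dict of fingerprint -> block on the secondary array (hypothetical).
    Returns the block map that rebuilds the snapshot on the target, and bytes sent.
    """
    block_map, bytes_sent = [], 0
    for block in snapshot_blocks:
        fp = fingerprint(block)
        if fp not in target_store:
            target_store[fp] = block  # stands in for transferring the block over the WAN
            bytes_sent += len(block)
        block_map.append(fp)  # known blocks are referenced by fingerprint only
    return block_map, bytes_sent


target = {}
snap1 = [b"A" * 4096, b"B" * 4096, b"A" * 4096]
snap2 = [b"A" * 4096, b"C" * 4096, b"A" * 4096]  # one block changed since snap1
_, sent1 = replicate(snap1, target)
_, sent2 = replicate(snap2, target)
print(sent1, sent2)  # 8192 4096
```

The first snapshot sends its two unique blocks once, even though one appears twice; the second sends only the single changed block.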

Because VM-aware storage is designed exclusively for virtualized environments, it should include built-in storage intelligence and VM control. That frees you from low-level, VM-related storage configuration tasks, so routine administration can be automatic.

VM Performance Tracking
Traditional storage systems provide a performance view from the LUN, volume or file-system standpoint. They can't isolate VM performance or provide insight into VM-level performance characteristics. Without relevant VM performance metrics, it's difficult to understand situations such as the impact of adding a new VM workload.

Identifying the cause of performance bottlenecks is time-consuming, frustrating and sometimes inconclusive. It requires iteratively gathering data, analyzing data to form a hypothesis and testing the hypothesis. In large enterprises, this process often involves coordination between several people and departments, typically over many days or even weeks.

VM-aware storage should give you a comprehensive view of your VMs, including end-to-end tracking and visualization of VM performance across the entire datacenter infrastructure. This gives you access to the critical statistics you need and simplifies troubleshooting performance problems.
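
The difference between a LUN-level and a VM-level view is easy to see with a small I/O trace. The Python sketch below assumes the storage system can tag each I/O with the VM that issued it, which is the capability traditional systems lack; the trace, LUN and VM names are hypothetical values chosen for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical I/O trace records: (datastore LUN, VM name, latency in ms).
trace = [
    ("lun-7", "web-01", 0.5), ("lun-7", "web-02", 0.6),
    ("lun-7", "sql-01", 12.0), ("lun-7", "sql-01", 14.0),
    ("lun-7", "web-01", 0.4), ("lun-7", "web-02", 0.5),
]


def mean_latency_by(records, key_index):
    """Average latency grouped by one field of the record (0 = LUN, 1 = VM)."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key_index]].append(rec[2])
    return {key: round(mean(vals), 2) for key, vals in groups.items()}


print(mean_latency_by(trace, 0))  # per-LUN view: {'lun-7': 4.67}
print(mean_latency_by(trace, 1))  # per-VM view: {'web-01': 0.45, 'web-02': 0.55, 'sql-01': 13.0}
```

The LUN-level average of about 4.7 ms suggests a uniformly mediocre datastore, while the per-VM view shows the web VMs well under a millisecond and sql-01 at 13 ms.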

The Next Stage
VM-aware storage should be optimized for the storage requirements of large-scale virtualization. It should deliver VM performance and density without added complexity, provide VM-granular data management and automation, and offer end-to-end insight into VM performance across the virtualized infrastructure for troubleshooting.

About the Author

Saqib Jang is principal and founder at Margalla Communications, a technology consulting firm specializing in server and storage issues.