Getting Excited and Feeling Dismissive About Storage Virtualization
First off, a huge thank you to editor Bruce Hoard and the Virtualization Review team for inviting me to participate in this CTO blog forum. I look forward to exercising our collective thoughts on storage virtualization. Or should I say virtualization storage?
For most of my professional life I have been intimately involved in developing data storage management software. Having seen and touched a lot of technology over the years, I often experience déjà vu, something that other industry veterans can undoubtedly relate to. Whenever a new wave of innovation rises, be that client-server computing, storage networking, storage virtualization or an iPhone, it is very easy to get excited about it (and commit to an expensive multi-year contract in return for dubious quality). And, at the same time, feel dismissive.
Storage virtualization: This is just RAID or volume management repackaged, right? Storage networking: Haven't they heard of NFS before? iPhone: Well, it is cute and I can use it to play Plants vs. Zombies. However, the main thing I have come to realize is that it takes time to understand the true impact of new technology.
I must admit, I had some of those dismissive feelings when server virtualization started emerging years ago. And I definitely did not fully appreciate the challenges that virtualization presented in the storage layer. It felt like a familiar if not trivial clustered storage problem, a problem that someone had surely solved. Well, that was almost true; a lot of vendors claimed to have solved it. Yet every large (and not so large) user of virtualization has plenty of gripes about how traditional storage worked (or didn't) in the virtual world.
It took a fair amount of time and effort to come to an understanding that the storage challenges presented by virtualization are rather unique, and that existing solutions were not addressing them in a comprehensive fashion.
The unique storage challenges arise from the fact that virtualization changes the nature of the relationship between operating system software and computer hardware. In the old world, we always had a one-to-one relationship between a computer and an instance of the operating system running on it. The speed of creating (or as we say nowadays, provisioning) new computer systems was naturally limited by how fast we could procure hardware, connect it to a network and install an operating system on it. We did not really care about the storage an operating system image consumed as the boot disks were an integral part of the computer and were thought to be free. (They weren't, but we never noticed it.)
Virtualization changed that. Servers and workstations became software objects. And as software objects they acquired natural software lifestyles: We allocate them, copy them, move them around, back them up and delete them. They don't need to be running all the time, and can sit dormant for months, making it easy to lose track of them while they eat a lot of disk space, which suddenly is not free. Server and workstation management in the virtualized world is in fact a storage management problem.
Physical server consolidation, one of the early benefits realized by server virtualization, brought about another consequence: the cost of a physical system failing went up dramatically, because now a failed piece of hardware brings down not one but several virtual machines. A few years ago a high availability cluster was something you reserved for a few select mission-critical servers. Nowadays almost all of your servers are mission-critical. Clusters are not only becoming ubiquitous, they are also growing rapidly in scale. One of the most amazing features of virtualization is the ability to migrate live, running virtual machines between physical servers for dynamic resource balancing, which requires clusters of servers that interact seamlessly with clustered storage.
In general, the storage problems in the virtualized world fall into three broad categories: performance and virtual machine density, storage sprawl, and virtual machine provisioning. They are all closely related and can be looked upon as different manifestations of the same underlying fact: virtual servers are different from physical ones. Virtual servers go through different life cycles, produce different storage access patterns, consume storage at different rates, and the list goes on. Old storage solutions only take you so far in the virtual world.
I welcome your comments as we frame these discussions and compare notes on the physical world moving into the virtual one.
Posted by Alex Miroshnichenko on 07/27/2010 at 12:48 PM