Clustered Filesystems: Why All the Hype?
Any energized discussion around virtualization, if conducted long enough, will inevitably lead to defining storage management, which will then quickly turn to the subject of a clustered filesystem.
Clustered filesystems ensure a uniform presentation of storage resources and provide coherent filesystem semantics across all nodes in a computing cluster. This means applications running on different nodes can access files through the same APIs, and with the same semantics, as if they were all on the same physical system. This may seem obvious when we think about file operations such as 'create' and 'delete,' but the actual details are far more intricate.
For example, take two applications that read and update the same region of a file, where each read and each write is completed as a single call. Proper filesystem semantics guarantee that each application sees either the old version of the region or the new version of the region, but never a mix of old and new. A clustered filesystem provides the same guarantee even for applications running on different physical nodes, without requiring any additional application-level locking.
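To make the "old or new, never a mix" guarantee concrete, here is a minimal sketch in Python. It is only a toy model: the `Region` class, the `is_consistent` helper and the `run_demo` function are hypothetical names invented for illustration, the "file region" is an in-memory buffer, and a plain lock stands in for whatever mechanism a real (clustered) filesystem would use to provide the guarantee.

```python
import threading


class Region:
    """Toy stand-in for one region of a file (hypothetical, in-memory only).

    The lock plays the role of the filesystem: each read and each write
    is a single call that completes as a whole.
    """

    def __init__(self, size):
        self._data = bytearray(b"A" * size)
        self._lock = threading.Lock()

    def write(self, payload):
        # Replace the whole region in one step.
        with self._lock:
            self._data[:] = payload

    def read(self):
        # Return a snapshot of the whole region in one step.
        with self._lock:
            return bytes(self._data)


def is_consistent(snapshot):
    """A snapshot is consistent if it is entirely old or entirely new data."""
    return len(set(snapshot)) == 1


def run_demo(size=4096, rounds=200):
    """Run one writer and one reader; return the number of mixed reads seen."""
    region = Region(size)
    torn = []

    def writer():
        for i in range(rounds):
            fill = b"B" if i % 2 == 0 else b"A"
            region.write(fill * size)

    def reader():
        for _ in range(rounds):
            snapshot = region.read()
            if not is_consistent(snapshot):
                torn.append(snapshot)

    threads = [threading.Thread(target=writer), threading.Thread(target=reader)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(torn)
```

With the lock in place the reader should never observe a region that is part 'A' and part 'B'. The hard part for a clustered filesystem is providing the equivalent of that lock across physical nodes without crippling performance.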
There are other nontrivial aspects to implementing correct filesystem semantics. It is not my intent to give an exhaustive overview of these, but I do wish to point out that it is extremely hard to implement a clustered filesystem correctly and in a way that does not impose severe performance limitations. A scalable, coherent clustered filesystem with single-system-image semantics continues to be the 'Holy Grail' of storage management. Since the introduction of server clusters, which came into broad IT practice during the mid-1990s, a number of clustered filesystem products have been developed. These products have experienced limited acceptance in the marketplace, primarily because relatively few applications actually require such specific functionality.
Now, with the proliferation of server virtualization, clustering and clustered filesystems are again a hot topic. The majority of all virtualization deployments are in fact clusters. Some unique benefits of virtualization, like live migration of VMs between physical hosts, can be achieved only in cluster environments.
It turns out that storage management in a cluster environment is less than trivial. Early vendors like VMware figured this out and developed basic tools for this purpose. As VMware chose to name their tool VMFS (VMware File System), the notion that storage in the virtualized world requires a clustered filesystem quickly became commonplace. Many legacy clustered filesystem vendors saw this opportunity and started positioning their technologies as the solution to the storage problems of the virtual world.
If only it were that simple. A clustered filesystem will provide a basic shared name space, and it is rather trivial to configure individual files on a clustered filesystem as virtual disk images for individual VMs. However, the core access patterns are drastically different from those of a traditional single-system-image environment as described above. A virtual disk in a virtual server cluster is never accessed from different physical nodes at the same time. In fact, it is never accessed by more than one VM at any given point in time. Of course, access to a virtual disk may shift to different nodes in a cluster as virtual machines migrate, but access will never be simultaneous. A clustered filesystem will provide correct semantics for accessing virtual disks, and will in fact provide much tighter semantics than needed, but these are not useful for the virtual server world.
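The access pattern described here is much simpler than full shared-file coherence: each virtual disk has at most one accessor at a time, and that accessor changes only when a VM migrates. A rough sketch of that rule follows, assuming a hypothetical `VirtualDiskRegistry` class invented for this illustration. It is not how VMFS or any particular product works.

```python
class VirtualDiskRegistry:
    """Toy model (hypothetical): each virtual disk has at most one owning node."""

    def __init__(self):
        self._owner = {}  # disk name -> node currently accessing it

    def owner(self, disk):
        """Return the node currently accessing the disk, or None."""
        return self._owner.get(disk)

    def attach(self, disk, node):
        """Give a node access to a disk; refuse if another node already has it."""
        current = self._owner.get(disk)
        if current is not None and current != node:
            raise RuntimeError(f"{disk} is already in use on {current}")
        self._owner[disk] = node

    def detach(self, disk, node):
        """Release a disk; only the current owner may do so."""
        if self._owner.get(disk) != node:
            raise RuntimeError(f"{disk} is not in use on {node}")
        del self._owner[disk]

    def migrate(self, disk, source, target):
        """Move access from one node to another; never held by both at once."""
        self.detach(disk, source)
        self.attach(disk, target)
```

For example, after `attach("vm1.img", "node-a")`, a second `attach("vm1.img", "node-b")` is rejected, while `migrate("vm1.img", "node-a", "node-b")` hands the disk over. Nothing in this model needs the byte-level coherence between concurrent writers that a clustered filesystem works so hard to provide.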
There are other requirements for virtual disk access semantics – ones that are not supported by the traditional clustered filesystem architectures. Virtual disks are interesting beasts: there are a lot of them, they all tend to be related and look very similar, they breed like rabbits in Australia and they eat space just like rabbits do. So you really need specially designed storage management technologies to cope with them. A legacy clustered filesystem doesn't come close.
But what about VMware VMFS? It is called a "filesystem," isn't it? Yes, it is called that, but if you look carefully at the functionality it provides, it is first and foremost about managing virtual disk containers. It is not a clustered filesystem as legacy vendors would have defined it.
It is far less important how a product or a technology is named than what it does. As this blog column develops, let's explore the specific requirements and features of storage management in the virtualized world in greater detail.
Posted by Alex Miroshnichenko on 08/02/2010 at 12:48 PM