The Hoard Facts

Serengeti Soups Up Apache Hadoop

Via a new open source project named Serengeti, VMware is bringing Apache Hadoop up to speed by enabling it to run in mainstream virtualization and cloud environments and by simplifying its deployment, configuration and management. Beyond that, VMware is contributing extensions to Hadoop, the open source software framework for data-intensive, distributed applications, that will make key components "virtualization-aware to support elastic scaling and further improve Hadoop performance in virtual environments."

VMware is also positioning Apache Hadoop, which has been languishing below its potential, as the de facto standard for big data processing. The company recognizes that this alleged distinction loses some of its luster given that deployment and operational complexity, the need for dedicated hardware, and concerns about security and service-level assurance prevent many enterprises from leveraging the power of Hadoop.

Enter Serengeti, which is available for free download under the Apache 2.0 license. According to VMware, "By decoupling Apache Hadoop nodes from the underlying physical infrastructure, VMware can bring the benefits of cloud infrastructure -- rapid deployment, high-availability, optimal resource utilization, elasticity and secure multi-tenancy -- to Hadoop."

VMware is understandably quick to link Hadoop with vSphere, noting that Serengeti is a "one-click" deployment toolkit that enables enterprises to leverage vSphere to deploy a highly available Apache Hadoop cluster in 10 minutes, including common Hadoop components such as Apache Pig and Apache Hive.

VMware is working with prominent Apache Hadoop distribution vendors, including Cloudera, Greenplum, Hortonworks, IBM and MapR, to support a wide range of distributions.

In an effort to simplify and expedite enterprise deployments of Apache Hadoop, VMware is also working with the Apache Hadoop community to contribute changes to the Hadoop Distributed File System (HDFS) and Hadoop MapReduce projects to make them virtualization-aware. The company also announced updates to Spring for Apache Hadoop, an open source project first launched in February 2012 to make it easy for enterprise developers to build distributed processing solutions with Apache Hadoop.

All told, these projects and contributions have an ulterior motive, which is to "accelerate Hadoop adoption and enable enterprises to leverage big data analytics such as Cetas -- which was acquired by VMware this April -- to obtain real-time, intelligent insight into large quantities of data."

Posted by Bruce Hoard on 06/19/2012 at 12:48 PM

