Doug on Cloud


Hadoop and the Cloud

Hadoop is one crazy name for one crazy technology. Apparently the technology was named after a toy elephant, though it still sounds like it tastes pretty good!

Jack Norris with MapR Technologies, which offers a Hadoop distribution, recently told Enterprise Systems Journal why Hadoop makes so much sense in the cloud.

Before we tackle the whys, what is Hadoop anyway? According to Norris, "it is a platform that allows enterprises to store and analyze large and growing unstructured data more effectively and economically than ever before. With Hadoop, organizations can process and analyze a diverse set of unstructured data including clickstreams, log files, sensor data, genomic information, and images."

That alone makes it perfect for the cloud. "Hadoop represents a paradigm shift. Instead of moving the data across the network, it's much more effective to combine compute on the data and send the results over the network," Norris explains.

This makes a lot of sense. Processing in the cloud is usually speedy because it runs on state-of-the-art servers (the rub being when those servers are over-virtualized or otherwise overcommitted).

What slows the cloud is the network in and out, so maximizing processing in the cloud and minimizing transmission makes perfect sense.
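To make the idea concrete, here is a minimal sketch in plain Python. It is not Hadoop's API, just an illustration of the principle Norris describes. The node names, page paths, and counts are made up for the example, and the "network" is only simulated: each node summarizes its own slice of clickstream data, and only those small summaries travel to be merged.

```python
from collections import Counter

# Hypothetical data: each "node" holds its own partition of clickstream records.
partitions = {
    "node-a": ["/home"] * 500 + ["/cart"] * 120 + ["/checkout"] * 30,
    "node-b": ["/home"] * 300 + ["/search"] * 200,
    "node-c": ["/search"] * 150 + ["/cart"] * 80,
}


def local_count(records):
    """Runs where the data lives: reduce one partition to a small summary."""
    return Counter(records)


def merge_counts(summaries):
    """Runs centrally: combine the per-node summaries into the final result."""
    total = Counter()
    for summary in summaries:
        total.update(summary)
    return total


# Only the per-node summaries cross the (simulated) network.
summaries = [local_count(records) for records in partitions.values()]
page_views = merge_counts(summaries)

# Compare what stays put with what actually has to be transmitted.
records_stored = sum(len(records) for records in partitions.values())
records_sent = sum(len(summary) for summary in summaries)

print(page_views.most_common())
print(f"records stored on nodes: {records_stored}")
print(f"summary entries sent:    {records_sent}")
```

In this toy case the nodes hold 1,380 records between them, but only seven summary entries need to move. Real jobs are far messier, but the gap between data stored and data sent is the point.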

The cloud is also a good fit for Hadoop because the unstructured data Hadoop deals with is growing almost exponentially. "The issue is that these data sources are typically unstructured like social media or sensor data and are growing in volumes that outstrip the ability to process them with the existing tools and processes," Norris says. "Hadoop removes all these obstacles by providing a radically different framework that allows for easy scale-out of systems and for processing power to be distributed. Data from a wide variety of sources can be easily loaded and analyzed with Hadoop. There's no need to go through a lengthy process to transform data and a broad set of analytic techniques can be used."

But this is all too much for the typical data center. The cloud is ideal for handling all this growth.

Fortunately, some cloud providers offer dedicated Hadoop services.

Posted by Doug Barney on 10/16/2012 at 12:47 PM

