Dan's Take

BlueData Supports Hadoop and Spark on Docker

It's marketed as a simple way to get started with Big Data projects.

Big Data initiatives can be found in organizations of all sizes. When smaller enterprises begin their projects, they often discover that Apache Hadoop and Spark are each composed of other projects, and that they must figure out how these pieces work together and how to download, install and configure them to work properly.

Sometimes, these smaller enterprises give up before they get a working computing environment together. BlueData believes that it has a solution, BlueData EPIC Lite, that makes it much easier for these enterprises to take the first steps in Big Data undertakings by reducing or eliminating the complexity imposed by these open source projects.

BlueData EPIC Lite
BlueData took its existing BlueData EPIC Enterprise and scaled it down to create a Big Data "sandbox" that is easy to install, configure and use. The Lite edition, designed to be installed as either a VirtualBox or Amazon EC2 instance, creates a smaller Big Data computing environment that can be deployed by data scientists who don't have a great deal of experience with Hadoop installations.

BlueData EPIC Lite hosts Hadoop components in Docker containers to reduce the size and complexity of Big Data analysis environments. It includes a number of selected Big Data tools, including:

  • Cloudera CDH 5.2, a package consisting of Apache Hadoop and additional key open source projects
  • Hortonworks HDP 2.2, a package built on Apache Hadoop and based on YARN as the architectural center
  • Apache Spark 1.3.1, a fast cluster engine for Big Data computing

This lightweight package was designed to access data locally using the local filesystem or remotely using Apache's HDFS, NFS provided by a number of Unix or Linux systems, or Red Hat's Gluster file system.

BlueData EPIC Lite is available at no charge, and support is available through an online forum.

Dan's Take: There's Nothing Like Looking
I'm reminded of some advice in J.R.R. Tolkien's The Hobbit: "There is nothing like looking, if you want to find something. You certainly usually find something, if you look, but it is not always quite the something you were after." This thought distills some of the larger challenges faced by newcomers to the idea of Big Data analysis.

This type of analysis, unlike traditional Business Intelligence projects, is based upon searching through a huge amount of operational and machine data to tease out the best questions to ask.

I've spoken with a number of folks having titles such as "data scientist" or "DevOps" who work for midsize to small enterprises. One group understands the analysis of business data and how to "ask the right questions." The other group understands systems and computing architectures, but might not understand the best way to go about finding the right questions to ask.

In both cases, the individuals have been given the charter to use Big Data techniques and data analysis to help their organization find out more about customers, customer requirements and so on to improve the organization's success in the market.

BlueData is one of a number of suppliers offering tools to make Hadoop and other open source projects easy to install and use. Its goal is to turn these technologies from a computer science project requiring a great deal of expertise into a tool an analyst can use.

If you've been stuck with a Big Data project and don't know where to begin, you might try downloading BlueData EPIC Lite as an initial step.

About the Author

Daniel Kusnetzky, a reformed software engineer and product manager, founded Kusnetzky Group LLC in 2006. He's literally written the book on virtualization and often comments on cloud computing, mobility and systems software. He has been a business unit manager at a hardware company and head of corporate marketing and strategy at a software company.

