Big Data Infrastructure Decisions: Data in Motion vs. Data at Rest
The best solution depends on what your data is doing.
- By Larry Parscale
Big Data is playing an increasingly significant role in business success as organizations strive to generate, process and analyze massive amounts of information to make better business decisions. But how do organizations ensure that they derive value from all this data? To extract meaningful insights, data must be approached in entirely different ways at different points in its lifecycle -- from creation to ingest, to comparative analysis based on multiple sources of data, to decision making and, finally, the action taken from that decision. The "stage" of Big Data is critical for determining the right infrastructure to host the applications that collect, manage and analyze the data.
In general, data can be broken down into two basic categories -- data at rest and data in motion -- each with different infrastructure requirements based on availability, processing power and performance. The optimal type of infrastructure depends on the category and the business objectives for the data.
Data at rest refers to information collected from various sources and analyzed after the data-creating events have occurred. The data analysis occurs separately and distinctly from any action taken on the conclusions of that analysis.
For example, a retailer analyzes a previous month's sales data and then uses it to make strategic decisions about the present month's business activities. The action takes place well after the data-creating event. The data scrutinized may be spread across multiple collection points, covering inventory, sales price, sales made, regions and other pertinent information. This data may drive the retailer to create marketing campaigns, send customized coupons, increase, decrease or move inventory, or adjust pricing. The data provides value by enticing customers to return, and it makes a long-term positive impact on the retailer's ability to meet the needs of customers by region.
For data at rest, a batch processing method is typically utilized. In this case, there's no pressing need for "always on" infrastructure, but there is a need for flexibility to support extremely large, and often unstructured, datasets. From a cost standpoint, public cloud can be an ideal infrastructure choice in this scenario because virtual machines can easily be spun up as needed to analyze the data and spun down when finished.
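The batch pattern described above can be illustrated with a minimal sketch. The records and field names below are hypothetical, standing in for a month of exported retail sales data; the point is that the aggregation runs after the fact, over the whole dataset at once:

```python
from collections import defaultdict

# Hypothetical month of sales records, as a retailer might export them
# from multiple collection points (region, item, units sold, unit price).
sales = [
    {"region": "East", "item": "widget", "units": 120, "price": 9.99},
    {"region": "East", "item": "gadget", "units": 40,  "price": 24.50},
    {"region": "West", "item": "widget", "units": 75,  "price": 9.99},
    {"region": "West", "item": "gadget", "units": 90,  "price": 24.50},
]

def revenue_by_region(records):
    """Batch-aggregate total revenue per region after the events occurred."""
    totals = defaultdict(float)
    for r in records:
        totals[r["region"]] += r["units"] * r["price"]
    return dict(totals)

print(revenue_by_region(sales))
```

Because nothing here is time-sensitive, the compute that runs this kind of job can be provisioned on demand and released when the batch completes, which is what makes public cloud economical for it.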
The collection process for data in motion is similar to that of data at rest; the difference lies in the analytics, which occur in real time as the event takes place. An example would be a theme park that uses wristbands to collect data about its guests. The wristbands constantly record data about guest activities, and the park can use this information to personalize guest visits with special surprises or suggested activities based on guest behavior. The business is able to customize the guest experience, in real time, during the visit. Organizations have a tremendous opportunity to improve business results in these scenarios.
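The contrast with batch processing can be sketched in a few lines. The event fields below are invented for illustration; in production, events would arrive continuously from the park's sensors rather than from a list. What matters is that each event is acted on the moment it occurs:

```python
# Hypothetical stream of wristband events (guest ID, ride, current wait).
events = [
    {"guest": "A1", "ride": "coaster",   "wait_min": 5},
    {"guest": "B2", "ride": "carousel",  "wait_min": 45},
    {"guest": "A1", "ride": "log-flume", "wait_min": 50},
]

def suggest(event):
    """React to a single event as it happens (data in motion)."""
    if event["wait_min"] > 30:
        return (f"Guest {event['guest']}: long wait at {event['ride']}"
                " -- suggest a nearby attraction")
    return None  # no action needed for short waits

# Unlike a batch job, each event is handled as it arrives, one at a time.
for e in events:
    tip = suggest(e)
    if tip:
        print(tip)
```

Because the value of each suggestion decays within minutes, the infrastructure behind this loop has to be always on and low-latency, which leads to the bare-metal discussion that follows.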
For data in motion, a bare-metal cloud environment may be a preferable infrastructure choice. Bare-metal cloud involves the use of dedicated servers that offer cloud-like features without the use of virtualization. In this scenario, organizations can utilize a real-time processing method in which high-performance compute power is always online and available, and is also capable of scaling out at a moment's notice.
Until recently, many organizations may have assumed public cloud to be the natural choice for this type of workload. However, as more companies host Big Data applications in the public cloud, they're confronting its performance limitations, particularly at scale.
Bare-metal technologies can enable the same self-service, on-demand scalability and pay-as-you-go pricing as a traditional virtualized public cloud. Bare-metal cloud, however, eliminates the resource constraints of multi-tenancy and delivers the performance of dedicated servers, making it a better choice for processing large volumes of high-velocity data in real time.
Latency is also a key consideration for data-in-motion workloads, because a lag in processing can quickly result in a missed business opportunity. As a result, the integrity of network connectivity should go hand-in-hand with infrastructure decisions. A fast data application can only move as quickly as the network architecture that's supporting it, so organizations should look for multi-homed or route-optimized IP connectivity that's able to navigate around Internet latency and outages, ultimately improving the availability and performance of real-time data workloads.
Viewing data through the lens of one of these two general categories -- at rest or in motion -- can help organizations determine the ideal data processing method and optimal infrastructure required to gain actionable insights and extract real value from Big Data.
Larry Parscale is VP of Solution Engineering at Internap, a global Internet infrastructure provider.