6 Things You Don't Know About Deduplication in the Datacenter -- Virtualization Review

It's what you don't know, or know incorrectly, that can stump you in your deduplication efforts.

By Paul Kruschwitz
07/15/2013

Whether you work in an enterprise or a small business, deduplication is likely on your radar. That's because this standard data protection feature -- which was once available only at the enterprise level -- is now driving cost savings and efficiencies in data centers of all sizes. Unfortunately, general awareness of deduplication doesn't mean users are leveraging it to its full potential. Several critical facts about deduplication methodology, processes and performance remain unclear to many IT leaders.

As companies face data growth rates of 50 to 60 percent every year, they need to take full advantage of deduplication's ability to eliminate redundant data, maintain manageable backup windows, reduce storage and bandwidth costs, increase flexibility and data availability, and integrate with tape archival systems. To reap these benefits, IT leaders first need to know what they don't know. For many, that includes the following things:

1. Modern deduplication methods are flexible.
There's no reason to fear being locked into whatever methodology you select at the outset. Whether your data center starts with post-processing, concurrent or inline deduplication, you can switch later to meet the needs of specific data sets and align your backup policies with business goals.
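To make the idea of a switchable policy concrete, here is a minimal sketch in Python. It assumes the usual meaning of the terms: inline deduplication removes duplicates before data is written, while post-processing lands the raw backup first and deduplicates it afterward. The Repository class, the tiny fixed chunk size and the policy names are all hypothetical, chosen for illustration rather than taken from any product. A concurrent policy, not shown, would start the post-processing pass while other backups are still arriving.

```python
import hashlib

CHUNK_SIZE = 4  # tiny, illustrative chunk size in bytes (an assumption)


def chunks(data, size=CHUNK_SIZE):
    """Split a byte string into fixed-size chunks."""
    return [data[i:i + size] for i in range(0, len(data), size)]


class Repository:
    """Hypothetical repository whose deduplication policy is chosen per backup."""

    def __init__(self):
        self.store = {}    # fingerprint -> unique chunk
        self.landing = []  # raw backups waiting for post-processing
        self.recipes = []  # one list of fingerprints per deduplicated backup

    def _dedupe(self, data):
        recipe = []
        for chunk in chunks(data):
            fp = hashlib.sha256(chunk).hexdigest()
            self.store.setdefault(fp, chunk)  # keep only the first copy
            recipe.append(fp)
        self.recipes.append(recipe)

    def ingest(self, data, policy):
        if policy == "inline":
            self._dedupe(data)          # deduplicate before anything is stored
        elif policy == "post-process":
            self.landing.append(data)   # store raw now, deduplicate later
        else:
            raise ValueError(f"unknown policy: {policy}")

    def run_post_process(self):
        """Deduplicate everything that was landed raw."""
        while self.landing:
            self._dedupe(self.landing.pop(0))

    def restore(self, index):
        """Rebuild a backup from its recipe of fingerprints."""
        return b"".join(self.store[fp] for fp in self.recipes[index])


repo = Repository()
repo.ingest(b"AAAABBBBAAAA", policy="inline")
repo.ingest(b"BBBBCCCC", policy="post-process")  # a different policy, same repository
repo.run_post_process()
```

Both backups end up in the same chunk store, so the choice of policy affects when the deduplication work happens, not what can be restored afterward.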

2. Global deduplication puts an end to server silos.
A common misconception in the market is that deduplication always runs in silos: each server has its own deduplication process and fails to communicate with the others, creating multiple copies of data across systems, wasting backup space and defeating the overall purpose of deduplication. This need not be the case. Global deduplication ensures that each node within the backup system is deduplicated against all the data in the repository, regardless of multiple application sources, heterogeneous environments and disparate storage protocols.
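The difference between siloed and global deduplication can be illustrated with a small sketch. It assumes chunks are identified by a SHA-256 fingerprint and compares one index per node against a single index shared by all nodes. The node names and chunk contents are invented for the example.

```python
import hashlib


def fingerprint(chunk):
    """Identify a chunk by its SHA-256 digest."""
    return hashlib.sha256(chunk).hexdigest()


def stored_chunks_siloed(backups_by_node):
    """Each node keeps its own index, so duplicates across nodes are kept."""
    total = 0
    for node_chunks in backups_by_node.values():
        total += len({fingerprint(c) for c in node_chunks})
    return total


def stored_chunks_global(backups_by_node):
    """All nodes share one index, so each unique chunk is stored once."""
    index = set()
    for node_chunks in backups_by_node.values():
        index.update(fingerprint(c) for c in node_chunks)
    return len(index)


# Hypothetical workload: three nodes that all back up the same OS image.
backups = {
    "node-a": [b"os-image", b"mail-db", b"os-image"],
    "node-b": [b"os-image", b"file-share"],
    "node-c": [b"os-image", b"mail-db"],
}

siloed = stored_chunks_siloed(backups)
shared = stored_chunks_global(backups)
print(f"siloed: {siloed} chunks stored, global: {shared} chunks stored")
```

In this toy workload the siloed approach stores the common OS image once per node, while the shared index stores it once in total.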

3. Deduplication repositories can scale as needed.
The very fact that data centers need deduplication to manage massive data growth makes it essential that these solutions are scalable. You do not need to swap out equipment to change or upgrade your deduplication solution as space on your servers runs out. IT administrators can add capacity to the backup target disk pool and build disk-to-disk-to-tape backup architectures around the deduplication system. Rather than replace deduplication repositories as needs increase, IT can use cluster and storage expansions to scale as needed. This also allows for smaller initial footprints and lower initial costs.

4. Deduplication can scale to high speeds and protect the overall performance of your entire environment.
Intelligent deduplication solutions can scale up to high speeds and can shift deduplication work into post-processing, which takes the pressure off the backup window and speeds up the backup itself. Look for deduplication that supports the latest high-speed SANs to make sure the deduplication system can keep pace with the rest of your environment.

5. Deduplication can be deployed with high availability to always keep data available.
There is a popular myth that if one deduplication node fails, then the data is unavailable. This is a fallacy. In fact, the high availability of modern deduplication allows companies to add storage and back up data to another node if there is a server or storage failure. Choose deduplication solutions that let you link multiple nodes. This eliminates the problem of a single point of failure by automatically failing over to another node. The result is that your data is available at any time, and your recovery time objective and recovery point objective after a failure or disaster are far shorter. Advanced deduplication delivers high availability backup nodes that scale independently of high availability cluster nodes, letting you handle large data sets or manage narrower backup windows.
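As a rough illustration of automatic failover between linked nodes, consider the sketch below. It assumes the nodes share a repository that remains reachable when one node goes down; that assumption, along with the Node and Cluster classes and the backup key, is hypothetical and not a description of any particular product.

```python
class NodeDown(Exception):
    """Raised when a request reaches a node that is not healthy."""


class Node:
    def __init__(self, name, repository):
        self.name = name
        self.repository = repository  # shared storage, assumed reachable from every node
        self.healthy = True

    def read(self, key):
        if not self.healthy:
            raise NodeDown(self.name)
        return self.repository[key]


class Cluster:
    """Tries each linked node in order and fails over past any that are down."""

    def __init__(self, nodes):
        self.nodes = nodes

    def read(self, key):
        for node in self.nodes:
            try:
                return node.name, node.read(key)
            except NodeDown:
                continue  # fail over to the next linked node
        raise RuntimeError("no healthy node available")


shared_repository = {"backup-001": b"payload"}
cluster = Cluster([Node("node-a", shared_repository),
                   Node("node-b", shared_repository)])

cluster.nodes[0].healthy = False  # simulate a failure of the first node
served_by, data = cluster.read("backup-001")
print(f"{served_by} served {len(data)} bytes")
```

The caller asks the cluster rather than a specific node, which is what removes the single point of failure in this sketch.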

6. Deduplication can write data to tape and accommodate your existing backup processes.
You don't need to dump your backup procedure or tape archives in order to adopt deduplication. There's no need for a rip-and-replace approach when you can instead leverage a virtual tape interface, which presents disk to the backup software as if it were tape. This allows the deduplication appliance to replace tape with disk without altering the backup process. Many data centers have to hang onto their tape backup to meet archival and legal data retention requirements. In these cases, IT leaders need advanced, automated tape management capabilities within their backup and deduplication environments to simplify operations, decrease media consumption and reduce the expense of handling tape.
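The virtual tape idea can be sketched in a few lines. The example assumes a backup job that only knows sequential, tape-style calls (write a block, rewind, read a block); the VirtualTape class, its method names and the in-memory buffer standing in for the disk pool are all hypothetical.

```python
import io


class VirtualTape:
    """Disk-backed (here, in-memory) stand-in with a sequential, tape-like interface."""

    def __init__(self):
        self._buffer = io.BytesIO()  # stands in for a file on the disk pool

    def write_block(self, block):
        self._buffer.seek(0, io.SEEK_END)                  # tape-style append
        self._buffer.write(len(block).to_bytes(4, "big"))  # length header
        self._buffer.write(block)

    def rewind(self):
        self._buffer.seek(0)

    def read_block(self):
        header = self._buffer.read(4)
        if len(header) < 4:
            return None  # end of recorded data
        return self._buffer.read(int.from_bytes(header, "big"))


def run_backup(drive, blocks):
    """Hypothetical backup job: it only uses the tape-style calls."""
    for block in blocks:
        drive.write_block(block)


tape = VirtualTape()
run_backup(tape, [b"monday-full", b"tuesday-incremental"])

tape.rewind()
restored = []
while (block := tape.read_block()) is not None:
    restored.append(block)
```

Because run_backup depends only on the interface, the storage behind it can change from tape to disk without the job itself being rewritten.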

With business-critical data on the line, IT leaders cannot afford to get duped by deduplication myths. Intelligent deduplication solutions give data centers a global approach to flexible, scalable, high-performing and highly available data protection and storage. Deduplication is an essential part of any comprehensive data protection plan, and data center administrators should make sure their understanding of this technology aligns with its true capabilities.

About the Author

Paul Kruschwitz is the director of product management at FalconStor Software. He has more than 20 years of experience in technology with a specific focus on data protection and deduplication technologies.