Using Image-based Backup with De-dupe Appliances
In the world of disk-based backup software and image-based backup, de-duplication appliances are a good option to maximize storage and improve long-term retention. Outlined below are a few key aspects you should consider in order to use de-dupe appliances effectively with image-based backup.
Backup Software Requirements for Sending Backup Jobs to De-Dupe Appliances
Most de-dupe appliances don't work well when the backup software's compression is turned on, because compression scrambles the blocks each time and reduces your overall de-dupe ratio. Encryption causes the same problem: it scrambles the blocks, so the appliance finds fewer duplicates and your de-dupe ratio drops. Some appliances have to support your backup software's archive format directly, while others are generic in nature and will support just about any kind of backup you send. I have also seen some de-dupe appliances struggle with disk-based archives that change, so if your backup software modifies a backup archive after it's created, that behavior could hurt your de-dupe rate and/or long-term backup performance.
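To see why compression hurts, here is a minimal sketch, not how any particular appliance works. It assumes a simple fixed-size 4 KB block de-dupe keyed on SHA-256 hashes (many real appliances use variable-length chunking), and it uses zlib as a stand-in for the backup software's compression. The names dedupe_ratio and make_backups are made up for this illustration.

```python
import hashlib
import random
import zlib

BLOCK_SIZE = 4096  # assumed fixed chunk size; real appliances vary


def dedupe_ratio(streams, block_size=BLOCK_SIZE):
    """Return logical blocks divided by unique blocks across all streams."""
    total = 0
    unique = set()
    for stream in streams:
        for offset in range(0, len(stream), block_size):
            total += 1
            block = stream[offset:offset + block_size]
            unique.add(hashlib.sha256(block).digest())
    return total / len(unique)


def make_backups(blocks=64, seed=0):
    """Build two compressible 'full backups' that differ only in the first block."""
    rng = random.Random(seed)
    first = bytes(rng.choices(b"abcdefgh", k=blocks * BLOCK_SIZE))
    changed = bytes(rng.choices(b"abcdefgh", k=BLOCK_SIZE))
    second = changed + first[BLOCK_SIZE:]
    return first, second


first, second = make_backups()
raw_ratio = dedupe_ratio([first, second])
compressed_ratio = dedupe_ratio([zlib.compress(first), zlib.compress(second)])
print(f"raw: {raw_ratio:.2f}x, compressed: {compressed_ratio:.2f}x")
```

The two uncompressed backups share all but one block, so they should de-dupe at close to 2x. Once each backup is compressed, the one changed block shifts everything that follows it in the compressed stream, so the appliance should find far fewer matching blocks.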
Definition of and Standards for Acceptable De-dupe Rates
You will hear everyone talking about their de-dupe rate, e.g. “We're getting 20x dedupe.” Well, what does that mean for your operations exactly? If you're getting a 20x de-dupe rate, it means you're storing 20 terabytes (TB) of backup data for every 1 TB of actual disk space. When sending image-based backups, I hardly ever see de-dupe rates below 20x. So if your de-dupe appliance has 6 TB of local storage in a RAID configuration, you could actually store 120 TB of image-based backups.
Fitting that much data in such a small footprint changes the game for disk-based storage, and changes how much data you can keep on hand before you might roll data off to tape.
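The capacity arithmetic is simple enough to sketch in a couple of lines. The function names here are made up for illustration, and the ratio is whatever your appliance actually reports, not a guarantee.

```python
def logical_capacity_tb(physical_tb, dedupe_ratio):
    """Backup data (TB) that fits on physical_tb of disk at the given ratio."""
    return physical_tb * dedupe_ratio


def physical_needed_tb(logical_tb, dedupe_ratio):
    """Disk space (TB) needed to hold logical_tb of backups at the given ratio."""
    return logical_tb / dedupe_ratio


print(logical_capacity_tb(6, 20))   # 6 TB of disk at 20x
print(physical_needed_tb(120, 20))  # 120 TB of backups at 20x
```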
Getting Optimum Results from Your Backup Software
Let your backup software do the distributed work of shrinking down much of the data that has to be sent to your de-dupe appliance. If your backup software has its own ability to make backup files as small as possible and complete the backup process as fast as possible, then let it do its job. For example, if your software supports the ability to skip unallocated blocks, zero blocks and deleted blocks, allowing it to complete this upfront work ahead of sending it to the appliance gets you optimum results with less network congestion and less impact on appliance CPU cycles. Without this upfront step, your network will be transmitting larger files and your de-dupe appliance will have to chew up CPU cycles de-duplicating zero blocks.
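As a rough sketch of that upfront work, the snippet below walks a disk image in fixed-size blocks and only yields the ones that contain data. It is an illustration under simplifying assumptions: real image-based backup software typically consults the file system's allocation information rather than scanning for zeros, and the 4 KB block size and the name nonzero_blocks are hypothetical.

```python
BLOCK_SIZE = 4096  # assumed block size for this illustration


def nonzero_blocks(image, block_size=BLOCK_SIZE):
    """Yield (offset, block) for every block that is not entirely zero bytes."""
    zero_block = bytes(block_size)
    for offset in range(0, len(image), block_size):
        block = image[offset:offset + block_size]
        # Compare against a zero block of the same length (the last block may be short).
        if block != zero_block[:len(block)]:
            yield offset, block


# A tiny fake "disk": one data block, two zero blocks, one data block.
image = b"\x01" * BLOCK_SIZE + bytes(2 * BLOCK_SIZE) + b"\x02" * BLOCK_SIZE
to_send = list(nonzero_blocks(image))
bytes_sent = sum(len(block) for _, block in to_send)
print(f"{bytes_sent} of {len(image)} bytes need to go over the network")
```

In this toy example only half of the image has to cross the network, and the appliance never spends cycles on the zero blocks.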
If your backup software supports incremental backups without a real performance hit, opt for incremental backup jobs. The change rate between backups may be only 2 percent, which means each incremental backup is only about 2 percent of the size of a full backup. That further reduces the workload and duration of the backup job, because the de-dupe appliance has less work to do.
De-Dupe and Support for Offsite Redundancy
Most de-dupe appliances support replication from one appliance to another, or even from many to one for offsite redundancy. The major benefit is that, at a 20x de-dupe rate, you send one-twentieth of the data over your WAN link to the other de-dupe appliance. Let's just calculate this out. If your change rate is about 2 percent and you have 5 TB of VM disk, your used space for these VMs might only be 50 percent. So, if we multiply 5 TB by 50 percent we get 2.5 TB of used data. Multiplying the 2.5 TB by a change rate of 2 percent gives about 51 GB (counting 1,024 GB per TB). We would then apply the 20x de-dupe rate to 51 GB, and we end up with only about 2.5 GB on the disk after de-duplication.
Why is 2.5 GB important? When you only have 2.5 GB of data stored after a round of backups, you only have to send 2.5 GB over the wire to your offsite de-dupe appliance. If you have a 4 Mbps pipe between offices, replicating that 2 percent of changed data (51 GB, or 2.5 GB after de-dupe) would take only about 1.5 hours, compared to about 29 hours without de-dupe.
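If you want to plug in your own numbers, the calculation can be sketched as follows. This assumes binary units (1 TB = 1,024 GB, 1 GB = 1,024 MB), a link that runs at its full rated speed, and no protocol overhead, so treat the results as best-case estimates. The function names are made up for this example.

```python
GB_PER_TB = 1024  # binary units, assumed to match the figures above
MB_PER_GB = 1024


def changed_data_gb(provisioned_tb, used_fraction, change_rate):
    """Changed data per backup round, in GB."""
    return provisioned_tb * GB_PER_TB * used_fraction * change_rate


def transfer_hours(size_gb, link_mbps):
    """Hours to send size_gb over a link of link_mbps megabits per second."""
    megabits = size_gb * MB_PER_GB * 8
    return megabits / link_mbps / 3600


changed = changed_data_gb(provisioned_tb=5, used_fraction=0.5, change_rate=0.02)
after_dedupe = changed / 20  # 20x de-dupe rate

print(f"changed data: {changed:.1f} GB, after de-dupe: {after_dedupe:.2f} GB")
print(f"with de-dupe: {transfer_hours(after_dedupe, 4):.1f} hours")
print(f"without de-dupe: {transfer_hours(changed, 4):.1f} hours")
```

With the numbers from this example, that works out to roughly 51 GB of changed data, about 2.5 GB after de-dupe, and about 1.5 hours versus 29 hours on a 4 Mbps link.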
Image-based backup with data de-duplication appliances is a powerful combination. By turning off compression and letting the backup software and appliance each do the jobs for which they are optimized, you can save a lot of time and space for backup.
Posted by Jason Mattox on 06/11/2010 at 12:49 PM