Ridding Yourself of the Deleted Data that Haunts Virtual Backups
Do you know what vampires and deleted data from virtual machines have in common?
They’re hard to kill and can suck the life force out of your systems.
Just like vampires, deleted data has a way of coming back to haunt you after you think you’ve killed it off forever. And, like a vampire sucking blood, deleted data that isn’t really killed off will suck space out of your storage and performance out of your systems.
If you have been using image-based backup, you may have seen situations where the compressed backup size is larger than the used space reported by the operating system. This happens because when Windows deletes a file, the file's contents are never erased from the partition; only the directory entry is removed. If you have 10 GB of used space in a 20 GB VMDK, that leaves 10 GB of white space for Windows to fill with deleted files. Windows will keep writing deleted data until it fills the partition; once the partition is full, it will overwrite old deleted data with newer deleted data. When you add more actual data to the volume, deleted data is overwritten to make room for it.
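The effect is easy to see with a toy model. In the Python sketch below (all names, sizes and block layout are hypothetical, not how NTFS actually works), deleting a file removes only its directory entry while its blocks keep their stale contents, so an image-level backup sees more data on disk than the OS reports as used:

```python
# Toy model of a partition: deleting a file removes only its directory
# entry; the blocks it occupied keep their old contents ("white space").
# An image-based backup copies blocks, so it still sees the stale data.
# Illustrative only -- block size and file sizes are made up.

BLOCK = 4096  # bytes per block (hypothetical)

class ToyPartition:
    def __init__(self, total_blocks):
        self.blocks = [b"\x00" * BLOCK] * total_blocks  # starts zeroed
        self.files = {}  # name -> list of block indexes

    def write_file(self, name, n_blocks, fill):
        free = [i for i in range(len(self.blocks))
                if all(i not in idxs for idxs in self.files.values())]
        used = free[:n_blocks]
        for i in used:
            self.blocks[i] = bytes([fill]) * BLOCK  # real bytes hit the disk
        self.files[name] = used

    def delete_file(self, name):
        del self.files[name]  # directory entry gone; blocks untouched

    def live_bytes(self):  # what the OS reports as "used space"
        return sum(len(idxs) for idxs in self.files.values()) * BLOCK

    def nonzero_bytes(self):  # what an image backup actually has to read
        return sum(BLOCK for b in self.blocks if b != b"\x00" * BLOCK)

p = ToyPartition(total_blocks=5120)        # a 20 MB "VMDK"
p.write_file("data.db", 2560, fill=0xAA)   # 10 MB of real data
p.write_file("temp.log", 1280, fill=0xBB)  # 5 MB we will delete
p.delete_file("temp.log")
print(p.live_bytes())     # 10485760 -- 10 MB the OS calls used
print(p.nonzero_bytes())  # 15728640 -- 15 MB the image backup sees
```

Scaled up to gigabytes, that 5 MB gap is exactly the stale deleted data that inflates image backups.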
This is where undeleted data really starts to haunt backup operations. When your disk looks like the graphic below, you end up with 5 GB of deleted data that must be read and compressed into the backup, and potentially 2.5 GB of extra network traffic and storage on disk.
Undeleted data hampers system performance and wastes precious storage space. The more white space your system has, the more free rein you give the OS to fill empty file space with deleted data. Stale deleted data is only overwritten in two cases: 1) all existing white space is already filled with previously deleted data, so new deletions must overwrite old ones, or 2) there is no clean space left, so new actual data must overwrite deleted blocks.
These scenarios are scariest for file servers and VMs running applications that use write-ahead logging, which includes SQL Server, Exchange, Active Directory, databases, Web services and transactional applications. File servers host home drives and roaming profiles, which accumulate a large volume of deleted user files over time. Applications with write-ahead logging are vulnerable because they write all changes to a change log first, then merge the changes into the database in a controlled event. When the controlled event executes, the change logs all become deleted data.
So how do you kill undeleted data? Compression and de-duplication preserve space and give you some sense of security, but they don't truly solve the problem. You can try manual tools like VMware's Shrink or Microsoft's SDelete, which zero out deleted blocks, shortening your backup time and removing the inflation of deleted data from your archive. These approaches might keep the space problem at bay for a while, but it will still come back to haunt performance, because the backup system still has to spend time compressing, or even de-duping, the white-space blocks.
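To see why zeroing deleted blocks helps (SDelete, for instance, can zero a drive's free space), here is a quick Python illustration, with 1 MB standing in for gigabytes: zeroed white space collapses to almost nothing under compression, while stale deleted data compresses about as poorly as real data. Note the backup system must still read and compress the zeroed space either way, which is the performance cost described above.

```python
# Why zeroing free space helps a backup compressor: runs of zeros
# collapse to almost nothing, but stale deleted data (modeled here as
# random bytes) barely shrinks at all. Illustrative sizes only.
import os
import zlib

white_space = os.urandom(1 << 20)  # 1 MB of stale deleted blocks
zeroed = b"\x00" * (1 << 20)       # the same space after zeroing

print(len(zlib.compress(white_space)))  # roughly 1 MB: barely shrinks
print(len(zlib.compress(zeroed)))       # about 1 KB: collapses
```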
Why run manual tools that have to be repeated on an ongoing basis and that add extra overhead to your VMs by generating a lot of writes to disk? That's like setting yourself up for a sequel, ensuring there will be more problems to deal with down the road.
You can put a stake through the heart of the undeleted data vampire to stop it from sucking space and performance from backup operations once and for all. The secret is not to allow white space into your backups in the first place. You can accomplish this by skipping over white space blocks during the backup process. Some solutions can automatically detect white space blocks, and exclude them from the backup operation. Plus, these solutions can be used with de-duplication and compression for even greater effect.
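A minimal sketch of the idea, assuming a hypothetical block-level backup that treats all-zero blocks as white space and records only the non-zero blocks with their offsets (this is not any vendor's actual format):

```python
# Sketch of white-space skipping: read a disk image in fixed-size
# blocks and back up only non-zero blocks, recording their offsets so
# the image can be reconstructed later. Hypothetical format.
import io

BLOCK = 4096
ZERO = b"\x00" * BLOCK

def backup_skip_white_space(disk, archive):
    """Copy only non-zero blocks from `disk` into `archive` as (offset, data)."""
    offset = 0
    while True:
        block = disk.read(BLOCK)
        if not block:
            break
        if block.rstrip(b"\x00"):      # any non-zero byte? then keep it
            archive.append((offset, block))
        offset += len(block)

# A 5-block image: one data block, three white-space blocks, one data block.
disk = io.BytesIO(b"A" * BLOCK + ZERO * 3 + b"B" * BLOCK)
archive = []
backup_skip_white_space(disk, archive)
print(len(archive))  # 2 -- only 2 of 5 blocks are stored
```

The same scan combines naturally with compression and de-duplication: blocks that are never read are never compressed, transferred or stored.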
Skipping over white space is a new approach to virtual systems backup -- and one that lets you skip over having undeleted data suck space, performance and efficiency from your virtual environment.
Posted by Jason Mattox on 05/18/2010 at 12:49 PM