Victor Nemechek

Divide and Conquer to Overcome Large Database Backup Challenges

Blog Post created by Victor Nemechek on Sep 23, 2015

In my first job as a Backup Administrator, I always had a hard time getting started on large projects. I would look at the huge scope of a project and be paralyzed. I didn’t know where to start and would flail about for days or weeks. I can still hear my mentor saying, “Divide and conquer, my boy. Divide and conquer!” He taught me how to break large projects down into many small, manageable tasks, which made it easier to get started and, by working on the tasks in parallel, quicker to finish.


That sage advice still works today for Backup Administrators struggling to back up large databases. Surveys show that many organizations with large databases struggle to meet SLAs and complete backup operations within the given backup window. Each year, as data volumes increase, the struggle to protect mission-critical data stored in large databases grows, and it remains at the top of the list of concerns for data centers. It is like having too many cars on a one-lane highway: traffic crawls!

A “divide and conquer” strategy can dramatically speed up large database backups. It combines two techniques: backing up a database with multiple simultaneous backup streams (multi-streaming) and sending multiple sources to a single backup device (multiplexing). Thankfully, these techniques are supported by all major databases and all major backup applications. With multiple lanes, traffic can move more quickly and efficiently.
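To make the multi-streaming idea concrete, here is a minimal Python sketch. It is a toy model, not how any particular backup application or database tool implements it: the function names (`split_into_streams`, `backup_stream`, `multistream_backup`) are hypothetical, the "backup" only counts bytes, and round-robin assignment is just one simple way to divide the work.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_streams(chunks, n_streams):
    """Divide: assign chunks to n_streams lists in round-robin order."""
    streams = [[] for _ in range(n_streams)]
    for i, chunk in enumerate(chunks):
        streams[i % n_streams].append(chunk)
    return streams


def backup_stream(stream_id, chunks):
    """Stand-in for one backup stream (hypothetical).

    A real stream would write to a backup device; this one only
    reports how many bytes it would have written.
    """
    return stream_id, sum(len(c) for c in chunks)


def multistream_backup(chunks, n_streams=4):
    """Conquer: run all streams in parallel and collect bytes per stream."""
    streams = split_into_streams(chunks, n_streams)
    with ThreadPoolExecutor(max_workers=n_streams) as pool:
        results = pool.map(backup_stream, range(n_streams), streams)
        return dict(results)


if __name__ == "__main__":
    data_files = [b"x" * 10] * 8  # eight small pretend data files
    print(multistream_backup(data_files, n_streams=4))
```

With eight 10-byte chunks and four streams, each stream handles two chunks, so the script prints `{0: 20, 1: 20, 2: 20, 3: 20}`. In a real environment the number of streams would be tuned to what the database server, network and backup target can sustain.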

The difficulty with this approach comes on the back end, with backup appliances that use traditional hashing techniques and inline deduplication. Backup applications leverage database tools, such as Oracle RMAN, to take care of all underlying database procedures before and after backup operations. So in addition to the metadata that the backup application itself inserts into the backup stream, tools like RMAN also introduce their own metadata (time stamps and sequence numbers) into the backup stream. As a result, the combination of multi-streaming, multiplexing and intermixed metadata from the backup tools makes the backup stream very difficult (if not impossible) for inline deduplication backup appliances, like Data Domain, to dedupe. Without deduplication, you waste money storing redundant data and powering, cooling and managing more storage than you have to. With Data Domain you can’t have both speed and efficiency. Data Domain makes you choose: Do you want fast database backups or do you want efficiency?
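The effect of intermixed metadata on hash-based deduplication can be illustrated with a small Python sketch. This is a deliberately simplified model under stated assumptions: it uses fixed 8 KB blocks and SHA-256, and the `with_metadata` framing (a header carrying a backup ID and sequence number in front of every 4 KB record) is invented for illustration. It does not reproduce RMAN's actual stream format or any vendor's deduplication algorithm.

```python
import hashlib

BLOCK_SIZE = 8192  # assumed fixed block size for this toy model


def fixed_blocks(data, size=BLOCK_SIZE):
    """Cut a byte stream into fixed-size blocks (last one may be short)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def unique_ratio(streams):
    """Fraction of blocks that must be stored after hash-based dedup.

    0.5 means half the blocks were duplicates; 1.0 means nothing deduped.
    """
    blocks = [b for s in streams for b in fixed_blocks(s)]
    hashes = {hashlib.sha256(b).digest() for b in blocks}
    return len(hashes) / len(blocks)


def with_metadata(data, backup_id, record=4096):
    """Hypothetical framing: prefix each record with backup ID and sequence."""
    out = bytearray()
    for seq, start in enumerate(range(0, len(data), record)):
        out += f"[{backup_id}:{seq:08d}]".encode()  # metadata header
        out += data[start:start + record]           # unchanged payload
    return bytes(out)


# Deterministic pretend database contents: 128 KB of non-repeating bytes.
database = b"".join(hashlib.sha256(str(i).encode()).digest() for i in range(4096))

if __name__ == "__main__":
    # Two backups of identical data, no inserted metadata.
    print("raw:      ", unique_ratio([database, database]))
    # The same two backups, each with its own metadata mixed in.
    print("metadata: ", unique_ratio([with_metadata(database, 1),
                                      with_metadata(database, 2)]))
```

In this model the two raw backups dedupe perfectly against each other (a ratio of 0.5), while the metadata-framed backups come out close to 1.0: the headers change the contents of almost every block, and they also shift the payload so block boundaries no longer line up. The underlying data is identical in both cases; only the framing differs.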


Hitachi Data Systems offers the only solution that can deduplicate multi-stream (up to 16 simultaneous streams), multiplexed database backups without sacrificing deduplication ratios. Hitachi Protection Platform offers the most efficient database deduplication. It can find duplicate data in sub-8KB blocks with no impact on performance or capacity. It reports detailed deduplication ratios and storage utilization by client and backup job, and it is compatible with Veritas NetBackup, HP Data Protector, EMC NetWorker and IBM TSM. With Hitachi Protection Platform you can have both: fast backup performance and high storage efficiency.

So, use the divide and conquer approach to solve your issues with large database backups, but make sure your backup appliance supports both multi-streaming and deduplication to ensure your backups are both fast and efficient.