Victor Nemechek

The Evolution of Deduplication Conversations

Blog Post created by Victor Nemechek on May 16, 2014

In 2005, Hitachi Data Systems was the first major storage vendor to deploy a Virtual Tape Library with deduplication into a production environment. Working alongside our then-partner Diligent Technologies, we delivered and installed a ProtecTIER deduplication gateway with Hitachi storage at Skoda Auto in the Czech Republic, beating to market a little company called Data Domain.


Back then, when we talked to customers about deduplication, the questions we had to answer were “What is deduplication?”, “How does it work?” and “Is my data safe?” To give customers confidence that their data would be safe, we often had to get down into the weeds and explain the difference between fixed and variable block chunking, how our 4GB RAM index worked and how a binary diff was used to determine whether data was a duplicate or not.
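For readers who never sat through those whiteboard sessions, the two chunking styles are easy to sketch. The snippet below is a generic illustration, not the actual ProtecTIER (or Data Domain) algorithm: fixed chunking cuts at byte offsets, content-defined (variable) chunking cuts wherever a simple rolling value over the data hits a boundary condition, and duplicates are found by hashing each chunk.

```python
import hashlib

def fixed_chunks(data: bytes, size: int = 8):
    """Fixed-block chunking: cut every `size` bytes, regardless of content."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def variable_chunks(data: bytes, mask: int = 0x0F):
    """Content-defined (variable) chunking: cut wherever a byte-wise
    rolling value matches a bit mask, so boundaries follow the data."""
    chunks, start, h = [], 0, 0
    for i, b in enumerate(data):
        h = ((h << 1) ^ b) & 0xFFFF
        if (h & mask) == mask:          # boundary condition met
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])     # trailing remainder
    return chunks

def dedupe(chunks):
    """Keep one copy per unique chunk, keyed by a strong hash."""
    store = {}
    for c in chunks:
        store.setdefault(hashlib.sha256(c).hexdigest(), c)
    return store

data = b"abcdefgh" * 4                  # four identical 8-byte blocks
print(len(dedupe(fixed_chunks(data)))) # → 1 unique chunk stored
```

The practical difference: insert one byte at the front of a file and every fixed-block boundary shifts, so nothing matches the old chunks; content-defined boundaries realign after the insertion, which is why variable chunking typically deduplicates backup streams better.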


Fast forward nine years and our conversations are completely different. With thousands of systems in production around the globe, customers completely understand the power of deduplication, don’t care how our algorithm works and aren’t (particularly) worried about the safety of their data. Deduplication is now an established mainstream technology available from almost every storage vendor on the planet.

As deduplication has moved from being a backup-only technology to a primary storage technology, the conversations have evolved. Now the questions we have to answer are “Is it going to impact the performance of my system?”, “How do I schedule and monitor it?” and “How do I avoid it interfering with production or backup operations?”

Once again, HDS is leading the way with the industry’s first completely automated and non-disruptive deduplication for primary storage. The deduplication technology embedded in our Hitachi NAS Platform (HNAS) is fully automated and uniquely intelligent. Up to four high-speed dedupe engines automatically eliminate redundant data when the system is not busy. When file-serving load reaches 50% of available IOPS, the deduplication engines throttle back to avoid impacting users’ performance, then automatically resume when the system is less busy. No complex scheduling process is required: HNAS knows when new data has been added and automatically starts up the deduplication engines, as long as the system isn’t busy.
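The throttling behavior described above boils down to a simple policy. This is a hypothetical sketch of the decision logic, not HNAS code; the function name and parameters are illustrative, and only the engine count (up to 4) and the 50% load threshold come from the paragraph:

```python
def dedupe_engines_active(load_pct: float, new_data: bool,
                          engines: int = 4, threshold: float = 50.0) -> int:
    """Illustrative throttle policy: run up to `engines` dedupe engines
    only when new data exists and file-serving load is below the
    threshold (50% of available IOPS); otherwise stand down."""
    if not new_data or load_pct >= threshold:
        return 0            # throttle back: production I/O takes priority
    return engines          # system is quiet: all engines run

print(dedupe_engines_active(20.0, True))   # quiet system: engines run
print(dedupe_engines_active(75.0, True))   # busy system: throttled to 0
```

The point of such a policy is that the administrator never schedules anything: load crossing the threshold is the only signal needed to pause and resume.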

So, conversations with customers are much easier now. I just say, “You set it, forget it and watch it reclaim up to 90% of existing storage capacity, extending the life of your storage assets.”