Compression better than Dedup? NetApp Confirms!

The more I talk with customers, the more I find that the technical details of how something works are much less important than the business outcome it achieves. When it comes to storage, most customers just want a device that will provide the capacity and performance they need, at a price they can afford, and it better not be too complicated. Pretty much any vendor trying to sell something will attempt to make their solution fit your needs even if they really don’t have the right products. It’s a fact of life: sell what you have. Along these lines, there has been a lot of back and forth between vendors about dedup vs. compression technology and which one solves customer problems best.

After snapshots and thin provisioning, data reduction in storage arrays has lately become a big focus in storage efficiency, and there are two primary methods of data reduction: compression and deduplication.

While EMC has been marketing compression technology for block and file data in Celerra, Unified, and Clariion storage systems, NetApp has been marketing deduplication as the technology of choice for block and file storage savings. But which one is the best choice? The short answer is: it depends. Some data types benefit most from deduplication while others get better savings with compression.

Currently, EMC supports file compression on all EMC Celerra NS20, 40, 80, 120, 480, 960, VG2, and VG8 systems running DART 5.6.47.x+ and block compression on all CX4 based arrays running FLARE30.x+.  In all cases, compression is enabled on a volume/LUN level with a simple check box and processing can be paused, resumed, and disabled completely, uncompressing the data if desired.  Data is compressed out-of-band and has no impact on writes, with minimal overhead on reads.  Any or all LUN(s) and/or Filesystem(s) can be compressed if desired even if they existed prior to upgrading the array to newer code levels.

With the release of OnTap 8.0.1, NetApp has added support for in-line compression within their FAS arrays.  It is enabled per-FlexVol and as far as I have been able to determine, cannot be disabled later (I’m sure Vaughn or another NetApp representative will correct me if I’m wrong here.)  Compression requires 64-bit aggregates which are new in OnTap 8, so FlexVols that existed prior to an upgrade to 8.x cannot be compressed without a data migration which could be disruptive.  Since compression is inline, it creates overhead in the FAS controller and could impact performance of reads and writes to the data.

Vaughn Stewart, of NetApp, expertly blogged today about the new compression feature, including some of the caveats involved, and to me the most interesting part of the post was the following graphic he included showing the space savings of compression vs. dedup for various data types.

Image Credit: Vaughn Stewart, NetApp

The first thing that struck me was how much better compression performed than deduplication for all but one data type (Virtualization will usually fare well because in a typical environment there are many VMs with the same operating system files). In fact, according to NetApp, deduplication achieves very little savings, if any, for the majority of the data types here.
 
The light green bar indicates savings with both dedupe AND compression enabled on the same dataset.  In 5 out of 9 cases, dedup adds ZERO savings over compression alone.  I can’t help but wonder why anyone would enable dedup on those data types if they already had compression, since both features use storage array CPU resources to find and compress or dedup data.  I am aware that in some cases, dedup can improve performance on NetApp systems due to dedup-aware cache, but I also believe that any performance gain is directly related to the amount of duplication in the data.  Using this chart, virtualization is really the only place where dedup seems particularly effective and hence the only place where real performance gains would likely present themselves.
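To make the three bars in a chart like this concrete, here is a minimal Python sketch of how one could estimate them for a sample of data. It is not how NetApp or EMC implement either feature: the fixed 4 KB block size, the use of zlib as the compressor, SHA-256 block fingerprints, and every function name below are my own assumptions for illustration.

```python
import hashlib
import random
import zlib

BLOCK_SIZE = 4096  # assumption: fixed 4 KB blocks; real arrays use their own sizes


def split_blocks(data, block_size=BLOCK_SIZE):
    """Cut the data into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def unique_blocks(blocks):
    """Keep the first copy of each block, identified by its SHA-256 digest."""
    seen = set()
    unique = []
    for block in blocks:
        digest = hashlib.sha256(block).digest()
        if digest not in seen:
            seen.add(digest)
            unique.append(block)
    return unique


def compressed_size(block):
    """Size after zlib compression, or the raw size if compression does not help."""
    return min(len(block), len(zlib.compress(block)))


def estimate_savings(data):
    """Return fractional space savings for compression, dedup, and both."""
    if not data:
        return {"compress_only": 0.0, "dedup_only": 0.0, "both": 0.0}
    blocks = split_blocks(data)
    unique = unique_blocks(blocks)
    total = len(data)
    return {
        "compress_only": 1.0 - sum(compressed_size(b) for b in blocks) / total,
        "dedup_only": 1.0 - sum(len(b) for b in unique) / total,
        "both": 1.0 - sum(compressed_size(b) for b in unique) / total,
    }


def identical_random_blocks(count):
    """Hypothetical sample: one incompressible block repeated (many clones of one image)."""
    rng = random.Random(0)
    block = bytes(rng.getrandbits(8) for _ in range(BLOCK_SIZE))
    return block * count


def distinct_repetitive_blocks(count):
    """Hypothetical sample: every block is different, but each is internally repetitive."""
    blocks = []
    for i in range(count):
        pattern = f"row {i:06d};".encode()
        blocks.append((pattern * BLOCK_SIZE)[:BLOCK_SIZE])
    return b"".join(blocks)


if __name__ == "__main__":
    for name, sample in [
        ("identical blocks", identical_random_blocks(100)),
        ("distinct repetitive blocks", distinct_repetitive_blocks(100)),
    ]:
        print(name, estimate_savings(sample))
```

On the identical-block sample, dedup does all the work and compression adds nothing. On the distinct-but-repetitive sample, dedup alone saves nothing and the combined figure equals compression alone, which is the situation described above where dedup adds zero savings on top of compression.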
 
The challenge for NetApp customers will be getting their data into a configuration that supports compression due to the 64-bit aggregate requirement, lack of an easy and non-disruptive LUN migration feature (DataMotion appears to only support iSCSI and NFS and requires several additional licenses), and no way to convert an aggregate from 32-bit to 64-bit.  Once compression has been enabled, if there is truly no way to disable it, any resulting performance impact will be very difficult to rectify.
 
On the other hand, any EMC customer with current maintenance can upgrade their NS or CX4 array to newer versions of DART or FLARE, and compression can be enabled on any existing data after the fact. If performance becomes an issue for a particular dataset once compressed, the data can be uncompressed later. Both operations are completely non-disruptive and run in the background. While block compression only works on LUNs in a virtual pool, as opposed to a traditional RAID group, enabling compression on a normal LUN will automatically migrate the LUN into a virtual pool, perform zero-page reclaim, and then compress the data, and the entire process is completely non-disruptive to the application. Oh, and compressed data can still be tiered with FASTVP across SSD, FC, and SATA disk and/or benefit from up to 2TB of FASTCache.
 
I admit that there is a place for deduplication as well as compression in reducing the footprint of customer data. However, based on what I’ve seen in my career as an IT professional, and with my customers in my current role at EMC, there are more use cases for compression than there are for deduplication when it comes to primary data, whether SAN or NAS. Either way, if I were using a new technology for the first time on a particular data set, whether compression or deduplication, I would definitely want a backout plan in case the drawbacks outweigh the benefits.


  1. Larry Freeman aka DrDedupe

    Hi Richard,

    NetApp inline data compression on Volumes and LUNs can be started or stopped, and the data uncompressed, at any time. Regarding dedupe and compression, users have four options for Volumes/LUNs:

    1. Compress only
    2. Dedupe only
    3. Compress and dedupe
    4. Don’t compress or dedupe

    The decision on which option to use is based on the amount of duplicate data, the compressibility of the data, and the tolerance for performance overhead. NetApp provides best practice recommendations to its users based on the above parameters.

    A couple of other notes: existing data can be compressed using a post-processing compression routine, and compression can operate on block or file data using any protocol (NFS/CIFS/FCP/iSCSI/FCoE).

    In addition to Vaughn’s blog, I’ll be posting regular updates to my blog with more detailed information on use cases and recommendations for both compression and deduplication.

    http://blogs.netapp.com/drdedupe/

    DrDedupe


  2. Vaughn Stewart

    Your blog post is exactly the reason why I would never work for EMC.

    NetApp delivers storage savings technologies which EMC cannot match for a multitude of reasons.

    Here’s just one example. EMC demos using compression with VMware VMs, yet the EMC best practices paper states the use of compression with VMs is not supported. If it’s not supported, then why is EMC demoing this capability?

    Please show us your performance data with compression enabled. NetApp has already published, with VMware (whose engineering team reviewed and approved it), a report showing NetApp dedupe outperforms traditional SAN arrays.

    http://blogs.netapp.com/virtualstorageguy/2010/08/fact-vmware-vsphere-on-netapp-is-faster-and-greener.html

    Stating compression is better than dedupe when EMC arrays only offer compression for limited production use cases is a questionable claim at best. I’d suggest you re-word this post or better yet delete it as it doesn’t paint a favorable picture of you or your employer.

    Cheers,
    Vaughn Stewart


  3. nuniko

    It’s very interesting reading the opening of the NTAP blog: “It’s no secret; NetApp wants to sell you less storage than any other storage vendor….”

    It’s also no secret that NetApp wants to give you for free anything that will cause you to spend much more money in the future….

    Like dedup, NetApp compression increases your controller utilization, decreases performance, and forces you to upgrade your storage controller after a short time. This of course brings NetApp much more revenue than the disks you buy….
    BTW, I wonder why no one is talking about the “other” solution: Storwize, the company IBM just acquired. An appliance-based solution solves all the problems customers face when implementing NTAP compression.


    1. storagesavvy

      @nuniko,

      Good points. I’m actually working with a customer now that has experienced major Filer sprawl due to controller limits rather than capacity. NetApp is not the only vendor with this problem; there are limits to every architecture. But I have seen 3:1 up to 5:1 consolidation of front-end controllers in practice when moving from NetApp to EMC, regardless of capacity requirements, especially when File and Block are involved, since EMC Unified can scale up front-end controllers without adding disks.


    2. Mike Ivanov

      Richard,

      We noticed the blog that Vaughn posted as well and were curious about his results too. Permabit has years of experience utilizing deduplication and compression technologies in our Enterprise Archive and Cloud products. Yes, both data optimization technologies have tremendous data efficiency benefits, but we tend to see the opposite results from what NetApp is seeing. In our most recent lab tests, we tested various data sets (VMware, MS SQL, Office 2007, User Directories and Exchange) comparing our deduplication technology in Albireo with traditional LZ compression technology. Keep in mind that Albireo has unlimited capacity and scale; therefore the “universe” of data that can be deduplicated is much greater than that of NTAP’s dedupe technology.

      In our tests, we saw deduplication outperforming compression in all cases. What’s even more interesting, however, is that when deduplication and compression were combined, the results were additive, providing the highest levels of data optimization. This is not the case in Vaughn’s chart. In most cases NTAP does not have any additive benefit by leveraging both of their technologies. We can only assume here that the limitations of NTAP’s dedupe technology (can only be used on a per-volume basis, limited scale, etc.) prevent it from being an added benefit when combined with their new compression. What’s also interesting from Vaughn’s post is that it states that dedupe must be turned on to enable compression. If it’s not additive in data savings, then it’s just utilizing more resources with no additional gain.

      So, the bottom line here is that both compression and deduplication can provide data optimization but they differ in scope. Compression identifies “micro” duplicates (within a file) whereas dedupe identifies “macro” duplicates (across files and, in the case of Albireo, across file systems and even LUNs). The result is that deduplication can offer a much higher level of savings (4-100x) compared to compression (2-4x). Our results also show that unlike compression, where the technologies are pretty standard and provide consistent results, deduplication technology is drastically different from vendor to vendor. You can see the results of our tests in a recent article by George Crump: http://www.storage-switzerland.com/Articles/Entries/2010/12/7_Optimizing_Primary_Storage.html. In our testing and validation (Wikibon, ESG), Albireo was the most effective data efficiency solution. The results demonstrated data reduction over a wide range of data sets up to nine times greater than that of compression. Without exception, a capable dedupe product like Albireo combined with any compression technology produces best-of-breed data optimization results.

      Mike Ivanov – Permabit

