Tuesday, January 15, 2013

compression, dedup, and compression + dedup test results




So, I ran some tests with VHD and VHDX VM files I had from a backup, and it was interesting to compare the results of deduplication vs. compression vs. both at the same time.

I did 3 tests, each time copying the same 166 GB set of VHD and VHDX backup files.

First option was dedup only, RECSIZE=16K.
The dedup table required at least 2.6 GB of RAM within your arc_meta_limit, and the dedup ratio was poor.
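For a rough sanity check on that memory figure, here is a back-of-the-envelope sketch in Python. It assumes the commonly quoted rule of thumb of roughly 320 bytes of RAM per dedup table (DDT) entry, and it treats "166 GB" as GiB. Both are assumptions, not measurements from these tests, and `ddt_ram_estimate` is just a hypothetical helper name.

```python
GIB = 2**30
KIB = 2**10

# ASSUMPTION: ~320 bytes of RAM per in-core DDT entry is a commonly quoted
# rule of thumb, not a figure measured in these tests.
BYTES_PER_DDT_ENTRY = 320


def ddt_ram_estimate(data_bytes: int, recordsize: int, dedup_ratio: float,
                     bytes_per_entry: int = BYTES_PER_DDT_ENTRY) -> float:
    """Estimate the RAM (in bytes) needed to keep the dedup table in memory.

    There is one DDT entry per *unique* block, so the entry count is the
    number of records divided by the dedup ratio (full records assumed).
    """
    if recordsize <= 0 or dedup_ratio < 1.0:
        raise ValueError("recordsize must be positive and dedup_ratio >= 1.0")
    total_blocks = data_bytes / recordsize
    unique_blocks = total_blocks / dedup_ratio
    return unique_blocks * bytes_per_entry


if __name__ == "__main__":
    # Worst case for this data set: 166 GiB at 16K records, no dedup hits.
    worst = ddt_ram_estimate(166 * GIB, 16 * KIB, dedup_ratio=1.0)
    print(f"{worst / GIB:.2f} GiB")  # 3.24 GiB
    # Same data at a 128K recordsize: 8x fewer records, 8x smaller table.
    print(f"{ddt_ram_estimate(166 * GIB, 128 * KIB, 1.0) / GIB:.2f} GiB")  # 0.41 GiB
```

Any real dedup hits shrink the table in proportion to the dedup ratio, so a measured figure somewhat below that worst case is in the expected range. The other lever is recordsize: the table scales inversely with it.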

Second option was compression only, COMPRESSION=LZJB
This does use arc_meta_limit, obviously, but it's not imperative that all of it fit in memory at once.

Third option was dedup on AND compression on. You can see that the compression interfered with the deduplication ratio. I would assume that is partly because the parts of the VHD that are highly compressible are also the ones that are dedup-able. The interesting thing here is that turning compression AND dedup on resulted in a faster write speed than dedup alone. I assume that's because it only has to dedup 77.4 GB of (already compressed) data instead of 166 GB. Deletion was also faster.
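The dedup-ratio drop fits how block-level dedup works: each record is checksummed as it will be written (after compression), and the ratio compares bytes referenced to bytes actually stored. The toy Python model below is a sketch of the hypothesis above, not a ZFS measurement. zlib stands in for LZJB, SHA-256 is the block checksum, and allocation details such as sector-size rounding are ignored. If the duplicated records are also the most compressible ones (here, zero-filled records), compression shrinks exactly the bytes dedup would have saved.

```python
import hashlib
import random
import zlib

RECORDSIZE = 16 * 1024  # matches the RECSIZE=16K used in the dedup test


def stored_block(block: bytes, compress: bool) -> bytes:
    """Return the bytes that would be written for one record.

    zlib stands in for LZJB here (Python has no LZJB); a record is only
    kept compressed if that actually makes it smaller.
    """
    if compress:
        packed = zlib.compress(block)
        if len(packed) < len(block):
            return packed
    return block


def dedup_ratio(data: bytes, compress: bool, recordsize: int = RECORDSIZE) -> float:
    """Bytes referenced divided by bytes actually stored, deduping by SHA-256."""
    referenced = 0
    unique = {}  # checksum -> stored size of that unique block
    for off in range(0, len(data), recordsize):
        blk = stored_block(data[off:off + recordsize], compress)
        referenced += len(blk)
        unique.setdefault(hashlib.sha256(blk).digest(), len(blk))
    return referenced / sum(unique.values())


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy "disk image": 64 records of random (incompressible, unique) data
    # plus 64 records of zeros (highly compressible *and* duplicated).
    image = rng.randbytes(64 * RECORDSIZE) + bytes(64 * RECORDSIZE)
    print(f"dedup only:          {dedup_ratio(image, compress=False):.2f}")  # 1.97
    print(f"compression + dedup: {dedup_ratio(image, compress=True):.2f}")
```

Without compression the toy image dedups at 128/65 ≈ 1.97, because 64 zero records are stored once. With compression the ratio falls close to 1.0: those zero records already collapse to a tiny fraction of 16K each, so there is almost nothing left for dedup to save.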

You can see the detailed results in the XLS screenshot:



The conclusion here (imho) is that dedup is VERY situational and typically isn't going to be worth your while compared to LZJB or GZIP-X (gzip-1 through gzip-9) compression.

I suppose dedup + compression would come in handy if you are storing multiple copies of the exact same files, but I can't think of a situation where a snapshot + clone wouldn't work better.
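To put rough numbers on the "multiple copies of the same files" case, here is a simple Python model (my own simplification, not something measured above). Start N copies from one identical image of B records, then let every copy rewrite the same fraction f of its records with its own unique data. Untouched records are stored once and rewritten ones once per copy, so the dedup ratio is N·B / ((1 - f)·B + N·f·B) = N / (1 + (N - 1)·f). The function name is hypothetical.

```python
def drifted_dedup_ratio(copies: int, drift: float) -> float:
    """Dedup ratio for `copies` identical images after each one drifts.

    Simplified model (an assumption, not ZFS behavior): every copy rewrites
    the same fraction `drift` of its records, each with unique new data.
    Untouched records are stored once; rewritten ones once per copy.
    """
    if copies < 1 or not 0.0 <= drift <= 1.0:
        raise ValueError("need copies >= 1 and 0 <= drift <= 1")
    referenced = copies                       # in units of one image (B records)
    stored = (1.0 - drift) + copies * drift   # shared part + per-copy rewrites
    return referenced / stored


if __name__ == "__main__":
    for drift in (0.0, 0.1, 0.5, 1.0):
        print(f"10 copies, {drift:.0%} rewritten: {drifted_dedup_ratio(10, drift):.2f}x")
    # 10.00x, 5.26x, 1.82x, 1.00x
```

In this model, 10% drift roughly halves the benefit for ten copies. A snapshot + clone shares the untouched blocks the same way, without needing a dedup table in RAM.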

If you have a specific situation where dedup or dedup + compression wins over just compression for you, please let me know what that was.

3 comments:

  1. dedup is for when you have lots of VMs all running near-identical versions of Windows. All those system files dedup nicely. Same for Office and other popular applications.

  2. In what situations would that occur? If the disks are persistent, they will change enough (defrag, updates, user files, etc.) that deduplication will get worse and worse over time.
    The only thing I can think of is maybe Linux web server farms that don't host many files locally (and thus don't change often). If you have lots of nearly identical VMs of the same Windows machine, what are you using them for?
