File Server Migration to Server 2012 Part 3: Deduplication

Just to recap, in Part 1 we discussed the ways to migrate our data to a new file server and the caveats involved. In Part 2, we looked at a staged roll-out versus a single fell swoop for our migration. Here in Part 3, we'll look at the features we want to set up on our new file server. A couple of these are new or refined in Server 2012, but some have been around for a version or two. The coolest and most exciting feature is Deduplication.

Deduplication

Deduplication is a new feature in Server 2012, and it's a big one. At its most basic level, it uses an algorithm to break a file into "chunks". If multiple files contain the same chunks, your file server can keep a single copy of each common chunk, saving you space. If you really want to know more about the algorithm and how it works, there's a good explanation of it here. With Deduplication enabled on a given volume, it's not uncommon to see space savings between 20% and 60%! Of course, this depends entirely on what kind of data you're storing on that volume, so your mileage may vary.
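To make the "chunks" idea concrete, here's a minimal Python sketch of chunk-level deduplication. It's an illustration under my own assumptions, not Microsoft's implementation: it picks chunk boundaries with a simple "gear" rolling hash (so boundaries follow the content rather than fixed offsets), identifies chunks by their SHA-256 digest, and uses tiny chunk sizes so the effect shows up on a few KB of data. As I understand it, real engines, including the one in Server 2012, use much larger variable-size chunks plus compression and a lot of extra bookkeeping. The names `split_into_chunks` and `ChunkStore` are hypothetical. Requires Python 3.9+.

```python
import hashlib
import os
import random

# Hypothetical "gear" table for the rolling hash: 256 pseudo-random 32-bit values.
# Seeded so chunk boundaries are reproducible from run to run.
_rng = random.Random(2012)
GEAR = [_rng.getrandbits(32) for _ in range(256)]


def split_into_chunks(data: bytes, mask: int = 0x3F,
                      min_size: int = 16, max_size: int = 256) -> list[bytes]:
    """Content-defined chunking: cut wherever the rolling hash matches a bit pattern.

    Boundaries depend on the bytes themselves, so identical runs of data in
    different files tend to produce identical chunks.
    """
    chunks = []
    start = 0
    h = 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        size = i - start + 1
        if (size >= min_size and (h & mask) == 0) or size >= max_size:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


class ChunkStore:
    """Keeps one copy of each unique chunk; each file becomes a list of chunk digests."""

    def __init__(self) -> None:
        self.chunks: dict[str, bytes] = {}
        self.files: dict[str, list[str]] = {}

    def add_file(self, name: str, data: bytes) -> None:
        digests = []
        for chunk in split_into_chunks(data):
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)  # store only the first copy
            digests.append(digest)
        self.files[name] = digests

    def read_file(self, name: str) -> bytes:
        # Reassemble the file by following its list of chunk digests.
        return b"".join(self.chunks[d] for d in self.files[name])

    def logical_size(self) -> int:
        # What the files would occupy without deduplication.
        return sum(len(self.chunks[d]) for ds in self.files.values() for d in ds)

    def physical_size(self) -> int:
        # What the chunk store actually holds.
        return sum(len(c) for c in self.chunks.values())


if __name__ == "__main__":
    shared = os.urandom(8000)
    store = ChunkStore()
    store.add_file("report_v1.docx", shared)
    store.add_file("report_v2.docx", shared + b"one extra paragraph")
    saved = store.logical_size() - store.physical_size()
    print(f"logical={store.logical_size()} physical={store.physical_size()} saved={saved}")
```

Because the second file starts with the same bytes as the first, nearly all of its chunks are already in the store, so it adds only a few hundred bytes of new storage.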

You might be drooling at the thought of all that storage space you can free up, but there are some caveats. For example, Deduplication isn't compatible with Exchange mailbox datastores, SQL databases, removable storage devices (such as flash drives or external hard drives) or Cluster Shared Volumes (CSVs), so it's geared more towards general file storage (the kind of regular, everyday, Joe Six Pack file server most of us run). It also can't be used on operating system volumes, so if you've been close to filling up your C:\ drive, this isn't the answer to your prayers. Deduplication is also enabled for a whole volume, not for specific directories. That's probably not a deal-breaker for most people, though.
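The caveats above boil down to a short checklist. Here's a hypothetical Python sketch of it; the `Volume` fields and the `dedup_blockers` function are made up for illustration and don't correspond to any Windows API, and the list only covers the cases mentioned in this post.

```python
from dataclasses import dataclass


@dataclass
class Volume:
    """Hypothetical volume description; these fields are illustrative, not a Windows API."""
    name: str
    is_os_volume: bool = False
    is_removable: bool = False
    is_csv: bool = False
    workload: str = "files"  # e.g. "files", "exchange", "sql"


def dedup_blockers(vol: Volume) -> list[str]:
    """List the caveats from this post that rule a volume out (empty list = go ahead)."""
    reasons = []
    if vol.is_os_volume:
        reasons.append("operating system volume")
    if vol.is_removable:
        reasons.append("removable storage")
    if vol.is_csv:
        reasons.append("Cluster Shared Volume")
    if vol.workload in ("exchange", "sql"):
        reasons.append(f"{vol.workload} database workload")
    return reasons


if __name__ == "__main__":
    for vol in (Volume("C:", is_os_volume=True), Volume("E:"), Volume("F:", workload="sql")):
        blockers = dedup_blockers(vol)
        print(vol.name, "OK to dedupe" if not blockers else "skip: " + ", ".join(blockers))
```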

Enabling Deduplication on Server 2012 is actually quite simple. I've included some screenshots below:

In Server Manager, click Manage at the top, choose Add Roles and Features, and select "Role-based or feature-based installation". As you can see from this screenshot, you'll need to drill down into File and Storage Services > File and iSCSI Services to find Data Deduplication. I already have it installed, but installing roles and features in Server 2012 is pretty simple.


Once Deduplication has been installed, go to File and Storage Services > Volumes and right-click a volume that isn't your C:\ drive. In this example, I'm using the 300MB recovery volume to show you the dedupe configuration.


Once in the configuration window, you can see that setting up Deduplication is just the tick of a box. Simple, right? From here, you can also limit Deduplication to files older than a given number of days, and exclude specific directories or file types. Deduplication skips files encrypted with EFS.
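The options in that window amount to a per-file filter. This is a rough Python model of how those settings interact, assuming a file is processed only when it is old enough, isn't under an excluded folder, doesn't have an excluded extension, and isn't encrypted. `DedupPolicy` and `is_candidate` are hypothetical names, and the real engine has additional rules of its own. Requires Python 3.9+ for `PurePath.is_relative_to`.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from pathlib import PureWindowsPath


@dataclass
class DedupPolicy:
    """Hypothetical stand-in for the settings in the dedupe configuration window."""
    min_file_age_days: int
    excluded_folders: list[str] = field(default_factory=list)
    excluded_extensions: set[str] = field(default_factory=set)  # e.g. {"tmp", "iso"}


def is_candidate(path: str, modified: datetime, is_encrypted: bool,
                 policy: DedupPolicy, now: datetime) -> bool:
    """Return True if this sketch's rules would let the file be deduplicated."""
    p = PureWindowsPath(path)
    if is_encrypted:
        return False  # encrypted files are skipped, per the text above
    # Normalize extensions so "TMP", ".tmp" and "tmp" all match.
    excluded = {ext.lower().lstrip(".") for ext in policy.excluded_extensions}
    if p.suffix.lower().lstrip(".") in excluded:
        return False
    if any(p.is_relative_to(folder) for folder in policy.excluded_folders):
        return False
    # With a minimum age of 0 days, every existing file qualifies.
    return now - modified >= timedelta(days=policy.min_file_age_days)


if __name__ == "__main__":
    policy = DedupPolicy(min_file_age_days=0, excluded_folders=[r"E:\Scratch"])
    print(is_candidate(r"E:\NetAdmin\rack.vsd", datetime(2013, 1, 1), False,
                       policy, datetime.now()))
```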


If you click the Set Deduplication Schedule button, you can get pretty specific about when Dedupe jobs run. By default, Dedupe jobs run in the background most of the time and will pause when other processes need the resources more. Each Dedupe job uses a single CPU core, so enabling both schedules will use two. From what I've read of others configuring Deduplication on their file servers, leaving it set to run in the background works fine for most people.
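A schedule like this boils down to a start time, a duration, and a set of days. As a hypothetical sketch (the function name and parameters are mine, not a Windows API), checking whether a given moment falls inside such a window, including one that runs past midnight, could look like this in Python:

```python
from datetime import datetime, time, timedelta


def in_window(now: datetime, start: time, duration_hours: int, days: set[int]) -> bool:
    """Return True if `now` falls inside a window that opens at `start` on any of
    `days` (Python weekday numbers, Monday=0) and stays open for `duration_hours`."""
    if not 0 < duration_hours <= 24:
        raise ValueError("this sketch only handles windows of 1 to 24 hours")
    # A window that crosses midnight may have opened yesterday, so check both days.
    for offset in (0, 1):
        day = (now - timedelta(days=offset)).date()
        if day.weekday() not in days:
            continue
        opens = datetime.combine(day, start)
        if opens <= now < opens + timedelta(hours=duration_hours):
            return True
    return False


if __name__ == "__main__":
    weeknights = {0, 1, 2, 3, 4}
    print(in_window(datetime.now(), time(22, 0), 6, weeknights))
```

Checking the previous day as well is what lets a Friday 22:00 window still count as open at 02:00 on Saturday.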


To show some quick results and really put Dedupe to the test, I added a small extra disk of about 10GB to my test VM. I then copied over our entire Network Administration share, a nice 7.5GB collection of random Office docs, PDFs, Visio diagrams, WAV files, etc., onto this new 10GB volume. Once the data was transferred, I right-clicked the volume, enabled Deduplication, set it to deduplicate files older than 0 days (which should dedupe everything, since everything is older than 0 days), and nothing happened.

In all honesty, I was expecting a progress bar or something, but got nothing. Even checking for dedup jobs manually with the Get-DedupJob cmdlet in PowerShell turned up nothing. To actually get Dedupe working, I had to open a PowerShell window as Administrator, manually start a dedup job for that volume with Start-DedupJob, and then check on the status of that job as shown below.
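If you'd rather script those manual steps, here's a small Python sketch that builds the PowerShell calls and only executes them on Windows. Start-DedupJob, Get-DedupJob and Get-DedupStatus are cmdlets from the Deduplication module; the helper names, the drive-letter check and the choice of `-Type Optimization` are my own assumptions. It still has to be run from an elevated (Administrator) session.

```python
import platform
import re
import subprocess

POWERSHELL = ["powershell.exe", "-NoProfile", "-Command"]


def dedup_commands(volume: str) -> list[list[str]]:
    """Build the PowerShell calls to start an optimization job and check on it."""
    # Only accept a bare drive letter so nothing else ends up in the command string.
    if not re.fullmatch(r"[A-Za-z]:", volume):
        raise ValueError(f"expected a drive letter like 'E:', got {volume!r}")
    return [
        POWERSHELL + [f"Start-DedupJob -Volume '{volume}' -Type Optimization"],
        POWERSHELL + [f"Get-DedupJob -Volume '{volume}'"],
        POWERSHELL + [f"Get-DedupStatus -Volume '{volume}'"],
    ]


def run(commands: list[list[str]]) -> None:
    """Execute each command in turn; only meaningful on Windows Server with the
    Data Deduplication feature installed and an elevated session."""
    for cmd in commands:
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
        print(result.stdout)


if __name__ == "__main__":
    cmds = dedup_commands("E:")
    if platform.system() == "Windows":
        run(cmds)
    else:
        for cmd in cmds:
            print(" ".join(cmd))  # just show what would be run
```

Once the job completes, Get-DedupStatus should report the space saved on the volume.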


Once it finished (which took about 10 to 15 minutes, by the way), I popped back over to Server Manager, refreshed, and saw that I'd saved 1.79GB out of 7.5GB total. That's a savings of roughly 24%!


If I saved that much on 7.5GB of data, I can't wait to see what it can do with 5TB+! Next up, File Screening.
