Last updated: Sep 24, 2021

What is Data Deduplication from Microsoft?

Data deduplication is the process of finding and removing duplicate data on a volume without compromising data integrity. Deduplication has two goals:

  • Store information in small blocks (32-128 KB)
  • Identify identical blocks and keep only one copy of each (duplicates are usually replaced by links to the single copy, and the blocks may also be compressed)

The technology is thus aimed at optimizing storage capacity.
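
As a rough illustration of the two goals above, the following Python sketch splits data into blocks and keeps one copy of each distinct block. It is a toy model, not Microsoft's implementation: the fixed 32 KB block size, the SHA-256 fingerprint, and the names `BLOCK_SIZE` and `deduplicate` are all assumptions made for the example.

```python
import hashlib

BLOCK_SIZE = 32 * 1024  # assumed fixed block size; the article gives a 32-128 KB range


def deduplicate(data: bytes, block_size: int = BLOCK_SIZE):
    """Split data into blocks and keep one copy of each distinct block.

    Returns (store, links): store maps a fingerprint to the single saved
    copy, links lists the fingerprints needed to rebuild the data in order.
    """
    store = {}
    links = []
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        fingerprint = hashlib.sha256(block).hexdigest()
        store.setdefault(fingerprint, block)  # a duplicate block is not saved again
        links.append(fingerprint)             # the duplicate becomes a link
    return store, links


# Four blocks of input, but only two distinct block contents.
data = (b"A" * BLOCK_SIZE) * 3 + b"B" * BLOCK_SIZE
store, links = deduplicate(data)
stored_bytes = sum(len(block) for block in store.values())
print(len(links), "blocks referenced,", len(store), "blocks stored")
print("bytes before:", len(data), "bytes after:", stored_bytes)
```

Joining `store[f]` for every fingerprint `f` in `links` gives back the original data, which is why removing the extra copies does not harm integrity.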

However, this feature is only available on Windows Server, on NTFS volumes (starting with Windows Server 2012) or ReFS volumes (starting with Windows Server, version 1709, and Windows Server 2019).

How does it work?

  1. Deduplication is based on two important principles.

First, all data is written to disk in its original form, and only then does the deduplication process take place. That is, eliminating duplicates and storing data happen independently of each other.

Second, neither users nor programs working with optimized volumes are aware of the optimization. That is, file access remains unchanged.

  2. If deduplication is enabled on a computer, it starts automatically each time the computer is turned on, according to the configured settings.

During optimization, files are divided into blocks, matching blocks are identified, and the extra copies are deleted and replaced with links. The remaining blocks are grouped into containers, which, depending on the settings, are additionally compressed and placed in the block store.

  3. After optimization, reads of a file are redirected through a filter.

The filter redirects each read operation to the corresponding blocks in the block store, which together make up the stream for the file. Modifications to deduplicated file ranges are written to disk unoptimized; they are optimized the next time the deduplication process runs.
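
The write, optimize, and read behaviour described above can be sketched as a small Python model. This is a simplified illustration under stated assumptions, not the real on-disk format: the class `ToyVolume`, the fixed 128 KB chunk size, SHA-256 fingerprints, `zlib` compression, and whole-file (rather than range-level) rewrites are all hypothetical choices made for the example.

```python
import hashlib
import zlib

CHUNK_SIZE = 128 * 1024  # assumed fixed chunk size (the upper bound given in the article)


class ToyVolume:
    """Toy model of a deduplicated volume; not the real on-disk format."""

    def __init__(self):
        self.raw = {}          # file name -> bytes written in original form
        self.stream_maps = {}  # file name -> ordered list of chunk fingerprints
        self.chunk_store = {}  # fingerprint -> compressed chunk

    def write(self, name, data):
        # Writes always land unoptimized; any earlier optimized form is superseded.
        self.raw[name] = data
        self.stream_maps.pop(name, None)

    def optimize(self):
        # Separate pass, run later: chunk, fingerprint, store each chunk once.
        for name, data in list(self.raw.items()):
            stream_map = []
            for offset in range(0, len(data), CHUNK_SIZE):
                chunk = data[offset:offset + CHUNK_SIZE]
                fingerprint = hashlib.sha256(chunk).hexdigest()
                if fingerprint not in self.chunk_store:
                    self.chunk_store[fingerprint] = zlib.compress(chunk)
                stream_map.append(fingerprint)
            self.stream_maps[name] = stream_map
            del self.raw[name]  # the original copy is replaced by links

    def read(self, name):
        # Plays the role of the filter: callers get the same bytes either way.
        if name in self.raw:
            return self.raw[name]
        return b"".join(zlib.decompress(self.chunk_store[fp])
                        for fp in self.stream_maps[name])


volume = ToyVolume()
payload = b"".join(bytes([i]) * CHUNK_SIZE for i in range(4))  # four distinct chunks
volume.write("a.bin", payload)
volume.write("b.bin", payload)  # an exact duplicate of a.bin
volume.optimize()
print("chunks stored:", len(volume.chunk_store))
print("read unchanged:", volume.read("b.bin") == payload)

volume.write("b.bin", payload + b"tail")  # the modification stays unoptimized...
print("pending optimization:", "b.bin" in volume.raw)
volume.optimize()                          # ...until the next optimization run
print("pending optimization:", "b.bin" in volume.raw)
```

The caller of `read` never needs to know whether a file has been optimized, which mirrors the second principle. The model does not clean up chunks that are no longer referenced.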

Files larger than 32 KB are deduplicated; anything smaller is not affected by the deduplication process.

When deduplication is in progress, the file is broken into pieces of no more than 128 KB; each piece is called a chunk.
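
The two size limits above lend themselves to a small worked calculation. The sketch below takes the limits exactly as stated in the text (files must be larger than 32 KB; a chunk is at most 128 KB). The function names are illustrative, and the chunk count is only a lower bound, since chunks may be smaller than the maximum.

```python
import math

MIN_FILE_SIZE = 32 * 1024    # files must be larger than this to be deduplicated
MAX_CHUNK_SIZE = 128 * 1024  # a chunk is no larger than this


def is_eligible(file_size: int) -> bool:
    """True if a file of this size (in bytes) is large enough to be deduplicated."""
    return file_size > MIN_FILE_SIZE


def min_chunk_count(file_size: int) -> int:
    """Fewest chunks an eligible file can be split into; 0 if not eligible."""
    if not is_eligible(file_size):
        return 0
    return math.ceil(file_size / MAX_CHUNK_SIZE)


for size in (10 * 1024, 32 * 1024, 100 * 1024, 1024 * 1024):
    print(size, is_eligible(size), min_chunk_count(size))
```

For example, a 1 MB file is split into at least eight chunks, while a 10 KB file is left untouched.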

The results of deduplication (the chunk store) are kept entirely in the “System Volume Information” folder in the root of the disk.

The bottom line is that the data on the disk is indexed and duplicates are not kept. The data is therefore stored in a non-standard way, resembling an archive, and cannot be read by a conventional NTFS/ReFS reader.

Where can I use data deduplication?

Deduplication can be used for:

  • General-purpose file servers. Background optimization is enabled for this type.
  • Virtualized backup servers, for example Microsoft Data Protection Manager. Background and priority optimization are enabled by default for this type.
  • Hyper-V virtual machines. Background and priority optimization are enabled for this type as well.

How do I install Data Deduplication?

This is usually done in either PowerShell or Server Manager.

  1. If you prefer to use PowerShell, open a PowerShell window and enter:

Install-WindowsFeature -Name FS-Data-Deduplication

You will see a progress bar for the installation, and within about a minute the component will be installed on your OS.

  2. You can also install the feature through the File and Storage Services role in Server Manager.

Open Server Manager. In the upper right-hand corner, select “Manage”, then “Add Roles and Features”. In the wizard, go to Server Roles and, under File and Storage Services, select Data Deduplication.

Review the Confirmation page and click the Install button.

After installation, you can check the effectiveness of deduplication.

How do I turn on data deduplication?

This is also done using either PowerShell or Server Manager.

  1. For PowerShell:

In a PowerShell window opened as administrator, type the command:

(Screenshot: how to enable deduplication with PowerShell.)

Press Enter, and the operation is complete.

  2. For Server Manager:

Open Server Manager and click on “File and Storage Services”. Next, select “Volumes” and right-click on the volume where you want to enable data deduplication.

Select Configure Data Deduplication.

Check “Enable data deduplication” and change the schedule as you like.

Next, click Apply and then OK.

