Dedupe a dataset

<< Click to Display Table of Contents >>

Navigation:  How do I? >

Dedupe a dataset

There are 2 transforms for removing duplicate rows from datasets:

Dedupe removes duplicate rows, keeping only the first row in each group that it considers to be duplicates of each other

Unique creates a single aggregate row from each group that it considers to be duplicates of each other

 

Dedupe is simpler. Unique has more flexibility.

 

Using Dedupe

 

To keep only the first row with each Customer Id from this dataset:

 

dedupe example

 

To get this dataset:

 

deduped example

 

Drag the dataset file onto the Center pane of Easy Data Transform. Then click the Dedupe transform in the Left pane.

 

how to dedupe data

 

Then check the Customer Id column in the Right pane.

 

dedupe dataset

 

Only the first row with each Customer Id is kept. Use Sort if you want to change the order before the Dedupe transform.

 

If you only want to remove rows with the same Customer Id and Product Id, check both the Customer Id and Product Id columns.

 

If you want to use fuzzy matching to also remove rows that are similar, but not identical, set Matching to Fuzzy.

 

To see what rows will be removed and experiment with different options, click the Explore Duplicates... button.

 

visualize duplicates

 

Note that Dedupe takes account of whitespace. So you might need a Whitespace transform before the Dedupe transform.

 

See also the Dedupe documentation.

 

Using Unique

 

To aggregate all rows with the same Customer Id:

 

dedupe example

 

To get this dataset with:

The first Name for that Customer Id

A comma separated list of Product Ids for that Customer Id

The total Cost for that Customer Id

The last Date for that Customer Id

 

aggregate rows example

 

Drag the dataset file onto the Center pane of Easy Data Transform. Then click the Unique transform in the Left pane (if you can't see Unique, then check show advanced is checked in the Left pane).

 

unique transform

 

Then set the Unique options as shown below:

 

aggregate unique example

 

One aggregated row is created for each Customer Id. Use Sort if you want to change the order before the Unique transform.

 

If you only want to create a new aggregate row for rows with the same Customer Id and Product Id, set both the Customer Id and Product Id columns to Keep unique.

 

Note that Unique takes account of whitespace. So you might need a Whitespace transform before the Unique transform.

 

See also the Unique documentation.

 

See also:

Video: How to remove Excel duplicates

Video: How to merge rows in Excel