Dedupe a dataset


To remove duplicate entries from a dataset, use the Unique transform. For example, to keep only one of the 2 rows that have the same email in this dataset:

 

[Image: dedupe example]

 

To get this dataset:

 

[Image: deduped example]

 

Drag the dataset file onto the Center pane of Easy Data Transform.

 

[Image: dedupe excel sheet]

 

Select the dataset then click the Unique transform in the Left pane.

 

[Image: dedupe example]

 

Set the Email column to Keep unique in the Right pane. Set the First and Last columns to Keep first.

 

[Image: dedupe example]

 

Only one row is kept for each email. The first and last names are taken from the first row with that email in the current row order. Use the Sort transform before Unique if you want to change which row is kept.
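
For readers who think in code, the behaviour described above can be approximated in Python. This is only a sketch of the logic, not how Easy Data Transform is implemented. The function name dedupe_keep_first and the sample rows are invented for illustration; only the column names First, Last and Email come from the example.

```python
def dedupe_keep_first(rows, key_columns):
    """Keep the first row seen for each distinct combination of key_columns."""
    seen = set()
    kept = []
    for row in rows:
        # The key plays the role of the "Keep unique" columns.
        key = tuple(row[col] for col in key_columns)
        if key not in seen:
            seen.add(key)
            # Other columns come from the first occurrence ("Keep first").
            kept.append(row)
    return kept


# Hypothetical sample data: the first and third rows share an email.
rows = [
    {"First": "Alice", "Last": "Smith", "Email": "alice@example.com"},
    {"First": "Bob", "Last": "Jones", "Email": "bob@example.com"},
    {"First": "Ali", "Last": "Smith", "Email": "alice@example.com"},
]

deduped = dedupe_keep_first(rows, ["Email"])
for row in deduped:
    print(row["First"], row["Last"], row["Email"])
```

Because the first occurrence wins, sorting the rows before calling the function changes which of the duplicate rows survives, which mirrors the advice to use Sort first.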

 

If you only want to remove rows that have the same first name, the same last name and the same email, set the First, Last and Email columns to Keep unique.
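
As a rough illustration of treating all three columns as the key, the following Python sketch removes a row only when First, Last and Email all match an earlier row. It is an analogy rather than the tool's actual implementation, and the sample rows are hypothetical.

```python
# Hypothetical sample data: rows 1 and 3 are identical in all three columns,
# while row 2 shares only the last name and email with them.
rows = [
    {"First": "Alice", "Last": "Smith", "Email": "alice@example.com"},
    {"First": "Ali", "Last": "Smith", "Email": "alice@example.com"},
    {"First": "Alice", "Last": "Smith", "Email": "alice@example.com"},
]

key_columns = ["First", "Last", "Email"]

# A dict preserves insertion order, so setdefault keeps the first row per key.
first_by_key = {}
for row in rows:
    key = tuple(row[col] for col in key_columns)
    first_by_key.setdefault(key, row)

deduped_all = list(first_by_key.values())
for row in deduped_all:
    print(row["First"], row["Last"], row["Email"])
```

Here the row for "Ali" is kept, because it differs in the First column even though its email matches.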

 

Note that Unique takes whitespace and case into account when comparing values, so values that differ only in spacing or capitalization are treated as different. You might need to apply the Whitespace and Case transforms before deduping.
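
The effect of whitespace and case on deduplication can be sketched in Python as follows. The normalise helper is hypothetical and assumes that cleaning means trimming leading and trailing whitespace and converting to lower case; the options offered by the Whitespace and Case transforms may differ.

```python
# Hypothetical emails: the first two differ only in whitespace and case.
emails = [" Bob@Example.com", "bob@example.com ", "carol@example.com"]


def normalise(value):
    """Assumed clean-up: trim surrounding whitespace and lower-case the text."""
    return value.strip().lower()


# dict.fromkeys removes repeats while keeping the first occurrence of each value.
raw_unique = list(dict.fromkeys(emails))
clean_unique = list(dict.fromkeys(normalise(e) for e in emails))

print(len(raw_unique))    # the raw values are all distinct
print(len(clean_unique))  # after clean-up the first two values match
```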

 

See the Unique documentation for a more detailed example.