Dedupe a dataset

If you want to remove duplicate entries from a dataset, use the Dedupe transform. For example, to remove the 2 rows that have the same email from this dataset:

 

data-dedupe-1

 

To get this dataset:

 

data-dedupe-2

 

Drag the dataset file onto the Center pane of Easy Data Transform.

 

dedupe-1

 

Select the dataset then click the Dedupe transform in the Left pane.

 

dedupe-2

 

Check Email in the Right pane to remove rows with duplicate emails.

 

dedupe-3

Only the first row with a particular email is kept. Use Sort if you want to change the order before removing duplicates.
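For readers who want to see the keep-first rule spelled out, here is a minimal Python sketch. It is only an illustration of the behavior described above, not how Easy Data Transform is implemented; the function name `dedupe_keep_first`, the sample rows, and the `First` column are assumptions made for the example.

```python
def dedupe_keep_first(rows, key_columns):
    """Return rows with duplicates removed, keeping the first row seen for each key."""
    seen = set()
    kept = []
    for row in rows:
        # Build the comparison key from the selected columns only.
        key = tuple(row[col] for col in key_columns)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


# Hypothetical sample data: two rows share the same email.
rows = [
    {"First": "Ann", "Last": "Lee", "Email": "ann@example.com"},
    {"First": "Bob", "Last": "Ray", "Email": "bob@example.com"},
    {"First": "Anne", "Last": "Lee", "Email": "ann@example.com"},
]

# Dedupe in the original order: the first row for each email survives.
deduped = dedupe_keep_first(rows, ["Email"])

# Sorting first changes which row counts as "first" for each email.
rows_sorted = sorted(rows, key=lambda r: r["First"], reverse=True)
sorted_then_deduped = dedupe_keep_first(rows_sorted, ["Email"])

print([r["First"] for r in deduped])
print([r["First"] for r in sorted_then_deduped])
```

In this sketch the unsorted run keeps "Ann", while sorting by `First` in descending order beforehand keeps "Anne" instead, which is the effect of placing a Sort before the Dedupe.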

 

If you want to remove only rows that have both the same last name and the same email, check both the Email and Last checkboxes.
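The difference between checking one column and checking two can be sketched in Python as a comparison on a combined key. This is an illustrative sketch under assumed data, not Easy Data Transform's own code; `dedupe_on`, the column indexes, and the sample rows are hypothetical.

```python
# Hypothetical rows as (First, Last, Email) tuples.
rows = [
    ("Ann", "Lee", "lee@example.com"),
    ("Bob", "Lee", "lee@example.com"),  # same Last and Email as the first row
    ("Cat", "Ray", "lee@example.com"),  # same Email, different Last
]
LAST, EMAIL = 1, 2  # column positions in each tuple


def dedupe_on(rows, columns):
    """Keep the first row for each distinct combination of the given columns."""
    seen = set()
    result = []
    for row in rows:
        key = tuple(row[i] for i in columns)
        if key in seen:
            continue  # a row with this combination was already kept
        seen.add(key)
        result.append(row)
    return result


by_email = dedupe_on(rows, [EMAIL])
by_last_and_email = dedupe_on(rows, [LAST, EMAIL])

print(by_email)
print(by_last_and_email)
```

With only the email as the key, just the first row remains. With last name and email together, the "Ray" row is also kept because its combination of values is distinct.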

 

Note that deduping is sensitive to whitespace and case, so values that differ only in whitespace or case are not treated as duplicates. You might need to apply the Trim and Case transforms before the Dedupe transform.
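The effect of whitespace and case on duplicate detection can be illustrated with a short Python sketch. Here `str.strip()` and `str.lower()` are stand-ins for trimming and case conversion; they are assumptions for the example and may not match every option of the Trim and Case transforms. The sample emails are hypothetical.

```python
# Hypothetical values: the first two differ only in case and trailing whitespace.
emails = ["Ann@Example.com", "ann@example.com ", "bob@example.com"]


def dedupe(values):
    """Remove exact duplicates, keeping the first occurrence and the original order."""
    return list(dict.fromkeys(values))


# Exact comparison: nothing is removed, because no two strings are identical.
exact = dedupe(emails)

# Trim whitespace and lower the case first, then dedupe.
normalized = dedupe(e.strip().lower() for e in emails)

print(len(exact))
print(normalized)
```

Without normalization all three values survive. After trimming and lowercasing, the two variants of the same address collapse into one.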