Dedupe a dataset


If you want to remove duplicate entries from a dataset, use the Dedupe transform. For example, to remove the 2 rows that have the same email from this dataset:

 

data-dedupe-1

 

To get this dataset:

 

data-dedupe-2

 

Drag the dataset file onto the Center pane of Easy Data Transform.

 

dedupe-1

 

Select the dataset then click the Dedupe transform in the Left pane.

 

dedupe-2

 

Check Email in the Right pane to remove rows with duplicate emails.

 

dedupe-3

Only the first row with a particular email is kept.

 

If you want to remove only rows that have both the same last name and the same email, check both the Email and Last checkboxes.
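The keep-the-first-row behavior can be illustrated outside the application with a short Python sketch. This is not how Easy Data Transform is implemented; it only mirrors the rule described above. The `dedupe` function, the `First` column, and the sample rows are made up for illustration; only the `Email` and `Last` column names come from the example.

```python
def dedupe(rows, key_columns):
    """Keep only the first row for each distinct combination of key_columns."""
    seen = set()
    kept = []
    for row in rows:
        # Build the comparison key from the checked columns only.
        key = tuple(row[col] for col in key_columns)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


# Hypothetical sample data, not the dataset shown in the screenshots.
rows = [
    {"First": "Alice", "Last": "Smith", "Email": "alice@example.com"},
    {"First": "Bob", "Last": "Jones", "Email": "bob@example.com"},
    {"First": "Al", "Last": "Smith", "Email": "alice@example.com"},
    {"First": "Alice", "Last": "Brown", "Email": "alice@example.com"},
]

# Equivalent to checking only Email: one row per email survives.
by_email = dedupe(rows, ["Email"])

# Equivalent to checking Email and Last: a row is dropped only if both match.
by_email_and_last = dedupe(rows, ["Email", "Last"])
```

With only `Email` as the key, the third and fourth rows are dropped. With both `Email` and `Last`, the fourth row is kept because its last name differs.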

 

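Note that de-duplication is sensitive to whitespace and case, so values that differ only in leading or trailing spaces, or in capitalization, are not treated as duplicates. You might therefore need to apply Trim and Case transforms before the Dedupe transform.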
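The effect of whitespace and case on duplicate detection can be seen in a small Python sketch. It is an analogy only: `strip()` and `lower()` stand in for the Trim and Case transforms, whose actual options may differ, and the `normalized_key` helper and sample values are hypothetical.

```python
def normalized_key(value):
    """Remove surrounding whitespace and lower-case the value before comparing."""
    return value.strip().lower()


# Hypothetical values: the first two differ only in whitespace and case.
emails = ["alice@example.com", " Alice@Example.com ", "bob@example.com"]

# Exact comparison: all three values are distinct, so nothing is removed.
exact = list(dict.fromkeys(emails))

# Comparison after normalizing: the first two collapse into one value.
normalized = list(dict.fromkeys(normalized_key(e) for e in emails))
```

Here `exact` still holds three values, while `normalized` holds two, which is why cleaning the column first changes which rows count as duplicates.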