Clean a dataset

Data is often 'dirty' and needs to be cleaned up before further processing. You can quickly find unwanted characters by looking in the Characters tab of the Right pane, which shows what types of characters occur in which columns.

 

[Screenshot: data cleaning and profiling]

 

Note that some of these categories are non-exclusive. For example, a single space might be counted as a space, as whitespace and as a leading space, and a symbol might also be counted as a non-ASCII character.
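
To illustrate why the counts can overlap, here is a minimal Python sketch that returns every category matching a single character. It is not how the Characters tab is implemented; the function name `char_categories`, the category labels and the rules (for example, treating Unicode category `S*` as "symbol") are assumptions made for illustration only.

```python
import unicodedata


def char_categories(text, index):
    """Return every (hypothetical) category that applies to text[index]."""
    ch = text[index]
    categories = set()
    if ch == " ":
        categories.add("space")
        # A space is "leading" if only spaces come before it.
        if text[:index].strip(" ") == "":
            categories.add("leading space")
    if ch.isspace():
        categories.add("whitespace")
    if ord(ch) > 127:
        categories.add("non-ASCII")
    # Assumption: "symbol" means a Unicode symbol category (Sc, Sm, Sk, So).
    if unicodedata.category(ch).startswith("S"):
        categories.add("symbol")
    return categories


# One leading space falls into three categories at once.
print(char_categories("  abc", 0))
# The euro sign is both a symbol and a non-ASCII character.
print(char_categories("\u20ac5", 0))
```

Because one character can match several rules, the per-category counts for a column can add up to more than the number of characters in it.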

 

Hover over a cell for more details.

 

[Screenshot: data profile details]

 

You can turn the colored bars off.

 

[Screenshot: data profile chart]

 

You can also restrict the character profiling to a subset of rows, which improves speed on large datasets.

 

[Screenshot: data profile sampling]
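
The trade-off behind sampling can be sketched in Python as below. This is only an illustration under stated assumptions: the function name `profile_non_ascii`, the `sample_size` and `seed` parameters, and the use of a random sample are all hypothetical, and the application may choose its subset of rows differently.

```python
import random
from collections import Counter


def profile_non_ascii(rows, sample_size=None, seed=0):
    """Count non-ASCII characters per column, optionally on a sample of rows.

    rows is a list of dicts mapping column name to a string value.
    """
    if sample_size is not None and sample_size < len(rows):
        # Assumption: a random sample; other strategies (first N rows) also work.
        rows = random.Random(seed).sample(rows, sample_size)
    counts = Counter()
    for row in rows:
        for column, value in row.items():
            counts[column] += sum(1 for ch in value if ord(ch) > 127)
    return counts


rows = [
    {"name": "Jos\u00e9", "city": "Z\u00fcrich"},
    {"name": "Ann", "city": "Oslo"},
]
print(profile_non_ascii(rows))                 # profiles every row
print(profile_non_ascii(rows, sample_size=1))  # faster, but may miss characters
```

A sample is quicker to scan, but counts from a sample are a lower bound: a character type that only occurs in unsampled rows will not be reported.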

 

You can use the Characters tab in conjunction with the Replace and Whitespace transforms to quickly clean your data. For example, Replace can remove non-ASCII characters, symbols, etc. by replacing them with nothing:

 

[Screenshot: cleaning data]
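
For readers who want to see the idea of "replacing with nothing" outside the application, here is a minimal Python sketch. It is not the Replace or Whitespace transform itself; the helper names `remove_non_ascii` and `clean_whitespace` are hypothetical, and the whitespace rule shown (trim the ends and collapse runs to a single space) is just one plausible cleanup.

```python
import re

# Matches any character outside the 7-bit ASCII range.
NON_ASCII = re.compile(r"[^\x00-\x7F]")


def remove_non_ascii(value):
    """Replace every non-ASCII character with nothing."""
    return NON_ASCII.sub("", value)


def clean_whitespace(value):
    """Trim leading/trailing whitespace and collapse inner runs to one space."""
    return " ".join(value.split())


raw = "  caf\u00e9   \u20ac5 "
print(clean_whitespace(remove_non_ascii(raw)))
```

Removing characters can leave extra spaces behind, which is why a whitespace cleanup is applied after the replacement in this sketch.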

 

 

See also:

Video: How to clean data

Add missing data values

Profile a dataset