Clean a dataset


Data is often 'dirty' and needs to be cleaned up before further processing. You can quickly find unwanted characters in the Characters tab of the Right pane, which shows what types of characters occur in which columns.


[Image: data cleaning and profiling]


Note that some of these categories overlap. For example, a space might be counted as a space, as whitespace and as a leading space, and a symbol might also be a non-ASCII character.
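To make the overlap concrete, here is a minimal Python sketch of per-column character profiling. It is not how the software itself works: the category names and rules (`space`, `whitespace`, `leading space`, `letter`, `digit`, `symbol`, `non-ASCII`) and the helpers `categories`, `profile_column` and `profile_table` are hypothetical, chosen only to show how a single character can be counted in several categories at once.

```python
from collections import Counter


def categories(ch: str) -> set[str]:
    """Return every (hypothetical) category that a single character belongs to.

    The categories deliberately overlap: a space is also whitespace, and a
    symbol such as the copyright sign is also non-ASCII.
    """
    cats = set()
    if ch == " ":
        cats.add("space")
    if ch.isspace():
        cats.add("whitespace")
    if ch.isalpha():
        cats.add("letter")
    if ch.isdigit():
        cats.add("digit")
    if not ch.isalnum() and not ch.isspace():
        cats.add("symbol")
    if ord(ch) > 127:
        cats.add("non-ASCII")
    return cats


def profile_column(values: list[str]) -> Counter:
    """Count how many characters in a column fall into each category."""
    counts: Counter = Counter()
    for value in values:
        # Spaces before the first non-space character are also "leading spaces".
        leading = len(value) - len(value.lstrip(" "))
        for i, ch in enumerate(value):
            cats = categories(ch)
            if i < leading:
                cats.add("leading space")
            counts.update(cats)  # each category this character is in gets +1
    return counts


def profile_table(columns: dict[str, list[str]]) -> dict[str, Counter]:
    """Profile every column of a table given as {column name: list of values}."""
    return {name: profile_column(values) for name, values in columns.items()}


profile = profile_table({"name": ["  caf\u00e9", "a-b\u00a9"], "id": ["42"]})
for column, counts in profile.items():
    print(column, dict(counts))
```

In this example the two leading spaces in the first value are each counted as a space, as whitespace and as a leading space, while the accented `é` counts as both a letter and non-ASCII, and the copyright sign counts as both a symbol and non-ASCII.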


Hover over a cell for more details.


[Image: data profile details]


You can turn the colored bars off.


[Image: data profile chart]


You can also restrict profiling to a sample of rows, which improves speed on large datasets.
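The idea behind sampling can be sketched in a few lines of Python. This page doesn't say how the sampled rows are chosen, so this sketch simply assumes a reproducible random sample; `sample_rows` and its `max_rows` and `seed` parameters are hypothetical names.

```python
import random


def sample_rows(rows: list[dict[str, str]], max_rows: int, seed: int = 0) -> list[dict[str, str]]:
    """Return at most max_rows rows for profiling, chosen at random.

    A fixed seed makes the sample reproducible, and sorting the chosen
    indices keeps the rows in their original order.
    """
    if len(rows) <= max_rows:
        return list(rows)
    rng = random.Random(seed)
    chosen = sorted(rng.sample(range(len(rows)), max_rows))
    return [rows[i] for i in chosen]


rows = [{"id": str(n), "name": f"item {n}"} for n in range(10_000)]
sample = sample_rows(rows, max_rows=1_000)
print(len(sample))  # 1000
```

The trade-off is that characters which only occur in rows outside the sample won't show up in the profile, so a sampled profile can miss rare problem characters.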


[Image: data profile sampling]


You can use the Characters tab together with the Replace and Whitespace transforms to quickly clean your data. For example, Replace can remove non-ASCII characters, symbols, etc. by replacing them with nothing:
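For comparison, the same clean-up can be sketched in Python. This is a hypothetical equivalent, not the Replace or Whitespace transform itself: here a 'symbol' is assumed to be any character that is not a letter, digit or whitespace, and the whitespace step is assumed to trim both ends and collapse internal runs to a single space.

```python
import re

# Matches any character outside the 7-bit ASCII range.
NON_ASCII = re.compile(r"[^\x00-\x7F]")


def remove_non_ascii(value: str) -> str:
    """Replace every non-ASCII character with nothing."""
    return NON_ASCII.sub("", value)


def remove_symbols(value: str) -> str:
    """Keep only letters, digits and whitespace (assumed definition of 'not a symbol')."""
    return "".join(ch for ch in value if ch.isalnum() or ch.isspace())


def clean_whitespace(value: str) -> str:
    """Trim both ends and collapse each internal run of whitespace to one space."""
    return " ".join(value.split())


def clean(value: str) -> str:
    """Apply the three steps in order: non-ASCII, symbols, then whitespace."""
    return clean_whitespace(remove_symbols(remove_non_ascii(value)))


print(clean("  Price: $42\u2122  "))  # Price 42
```

Be aware that removing non-ASCII characters also removes accented letters, so `café` becomes `caf`. Checking the Characters tab first shows what you would lose.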


[Image: cleaning data]



See also:

Video: How to clean data

Add missing data values

Profile a dataset