Clean a dataset
Data is often 'dirty' and needs to be cleaned up before further processing. You can quickly find unwanted characters by looking in the Characters tab of the Right pane to see what types of characters occur in which columns.
Note that some of these categories are non-exclusive. For example: a space might be counted as a space, whitespace and a leading space, and a symbol might also be a non-ASCII character.
Hover over a cell for more details.
You can turn the colored bars off.
You can also restrict the Characters tab to sampling only a subset of rows, which improves speed on large datasets.
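The overlap between categories noted above can be illustrated outside the application with a short Python sketch. This is not how the Characters tab is implemented. The function name `categories_for` and the exact category rules (for example, treating any Unicode "S*" general category as a symbol) are assumptions made for illustration only.

```python
import unicodedata


def categories_for(text: str, index: int) -> set[str]:
    """Return every (hypothetical) category the character at text[index] falls into."""
    ch = text[index]
    found = set()
    if ch == " ":
        found.add("space")
        # Assumption: a leading space is one preceded only by other spaces.
        if text[:index].strip(" ") == "":
            found.add("leading space")
    if ch.isspace():
        found.add("whitespace")
    # Assumption: Unicode general categories starting with "S" count as symbols.
    if unicodedata.category(ch).startswith("S"):
        found.add("symbol")
    if ord(ch) > 127:
        found.add("non-ASCII")
    return found


# A single leading space lands in three categories at once.
print(sorted(categories_for(" abc", 0)))
# The euro sign is both a symbol and a non-ASCII character.
print(sorted(categories_for("5 €", 2)))
```

Because one character can fall into several categories, the counts shown for a column are not expected to add up to the total number of characters.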
You can use the Characters tab in conjunction with the Replace and Whitespace transforms to quickly clean your data. For example, Replace can remove non-ASCII characters, symbols, etc. by replacing them with nothing.
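For readers who want to see the idea of "replacing with nothing" in code, here is a minimal Python sketch. It is only an analogy and does not reflect how the Replace or Whitespace transforms work internally. The function names `remove_non_ascii` and `clean_whitespace` are hypothetical, and the whitespace rule used here (trim both ends, collapse runs to a single space) is an assumption.

```python
import re

# Matches any character outside the 7-bit ASCII range.
NON_ASCII = re.compile(r"[^\x00-\x7F]")


def remove_non_ascii(value: str) -> str:
    """Replace every non-ASCII character with nothing (an empty string)."""
    return NON_ASCII.sub("", value)


def clean_whitespace(value: str) -> str:
    """Assumed rule: collapse whitespace runs to one space, then trim both ends."""
    return re.sub(r"\s+", " ", value).strip()


rows = ["  café €5 ", "naïve   text"]
cleaned = [clean_whitespace(remove_non_ascii(row)) for row in rows]
print(cleaned)
```

Removing characters can leave stray spaces behind, which is why the sketch applies the whitespace cleanup after the replacement.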
You might also want to profile a dataset as part of cleaning.
See also: