How to find outliers in data


Datasets often contain statistical outliers, particularly when the data is collected from self-reported sources, such as online surveys. These outliers can badly skew the results, so they need to be dealt with before the dataset can be used. Easy Data Transform can help you find and deal with statistical outliers.

The simplest way to find outliers is to profile the data by column. This lets you spot suspicious-looking minimum and maximum values. It also shows any non-numeric values.


You can then edit the original data or use a Replace transform (to replace problem values) or a Filter transform (to remove problem rows).
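For readers who want to see what this kind of column profile amounts to, here is a minimal Python sketch. It is not how Easy Data Transform works internally; the function name profile_column and the sample ages column are hypothetical, chosen only for illustration.

```python
def profile_column(values):
    """Report the minimum, maximum and non-numeric entries of one column.

    Hypothetical helper for illustration only. Each value is treated as
    numeric if Python's float() can parse it.
    """
    numeric, non_numeric = [], []
    for raw in values:
        try:
            numeric.append(float(raw))
        except (TypeError, ValueError):
            non_numeric.append(raw)
    return {
        "min": min(numeric) if numeric else None,
        "max": max(numeric) if numeric else None,
        "non_numeric": non_numeric,
    }


# Made-up survey column: one impossible age and one non-numeric entry.
ages = ["34", "29", "41", "n/a", "38", "350", "27"]
profile = profile_column(ages)
print(profile)
```

A maximum of 350 for an age column is the kind of suspicious value that profiling makes easy to spot, and the non-numeric entry is listed separately so it can be replaced or filtered out.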

The Outliers transform offers a more sophisticated approach to finding statistical outliers, based on either the IQR (interquartile range) or the standard deviation of the data in a column. Check show advanced in the Left pane to show the Outliers transform.


Select the column of data you want to analyze for outliers. Set Score by according to whether you want to score values based on the IQR or the standard deviation. Set Threshold according to how wide a range you expect the data to have (this determines where the upper and lower fences are).
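To make the two scoring approaches concrete, the sketch below shows the conventional textbook versions in Python: fences placed a threshold multiple of the IQR beyond the quartiles, and scores measured in standard deviations from the mean. This is an assumption about the general technique, not a description of Easy Data Transform's exact calculation; the function names, the quartile method and the sample data are all hypothetical.

```python
import statistics


def iqr_fences(values, threshold=1.5):
    """Return (lower, upper) fences: threshold * IQR beyond the quartiles.

    Assumes the "inclusive" quartile method; other tools may compute
    quartiles slightly differently.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - threshold * iqr, q3 + threshold * iqr


def stdev_scores(values):
    """Return each value's distance from the mean in sample standard deviations."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


# Made-up column of ages with one implausible value.
ages = [27, 29, 34, 38, 41, 350]

low, high = iqr_fences(ages, threshold=1.5)  # (15.25, 55.25)
iqr_outliers = [v for v in ages if v < low or v > high]

scores = stdev_scores(ages)
stdev_outliers = [v for v, s in zip(ages, scores) if abs(s) > 2]

print(low, high, iqr_outliers)
print(stdev_outliers)
```

In this small sample both approaches flag 350, but only just in the standard deviation case, because the extreme value inflates the mean and standard deviation it is being compared against. That sensitivity is one reason the IQR is often preferred when the data may contain large outliers.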


Check add score column to add a column showing the score for each value.


The Warnings tab warns you about any non-numeric values, and the Info tab gives you information on the IQR/standard deviation.


Set Action to choose what happens to values identified as outliers.

See the video above for more details.

Easy Data Transform can also help with merging, cleaning, filtering, enriching and reshaping your data. All with a few clicks and no coding required.



Questions or problems?

Email support@easydatatransform.com