Impute

Description

Infer values of missing data from other values in the same column.

 

Examples

Impute the missing age of Titanic passengers based on the average age of all passengers:

 

[Screenshot: impute missing data values]

 

Impute the missing age of Titanic passengers based on the median age of passengers with the same passenger class and sex:

 

[Screenshot: impute missing data using median]

 

Inputs

One.

 

Options

Check the column(s) with empty values that you wish to impute.

Set Using to impute using the Average (mean), Median or Mode of non-empty data values in each column.

Set Of to:

o All rows, to fill empty values in each column with the Average, Median or Mode of all non-empty values in the same column.

o All rows with matching values for, to fill empty values in each column with the Average, Median or Mode of the non-empty values in the same column, using only rows whose values match in the columns checked below.
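The All rows behaviour can be sketched outside the application. The following Python is only an illustration of the idea, not how the transform is implemented: the function name impute_all_rows is hypothetical, a column is assumed to be a list of text values, and only the empty string is assumed to count as empty.

```python
from statistics import mean, median

def impute_all_rows(values, using="average"):
    """Fill empty values in one column with the mean or median of its numeric values.

    Hypothetical helper: assumes a column is a list of strings and that
    only "" counts as empty.
    """
    numeric = []
    for v in values:
        try:
            numeric.append(float(v))
        except ValueError:
            pass  # empty and non-numerical values are ignored
    if not numeric:
        return list(values)  # nothing to impute from
    fill = mean(numeric) if using == "average" else median(numeric)
    return [str(fill) if v == "" else v for v in values]

ages = ["22", "", "38", "26", ""]
print(impute_all_rows(ages, "average"))
print(impute_all_rows(ages, "median"))
```

Both empty entries receive the same fill value, because the statistic is computed once over the whole column.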

Set Matching to Exact match if every field has to match, or to Best match to use the rows that have the most matching values (or all rows, if there are no matches). For example, if you are matching on passenger class and sex then:

o Exact match will only use values from rows where the passenger class and sex both match.

o Best match will use values from rows where the passenger class and sex both match, if available. Otherwise it will use values from rows where either the passenger class or sex match, if available. Otherwise it will use values from all rows.

Check case sensitive to use case-sensitive matching.
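The difference between Exact match and Best match can be sketched as follows. This is a rough Python illustration under stated assumptions, not the transform's actual implementation: the names impute_by_match and to_number are hypothetical, rows are assumed to be dictionaries of text values, the median is used as the statistic, and an empty value with no exact match is assumed to stay empty.

```python
from statistics import median

def to_number(text):
    """Return text as a float, or None if it is empty or non-numerical."""
    try:
        return float(text)
    except ValueError:
        return None

def impute_by_match(rows, target, match_cols, matching="exact", case_sensitive=True):
    """Fill empty `target` values with the median of rows matching on `match_cols`."""
    def key(value):
        return value if case_sensitive else value.casefold()

    result = []
    for row in rows:
        filled = dict(row)
        if row[target] == "":
            # Bucket every usable value by how many match columns agree with this row.
            by_score = {}
            for other in rows:
                number = to_number(other[target])
                if number is not None:
                    score = sum(key(other[c]) == key(row[c]) for c in match_cols)
                    by_score.setdefault(score, []).append(number)
            if matching == "exact":
                donors = by_score.get(len(match_cols), [])
            else:
                # Best match: rows with the most matching columns. If nothing
                # matches, the top score is 0, which covers all rows.
                donors = by_score[max(by_score)] if by_score else []
            if donors:
                filled[target] = str(median(donors))
        result.append(filled)
    return result

rows = [
    {"pclass": "1", "sex": "female", "age": "30"},
    {"pclass": "1", "sex": "female", "age": "40"},
    {"pclass": "3", "sex": "male", "age": "20"},
    {"pclass": "1", "sex": "female", "age": ""},
    {"pclass": "3", "sex": "female", "age": ""},
]
for mode in ("exact", "best"):
    print(mode, [r["age"] for r in impute_by_match(rows, "age", ["pclass", "sex"], mode)])
```

In this sample the last row has no other third-class female passenger to draw on, so the exact sketch leaves it empty while the best sketch falls back to rows that share either the class or the sex.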

 

Notes

Average and Median only work on numerical data (non-numerical values are ignored).

Mode treats all non-empty data values as text. Case and whitespace are taken into account.

If Mode is selected and there are multiple values with the same maximum frequency, one will be picked at random.
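The Mode notes above can be illustrated with a short Python sketch. It is an approximation for explanation only: the function name impute_mode is hypothetical, and only the empty string is assumed to count as empty.

```python
import random
from collections import Counter

def impute_mode(values, rng=random):
    """Fill empty values with the most frequent non-empty value, compared as text.

    Hypothetical helper: "S", "s" and "S " are counted as three different
    values because case and whitespace are significant.
    """
    counts = Counter(v for v in values if v != "")
    if not counts:
        return list(values)  # nothing to impute from
    top = max(counts.values())
    # Several values may share the highest frequency; pick one at random.
    candidates = [v for v, n in counts.items() if n == top]
    fill = rng.choice(candidates)
    return [fill if v == "" else v for v in values]

print(impute_mode(["S", "C", "S", "", "Q"]))
print(impute_mode(["S", "s", ""]))  # tie: either "S" or "s" may be used
```

Because ties are broken randomly, repeated runs on the same data can give different results when two values are equally common.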

If you need to convert some values to empty before imputing, you can do this using a Replace transform.

 

[Screenshot: replace unknown]

 

You can use a Filter transform to remove rows with missing values instead of imputing values.

 

See also

Video: How to impute missing data

Fill

Interpolate