Impute
Infer values of missing data from other values in the same column.
Example: impute the missing age of Titanic passengers based on the average age of all passengers.
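The average-of-all-passengers example is the simplest case: Using set to Average and Of set to All rows. The short Python sketch below shows the idea on a plain list of ages. It is not how the transform is implemented; the `ages` values and the `impute_mean` helper are hypothetical, and `None` stands in for an empty cell.

```python
# Hypothetical ages; None stands for an empty cell.
ages = [22.0, 38.0, None, 35.0, None, 54.0]


def impute_mean(values):
    """Replace None entries with the mean of the non-None entries."""
    present = [v for v in values if v is not None]
    if not present:
        return list(values)  # nothing to average, so leave the column as-is
    avg = sum(present) / len(present)
    return [avg if v is None else v for v in values]


filled = impute_mean(ages)
print(filled)  # [22.0, 38.0, 37.25, 35.0, 37.25, 54.0]
```

Both empty cells receive the same value, (22 + 38 + 35 + 54) / 4 = 37.25.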
Example: impute the missing age of Titanic passengers based on the median age of passengers with the same passenger class and sex.
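The class-and-sex example fills each empty age from a subset of rows rather than the whole column. A minimal Python sketch, assuming exact matching on both columns, could group the known ages by (passenger class, sex) and take the median of each group. The rows and the `impute_group_median` helper are hypothetical illustrations only.

```python
from statistics import median

# Hypothetical rows: (pclass, sex, age); None marks a missing age.
rows = [
    (1, "female", 38.0),
    (1, "female", 35.0),
    (1, "female", None),
    (3, "male", 22.0),
    (3, "male", 28.0),
    (3, "male", 30.0),
    (3, "male", None),
]


def impute_group_median(rows):
    """Fill missing ages with the median age of rows with the same pclass and sex."""
    groups = {}
    for pclass, sex, age in rows:
        if age is not None:
            groups.setdefault((pclass, sex), []).append(age)
    result = []
    for pclass, sex, age in rows:
        # Rows whose group has no known ages are left empty (exact match only).
        if age is None and (pclass, sex) in groups:
            age = median(groups[(pclass, sex)])
        result.append((pclass, sex, age))
    return result


filled_rows = impute_group_median(rows)
for row in filled_rows:
    print(row)
```

Here the first-class female passenger gets the median of 38 and 35 (36.5), and the third-class male passenger gets the median of 22, 28 and 30 (28.0).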
Inputs: One.
•Check the column(s) which have empty values you wish to impute.
•Set Using to impute using the Average (mean), Median or Mode of non-empty data values in each column.
•Set Of to:
oAll rows, to fill empty values in each column with the Average, Median or Mode of all non-empty values in the same column.
oAll rows with matching values for, to fill empty values in each column with the Average, Median or Mode of the non-empty values in the same column, taken only from rows whose values match in the columns checked below.
•Set Matching to Exact match to use only rows where every checked column matches, or Best match to use the rows with the most matching values (or all rows, if no row matches on any checked column). For example, if you are matching on passenger class and sex, then:
oExact match will only use values from rows where the passenger class and sex both match.
oBest match will use values from rows where the passenger class and sex both match, if available. Otherwise it will use values from rows where either the passenger class or the sex matches, if available. Otherwise it will use values from all rows.
•Check case sensitive to make the matching case sensitive (for example, so that "Male" and "male" are treated as different values).
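The Matching options can be modelled by counting how many checked columns each donor row shares with the row being filled. This is a minimal Python sketch under stated assumptions, not the product's code: the `impute` and `keys_match` helpers are hypothetical, only Average (mean) is shown for brevity, values are compared as text, and an Exact match with no matching rows is assumed to leave the cell empty. Best match keeps the donors with the highest match count, which automatically falls back to all rows when that count is zero.

```python
from statistics import mean


def keys_match(a, b, case_sensitive):
    """Compare two field values as text, optionally ignoring case."""
    a, b = str(a), str(b)
    if not case_sensitive:
        a, b = a.casefold(), b.casefold()
    return a == b


def impute(rows, target, keys, matching="best", case_sensitive=False):
    """Fill empty `target` values with the mean of selected donor rows.

    matching="exact": donors must match on every key column (assumed: no donors -> stay empty).
    matching="best":  donors with the most matching key columns; if the best
                      score is 0, every donor row is used.
    """
    donors = [r for r in rows if r[target] is not None]
    out = []
    for row in rows:
        new_row = dict(row)
        if new_row[target] is None and donors:
            scores = [sum(keys_match(row[k], d[k], case_sensitive) for k in keys)
                      for d in donors]
            wanted = len(keys) if matching == "exact" else max(scores)
            pool = [d[target] for d, s in zip(donors, scores) if s == wanted]
            if pool:
                new_row[target] = mean(pool)
        out.append(new_row)
    return out


# Hypothetical passenger rows; None marks a missing age.
rows = [
    {"pclass": 1, "sex": "female", "age": 42.0},
    {"pclass": 1, "sex": "male", "age": 50.0},
    {"pclass": 3, "sex": "male", "age": 22.0},
    {"pclass": 1, "sex": "female", "age": None},
    {"pclass": 2, "sex": "Male", "age": None},
]

best = impute(rows, "age", ["pclass", "sex"])
exact = impute(rows, "age", ["pclass", "sex"], matching="exact")
best_cs = impute(rows, "age", ["pclass", "sex"], case_sensitive=True)
print([r["age"] for r in best])     # [42.0, 50.0, 22.0, 42.0, 36.0]
print([r["age"] for r in exact])    # [42.0, 50.0, 22.0, 42.0, None]
print([r["age"] for r in best_cs])  # [42.0, 50.0, 22.0, 42.0, 38.0]
```

The last row (class 2, "Male") has no exact match. With Best match it uses the two male rows; with case-sensitive matching "Male" no longer matches "male", so it falls back to all rows.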
•Average and Median only work on numerical data (non-numerical values are ignored).
•Mode treats all non-empty data values as text. Case and whitespace are taken into account.
•If Mode is selected and there are multiple values with the same maximum frequency, one will be picked at random.
•If you need to convert some values to empty before imputing, you can do this using a Replace transform.
•You can use a Filter transform to remove rows with missing values instead of imputing values.
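The notes on how Average, Median and Mode treat values can be illustrated with a short Python sketch. It is an approximation based on the behaviour described above, not the product's implementation: the `numeric_values` and `mode_value` helpers are hypothetical, empty cells are modelled as `None` or `""`, and "numeric" is assumed to mean "parses with Python's `float()`".

```python
import random
from collections import Counter
from statistics import mean, median


def numeric_values(values):
    """Keep non-empty values that parse as numbers; other values are ignored."""
    nums = []
    for v in values:
        if v is None or v == "":
            continue  # empty cells never contribute
        try:
            nums.append(float(v))
        except (TypeError, ValueError):
            pass  # non-numeric text is skipped for Average and Median
    return nums


def mode_value(values, rng=random):
    """Most frequent non-empty value, compared as exact text (case and whitespace count)."""
    counts = Counter(str(v) for v in values if v is not None and v != "")
    if not counts:
        return None
    top = max(counts.values())
    tied = [v for v, c in counts.items() if c == top]
    return rng.choice(tied)  # ties are broken at random


column = ["3", "x", "", "5", "4"]
nums = numeric_values(column)
print(nums, mean(nums), median(nums))  # [3.0, 5.0, 4.0] 4.0 4.0

labels = ["S", "s", "S ", "C", "C", ""]
print(mode_value(labels))  # C  ("S", "s" and "S " are three different values)
```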
•Video: How to impute missing data
•See also: Fill