Unique

<< Click to Display Table of Contents >>

Navigation:  Reference > Transforms >

Unique

Description

Aggregate rows that have matching values in one or more selected columns.

 

Example

If you have a dataset of orders and you want to:

keep one row per Customer Id

keep the first listed Name for each Customer Id

concatenate Product Ids for each Customer Id, delimited by a Comma

sum the Costs for each Customer Id

keep the latest Date for each Customer Id

add a Count column showing how many rows in the input correspond to each row in the output

 

how to aggregate records by an id example

 

Inputs

One.

 

Options

Type text into the Filter field to temporarily hide columns whose names do not contain this text.

Set an Option for each column:

oOnly 1 row is kept where all the Keep unique columns have the same value.  Empty is counted as a value.

oKeep first keeps the first value in the current sort order.  Empty is counted as a value.

oKeep first non-empty keeps the first value in the current sort order that is not empty.

oKeep last keeps the last value in the current sort order.  Empty is counted as a value.

oKeep last non-empty keeps the last value in the current sort order that is not empty.

oSum sums any numerical values. Empty values are ignored.

oMaximum keeps the maximum numerical or date value. Empty values are ignored.

oMinimum keeps the minimum numerical or date value. Empty values are ignored.

oAverage takes the average (mean) of any numerical values. Empty values are ignored.

oConcat to concatenate values. Duplicate values are kept. All values are treated as text. Empty is counted as a value.

oConcat distinct to concatenate values. Duplicate values are ignored. All values are treated as case sensitive text. Empty is counted as a value.

oCount unique to count distinct values with no duplicate. E.g. A,A,B,C has 2 unique values: B,C. All values are treated as case sensitive text. Empty is counted as a value.

oCount duplicate to count distinct values with duplicates. E.g. A,A,B,C has 1 duplicate value: A. All values are treated as case sensitive text. Empty is counted as a value.

oCount distinct to count distinct values. E.g. A,A,B,C has 3 distinct values: A,B,C.  All values are treated as case sensitive text. Empty is counted as a value.

oCount empty to count empty values. Values containing whitespace are not empty.

oCount non-empty to count non-empty values. Values containing whitespace are not empty.

oSet empty sets the value to empty.

Use the Apply button to quickly set the option value for multiple columns.

Set Concat delimiter to add a delimiter between Concat and Concat distinct values.

Check ignore empty values for concat  to ignore empty values in Concat and Concat distinct columns.

Check add count column to add a column showing how many rows in the input dataset created each unique row.

Check case sensitive for case sensitive Keep unique matching.

Check allow drilldown to allow double clicking a row in the data table to drilldown to the rows in the upstream data that contributed to this row.

 

Notes

Rows are considered duplicates if they have exactly the same value in all the columns set to Keep unique. Comparisons are case and whitespace sensitive. You can use Case and Whitespace to change case and whitespace before deduping.

If no columns are set to Keep unique the transform won't do anything.

If you are using Keep first, Keep last, Keep first non-empty or Keep last non-empty the sort order is important. You can use Sort to change the sort order before this transform.

Cells containing whitespace are not considered empty.

Column filtering is sensitive to whitespace, but not case. Columns hidden by filtering are unselected.

The Dedupe transform is a less powerful (but simpler and faster) alternative to Unique.

You can reverse Unique concatenation using Split Col with Into as Rows

 

See also

Dedupe a dataset