Cluster

Description

Classify rows with similar text into clusters.

 

Examples

Clustering products by type:

 

[Image: clustering products by name]

 

Correcting the state in an address:

 

[Image: correcting state in addresses]

 

Inputs

One.

 

Options

Select the Column whose values you wish to use for clustering.

Select Clustering as:

o Manual to cluster by Terms you provide (one per line).

o Guided to find clustering terms automatically, starting with the Terms you provide (one per line).

o Automatic to find clustering terms automatically.

Set the minimum Closeness percentage that a value must score against a term to be clustered with that term.

Set Max clusters to the maximum number of clustering terms (Guided and Automatic only).

Set Max time to the maximum amount of time allowed for trying to improve the terms (Guided and Automatic only).

Uncheck Case sensitive to ignore case.

 

Notes

Fuzzy matching is used to classify each row value according to which of the user-supplied Terms it is closest to.

Additional Cluster and Closeness columns are added. The Cluster column shows the closest matching of the Terms (or the one highest in the Terms list if two or more match equally). The Closeness column shows the fuzzy match score between 0 (no match) and 100 (exact match).

Row values that do not meet the minimum Closeness score for any term are set to <None>.
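
The Manual mode behavior described in these notes can be sketched roughly in Python. This is an illustration only, not the transform's implementation: the names cluster_values and closeness, their parameters, and the use of difflib.SequenceMatcher as the similarity measure are all assumptions, because the fuzzy matching algorithm used by the transform is not documented here. Reporting the best score for <None> rows is also an assumption.

```python
from difflib import SequenceMatcher

NONE_LABEL = "<None>"

def closeness(value, term, case_sensitive=False):
    """Return a similarity score from 0 (no match) to 100 (exact match).

    SequenceMatcher is a stand-in; the transform's real scoring may differ.
    """
    if not case_sensitive:
        value, term = value.lower(), term.lower()
    return 100.0 * SequenceMatcher(None, value, term).ratio()

def cluster_values(values, terms, min_closeness=50.0, case_sensitive=False):
    """Return (value, cluster, closeness) for each value.

    Each value goes to its closest term, or to <None> if no term
    reaches min_closeness.
    """
    results = []
    for value in values:
        best_term, best_score = None, 0.0
        for term in terms:
            score = closeness(value, term, case_sensitive)
            # Strict '>' keeps the term highest in the list when scores tie.
            if best_term is None or score > best_score:
                best_term, best_score = term, score
        if best_term is None or best_score < min_closeness:
            best_term = NONE_LABEL
        results.append((value, best_term, round(best_score)))
    return results

if __name__ == "__main__":
    states = ["California", "Californa", "texas", "zzz"]
    for row in cluster_values(states, ["California", "Texas"], min_closeness=60):
        print(row)
```

In this sketch a minimum of 0 assigns every value to some term, because any score is at least 0 and ties go to the term listed first.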

Set the minimum Closeness score to 0% to force all values to be assigned.

You can use a Filter to list all the rows with <None> in the Cluster column to guide you on any additional Terms you might want to add in Manual mode.
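
As a rough analogy for what such a Filter gives you, the following Python sketch collects the values left in <None> and lists the most frequent first, so that common unmatched values stand out as candidate Terms. The helper name unclustered_values and the (value, cluster) row layout are hypothetical and chosen only for illustration.

```python
from collections import Counter

def unclustered_values(rows, none_label="<None>"):
    """Return the distinct unassigned values, most frequent first.

    rows: iterable of (value, cluster) pairs, a hypothetical layout.
    """
    counts = Counter(value for value, cluster in rows if cluster == none_label)
    return [value for value, _ in counts.most_common()]

if __name__ == "__main__":
    rows = [
        ("Calif.", "<None>"),
        ("Texas", "Texas"),
        ("NY", "<None>"),
        ("Calif.", "<None>"),
    ]
    print(unclustered_values(rows))
```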

 

See also:

Ngram

Video: How to cluster text