Real-world data is often messy and inexact. The name of a company might be spelled in various different ways, with different uses of spaces and punctuation. For example, McDonalds might be spelled “McDonald’s”, “McDonalds”, “McDonald”, “Macdonalds”, “Mc Donalds” or “McDonald’s Corp.”. But you can cross-reference two inexactly matching datasets using the Fuzzy matching option in Easy Data Transform’s Lookup transform.
Easy Data Transform scores fuzzy matches against “McDonald’s” (case insensitive) as:
Text
Fuzzy match score
McDonald’s
100%
McDonalds
90%
McDonald
80%
Macdonalds
80%
Mc Donalds
80%
McDonald’s Corp.
62%
McDonald’s Corporation
45%
You can try fuzzy matching with Easy Data Transform yourself:
Install Easy Data Transform on your PC or Mac, if you haven’t done so already. It will only take a minute. There is a fully functional free trial and you don’t have to give us your email or sign up to anything.
Start Easy Data Transform. Make sure the Auto Run button is pressed in.
Drag your 2 datasets onto Easy Data Transform. Make sure the dataset you wish to lookup values in is vertically below the other dataset.
Select the two input items using Ctrl+click (PC)/Cmd+click (Mac) (or by dragging a box around them) and click the Lookup button in the Left pane.
Ensure that the Lookup item is selected. Then in the Right pane set the Top key column and Bottom key column to the common (‘key’) column and set the Bottom value column to the value you wish to lookup.
Set Matching to Fuzzy.
Set Closeness to how close a fuzzy match has to be to be considered a match. E.g. set it to 80% to make 2 values that are 80% the same a match.
Set Bottom values used to:
First best if you want to use the best match. If there are multiple values that are equally good, use the first one.
All best if you want to use the best matches. If there are multiple values that are equally good, use all of them. Duplicate values are only shown once.
All if you want to use all matches. Duplicate values are only shown once.
Set If no match depending on what you want to do for rows in the top dataset with no matches.
Check case sensitive to use case sensitive matching for keys.
Once you are happy with the results you can select the Lookup item and click the To File button at the bottom of the Left pane to output the results into a new file.
Note that you can set Bottom value column to the same as Bottom key column to see the matches.
If you want to do a fuzzy join, you can do this by doing a fuzzy lookup to add a column to the top dataset. Then use use this column as the key column to join the two datasets using a Join transform.
Note that fuzzy matching is significantly slower than exact matching. Especially if you are matching long items of text.
See the video above for more details.
Lookup is just one of the 63 transforms that Easy Data Transform supports. Easy Data Transform can also help with converting, cleaning, blending, filtering and enriching your data. All without coding.