Ask Your Question
0

Searching for words in single column that are similar, not exact duplicates

asked 2021-01-23 07:57:56 +0100

Bea gravatar image

updated 2021-01-23 13:28:29 +0100

karolus gravatar image

Trying to search for duplicates of company names in a single column. However, sometimes punctuation and spacing are different, so my formula does not always work. Please see my example below from my list of ~4k company names.

The formula used works for the first four company names listed --- the result is that they are not duplicates and the word 'DUPLICATE' does not appear in the formula column. The formula also works for the last two listed --- result is that they ARE duplicates and the word DUPLICATE appears in the formula column.

However, the formula does not work for the middle three company names listed (Dasher Dancer & Prancer PC). It does not identify them as duplicates, even though they are pretty much duplicates except for punctuation, spacing and caps.

What could I modify on my formula to make it work for those three names? Or what other function and/or formula would identify those three names as duplicates, or otherwise identify how similar they are? Am I asking the impossible?

Formula is: =IF(H557=558,"DUPLICATE","").

Here's the example from my list:

Blitzen, C.P.A., P.C.
Comet LLC
Christen M. Cupid, CPA, CFP
Dasher & Co.
Dasher Dancer & Prancer, PC
Dasher, Dancer & Prancer, P.C.
DASHER, DANCER & PRANCER, PC
Donner & Associates CPAs, PC
Donner & Associates CPAs, PC

Thanks

edit retag flag offensive close merge delete

1 Answer

Sort by » oldest newest most voted
0

answered 2021-01-23 13:32:51 +0100

dscheikey gravatar image

updated 2021-01-23 13:35:52 +0100

Hello Bea,

I have built a solution which, however, requires an help column. First, the company names are freed from the characters ".&" and additional spaces. If you want to remove more characters, add the character separated by a "|" character to the first part in the regex formula. Caution: Some characters must be masked with a "\" character. In the second column I count the occurrences in all cells and give the hint DUPLICATE if the entry occurs more than 1 time. Please test the formulas in the example document to see if this corresponds to your wishes.

C:\fakepath\DUPLICATE.ods

Small improvements are certainly possible. However, I would need more details. If this answer has answered the question sufficiently, then please mark your question as answered. The tick next to the answer that turns green with mouse over.

Jürgen

edit flag offensive delete link more

Comments

Hello and thanks, Jurgen. Your solution definitely works for numbers, and I agree - a help column is needed to remove the & and additional spaces, and I'm now on the right path.

Bea gravatar imageBea ( 2021-01-23 19:25:46 +0100 )edit
Login/Signup to Answer

Question Tools

1 follower

Stats

Asked: 2021-01-23 07:57:56 +0100

Seen: 25 times

Last updated: Jan 23