Searching for words in a single column that are similar, not exact duplicates

I'm trying to find duplicate company names in a single column. However, punctuation and spacing sometimes differ, so my formula does not always work. Please see my example below from my list of ~4k company names.

The formula works for the first four company names listed: they are not duplicates, and the word ‘DUPLICATE’ does not appear in the formula column. The formula also works for the last two listed: they ARE duplicates, and the word DUPLICATE appears in the formula column.

However, the formula does not work for the middle three company names listed (Dasher Dancer & Prancer PC). It does not identify them as duplicates, even though they are pretty much duplicates except for punctuation, spacing and caps.

What could I modify on my formula to make it work for those three names? Or what other function and/or formula would identify those three names as duplicates, or otherwise identify how similar they are? Am I asking the impossible?

Formula is: =IF(H557=H558,"DUPLICATE","").

Here’s the example from my list:

Blitzen, C.P.A., P.C.
Comet LLC
Christen M. Cupid, CPA, CFP
Dasher & Co.
Dasher Dancer & Prancer, PC
Dasher, Dancer & Prancer, P.C.
Donner & Associates CPAs, PC

Hello Bea,

I have built a solution, but it requires a helper column. First, the characters “.&” and any extra spaces are removed from the company names. If you want to remove more characters, add each one to the first part of the regex formula, separated by a “|” character. Caution: some characters must be escaped with a “\” character.
In the second column, I count how often each cleaned name occurs across all cells and show the hint DUPLICATE if the entry occurs more than once.
Please test the formulas in the example document to see whether they do what you want.
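
If you want to check the logic outside the spreadsheet, here is a rough Python sketch of the same two-step idea (clean the name, then count how often the cleaned name occurs). It is not the formula from the example document. The function names `normalize` and `flag_duplicates` are made up for illustration, and I have assumed two things beyond the “.&” set described above: commas are removed as well, and upper/lower case is ignored, since the sample names differ in both.

```python
import re
from collections import Counter

# Characters to strip, separated by "|". The "." must be escaped with "\"
# because it is a regex metacharacter. The comma is an assumption added
# on top of the ".&" set so that the sample names can match.
STRIP_PATTERN = re.compile(r"\.|&|,")


def normalize(name: str) -> str:
    """Helper-column value: strip punctuation, collapse spaces, ignore case."""
    stripped = STRIP_PATTERN.sub("", name)
    # split() with no arguments drops leading, trailing, and repeated spaces.
    return " ".join(stripped.split()).lower()


def flag_duplicates(names):
    """Second column: 'DUPLICATE' if the cleaned name occurs more than once."""
    keys = [normalize(n) for n in names]
    counts = Counter(keys)
    return ["DUPLICATE" if counts[k] > 1 else "" for k in keys]


names = [
    "Dasher & Co.",
    "Dasher Dancer & Prancer, PC",
    "Dasher, Dancer & Prancer, P.C.",
    "Donner & Associates CPAs, PC",
]
flags = flag_duplicates(names)
for name, flag in zip(names, flags):
    print(f"{name!r:40} {flag}")
```

With these four sample names, only the two “Dasher Dancer & Prancer” variants are flagged, because both reduce to the same cleaned text. Unlike comparing each row with the next one, counting does not depend on the list being sorted.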


Small improvements are certainly possible, but I would need more details.

Hello and thanks, Jurgen. Your solution definitely works for numbers, and I agree: a helper column is needed to remove the & and additional spaces. I'm now on the right path.