Compare Two Lists in Calc

I would like to pose a question with something I am trying to do with a Calc file. In Column A I have a list of artists and songs about 2000 so far. In Column B I paste a list of say 100 artists and songs which I want to compare with Column A. The List I paste in generally comes from a web page and usually is in the form of Track Number - Artist - Song Title. Sometimes it may also have the time of each track and also the extension say MP3.

So as mentioned I am wishing to compare this list against Column A to determine what songs on the list I already have, this would be roughly about 80%.

To do this task I currently use a formula =ISNUMBER(MATCH(B2,$A$2:$A1998,0)) This works reasonably well except that I need to edit the pasted list and get rid of things like track numbers, any punctuation, Capitalize all words, get rid of empty spaces etc to get the result of True or False.

I do this editing usually through the Regular expressions In Find and Replace. My Issue is that even with this method I still get a lot of False Positives. Those that appear as False that I know are True.

Please forgive my long explanation that I hope makes sense. The Question I am posing is, do you think the same outcome can be achieved without all the editing, using a different Method, Formula or is it impossible. Could it work with say using Wildcards that ignore the editing that is usually required. I’m probably grasping at straws, but I thought the helpful people on this Forum may have a better way of doing things.

As suggested I am uploading the File.
ForumPost.ods (51.7 KB)

Version: 7.1.5.2 (x64) / LibreOffice Community
Build ID: 85f04e9f809797b8199d13c421bd8a2b025d52b5
CPU threads: 12; OS: Windows 10.0 Build 19043; UI render: Skia/Raster; VCL: win
Locale: en-AU (en_AU); UI: en-US
Calc: threaded

Edit your question and share some column A and B matching content.

@LeroyG Thanks I have uploaded file as suggested.

It will be better this way?
In column A, dots (.) have been removed and the unicode character U + 2019 () has been replaced with an apostrophe (').
ForumPost2.ods (53.2 KB)

@sokol92 Thank you for your reply and ForumPost2.ods. I need to study your formula because, to be honest I’m not sure what it is doing. The obvious things like ignoring (Single Version) or Live is clear just need to train myself to see what it is doing with “,” and “.” as some False outcomes are still True.
Version: 7.2.1.2 (x64) / LibreOffice Community
Build ID: 87b77fad49947c1441b67c559c339af8f3517e22
CPU threads: 12; OS: Windows 10.0 Build 19043; UI render: Skia/Raster; VCL: win
Locale: en-AU (en_AU); UI: en-US
Calc: threaded

The formula removes dot(s) in the search value, “(Live)”, “(Single Version)” and Track Number. Of course, the search accuracy is far from 100%. :slightly_smiling_face:

Guys, what about fuzzy lookup here?
The Fuzzy Lookup Add-In for Excel performs fuzzy matching of textual data in Excel.
ForumPost2.ods (59.6 KB)


Modified the formula. The result is the same as @sokol92’s (I found only one more entry), but it’s probably more convenient to edit the formula this way (see Pattern: skip (Live)|(Single Version)|, Pt. 1|).
Pattern:
^(\d{2,3}\s-\s)(.*?)(\s\((?:Live|Single Version)\)|, Pt\. 1|)$
E.g. 19 - The Isley Brothers - Summer Breeze, Pt. 1

What we skip, we enclose in regex groups, and what we extract, we transform with the formulas SUBSTITUTE, TRIM.
This non-greedy part “(.*?)” interests us. Without a question mark, it will eat whatever we want to skip next.
EDIT: Here you can add hyperlinks to quickly navigate to the source entry in the Dict field.

@eeigor Thank You for your effort in providing a workable solution. With only a basic knowledge of Excel I can honestly say I have never come across Fuzzy Lookup. Of the matches that were FALSE there were 6 that are not on the list in Column A. Of the others that were in Column A, a quick edit of deleting some commas or an apostrophe, were fixed easily and gave the TRUE match. Only one FALSE/TRUE I could find no reason for was Harold Melvin & The Blue Notes - If You Don’t Know Me By Now. Both appear to be identical yet have the False Match :grinning: Which Of course is not a problem because I know it’s a True result.

Trailing space in column A.

@sokol92 Good Pick up :+1:

Another Question if I can Please. How can I remove (space-space00:00:00) at the right end of text.
eg. Harold Melvin & The Blue Notes - If You Don’t Know Me By Now - 00:03:26 (time of track) I have searched but unable to find a solution.

As a generalized question this is a new one, and requires to open a new topic.
If a “nitpick” concerning a few cases where a suggested solution doesn’t work as expected, you should explicitly refer to that suggestion.
Anyway: The TRIM() function might be your friend.
Note: The current help is unprecise insofar. In addiotion to the described action, TRIM() also reduces any sequence of spaces between other characters to a single space.

How do you expect me to understand this?

We have already discussed this:
^(\d{2,3}\s-\s|)(.*?)(\s\((?:Live|Single Version)\)|, Pt\. 1|\s-\s\d{2}:.*|)$
Removes “(Live|Single Version|Album Version)”, and so on (in parentheses):
^(\d{2,3}\s-\s|)(.*?)(\s\((?:.*)\)|, Pt\. 1|\s-\s\d{2}:.*|)$

@eeigor Thank you kindly for that modification. ^(\d{2,3}\s-\s|)(.*?)(\s\((?:.*)\)|, Pt\. 1|\s-\s\d{2}:.*|)$. I had no clue on how to modify to skip the time. this has solved the issue.

A general answer to the general question would require to write a tutorial, and the result wouldn’t be complete.

Reduced basically to the case exemplified by the questioner’s attached sheet, the main needed functionality is the preparation of the contents of the lists in a way making them comparable in a special sense.

The preliminary step is substantial and requires string manipulations. As often in similar cases, also in this case the question and the examples don’t clearly specify the range of expectable deviations or state related assurances. A suggestion trying to give a solution can only be based on guesses therefore.

I did this by implementing a possible solution in a sheet derived from the mentioned attachent:
disask68465UploadedForumPost_re1.ods (183.2 KB)

The solution requires LibO V 6.2 or higher.