How do I find the differences between two lists in LibreOffice?

graviton · June 1, 2013, 2:53pm

Here’s my problem: I need to take two lists, each consisting of many tens of thousands of items, most of them identical between the two lists, and find the relatively few new items in the second list.

I posed this question on a tech advice site, asking if any software existed that could do this. One answer I got included a YouTube video where a guy showed exactly how to do it in Excel, using three separate techniques. The first two were “match” and “vLookup”.

So my question is: Do either of these two techniques have an equivalent in LibreOffice, and if so, where do I find explicit instructions on how to use them?

froz · June 1, 2013, 6:24pm

The best way of doing some comparisons on large amount of data would be database, but it is required that you understand database tables, SQLs etc. You probably don’t, because if you do, then you would probably not ask this question.

Lets try to solve this problem using LibreOffice Calc. If I understand correctly you have two lists of data. First list (e.g. list A) is in column A and second list (e.g. list B) is in column B. You have some data in list B that does not exists in list A. But as I understand you have no item in list A that does not exists on list B. You would like to know which data from list B do not exists on list A. Is it?

Simple sample:

A1: AAA

A2: BBB

A3: CCC

B1: BBB

B2: DDD

B3: CCC

B4: AAA

In C1 type in formula: =IF(ISERROR(MATCH(B1;$A$1:$A$1000000;0))=1;B1;"")
and copy formula down from C1 to C4.

The formula displays in column C the values that are in the list B and do not exists in list A.

Hope this helps. If not please describe your problem in more detail.

dajare · June 1, 2013, 6:39pm

Works for me (4.0.3.3) in a little sample set. Took about 23 seconds thinking to work out how to isolate the “new” values displayed in C column, but quite effective, really. I’m not sure how now to turn those “reference” values in C into text values I can copy into List A. Care to help on that one? (I suppose the is the “match” approach mentioned by OP.) Thanks @froz!

oweng · June 2, 2013, 12:57am

@dajare, do you mean something like this?

dajare · June 2, 2013, 12:44pm

@oweng - Yes, exactly like that. Many thanks!

Gary2 · August 18, 2014, 4:24am

Works perfectly. Thank you.

qubit1 · June 4, 2013, 2:35pm

Hiya,

Unless the data is already in an ODT or otherwise requires LibreOffice, I would go old-school and use the command line.

*(On an Ubuntu system – might require tweaks for OSX or other nixes. On Windows, install Cygwin or something)

Put your lists in two different files a.txt and b.txt

Run this command:

 cat a.txt a.txt b.txt | sort | uniq -u

The “uniq- u” invocation will remove all duplicate lines. We include the “A” list twice to make sure that we remove all of those items (even items that don’t appear in list “B”).

froz · June 5, 2013, 6:24am

Comparing files on Linux: comm -13 <(sort -u a.txt) <(sort -u b.txt)

dajare · June 5, 2013, 9:07pm

Love this command line stuff! So powerful, so simple. Right tool for the right job.

AlexKemp · November 2, 2015, 1:32am