How to get a list of all unique values from two columns?

Aigars · September 14, 2016, 7:31am

Hi! I have two columns containing values like “aaa”, “aab”, “aac”, etc. They repeat n times in a random pattern and new values may emerge after few intervals. I need to create a list in a new column that would show all the unique values from both columns.

It would look something like this:

If I had just one column I could use Pivot table with the necessary Row Field, but since I have two columns, this does not work. Any suggestions?

Lupp · September 14, 2016, 10:22am

Assuming the two columns are containing continuous cell ranges.

A
(1.) Collect some meta information first in a few available cells: Numbers of used cells for each column.
(2.) Dedicate two extra columns to calculate (MATCH or COUNTIF or …) for each source column independently which elements occur there the first time (top down) inside the column .
(3.) Modify the formulae for the second source column to also return FALSE if the element already occurred in the first column.
(4.) Collect new meta info: How many distinct elements from each column.
(5.) Use one of my many examples demonstrating how to “compact” a selection based on such helper columns twice.
(6.) Make one column from the two results using OFFSET again in a final column.

B Use the obvious interactive means: Copy/Paste and Filtering.

For your convenience (and as a final demo concerning questions of the kind) I made this new example doing as described under A.
(Appended:) Please be sure to note that the data for this demo are generated randomly and will thus change on full recalculation.

(Editing with respect to the discussion redarding the answer by @rautamiekka)
For requests of the present kind it mostly essential to know whether the “data” to be processed are actually data (direct content of the cells; types 1 or 2) or probaly formula results. In the second case the suggestion to register a sheet as (a table of) a database may not apply, IMO. In addition a database will only accept sheets of a restricted structure as tables. If thie condition of only containing constants is met I would suggest to use interactive means as demonstrated in this second attachment.

Aigars · September 14, 2016, 10:44am

Wow, thanks for going the extra mile and providing an example. I will look into it, might be what I needed.

As for the “B” option, it’s no good, there are about 3k entries and it needs to be automated.

Lupp · September 14, 2016, 10:54am

Automation may require a bit of progranning user code. I will not encourage you to go this way, but if you do not feel the need to migrate your application to a database anyway, this is also an option in the specific case. I personally prefer solutions by formulae what is the “Spreadsheet-Do” IMO. However we have to consider formula solutions to get ineffective when applied to very large data collections.
Isn’t there a way to block multiple entry of keys from the beginning?

Lupp · September 14, 2016, 11:00am

Sorry! The announced “second attachment” was not accepted by the askbot for unknown reasons. I will try again later.
As you see it finally worked after a few ‘Refreshs’ in my browser. This askbot really adds some additional waste of time.

Aigars · September 14, 2016, 11:08am

Ok, so I tested your spreadsheet with a small portion of my data, it seems to work very well. Although I don’t yet know why and how (I have to run)
Though, my spreadsheet is not meant to be infinite. Data fields have a fixed size, but I’m starting to wonder if it could in fact be turned into a continuous table…

The very foundation of my spreadsheet is to enter those values in these two columns with a random pattern. They are supposed to repeat.

PYS · September 14, 2016, 12:00pm

Hi @Lupp & @Aigars

For information, in this case it is no necessary to filter MERGER Does not begin with... ZzZzZz%.

In the field name just select - none - and of course, don’t forget to tick Options▸No duplications

Regards

Lupp · September 14, 2016, 1:02pm

@PYS: Thanks for the reminder! Je me sens un peu faible … now and then.

PYS · September 14, 2016, 2:17pm

@Lupp - I would not say this to someone who provides so many answers. You rock

Aigars · September 14, 2016, 4:39pm

@Lupp Yes he does!

Lupp · September 14, 2016, 7:25pm

(You made me flush. I am just after the karma.)

Aigars · September 15, 2016, 6:30am

@Lupp Can your solution be adapted to a fixed range?

Lupp · September 15, 2016, 8:32am

@Aigars: Yes (if I understand correctly what ranges you want to fix - and relative addressing is included with the term). But what good for. Searching long columns for matches using MATCH with order-parameter 0 (no sort order assured) may soon get inefficient. In addition I regard it bad style to search longer parts of columns as actually needed. You also may get fake errors in (the rare) case of having “fixed” mor than a half of the available column…

Lupp · September 15, 2016, 8:42am

I regard OFFSET to be the means of choice anyway for engines of the kind. It offers a (next to) transparent way to acces arbitrary parts of remote (rectangular) ‘SheetCellRange’(s). To get the functionality using INDEX would require to pass calculated addresses in addition to the very thorough usage of absolute and relative addressing.

rautamiekka · September 14, 2016, 8:00am

I ran same test with this code and it works perfectly.

SELECT
    "Column1"
FROM
    "_TABLE_"
WHERE
    "colum1"
NOT IN(
    SELECT
        "Column2"
    FROM
        "_TABLE_"
) UNION
SELECT
    "Column2"
FROM
    "_TABLE_"
WHERE
    "Column2"
NOT IN(
    SELECT
        "Column1"
    FROM
        "_TABLE_"
)

Aigars · September 14, 2016, 8:27am

This looks like SQL, but I’m working with a spreadsheet in calc.

rautamiekka · September 14, 2016, 8:35am

@Aigars: then it might not be possible without complex cell code or a macro. I seem to recall SQL can be used in Calc too.

anon87010807 · September 14, 2016, 9:25am

You can register the Calc file as a database and then use that SQL code.

Aigars · September 14, 2016, 9:32am

It’s starting to sound promising, but I lack the knowledge in this area.

mariosv · September 14, 2016, 11:47pm

Something can be done with Menu/Data/Consolidate, but there is a bug and only the first range to consolidate it’s preserved in the option when the file it’s saved.