What would be the formula to compare cells in a row (A1:D1) against the rows directly below (A2:D2, A3:D3…), and if any rows match A1:D1 then insert values of the trailing cells (E2:H2, E3:H3, E4:H4…) to the end of that first matching row into I1:…, if those values are not already present there. Then go on and compare the next row to the ones below, and insert any trailing cells of those rows to the end of that row if the values are not already present in column I:…, and so on down…
Thanks!
Example:
(numbers in red are duplicates and not to be copied)
no one seems eager to jump on this, probably cause it would be a pain to get the expected result from clear Calc formulas
Could be reachable with a good old ad hoc iterative macro;
and basically, your data being pretty raw, would be even far simpler with short Perl-ish / python-ish scripting over csv file.
or with clean SQL JOIN + EXCEPT if you organize your data.
thanks for your help fpy!
the script result is close, but it seems to copy repetitive values that are already present - it should only copy unique values. And it only runs successfully with the given data, but if using different data it returns “Program did not output anything!”.
Open the table. This is where the data are stored. It is a remake of your spreadsheet columns from A to H.
Open the form.
Left side shows unique combinations of ABCD.
The grid in the center shows the corresponding EFGH for each selected ABCD. When you enter a new row into this grid, you add a new record to the table with the selected ABCD values and the newly entered EFGH values. When you delete a row, you delete the entire row from A to H from the table.
The right side shows the unique EFGH values of the selected ABCD.
I like what you both did , but it’s a bit different from what I’d like to have done. Ok, I’ll try to explain better. The idea is to consolidate the unique values of partial-matching rows in order to eliminate redundant columns later.
.
That is, compare the values in column A:D, and copy the values (from E:H) from the rows whose values in columns A:D match, and copy only the unique values from E:H (if not already existing) to the first row in the first available empty cell .
.
So, starting with A1:D1, if any cells directly below (A2:D2, A3:D3…) match A1:D1, then copy only the unique non-repeating values of E2:H2, E3:H3, E4:H4… to the end of the first matching row as far as needed (in this case it would be the unique values from E2:H2 and E3:H3, E4:H4…, inserted to I1…W1).
.
The values of those cells (in column E:H) are copied only to the first/original row. Do not copy a value if it was previously copied and already exists in E1:W1… The matching rows below are left untouched, nothing copied to those rows.
.
Then find the next different group of matching rows (in this case, A9:D9 matches A10:D10) and copy only unique non-repeating values (in this case, the unique values of E9:H9 and E10:H10 are copied into I9:…L9), and so on down.
.
I will later apply Auto-Filter to eliminate rows containing redundant matching columns (A:D).
.
I hope I explained it clearly. Here’s an expanded example…
.
.
oops…
In my mockup example I made a couple errors. I mistakenly inserted into I1, L1 and M1 values already existing in E1, F1 and H1. Since those values are already in that row, they should not be copied over.
Nothing qualifies a spreadsheet application for this task. Spreadsheets have no concept of sets nor duplicates. Spreadsheets are error prone.
Call menu:Tools>Options>Base>Databases and register my database under some name…
Open my database and call Insert>“Query in SQL view”
SELECT "A", "B", "C", "D", LIST( "X" ) AS "List" FROM "Qry2" GROUP BY "Qry2"."A", "Qry2"."B", "Qry2"."C", "Qry2"."D"
Save the query under some name.
Save and close my database.
Open a Calc document and the data source window (Ctrl+Shift+F4).
Expand the database name, “Queries” and drag the icon of your query from the left pane into the spreadsheet. This creates a linked database range. It can be updated via Data>Refresh when the cell cursor is anywhere within that range.
From your sample data, the query returns this record set:
A B C D List
10 11 12 13 22,23,25,26,28,29,30,33,37,38
20 21 22 23 24,25,26,27
40 41 42 43 44,45,46,47
50 51 52 53 54,55,56,57,58,59,60,61
60 61 62 63 64,65,67,69,71,74
If you need the list values as separate numbers:
Select the “List” column and call menu:Data>Text to columns…
thanks again to you fpy and Villeroy
.
fpy, your code works nicely, but I have to first replace the tabs between the values with single spaces, I guess it doesn’t work if I just copy and past from the spreadsheet. Thanks a lot!!!.
.
Villeroy, I have to become more familiar with the database, but you’ve inspired me to learn to do things in it instead of calc. Thanks a lot for your help, it’s very much appreciated!
.