Consolidate partial-matching rows - add unique cells to previous matching rows

skye99 · February 10, 2025, 2:09am

What would be the formula to compare cells in a row (A1:D1) against the rows directly below (A2:D2, A3:D3…), and if any rows match A1:D1 then insert values of the trailing cells (E2:H2, E3:H3, E4:H4…) to the end of that first matching row into I1:…, if those values are not already present there. Then go on and compare the next row to the ones below, and insert any trailing cells of those rows to the end of that row if the values are not already present in column I:…, and so on down…
Thanks!

Example:
(numbers in red are duplicates and not to be copied)
Consolidate partial-matching rows - example

fpy · February 10, 2025, 7:29pm

no one seems eager to jump on this, probably cause it would be a pain to get the expected result from clear Calc formulas

Could be reachable with a good old ad hoc iterative macro;
and basically, your data being pretty raw, would be even far simpler with short Perl-ish / python-ish scripting over csv file.

or with clean SQL JOIN + EXCEPT if you organize your data.

skye99 · February 11, 2025, 4:13am

I didn’t think it would be that difficult.
thanks!

fpy · February 11, 2025, 6:21am

well, maybe more tricky than really difficult …
the perl-ish version is already ~20 lines :
438r7fea8 - Perl - OneCompiler

or if you have the courage to test/debug with AI :

skye99 · February 11, 2025, 3:48pm

thanks a lot for your help fpy!
Perl scripts are beyond me, how do you use it?

fpy · February 11, 2025, 5:09pm

you can follow the link given above, change the input (STDIN),
probably adapt the print statement,
and click

Villeroy · February 11, 2025, 8:37pm

ask117806.odb (15.8 KB)

skye99 · February 12, 2025, 12:11am

thanks for your help fpy!
the script result is close, but it seems to copy repetitive values that are already present - it should only copy unique values. And it only runs successfully with the given data, but if using different data it returns “Program did not output anything!”.

Villeroy, thanks, but I’m not sure how use that.

fpy · February 12, 2025, 4:40am

well, need to clarify (=formalize) your expected result on less ambiguous exemples; especially what means * “insert”, “copy”, “repetitive”*

like what is your expected on this input ?

0 2 3 4 a b c
1 2 3 4 a b c
1 2 3 4 a b c
1 2 3 4 a c
1 2 3 4 b b

it was only printing matching lines.
I’ve updated the script to (hopefully) help clarifying : v2 - Perl - OneCompiler

Villeroy · February 12, 2025, 10:52am

Open the table. This is where the data are stored. It is a remake of your spreadsheet columns from A to H.
Open the form.
Left side shows unique combinations of ABCD.
The grid in the center shows the corresponding EFGH for each selected ABCD. When you enter a new row into this grid, you add a new record to the table with the selected ABCD values and the newly entered EFGH values. When you delete a row, you delete the entire row from A to H from the table.
The right side shows the unique EFGH values of the selected ABCD.

skye99 · February 12, 2025, 8:48pm

Thanks fpy & Villeroy!

I like what you both did , but it’s a bit different from what I’d like to have done. Ok, I’ll try to explain better. The idea is to consolidate the unique values of partial-matching rows in order to eliminate redundant columns later.
.
That is, compare the values in column A:D, and copy the values (from E:H) from the rows whose values in columns A:D match, and copy only the unique values from E:H (if not already existing) to the first row in the first available empty cell .
.
So, starting with A1:D1, if any cells directly below (A2:D2, A3:D3…) match A1:D1, then copy only the unique non-repeating values of E2:H2, E3:H3, E4:H4… to the end of the first matching row as far as needed (in this case it would be the unique values from E2:H2 and E3:H3, E4:H4…, inserted to I1…W1).
.
The values of those cells (in column E:H) are copied only to the first/original row. Do not copy a value if it was previously copied and already exists in E1:W1… The matching rows below are left untouched, nothing copied to those rows.
.
Then find the next different group of matching rows (in this case, A9:D9 matches A10:D10) and copy only unique non-repeating values (in this case, the unique values of E9:H9 and E10:H10 are copied into I9:…L9), and so on down.
.
I will later apply Auto-Filter to eliminate rows containing redundant matching columns (A:D).
.
I hope I explained it clearly. Here’s an expanded example…
.
Consolidate partial-matching rows - example 2
.
oops…
In my mockup example I made a couple errors. I mistakenly inserted into I1, L1 and M1 values already existing in E1, F1 and H1. Since those values are already in that row, they should not be copied over.

Villeroy · February 13, 2025, 12:56am

Nothing qualifies a spreadsheet application for this task. Spreadsheets have no concept of sets nor duplicates. Spreadsheets are error prone.

Call menu:Tools>Options>Base>Databases and register my database under some name…
Open my database and call Insert>“Query in SQL view”

SELECT "A", "B", "C", "D", LIST( "X" ) AS "List" FROM "Qry2" GROUP BY "Qry2"."A", "Qry2"."B", "Qry2"."C", "Qry2"."D"

Save the query under some name.
Save and close my database.
Open a Calc document and the data source window (Ctrl+Shift+F4).
Expand the database name, “Queries” and drag the icon of your query from the left pane into the spreadsheet. This creates a linked database range. It can be updated via Data>Refresh when the cell cursor is anywhere within that range.
From your sample data, the query returns this record set:

A	B	C	D	List
10	11	12	13	22,23,25,26,28,29,30,33,37,38
20	21	22	23	24,25,26,27
40	41	42	43	44,45,46,47
50	51	52	53	54,55,56,57,58,59,60,61
60	61	62	63	64,65,67,69,71,74

If you need the list values as separate numbers:
Select the “List” column and call menu:Data>Text to columns…

fpy · February 13, 2025, 6:27am

would be v3 - Perl - OneCompiler

skye99 · February 14, 2025, 12:57am

thanks again to you fpy and Villeroy
.
fpy, your code works nicely, but I have to first replace the tabs between the values with single spaces, I guess it doesn’t work if I just copy and past from the spreadsheet. Thanks a lot!!!.
.
Villeroy, I have to become more familiar with the database, but you’ve inspired me to learn to do things in it instead of calc. Thanks a lot for your help, it’s very much appreciated!
.

fpy · February 14, 2025, 1:34pm

same tricks with tab/space separators.
just need to change the split to recognize both : v4 - Perl - OneCompiler

then iterified and python-ized to a macro you can run directly in Calc

def copyUniq(s, rref,cref, val):
  for c in range(4,cref):
    if (val == s[rref,c].getValue()):
      return cref;
  s[rref,cref].setValue(val)
  return cref+1

def groupBy(s, rref,curr):
  for c in range(0,4):
    if (s[rref,c].getValue() != s[curr,c].getValue() ):
      return False
  return True

def conso():
    doc = XSCRIPTCONTEXT.getDocument()
    sheet = doc.getCurrentController().getActiveSheet()

    rref = 0
    cref = 8
    curr = 1
    while sheet[curr,0].getValue():
      if groupBy(sheet,rref,curr):
        for c in range(4,8):
          cref = copyUniq(sheet,rref,cref, sheet[curr,c].getValue())
      else:
        rref = curr
        cref = 8
      curr += 1

skye99 · February 15, 2025, 2:36am

that’s great fpy, thanks!
question: how hard and how long does it take to learn Perl or Python, and what is the best source to learn from?

karolus · February 15, 2025, 6:32am

may be you should read Peter Norvigs Essay about

python-Tutorial

fpy · February 15, 2025, 10:11am

and even before :
→ Python For Beginners | Python.org
→ https://wiki.python.org/moin/BeginnersGuide/NonProgrammers

also plenty of videos; playing with AI, etc

skye99 · February 15, 2025, 4:07pm

ok, thanks karolus and fpy, I’ll add this to my already long list of “more things to learn”, lol…