Concatenate multiple row data under like field

Howdy -
I’m newer to Libre Office Calc, but fairly experienced with spreadsheets. I haven’t used VBA from MS, and I haven’t written macros in any spreadsheet. I was given a new task with 4 years of data, and I need to find a way to sort and compile the data - in a manner I haven’t done before. Some of the files have over 30,000 rows of data.

The basics are:
Sheet 1 - raw data
Column B = member ID (with each member ID most likely appearing on multiple rows)
Column G = member’s evaluation for each appearance of member ID

Sheet 2 - compiled
Column A = member ID (only one occurrence to reference the raw data)
Columns B - Z (or more) = each evaluation of a the member (so a member could have 1 evaluation, or 20+) transferred from sheet 1

Image attached of the Raw Data sheet.

What I need to do is be able to take the 30,000+ entries in sheet 1 column A and remove the duplicates for sheet 2 column A - that is easy enough. But the tough part is to then find each evaluation for a member on sheet 1, and transfer that evaluation to sheet 2 in columns B - Z.

I’ve tried to work with INDEX and VLOOKUP, and have had a bit of success with them. But, those commands require the ID look-up to be fixed to a specific cell (i.e., $A$2). And I don’t have the time to change the ID look-up for upwards of 7,000 unique members in 20 files. I’m hoping a macro would be able to execute the search and transfer of all evaluations for member #1 and transfer them to sheet 2 row 2, then automatically move to the next member and transfer the evaluations for member #2 to sheet 2 row 3 (row 1 is the header row for both sheets).

Thanks for any help or advice you can offer!

Paul

  1. Don’t get caught up on 30,000+ rows. If you find something that works for 5 rows, it will probably work for 30,000 with only a bit of hesitation on a newer computer.
  2. You should edit the original question and upload a seriously shortened version of your ODS, with a mock-up sheet of the result you want.
  3. I think you are on the right track if working in Calc. If in fact you cannot just use INDEX(…MATCH(…)…) constructs or VLOOKUP() constructs, then you are possibly wanting to do something that needs these with array formulas. It would be more valuable to learn array formulas than macro programming, since they would carry over almost directly to Excel, don’t come with the security burden, etc.
  4. Really, you should get the data into a database using LO Base and then what you want is likely only a few SQL queries away. Compared to learning LO macros, your time would be much better spent learning Base and some SQL. Much better spent.

You will have to explain what you mean by this. Of course the criterion can move along with a lookup function…A2 not $A$2. Perhaps what you are not seeing is, in fact, the need for an array formula where you pass a range like $A$2:$A$200 into something that you would think takes just a single value, but because it is being evaluated over an array, the matching cell from the range is what is used.

See the example:

SimpleArrayFormulaExample.ods (15.1 KB)

If the Sheet1 is sorted by members, you can make the command button and attach macro to it. Macro something like this:
REM ***** BASIC *****

Option Explicit

Sub Main
dim oDoc, oSheets, oSheet1, oSheet2
dim lRow1 as long, lRow2 as long, iCol2 as integer
dim sMember as string, s as string
dim Cell as object

’ set sheets ----
oDoc = StarDesktop.CurrentComponent
oSheets = oDoc.Sheets
oSheet1 = oSheets.GetByName(“Sheet1”)
oSheet2 = oSheets.GetByName(“Sheet2”)

sMember = “*****”
lRow1 = 1
lRow2 = 0
Cell = oSheet1.getCellByPosition(1, lRow1)
do while (NOT IsEmpty(Cell) AND Cell.string <> “”)
s = Cell.string
if s <> sMember then
lRow2 = lRow2 + 1
oSheet2.getCellByPosition(0, lRow2).string = s
oSheet2.getCellByPosition(1, lRow2).string = oSheet1.getCellByPosition(6, lRow1).string
iCol2 = 2
sMember = s
else
oSheet2.getCellByPosition(iCol2, lRow2).string = oSheet1.getCellByPosition(6, lRow1).string
iCol2 = iCol2 + 1
end if
lRow1 = lRow1 + 1
Cell = oSheet1.getCellByPosition(1, lRow1)
loop

msgbox “Finished”

End Sub

@shma_lo1, can you upload the example ODS you are working with?

Members.ods (12.0 KB)

Looks like you should start from sheet1 with -->Data–>Pivottable–>…

with USSA_ID into Row-field, and Result into Data-field, and changing from sum to Average to aggregate??

Good morning -

Thank you for the suggestions on how to tackle this project! I’m now learning about Pivot Tables, karolus, and finding them to be useful. I don’t know why I hadn’t tinkered with them before.

shma_lo1 - thanks for offering the Macro solution/code. With the timeframe I have to work, I’m finding it easier to work with the Pivot Tables and some familiar old-school functions for a few secondary calculations.

Joshua4 - the problem with the INDEX formula/function is the search for the ID in cell $A$2 is locked to that cell/ID. If I want to search for another ID, I cannot simply copy and paste the formula/function into the next row. I have to manually change the ID (now in $A$3) to find that specific ID. I don’t have the time to manually change the ID cell for upwards of 140,000 searches in 20 data files.

All the best,

Paul

@Ewog, use what works for you! But why use $A$2 then say it is ‘locked’? Just use $A2. Or use an array formula with $A$2:$A:$20000. I don’t want you to spend weeks using spreadsheets without realizing how they work fundamentally.

Welcome, @Ewog !
Paul, I hope you heed this advice from @joshua4

For this task, you will not need to make few SQL queries, one very simple query will suffice.
See how it’s done.
Open the raw data sheet and select the entire range of the table.
Create a new database
Drag the selected range to the Tables section
Specify the columns that you really need
Confirm the creation of the primary key (you can use the Orig Sort column, but why?)
Now create a request - select USSA_ID and Result
Replace Result with LIST(Result) and provide an alias for this field. It seems to be the only thing you will do without using the mouse.
Execute the query and copy its result to a spreadsheet.
The second column contains all the results for each USSA_ID, separated by a comma.
Select this column and apply Text To Columns to it.

PivotWithBase1

I think that converting 30,000 rows will not take much longer than reading this instruction. In any case, it’s faster than writing a macro. And it seems easier than inventing a working formula.

I tried this. Figuring that “Request” would be “Query” in English, and everything was pretty easy to follow. However, “replace Result with LIST(Result)” is not especially clear. I tried it in the normal Query Edit, and it showed up as expected in the SQL; however, my 7.3 install is not recognizing LIST. I can do COUNT, MAX, or MIN if I add a GROUP BY, and that works fine. I’ve barely ever used this Base install. I was expecting LIST on the default. What am I missing?

The aggregate function LIST() really cannot be selected from the drop-down menu. The fact is that this function is specific to the Firebird database. For example, for HSQLDB one would have to use GROUP_CONCAT - how many different dialects of SQL, so many different ways to solve the same problem. Of course, Base couldn’t list all the possible options in one drop-down menu. So just write the name of the function from the keyboard: click the Result field, surround it with parentheses, add four letters. Yes, it requires certain skills, but it is not very difficult, you quickly get used to it.
BTW, one point is missing in the instructions (and in the video too) - the created request must be saved. After that, you can open it and copy the result.

Ah, the installation defaults to HSQLDB. I’ll try GROUP_CONCAT. I’m not at all timid about typing the function in…in fact, I didn’t even realize that there was a Function row in the UI until after I’d entered LIST. The SQL has always made more sense to me than those grids, in any case, even back when I was doing a lot more in MS Access.

For the record, the browser I was using did not show your video, just a screen shot of the first frame, so I would not have seen Firebird in the status bar…I considered Firebird, but it wasn’t in your written instructions, so I thought I’d ask. On this computer this morning, I can see the video–and that you are using Firebird.

@JohnSUN, it looks like your solution will only work in Firebird. GROUP_CONCAT does not seem to be available in HSQLDB, only in HSQL2, both from my tests and from an OO forum posting by @Villeroy in 2007. If you know otherwise, please let me know.

GROUP_CONCAT is availlable with HSQL2.
Example database with a GROUP_CONCAT query dumped into a spreadsheet: Vacations

[Python,Calc] Group_Concat for the spreadsheet

Vacations database is a great intuitive use of GROUP_CONCAT…everyone off that day. I looked over the concatenation, GROUP_CONCAT, and COALESCE. Could not get link to Python to work, but I think the coding side is pretty clear against a spreadsheet. Thank you.

Download the text document to a trusted directory and click the install button. The document’s text body will be saved in <user_profile>/Scripts/python/pyCalc/group_concat/
Open a spreadsheet with at least 2 columns, one column having the values to be grouped and a subsequent column having the values to be concatenated. Select a range so the first column has the group values and the last column has the concat vallues.
Call Tools>Macros>Run>pyCalc>group_concat>makeGroupConcat_draft
The resulting table will be dumped below the selection.

COALESCE is like IFNULL in other languages. I allowed the second date to be Null (missing, empty, blank) whereas the first date is mandatory. Furthermore, I added a constraint that the second date needs to be greater than or equal to the first date. A valid record has the first date only or both dates with D2>=D1 otherwise the record can’t be stored.
COALESCE(“D2”, “D1”) returns the same value as D1 when D2 is Null, so this record is treated as a one-day vacation.

1 Like

Sorry, that was Python2 as used by OpenOffice. Try this one:
Group_Concat.py.odt (44.3 KB)