Leanly and dynamically merging all CSV files in a folder, to appear on one sheet

Mielai · October 15, 2024, 11:39am

I have a LibreOffice Calc document where I want to do calculations on standardized CSV files all found in the same folder, but I don’t wish to manually import each and every file into Calc. The files will all have the same format, so I don’t want the spreadsheet to care if I remove, swap or add files from one time to the next. I just want all of them to appear in the same sheet.

I am fine if the solution involves linking to a database, if all the same conditions are fulfilled. I already tried and found that with Base I could import all CSV files in a folder, but I found no way of dynamically merging them. The only options I could find involved statically naming each and every table involved.

To be clear, I do NOT want to create a separate sheet for every file like this answer: Multiple csv/ods files into individual tabs in single spreadsheet. The file contents should all appear in one sheet and the contents should change when updated if the original files have been swapped out.

mikekaganski · October 15, 2024, 11:46am

If I faced such a task, I would create a concatenation of all the standardized SCVs in a single CSV; and would link to that CSV as a separate sheet (which would update automatically). That way, just executing something like copy sourcedir\*.csv outdir\all-in-one.csv command line would do it for me.

Lupp · October 15, 2024, 12:01pm

Probably Apache OpenOffice Community Forum - Merge multiple CSV with same Header - (View topic) ?
(I didn’t use the original code nor one of the additionally suggested solutions.)

Mielai · October 15, 2024, 12:30pm

@Lupp It looked like a good thread, but one creates separate file and another requires CLI usage. The third links to a template that does hold your hand quite a bit but I can’t use the template outright. Maybe I can salvage something from their macro.

@mikekaganski Unfortunately I need the result to be as streamlined as possible, so I need to avoid any manual steps after setup. That’s why I would prefer a Calc-only solution or one where Base does all the heavy lifting dynamically. To compare with the devil, it’s trivial in Excel after setting up (e.g. Import Data > From file > Folder > Combine & Transform Data)

Lupp · October 15, 2024, 12:59pm

Make a contract with the devil if he can save your soul.
I can’t advise in detail because I don’t know his forms and conditions.

Mielai · October 15, 2024, 1:11pm

I don’t know, I’m pretty sure the devil will just take it instead. Jests aside, if this isn’t a feature then I’ll need to open a feature request. This seems like it would be trivial to implement in either Calc or Base, since in the latter you can already import an entire folder dynamically as a database. I just haven’t found a way to query the tables dynamically.

sokol92 · October 15, 2024, 2:43pm

See getTables.

Villeroy · October 15, 2024, 3:52pm

Merge sheets of spreadsheet documents can be set up to merge all csv files on one sheet and move the merged files into another folder to avoid duplicate merges.

Mielai · October 15, 2024, 7:43pm

@Villeroy Unfortunately the folder needs to be dynamic so that you can add and remove any constituent file, so no file merges are allowed.

@sokol92 I will look into that, do you have any examples lying around?

Villeroy · October 15, 2024, 9:24pm

Create a sample with your own data in 3 minutes:

Open my template.
Store the new document in a trusted directory allowing macro execution.
Call menu:File>Reload to reload the saved file with macros enabled.
Move some of your files to the same directory as the document. This will be the path that is used by default. Alternatively, click the first push button and select the directory where your files are. This path will be entered in cell B3.
Enter the file name pattern into cell B4. That is *.csv if the names of your text files end with .csv.
If your text files include one column header, enter 1 into cell B7.
Open one of your text files with each and every option set properly.
With the text file in the active window, call menu:Tools>Macros>Run…, navigate to [your spreadsheet document] -> SheetMerge -> FilterOptions -> setFilterOptions and click [Run]. Close the text file. Its import options are stored to cell B9 of your spreadsheet document.
Click the [Go] button.

Results:

All the filles will be merged into one large list with the file origin stored in the very first column of that list, below the cell which is named “Target”.
The file names in the first column let you filter by origin.
You can format the list anyway you want and add all kinds of spreadsheet functionality, formulas, charts, (conditional) formatting, pivot tables, filters etc.
You can freely define your column headers.
You may rename/rearrange sheets as long as the named cells remain intact, particularly the “Target” cell. Hide the configuration sheet. It is no longer needed.

Next time when you click the [Go] button:

If you have the very same files in the source directory, you will get duplicate imports of the same files. This is why I added an option to move the processed file to another directoy.
Every import of x rows from some file will insert x new rows into the list. This way all your references in formulas charts, (conditional) formatting, pivot tables, filters etc. will refer to the newly resized list.

Villeroy · October 15, 2024, 9:49pm

If you really want snapshots of the current files in the source directory, simply delete the cells from the list below the cell named “Target” (that is, cells below the header row).

Wanderer · October 15, 2024, 9:50pm

A workaround would be to route output of a dir /b or ls-command to another .csv-file, wich would then be a “table of contents”, but you already ruled out command-line.

Wanderer · October 15, 2024, 9:57pm

I guess this with an added button to clear all target-cells should be mostly, what was asked.

Villeroy · October 16, 2024, 11:10am

One may call a subroutine clearing/deleting the current region below the target cell and call that routine in front of all the other stuff.
However, we are dealing with a spreadsheet here and we have no idea how the actual spreadsheet will look like in the long run. Clearing all contents will leave references like A2:A100. My macro will expand them with the next charge of 80 rows to A2:A180. Deleting all the “old rows” destroys existing references: =SUM($A$#REF!:$A$#REF!).
I realy had an ever growing pseudo-database in mind when I wrote this macro.

Let me suggest a clean solution to this problem:
Store your document with all the setup, your formulas, charts, pivot tables etc. but with no data as a template.
Every time you need a new snapshot from your csv, just open the template, click the macro button or let the macro run automatically on open. Then save or export the resulting report to disk or discard it unsaved.
Solving the reference problem:

Use references like A:A (entire column) or A2:A1048576 (without header row)
Comment out sh.insertCells(adr, com.sun.star.sheet.CellInsertMode.DOWN) in Module1 of library “SheetMerge” within your template.

Villeroy · October 16, 2024, 5:39pm

Under Linux, tail -qn +2 *Input.csv>Concatenated.csv concatenates files ending with Input.csv into one file Concatenated.csv, skipping one header row. Now you can open Concatenated.csv with whatever program.
Since you never mentioned any operating system, let alone other technical info, you must be a Windows user. Windows10 claims to support Linux essentials. If so, this would be the most simple solution.

Wanderer · October 16, 2024, 8:13pm

You can use a complete linux-subsystem, but for one command this is overkill.
I’m using Cygwin since long times, but there are a lot of tools available:

karolus · October 17, 2024, 2:02am

slightly offtopic: IMHO, WSL would be the only reason for me to use windows10|11!

cpb · October 19, 2024, 3:07pm

@Mielai,
I don’t do spreadsheets and for me the best and cleanest solution would be to insert the CSV data into a database table then use stored queries to perform any calculations.
.
the fact that you are happy to use a spreadsheet suggests that you don’t have a huge amount of data.
.
the attachment has a folder which contains a hsqldb embedded database and a second folder which contains 5 small standardised (3 column) CSVs.
the CSV folder is scanned and the data contained in the CSV files is inserted into text tables, each file requires its own text table. I created 20 text tables which means 20 CSVs max.
.
I created 2 temporary tables, initially they contain zero data, when the database is loaded a macro is fired then 1 will contain the CSV filenames and the other will contain the CSV data.
.
there is 1 external form with 2 table controls.
a number of non contiguous files can be selected by holding down CTRL and clicking in the left-hand margin.
contiguous files are selected by clicking on one and then clicking on another whilst holding down the SHIFT key.
hit the button to display the data.

the CSV data is read only once per session, when the database is first loaded.
temporary tables are not made visible by base so we have 2 simple queries which function as the data-source to our internal forms.
you can drag and drop the query “qAll_tmp” into a spreadsheet.
after using the form to select a different dataset you just need to select ‘menu:>Data>Refresh Range’ from the spreadsheet.
.
it’s obvious that there will be a mismatch between our CSVs but you provided zero info.
.
EDIT: I received a message stating that the attachment was corrupt.
I have replaced the attachment and emboldened the warning.
.
the attachment almost exclusively uses macros.
it’s actually a zip file so if you download it then you must change the extension from ‘.odt’ to ‘.zip’ and then extract the contents.
CSV_Embedded.odt (14.6 KB)