Why does LibreOffice Calc take so long to save changes to a file?

I downloaded this data: https://fenixservices.fao.org/faostat/static/bulkdownloads/Production_Crops_Livestock_E_All_Data.zip from FAOSTAT and imported it into LibreOffice Calc as a CSV.

I then saved the file as ODF and this took a long time. The Manjaro operating system kept asking me to Force Quit or Wait. I am guessing that the program could have provided a progress bar with an option to cancel (if the estimated time was too long).

I then made one small change to a row of data (changed the background colour) and then saved the file.

Saving the file then again took a very long time, with the ‘Force Quit or Wait’ dialog popping up.

Why should this be the case?

Why is it not possible to write only the changes from the in-memory data structure to the file?

Kind regards,
David.

Saving the large table as Open Document Spreadsheet (ods) takes 10 seconds on my lame notebook.
A spreadsheet is not a database. CSV is a database exchange format. The CSV tables in that zip file are supposed to be related to each other.

What exactly do you want to do with these data?

Wouldn’t say it is impossible, but it can be complicated.
Let’s assume abc is changed to adc - we could overwrite in place, because the length stays the same.
Now take abc and change it to abbc - we need more space, so we either save abbc by rewriting everything after it, or we write a marker a@c with a list @2:bb telling the reader to replace the marker with the new text.
The same applies to adding attributes like bold - you need additional space.
As an additional obstacle, encode all of this in XML and compress the file. Now try to make a change at a defined position quickly.
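
To make the point concrete, here is a minimal sketch in plain Python (file contents and helper names invented): overwriting a byte in place is cheap, but inserting one means rewriting everything after the insertion point, because filesystems have no "insert" primitive.

```python
# Sketch: overwrite-in-place vs. insert. Filesystems let you
# overwrite bytes at an offset or append at the end, but there is
# no "insert bytes in the middle" operation.

def overwrite_byte(path: str, offset: int, new_byte: bytes) -> None:
    # 'abc' -> 'adc': same length, cheap in-place overwrite.
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(new_byte)

def insert_bytes(path: str, offset: int, new_bytes: bytes) -> None:
    # 'abc' -> 'abbc': the file grows, so the whole tail shifts
    # and must be written out again.
    with open(path, "rb") as f:
        data = f.read()
    with open(path, "wb") as f:
        f.write(data[:offset] + new_bytes + data[offset:])
```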


Thank you for realising that I am not criticising or taking a dig at LibreOffice Calc, and that I am genuinely interested in why it is difficult to save in-memory changes to storage quickly (especially tiny changes).

Is it fair to say that tiny changes to in-memory data can be very fast and efficient, but when these tiny changes are saved to an XML file on the filesystem things can be slow, because the file has to be rewritten from start to finish?

There is no clever way in which the data on the filesystem is broken up into smaller updatable pieces that are then joined together.

For example, from my limited knowledge: if the file were a linked list that mirrored the in-memory data, would the program be able to collect all changes into a data structure and then pass these changes on to the file-updating process?

That way it would know ‘at these points in the linked list, insert, update or delete data’ - with a fast way to access those points.
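
Roughly, I imagine something like this toy sketch (all names invented; presumably not how LibreOffice actually works): edits are collected in memory, and only the collected edits are handed to the file-updating step.

```python
# Hypothetical change journal: record edits as they happen, then
# hand only the recorded edits to the storage layer on save.

from dataclasses import dataclass

@dataclass
class CellEdit:
    sheet: str
    row: int
    col: int
    value: str          # new cell content (or a formatting change)

journal: list[CellEdit] = []

def edit_cell(sheet: str, row: int, col: int, value: str) -> None:
    journal.append(CellEdit(sheet, row, col, value))  # O(1) in memory

def save(apply_edit) -> None:
    # Only the recorded edits are passed on; the hard part is the
    # 'apply_edit' callback finding "these points" on disk quickly.
    for e in journal:
        apply_edit(e)
    journal.clear()
```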

Does it boil down to the limitations of the filesystem - how data is written, retrieved, updated and deleted from storage?

Anyway, thank you for piquing my interest.

Livestock (170 MB) is something to play with.
I wrote a little program to normalize and repair the CSV data on a sheet, created an HSQL database, copied the normalized rows million by million into the database, and added some queries, a form, and an installation macro which connects the Base document to the database after you extract the package on your computer.
Then I created a spreadsheet as an additional front end. The first sheet has the same form as the database, where you define meaningful filter criteria. The second sheet has a pivot table and a pivot chart as a kind of report. The macro in the spreadsheet refreshes the pivot when you select its sheet.
The filterable form allows you to find and edit data record by record, and it allows for reasonably comparable reports.
In this simplistic draft you can only choose one geographic region or all of them, and one crop or all of them. With some extra effort it would be possible to compare multiple selected regions and crops.
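
If you want to reproduce the batch-copy step, here is a rough sketch with Python and SQLite standing in for the Basic macro and HSQL database (the file, table and column names are made up, and the real CSV has more columns than this):

```python
# Rough illustration of copying CSV rows into a database in large
# batches, one transaction per batch ("million by million").

import csv
import sqlite3

BATCH = 1_000_000

conn = sqlite3.connect("production.db")
conn.execute("""CREATE TABLE IF NOT EXISTS production
                (area TEXT, item TEXT, year INTEGER, value REAL)""")

with open("Production_Crops_Livestock_E_All_Data.csv",
          newline="", encoding="utf-8") as f:
    reader = csv.reader(f)
    next(reader)                      # skip the header row
    batch = []
    for row in reader:
        batch.append(row[:4])         # keep only the columns modelled here
        if len(batch) == BATCH:
            conn.executemany("INSERT INTO production VALUES (?,?,?,?)", batch)
            conn.commit()             # one transaction per batch
            batch.clear()
    if batch:
        conn.executemany("INSERT INTO production VALUES (?,?,?,?)", batch)
        conn.commit()
```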

P.S. When playing with this large record set, auto-updating the pivot on sheet activation may be a bad idea. To disable it: right-click the “Pivot” sheet > Events and remove the activation event. To refresh manually: right-click the pivot area and choose “Refresh”.


Because LibreOffice does not keep the ODF file in memory; the XML data is generated from the in-memory data each time, and then ZIPped. If the data is large, that takes time; but using the XML in memory instead (even without taking the packaging into ZIP into account) would result in much slower operation at every user edit, not compensated by a slightly faster save.
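
To illustrate the packaging part with plain Python (zipfile only; replace_member is an invented helper, not LibreOffice code): even replacing a single member of the ZIP means copying all the other members into a new archive, because ZIP has no in-place replace.

```python
# Sketch: "just update the changed part" doesn't work for a
# ZIP-based package. (A real ODF writer must also keep the
# 'mimetype' member first and uncompressed.)

import os
import zipfile

def replace_member(ods_path: str, member: str, new_data: bytes) -> None:
    tmp_path = ods_path + ".tmp"
    with zipfile.ZipFile(ods_path) as src, \
         zipfile.ZipFile(tmp_path, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            if item.filename == member:
                dst.writestr(member, new_data)      # the one real change
            else:
                dst.writestr(item, src.read(item))  # everything else copied
    os.replace(tmp_path, ods_path)
```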

Indeed, improvements are possible; for that, bug reports are needed, with the problematic data attached and the perf keyword set.


Note that the file format has its own effect on this. E.g., ODS has all the data (i.e., all sheets) in a single content.xml stream, while XLSX has a separate XML per sheet. That makes reading and writing XLSX faster when several sheets are used; in the case of your data, it would show up if you imported each CSV from the package into a separate sheet. LibreOffice would work faster with the external, non-native file format in this case than with its own native ODF (that doesn’t mean, of course, that the external format has better support - data/formatting losses are likely - but it could still look strange).
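
You can see the difference yourself by listing the package members of two saved files (the paths below are placeholders for any .ods and .xlsx you have):

```python
# Both formats are ZIP packages; compare their internal layout.

import zipfile

for path in ("data.ods", "data.xlsx"):
    with zipfile.ZipFile(path) as z:
        print(path)
        for name in z.namelist():
            print("   ", name)

# Typical output (abridged):
#   data.ods  -> content.xml, styles.xml, meta.xml, ...
#   data.xlsx -> xl/workbook.xml, xl/worksheets/sheet1.xml,
#                xl/worksheets/sheet2.xml, ...
```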


There is no structure in a spreadsheet. Each cell can take any text or number. There are no fields or records. A spreadsheet does not even have tables: a “table” is just a rectangle of cells that can be interpreted as a record set without being one.
Editing a database, you edit one record at a time.


This is not really relevant. The “unstructured” nature of spreadsheets doesn’t map to the storage format, which does have a structure. You could even invent a storage format for spreadsheets that is based on a database - say, store each row as a record of BLOBs … and then you could have atomic transactions on save. Or even easier, without any new format: use the XML DOM as the internal document model of the program, with per-node modified state (something that LibreOffice does not do, for many reasons), and your changes would be very targeted, with much improved store times. My point is: the two facts (the LibreOffice implementation not allowing small, limited file modifications instead of full file regeneration, and spreadsheets being “unstructured”) are unrelated.
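
A toy sketch of such an invented format, using Python and SQLite purely for illustration (no real spreadsheet application stores files this way, and all names are made up): one database record per row, per-row modified state, and an atomic transaction on save that writes only the dirty rows.

```python
# Invented row-as-records format: only modified rows are written back.

import pickle
import sqlite3

conn = sqlite3.connect("sheet.db")
conn.execute("CREATE TABLE IF NOT EXISTS rows (n INTEGER PRIMARY KEY, cells BLOB)")

rows: dict[int, list] = {}      # in-memory model: row number -> cell values
dirty: set[int] = set()         # per-row modified state

def set_cell(row: int, col: int, value) -> None:
    cells = rows.setdefault(row, [])
    cells.extend([None] * (col + 1 - len(cells)))
    cells[col] = value
    dirty.add(row)              # remember which rows changed

def save() -> None:
    with conn:                  # atomic: all dirty rows or none
        for n in dirty:
            conn.execute("INSERT OR REPLACE INTO rows VALUES (?, ?)",
                         (n, pickle.dumps(rows[n])))
    dirty.clear()
```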
