Big spreadsheet - how much RAM?

I have a spreadsheet with two sheets, each with 400k rows and around 10 columns. The file is around 75 MB.

It started as a CSV so I’m confident there’s no formatting.

Copying and pasting a column of formulas uses 100% of my CPU. It takes 2-3 minutes to copy one column at a time, and I have to force-quit when I ask it to do more than that.

What sort of RAM/storage/etc do I need to work with something this size?
(I’m not techie so I don’t actually know what the limiting factor is likely to be)

Mac OS 12.7
1.6 GHz Dual-Core Intel Core i5
8 GB 1600 MHz DDR3

Thanks.

Which version of LibreOffice do you use?

Difficult to say in general.

Try with 4k, 40k, … rows to get your own idea.

Some other ideas …

What is the purpose of loading database data into a spreadsheet? If it is a one-off job (import, sort, filter, format, analyse, and throw away the source), you can live with the shortcomings and delays.
If this data set is expanding and used on a daily basis, you should not store it on the grid of an arithmetic calculator.

The problem is, I can’t: LO is literally unusable with a file this size.

Ok, I’ve had a similar comment before. Makes sense, I guess; I just don’t know what the alternative is.

It’s 24.8.2.1

So I look at the RAM requirements for 4000 rows and then multiply by 400?

You cannot calculate a memory requirement just from a number of rows.
The decisive factor is the maximum number of characters in a line.
One character corresponds to one byte (8 bits).
Add an overhead of approx. 10%.
Now you can do the math.
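As a rough illustration only, using the figures from the original question (two sheets × 400k rows, about 75 MB on disk) and assuming an average of roughly 95 characters per line:

800,000 lines × ~95 characters ≈ 76 MB of raw text
76 MB + ~10% overhead ≈ 84 MB

The 95 characters per line is only an estimate back-calculated from the stated file size, not a measured value.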


Developing a small database is the one and only alternative. See “Why does LibreOffice Calc take so long to save changes to a file?”

Thanks. If I have all this data in csv format, can I use Base to create a database from the data?

(Although in natural language I would say the csv is a “database” already.)

I’ve had a quick look at the help page and it looks more complicated than this.

My data source will also provide the data as json, in case that’s easier.

No, I’m afraid, you can’t. This is definitely for techies, and Base can’t help you with that. In that other thread, it took me one hour to store 3.6 million csv records in a database that is fairly usable with Base.
P.S.: It is usable with Base, and with Calc and Writer as well, if you boil down the 3.6 million records to the ones you are interested in right now. Base is a bridge between databases and office documents.


No. While JSON is popular in programming languages, you will not find many tools to convert whole databases using JSON. You also have to find out what you actually get: I have, for example, seen JSON arrays of columns instead of rows (one array holding all the names, another holding all the values, instead of one object per record), which may compromise the integrity of individual records and is completely useless for database import.
.

I’d say it is an ordered pile of data. Compare it to a big multi-volume encyclopedia where you have removed the inscriptions from the spines: all the information is still available, but with an index you could find the right volume much more quickly.
.
For smaller jobs like this, where I don’t need access for several people, I use SQLite databases. But I would not use LibreOffice to import the data; instead I use the command-line interface of the sqlite shell, or SQLiteStudio, as explained here:

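For illustration only, a minimal import with the sqlite3 command-line shell could look like this (the database and csv file names are made up, not taken from the question):

$ sqlite3 register.db
sqlite> .mode csv
sqlite> .import register.csv register
sqlite> SELECT COUNT(*) FROM register;

With .mode csv set and a table that does not yet exist, .import creates the table and takes the column names from the first line of the csv file.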
.
The problem is: SQLite is not directly accessible by LibreOffice. You need a database driver. As you are on a Mac, I think it is difficult to use ODBC, as I have read that the OS is missing a configuration tool. If I were using a Mac, I would probably try the SQLite extension for LibreOffice.
.
Just to mention it: I don’t think this is a question of RAM, as LibreOffice can load the file, but your 1.6 GHz Dual-Core Intel Core i5 hints at a nearly 10-year-old iMac or perhaps MacBook (Air?). Not the biggest number-cruncher. Even modern multi-core machines will only partly help in this case, because Calc is mostly single-core work… When starting with LibreOffice Calc on similar hardware, I found sheets with 10000 rows quite usable until you start to use complex formulas… Your data needs to be reduced by a database to become useful for Calc.
.
And then: I think you will have to learn a bit of SQL to create the queries you will need to access the data… The level depends on what you wish to do with your “government release of a statutory register”.

Background

When you have formulas in spreadsheets, there is one formula for each cell. When you copy a column with formulas, each formula needs to be checked for relative references, and then recalculated for the new location. Dependency links (for optimizing recalculation) are also created. This takes time. It is barely noticeable for a dataset of 100 cells, but likely why moving a set of nearly half a million formulas (with possible interdependencies) takes considerable time to complete.

For a calculated field in a database, the formula exists exactly once. The possibility of relative vertical dependencies is reduced (i.e. less flexibility in how you can extract info), but copying to a new column is safer and probably faster. You usually make new columns by query. Database use is perceived as slower, particularly on large datasets, because the result is created at “run time” (when you run a query), while spreadsheet results are “immediate” at entry time.
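As a small sketch of that last point (the table and column names here are made up, not taken from the question), a “new column” in a database is usually just an expression in a query, written once for the whole table:

SELECT "Quantity", "UnitPrice", "Quantity" * "UnitPrice" AS "Total"
FROM "Register"

The expression is evaluated for every row when the query runs, so there is nothing to copy down a column.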

A database is more efficient to work with and less prone to error with large datasets, compared to a spreadsheet. However, it requires that you understand the nature of structured data, and are able to normalize the input.

Answer

Using Base
Your csv file(s) can be directly read by Base and treated as a database. This will be a low-performance, read-only solution. Keep your file(s) in a designated folder where there is nothing else, and use “open existing database” from Base.

It can also be imported into a database engine, which allows for editing, performance-boosting indexing, and restructuring. LibreOffice can connect as a frontend to many such database engines.

So yes, it is possible, with Base, to lift your data into a proper database and work on it from there. Whether you gain anything from it depends on the nature of the data (what goes in) and the nature of your formulas (what comes out). Little information is given about this.

Stay with Calc
Instead of cell formulas, you may be able to use array formulas. An array formula for an entire column is possible (at least I am not aware of a size limit for arrays), and does not need to be reassessed for each row (except for recalculation). Hence, copying/moving a column with an array formula may be faster. However …

  • Not all cell formulas translate easily to array formulas.
  • You can’t insert a row/column/cell (or otherwise manipulate cell ranges) within an array range.
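To illustrate what a whole-column array formula can look like (assuming, purely for illustration, input data in columns A and B): select the target range C1:C400000, type =A1:A400000*B1:B400000 and confirm with Ctrl+Shift+Enter (or the macOS equivalent). The whole column is then covered by a single array formula instead of 400,000 individual ones.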

This is probably fine. It’s a government release of a statutory register and so read-only is adequate for a first cut of the data.

No, I don’t think so. However, possibly worth trying.
Put your csv file in a dedicated directory.
File>New Database…
Connect to existing database of type “Text”
Point to the directory.
Specify import options.
Yes, register the database.
Save the database document.
Your csv file appears in the “Tables” section as if it were a table of a true database (which it isn’t).
Create queries in “design view” or try the query wizard.
A query is like a formula which returns any set of rows and columns, in any order.
Without “design view” or “wizard”, a simple query on a text table looks like:

SELECT "column 4" AS "Date", "column "6" AS "Name", "column 1" AS "Value"
FROM "TextTable"
WHERE "column 2" = 'Maxi'
ORDER BY "column 7" DESC

This shows 3 selected columns of your “TextTable” (.csv) under the modified column labels “Date”, “Name” and “Value”, restricted to rows where the column named “column 2” contains the text “Maxi”. The resulting table is sorted in descending order by another column (“column 7”).
