Trouble converting large excel spreadsheet

The bug report confirms your findings with the clipboard, so no need for you to test that file.
.
Access to your file is necessary to test, if the same problem occurs on other computers, maybe with other OS or different version of LO.
.
As you wrote “fails”, is there any error message, or does the system saves “forever” and never returns. I remember when testing limits some years ago, I sometimes left my laptop active for some hours (preventing sleep-modes) and LibreOffice finished jobs “sometime”.
.
Excel not being able to save sounds interesting, too. I would not expect shared code here…
.

Not to solve your problem… You could try, if Calc can handle first and second half of your data, or smaller amounts, but this wont help on the full file. I’d suggest to load your csv in a real database, either file-based like sqlite or using MariaDB, PostgreSQL etc. Then you may preprocess data for usage in Calc with the database by queries.

I tried to upload the CSV file since I only need to get the data in Calc and don’t need it to be an Excel file, but this forum doesn’t seem to allow text or CSV files. So here is a link to the CSV file instead.
https://drive.google.com/file/d/1-DA5UeY0vQbHrIoTHyXNzorLiPEbyEnp/view?usp=share_link

I can open the file CSV in Calc without issue. When I try to save it as an ODS file Calc I get the " Error saving the document ContosoSales CSV: General Error. General input/output error."

When I try to save as an ODS file from Excel I get a warning that some features in the CSV file may not be supported in the ODS file. I try to save anyway and I get the “Errors were detected while saving .” It suggests I can make repairs but this is simply the CVS file I opened in Excel and then tried to save as ODS file.
I saved it as an .XLSX file and then try to save that as an ODS file and I get the same warning and the then the same error message.

Geez… a 694MB .csv file… Make sure your temporary disk space is large enough, because saving the file as .ods creates a HUGE interim XML stream before being packed. I don’t know how much, my almost 8GB free was not enough. Maybe 16, 32, …

Most sites have 2 restrictions: allowed types and size. As usually only extensions are checked, we can cheat a bit, for example renaming test.oxt to test.oxt.zip
For Size: compressing the file to .zip reduces it to a size wich can be uploaded…

I got it the following way:
1 - Open the cvs in LibreOffice (size A1:CJ1045001)
2- Save in xls and close the file
3 - I opened the xls in LibreOffice and saved it in ods.


(I use Win.10 and LibO.7.5.2.2)

That didn’t quite work. When you save it as .xls it drops most of the rows and only keeps 65536 rows.

Ok, sorry, I didn’t check the result.

CSV opens in about 3’ 15’’, with
Version: 7.6.0.0.alpha0+ (X86_64) / LibreOffice Community
Build ID: f22f7159ec1bd28d19f4a8b431893436419ecd64
CPU threads: 4; OS: Windows 10.0 Build 19045; UI render: Skia/Raster; VCL: win
Locale: es-ES (es_ES); UI: en-US Calc: CL threaded Jumbo
i5-6500 16 GB of memory

Saves as ods, using about 1.6 GB of memory, in 4’ 26’’.

1 Like

Well, yes, but how much temporary disk space did it use? i.e. in /tmp/ or $TMP or Windows %TEMP% (or whatever is used there).

While saving as xlsx, I can find it in
C:\Users\xxxx\AppData\Local\Temp\lu156889fzs8x.tmp
lu15688{059DD2D3-6A6B-4F7C-AC48-B56DD5188A03}.tmp
lu15688{903E389B-AA84-495F-9D03-744427530043}.tmp
lu15688{E5F5D701-DF86-48B4-AD75-BF3B3C04E989}.tmp
C:\Users\xxx\AppData\Local\Temp\lu167563pjnou.tmp
lu16756{A69C6CCF-53FA-467A-8E90-88B64E4DF5F4}.tmp
but they show 0 size, also opening the file properties.

Saving as xlsx has a much higher use of memory up to 12 GB

1 Like

Calc loads “ContosoSales Csv.csv” within acceptable time as 1045000 database records with 88 consistent fields.
Char set: Western Europe (Win 1252)
Language: Any English
Column delimiter: comma
Char delimiter: double-quote
Detect special numbers: Yes (as always)
Columns R and S (LoadDate and UpdateDate) consist of 40179 which you may want to format as date 2010-01-01.

Importing this flavour of csv into some database is easy.

My system is:
Win11
32GB RAM (task manager shows 20GB free when saving to ODS)
I have 300GB free space on my C drive
I have 800GB free space on the drive where the files are strored and being saved to.

Is there something else I should look at?

Nothing likely. I assume you use the 64bit-version of LibreOffice? And it is not likely to find Win11 on an old FAT-Filesystem, wich would limit files/folders to 2 GB. There may be quotas (max. size per user) but I’m not using/testing Win11 at this moment.
.
One thing to try is safe-mode, as others could load your file.
.
While it is interesting to find out, why the file will not save on your system. What do you wish to do afterwards? Wouldn’t it be better to put this in a database, maybe something “simple” as sqlite. (Don’t try embedded databases.) and present Calc only the result of queries.

I am doing some load testing of different tasks in Excel and wanted to try the same tasks in LibreOffice. I wanted to use the same data in both to have the best comparisons.

Performance has always been top priority to the Excel developers. LO Calc can’t really handle this amount of data. However, with Base you could build a customized user interface on top of a database. The development effort would pay off if you had to work with these data structures in the future.

I’m not certain it is specific to Calc. I am able to open the .xlsx spreadsheet that has the same data in it with Calc with no errors. If I try to save to ODS from Excel or Calc it errors out. So it seems like it is an issue with the ODS file type and my data.

Rather, a combination of ODS and Windows, given that @mariosv succeeded on Linux with 16 GB RAM.

ODF uses much longer element names than OOXML. The latter uses single-char element names (c, s, t…) for most often used elements; while in ODF, they are like table:table-cell, office:value-type, text:p, …

The unpacked XML in a XLSX saved from your data in Excel is around 3GB. I guess that the unpacked XML of ODS should be larger then 4GB; and I suspect some difference in the number types in the compilers used to build on Windows vs. on Linux (e.g., long is 32 bit on Windows, even Win64, while it’s 64-bit on Linux64).

Hmm, on a development 7.6 pre-alpha version, similarly to @mariosv, I was able to export the document to ODS. As expected, the XML inside was large (almost 10GB); interestingly, the ODS itself was smaller than XLSX (the ODS was 277 MB). Yet, it succeeded on Win10 64 bit. While the release version (7.5.2.2) failed. And further, the resulting ODS can’t be opened in the release 7.5, while it can in development 7.6. So likely, something changed in the development version, and the future version could handle this.

tdf#128244? or rather, tdf#94915 (because it got a fix in 7.6), but they are likely the same.

“Contoso” is the name of a virtual company founded by Microsoft. Its business database is hosted here: Release Contoso Data Generator · sql-bi/Contoso-Data-Generator · GitHub
The csv dump (or spreadsheet dump) is output from that database. A csv dump of a database is not the same as the database itself. Loading it into a spreadsheet will not turn a data dump into a database even if Excel may be able to load and save it.
You can’t keep data safe and consistent when they reside in one table of 88 columns, mostly filled with redundant data. What you are supposed to use is the frontend to that database. A database consists of many tables with no redundancies. A database frontend loads a small subset of data to work with even if there are billions of table rows in the underlying database.

The downloadable database includes tools to generate the database and load the spreadsheet dump into the database tables in order to get a usable business database. The spreadsheet is not usable, not even with Excel.

Meanwhile, while writing the above text, I managed to save your csv dump as a 1 GB dBase file (blessed database format from the 90ies) and connect a Base document to the directory containing that file. Editing 88 database fields in a form is cumbersome but works fluently since Base loads only a small set of rows into memory.

How to convert the csv dump into a dBase dump which is editable in Base:

  1. Open the csv and find all date columns that are represented by integer day numbers and format these day numbers as ISO dates (number format code YYYY-MM-DD).
  2. Create a dedicated database directory and save the file as dBase (*.dbf) in that directory. This takes 20 minutes on my bread and butter laptop.
  3. File>New>Database… Connect to existing database of type “dBase”, specify the directory where you saved the dbf.
  4. Save the database document. This database document does not contain any data. It shows a single table representing your dbf file. All the data are in the dbf.
  5. Right-click the table and call the form wizzard.
    5.1 Click the >> button in order to include all table columns
    5.2 Jump to step 5 (“arrange controls”) and choose the last layout option which is least space consuming.
    5.3. Finish the wizard.
  1. Open the new input form.
1 Like