How to import a big amount of data into LibreOffice Base

I have a huge spreadsheet with 280,000 rows like this (somewhat simplified):

Product  Component
A        1
A        2
A        3
B        1
B        4
C        1
C        2
C        5

In the example above:

- Product A is made of components 1, 2 and 3
- Product B is made of components 1 and 4.
- Product B is made of components 1, 2 and 5.

Even though the Microsoft Excel file (.xlsx) weighs 6 MB and has so many rows, LibreOffice Calc manages to open it pretty quickly.

I want to filter for products that contain components 1 and 2. In the example above, that would be products A and C.

I could not find an easy way to achieve this in LibreOffice Calc. So I tried to import all the data into LibreOffice Base, in order to use an SQL query. But I am having trouble.

  1. If I import the data from Calc to Base using the “copy and paste” method, my Ubuntu Linux 18.04 uses 100 % CPU for a few seconds, and then LibreOffice hangs. I have to kill the processes manually.

  2. If I save the spreadsheet from Calc in the dBase format, I can create a dBase database in LibreOffice Base, but then I cannot join the table with itself, because a query can only use one table.
    This is apparently a limitation of the underlying dBase format.

  3. If I create a second LibreOffice Base database with the default “Firebird Embedded” format, and try to drag the table from the first dBase database into this second database, both instances of Base hang. Again, CPU usage remains high only during the first seconds.

I am using Ubuntu 18.04.2 with LibreOffice ‘fresh’ version 6.2.4.2.

How do I quickly export the data from my LibreOffice Calc spreadsheet into a proper (non dBase) LibreOffice Base database?

Maybe having a look to documentation can help. https://wiki.documentfoundation.org/images/1/16/BH5007-Linking.pdf

I had a pretty similar problem as you have, and the solution was quiet simple: close Qlipper before opening Libreoffice.

That’s all. Hope it works for you.

Pablo
Lubuntu 20.04
Libreoffice Versión: 6.4.3.2
Id. de compilación: 1:6.4.3-0ubuntu0.20.04.1
Subprocs. CPU: 2; SO: Linux 5.4; Repres. IU: predet.; VCL: qt5;
Configuración regional: es-AR (es_AR.UTF-8); Idioma de IU: es-ES
Calc: threaded

Hello,

You can do this in portions. Create a Base file. Personally would not use an embedded DB but if doing so use Firebird. Define your table. Do not use numeric or decimal field types in the definition. Instead use type DOUBLE. There are bugs with the copy/paste using these types. Now you can copy portions of your data from Calc, starting small to see that all is correct, and pasting into the table - append data.

Note that data is not automatically saved in Base using Firebird until you perform a save on the Base file. Between your copy/paste actions, you should copy your Base file as a backup. Then you have a starting point if a section has a problem.

Edit:

If opening your .xlsx file in Calc, before transferring data to Base, save it as a native Calc document (.ods format). This may be part of your problem. Only a safe guard - work with native formats…

Edit 2019-06-28:

Delay in replying to additional was due to testing.

Unfortunately, too often only after an answer is posted do the requirements become known. Even then they are usually vague and still probably incomplete. Such is the case here.

Based upon the last comments (posted as answer incorrectly), it is not apparent OP will accept results here or move on to other possibilities. Regardless this post may help others and resolved some of my curiosity. Some results are surprising.

Tests were done on Linux Ubuntu 18.04 Mate (MySQL test on Mint 18.3), LO v6.2.0.3, AMD 4 core 4GB system. The csv file used was 13M and was opened in Calc resulting in a 5M file. There were 81,832 records (21 columns) with mixed data types.

OP multiple times stated Base would hang, but this is most likely not true - only appears so.

First test - copy from Calc & paste to Firebird embedded.

  • First run all text fields abt 45 minutes

  • CPU - one of 4 cores varied from 2 to 100% usage

  • Other 3 cores occasionally used 3-10%

Base appears hung but it is not - it is processing but there is no indicator showing this or any progress type display.

Base screens may even become transparent(except for frame) or display info from other sources while processing.

ALL LO operations are halted (including Calc, Writer, etc.) until process is complete.

Second test modified some fields to reflect DOUBLE vs TEXT types but results were nearly the same as the first test.

Third test - Used macro to read csv and create table. Results were nearly the same with the exception that you could process other LO files albeit slower processing.

Fourth test - On Mint tried MySQL (Workbench) 3+ Min but failed each time & no specifics were presented as to what was wrong.

Back to Ubuntu - PostgreSQL pgAdmin III imported file in less than 30 sec.

Last test - Calc to Base (copy/paste) connected to PostgreSQL via SDBC: just under 12 minutes.

In the tests using macros or Admin tools, care is needed in the csv content. Data in fields may contain quotes, commas or apostrophes which will cause problems. This was NOT a problem in the copy/paste method and thus actually saved some time.

Back to OP’s latest comments, suggestion of importing data in portions was made to help resolve issue of Base “hanging”. Original question never stated this was a recurring process.

It is also not stated what is meant by no longer being portable. Using a DB server doesn’t mean it is located remotely. This is best dealt with as a separate issue.

Could not duplicate the HSQLDB error - not enough info to determine what is the cause.

While there are deficiencies in the process, the final results work. It is unfortunate to see unnecessary criticism of a free product just because it doesn’t work just as an individual wishes. All of us are free to use something else.

I had to “answer” above, because the text was too long for a “comment”. And the “answer” comes above yours. It looks like this “ask LibreOffice” site is also lame. 8-(

It looks like this “ask LibreOffice” site is also lame. 8-(

It looks like Ask site is about questions and solutions. It’s not about discussions. Each answer is meant to be a solution to the problem, nothing else. It’s not a forum.

And if your comment becomes too long, just add a series of comments.

Many thanks for the tests made @Ratslinger. Had similar task as OP some time ago, to get ~45Mb CSV data into Base table, about 300k rows, 12 columns, mixed data types. LO ver. 6.1, Windows 7 and Mojave, embded Firebird. Gave up, cause all attempts failed with crash message after approx. 1hour of pasting data. Look like I will go for another try :slight_smile:

Could you please explain how you did the copy and paste as I’m trying the same and finish up with all the data in the first field of the table?

@kjpetrie1
It sounds as if your data (from Calc?) is not all in individual cells. I should be.
.
Please see these posts:

1 Like

If I have to manually import the data in portions, I would rather drop LibreOffice Base altogether. I have to import new data sets every week.

If you would not use an embedded DB, what is the alternative, a database server somewhere? Then LibreOffice Base would no longer be portable and convenient. I would then use something else. Suggestions are welcome.

Converting the spreadsheet to LibreOffice Calc’s native format (.ods) does not help. This is what I was expecting, as the import is running from RAM. Otherwise, Base would not need Calc to be running in order to import the data.

Incidentally, LibreOffice Calc loads the 5,4 MB Microsoft Excel file (.xlsx) much faster than the equivalent 2,1 MB .ods file, which is surprising.

I am not using numeric fields, all 4 columns are text fields.

I tested with the HSQLDB embedded database format too, and I got this error message while importing from Calc after a few seconds of high CPU usage:

java.io.IOException: java.lang.NullPointeException in statement
[INSERT INTO
“Table1” ( “ID”, “Stücklistenartikel”, “Bauteilartikel”, “Menge”, “Mandant”)
VALUES(?,?,?,?,?)]
Continue anyway?
Yes No

Choosing Yes means Base will hang, and choosing No leaves you with a corrupt table that will not even open.

I am surprised that there is no way to import data directly from LibreOffice Base, without opening Calc. Say a direct import of a CSV file. Importing big amounts of data seems not really possible for the average user.

I am disappointed. LibreOffice Base is looking rather lame indeed.

Please add a comment or edit your question.

I think it is clear by now that LibreOffice Base is not up to the task. I cannot see a way to close this question, so I’ll try making this the “solution”.

For me the major difference was hardware. see this post > Was: How to import a big amount of data into LibreOffice Base [locked]