Ask Your Question
0

How to import a big amount of data into LibreOffice Base

asked 2019-06-25 15:43:47 +0100

rdiez gravatar image

I have a huge spreadsheet with 280,000 rows like this (somewhat simplified):

Product  Component
A        1
A        2
A        3
B        1
B        4
C        1
C        2
C        5

In the example above:

- Product A is made of components 1, 2 and 3
- Product B is made of components 1 and 4.
- Product B is made of components 1, 2 and 5.

Even though the Microsoft Excel file (.xlsx) weighs 6 MB and has so many rows, LibreOffice Calc manages to open it pretty quickly.

I want to filter for products that contain components 1 and 2. In the example above, that would be products A and C.

I could not find an easy way to achieve this in LibreOffice Calc. So I tried to import all the data into LibreOffice Base, in order to use an SQL query. But I am having trouble.

1) If I import the data from Calc to Base using the "copy and paste" method, my Ubuntu Linux 18.04 uses 100 % CPU for a few seconds, and then LibreOffice hangs. I have to kill the processes manually.

2) If I save the spreadsheet from Calc in the dBase format, I can create a dBase database in LibreOffice Base, but then I cannot join the table with itself, because a query can only use one table. This is apparently a limitation of the underlying dBase format.

3) If I create a second LibreOffice Base database with the default "Firebird Embedded" format, and try to drag the table from the first dBase database into this second database, both instances of Base hang. Again, CPU usage remains high only during the first seconds.

I am using Ubuntu 18.04.2 with LibreOffice 'fresh' version 6.2.4.2.

How do I quickly export the data from my LibreOffice Calc spreadsheet into a proper (non dBase) LibreOffice Base database?

edit retag flag offensive close merge delete

Comments

Maybe having a look to documentation can help. https://wiki.documentfoundation.org/i...

m.a.riosv gravatar imagem.a.riosv ( 2019-06-26 09:23:48 +0100 )edit

2 Answers

Sort by » oldest newest most voted
0

answered 2019-06-25 17:32:49 +0100

Ratslinger gravatar image

updated 2019-06-28 18:24:16 +0100

Hello,

You can do this in portions. Create a Base file. Personally would not use an embedded DB but if doing so use Firebird. Define your table. Do not use numeric or decimal field types in the definition. Instead use type DOUBLE. There are bugs with the copy/paste using these types. Now you can copy portions of your data from Calc, starting small to see that all is correct, and pasting into the table - append data.

Note that data is not automatically saved in Base using Firebird until you perform a save on the Base file. Between your copy/paste actions, you should copy your Base file as a backup. Then you have a starting point if a section has a problem.

Edit:

If opening your .xlsx file in Calc, before transferring data to Base, save it as a native Calc document (.ods format). This may be part of your problem. Only a safe guard - work with native formats..

Edit 2019-06-28:

Delay in replying to additional was due to testing.

Unfortunately, too often only after an answer is posted do the requirements become known. Even then they are usually vague and still probably incomplete. Such is the case here.

Based upon the last comments (posted as answer incorrectly), it is not apparent OP will accept results here or move on to other possibilities. Regardless this post may help others and resolved some of my curiosity. Some results are surprising.

Tests were done on Linux Ubuntu 18.04 Mate (MySQL test on Mint 18.3), LO v6.2.0.3, AMD 4 core 4GB system. The csv file used was 13M and was opened in Calc resulting in a 5M file. There were 81,832 records (21 columns) with mixed data types.

OP multiple times stated Base would hang, but this is most likely not true - only appears so.

First test - copy from Calc & paste to Firebird embedded.

  • First run all text fields abt 45 minutes

  • CPU - one of 4 cores varied from 2 to 100% usage

  • Other 3 cores occasionally used 3-10%

Base appears hung but it is not - it is processing but there is no indicator showing this or any progress type display.

Base screens may even become transparent(except for frame) or display info from other sources while processing.

ALL LO operations are halted (including Calc, Writer, etc.) until process is complete.

Second test modified some fields to reflect DOUBLE vs TEXT types but results were nearly the same as the first test.

Third test - Used macro to read csv and create table. Results were nearly the same with the exception that you could process other LO files albeit slower processing.

Fourth test - On Mint tried MySQL (Workbench) 3+ Min but failed each time & no specifics were presented as to what was wrong.

Back to Ubuntu - PostgreSQL pgAdmin III imported file in less than 30 sec.

Last test - Calc to Base (copy/paste) connected to PostgreSQL via SDBC: just under 12 minutes.

In the tests ... (more)

edit flag offensive delete link more

Comments

I had to "answer" above, because the text was too long for a "comment". And the "answer" comes above yours. It looks like this "ask LibreOffice" site is also lame. 8-(

rdiez gravatar imagerdiez ( 2019-06-26 09:04:08 +0100 )edit
1

It looks like this "ask LibreOffice" site is also lame. 8-(

It looks like Ask site is about questions and solutions. It's not about discussions. Each answer is meant to be a solution to the problem, nothing else. It's not a forum.

And if your comment becomes too long, just add a series of comments.

Mike Kaganski gravatar imageMike Kaganski ( 2019-06-26 09:05:38 +0100 )edit

Many thanks for the tests made @Ratslinger. Had similar task as OP some time ago, to get ~45Mb CSV data into Base table, about 300k rows, 12 columns, mixed data types. LO ver. 6.1, Windows 7 and Mojave, embded Firebird. Gave up, cause all attempts failed with crash message after approx. 1hour of pasting data. Look like I will go for another try :)

SM_Riga gravatar imageSM_Riga ( 2019-06-29 09:20:19 +0100 )edit
0

answered 2019-06-26 09:02:25 +0100

rdiez gravatar image

If I have to manually import the data in portions, I would rather drop LibreOffice Base altogether. I have to import new data sets every week.

If you would not use an embedded DB, what is the alternative, a database server somewhere? Then LibreOffice Base would no longer be portable and convenient. I would then use something else. Suggestions are welcome.

Converting the spreadsheet to LibreOffice Calc's native format (.ods) does not help. This is what I was expecting, as the import is running from RAM. Otherwise, Base would not need Calc to be running in order to import the data.

Incidentally, LibreOffice Calc loads the 5,4 MB Microsoft Excel file (.xlsx) much faster than the equivalent 2,1 MB .ods file, which is surprising.

I am not using numeric fields, all 4 columns are text fields.

I tested with the HSQLDB embedded database format too, and I got this error message while importing from Calc after a few seconds of high CPU usage:

java.io.IOException: java.lang.NullPointeException in statement [INSERT INTO "Table1" ( "ID", "Stücklistenartikel", "Bauteilartikel", "Menge", "Mandant") VALUES(?,?,?,?,?)] Continue anyway? Yes No

Choosing Yes means Base will hang, and choosing No leaves you with a corrupt table that will not even open.

I am surprised that there is no way to import data directly from LibreOffice Base, without opening Calc. Say a direct import of a CSV file. Importing big amounts of data seems not really possible for the average user.

I am disappointed. LibreOffice Base is looking rather lame indeed.

edit flag offensive delete link more

Comments

Please add a comment or edit your question.

m.a.riosv gravatar imagem.a.riosv ( 2019-06-26 09:20:57 +0100 )edit
Login/Signup to Answer

Question Tools

1 follower

Stats

Asked: 2019-06-25 15:43:47 +0100

Seen: 273 times

Last updated: Jun 28