Base: Handling large files, NullPointerException error

Kametec · November 11, 2022, 3:23pm

Hello again,
I’m creating a new topic, because issues from the last one were resolved and I was able to import a csv file sucessfully. I am very grateful for assistance and I was able to learn how to use Base for importing files. However, I ran into another issue when trying to import second file.

For completeness sake, I’m linking my previous topic: Base: Importing data from a csv file

Continuing from there, I’ve created a new database, selected a text source and linked it to a folder with one csv file with 1 065 552 data rows and one row containing headers. It created a linked database with that table sucessfully and I wanted to copy its data into my embedded HPSQL database. I opened both Base files, opened the table with data to be appended, selected them all, copied them and clicked paste in the context menu of the destination table. I linked the columns together in the Copy wizard and when I confirmed the operation, the Base got busy for a while (cursor changed to the one indicating activity) and following error appeared:

SQL state: S1000
Error code: 40

S1000 General error java.lang.NullPointerException in statement [INSERT INTO “Jumps” ( “OriginSystem”,“DestinationSystem”,“JumpType”,“Cost”) VALUES ( ?,?,?,?)]

What can that mean?

Kametec · November 11, 2022, 3:50pm

To follow up:
After the data copying failed, I tried to check contents of the table involved. It failed to open with following error:

S1000 General error java.lang.NullPointerException in statement [SELECT * FROM “Jumps”]

Upon restarting Base, the table opens empty. When I try to fill it with data again, I get the same error at the same time when importing the second file. I’m suspecting I encoutered an error in the Base itself.

RobertG · November 11, 2022, 6:15pm

You couldn’t solve this problem with an internal HSQLDB. Could be you are able to import all the data by splitting the *.csv-file, but internal HSQLDB would be much to slow to scroll through all this data.

I have tried to get it work with Firebird, but there are too many problems using the import wizard together with Firebird.

I get it working with MariaDB. And there the import only works right with JDBC-connection, also problems with direct connection. And you could see the running process: 50.000 rows in 5 minutes …

Villeroy · November 11, 2022, 7:47pm

What we know so far:

The source is a Base document connected to a directory of csv files.
The target is an embedded HSQL.
If we would know anything about the csv file, the solution may be obvious.

Kametec · November 11, 2022, 10:02pm

You can check the file for yourselves, here’s a link to it:
http://kametec.housen.cz/files/jumps.csv

I’m trying to import 4 columns from it: FromSol, ToSol, Type (integers) and Disto (double).
Column separator is ; decimal separator is ,

Villeroy · November 12, 2022, 12:35am

Last column has comma separated decimals?
I imported the file into Calc, last 16978 rows on a separate sheet.
Import locale=German (because of the comma decimals), columns A to F as text since they are identifiers obviously.
COUNT and COUNTA confirm that all columns are consistent. 6 columns of text (8 digits with no exception) , 3 columns of integers between 0 and 3, one column between 0.1 and 10, no blanks.
The difficult column is the last one because of the comma decimals.
I closed Calc without copying anything.

Here we go again:

Tools>Options>Security>button [Macro Security…]
Select the highest security level, select tab “Trusted Sources” and specify some directory as a trusted source where you drop office documents with trusted macro code. This must NOT be your downloads directory.
Download jumps
Extract zip to your trusted directory and open the database document “jumps.odb”. The contained macro will connect the database document with an empty external HSQLDB in directory “database”. The macro should finish with a success message.
Put your jumps.csv into the database directory.
Run the following statements via Tools>SQL…

set table csv source "jumps.csv;fs=\semi;ignore_first=true";

This takes a while. It connects jump.csv with the prepared text table named “CSV”. The last column is text because of the comma decimals.
7) The following statement copies the data of table “CSV” into the table named “DATA” converting the last column into double precision floating point numbers with 14 digits behind the point. One or two decimal digits may be cut off.

INSERT INTO "DATA"
(SELECT "FromReg","FromCon","FromSol","ToSol","ToCon",
"ToReg",
"FromSec",
"ToSec",
"Type",
CAST(replace(substr("Disto",1,16), ',', '.') AS DOUBLE),
NULL
FROM csv);

This will take several minutes.
8) Call View>Refresh Tables

Next time when you need to append new rows:

Disconnect the text table

set table csv source off;

Replace old jumps.csv with a new one.
Reconnect and append:

set table csv source on;

SELECT "FromReg","FromCon","FromSol","ToSol","ToCon",
"ToReg",
"FromSec",
"ToSec",
"Type",
CAST(replace(substr("Disto",1,16), ',', '.') AS DOUBLE),
NULL
FROM csv);

Kametec · November 12, 2022, 10:11am

Thanks for your thorough reply!

I could follow up to step 6. SQL finished immediately with 0 rows updated and the csv table remained empty.

However, if I understand the process correctly, I would end with populated table “Data” in file jumps.odb. How am I going to copy them to my own database? I have managed to get a populated table in separate database file on my own, but my problem is appending contents of table Jumps in newly created temp.odb to table Jumps in database.odb, where I hold all relevant data.

Villeroy · November 12, 2022, 10:55am

First of all, I would convert your embedded HSQLDB into a stand-alone HSQLDB as I did with my demo file because the embedded one became laggy.
My demo includes a driver of version 2.4.1 which is not the latest but IMHO most compatible with Base.

Copy your own database document to the same directory as my demo.
1b) EDIT: Make a copy of the file hsqldb-2.4.1.jar named hsqldb.jar without version number.
Download https://forum.openoffice.org/en/forum/download/file.php?id=35569 to a trusted directory and and lick the button to install the Python code.
Open your copy of step 1) and run Tools>Macros>Run Macro>pyDBA>ExtractHSQLDB>Main
Test your queries, reports, forms. They should work as before.
Save the converted database document. Notice the status bar. Your database is in the database folder now, side by side with my demo. The embedded one is still in your document but no longer in use.
The following SQL generates a blank text table which can be linked to any csv file with the same structure as your jumps.csv and residing in the database folder.

CREATE TEXT TABLE PUBLIC.CSV(
"FromReg" CHARACTER(8) NOT NULL,
"FromCon" CHARACTER(8) NOT NULL,
"FromSol" CHARACTER(8) NOT NULL,
"ToSol" CHARACTER(8) NOT NULL,
"ToCon" CHARACTER(8) NOT NULL,
"ToReg" CHARACTER(8) NOT NULL,
"FromSec" TINYINT NOT NULL,
"ToSec" TINYINT NOT NULL,
"Type" TINYINT NOT NULL,
"Disto" VARCHAR(20) NOT NULL)

(statement copied from file jumps.script in the database folder)
The last column is text because of the comma decimals. If the exporting database would export point decimals, things would be slightly easier.
Call View>Refresh Tables to make the new table visible in the Base UI.

set table csv source "jumps.csv;fs=\semi;ignore_first=true";

is the command to fill the prepared text table with the data of jumps.csv, separated by semicolons and one header row to be ignored.
set table csv source off disconnects the file.
set table csv source on reconnects the file.

To make importing easier without writing any macro programs, I recommend this:

Create a query analogue to this one:

SELECT CAST(replace(substr("Disto",1,16), ',', '.') AS DOUBLE) AS "Disto",
"FromReg","ToSec","Type","ToCon","ToReg","FromSol","FromCon","ToSol","FromSec"
FROM "CSV"

Basically I re-arranged the order of columns to meed the order of some target table. Column “Disto” is converted from text to floating point.
2) Right-click the query and choose “Create as view…”. This adds a view to the tables section. A view is almost the same as a query. A view is stored in the database itself whereas a query is outside the scope of the database because it is stored in the Base document.
3) Now you can copy the view icon and paste it onto the table icon. Columns appear in the right order “Disto” is a floating point number.

The steps to import data from a new version of jumps.csv:

Disconnect the text table or close the entire office suite before you replace jumps.csv file with a new one containing the new data.
When you are reconnected, copy the view icon over the table icon in order to append new data.

Villeroy · November 12, 2022, 10:58am

This means that there was no jumps.csv in the database folder before executing the SET TABLE command.

Kametec · November 12, 2022, 11:10am

Oh, I see now, I read the database folder as the folder where the odb file is, now I see where it should go.

Villeroy · November 12, 2022, 5:46pm

For the outlined solution there is a root directory for the Base document(s), a folder named “driver” with the actual database software (a Java program) and a folder named “database” for the database(s).

I overlooked one important issue. In order to use the right driver for your own database, make a copy of hsqldb-2.4.1.jar and name it hsqldb.jar, otherwise my script will use the hsqldb.jar that is installed with the office suite (version 1.8).

Kametec · November 14, 2022, 2:01am

I have followed your steps and I was able to import the data I needed, thanks a lot! I must say, importing csv files turned out to be much harder challenge than I expected.

I have made a few things a bit differently:

CREATE TEXT TABLE PUBLIC.CSV(
"FromReg" CHARACTER(8) NOT NULL,

I used INTEGER data type instead of CHARACTER(8). While it is true that those numbers are identifiers and no numeric operations are going to be performed on them, I have read that numeric database indexes perform better than text ones. And 8 digit numbers fit into INTEGER datatype, so I used it.

I have made the column DOUBLE in this step. I can make sure that next csv files are exported with point decimals and I have replaced all commas by dots in the current file before importing it.

Once again thanks for great assistance, I wouldn’t have figured this out on my own. I would probably try to set up a MySQL server and see if I could import the data there instead.