Linking CSV with Base: problem with field seperator (semicolon vs. comma)

derfred · July 8, 2022, 11:17am

Hi!
I know this must be written down somewhere, but I cannot find the right keywords to solve this mystery.

Every month I get a members.csv which looks sort of like this:

id;name;donation;birthday
12;Alfred;56;12.12.1912
678;Beth;12,45;1.1.2001

There are no quotes used to define “12,45” as a cell value

I want to link this csv to my odb (HSQLDB), so that I can access it easily via calc.
What I do not want to do is manually readjust the csv every month, so that it fits my import settings.

The script I run via the SQL-Menu:

DROP TABLE "members";
CREATE TEXT TABLE "members" (
"id" INTEGER PRIMARY KEY, 
"name" VARCHAR(50), 
"donation" DOUBLE,
"birthday" VARCHAR(10)
);
SET TABLE "members" SOURCE "members.csv;ignore_first=true;encoding=UTF-8BOM;fs=\semi" DESC;
SET TABLE "KG_MITGLIEDER" READONLY TRUE;

works nicely until it his the comma at 12,45.

Question: What property am I missing to tell the SET TABLE command to simply ignore commas and keep proceeding with the import using the semicolon “;” uniquely as a field seperator?

Thank you so much!

derfred · July 8, 2022, 12:22pm

I might need to withdraw this topic.
I just found out that HSQLDB does not know the datatype DOUBLE. (edit: probably wrong)
Using VARCHAR(10) for the donation seems to work.
Need to investigate this further…

… I will keep on guessing, but it seems that the whole issue is not so much a problem of SET DATA not skipping the comma. It apparently does, when choosing varchar() as datatype for the field “donation”. It seems that it only gets into trouble recognizing the comma as a decimal point, when choosing numeric data types.

This would, however, be e a different topic, I guess. And since I will have to investigate this later anyway (to get my birthdays show as dates instead of text), I call it a day for now.

Villeroy · July 8, 2022, 6:44pm

Chapter 9. SQL Syntax is the documentation of your embedded HSQL database.
12,34 is not a valid DOUBLE because of the comma. Since the value represents a currency, DOUBLE would not be the correct type anyway. DECIMAL avoids rounding errors.

If your goal is importing of a bunch of csv files that have been exported from another database as a one time task without regular updates from csv, the easiest way goes like this:
Put all similar csv files (semicolon separated, German numbers and dates) in one directory.
Start your office with German locale setting.
Create a new database connected to an existing database of type “Text”.
Specify the directory holding your csv files.
Specify the import details (encoding, separators etc.)
Save the text database.
Check all tables. SQL functions for file based database drivers is a list of availlable functions for file based databases. Queries with SIGN(“num_field”) and with DayOfWeek(“date_field”) can test if numbers are numbers and dates are dates.

Open your HSQLDB and the text DB side by side and copy the tables. Don’t let Base generate new ID fields!
If everything went fine, you have all your raw data imported into an embedded HSQLDB.
Change integer IDs to Auto-IDs where necessary. This is a 2-step process. First mark them as primary key, save, then set the auto-ID property and save again.
Rebuild any relations in the relations manager.

This answer describes a way how to import a foreign database from text tables. The other answer describes a way how to insert new records from csv on a regular base.

Villeroy · July 13, 2022, 10:57am

I developed a macro driven solution for the more complex use case where you want to merge a text table with a database table, so existing records are updated and non-existing records are appended to the data table. The demo used the same csv flavour as in this topic with comma decimals and German dates. The problem with these columns can be solved by declaring them as text. Then you may use a view to convert the numeric strings into valid database types. A view is a query in the realm of the database backend. HSQL can’t see any of your Base queries. HSQL can see views because they are stored in the database itself.

https://forum.openoffice.org/en/forum/viewtopic.php?t=108073

@derfred , With a little help from the AOO forum, I managed to do the same trick with embedded HSQLDB: Apache OpenOffice Community Forum - [HSQL2, Base, Basic] Macro Driven CSV Import - (View topic)

Villeroy · July 8, 2022, 3:47pm

Download
Members.odb (3.8 KB)
and put it in the same directory as “members.csv”.
The database has a text table created and linked like this:

create text table "import"("id" int, "name" varchar(50), "donation" varchar(7), "birthday" varchar(10));
set table "import"SOURCE "members.csv;ignore_first=true;encoding=UTF-8BOM;fs=\semi";

There is a data table “Members” created like this (could be done with the table designer as well):

create table "Members"("N" varchar(50), "VAL" decimal(5,2), "BD" DATE, "ID" INT IDENTITY);

A query to convert the invalid comma decimals and the invalid German dates:

SELECT "name", CAST( REPLACE( REPLACE( "donation", '.', '' ), ',', '.' ) AS DECIMAL ( 7 , 2 ) ) AS "VAL", 
CAST( RIGHT( "BIRTHDAY", 4 ) || '-' || REPLACE( substr( "BIRTHDAY", LOCATE( '.', "BIRTHDAY", 2 ) + 1, 2 ), '.', '' ) || '-' || REPLACE( LEFT( "BIRTHDAY", LOCATE( '.', "BIRTHDAY" ) ), '.', '' ) AS DATE ) AS "BIRTHDATE" 
FROM "import"

A view created from that query.

To import (append to “Members”) incoming new csv data, all you have to do is:

Close the office suite completely.
Replace the old members.csv with the new one.
Open the database and drag/drop or copy/paste the view icon “Members_Import” onto the “Members” table icon and confirm the dialog with default settings because the columns are selected in correct order.

The data table “Members” generates its own auto-ID. The “id” from the text table is discarded. If you want to keep the member IDs from the csv, the import becomes more complicated because some records need to be updated, others need to be appended.