Overall strategy for importing new data in .csv files into existing embedded LO database

This post may eventually supersede an earlier post of mine and one from 2015 about importing data into Base via an intermediate Calc spreadsheet. For some time it has not been necessary to go via a spreadsheet.

As of LO 6.4, importing data from one or more .csv files into a table in an existing embedded Base database still seems to require a copy/paste step.

The overall strategy seems to be to create an intermediate database that treats the folder containing the file(s) as a read-only text database, and then to copy/paste from tables in that intermediate database into tables in the destination database.

Before I go any further with this, is there in LO6.4 Base a better way to do this? Am I missing something?


This is the best way to import the data if you don’t know how to do it with a macro. If you are going to import many tables with the same structure, it is easier to put the *.csv files into the folder behind the csv.odb and start a macro that writes the data into a table of your HSQLDB.odb (or Firebird, MySQL/MariaDB, PostgreSQL …).

See the Base Guide.
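
A minimal Basic sketch of that macro approach, assuming both .odb files are registered in LibreOffice under the (made-up) names "csvfolder" and "target", and that the table and column names shown are placeholders to be replaced:

    Sub CopyCsvRowsToTarget
        ' Connect to the two registered databases (registration names are examples).
        Dim oCtx As Object, oSrc As Object, oDst As Object
        oCtx = createUnoService("com.sun.star.sdb.DatabaseContext")
        oSrc = oCtx.getByName("csvfolder").getConnection("", "")
        oDst = oCtx.getByName("target").getConnection("", "")

        ' Read every row of the table that the text driver builds from newdata.csv.
        Dim oResult As Object
        oResult = oSrc.createStatement().executeQuery( _
            "SELECT ""Name"", ""Amount"" FROM ""newdata""")

        ' Append each row to the destination table.
        Dim oInsert As Object
        oInsert = oDst.prepareStatement( _
            "INSERT INTO ""Bookings"" (""Name"", ""Amount"") VALUES (?, ?)")
        While oResult.next()
            oInsert.setString(1, oResult.getString(1))
            oInsert.setDouble(2, oResult.getDouble(2))  ' adjust getters to the real column types
            oInsert.executeUpdate()
        Wend

        oSrc.close()
        oDst.close()
    End Sub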

Hi Robert: thanks for getting back to me. Am I right that if I create both csv.odb and embedded.odb and do the copy/paste manually at first, that still leaves me free later on to replace the manual steps with a macro?

That would seem to be a way to get going without too steep an initial learning curve. The Base Guide seems detailed (as it needs to be), but is there a tutorial anywhere which could get me started specifically on copying rows from one database to another?

Thanks
trueriver

Starting with manual copying is no problem; changing it later to run automatically by macro will work. I have used this for sports events: sports clubs would send a list as *.csv and I read it all into a database with a macro.

Do you really need to copy single rows? It works the same way as copying a whole table: click on the table and append the content. The wizard knows the field names and the marked rows of the data source. If you insert the data into an internal database, it is best to create the primary key as an AutoValue.
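
In SQL terms, a destination table with such an AutoValue primary key in the embedded HSQLDB could be defined like this (table and column names are only placeholders):

    -- Auto-incrementing primary key plus typed data columns (HSQLDB syntax).
    CREATE TABLE "Bookings" (
        "ID"     INTEGER GENERATED BY DEFAULT AS IDENTITY(START WITH 0) PRIMARY KEY,
        "Name"   VARCHAR(100),
        "Amount" DECIMAL(10,2),
        "Booked" DATE
    );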

I’ve done this for years without a single line of macro code, using a text table and a view that converts the strings into the right types and filters out any duplicates.
When new data arrives, a shell script copies the file to the database location; I open the database document, copy the view onto the target table and confirm the dialog.

http://www.hsqldb.org/doc/1.8/guide/ch06.html
https://forum.openoffice.org/en/forum/viewtopic.php?t=23260
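
A minimal sketch of that text-table workflow in HSQLDB SQL (run from Tools > SQL); the file, table, column and view names are only examples, and the date column is assumed to hold ISO yyyy-mm-dd strings:

    -- Text table backed by a csv file; see the HSQLDB chapter linked above
    -- for where the source file must live.
    CREATE TEXT TABLE "T_Import" (
        "Name"   VARCHAR(100),
        "Amount" VARCHAR(20),
        "Booked" VARCHAR(10)
    );
    -- Bind the table to the file: semicolon separator, skip the header line.
    SET TABLE "T_Import" SOURCE "newdata.csv;fs=\semi;ignore_first=true";

    -- View converting the raw strings into proper types, with a crude duplicate filter.
    CREATE VIEW "V_Import" AS
        SELECT "Name",
               CAST("Amount" AS DECIMAL(10,2)) AS "Amount",
               CAST("Booked" AS DATE)          AS "Booked"
        FROM "T_Import"
        WHERE "Name" NOT IN (SELECT "Name" FROM "Bookings");

    -- Append the converted rows to the destination table.
    INSERT INTO "Bookings" ("Name", "Amount", "Booked")
        SELECT "Name", "Amount", "Booked" FROM "V_Import";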

You won’t import the data this way. You use the content of the *.csv-file as content for a text table (which only works in HSQLDB).

This works with many database engines, but not with Firebird.

The major problem is that most users are completely unable to understand what a csv file is. This is the root of the whole problem, before we can even think of any SQL statements, shell scripts, Python code, Basic macros or import via spreadsheet.

I published a Basic snippet which I use on a daily basis to import different flavours of csv into the same database table:
[Basic, SQL] CSV import into some database
This code snippet does not help anybody who cannot analyse a text file and write the SQL that is needed to convert specific text files into valid data types of a specific target table. The major part of the programming needs to be done in SQL. All details depend on the exact details of the csv and the database engine in use.

P.S. Whether the LibreOffice version is 3, 4, 5, 6 or 7 does not matter. Base has not changed very much since 2005 (OpenOffice.org 2.0). Everything depends on the csv details and the database engine in use. A Base document can be connected to a wide variety of different databases.

I’m sorry @Villeroy: you are partially correct that there are complexities in the variety of formats that various organisations describe as .csv.

I still disagree strongly with your view that understanding CSV is the WHOLE of the problem. As discussed elsewhere, I had a spreadsheet where the dates worked as dates in that spreadsheet, but when copy/pasted into Base those columns were only recognised as text, needing a View simply to turn those columns back into dates. That was NOT a .csv issue, because within the spreadsheet date arithmetic worked fine (+1 for tomorrow, etc.).
For many users, especially if they have LO set up for their own country’s locale AND the organisation generating the .csv file has a similar locale, LO successfully hides those complexities (most of the time). Such lucky users can often get away with ticking a few relevant options when importing from a csv, as indeed I did for many years.

Until, of course, some overlooked detail jumps up and bites you… as happened to me in late 2022 after years of the same technique always working. When that happens it may or may not be necessary to dig into .csv complexities: but when it does, please do not close other people’s minds to the possibility that there may be other issues besides. LO is a hugely complicated collection of software, and for historical reasons Base remains less well integrated into LO than (for example) Access is within MS Office.

You are clearly our resident wizard on .csv issues, but just because you are an expert on hammers, please do not assume that everything is a nail. Screws hold better if driven in with a screwdriver rather than hammered home (although hammering a screw home apparently works, carpenters tell me the result is less strong than using the correct tool).

Having said that, your expertise has contributed to this discussion, and I am especially grateful to you for your pointer to the CREATE TEXT TABLE extension available in HSQLDB. That creates a third way forward.

For an “overall strategy” the concrete csv and the database engine are the only things that matter. Without specifying the database engine we do not even know which software we are discussing.

There is no need to create any text table when you copy correctly imported data from Calc into Base. In case of embedded Firebird, any dates need to be numbers formatted as ISO dates.

  1. Import correctly into Calc. This seems to be a major hurdle for many users.
  2. Copy the correctly imported data, select the target table and paste.

The strategy without Calc is a lot more complicated but in the long run easier to use.

I think that “embedded LO database” in the title is fairly explicit about which database engine we are using. Nobody is using embedded Firebird these days, and many current Base users do not even remember it. Unless you know of plans for a further change in the embedded engine, the use of HSQLDB seems a stable assumption.

Please avoid mentoring answerers here, teaching them what they should assume. Most regulars here have enough experience to have stopped assuming anything, and to require explicit statements from askers about the important details - simply because everyone here is all too familiar with misunderstandings coming from assumptions on both sides.

Please don’t act as if others should assume any (any) level of your experience; it is simply the asker’s duty to supply as much information as possible, and then some more when explicitly asked for clarifications.

  1. It seems that @Villeroy keeps arguing without really trying to read and comprehend the post, as if to say “I saw mentions of CSV and database and dates, and I don’t even need to read to know where the problem lies”. Namely:

This is simply a bare-bones outline of the original post, where steps 1 to 3 explain a method to import a CSV correctly (as per @Villeroy’s #1), and steps 4 to 12 explain @Villeroy’s #2 (modulo creating the table in the process, vs. pasting into an existing table).

The whole point of @trueriver’s post was to expand on how important it is to do steps #1 and #2 correctly; the continued disagreement from @Villeroy looks simply inappropriate, ignoring the opponent’s text in its entirety.

  2. Note some of the related issues that aim to help users discover the controls for correct CSV import: tdf#152336, tdf#152337, tdf#150652.

It seems there are three strategies I can choose among, or indeed four if I include the zero option of continuing to copy/paste from Calc. I have numbered them 0 to 3 below.

  0. Load into Calc using the various options offered when opening a .csv file there, paying particular attention to getting the separators and date formats recognised correctly. Select all, then copy/paste into Base, creating a new table or appending to a suitable existing one.

  1. Create a suitable folder to contain the .csv file or files and then create a “database connector” file to that folder by using the options in the dialog when opening Base. That makes LO regard the folder as a read-only text database. This works best when the locale set for LO matches (or is close enough to) that of the organisation that generated the .csv file/s. Then use copy/paste to move the data from the intermediate database to the destination one.

  2. Start as in option 1, but then write suitable macros in the destination database to do the transfer at the touch of a button from within forms. Thanks to @RobertG for this suggestion and the further details given in other comments he has made.

  3. Use the HSQLDB non-standard SQL “CREATE TEXT TABLE …” to create a text table based on a .csv file that is placed in the same folder as the .odb file of the destination database. Then use a View to do any necessary transmogrification of the data, and the standard SQL “INSERT INTO <table> (<columns>) SELECT * FROM <view>”. Big thanks to @Villeroy for this suggestion and the very helpful links to some macro code to do this, which you will find in his comments on my question.

Each of the options has its merits, in my opinion, and corresponding disadvantages.

  0. The spreadsheet option gives a more flexible interface for importing .csv files, if the user actually uses it instead of just taking the default options. In contrast there are far fewer options when creating a connector database in options 1 and 2, and even fewer in option 3 unless you are a real SQL wizard and can write the workarounds in an SQL view.

  1. Is by far the easiest conceptually for someone not interested in the inner details of SQL, let alone of LO and the Java of HSQLDB. You can still create a view on the imported data if it needs further massaging once it arrives in the destination database, and if you create that view using the LO GUI then you can be confident that it will survive future upgrades of LO. This option leaves the door open to future migration to option 2 once you get other parts of your software working. It also works for copy/pasting data into a connected MariaDB engine and presumably any other external database (though that is beyond the scope of my question). I have tested this using LO 6.4 on Linux Mint, copy/pasting data into both the embedded HSQLDB engine and into an external MariaDB (Ver 15.1 Distrib 10.3.37-MariaDB).

  2. Is only straightforward if you are already at least partly familiar with LO macros. If/when I attempt this, my preference would be to write the code in Python and hook it in from a very simple macro command that runs the Python code. Or even Perl! Although I am fluent in both Perl and Python, one look at LO’s macro language sends me back to the 1970s and Basic. Perhaps that is merely prejudice on my part, but a language looking good is personally important to me when I have the choice. Your preferences may vary of course: Libre means “free as in speech” :slight_smile:

  3. Is only straightforward if you are an SQL wizard and can use that to work around the limitation that this method only recognises dates if they are in ISO format (2022-12-25 rather than either 25/12/2022 or 12/25/2022); a sketch of such a workaround follows this list. If you write that wizardry within the Edit as SQL option of Views then you can be reasonably confident it will survive future migration. Alternatively, if you stick to standard SQL only, then CREATE VIEW will work from Tools->SQL, and by saving the SQL code elsewhere you should survive any future migration of LO. For those sufficiently knowledgeable about writing SQL and about .csv complexities, this option offers by far the best degree of resilience against one other potential migration issue: what happens if the idiots at the data provider later change some pernickety detail of their .csv format? Long experience suggests they will make it work in MS Office and one or two other MS packages, but leave free software users drowning. @Villeroy has used this method “for years”, so it is well tested, and he is also quite an active volunteer here, so I am sure he would answer new questions if you can’t make it work yourself.
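
As an illustration of that workaround (table and column names are made up, and the day and month are assumed to be zero-padded), a view can rebuild an ISO date from a DD/MM/YYYY string before casting it:

    -- Rebuild YYYY-MM-DD from a DD/MM/YYYY text column, then cast it to a real DATE.
    -- Assumes zero-padded day and month (25/12/2022, not 7/8/1999).
    CREATE VIEW "V_Fixed_Dates" AS
        SELECT "Name",
               CAST(SUBSTRING("Booked", 7, 4) || '-' ||
                    SUBSTRING("Booked", 4, 2) || '-' ||
                    SUBSTRING("Booked", 1, 2) AS DATE) AS "Booked"
        FROM "T_Import";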

A request from me: I want this question and answer to remain at the strategic level, to help other users choose a strategy. Feel free to comment here if you want to add another option, or a major advantage or disadvantage of any of the above. PLEASE do not ask detailed questions of implementation here – to keep this thread tidy, start a new question. By all means link back to these options if you do that.

Almost my final point: you may be interested to know why I am so concerned about future migration. That is because I have been a user of OO Base and later LO Base for years. I survived the migration from LO using embedded Firebird to embedded HSQLDB, and the migration was a nightmare for me. The end result is well worth the trouble, but I do not want a similar nightmare again. As I have no control over what the lovely LO devs will do in future, I want to do everything I can to make any future transition as seamless as possible.

If you want to use the software you are writing for longer than the expected lifetime of your operating system and for longer than the expected lifetime of your current version of LO then you need to think about this.

For me the issue is that some of the accounts I prepare have to be kept for six or seven years. If HMRC (our tax people) ask for details within that time, I am legally obliged to help them understand why the summary figures I gave them are valid. I therefore want to be sure that my accounts database will still work seven years from the day I start work on a particular year’s data.

If you are doing something that you know you will not need to use past your next OS upgrade then that is not such an issue for you. Answer the following questions for yourself first: What is the long-term support promise from your OS provider? From LO for the version you are using?

And finally: please regard this post as CC_BY_ATTRIB. This means you are very welcome to copy this answer, but if you use a substantial amount of it please acknowledge me as Ask LibreOffice: @trueriver.

At present (2023-01-05) only option 0 is even hinted at in the online help for LO, and likewise in the built-in help. I am in the process of suggesting changes to that documentation which (if / when accepted) will become visible in the online help. You can track the progress of my suggestions here if you are interested.

…and I am also adding to the wiki. The page is currently (5th Jan) still a work in progress, but you can see it here (at least for now); it may move in future as I get more in tune with how the wiki is structured.

I always proceed in the same way: by creating an SQL script for the structure (tables, columns, constraints), which means doing everything by hand, after modelling the database of course. Then, if I have a lot of rows to insert, I might go through Calc and copy the columns, which I paste into this online converter.

No need to create a CSV file beforehand; it works very well directly with the columns pasted from Calc.

And finally, I import the lines formatted as SQL.
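
For illustration only (hypothetical table and rows), the hand-written structure script plus the converter's output might look like this:

    -- Structure script written by hand: table, columns, constraints.
    CREATE TABLE "Clients" (
        "ID"   INTEGER PRIMARY KEY,
        "Name" VARCHAR(100) NOT NULL,
        "City" VARCHAR(60)
    );

    -- Rows pasted out of Calc and turned into SQL by the converter.
    INSERT INTO "Clients" ("ID", "Name", "City") VALUES (1, 'Dupont', 'Lyon');
    INSERT INTO "Clients" ("ID", "Name", "City") VALUES (2, 'Martin', 'Nantes');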

The tool cannot convert strings into dates, times or decimals. Even if there were some artificial intelligence for this, how could it distinguish the 8th of November from the 11th of August when you throw 8/11/2022 at it? Some idiots write “military” date-times (special integers such as 20220811 or 234559) into csv. How do you prevent a part or phone no. 0049851669 being converted into an integer? Which tool can automagically distinguish whether 1,234 means one thousand two hundred and thirty-four or a comma decimal? You have a well-formed database with numeric IDs as VARCHAR, with dates being aware of days, years, months, weekdays and aggregations. Your database has well-thought-out decimal types, integers, floating point decimals and memos. Then some idiot working for a bank throws this into your download folder:
00420815,12/31/1999,10:45 pm,'Doe, John','(1,234.98)'
There are far too many crazy flavours of csv out there. Calc can handle most of them properly. @trueriver imports localized text dates into Calc and wonders why the database does not accept “7/8/1999” as a date value although it looks exactly as stored in the csv file.
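
A minimal sketch of the clean-up such a line forces, assuming the raw fields already sit in VARCHAR columns of a text table with made-up names (the time column is left out for brevity):

    -- The account number stays VARCHAR so the leading zero survives;
    -- the US-style date is rebuilt as ISO and cast to DATE;
    -- the bracketed, comma-grouped amount becomes a negative DECIMAL.
    SELECT "Account",
           CAST(SUBSTRING("Posted", 7, 4) || '-' ||
                SUBSTRING("Posted", 1, 2) || '-' ||
                SUBSTRING("Posted", 4, 2) AS DATE) AS "Posted",
           "Payee",
           CAST(REPLACE(REPLACE(REPLACE("Amount", '(', '-'), ')', ''), ',', '')
                AS DECIMAL(10,2)) AS "Amount"
    FROM "T_Bank";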