How to build data analysis project?

Dealing with csv, it is completely pointless to discuss anything without concrete data.
Save this text file with .csv suffix.

Symbol,Value,Date,Time
ABCD;12.99;12/22/2022;13:45:59
MSFT;1,234.00;01/01/2022;00:00:00

analysis_project.odb (16.4 KB)
Save this database in a trusted location where macros are allowed to be run.
Open the form, push the button, pick the file.

  1. Thank you. That’s an amazing demo.
  2. Wow. I did not expect importing a flat file into a new table in a database was going to be that complicated. I struggle to even understand what all of these scripts do.
  3. What’s with the setting the table source on and off and copying the CSV file? I can’t find any documentation for what “SET TABLE tablename SOURCE OFF” does. Why are you copying the file? Does that have something to do with file locking issues?
  4. It looks like the data is being copied from the view which is pulling from TYPE_A_CSV. What is that? It looks like a pre-populated table…? Is TYPE_A_CSV connected to the CSV file in some way?

Thanks.

csv is more than a flat file. It is a convention (not a standard) how to describe a record set in plain text.

I download my csv from some website. It goes to my download directory, from where I pick it up with my file picker dialog.
The text table “TYPE_A_CSV” may be linked to a previously imported text file file “A_Import.csv” in the database directory.
SET TABLE "TYPE_A_CSV" SOURCE OFF; disconnects the old text file.
Then I replace the old text file with the new one picked up from my download directory and reconnect the text table with its text file before running the INSERT command. What the INSERT command actually does is encoded in the view.
All the details depend completely on the csv file and the view decoding this specific type of csv,

  • how text is encoded,
  • how dates, times, decimals are formatted,
  • the order of columns,
  • columns to skip, columns to add
  • the field separator,
  • the text separator,
  • if there is a header row or not,
  • if duplicate records should be avoided.

The proper setup of the text table splits the text rows into columns of type VARCHAR. If a column has ISO dates, you may declare them as date. The time column in my sample could be declared as time. If a column has point decimals with no currency symbol nor thousands separator, it may be declared as decimal.
In this sample, I declared every column as VARCHAR and let the view convert all the text. The view does most of the import job.
How to do the same manually without any macro.

  • Close the database (disconnect text table).
  • Copy the newly downloaded csv to “A_Import.csv” in the database directory.
  • Open the database.
  • Copy the icon of the view.
  • Select the data table icon and paste.
  • Confirm the import dialog without further adjustments (the view takes care of the column order).

Having some other website as data source from where you download differently structured csv files, you have to create another text table “TYPE_B_CSV” with another source file “B_Import.csv”, another view “VIEW_TYPE_B” converting the text file to your data table format, create another import routine with different constant declarations for the file picker and for the table names. All the data go to the same data table, so they are comparable to each other.

The embedded HSQLDB is documented here: Chapter 9. SQL Syntax
Text file chapter: Chapter 6. Text Tables

OK, after looking more closely at your macros, the key that I was missing was the Text Table thing. I did not find anything in the Base GUI that allowed for the creation of Text Tables, so I didn’t know they existed. Now that I know I can create those with the CREATE TEXT TABLE command, and then set the source with SET TABLE SOURCE, I’m good with creating views, copying data to native HSQLDB tables, etc. That was the piece that I was missing.

Thanks again!

And for the rest of my project, it looks like I can used linked ranges in Calc to connect to whatever views I make on that database. Those linked ranges end up getting formatted into tables, and/or used to make charts for an end user. In the user re-uploads a new CSV file, the database gets updated, and then the user just needs to hit Data–> Refresh (or I’ll just figure out how to trigger it with an event on loading of the sheets).

Right now, I think this will get me mostly to where I want to go with this project! So glad I’ll be able to do this with Libreoffice!

For human readable output, one can build reports embedded in the Base document. Reports support calculations, group aggregations, charts and pictures.
Some types of forms can be used for interactive output (click there, get related results).
Calc is a highly versatile output medium with linked database ranges (“tables”) and with pivot tables.
[Tutorial] Using registered datasources in Calc
It is possible to attach fully functional form controls to stand-alone documents (Writer, Calc): Apache OpenOffice Community Forum - [Solved] Using Filter in Calc V3.1 with dates - (View topic)

Indeed, full support of each and every feature in HSQL was never implemented. By now, the database driver shipped with LibreOffice is 18 years old version 1.8. Developers wasted a lot of time to implement Firebird as supplement for the outdated HSQL. After 10 years, embedded Firebird is still in experimental state.
After extracting the HSQLDB, you can upgrade to HSQL 2.4.1, which was the last version compatible with version 1.8. From there, you may upgrade to more recent versions.

I’ve gotten my project mostly working from start to finish. I’m able to import text from a flat file via a form in Base that copies it into a proper HSQLDB table. I make a bunch of different views in Base, and then I use linked ranges in Calc to display the data. That’s all great.

Is there any way I can trigger the flat file import to happen from Calc, without the user needing to open Base at all?

It turns out that importing the flat file into the database from Calc is a piece of cake. As long as the database is registered, it’s a simple macro in Calc:

’ Get the database context
oContext = CreateUnoService(“com.sun.star.sdb.DatabaseContext”)

’ Replace “[RegisteredDBName]” with the registered name of your Base database
oDataSource = oContext.getByName("[RegisteredDBName]")

’ Connect to the database (no username/password)
oConnection = oDataSource.getConnection("", “”)