How to build data analysis project?

I’m interested in making this project such that the end user only needs to install Libreoffice if they don’t already have it installed, so I’d like to go with the native HSQLDB database type (as opposed to having to install a separate database server like MySQL, etc). I won’t need to do anything other than run standard SQL statements against a few thousand records, so I have to imagine that even the most basic database engine will be fine.

I have already run into problems with Base not saving HSQLDB files (.ODB) when stored on a network drive. I’ve gotten around it by just doing a “Save As”. It’s not a great kludge, and of course, it really shouldn’t be necessary, but I don’t think the end user will likely be opening Base and making changes to the database, so I think it’s something that should be able to live with. I’m thinking that the end user will spend all of their time in Calc (assuming I can get the whole Base/Calc thing working).

  • Frank

Ha!

Actually, I’m doing an analysis of income from US Treasuries and dividend-paying stocks, so the current stock market crash doesn’t really matter to what I’m doing. Many people who buy these sorts of income producing investments want prices to be low so they can get a good deal.

Ideally, I’d like to show the information in several different tables of data and graphs.

I see that Base has some basic reporting as part of it, but it doesn’t seem to do charts, so I’m thinking that all of the tables of data and charts should probably be in Calc…?

  • Frank

Hire a programmer. Pay him/her well.

I’m a programmer. That’s how I know what SQL is.

Being a programmer, what are you trying to do with spreadsheets what you can’t do with a dedicated program?

Well, before any spreadsheet you need a database. IMHO, HSQL is a good choice for a single user with local access, even without any office suite. I can offer a simple but versatile Basic macro to import csv into HSQL (and may be others, I use it with HSQL).

Most of the work needs to be done in plain SQL because every csv is different.
If your draft works sufficiently well with embedded HSQL, you should definitively extract and update the HSQLDB because embedded databases become unsafe over time. I can offer another macro doing that for you.
[Python] Macro to extract and reconnect embedded HSQLDB

A. I’m not that good of a programmer that I could write something like this from scratch, nor would I want to.

B. Simple arithmetic on lists of numbers, being displayed in tabular and graphics forms, is what spreadsheets and databases are designed to handle. It seems that these should be the right tools for this job, unless somebody has a specific reason why they’re not.

My trivial macro to import csv into a Base-connected database works well with different flavors of csv. I use it with account data from 2 different banks using very different flavors of csv.
I choose the right account on a form, click a button (list box would work as well) and get a file picker dialog showing all account.csv from a distinct bank. I pick the latest file for the form’s current account, and accounting data get imported (without duplicates) into the right table. Every time they change something in their csv, I need a session of fixing and testing.
The macro performs the following steps:

  1. get csv file from customized file picker dialog
  2. disconnect text file from database
  3. copy csv to database directory as “account_12345.csv”.
  4. Reconnect the text file
  5. Run INSERT INTO "Account_Table" (SELECT * FROM "View_Name");

The view converts the table linked to the csv file from text to VARCHAR, DATE and DECIMAL in the right order of rows and columns, skipping records that have been imported already. In addition it adds the right account number and blank fields for annotations and check marks.

I’ve actually found out a very easy way to deal with CSV’s and Base. I figured out that I can create a new HSQLDB with Base, point Base at the folder containing the CSV’s, and Base automatically grabs those CSV’s and makes a table out of each CSV in the folder.
(Thanks to Microsoft Copilot!).

  • In Base, you create a new database, choose “Connect to an existing database”.
  • Choose database type of “text”.
    That’s it!
    So that’s pretty cool. It seems like it works well.

Now, I’m trying to figure out if I can make multiple views and then make Calc tables and charts from the views I create in Base…

That is a pseudo-database, good enough for mail merge jobs. Dates and numbers may be dates and numbers – or may be not. Depends on locale settings.
From this pseudo-database, you can query columns and rows from one single table. The performance is lousy because it is just another way to read lines of plain text.
Availlable functions: SQL functions for file based database drivers
What I suggested is a true database handling millions of rows for years to come.

That is a pseudo-database, good enough for mail merge jobs. Dates and numbers may be dates and numbers – or may be not.

I’m trying to create a query/view based on this data with CAST(‘2021-01-01’ AS DATE) or CAST(‘01-01-2021’ AS DATE), but it returns no data. Is this why this doesn’t work…?

CAST is not in the list of functions for file based databases. The csv driver recognizes ISO dates, so they don’t need any conversion. Further more, it seems to recognize some some (all?) date patterns matching your office locale 13/12/2022 in English context or 12/13/2022 in US-English locale context. Forget that driver. It’s useful, but not for a “data analysis project” where …

Forget that driver. It’s useful, but not for a “data analysis project” where …

OK, that helps clear that up, thanks. I’m going to play with this flat-file driver a bit more to see if I can get it to do what I want to do. I’m not really sure it will. You might be right.

So if I DO want to create a simple HSQLDB database from my flat file (again, I’m trying to streamline this for end users… I can manually do all of this myself), I’m not quite following your procedure:

The macro performs the following steps:

get csv file from customized file picker dialog
disconnect text file from database
copy csv to database directory as “account_12345.csv”.
Reconnect the text file
Run INSERT INTO "Account_Table" (SELECT * FROM "View_Name");

Where are you running the macro that grabs the text from the flat file and inserts it into a HSQLDB table? Is that happening from inside the .ODB file that contains an existing HSQLDB database? Does Base have a method for importing data from a flat file into an existing database (that I haven’t found in the Base user interface)?

Dealing with csv, it is completely pointless to discuss anything without concrete data.
Save this text file with .csv suffix.

Symbol,Value,Date,Time
ABCD;12.99;12/22/2022;13:45:59
MSFT;1,234.00;01/01/2022;00:00:00

analysis_project.odb (16.4 KB)
Save this database in a trusted location where macros are allowed to be run.
Open the form, push the button, pick the file.

  1. Thank you. That’s an amazing demo.
  2. Wow. I did not expect importing a flat file into a new table in a database was going to be that complicated. I struggle to even understand what all of these scripts do.
  3. What’s with the setting the table source on and off and copying the CSV file? I can’t find any documentation for what “SET TABLE tablename SOURCE OFF” does. Why are you copying the file? Does that have something to do with file locking issues?
  4. It looks like the data is being copied from the view which is pulling from TYPE_A_CSV. What is that? It looks like a pre-populated table…? Is TYPE_A_CSV connected to the CSV file in some way?

Thanks.

csv is more than a flat file. It is a convention (not a standard) how to describe a record set in plain text.

I download my csv from some website. It goes to my download directory, from where I pick it up with my file picker dialog.
The text table “TYPE_A_CSV” may be linked to a previously imported text file file “A_Import.csv” in the database directory.
SET TABLE "TYPE_A_CSV" SOURCE OFF; disconnects the old text file.
Then I replace the old text file with the new one picked up from my download directory and reconnect the text table with its text file before running the INSERT command. What the INSERT command actually does is encoded in the view.
All the details depend completely on the csv file and the view decoding this specific type of csv,

  • how text is encoded,
  • how dates, times, decimals are formatted,
  • the order of columns,
  • columns to skip, columns to add
  • the field separator,
  • the text separator,
  • if there is a header row or not,
  • if duplicate records should be avoided.

The proper setup of the text table splits the text rows into columns of type VARCHAR. If a column has ISO dates, you may declare them as date. The time column in my sample could be declared as time. If a column has point decimals with no currency symbol nor thousands separator, it may be declared as decimal.
In this sample, I declared every column as VARCHAR and let the view convert all the text. The view does most of the import job.
How to do the same manually without any macro.

  • Close the database (disconnect text table).
  • Copy the newly downloaded csv to “A_Import.csv” in the database directory.
  • Open the database.
  • Copy the icon of the view.
  • Select the data table icon and paste.
  • Confirm the import dialog without further adjustments (the view takes care of the column order).

Having some other website as data source from where you download differently structured csv files, you have to create another text table “TYPE_B_CSV” with another source file “B_Import.csv”, another view “VIEW_TYPE_B” converting the text file to your data table format, create another import routine with different constant declarations for the file picker and for the table names. All the data go to the same data table, so they are comparable to each other.

The embedded HSQLDB is documented here: Chapter 9. SQL Syntax
Text file chapter: Chapter 6. Text Tables

OK, after looking more closely at your macros, the key that I was missing was the Text Table thing. I did not find anything in the Base GUI that allowed for the creation of Text Tables, so I didn’t know they existed. Now that I know I can create those with the CREATE TEXT TABLE command, and then set the source with SET TABLE SOURCE, I’m good with creating views, copying data to native HSQLDB tables, etc. That was the piece that I was missing.

Thanks again!

And for the rest of my project, it looks like I can used linked ranges in Calc to connect to whatever views I make on that database. Those linked ranges end up getting formatted into tables, and/or used to make charts for an end user. In the user re-uploads a new CSV file, the database gets updated, and then the user just needs to hit Data–> Refresh (or I’ll just figure out how to trigger it with an event on loading of the sheets).

Right now, I think this will get me mostly to where I want to go with this project! So glad I’ll be able to do this with Libreoffice!

For human readable output, one can build reports embedded in the Base document. Reports support calculations, group aggregations, charts and pictures.
Some types of forms can be used for interactive output (click there, get related results).
Calc is a highly versatile output medium with linked database ranges (“tables”) and with pivot tables.
[Tutorial] Using registered datasources in Calc
It is possible to attach fully functional form controls to stand-alone documents (Writer, Calc): Apache OpenOffice Community Forum - [Solved] Using Filter in Calc V3.1 with dates - (View topic)

Indeed, full support of each and every feature in HSQL was never implemented. By now, the database driver shipped with LibreOffice is 18 years old version 1.8. Developers wasted a lot of time to implement Firebird as supplement for the outdated HSQL. After 10 years, embedded Firebird is still in experimental state.
After extracting the HSQLDB, you can upgrade to HSQL 2.4.1, which was the last version compatible with version 1.8. From there, you may upgrade to more recent versions.

I’ve gotten my project mostly working from start to finish. I’m able to import text from a flat file via a form in Base that copies it into a proper HSQLDB table. I make a bunch of different views in Base, and then I use linked ranges in Calc to display the data. That’s all great.

Is there any way I can trigger the flat file import to happen from Calc, without the user needing to open Base at all?