How to build data analysis project?

I’m interested in creating a project that allows a non-technical user to import a simple CSV file containing financial transactions, and then display various analyses of this data in several different tables and charts.
It seems to me that Calc will display the tables and charts that I need for this project. Calc is able to easily import the CSV file. Where I’m having a problem is displaying this data in a useful way.

Pivot tables in Calc seem like an obvious way to go, but I can’t seem to figure out how to get pivot tables to do any sort of complex calculations in how they display data, or even format the data they’re pulling in. I’d like to be able to display tables that are a subset of other tables, but pivot tables don’t seem to be able to do that either, unless there’s a SUM or grouping column.

I thought that perhaps putting the data into Base, and then pulling it out via various SQL statements into the charts and graphs would be more straightforward, but Base seems to be limited to only working on local flat files (ie: I can’t even open a database file on a local file share), so I don’t know if it is designed to act as an actual database, or some sort of shim to do facilitate moving data from one place to another.

Is there a workflow or application design that will let me do what I want inside of Libreoffice?

:thinking:

Ask/Guide - How to use the Ask site - The Document Foundation Wiki #More_details

I’d say Base is no database, but connects to databases. This may be embedded types of HSQLDB or Firebase, server-databases (Maria-DB, PostgreSQL…) or “emulated” databases wich consist of a bunch of dbase or csv-files in a folder. As well possible is anything you provide with an ODBC od JDBC -driver.
.
But my experience is the same as yours: Files should be local for Windows, even assigned drive-letters to a network share didn’t work. But this can usually provided with file-syncing software. Problems arise with users who never close windows, so files are always “in use” and not available for syncing. Then you need to import to a server-database. I use a MariaDB on a Synology-NAS for example.

Your data input comes via csv files, which means that you are going to process database data. Where do the csv files come from? Do you have direct access to the exporting database?

No, I don’t have direct access to the database. The CSV files comes from a financial services web site that I do not own. I’m creating this project to help users easily analyze these particular CSV files from this one particular web site.

  • Frank

I don’t know when this project will be finished, but I will want it to work with whatever the most recent version of Libreoffice that is available. Right now, I’m using 7.7.3.2.

I’m using Linux Mint, but I’d like this project to be useful to people using any OS that Libreoffice runs on.

  • Frank

OK, do you need to collect all the incoming data, so you can compare this month with May 2020?

I’m interested in making this project such that the end user only needs to install Libreoffice if they don’t already have it installed, so I’d like to go with the native HSQLDB database type (as opposed to having to install a separate database server like MySQL, etc). I won’t need to do anything other than run standard SQL statements against a few thousand records, so I have to imagine that even the most basic database engine will be fine.

I have already run into problems with Base not saving HSQLDB files (.ODB) when stored on a network drive. I’ve gotten around it by just doing a “Save As”. It’s not a great kludge, and of course, it really shouldn’t be necessary, but I don’t think the end user will likely be opening Base and making changes to the database, so I think it’s something that should be able to live with. I’m thinking that the end user will spend all of their time in Calc (assuming I can get the whole Base/Calc thing working).

  • Frank

Ha!

Actually, I’m doing an analysis of income from US Treasuries and dividend-paying stocks, so the current stock market crash doesn’t really matter to what I’m doing. Many people who buy these sorts of income producing investments want prices to be low so they can get a good deal.

Ideally, I’d like to show the information in several different tables of data and graphs.

I see that Base has some basic reporting as part of it, but it doesn’t seem to do charts, so I’m thinking that all of the tables of data and charts should probably be in Calc…?

  • Frank

Hire a programmer. Pay him/her well.

I’m a programmer. That’s how I know what SQL is.

Being a programmer, what are you trying to do with spreadsheets what you can’t do with a dedicated program?

Well, before any spreadsheet you need a database. IMHO, HSQL is a good choice for a single user with local access, even without any office suite. I can offer a simple but versatile Basic macro to import csv into HSQL (and may be others, I use it with HSQL).

Most of the work needs to be done in plain SQL because every csv is different.
If your draft works sufficiently well with embedded HSQL, you should definitively extract and update the HSQLDB because embedded databases become unsafe over time. I can offer another macro doing that for you.
[Python] Macro to extract and reconnect embedded HSQLDB

A. I’m not that good of a programmer that I could write something like this from scratch, nor would I want to.

B. Simple arithmetic on lists of numbers, being displayed in tabular and graphics forms, is what spreadsheets and databases are designed to handle. It seems that these should be the right tools for this job, unless somebody has a specific reason why they’re not.

My trivial macro to import csv into a Base-connected database works well with different flavors of csv. I use it with account data from 2 different banks using very different flavors of csv.
I choose the right account on a form, click a button (list box would work as well) and get a file picker dialog showing all account.csv from a distinct bank. I pick the latest file for the form’s current account, and accounting data get imported (without duplicates) into the right table. Every time they change something in their csv, I need a session of fixing and testing.
The macro performs the following steps:

  1. get csv file from customized file picker dialog
  2. disconnect text file from database
  3. copy csv to database directory as “account_12345.csv”.
  4. Reconnect the text file
  5. Run INSERT INTO "Account_Table" (SELECT * FROM "View_Name");

The view converts the table linked to the csv file from text to VARCHAR, DATE and DECIMAL in the right order of rows and columns, skipping records that have been imported already. In addition it adds the right account number and blank fields for annotations and check marks.

I’ve actually found out a very easy way to deal with CSV’s and Base. I figured out that I can create a new HSQLDB with Base, point Base at the folder containing the CSV’s, and Base automatically grabs those CSV’s and makes a table out of each CSV in the folder.
(Thanks to Microsoft Copilot!).

  • In Base, you create a new database, choose “Connect to an existing database”.
  • Choose database type of “text”.
    That’s it!
    So that’s pretty cool. It seems like it works well.

Now, I’m trying to figure out if I can make multiple views and then make Calc tables and charts from the views I create in Base…

That is a pseudo-database, good enough for mail merge jobs. Dates and numbers may be dates and numbers – or may be not. Depends on locale settings.
From this pseudo-database, you can query columns and rows from one single table. The performance is lousy because it is just another way to read lines of plain text.
Availlable functions: SQL functions for file based database drivers
What I suggested is a true database handling millions of rows for years to come.

That is a pseudo-database, good enough for mail merge jobs. Dates and numbers may be dates and numbers – or may be not.

I’m trying to create a query/view based on this data with CAST(‘2021-01-01’ AS DATE) or CAST(‘01-01-2021’ AS DATE), but it returns no data. Is this why this doesn’t work…?

CAST is not in the list of functions for file based databases. The csv driver recognizes ISO dates, so they don’t need any conversion. Further more, it seems to recognize some some (all?) date patterns matching your office locale 13/12/2022 in English context or 12/13/2022 in US-English locale context. Forget that driver. It’s useful, but not for a “data analysis project” where …

Forget that driver. It’s useful, but not for a “data analysis project” where …

OK, that helps clear that up, thanks. I’m going to play with this flat-file driver a bit more to see if I can get it to do what I want to do. I’m not really sure it will. You might be right.

So if I DO want to create a simple HSQLDB database from my flat file (again, I’m trying to streamline this for end users… I can manually do all of this myself), I’m not quite following your procedure:

The macro performs the following steps:

get csv file from customized file picker dialog
disconnect text file from database
copy csv to database directory as “account_12345.csv”.
Reconnect the text file
Run INSERT INTO "Account_Table" (SELECT * FROM "View_Name");

Where are you running the macro that grabs the text from the flat file and inserts it into a HSQLDB table? Is that happening from inside the .ODB file that contains an existing HSQLDB database? Does Base have a method for importing data from a flat file into an existing database (that I haven’t found in the Base user interface)?