Good Morning,
This is a follow-up to the issue mentioned in How to ensure SDBC connection is closed on a connection to an excel file? - English - Ask LibreOffice
and the associated issues, such as setting a connection timeout.
The following problems were encountered before I found a solution. The source problem: I have 454 Excel workbooks, with 18 sheets per workbook and several hundred rows of data in each worksheet. The approach I thought would be most effective was to open an SDBC connection to each workbook, cycle through each worksheet, and insert the data into Base (embedded Firebird) for further data processing. I initially wrote this in Basic, and while it did work, it was extremely slow. It might be fine for a few dozen workbooks, but not for more than, let's say, 100.
So I thought, let's move to Python, and while that was faster, the problem was that after about 90 workbooks, an attempt to make an SDBC connection to a workbook would always fail. I thought that enabling pooling and a connection timeout (hence my earlier question about setting the timeout via code) would resolve the problem. It did not; UNO would seemingly get stuck while trying to make a connection. I even tried making UNO connections and storing them in a dictionary (bad idea, don't do that).
Since I found better performance with Python, I abandoned Basic. One way I sped up my code even more was to give up on embedded Firebird and use PostgreSQL instead. This helped a lot with both speed and brevity of code: I needed much less code to accomplish the same task.
The problem of the UNO connection to the Excel workbooks was still preventing me from properly importing all the data. So instead of making a connection, importing data, closing the connection, and starting the process over again (which I knew would eventually fail), I thought: why not export all sheets as CSVs into one folder (this is where I asked the question about storing all of the sheets of an Excel file as multiple CSVs), make a single SDBC flat-file connection, cycle through all of the CSVs (8,172 in total) from that one connection, and then close it and be done? That did not work either. After going through about 120 CSV files, LibreOffice would simply quit and disappear, with no crash message or warning. After a few attempts of that, I finally gave up on that route as well.
So what to do? PostgreSQL has a COPY command, which imports CSV data very quickly. That meant I could drop every form of UNO connection and use the psycopg2 library instead.
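As a rough sketch of that final step (the table name, file name, and connection parameters below are made up for illustration, and each CSV is assumed to have a header row), the psycopg2 side can look like this:

```python
def copy_sql(table: str) -> str:
    """Build the COPY statement passed to copy_expert (assumes a CSV header row)."""
    return f"COPY {table} FROM STDIN WITH (FORMAT csv, HEADER true)"

def import_csv(conn, csv_path: str, table: str) -> None:
    """Stream one CSV file into PostgreSQL via COPY, far faster than row INSERTs."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        with conn.cursor() as cur:
            # copy_expert streams the open file object straight to the server
            cur.copy_expert(copy_sql(table), f)
    conn.commit()

# Usage sketch (connection details are placeholders):
# import psycopg2
# conn = psycopg2.connect(dbname="mydb", user="me")
# import_csv(conn, "book001__Sheet1.csv", "raw_data")
```

The speed gain comes from COPY being a single bulk server-side load per file instead of thousands of individual INSERT statements.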
So the process is:
- export each Excel workbook as multiple CSVs (one per sheet)
- use PostgreSQL's COPY to import each CSV into the database via a psycopg2 connection
- continue with data processing
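The first step above can be sketched as follows. This is only one way to do the export (it uses openpyxl, which is an assumption on my part, and a made-up file-naming scheme); the point is simply one CSV per worksheet, written into a single folder:

```python
import csv
from pathlib import Path

def csv_name(workbook_stem: str, sheet_title: str) -> str:
    """Hypothetical naming scheme: one CSV per (workbook, sheet) pair."""
    return f"{workbook_stem}__{sheet_title.replace(' ', '_')}.csv"

def export_sheets(xlsx_path: str, out_dir: str) -> list:
    """Write every worksheet of one workbook to its own CSV file."""
    from openpyxl import load_workbook  # assumption: openpyxl is available
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    # read_only mode keeps memory use low for large workbooks
    wb = load_workbook(xlsx_path, read_only=True, data_only=True)
    for ws in wb.worksheets:
        target = out / csv_name(Path(xlsx_path).stem, ws.title)
        with open(target, "w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            for row in ws.iter_rows(values_only=True):
                writer.writerow(row)
        written.append(str(target))
    return written
```

Run over all 454 workbooks, this produces the folder of CSVs that the COPY import then consumes.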
The time-limiting factor in this solution is the conversion of the 454 workbooks to CSV; everything else completed very quickly.