Any means of autoimporting CSV files into Calc?

Villeroy · January 31, 2025, 4:16pm

Writing perfectly working macro code to import one particular flavour of csv is trivial. This may be what Mr. Coin Bro wants us to do for free. The impertinence is that he tries to tell us what Excel is capable to do.
Next time he will tell us that Excel’s AI assistant can analyze a column of date strings, which is impossible when you import dates with no day number >12. I’m confident that Excel happily imports strings without bothering the user with any prompts.
Just like Calc, Excel prompts for import options every time you open a csv. It does not import anything automatically. Anything I found about Excel guessing import parameters is trivial stuff where everything in the csv matches the office locale.
This discussion about 10, 20, may be 100 mouse clicks per working day has no purpose other than wasting our time.

Sub Open_Specific_Csv()
file = getFileFromFilePickerDialog("~/Downloads/*.csv"))
specific_options = "59,34,ANSI,1,,1031,false,true,true,false,false,0,false,false,true"
Open_CSV file, specific_options
End Sub

Sub Open_CSV(sURL As String, sFilterOptions As String)
Dim aProps(1) as new com.sun.star.beans.PropertyValue
aProps(0).Name = "FilterName"
aProps(0).Value = "Text - txt - csv (StarCalc)"
aProps(1).Name = "FilterOptions"
aProps(1).Value = sFilterOptions
doc = StarDesktop.loadComponentFromURL(file, "_default", 0 aProps())
End Sub

The above code reliably opens the weird sample data from my banking account because of the specific_options passed to the working routine together with a file URL.
Reading the filter options string for a correctly loaded csv is trivial:

Sub showFilterOptions()
Dim args(),i%
   args() = StarDesktop.CurrentComponent.getArgs()
   for i = 0 to uBound(Args())
      if args(i).Name = "FilterOptions" then inputbox args(i).Name,"",cStr(args(i).value)
   next
End Sub

Villeroy · January 31, 2025, 5:17pm

In theory, the following Python code is able to guess the import parameters of a given csv sample and read the contents:

import csv
dialect = csv.Sniffer().sniff(csvfile.readline(), [',',';'])
csvfile.seek(0)  
data = csv.reader(csvfile, dialect)

But having the data of a csv, has nothing to do with importing them into an arithmetic calculator.

zerophase · January 31, 2025, 5:18pm

It’s just bad UI/UX design. Anything over 1 second of loading interrupts flow of thought with 0.1 seconds being the ideal. Not going through that prompt every time makes it more likely for Libreoffice to hit the ideal operating conditions according to research on loading software going back to the 60s. Achieving that goal through config files that cover most of the major regions, including Germany, would be ideal.
.
This is a common problem with open source software and the main issue preventing wider adoption. It’s mainly that developers are not interested in solving UI/UX problems so they do not allocate resources towards it to hire the devs that do work on that stuff. It’s not necessarily expensive to do either. I might work on a solution for this if I have the time. But, it’s probably only going to work for the US since those are the csv files I’m familiar with, unless there’s an international effort to address this.
.
I wouldn’t give money to the Document Foundation unless I know how much they have. Wikipedia and Mozilla keep asking for users to donate, and Wikipedia has $255 million and Mozillia has $826 million. Most of that money they allocate towards political causes with a very small fraction going towards development. It’s not even going towards political causes that help open source.
.
That’s basically why I’d prefer investor oversight on open source organizations since the capital is not always being used efficiently to fund development once they’re well funded. What’s going on in the US currently is a bunch of software businesses devoted resources towards political causes that do not benefit their userbase, and they’re all going bankrupt from it. They basically hired a bunch of bureaucrats to talk about improving the diversity of their workforce with $100k to $600k salaries, which does nothing to reduce business expenses, nor improve diversity. when 100 of them should have given $10k to the Document Foundation to fix some UI/UX issues to significantly reduce costs and improve profit margins by no longer having to use Excel, and hired devs to spend a year working on these issues.

Villeroy · January 31, 2025, 6:10pm

Indeed, the Python module is able to guess the line breaks, the column delimiter and the string delimiter, splitting my sample data into this:

['27.01.2025', 'xxxxxxxxxxxxxxxxxxx xxxxxxxxxxx xxxxxxxxx-xxxxx', 'xxxxxxxxxxxxxxxxxxxxxx; xxxxxxxxxxxxxxxxxxx xxxxxxxxxxx xxxxxxxxx-xxxxx; xxxxxxxxxxx xxxxxxxx; ', '810,49', '2.429,06']
['27.01.2025', 'xxx xx', 'xxxxxxxxxxx xxxxxx; ; xxx xx; 311501; xxx: xxxxxxxx; xxxx: xx21512202000068660008', '-792,70', '41.618,57']
['27.01.2025', 'xxxxxx xxxxxxx xxxx xx xxx xxxxxxxx 20, 33333 xxxxxxxxx', 'xxxx-xxxxxxxxxxxxxxxx; xxxxxx xxxxxxx xxxx xx xxx xxxxxxxx 20, 33333 xxxxxxxxx xxxxxxx; xxxxxxx xxx 200019; xxxxxxxx xx xx2299700000008303; xxx xx xxx xx: xxxxxxxxxxx xxxxxx-xxxxxxxx: xxxxxxxxxxx', '-94,61', '42.411,27']
['23.01.2025', '', 'xxxxxxxxxxxxxxxxxxxxxx; xxxxxxxxxx xxx xxxxxxxxx xxxx; xx 2085799, xxxxxx, 20107511246; ', '45,66', '2.505,88']

That is 4 arrays in [brackets] with 5 strings each.
Having 5 strings per line does not help us in any way to interprete the numerals correctly.

At this point we could do what I think, Excel does: Interprete strings of digits with the decimal delimiter of the current locale and dump the dates as strings if they do not match the locale. Don’t prompt for anything because our venture capitalists don’t give a shit.

Villeroy · January 31, 2025, 8:08pm

Right now I detected a new feature in upcoming v25.2, that was not even mentioned in the release notes:

It detects column separators and text separators.

mikekaganski · January 31, 2025, 8:33pm

tdf#152336

Wanderer · February 1, 2025, 7:37am

Thanks for the link. It seems I have to check my tsv files “hidden” in csv…