Whenever I try to open a csv file, a text import dialogue pops up and asks me which delimitator to use for opening this csv file. This is very counter productive when you are opening csv files hundred times a day and going through this extra window. I think there should be a setting to disable/suppress this with my default settings.
I know this topic has been discussed here before and answers were not that satisfying (devs say simply we are not going to do it), but I wanted to bring the discussion back again. There is a need for it.
Collect all csv files with the same import parameters in one dedicated directory.
menu:File>New>Database…
[X] Connect to existing database
Type: Text
Specify the directory and import parameters once.
[X] Register the database, so you can use its data in Writer and Calc.
Save the database document.
Have a look at the tables. Each csv file is represented by a table. The data are still in the connected csv directory. The database document does not contain any data.
With simple queries you can add sorted and filtered views on single tables. You can access any subset of columns and rows in any order of columns and rows.
If you need any table or queried subset of a table in Calc: Get the data source window and drag the table or query icon onto a Calc cell. This creates a linked database range.
More details and example documents: [Example] Loading CSV into preformatted spreadsheets (View topic) • Apache OpenOffice Community Forum
Thanks. But I don’t think that would solve any of the issues, since I cannot gather all the csv files into one directory. I don’t even have control over where they are located. I need to open and visually validate almost a hundred csv files regularly, each coming from different sources, sometimes not even in the same machine.
I have not tested this with csv-files (but for sqlite-connections and did similiar for mail-merging with spreadsheet-data):
Use a single registered Database mycsv.
It should be possible to change the location where the the database is reading data simply in the properties of the database.
But I guess a macro to open with default settings will save more clics…
Nother solution:
Sheet > Insert Sheet from File… with “Link” option.
You can open the CSV files by the macro. Open one your CSV file manually and then run the macro InspectDocArgs → and copy the string from FilterOptions to the macro openCSV (line: args(1).Value=…). Then you can open the CSV files with the macro openCSV (and you can also set the variable sInitDir to your default directory).
Sub InspectDocArgs 'from A. Pitonyak's book - AndrewBase.pdf
'open your CSV file manually and copy the string from FilterOptions to the macro openCSV -> line: args(1).Value=...
Dim args()
Dim x, i%, s$
args() = ThisComponent.getArgs()
On Error Resume Next
For i = LBound(args()) To UBound(args())
x = args(i)
s = s & x.Name & " : "
s = s & CStr(x.Value)
s = s & CHR$(10)
Next
MsgBox s
End Sub
Sub openCSV 'file-open dialog for the CSV files
on local error goto bug
dim oFileDlg as object, oFileAccess as object, oFiles as object, sInitDir$, i&, sUrl$
dim args(1) as new com.sun.star.beans.PropertyValue
'some description: https://wiki.openoffice.org/wiki/Documentation/DevGuide/Spreadsheets/Filter_Options'
args(0).Name="FilterName" : args(0).Value="Text - txt - csv (StarCalc)"
args(1).Name="FilterOptions"
args(1).Value="59/44,39,76,1,,0,true,true,true,false,false" 'FilterOptions from the macro InspectDocArgs
oFileDlg=CreateUnoService("com.sun.star.ui.dialogs.FilePicker")
oFileDlg.AppendFilter("CSV (*.csv)", "*.csv") : oFileDlg.SetCurrentFilter("CSV (*.csv)") 'filters for the file type
oFileDlg.MultiSelectionMode=true 'allow multiselection
oFileAccess=CreateUnoService("com.sun.star.ucb.SimpleFileAccess")
sInitDir=CreateUnoService("com.sun.star.util.PathSettings").Work 'directory from the Tools/Options/LibreOffice/Paths -> My Documents
'sInitDir=convertToUrl("d:\mydir") 'or your directory
if oFileAccess.Exists(sInitDir) then 'set default directory to the file dialog
oFileDlg.SetDisplayDirectory(sInitDir)
end if
if oFileDlg.execute() then 'open the dialog
oFiles=oFileDlg.SelectedFiles()
for i=lbound(oFiles) to ubound(oFiles) 'open selected files
sUrl=oFiles(i) 'url for the file
openFile(sUrl, args() ) 'check if the file is opened and if no, then open the file
'StarDesktop.loadComponentFromURL(sUrl, "_blank", 0, args) 'open the file directly (without the isOpen-control)
next i
end if
exit sub
bug:
msgbox(Error & chr(13) & Err & chr(13) & Erl, 16, "openCSV")
End Sub
Function openFile(sUrl$, args() ) 'open the file if the one is not open
on local error goto bug
dim oComponents as object, oComponent as object
oComponents=StarDesktop.Components.CreateEnumeration
while oComponents.hasMoreElements
oComponent=oComponents.NextElement
if hasUnoInterfaces(oComponent,"com.sun.star.frame.XModel") then
if oComponent.URL=sUrl then exit function 'file is opened
end If
wend
StarDesktop.loadComponentFromURL(sUrl, "_blank", 0, args) 'open the file
exit function
bug:
msgbox(Error & chr(13) & Err & chr(13) & Erl, 16, "openFile")
End Function
edit: basic-code-block by @karolus
I think this is beyond my skills and understanding of coding that I am not technical. I just simply wanna not to be bugged by a prompt window for such a simple task like opening a file. I don’t think this would work when files I am working are scattered around the machine. Thanks anyway.
You should give it a try. The solution is using the file picker. So you may select any file you like.
You reverse only the calling-order.
Before you selected a file in the operating-system and this called libre-office.
Here you start LibreOffice, select the macro and this will call the file-picker (usually the default on for the OS) and you select the file.
IMHO the devs are right - opening csv-files only looks easy until you know all details and is prone to alter data, where it is not expected.
Wrong defaults of Excel was one of the reasons for me to use/prefer OpenOffice long ago.
I will give it a try or ask around.
I do agree that implementing every feature request shouldn’t be possible otherwise configuration menu would just turn into a mess.
What I disagree is fear of setting the wrong defaults, after me using Excel for many years. I will be aware that wrong defaults may cause issue, but it is a low-risk high-reward situation for me as I am very familiar with the structure of csv files as well as the data. Hence, very easy to spot if anything is odd. There can be situations that I wouldn’t wanna take the risk. In that case I will just revert back to popup window setting. Anyway, it is a long discussion, but I see your point.
I will most likely go for a lightweight third-party tool just for viewing csv files.
Wrongly imported csv files are one of the biggest sources of support requests for the Calc component. Calc is picky about numeric text values. Calc may interprete digit-only strings as integers and ISO date times because they are unambiguous. Calc will never interprete text values “1,234” or “1/2/1999” as numeric cell values because both strings may mean completely different figures on both sides of the Atlantic Ocean. There are various ways to fix wrongly imported data. However, it is always preferable to import csv files correctly, i.e. with numbers being numeric cell values except ID numbers (zip code, phone, part number).
- Always check “Detect special numbers” (this option should not exist anyway)
- Choose a locale (“language” option on top of the import dialog) which interpretes numeric strings correctly.
If “1,234” is meant to be a decimal and “1/2/1999” refers to 1st of February, choose a European locale with comma decimals such as “French(France)” or “German(Germany)”.
If “1,234” means integer 1234 and “1/2/1999” refers to 1st of February choose “English(UK or any non-US)”.
The language setting should describe the “cultural context” of the incoming numerals, so you don’t need to specify the import type for each column which is also possible for the exception to the rules, e.g. when you mark a numeric column of ID numbers explicitly as text.
Unfortunately, our team decided to switch back to Excel after trying LO for a while. I couldn’t convince anybody to carry on using LO as they mostly complained about this issue. We don’t really use cvs for those type of data, it is usually bunch of sensors and machines outputting similar sort of data which needs to be checked regularly, so we are not off. Our problem is having sheer number of csv files I believe.
Thanks everybody.
There are thousands of different kinds of csv files out there. I have no idea how Excel opens text files. I have not used Excel in the past 20 years. Therefore I can not suggest anything that would meet your expectations because I simply don’t know what you want. Does Excel interprete text files by means of artificial intelligence? I can’t imagine how to open csv files without asking for import options. Did you notice that Calc remembers the last applied import options, so you only need to hit the Enter key when the dialog pops up?
Today I opened every csv file I can find on our system using Excel, and checked every single column. It guessed csv settings correctly with 100% accuracy.
I believe Excel uses some sort of heuristics combined with user’s localization settings to deal with it and if it senses anything odd, just displays the popup window for adjusting settings. I am not sure but it also looks like, it remembers your settings per file basis, so subsequent views never yield the same window. Or something auto-trained behind the scene, just like how Bayesian filters work.
But I am sure if you analyze enough csv files and build a model, you will find a pattern. Maybe the problem is not that hard as we guess?
Thank you very much for the description how Excel seems to do the trick. Well, I rarely open csv with a calculator. All my csv files are opened in text editors or with the help of the Base component. From time to time here is one type of csv, which I do open with Calc. I rename the incoming file to “new_import.csv”, open a Calc document with a link to “new_import.csv” and get some calculations and chart based on this file on another sheet. Of course, this does not meet your expectations.
I very much doubt there would be any intelligence like Bayesian filters in Excel’s CSV import. But I’m pretty much convinced that one can construct files with content that will fool its import and produce results the user will not be satisfied with and is not changeable.
Anyway, good luck with it.
No, older versions of excel (2003) just did wrong imports, if german csv didn’t use a comma and completely ignored charsets.
But programmers know excels defaults and generate conforming output. Recently my exported data was denied by our banking software, after working for years. They had dropped their own described import-model abd the new manual said only import csv. No info of the details - except or a screenshot wich I recognized to be from excel. So I imported from LO to excel, checked the first utf8-export and found the accepted format.
And it is impossible without context to decide if 01-04-21 is german April first or US January 4th - the info is not in the file. (And you can’t rely on locales, as people import from websites and some Software just export american data for compatibility reasons with other dumb software.)
I guess you mostly have numeric data in format where excel defaults, so no problem with charsets and hopefully no dates, but on a general view, this is bare luck.
Maybe the problem is not that hard as we guess
?It is because csv is neither a standard nor containing the necessary information. It is s usable relict from old times, where most american programmers had never thought of anything beyond a 7-bit charset. So it is universally available, but limited and potentially dangerous. ( Good, as people like me were paid to rescue altered data… )
There is no particular secret here.
First, the localization for the csv file is determined. For example, if the method used has the parameter local:=False, then the localization is en_US. If the localization is not explicitly defined, then the localization of the installed Windows system is selected.
If the column type is not specified explicitly, then the following algorithm is applied to each value. If the text can be converted based on the localization defined above as a number, date, boolean, or error value, then the converted value is entered in the cell. Otherwise, the cell is filled with text or an empty value.
That’s my recollection too. Bill Gates loved Basic (ie csv) and Excel, end of story. Why is Calc so bad? That’s how it is isn’t the right answer.
That’s nonsense. Calc isn’t bad. Instead, it gives the user the opportunity to choose between failure and correctness in most cases, whereas if the Excel automatism fails one can do nothing about it (or at least that’s what I remember from 10 years ago when I last used it). And yes, it can fail badly if the data is not as Excel expects or accidentally matches something that triggers some unwanted conversion. Apart from that, CSV data can be as bad as it gets especially with quoted or non-quoted or wrong-quoted fields and generators blast the worst possible cases into the wild not following the simplest recommendations in that non-standard.
I don’t support your view the calc csv parser is a devine creation.
- It’s more common for software to parse csv files without screen interruptions, an option that calc should support
- Evidence suggests the calc data parsing algorithm could be improved, especially considering locale
- Column width settings are associated with this tool
As @sijov1 points out, the excel parser is fine for many (most?) files parsing the data type correctly. I totally agree with your comment it can fail badly, and @Villeroy suggestion of a macro to bypass the data parsing form. At times it is not easy to fix data excel parsed incorrectly. The calc wizard is mostly there when it is needed, unlike excel.
.
Excel makes better choices than calc parsing csv data. It most likely uses locale information to configure certain special characters like decimal separators, thousand separators, currency symbols, and dates. It also seems to handle changing formats like 2-digit years to 4 digits or month digits to characters without having to manually widen the column.