Is there any csv documentation?

I can’t find any in the LibreOffice documentation suite (unless I just missed it) and there is no CSV standard - RFC 4180 is not a standard.

Between LibreOffice 6.x and LibreOffice 7.x some software I wrote broke. I’d like to understand how to fix it. Under 6.x, Calc’s CSV output quoted any field containing the separator character but left the remaining fields intact. Under 7.x, Calc quotes fields if requested, and if they are unquoted, unknown character sequences are inserted, e.g., +AC, +AF. I’d like to understand this better.

LibreOffice provides only limited information on CSV files, as they are not part of LibreOffice but an external format with simple and limited formatting.
A number of questions have been asked, and answers given, on the topic of CSV files. It is worth searching them for their useful information.
It is important to understand that a CSV file is a simple TEXT file which contains NO metadata: neither the character set used nor the delimiter characters, if any, are recorded. Hence, the character set must be chosen at creation, and the receiver must be told the character set and delimiters used, or else they have to guess.

The following extract was taken from the Wikipedia article.

A CSV file typically stores tabular data (numbers and text) in plain text, in which case each line will have the same number of fields.
Many applications that accept CSV files have options to select the delimiter character and the quotation character. Semicolons are often used instead of commas in many European areas in order to use the comma as the decimal separator and, possibly, the period as a decimal grouping character.
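To illustrate the point about delimiters, here is a minimal Python sketch that reads semicolon-delimited data with a decimal comma. The sample data, the column names and the `read_semicolon_csv` helper are made up for illustration; they are not taken from LibreOffice.

```python
import csv
import io

# Hypothetical sample: semicolon as delimiter, comma as decimal separator.
SAMPLE = "item;price\nwidget;1,50\ngadget;12,25\n"

def read_semicolon_csv(text):
    """Parse ';'-delimited text and convert decimal-comma prices to floats."""
    reader = csv.DictReader(io.StringIO(text), delimiter=";")
    rows = []
    for row in reader:
        # Swap the decimal comma for a period before converting.
        row["price"] = float(row["price"].replace(",", "."))
        rows.append(row)
    return rows

rows = read_semicolon_csv(SAMPLE)
```

The reader has to be told the delimiter explicitly, which is the same choice the Calc import dialog asks you to make.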

I hope this helps.

A few more thoughts.

The topic of CSV OPEN and SAVE is considered in some detail in the LibreOffice Calc Guide, Chapter 1, Introduction. This manual is available to download from the LibreOffice Documentation Website.
LibreOffice uses the international Unicode standard, with UTF-8 as the usual encoding. US-ASCII is a byte-for-byte subset of UTF-8. The Western European ISO-8859-1 characters are the first 256 Unicode code points, but those above 127 are encoded differently in UTF-8. So data restricted to US-ASCII, for example delimiter punctuation, is interchangeable.
As a .csv file (often also labeled .txt) contains no content formatting information except a line break (CRLF in RFC 4180) at the end of each record, you need to explicitly tell LibreOffice any delimiter information. The Text Import dialog gives you help here.
CSV (comma-separated values) is a name of historical interest, but is, of course, confusing when many of us no longer use a comma (,) but a semicolon (;) or something else.

Thank you and I will try to find other csv questions on libreoffice.

Your statement that “CSV is a simple TEXT” file is inaccurate. The current LibreOffice rendition of CSV files is a text-based, formatted TEXT file. What is at issue are the formatting rules. In particular, when are “+ACx, +ACx-, +AFx, +AHx” and potentially other textual formatting commands issued? I have been unable to find documentation on this. As I have indicated, the current rendition of CSV files is different from previous renditions of CSV files. Which, I guess, brings up the question of who controls the product, its delivery, and its documentation? I thought that whatever was developed and delivered was under unitary control of LibreOffice. What I think you are saying is that this is not accurate; that anything co-developed is delivered by LibreOffice but without LibreOffice providing any direct guidance, at least for documentation. Is this accurate? I hope that I am dreadfully wrong.

I suggest you add example files with this behaviour to this thread. That might help analyze the issue.

The good news is that I’m going through 211 entries and trying to determine what they are and how to access them online, with reasonably good success. The bad news is that it is tedious and will take several hours.

I must thank you for your effort.

These entries (should) contain four fields: title,author,target,URL

An Introduction to Combinatorics and Graph Theory,,,+ACI-https://www.whitman.edu/mathematics/cgt+AF8-online/cgt.pdf+ACIAIg-+ACI-

        +ACI-Digraphs Theory,  Algorithms and Applications +ACI-,,,+ACI-http://www.cs.rhul.ac.uk/books/dbook/main.pdf+ACIAIg-+ACI-

    A Practical Introduction to Real+AC0-Time Systems,Harder,,https://ece.uwaterloo.ca/+AH4-dwharder/icsrts/Lecture+AF8-materials/A+AF8-practical+AF8-introduction+AF8-to+AF8-real+AC0-time+AF8-systems+AF8-for+AF8-undergraduate+AF8-engineering.pdf

+ACI-Towards an Open, Disaggregated Network Operating System+ACI-,AT+ACY-amp+ADs-T,,https://about.att.com/ecms/dam/innovationblogd

It’s a conflict between UTF-7 and UTF-8 coding.
Always select UTF-8 when saving and importing. This should solve it.

Here’s a more explicit answer I found on a forum:

“The +AC0, etc. is showing because you opened the CSV document as UTF-7. Instead, the CSV file should be opened as UTF-8. Then it will save with quotation marks without you having to set filter parameters each time. If you have inadvertently saved as UTF-7, just reopen in Calc and set the filter to save with quotes. Next time you open the document, select UTF-8 and you’ll be set from then on.”
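To see what the odd sequences stand for, the following Python sketch decodes a few of them with Python's built-in `utf-7` codec. The sequences and the row fragment are copied from the sample rows earlier in this thread. The helper name `decode_utf7` is only illustrative.

```python
# Escape sequences copied from the sample rows earlier in this thread.
SEQUENCES = ["+ACI-", "+AC0-", "+AF8-", "+AH4-", "+ACY-", "+ADs-", "+ACIAIg-"]

def decode_utf7(text):
    """Decode a str holding UTF-7 data using Python's built-in 'utf-7' codec."""
    return text.encode("ascii").decode("utf-7")

# Map each escape sequence to the character(s) it represents.
decoded = {seq: decode_utf7(seq) for seq in SEQUENCES}

# First two fields of one sample row, as they appear in the UTF-7 file.
fragment = "A Practical Introduction to Real+AC0-Time Systems,Harder"
plain = decode_utf7(fragment)

for seq, chars in decoded.items():
    print(seq, "->", chars)
print(plain)
```

So `+ACI-` is simply a double quote, `+AF8-` an underscore and `+AC0-` a hyphen. The quoting is still there, only written in UTF-7.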

Hope this helps.

Thanks. You have explained what happened and its cure. The question still remains as to how to interpret it. The end-around processing indicated does not seem to work.

I can set the input to UTF-7, and hence have end-around capability, but I haven’t found how to change UTF-7 to UTF-8 on save, or how to set UTF-8 for all CSV saves.

My get-around-it is to load a csv file, store it as an ods file, delete the csv file, load the ods file, and store it as a csv file. In LO 7.x on Win7-64 when an existing file with the same name is detected it refuses to overwrite it (although the pop-up says it will). During the initial load, UTF-7 is specified. During the save, UTF-8 Unicode is selected. This is hardly clever.

LO 6.x was much (much, much) better.

The thing that puzzles me is that all the characters are part of the graphic characters in ASCII, i.e. 32-126 decimal; none are in the range 128-255 decimal. I would think that they are UTF-7 compliant?
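A possible explanation, based on my reading of RFC 2152 rather than anything documented by LibreOffice: being printable ASCII is not enough to be written literally in UTF-7. The RFC distinguishes a set of "direct" characters, a set of "optional direct" characters that an encoder may choose to escape, and the rest, which must be escaped. The sketch below classifies characters that way. The set contents are as I recall them from the RFC and should be checked against it, and `utf7_class` is a hypothetical helper.

```python
import string

# Character sets as I read them in RFC 2152 (an assumption; verify against the RFC).
SET_D = set(string.ascii_letters + string.digits + "'(),-./:?")
SET_O = set("!\"#$%&*;<=>@[]^_`{|}")
WHITESPACE = set(" \t\r\n")

def utf7_class(ch):
    """Return how a single character may be represented in UTF-7."""
    if ch in SET_D or ch in WHITESPACE:
        return "direct"
    if ch in SET_O:
        return "optional direct"
    # Everything else, including '~', '\\', '+' and all non-ASCII characters.
    return "encoded"

for ch in ['A', '"', '_', '&', ';', '~']:
    print(repr(ch), utf7_class(ch))
```

The sample rows also show a hyphen written as `+AC0-` even though it is in the direct set. As far as I can tell the RFC allows an encoder to escape any character, so that is still valid UTF-7.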

I am not certain where your use of UTF-7 comes from. Has someone else created the files mistakenly using UTF-7 as if it were a working Unicode standard? There is no mention of UTF-7 in the 1417 pages of my Unicode Consortium Standard Manual. Indeed, as a Wikipedia article says:

“UTF-7 has never been an official standard of the Unicode Consortium. It is known to have security issues, which is why software has been changed to disable its use.”

If, as suggested earlier, you or someone are creating files, change to UTF-8.

@ml9104 indicated that LO has a UTF-7 option and indicated how to use it to solve my issue. I have used it (see above) and I have solved my issue, and LO does have a UTF-7 option. This is irrespective of the 1417 pages of your Unicode Consortium Standard Manual not having this option. The software does though the manual does not.

I hope this clarifies the issue. In apologia, I did not do it. It wasn’t me. I don’t set the rules. I should be held harmless (you can blame me but you can’t harm me).
art

The interesting question remains. Why are you using UTF-7? Is it just something to try or is there some unusual technical / country issue?

I think there is some confusion on the use of UTF-7. It is a description of an obsolete type of data, not a standard. The Unicode Standard defines UTF-8 (one to four 8-bit units per character), UTF-16 (one or two 16-bit units) and UTF-32 (one 32-bit unit). I believe UTF-7 was a way of carrying Unicode characters in e-mails using only 7-bit US-ASCII. For normal CSV, the LibreOffice Calc manual has good coverage, in my opinion, but it makes no mention of UTF-7, as is the case with many little-used obsolete data types.

@petermau Nope, no confusion. I saved the file, LO interpreted it as UTF-7, a non-existent standard, and did the rest. In the current LO Calc CSV input there is an option for UTF-7. In the current LO Calc CSV output there is an option for UTF-7. I promise never, never, never to use this option for a non-existent UTF-7, and I will thoroughly admonish and chastise LO if it does use it without my explicit permission, and if it persists, I will use stronger measures.

But just suppose my public domain (wonderful, did I say?) software is given a UTF-7 formatted CSV document from someone who is not as careful as I will be in the future. What can my software do to interpret this recalcitrant’s output? If I can’t interpret the input (for reasons that have been discussed here), how can I punish this horrible person?

@ostbits I hope you do not have too many files encoded in UTF-7, but they are probably encoded to give you a little challenge. After all you will be having 8-bit characters encoded to survive on 7-bit hardware which is being decoded on 8-bit hardware as there is no 7-bit hardware available to run it on. If you get a file like that, I suggest you have a quiet drink and then ask the sender to have another go. Thank you for your interesting question.

While RFC 4180 is not a binding standard, it’s the best available recommendation to follow.

Your +AC and +AF in data are just a result of saving in UTF-7 text encoding instead of UTF-8 (or another suitable one). It has nothing to do with formatting or anything like that.

My question is, suppose I want to build software to input these UTF-7 files, what are the rules? RFC 4180 seems outdated. LO has UTF-7, UTF-8, UTF-16 and various regional languages. RFC 4180 does not have UTF-16 or regional languages. The ‘standard’ is 15 years old. So I agree that it is the best recommendation, but it seems to be losing its relevance, and it sheds no light on how to build a CSV processor able to handle these formatted text excursions.

Please note that

  1. You are confusing text encodings with languages/locales.
  2. Text encoding is completely irrelevant to RFC 4180.
  3. CSV, or RFC 4180 for that matter, does not have any notion of languages or locales either.
  4. Which text encoding is used, and which locale (and thus which separators or date formats) is used, are matters of convention and agreement between the generating and consuming applications.
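The separation of layers in the list above can be sketched in Python: the bytes are decoded to text first, and only then is the text parsed as CSV. The sample rows and the helper names `to_csv_text` and `parse_csv_bytes` are made up for illustration.

```python
import csv
import io

# Hypothetical rows; one field contains the separator and so needs quoting.
ROWS = [["title", "author"],
        ["Real-Time Systems", "Harder"],
        ["Open, Disaggregated", "AT&T"]]

def to_csv_text(rows):
    """Serialize rows to CSV text (no encoding involved yet)."""
    buf = io.StringIO(newline="")
    csv.writer(buf).writerows(rows)
    return buf.getvalue()

def parse_csv_bytes(raw, encoding):
    """Layer 1: decode bytes to text. Layer 2: parse the text as CSV."""
    text = raw.decode(encoding)
    return list(csv.reader(io.StringIO(text, newline="")))

csv_text = to_csv_text(ROWS)

# The same CSV text stored in three encodings parses to the same rows.
results = {enc: parse_csv_bytes(csv_text.encode(enc), enc)
           for enc in ("utf-8", "utf-16", "utf-7")}
```

The CSV rules never change between the three runs. Only the decoding step does, and it has to be told (or has to guess) which encoding was used.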

To read UTF-7 encoded text, see the Wikipedia article on UTF-7 and RFC 2152, but the easiest way would probably be to read the data back into LibreOffice Calc and save again in UTF-8 text encoding.
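If the conversion has to happen in your own software rather than in Calc, a minimal sketch using Python's built-in `utf-7` codec could look like the following. The sample row is copied from earlier in this thread (its URL is truncated there as well), and `utf7_csv_to_utf8` is a hypothetical helper name. Other languages would need their own UTF-7 decoder written from RFC 2152.

```python
import csv
import io

def utf7_csv_to_utf8(raw):
    """Re-encode UTF-7 CSV bytes as UTF-8 bytes; the CSV structure is untouched."""
    return raw.decode("utf-7").encode("utf-8")

# One row copied from the sample data in this thread.
SAMPLE = (b"+ACI-Towards an Open, Disaggregated Network Operating System+ACI-,"
          b"AT+ACY-amp+ADs-T,,https://about.att.com/ecms/dam/innovationblogd\r\n")

utf8_bytes = utf7_csv_to_utf8(SAMPLE)

# After re-encoding, ordinary CSV parsing recovers the four fields.
fields = next(csv.reader(io.StringIO(utf8_bytes.decode("utf-8"), newline="")))
print(fields)
```

Once decoded, the title is an ordinary quoted field containing a comma, so the row splits into title, author, target and URL as intended.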

@erAck Thanks. The references you provided are the ones I am seeking. Again, thanks for your time and your efforts.