Data corrupted after CSV import

I encounter the same problem as the closed issue here:

https://ask.libreoffice.org/t/calc-fails-to-import-csv-files-properly/19516/3

So step one: CSV file like that:
01
(unfortunually one can not upload csv here… whatever the reason is)

step two: open the file with Libreoffice calc - the CSV import dialog appears:

as you can see, the import dialog shows the values from the file imported correctly!!!

and last, the calc window after the data was imported
03

note: now the Data of the first row is wrong.

No, it doesn’t show you anything imported. It only shows you the characters from the CSV file, and into which cell (column) will those characters go, with the set of settings (Separated by, String delimiter etc.) that you chose. It also allows you to see if the characters themselves look correctly - using the encoding defined at the top of the dialog. The import process will happen only after you define all settings, and press OK, and will include recognition of numbers and other data.

No, it’s not wrong - it only follows the locale conventions that you chose in the dialog: you selected “Standard - Deutsch (Deutschland)”, and that locale specifies that dots are thousand separators, and commas are decimal separators (e.g., “12.345.678,90”). Hence all numbers that have dots separating three digits are recognized as whole numbers, and where the dots were in wrong positions, those strings couldn’t be interpreted as numbers, and were imported as strings.

1 Like

well I understand the logic of your answer, but when I select comma as an separator it could be suggested by the converter that applying the german rules for numbers is not wanted at all (at least comma can’t be the separator AND part of the data at the same time, right?). This is what I expected to happen. If I select comma as CSV separator dot must be the delimiter between fractional and decimal. Also on my german configured system a lot of programs simply export floats in ‘english’ notation with dots instead of commas. So I expect that as a natural fallback.

Wrong. If the data contains the field separator, the data must be enclosed in double quotes.
See rfc4180.

1 Like

nice, never was aware of that one.
incomplete but nice.
Of course - just tested it with octave (OSS Matlab clone) under linux - is not following that spec (LF instead of CRLF).
Also the spec seem missing e.g. 16bit widechars as a lot of old windows SW using Microsoft libs is producing just that (we have a network tester in the lab which exports such files.)

So yes, for myself I can utilize that “standard” in the future, but its a pointless reference, as there is to much stuff around which is more standard and does not follow that.

That is exactly why we need the settings. CSV existed long ago that standardization attempt, and the variations need to be handled.

No. That is a good reference for new software implementing CSV, and also a good reference to show format capabilities.

RFC 4180 does not restrict to any text encoding, it’s perfectly fine to use UCS2 or rather UTF-16 if both sides agree (or if in MIME context a charset parameter is specified). Real world applications best use UTF-8 for exchange. Importing to LibreOffice offers all supported text encodings, of which the common Unicode encodings are autodetected if possible.

note: now the Data of the first row is wrong.

Data on both rows have wrong content, as @mikekaganski already explained.

  • On your first row the dot is interpreted as a thousands separator according to German standards you are using, so the value is wrong.
  • On second row there are more than 3 digits after the dot, so the content does not conform to the number format standard. This yields text content.

To resolve, either …

  • select English (USA) for the entire import (Field “Sprache” near the top of the import dialog)
    … or …
  • select English import individually for each column containing numbers (right click the column, or select in the “Spaltentyp” field for each column in turn).