Consider the use of CSV.
For any file format, the most fundamental use is: open it (allow user to see the content in a form that is comprehendible / adequate); allow to edit it (if applicable); and allow to save it.
It is a normal practice, that change of data must be an explicit action (compare to the “cell formatting never changes cell data” principle).
Now look what a CSV user has.
- One gets a CSV. They don’t have a slightest idea what “CSV” is (they can even have such a file with a wrong extension like XLS). They google how to handle it; they learn that LibreOffice can; they have it, or gladly install it.
- They try to open the file in LibreOffice. LibreOffice shows an import dialog. This may scare users already, but this step is justified, because the specific format really misses many aspects that would be required to convert it to a spreadsheet. So users are asked for these aspects.
- They somehow manage to provide the wanted details.
- Now they do the edits they need, and save.
Any sensible user would expect, that at this point, after pressing Save
button, they have a CSV having the same structure as the original, with the edits made. Why? Because they had provided everything that the program needed to make that happen. Did they? Yes they did, at step 3. And if the program can’t handle that information properly, this is the program’s fault, not the user’s.
There may be technical challenges here (actually, only coming from other uses of the same code, which may complicate things internally), but all the functionality is there. What prevents us from formatting all the cells with detected number format? Do you see how e.g. percent, currency, ISO dates get the format applied correctly? The filter can do more, there is nothing preventing it technically. Unless this breaks the important use. So the whole question is: why don’t we do that. The detection happens, but we simply discard that information - is that justified?
Compare to opening an ODS from e.g. me. I can use Russian-formatted dates. You would get the spreadsheet with Russian-formatted dates, because the file has that. But they would be the correct dates. And all the math with them will be OK.
Now you get a CSV. It has that 05/14/2023
in it. You told the import filter something, and the filter could figure that that string means 2023-05-14
date (serial date 45060
). In the process, it indeed knew that the first was 2-digit month, then was a slash, then a 2-digit day, then a slash, then a 4-digit year. It used that knowledge. What prevented the filter, besides writing the proper serial date to the cell, also to apply the detected MM/DD/YYYY
format? What would break for the user who opened it this way?
And without all that, when we save back, the user gets 05/14/23
in the file. Remember the “change of data must be an explicit action” principle? The change of data happened here silently. And it could be prevented, if the program was smarter. It didn’t require the user to know some unexpected things that required the user to do more to keep their data, beyond the things they already cared to do at step 3.