Does LO restrict what it does to non-ODT/DOCX files until the user is warned?

As 160362 – Syntax Highlighting Support explains:

Plain text files in Writer do not exist. Imported text files are converted into the internal document model of Writer just like everything else.

Does that mean that when someone opens a plain text (.txt) file in LO Writer, makes no changes and saves (again, as a .txt) the result might differ? After all, it’s evidently converting it, and I can’t imagine that the conversion is exactly 1:1 (especially when prefixed whitespace for indentation is involved).

That seems problematic to me. Surely there should be a warning dialogue or banner? I believe that Word handles this correctly, from what I recall.

This might seem strange, but I’ve seen people use Word to open files they created in Notepad. It’s a surprisingly common use case.

No, not if you make no changes. The only thing that should change is the date of the file. Try it with a copy of a .txt file.
EDIT: See @mikekaganski’s comment below.

This is incorrect. It used to change line endings (LF/CRLF), and add BOMs to UTF-8 files unconditionally (tdf#142669). I don’t remember if we fixed an unexpected change of encoding (aha, fixed - tdf#120574). At some point, it could split long lines - tdf#70423 (and still can, but now they would have to be really long). We work to reduce such problems, but it’s too optimistic to claim that there would be no changes whatsoever.
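To check this on your own files, a byte-level comparison of the copy before and after saving helps. Below is a minimal Python sketch, not anything LibreOffice ships; the function names `describe_text_bytes` and `roundtrip_changes` are made up for illustration. It only looks at the UTF-8 BOM and at line-ending counts, which are the kinds of changes mentioned above.

```python
import codecs


def describe_text_bytes(data: bytes) -> dict:
    """Summarise byte-level details that an open-and-save cycle may alter."""
    crlf = data.count(b"\r\n")
    return {
        "utf8_bom": data.startswith(codecs.BOM_UTF8),
        "crlf": crlf,
        "lf": data.count(b"\n") - crlf,  # bare LF only
        "cr": data.count(b"\r") - crlf,  # bare CR only
        "size": len(data),
    }


def roundtrip_changes(before: bytes, after: bytes) -> dict:
    """Return only the properties that differ, as (before, after) pairs."""
    a = describe_text_bytes(before)
    b = describe_text_bytes(after)
    return {key: (a[key], b[key]) for key in a if a[key] != b[key]}


# Hypothetical example: same text, but a BOM was added and LF became CRLF.
before = b"one\ntwo\n"
after = codecs.BOM_UTF8 + b"one\r\ntwo\r\n"
print(roundtrip_changes(before, after))
```

For real files, read both copies with `pathlib.Path(...).read_bytes()` and pass the results to `roundtrip_changes`. An empty result only means that none of these particular properties changed, not that the files are identical.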

Yes it might differ. And yes it issues a warning about that:

image

Of course, if a user unchecks the “Ask when not saving in ODF” checkbox, they surely know what they are doing.

Setting a non-ODF format as the default would also suppress this dialog for the chosen format. That’s problematic, because setting a format as default doesn’t magically make it 100% supported; but again, that setting indicates that the user really is prepared for the consequences.

By the way, your title “Does LO restrict what it does to non-ODT/DOCX files until the user is warned?” indicates that you assume DOCX support ideal. It is not; DOCX is just another external file format.

@mikekaganski, apologies for forgetting about that dialogue window. Just testing it would have reminded me. That’s embarrassing.

By the way, your title “Does LO restrict what it does to non-ODT/DOCX files until the user is warned?” indicates that you assume DOCX support ideal. It is not; DOCX is just another external file format.

It doesn’t. I mean the internal XML-based representation, which to my knowledge, is ODT-conformant. However, had I said that (which is slightly too verbose for a title) I expected that someone would mention how DOCX and ODT files are zip files containing XML alongside myriad sidecar files.

Syntax is not semantic.

@Villeroy, are you responding to me or @mikekaganski?

image

@Villeroy, then I don’t know what you mean. @LeroyG, I’m guessing that’s a different account of yours (else how are you editing the message)?

No, Villeroy and LeroyG are different users.
Users with Leader status can make some edits to others’ posts.

@Villeroy and @LeroyG are two different people. Regarding “Syntax is not semantic”: DOCX and ODF both use XML to describe their contents; this is syntax. But the tags and element contents are totally different; this is where semantics comes into play.

You need to understand the specifications to be able to interpret the code and do something “interesting” with it.

@rokejulianlockhart I believe that the controversy is caused by your

when we discussed whether you consider OOXML (DOCX) support in LibreOffice ideal. Your phrase is also unclear: if you meant that the internal OOXML representation is ODT-conformant, that is completely incorrect in many ways. The two formats differ in namespaces, elements, and ways of representing similar things. More fundamentally, neither format can represent 100% of what is possible in the other, because the respective software is built on different founding principles. Export and import therefore have to approximate, so there is no way to keep data completely intact on a round trip.

But that is largely orthogonal to the specific topic you asked about.

@ajlittoz, I’m aware of that (although I’m thankful anyway): specifically, XML is merely a data interchange format. Although DTDs exist, they do not inherently define how a document should be interpreted. However, I don’t see the relevance of this to what I said above.

Ah, that is indeed what I meant. It was what I had been told more than once, which is surprising if it’s wrong. Is it a common misconception?

Indeed, hence my question. I expected LO to detect when a file that the user hasn’t modified would nevertheless be changed on save, like the example I provided: a UTF-8 .txt with a BOM is opened in LO and saved without the user modifying anything, and the BOM is removed. I would have expected the user to be notified of this. Is it fundamentally infeasible for LO to detect when saving in a certain format will modify data that was previously present?

It is fundamentally infeasible. LibreOffice does not “work with” the external file formats. It imports from such files whatever it can understand (and it often doesn’t even care to keep minute details like whitespace in XML, its encoding and line endings); then it works with its own document model. When you save, it creates a new document, exports from its memory representation using different code, and finally renames the result to the old name.

Basically, it’s infeasible to make sure nothing changes. You may generally assume that a file in any external format will be changed by open-and-save (except in the simplest cases, like text-only files).
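To make the import, model, export sequence concrete, here is a toy Python sketch. It is an illustration only and has nothing to do with LibreOffice’s actual filters; `import_text`, `export_text`, and their default settings (CRLF, no BOM) are assumptions chosen for the example. The point is that the exporter only sees the in-memory model, so it cannot reproduce details the importer never stored.

```python
import codecs


def import_text(data: bytes) -> list[str]:
    """Toy import filter: keep only the paragraphs, drop byte-level details."""
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]  # the BOM is not stored in the model
    text = data.decode("utf-8")
    # The line-ending style is normalised away and therefore forgotten.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    paragraphs = text.split("\n")
    if paragraphs and paragraphs[-1] == "":
        paragraphs.pop()  # whether the file ended with a newline is forgotten too
    return paragraphs


def export_text(paragraphs: list[str], newline: str = "\r\n",
                with_bom: bool = False) -> bytes:
    """Toy export filter: write a fresh file from the model using its own settings."""
    body = "".join(p + newline for p in paragraphs).encode("utf-8")
    return (codecs.BOM_UTF8 if with_bom else b"") + body


original = codecs.BOM_UTF8 + b"first\nsecond"
model = import_text(original)
resaved = export_text(model)

print(model)                # the paragraphs survive the round trip
print(resaved == original)  # the exact bytes do not
```

Detecting that a save would change the file would require the model to remember every such detail of the original, which is exactly what the import step does not do.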

@mikekaganski, thanks for that. I suppose it fundamentally differs from text editors in that way, then. Do you know whether MS 365’s Word is the same?

Try opening an empty ODT in MS Word, and save back, to compare :slight_smile:

@mikekaganski, I don’t have a subscription… ;-;

I only have Word 2016. See how it performs at least.

emptyFrpmLO_24.8.odt (9.0 KB)

resavedFromWord2016.odt (5.0 KB)
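Since ODT files are zip packages, one rough way to compare two such files is to list their entries side by side. The Python sketch below uses only the standard `zipfile` module; the helper names `package_listing` and `compare_packages` are invented for this example, and it compares entry names and uncompressed sizes only, not the XML content.

```python
import zipfile


def package_listing(source) -> dict[str, int]:
    """Map each entry name in a zip package to its uncompressed size.

    `source` may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(source) as package:
        return {info.filename: info.file_size for info in package.infolist()}


def compare_packages(first, second) -> dict[str, list[str]]:
    """Report entries unique to either package and entries whose size differs."""
    a = package_listing(first)
    b = package_listing(second)
    return {
        "only_in_first": sorted(a.keys() - b.keys()),
        "only_in_second": sorted(b.keys() - a.keys()),
        "size_changed": sorted(n for n in a.keys() & b.keys() if a[n] != b[n]),
    }
```

Calling `compare_packages` with the paths of the two attached files would show which package entries were dropped, added, or resized by the re-save.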
