LibreOffice "mangles" TextEdit RTFs and .doc files

I am having a really weird problem with LibreOffice: when it opens TextEdit RTFs, the paragraph formatting is not maintained.

For example, the section

And it stretched out on all sides, an endless pane of black glass reaching right to the dim horizon.

‘You don’t land a Hercules HC-130J on water, you crash it!’

His Colonel’s sole piece of advice on making an emergency water landing bounced around persistently in King’s head as he fought against the controls, struggling to correct how the damn plane kept trying to pull to one side.

is rendered into the following wall of text:

And it stretched out on all sides, an endless pane of black glass reaching right to the dim horizon. ‘You don’t land a Hercules HC-130J on water, you crash it!’ His Colonel’s sole piece of advice on making an emergency water landing bounced around persistently in King’s head as he fought against the controls, struggling to correct how the damn plane kept trying to pull to one side.

Pages keeps the paragraph formatting (the Mac clipboard doesn’t), but (and this is where it gets really weird) if you save the RTF as a .doc from Pages and open it in LibreOffice, the paragraph formatting disappears and you get walls of text.

Is this a quirk of LibreOffice or am I doing something wrong?

TextEdit RTF is not a “normal” RTF as defined by the format author (Microsoft) [1]. It contains Apple extensions, which are not supported (yet) by LibreOffice; see e.g. tdf#53412 for another problem with this format.

Any specific problem with this format (as with others) should be reported to the bug tracker, with sample documents showing the problem, so that it can be reproduced and the missing functionality gradually implemented.

[1] : Rich Text Format - Wikipedia

The linked article’s only comments regarding Apple or TextEdit are as follows:

“RTF supports inclusion of JPEG, Portable Network Graphics (PNG), Enhanced Metafile (EMF), Windows Metafile (WMF), Apple PICT, Windows Device-dependent bitmap, Windows Device Independent bitmap and OS/2 Metafile picture types in hexadecimal (the default) or binary format in a RTF file. Not all of these picture types are supported in all RTF readers. When a RTF document is opened in software that does not support the picture type of an inserted picture, such picture is not displayed at all.”

“The default text editor for Mac OS X, TextEdit, can also view, edit and save RTF files as well as RTFD files. TextEdit currently (as of July 2009) has limited ability to edit RTF document margins. Much older Mac word processing application programs such as MacWrite and WriteNow were able to view, edit, and save RTF files as well.”

There is nothing in the Wikipedia article about TextEdit using an “unnormal” RTF format. If anything, the article points to Microsoft not providing full information on the format, resulting in “RTF specifications lack some of the semantic definitions necessary to read, write, and modify documents.” (Microsoft RTF Specification Nightmare | Diary Products - Hannes Schmidt). Of course that was 9 years ago, but the article gives no indication that Microsoft has changed tack. To be fair, much of the information (as shown by the comments above) is 10 years old, meaning the article is in serious need of an update.

To the TL;DR crowd: don’t use Wikipedia articles with largely 10-year-old information as your reference for computer file formats.

The Wikipedia article is not about the “Apple implementation” - it’s about “RTF being created and standardized (even if internally) by MS”. The article also contains links (no longer working) to the standard descriptions on the MS site, as well as links to the text of the description, which is still accessible via the web archive.

The format itself stopped developing after 2007, so the article not being updated simply reflects the state of the format (stagnating).

On the other hand, the TDF bug I linked includes a link to the Apple knowledge base describing Apple extensions to the format (which I provided back in 2012; it is still accessible, and MS official documentation was still available from their site at that time, btw - I still use a locally saved copy of it when fixing RTF issues in LO). So please try not to be sarcastic when there’s a possibility that it’s you who misunderstood or investigated poorly.

I think you missed the point I was raising - the link originally provided as a reference for “TextEdit RTF is not a “normal” RTF as defined by the format author (Microsoft)” didn’t support the statement being made. Heck, the linked article didn’t say anything about either Apple or TextEdit doing their own thing with regard to RTF. There is a free section of Apple’s developer site, so I hopped over to the Bug Reporting section and looked up RTF - “No items match your query”. So I looked up TextEdit and again “No items match your query”. Now, to be fair, I don’t know how far back the archive goes (there are no dates on the material there), so I hopped over to Apple’s discussions section and found several specific bugs (including one report with a reply made Mar 20, 2006 that felt the blame was MS’).

In short: know the difference between sarcasm and criticism, and make sure a linked reference actually supports, in its entirety, the statement it is cited for.

I ran out of room, so here is the statement regarding RTF referenced above:

“Yes that is correct. RTF is far from standard, and Mellel doesn’t support it well - nor do they want to. In fact they are moving to a XML type native document. Having said that, I don’t know if the issue resides with Mellel’s RTF coding or with Pages import. I suspect that it is Mellel’s issue.”

Another comment states:

“RTF is a publicly documented file format offered by Microsoft. It includes footers and forms (tables?). If Pages is removing footers then it doesn’t seem to support the ‘current’ RTF 1.6 standard. (link to MS’ standard provided is mangled by this comment box and so removed)” - “importing RTF into Pages 2” thread.

Furthermore, there is “Microsoft RTF Specification Nightmare” (Diary Products - Hannes Schmidt) from 2004, which points out various issues all stemming from MS’ handling of the format… on Windows.

I think you are missing my point. A link is meant to support whatever statement I cite it for. The statement that link was intended to support was “RTF is a MS format”, and the article covers that at length.

Also, you are on a site where your knowledge level is unknown in advance. You ask a question, and I provide an answer sufficient for anyone to draw conclusions and file a proper bug report so that e.g. I could fix it. If you need to nitpick because, for some specific purpose, the answer was not enough, you are missing the point of this site.

Sigh, the main part of the cited sentence (the independent clause, for all you English nerds) is “TextEdit RTF is not a “normal” RTF”, with the “as defined by the format author (Microsoft)” part being a dependent clause. Ideally (and per what I was taught in college), citations should support the independent clause of a sentence, which, as I pointed out, the citation you provided doesn’t.

Never mind that, as the “Microsoft RTF Specification Nightmare” article showed, MS didn’t really have a clear set of guidelines regarding RTF: “The RTF spec lacks a lot of the semantic definitions necessary to read, write and modify documents.” Heck, the writer calls the various tweaks in 1.6 “hacks”, and one of the commenters points out “For some characters, RTF uses \u800, where the number represents the unicode value. But for others it uses \u-8000”, i.e. the standard isn’t a real standard. Hence the problems.
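
A note on the \u800 versus \u-8000 example quoted above: the RTF specification defines the \uN parameter as a signed 16-bit integer, so any UTF-16 code unit above 32767 is written as a negative number. The difference is therefore a documented (if awkward) rule. A minimal Python sketch of the conversion follows; the function name `rtf_u_to_code_unit` is made up for illustration and does not come from any RTF library.

```python
def rtf_u_to_code_unit(n):
    """Map the signed 16-bit parameter of an RTF \\uN control word
    to the unsigned UTF-16 code unit it stands for.

    Hypothetical helper for illustration only.
    """
    if not -32768 <= n <= 32767:
        raise ValueError("\\uN parameter must fit in a signed 16-bit integer")
    # Negative values wrap around: add 65536 to recover the code unit.
    return n + 65536 if n < 0 else n


# \u800 is used as-is; \u-8000 wraps to 65536 - 8000 = 57536 (0xE0C0).
print(hex(rtf_u_to_code_unit(800)))    # 0x320
print(hex(rtf_u_to_code_unit(-8000)))  # 0xe0c0
```

Under this rule, both spellings in the quote are the same encoding scheme applied to values on either side of 32767.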

  1. You are coming here asking a question: “Is this a quirk of LibreOffice or am I doing something wrong?”. The question implies an answer that would help you to proceed. You are getting an answer that covers that.
  2. Writing that answer, I had to check whether I could provide some background from the authors of the standard (MS). I spent some time checking that and deciding which links to provide instead, given that MS has removed the links from its own site, so that the questioner (of unknown background) could see the picture; I decided that Wikipedia is enough for a basic understanding.
  3. Now you declare that my answer does not meet the requirements of an article published in Nature/Science. You also try to push whatever you learned in college as something universal (for “you English nerds”??? really? I’m Russian btw!), as a reason for criticizing an answer given to you out of good will. Good luck asking next time!

You are ignoring the last part of the original post:

“Pages keeps the paragraph formatting (the Mac clipboard doesn’t), but (and this is where it gets really weird) if you save the RTF as a .doc from Pages and open it in LibreOffice, the paragraph formatting disappears and you get walls of text.” To make this clear, I have added “and .doc files” to the title.

So the formatting issue affects not just RTF but the “.doc (Word Compatible 1997-2004)” format as well. Saving as .docx in Pages bypasses the problem, and if you then save that as RTF, LibreOffice has no issues with the formatting, yet in TextEdit it looks identical to the original RTF file. IMHO all that puts the problem in how LibreOffice is importing RTF and .doc files rather than in anything to do with the formats themselves.

The only thing I did notice is that LibreOffice’s RTF is much larger (5.1 MB rather than the original 1.8 MB), but that doesn’t explain .doc having the same issue. Nor does it explain why what are supposedly two different ways of formatting RTF (one 3.3 MB larger than the other) look identical in TextEdit.
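
File size and on-screen appearance say little about how paragraph breaks are encoded inside each file. One way to get concrete evidence is to count the paragraph-break markers in each RTF. The Python sketch below assumes breaks are written in one of two forms the RTF specification allows: the `\par` control word, or a backslash immediately followed by a line ending (which a reader is supposed to treat like `\par`). Which form TextEdit, Pages, or LibreOffice actually wrote in the files discussed here has not been checked, and the function name `count_paragraph_breaks` is hypothetical.

```python
import re

# Matches, in order: an escaped literal (\\, \{ or \}), a backslash followed
# by a line ending, or the \par control word (but not \pard, \pards, ...).
_TOKEN = re.compile(r"\\([\\{}])|\\(\r\n|\r|\n)|\\par(?![a-zA-Z])")


def count_paragraph_breaks(rtf_text):
    """Count the two paragraph-break forms in RTF source text.

    Rough diagnostic only: it does not skip embedded binary (\\bin) data.
    """
    counts = {"par": 0, "backslash_newline": 0}
    for match in _TOKEN.finditer(rtf_text):
        if match.group(1) is not None:
            continue  # escaped literal character, not a break
        if match.group(2) is not None:
            counts["backslash_newline"] += 1
        else:
            counts["par"] += 1
    return counts


sample = "{\\rtf1\\ansi\\pard First.\\\nSecond.\\par\nA literal \\\\par is text.}"
print(count_paragraph_breaks(sample))
```

Running this on the original file and on the re-saved copy (read as text, for example with `open(path, encoding="latin-1").read()`) would show whether the two files encode paragraph breaks differently, which is the kind of detail a bug report can include alongside the sample documents.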

You are ignoring the last part of the original answer:

“Any specific problem with this format (as with others) should be reported to the bug tracker, with sample documents showing the problem, so that it can be reproduced and the missing functionality gradually implemented.”

The only thing I did notice is you trying to “criticize” something and keeping up a useless “discussion” about an unspecified problem (because no amount of walls of text from you can substitute for a proper bug report with sample documents, without which the problem cannot be seen by others).