How is LibreOffice supposed to render a .doc file with HTML content?

asked 2020-08-31 19:21:16 +0100

janvlug

updated 2020-09-25 14:09:24 +0100

Alex Kemp

Sometimes I export a selection of issues from Jira to Word (as the export is called in Jira). This results in an HTML file with the .doc extension.

The strange thing is that these files are sometimes opened and rendered as HTML in LibreOffice, but at other times I see the HTML mark-up (syntax) in LibreOffice as text.

Changing the extension from .doc to .html makes LibreOffice render the HTML.

I'm puzzled by LibreOffice's seemingly random decision on how to open such a .doc file.

It seems to me that this is unwanted behaviour, because it is unpredictable how a file will be rendered.

How does LibreOffice on Fedora Linux decide how to render a file that has a .doc extension but contains HTML mark-up?


Comments

You need to provide two such fake DOC files that open differently in LibreOffice, so that their differences can be analyzed.

Mike Kaganski (2020-09-01 07:42:31 +0100)

1 Answer


answered 2020-08-31 19:50:22 +0100

ajlittoz

When you double-click a file icon, the desktop reads its list mapping extensions to applications. If a mapping is found, that application is launched. If the look-up fails, the utility "file" is run to guess the file type from "magic numbers". If that succeeds, the first matching application is launched. If everything fails, you get a dialog asking you to choose an application manually.
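The look-up order described above can be sketched in a few lines of Python. This is only an illustration, not what any desktop environment actually runs: the function name guess_like_desktop is made up, Python's own mimetypes table stands in for the desktop's extension list, and the second step simply calls the "file" utility if it is installed.

```python
import mimetypes
import shutil
import subprocess


def guess_like_desktop(path):
    """Return (mime_type, source), trying the extension first, then `file`."""
    # Step 1: extension look-up. The stdlib table is a stand-in (assumption)
    # for the desktop's own extension-to-application list.
    mime, _encoding = mimetypes.guess_type(path)
    if mime is not None:
        return mime, "extension"

    # Step 2: fall back to the `file` utility, which inspects magic numbers.
    if shutil.which("file"):
        result = subprocess.run(
            ["file", "--brief", "--mime-type", path],
            capture_output=True, text=True, check=False,
        )
        guessed = result.stdout.strip()
        if result.returncode == 0 and guessed:
            return guessed, "magic"

    # Step 3: nothing matched; a desktop would now ask you to pick an app.
    return None, "ask-user"
```

With this order, a file named something.doc never reaches the second step: the extension alone decides, whatever the content is.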

If you try to open a file directly from Writer, the extension plays no role. The file is tentatively opened and its content (or rather a small part at the beginning) is analysed to guess which component should handle the file (Writer, Calc, Impress, …). If the file does not show ODF structure (i.e. does not look like a compressed or uncompressed XML file with the proper DTDs) or another structure readable through the import filters, it is treated as text or binary. "Text" files will be shown "as is" even if you, as a human, immediately see that they are HTML, a script or whatever.
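To give an idea of what "analysing the beginning of the file" can mean, here is a minimal Python sketch. It is not LibreOffice's detection code, and the categories and the function name sniff are my own assumptions. It relies only on two well-known signatures: binary Word files live in an OLE2 container that starts with the bytes D0 CF 11 E0 A1 B1 1A E1, and ODF packages are ZIP archives that start with "PK".

```python
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # container of binary .doc files
ZIP_MAGIC = b"PK\x03\x04"                       # ODF packages are ZIP archives


def sniff(head: bytes) -> str:
    """Classify a file from its first bytes only (illustrative categories)."""
    if head.startswith(OLE2_MAGIC):
        return "ole2"
    if head.startswith(ZIP_MAGIC):
        return "zip"
    # Skip a UTF-8 byte-order mark and leading whitespace before looking
    # for mark-up at the very start of the file.
    text = head.lstrip(b"\xef\xbb\xbf \t\r\n").lower()
    if text.startswith((b"<!doctype html", b"<html")):
        return "html"
    if text.startswith(b"<?xml"):
        return "xml"
    # A NUL byte is a common hint that the content is not plain text.
    if b"\x00" in head:
        return "binary"
    return "text"


def sniff_file(path, size=512):
    """Read only the first `size` bytes of `path` and classify them."""
    with open(path, "rb") as handle:
        return sniff(handle.read(size))
```

A detector of this kind reports "html" only when the mark-up sits right at the start, so a file that begins with anything else is classified as plain text. Whether that explains the files in the question can only be told by comparing two of them.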

I am no developer, so I can't explain the difference in processing between .doc and .html for a plain-text file containing HTML. IMO, only the .html extension makes sense here, because .doc is a binary format with a well-defined structure.
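If you want to automate the change of extension, a small script can check that the .doc file is not a real binary Word document before copying it under an .html name. This is a sketch under assumptions: the function name is hypothetical, and the only test it performs is for the OLE2 signature (D0 CF 11 E0 A1 B1 1A E1) that binary .doc files start with. It does not verify that the content is valid HTML.

```python
import shutil
from pathlib import Path

OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # start of a binary .doc file


def copy_as_html_if_not_binary_doc(path):
    """Copy `path` next to itself with an .html suffix unless it is OLE2.

    Returns the new Path, or None when the file looks like a genuine
    binary Word document and is therefore left alone.
    """
    src = Path(path)
    with src.open("rb") as handle:
        head = handle.read(len(OLE2_MAGIC))
    if head == OLE2_MAGIC:
        return None
    dst = src.with_suffix(".html")
    shutil.copyfile(src, dst)  # the original .doc file is kept untouched
    return dst
```

The copy can then be opened in LibreOffice as an ordinary .html file.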

To show the community your question has been answered, click the ✓ next to the correct answer, and "upvote" by clicking on the ^ arrow of any helpful answers. These are the mechanisms for communicating the quality of the Q&A on this site. Thanks!

In case you need clarification, edit your question (not an answer which is reserved for solutions) or comment the relevant answer.
