Basic clean-up tool for bloated LibreOffice HTML exports

I’m looking for a tool to clean up the bloated, messy HTML files that LO puts out. What Writer exports is certainly not an exact copy of my document, even though all that extra unreadable and untraceable cr(uft) in the file is presumably there to make the web page match the LO document exactly. I’m finding that a lot of lines overlap, the spacing is way off (an excessive amount of whitespace), and the result generally looks bad. Even Tidy barfs on the HTML.
Maybe for various use cases all that extra formatting is fine, but I just want simplified HTML. With all the cr(uft) being introduced here I’d almost be better off writing files in a plain-text editor.

The “almost” is superfluous here.
LibreOffice is not an HTML authoring / editing tool. Even though every bug report / enhancement request is welcome, you need to realize that simplistic, partial support for HTML4 (I have only recently removed some remnants of HTML 3.2 code), plus a tiny bit of CSS support, is absolutely inadequate for a good HTML result.

Please try exporting to XHTML instead. It is in the menu under File > Export…

Well, considering that I am trying to make an HTML export of a document (ODT is its primary format), I’m not USING it as an HTML editing tool; I merely want a basic, clean format I can use elsewhere. I never ASKED for it to be an authoring tool; I need to use the file in an application that IS an HTML editing tool.

Good idea, but that one is even less of a plain HTML file.

You can try the Writer2xhtml extension.
I tested it a few years ago and it crashed badly on a complicated ODT, so it is better to back up the LibreOffice user profile before using it.


But out of interest, can you upload an example of the ODT document you want to transform to HTML? If there are only a few paragraphs with some bold, italic, and color formatting, it could really be easier to rewrite it as HTML by hand. But if there are more difficult items like tables, frames, aligned images, etc., I believe you will not find any program that exports them 100% well, because the formatting of web pages is primarily based on CSS, not on simple HTML alone. And in CSS you need a “complex” concept for the layout of the whole website with all its elements, so exporting to simple HTML mostly means exporting without complex CSS. It is then usually very complicated to reconcile the simple HTML with the “complex” CSS.
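If the document really is mostly paragraphs with a bit of bold, italic, and links, a small script may be enough to get "HTML without the CSS". The following is only a rough sketch using Python's standard `html.parser`, not a tested replacement for Tidy. The tag whitelist `KEEP`, the set `DROP_WITH_CONTENT`, and the function name `simplify` are made up for this illustration; adjust them to whatever your HTML editor expects. It assumes the export is well enough formed for the parser to walk through, and it makes no attempt to preserve tables or frames faithfully.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist: tags to keep, and the attributes kept for each.
KEEP = {
    "p": (), "br": (), "h1": (), "h2": (), "h3": (),
    "b": (), "i": (), "u": (), "em": (), "strong": (),
    "ul": (), "ol": (), "li": (),
    "a": ("href",), "img": ("src", "alt"),
}
# Elements that are dropped together with everything inside them.
DROP_WITH_CONTENT = {"head", "style", "script"}
# Kept elements that never get a closing tag.
VOID = {"br", "img"}


class Simplifier(HTMLParser):
    """Re-emit only whitelisted tags, without style/class/font attributes."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a dropped element

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in KEEP:
            return  # tag is discarded, but its text content is still kept
        kept = "".join(
            f' {name}="{escape(value or "", quote=True)}"'
            for name, value in attrs
            if name in KEEP[tag]
        )
        self.out.append(f"<{tag}{kept}>")

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if self.skip_depth or tag not in KEEP or tag in VOID:
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            # Character references were decoded by the parser; re-escape them.
            self.out.append(escape(data, quote=False))


def simplify(html_text):
    """Return html_text reduced to the whitelisted tags and attributes."""
    parser = Simplifier()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```

To use it, read the exported file into a string, pass it to `simplify`, and write the result back out. Unknown tags such as `<font>` or `<span>` are removed but their text stays, which is usually what you want for this kind of export; whether the result is clean enough for your editor is something you would have to check on a real document.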