Headless conversion to HTML embeds images instead of creating separate files

We’ve been using LibreOffice’s headless conversion to convert Word documents to HTML files. On version 4.0, it would create an HTML file plus a separate JPG file for each image embedded in the Word document, and reference those files via the src attribute of the img tag.

Now that we’ve upgraded to 4.2, the conversion creates only the HTML file, with all of the images embedded inline as base64-encoded data URIs in the src attribute (e.g. <img src="data:image/jpeg;base64,R0lGODlhEAAOALMAA…">).

Is there a way to make the LibreOffice conversion create the individually linked image files again? Here’s the command we are using for the conversion:

soffice --headless --convert-to html:HTML file_to_convert.docx

This was resolved in tdf#48887: by default, images are saved as separate files again. To force LibreOffice to embed the images instead, use this command line:

soffice --convert-to html:HTML:EmbedImages file_to_convert
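To check which behaviour a given install produces, one option is to count the img tags in the generated HTML whose src is a data URI. The snippet below is only a rough sketch: the function name count_embedded_images is made up for illustration, and it assumes each img tag sits on a single line of the output file.

```shell
#!/bin/sh
# Hypothetical helper (not part of LibreOffice): prints the number of
# <img> tags in an HTML file whose src attribute is an inline data: URI.
# Assumes each <img ...> tag is written on one line of the file.
count_embedded_images() {
  # -o prints each match on its own line, -i ignores tag/attribute case;
  # "|| true" keeps the pipeline from failing when nothing matches.
  { grep -oi '<img[^>]*src="data:image/[^"]*"' "$1" || true; } | wc -l | tr -d ' '
}

# Usage after a conversion such as the one in the question:
#   soffice --headless --convert-to html:HTML file_to_convert.docx
#   count_embedded_images file_to_convert.html
```

A result of 0 suggests the images were written as separate files; anything higher means that many images were embedded inline.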

Closing as outdated.