We’ve been using LibreOffice’s headless conversion to convert Word documents to HTML files. In version 4.0, it would create an HTML file plus a separate JPG file for each image embedded in the Word document, and reference those files via the src attribute of the img tags.
Now that we’ve upgraded to 4.2, the conversion creates only the HTML file, with every image embedded inline as a base64-encoded data URI in the img tag’s src attribute (e.g. <img src="data:image/jpeg;base64,R0lGODlhEAAOALMAA…">).
Is there a way to make the LibreOffice conversion create the individually linked image files again? Here’s the command we are using for the conversion:
soffice --headless --convert-to html:HTML file_to_convert.docx
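
If no export option restores the old behaviour, one fallback we are considering is to post-process the generated HTML ourselves. The following is only a rough sketch of that idea in Python, not anything LibreOffice provides: the function name extract_images, the output naming scheme (stem_1.jpg, stem_2.jpg, ...) and the regular expression are our own assumptions, and the pattern only handles double-quoted src attributes.

```python
import base64
import re
from pathlib import Path

# Matches an img tag's src="data:image/<subtype>;base64,<payload>".
# Assumption: attributes are double-quoted; tag and attribute case may vary.
DATA_URI = re.compile(
    r'(<img\b[^>]*?\ssrc=")data:image/([a-zA-Z0-9.+-]+);base64,([^"]+)(")',
    re.IGNORECASE,
)


def extract_images(html: str, out_dir: Path, stem: str) -> str:
    """Write each embedded image to out_dir and return HTML that links to it.

    Hypothetical helper: file names are <stem>_<n>.<ext>, numbered in
    document order.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    counter = 0

    def replace(match: re.Match) -> str:
        nonlocal counter
        counter += 1
        # Derive a file extension from the MIME subtype (jpeg -> jpg,
        # svg+xml -> svg); other subtypes are used unchanged.
        subtype = match.group(2).lower().split("+")[0]
        ext = "jpg" if subtype == "jpeg" else subtype
        name = f"{stem}_{counter}.{ext}"
        (out_dir / name).write_bytes(base64.b64decode(match.group(3)))
        # Keep the rest of the tag as-is and swap only the src value.
        return f"{match.group(1)}{name}{match.group(4)}"

    return DATA_URI.sub(replace, html)
```

It would be run on the file that soffice writes, for example by reading file_to_convert.html as text, calling extract_images(text, Path("."), "file_to_convert"), and writing the result back. The text encoding of the generated file has to be supplied when reading and writing, which this sketch leaves to the caller.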