Converting files using soffice convert-to with embedded images (html to doc)

Hi,

I’m converting html files to doc using the following command:

soffice -headless -convert-to doc test.html

…and it works well, but if my html has an embedded image like this page:

www.akamine.com.br/blog/imagem_embutida.html

…it just doesn’t work: the doc file is created, but the image is missing from it.

Does anyone know whether I need to add a parameter to make it work, or whether embedded HTML images simply can’t be converted?

I really appreciate any help

This appears to be a bug. When you state “embedded” I presume you mean using inline Base64 encoding e.g.,

<img src="data:image/jpeg;base64, ... " id="idImagem" border="0">
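To confirm that a page really uses inline Base64 images before blaming the converter, a rough check like the one below may help. It is only a sketch: `count_inline_images` is a name made up for this example, and the pattern assumes double-quoted, lowercase `src` attributes like the tag above.

```shell
# count_inline_images FILE
# Print how many Base64 data-URI images FILE contains.
# Hypothetical helper for illustration; it only matches src="data:image/...;base64,
# and will miss single-quoted or upper-case attributes.
count_inline_images() {
    # grep -o prints each match on its own line, so wc -l counts matches, not lines.
    grep -o 'src="data:image/[^;"]*;base64,' "$1" | wc -l | tr -d ' '
}
```

Running `count_inline_images test.html` prints 0 for a page that only links to external image files.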

It is also worth noting that you really should be specifying a particular DOC format to output e.g.,

--convert-to doc:"MS Word 97"            # produces a dot graphic
--convert-to doc:"MS Word 2003 XML"      # measure_conversion.xsl: Find no conversion for  to 'twip'!
--convert-to doc:"MS Word 2007 XML"      # produces a graphic with a "read-error"
--convert-to docx:"Office Open XML Text" # produces a graphic with a "read-error"
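To compare the filters side by side, something along these lines could run each conversion into its own directory so that the identically named results do not overwrite each other. Treat it as a sketch under assumptions: `try_filters`, the `DRY_RUN` switch and the `outN` directory names are invented for this example, and the filter names are simply the four listed above.

```shell
# try_filters FILE
# Convert FILE once per candidate filter, each into its own output directory.
# DRY_RUN=1 (a hypothetical switch for this sketch) prints the commands
# instead of running soffice.
try_filters() {
    input=$1
    n=0
    while IFS= read -r target; do
        n=$((n + 1))
        if [ "${DRY_RUN:-0}" = 1 ]; then
            printf 'soffice --headless --convert-to %s --outdir out%d %s\n' \
                "'$target'" "$n" "$input"
        else
            # </dev/null keeps soffice from consuming the filter list on stdin.
            soffice --headless --convert-to "$target" --outdir "out$n" "$input" </dev/null
        fi
    done <<'EOF'
doc:MS Word 97
doc:MS Word 2003 XML
doc:MS Word 2007 XML
docx:Office Open XML Text
EOF
}
```

With `DRY_RUN=1` set, `try_filters test.html` only lists the four commands, which is a cheap way to check the quoting before running the real conversions.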

When opened in Writer, the two “read-error” outputs show a graphic placeholder marked with a read error.

The DOCX version certainly appears to include the Base64-encoded object, so I am not sure why it produces a read-error. I can’t find a related open bug report for this issue, so please raise a bug. Include as much detail as you can and link to this thread if necessary. Once the bug is raised, post its number back here in a comment using the format “fdo#123456” (this format appears broken at present, but hopefully will be fixed soon).

Thanks!

The bug has been reported. To check its status, see: https://bugs.freedesktop.org/show_bug.cgi?id=66852

Bug 66852 - FILESAVE: Converting html to doc or pdf using soffice command line doesn’t convert embedded image · Status:NEW