Bad gif when --convert-to html:"HTML (StarWriter)" on Linux

OS: Open SUSE 12.1

LibreOffice Version: 4.0.3.3

I use the following command to convert a DOC to HTML:

libreoffice4.0 --headless --convert-to html:"HTML (StarWriter)" maths.doc

In the output, some formulas are displayed as black bars like this:

If I open libreoffice and use “Save as” HTML function, it is displayed normally like this:

The two images I attached above are from the same formula in the doc.

In theory, I think both should call the same conversion function in core. Why would they generate different GIFs?

I tried the command on a Mac and it works well. Both methods generate normal GIFs.

In fact, I don’t want to use gif at all. I would like to use --convert-to html to embed all in one html file. However, most of the embedded images are corrupted on both Mac and Linux with default HTML filter. This is the same situation described in the last comment by Luc.Tartier on the following question: Export formulas to mathml?

Could someone help? Thanks a lot!

Cameron

(Some more info to help to reproduce the issue)

Here is the original doc (but it includes Chinese): http://www.eguidedog.net/tmp/maths8/maths.doc

Here is the version converted with libreoffice4.0 --headless --convert-to html:"HTML (StarWriter)" maths.doc
http://www.eguidedog.net/tmp/maths8/maths.html

Here is the version converted with “Save as” in GUI menu:
http://www.eguidedog.net/tmp/maths8/maths2.html

Thanks for providing the original file. The objects that are appearing as blacked-out GIFs are graphic objects (not formula objects) in the original DOC. They appear to be vector objects that the filter cannot understand. If I save the DOC in DOCX/ODT format they are included as WMF file. There are known issues with how LO handles vector objects. It is steadily improving although Windows vector formats (WMF/EMF) may never be fully supported, as SVG is the preferred format for such objects.

What I don’t understand is that it is converted correctly on the Mac version or through the “Save as” menu. Aren’t they using the same filter that generates the blacked-out GIFs?

I can confirm that under MacOS 10.6.8 LO v4.0.3.3 using File > Save As… > HTML Document (Writer) produces normal (low quality) graphics. I have never been able to get headless mode to work under MacOS so did not test that. I will update my answer.

Thank you for the detailed updates, although this issue is still mysterious to me. I am wondering whether this may be caused by some font libraries that are only loaded via the GUI. By the way, the LO I am using on Open SUSE Linux 12.1 is from the LO website. The LO version in the Open SUSE 12.1 repository is 3.4.

Yeah, me too. The fact that we are both experiencing the same thing suggests (to me) that you may want to consider raising a bug for this. Include as much detail as you can and link this thread. It is probably going to take a developer or someone more familiar with the UI/headless differences to explain why this is happening. If you do raise a bug, please report it back here in the form “fdo#123456”. Thanks.

I’ve raised a bug for this issue (fdo#65918). Thanks!

Thanks Cameron. I have confirmed the bug. We shall see what the developers make of it.

Is your version of LO from the OpenSUSE repository or the LO website? I don’t have an answer to why the same filter appears to behave differently via headless mode and the UI. Under Crunchbang 11 running TDF/LO v4.0.3.3 I managed to obtain the blacked-out graphics using your suggested command:

$ soffice --headless --convert-to html:"HTML (StarWriter)" maths.doc

I thought I tried this originally and it worked OK, but I must have gotten confused amongst the various tests I did, because it certainly produces the same blacked-out graphics you are seeing:

Crunchbang 11 formulas, blacked out

Sorry for any confusion. Using the File > Save As… > HTML Document (Writer) menu method the graphics appear OK:

Crunchbang 11 formulas, OK

I don’t know why this is as both methods would appear to be using the “HTML (StarWriter)” filter, presumably HTML__StarWriter_.xcu and HTML__StarWriter__ui.xcu (source). I can’t get LO to run in headless mode under MacOS so could not test this. Under MacOS 10.6.8 running LO v4.0.3.3 using the File > Save As… > HTML Document (Writer) menu method the graphics appear OK (slightly better quality than under GNU/Linux, although still fairly poor):

MacOS formulas, OK

… most of the embedded images are corrupted on both Mac and Linux with default HTML filter.

I had a look at the embedded formulas in your original DOC. The objects that are displaying as blacked-out GIFs are graphic objects (not formula objects). They appear to be vector graphics that the filter cannot understand (although why this works via the UI and not in headless mode I don’t know).

If I save the DOC in DOCX/ODT format these formulas are included as WMF format graphics. There are known issues with how LO handles vector objects. It is steadily improving although Windows vector formats (WMF/EMF, which is possibly what the original objects are) may never be fully supported, as SVG is the preferred format for such objects. The ability to store an equation as MathML (DOCX) rather than a graphic (DOC) offers significantly better quality of output.
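To check which image formats end up inside a converted file, one option is to convert in headless mode and then list the archive contents. The following is only a minimal sketch: it assumes `soffice` is on the PATH, that `unzip` is installed, and that `--convert-to docx` selects the Word 2007 XML filter in your version. The `SOFFICE` variable and the helper names `doc_to_docx` and `list_embedded_media` are hypothetical, introduced here for illustration.

```shell
# SOFFICE is an assumed variable; point it at your binary (e.g. libreoffice4.0).
SOFFICE="${SOFFICE:-soffice}"

# Hypothetical helper: convert a DOC to DOCX in headless mode.
# The converted file is written to the current directory.
doc_to_docx() {
    "$SOFFICE" --headless --convert-to docx "$1"
}

# Hypothetical helper: list the image files stored inside a DOCX.
# A DOCX is a ZIP archive; images are conventionally kept under word/media/.
list_embedded_media() {
    unzip -l "$1" | grep -i 'word/media/'
}
```

Running `doc_to_docx maths.doc` followed by `list_embedded_media maths.docx` should show whether the objects were stored as `.wmf` entries, which would match the behaviour described above.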

I don’t want to use gif at all. I would like to use --convert-to html to embed all in one html file.

The “XHTML Writer File” filter (same as File > Export… > XHTML file type) embeds the equation in the HTML as you desire. Unfortunately, though, this appears to work better for a DOCX source than a DOC source. For a basic formula object a DOC produces this output:

<!--Next 'span' is a draw:frame.-->
<span style="height:31.01pt;width:83.99pt; padding:0; " class="fr1" id="graphics1">
	<img style="height:1.094cm;width:2.963cm;" alt="" src="data:image/*;base64,
	... embedded graphic ...
	"/>
</span>

… while a DOCX source produces this output:

<!--Next 'span' is a draw:frame.-->
<span style="height:31.15pt;width:77.9pt; padding:0; " class="fr1">
	<math xmlns="http://www.w3.org/1998/Math/MathML">
	... MathML object ...
	</math>
	<img style="height:1.0989cm;width:2.7481cm;" alt="" src="data:image/*;base64,
	... embedded graphic ...
	"/>
</span>
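
For headless use, the same filter can probably be selected by name on the command line, in the same way as "HTML (StarWriter)" in the question. This is a sketch under assumptions rather than a tested recipe: the filter name is taken from the paragraph above, the `html` output extension is my choice, and `SOFFICE` and `xhtml_convert` are hypothetical names for illustration.

```shell
# SOFFICE is an assumed variable; point it at your binary (e.g. libreoffice4.0).
SOFFICE="${SOFFICE:-soffice}"

# Hypothetical helper: export a document with the "XHTML Writer File" filter,
# which embeds images (and MathML, where available) in a single file.
xhtml_convert() {
    "$SOFFICE" --headless --convert-to html:"XHTML Writer File" "$1"
}
```

For example, `xhtml_convert maths.docx` would write `maths.html` to the current directory. Given the difference shown above, a DOCX source is likely to give better results than a DOC source.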

For reference, this AskLO thread on HTML exporting and this AskLO thread on MathML are likely related.