Writer export to epub, messy, why?

Hi :slight_smile:
Can someone tell me if I am doing something wrong (and obviously the proper way to do it) when exporting a text from libreoffice writer to an epub3 file.
When I do so and then open the epub in Sigil, I get span class changes all over the place: 2, 3, or even more inside a single paragraph, even tho no span class change is needed. (For example, I’ll get: <p class="para3"><span class="span3"> text…text…text </span><span class="span3"> and this and that </span><span class="span3"> and so on… </span> … )
Removing all of those occurrences by hand takes close to forever, yet my ebook has no issue at all afterwards, no matter which epub reader I try it in. So obviously, all those span class changes were not necessary at all.
Makes quite a messy epub file to further edit in Sigil…
How can I avoid that ?
Thanks :slight_smile:

P.S. I format my text using paragraph styles. On the other hand, I do not use character styles at all.
OS : Windows 10.
LO version : 7.0.3.1
(Most of my work is done in Scrivener, then exported to Writer.)
Here is a sample of what I get exporting to an epub3 in Writer. (Page 2 is what I get from page 1, once in Sigil.)
Sample ODT + conversion result.odt

How do you format your original document? With styles? Manually with buttons, keyboard shortcuts, …? In the second case, Writer may have a very hard time figuring out your formatting and will issue redundant and/or useless formatting changes.

To get a correct conversion into any alien format, your original document must first be very clean formatting-wise, i.e. be based exclusively on styles (paragraph, character, page, frame, list). Also, Writer may find it difficult to merge two adjacent sequences which apparently show the same formatting if they were formatted at different times.

What’s your OS and LO version?

Please do not use Add Answer but edit your original question to enhance the details of your question (answers are reserved for solutions to a problem on this Q&A site).

Thanks. I updated my question with the info you specified.

The problem you report with <span> looks like an “intra-paragraph” formatting issue. Sorry to emphasize that, but do you use character styles for this? I mean, when you want to bold a word, do you press Ctrl+B or apply the Strong Emphasis character style?

Writer not only knows of paragraph styles (the only concept in Word) but also offers other style categories to remove the need of manual formatting.

For bold and italics, I do use the toolbar buttons. I never use character styles. (I actually am a Scrivener user, and do pretty much everything there. I pretty much use libreoffice only because Antidote (a correction software) isn’t compatible with Scrivener.) But: I get those span changes regardless of bold, or any formatting whatsoever. Very often they even enclose only a single character.
(Note that once in libreoffice, I reformat everything using paragraph styles, since they have no style once exported from Scrivener…) But I never used the character styles at all.

Maybe this is a lead : I converted the ODT to epub using CALIBRE, and tho the coding is different, it came out just as messy…

According to Wikipedia, Scrivener doesn’t seem to address the same target as Writer. From what I understand, you organise your book or “library” with Scrivener and you delegate formatting to a document processor.

I already helped an AskLO user with a complex multi-layout export from single source (LO doc) to EPub. The solution went through restyling his work with all Writer style categories and barring any direct formatting. Then the multiple export was solved with templates + master document importing the single source.

I think your problem is much simpler and derives from your non-use of character, frame, page styles.

If needed, make a short sample file with the problem (max 2 pages) and attach it to your question (you can’t attach to a comment).

I attached a sample. Page 2 is what I get from page 1, once in Sigil. (Had to merge the two files, since I could neither attach the resulting epub nor have more than one attachment.)

I saved your sample file as .fodt so that I could analyze the underlying XML.

The <span> mess is already present in the Writer document. As previously mentioned in the comments, it originates in manual character formatting which was later overridden to get back to the initial look. Note that replacing an existing formatting with something which looks like the status quo ante is not the same as erasing the formatting. E.g. toggling bold twice is not the same as no bold at all: markup is kept in the XML.

I selected the whole text and cleared direct formatting (Ctrl+M) to remove all added manual formatting. I saved again as .fodt and there was no longer the <span> mess.
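For anyone who wants to check a .fodt save without reading the XML by eye, here is a minimal Python sketch that counts the text:span children of each paragraph. The function name and the sample fragment are made up for illustration (they are not taken from the attached file); the only things borrowed from the ODF format are the text namespace and the text:p / text:span / text:style-name names.

```python
import xml.etree.ElementTree as ET

# Namespace used by ODF for text content (text:p, text:span, ...).
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def spans_per_paragraph(xml_source):
    """Return (paragraph style name, number of direct text:span children) per text:p."""
    root = ET.fromstring(xml_source)
    style_attr = f"{{{TEXT_NS}}}style-name"
    report = []
    for para in root.iter(f"{{{TEXT_NS}}}p"):
        spans = para.findall(f"{{{TEXT_NS}}}span")
        report.append((para.get(style_attr), len(spans)))
    return report


# Hypothetical fragment imitating the reported symptom: three spans in a row
# with the same style, followed by a paragraph without any span.
SAMPLE = """<office:text
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">
  <text:p text:style-name="P3"><text:span text:style-name="T3">text </text:span><text:span text:style-name="T3">and this </text:span><text:span text:style-name="T3">and so on</text:span></text:p>
  <text:p text:style-name="Text_20_body">clean paragraph</text:p>
</office:text>"""

for style_name, count in spans_per_paragraph(SAMPLE):
    print(style_name, count)
```

On a real file, read the .fodt in binary mode and pass the bytes to the function. Running it before and after Ctrl+M should show the span counts dropping for the paragraphs that carried leftover direct formatting.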

This also showed that centering in Heading 3 was done with direct formatting instead of customising the paragraph style.
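Direct paragraph formatting of this kind is stored in the file as an "automatic" style that inherits from the real paragraph style and only adds the overridden properties. A small Python sketch to list them from a .fodt save could look like this. The function name and the sample XML are assumptions made for the example; the namespaces and the style:family / style:parent-style-name attributes are those of the ODF format.

```python
import xml.etree.ElementTree as ET

OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
STYLE_NS = "urn:oasis:names:tc:opendocument:xmlns:style:1.0"


def direct_paragraph_overrides(xml_source):
    """Map each automatic paragraph style to (parent style, overridden properties)."""
    root = ET.fromstring(xml_source)
    found = {}
    for auto in root.iter(f"{{{OFFICE_NS}}}automatic-styles"):
        for style in auto.findall(f"{{{STYLE_NS}}}style"):
            if style.get(f"{{{STYLE_NS}}}family") != "paragraph":
                continue  # skip character ("text") and other families
            props = style.find(f"{{{STYLE_NS}}}paragraph-properties")
            overrides = {}
            if props is not None:
                # Drop the "{namespace}" prefix to keep the report readable.
                overrides = {k.split("}")[-1]: v for k, v in props.attrib.items()}
            name = style.get(f"{{{STYLE_NS}}}name")
            parent = style.get(f"{{{STYLE_NS}}}parent-style-name")
            found[name] = (parent, overrides)
    return found


# Hypothetical sample: P1 is "Heading 3 plus manual centering",
# T1 is a character-level automatic style and is ignored here.
SAMPLE = """<office:document
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:style="urn:oasis:names:tc:opendocument:xmlns:style:1.0"
    xmlns:fo="urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0">
  <office:automatic-styles>
    <style:style style:name="P1" style:family="paragraph" style:parent-style-name="Heading_20_3">
      <style:paragraph-properties fo:text-align="center"/>
    </style:style>
    <style:style style:name="T1" style:family="text">
      <style:text-properties fo:font-weight="bold"/>
    </style:style>
  </office:automatic-styles>
</office:document>"""

for name, (parent, overrides) in direct_paragraph_overrides(SAMPLE).items():
    print(name, "inherits from", parent, "and overrides", overrides)
```

Every entry in the result is a property that arguably belongs in the paragraph style itself. Note that spaces in style names are encoded as _20_ in the file, so "Heading 3" appears as Heading_20_3.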

Fix:

  • select your whole document and Ctrl+M

  • configure your paragraph styles to include all needed properties instead of providing them with direct-formatting

  • create character styles or use existing ones for intra-paragraph typographical variations

  • make sure you don’t use direct formatting unless you have measured the consequences

    With such a simple type of document, you shouldn’t need any direct formatting at all, except a special manual break here or there to switch to another page style.

EDIT Recipe to “forward” your present intra-paragraph formatting

What you can do on your original file (before clearing direct formatting) is to navigate to words which are not formatted as per the paragraph style. You’ll see if they are italics or bold. Select the sequence and apply built-in Emphasis for italics or Strong Emphasis for bold. If you need more variants, create the appropriate character styles and apply them.

Applied character styles will not be cleared by Ctrl+M. Therefore, you should not see any difference when you Ctrl+M. Ctrl+M acts only on the selection. Consequently if you proceed paragraph after paragraph, this should be quite safe.

Don’t worry about the exact appearance of the styles at this step. The important thing is to mark up your text with styles.

When done, tune the styles, paragraph styles as well as character styles. You’ll see the effect immediately, without the pain of tracking down each occurrence. This is why it is important to give styles names reflecting the semantic intent, not the typographical appearance: Emphasis may be italics or red or another font face. Change the style, all occurrences are changed.

This is the magic of style. You separate content from look. This may seem disturbing at first but it is tremendously versatile and powerful. Experiment.
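To make the idea concrete outside Writer, here is a toy Python sketch of the same principle. It is not how Writer or the EPUB export works internally; the run list, the style table and the render function are invented for the illustration. The text only records *which* style a run has; the look lives in one table.

```python
from html import escape


def render(runs, styles):
    """Turn (text, style name or None) runs into HTML using a style table."""
    parts = []
    for text, style_name in runs:
        if style_name is None:
            parts.append(escape(text))  # text following the paragraph style
        else:
            css = styles[style_name]
            parts.append(f'<span style="{css}">{escape(text)}</span>')
    return "".join(parts)


# Content: marked up by intent, with no typographical detail at all.
runs = [("She ", None), ("never", "Emphasis"), (" said that.", None)]

# Look: decided separately, in a single place.
styles = {"Emphasis": "font-style: italic"}
html_italic = render(runs, styles)

# Changing the style definition changes every occurrence; the text is untouched.
styles["Emphasis"] = "color: red"
html_red = render(runs, styles)

print(html_italic)
print(html_red)
```

The runs never change between the two renderings, which is the whole point of naming the style Emphasis rather than Italic.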

Once your text is styled, pasting paragraphs in another fresh document keeps style marking. No stress.

To show the community your question has been answered, click the ✓ next to the correct answer, and “upvote” by clicking on the ^ arrow of any helpful answers. These are the mechanisms for communicating the quality of the Q&A on this site. Thanks!

Since I didn’t do much in Writer, is it fair to say that the issue is actually coming from Scrivener ? Also, I understand the solution you proposed, but have to admit that I am a little taken aback by the fact that I would have to redo the entire formatting. In the context of a novel, re-reading the whole thing trying to remember what was in italics etc. would not only be a huge amount of work, but also quite stressful…

You must understand that Writer puts a huge emphasis on document structure. Styles are here for this purpose. Not only do styles like Heading n describe your book outline, but other paragraph styles also contribute to semantically marking up your text. This is complemented by character styles for sub-paragraph marking.

See the update to my answer.

Sorry, I can’t upvote, not enough points.
Thank you very much for your patience and the answer.

After doing all that you said, tho LO is inserting way less redundant/useless span changes, I have to unfortunately report that it is still doing it. All direct formatting was previously removed as instructed. I also find it disappointing to have to tip-toe through that process, considering that at the stage where I use LO, my work is 97% done already. It is uselessly stressful to tweak around like that in a text that I have revised to the edge of insanity, only to risk messing it up. In other words, it is an extra step I could well do without. (Note that when converting my text from Scrivener to epub, all is clean.) Also note that all those direct formatting italics and whatnot, were actually coming out fine before when exporting to an epub. The problem was solely LO constantly switching from spanX to spanX, instead of waiting for spanY to be necessary… Imo, LO has a flaw in that matter. But hey, thanks still, at least we tried…
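If some redundant spans survive the export, they can at least be merged by script instead of by hand. What follows is only a sketch, in Python, under a few assumptions: the chapter files are well-formed XHTML, two spans count as "the same" when all their attributes are identical, and they are merged only when no text at all (not even a space) sits between them. The function name and the sample paragraph are made up for the example, and this is a post-export workaround, not something LO or Sigil provides. Work on a copy of the epub.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
SPAN = f"{{{XHTML_NS}}}span"


def merge_adjacent_spans(parent):
    """Merge sibling <span> elements that have identical attributes and touch each other."""
    prev = None
    for child in list(parent):
        merge_adjacent_spans(child)  # clean nested elements first
        mergeable = (
            prev is not None
            and prev.tag == SPAN
            and child.tag == SPAN
            and prev.attrib == child.attrib
            and not prev.tail  # no text between the two spans
        )
        if not mergeable:
            prev = child
            continue
        # Append the second span's leading text to the end of the first span.
        if len(prev):
            prev[-1].tail = (prev[-1].tail or "") + (child.text or "")
        else:
            prev.text = (prev.text or "") + (child.text or "")
        prev.extend(list(child))  # move nested elements, if any
        prev.tail = child.tail    # keep whatever followed the second span
        parent.remove(child)


# Hypothetical paragraph imitating the reported output.
SAMPLE = (
    '<p xmlns="http://www.w3.org/1999/xhtml" class="para3">'
    '<span class="span3">text </span>'
    '<span class="span3">and this </span>'
    '<span class="span3">and so on</span>'
    '<span class="span4">!</span>'
    "</p>"
)

para = ET.fromstring(SAMPLE)
merge_adjacent_spans(para)

ET.register_namespace("", XHTML_NS)  # serialize without an ns0: prefix
print(ET.tostring(para, encoding="unicode"))
```

The three span3 runs collapse into one and the span4 run is left alone. Spans separated by whitespace are deliberately not merged, because that whitespace may be a real word space.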

Your “stressful” step is a consequence of not styling from the start. The normal way of using Writer is styles. Direct formatting is a poor substitute offered for quick’n’dirty experimentation and for initial conversion by those switching from other office applications where formatting is much looser.

Adequate semantic markup (note my choice of the word markup instead of formatting) is part of the author’s job, as it conveys more of the document’s meaning. Typographical translation can be done afterwards as a totally independent task, without ever needing to review the text.

As an example, I recently changed a template on which a 200-page document was based: change of font faces, page geometry, line spacing, heading aspects, numbering in headings and lists, colours, … It took less than 5 minutes, with 2-3 adjustments, to completely change the general presentation because the document was fully styled (paragraph, character, frame, page and list), with no typographical direct formatting.