Superfluous span tags in epub (even when not using any manual formatting)

I’m exporting from LO Writer to epub and looking at the output in Sigil. There are a lot of superfluous span tags in the output. I’ve since learned that this happens, among other things, when you use manual formatting, so I stopped doing that; I now use paragraph and character styles exclusively. However, the superfluous tags are still there, and they seem to be a result of editing the document (i.e., cutting a word, pasting it elsewhere, inserting characters, etc.).

Is there a way to disable the creation of these span tags? They add considerable bulk to the exported epub, and lead to rendering problems on certain ereaders.

As an example, I created a fresh text document, pasted in some lorem ipsum from the web, then cut and pasted a word and inserted some characters elsewhere. I didn’t do any formatting, just entering and editing text.

test.odt (9.9 KB)

Looking at the epub, here are the superfluous span tags:

<p class="para0"><span class="span0">Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nullam sit amet ipsum augue. Cras dolor lectus, congue vel interdum eu, pharetra eget magna. Nulla non vulputate magna, sed egestas metus. Sed eget porttitor augue. Aenean augue elementum sit amet tristique lacinia. Integer iaculis ipsum turpis, sit amet bibendum velit semper eget. </span><span class="span0">Invoegen c</span><span class="span0">urabitur vel nulla et mauris vestibulum iaculis sit amet sed nibh. Etiam egestas condimentum lacus vitae interdum. Donec molestie vel lacus et bibendum. Sed bibendum sem vehicula mi maximus, vitae semper enim lobortis. Vestibulum quis tellus tortor.</span></p>

(Sorry, this forum doesn’t allow me to upload the .epub.)

This should really be just one paragraph with one span tag.
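If you need to clean up an epub that has already been exported, one option (not discussed in this thread) is to post-process the XHTML files and merge runs of adjacent spans that carry identical attributes. The following is a minimal sketch using Python’s standard `xml.etree.ElementTree`, assuming the chapter files are well-formed XHTML and that two neighbouring spans with the same `class` can safely be treated as one. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    # "{http://www.w3.org/1999/xhtml}span" -> "span"; plain "span" stays "span"
    return tag.rsplit("}", 1)[-1]


def merge_adjacent_spans(parent):
    """Merge sibling <span> elements that have identical attributes and no
    text between them, then recurse into the remaining children."""
    i = 0
    while i < len(parent) - 1:
        cur, nxt = parent[i], parent[i + 1]
        if (local_name(cur.tag) == "span" and local_name(nxt.tag) == "span"
                and cur.attrib == nxt.attrib and not cur.tail):
            # nxt's leading text belongs after cur's last child (or cur.text)
            if len(cur):
                cur[-1].tail = (cur[-1].tail or "") + (nxt.text or "")
            else:
                cur.text = (cur.text or "") + (nxt.text or "")
            cur.extend(list(nxt))  # move nxt's children into cur
            cur.tail = nxt.tail    # text after nxt now follows cur
            del parent[i + 1]
        else:
            i += 1
    for child in parent:
        merge_adjacent_spans(child)


p = ET.fromstring('<p class="para0"><span class="span0">Lorem ipsum. </span>'
                  '<span class="span0">Invoegen c</span>'
                  '<span class="span0">urabitur vel.</span></p>')
merge_adjacent_spans(p)
print(ET.tostring(p, encoding="unicode"))
# <p class="para0"><span class="span0">Lorem ipsum. Invoegen curabitur vel.</span></p>
```

Spans are only merged when nothing, not even whitespace, sits between them, so visible text is never reordered or dropped. When writing a real XHTML file back, calling `ET.register_namespace("", "http://www.w3.org/1999/xhtml")` first keeps ElementTree from renaming the default namespace to `ns0:`.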

Saving as Flat XML (.fodt), I can see that the tags are there as well, with style T1:

<text:p text:style-name="P1">Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nullam sit amet ipsum augue. Cras dolor lectus, congue vel interdum eu, pharetra eget magna. Nulla non vulputate magna, sed egestas metus. Sed eget porttitor augue. Aenean augue elementum sit amet tristique lacinia. Integer iaculis ipsum turpis, sit amet bibendum velit semper eget. <text:span text:style-name="T1">Invoegen c</text:span>urabitur vel nulla et mauris vestibulum iaculis sit amet sed nibh. Etiam egestas condimentum lacus vitae interdum. Donec molestie vel lacus et bibendum. Sed bibendum sem vehicula mi maximus, vitae semper enim lobortis. Vestibulum quis tellus tortor.</text:p>

(Sorry, this forum doesn’t allow me to upload the .fodt.)

Short of going through the document with Ctrl+M every time before I export, is there anything I can do to prevent this pollution from occurring?

@guilhem could we possibly add fod* to the allowed list?

When you copy from the web, the text is marked up as HTML. This alone is probably enough for pasting to add some indication that the fragment differs from the surrounding document.

You should try pasting as unformatted text, so that the pasted fragment takes on only the current paragraph styling. This is equivalent to pressing Ctrl+M on every pasted fragment.

That’s not it, though. I retried the experiment, this time pasting the text via ‘Edit → Paste Special → Paste Unformatted Text’. Same result.

I looked at your attached sample file and saw nothing special in it except for the <span> around “Invoegen”.

By chance, is Track Changes enabled in some way, even if changes are not shown?

If you use File > Save As (not just File > Save, so that you get a brand-new copy of the document), do you get the same behaviour from the copy?

The span around “Invoegen” is an example of a superfluous span, and this is just a simple case. In real documents there are sometimes span tags around individual letters of a sentence.

Double-checked: track changes is disabled, so that’s not it either. And when I ‘Save As’, close LO, reopen LO and open the copy, the problem persists.

Does it relate to Options|Writer|Comparison|Random number to improve accuracy ...|[x] Store it when changing the document? This setting inserts RSIDs into the document, which could be the contents of that “T1” character autostyle.
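To check whether a document’s extra spans come from these rsid styles, you can inspect the automatic styles in the .fodt. This is a minimal sketch using only Python’s standard library; it assumes LibreOffice stores the value as an `officeooo:rsid` attribute on `style:text-properties`, with `officeooo` bound to `http://openoffice.org/2009/office`, and it reports every automatic text style whose only property is that rsid, together with how many `text:span` elements use it. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

OFFICE = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
STYLE = "urn:oasis:names:tc:opendocument:xmlns:style:1.0"
TEXT = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
# Assumption: namespace LibreOffice uses for the officeooo:rsid attribute
OFFICEOOO = "http://openoffice.org/2009/office"


def rsid_span_report(root):
    """Map each automatic text style whose only property is officeooo:rsid
    to the number of <text:span> elements that reference it."""
    rsid_only = set()
    for auto in root.iter(f"{{{OFFICE}}}automatic-styles"):
        for st in auto.findall(f"{{{STYLE}}}style"):
            props = list(st)
            if (st.get(f"{{{STYLE}}}family") == "text"
                    and len(props) == 1
                    and props[0].tag == f"{{{STYLE}}}text-properties"
                    and set(props[0].attrib) == {f"{{{OFFICEOOO}}}rsid"}):
                rsid_only.add(st.get(f"{{{STYLE}}}name"))
    uses = Counter(span.get(f"{{{TEXT}}}style-name")
                   for span in root.iter(f"{{{TEXT}}}span"))
    return {name: uses[name] for name in sorted(rsid_only)}


# Usage (file name is only an example):
# print(rsid_span_report(ET.parse("test.fodt").getroot()))
```

A style like T1 in the .fodt excerpt above would show up in the report; a style that also sets, say, bold would not, because it carries real formatting.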

@mikekaganski: this <span> effectively inserts an rsid. I thought it would be cleared by a Save As, but apparently it isn’t.

Ah, I’ve now opened the document, which I had overlooked initially. Yes, this is the rsid function.

First, its generation can be disabled as I mentioned above.
Second, exporting it to epub seems to be a bug.

That’s it! I disabled that option, redid the experiment, and no more unnecessary span tags. Thanks a bundle, I would never have been able to figure that out for myself :slight_smile:

Except now I have a few paragraphs at the end that have these T?? spans, and even Ctrl+M doesn’t remove them. I tried enabling the option again to remove them, but they won’t go away. Maybe I should edit the fodt manually?

Edit: cutting the text, pasting it back as unformatted text, and then reapplying all the paragraph styles did the trick. I hope those rsid spans are now gone for good!
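The manual .fodt edit considered earlier in this post could also be scripted instead of cutting and re-pasting. This sketch is not from the thread and has not been tried on real documents, so work on a copy. It assumes LibreOffice stores rsids as `officeooo:rsid` on `style:text-properties` (namespace assumed to be `http://openoffice.org/2009/office`), finds automatic text styles that contain nothing but an rsid, and unwraps every `text:span` that uses one, moving its text and child elements into the parent so the reading order is unchanged. All function names are hypothetical.

```python
import xml.etree.ElementTree as ET

OFFICE = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
STYLE = "urn:oasis:names:tc:opendocument:xmlns:style:1.0"
TEXT = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
# Assumption: namespace LibreOffice uses for the officeooo:rsid attribute
OFFICEOOO = "http://openoffice.org/2009/office"
SPAN = f"{{{TEXT}}}span"
STYLE_NAME = f"{{{TEXT}}}style-name"


def rsid_only_styles(root):
    """Names of automatic text styles whose only property is officeooo:rsid."""
    names = set()
    for auto in root.iter(f"{{{OFFICE}}}automatic-styles"):
        for st in auto.findall(f"{{{STYLE}}}style"):
            props = list(st)
            if (st.get(f"{{{STYLE}}}family") == "text"
                    and len(props) == 1
                    and props[0].tag == f"{{{STYLE}}}text-properties"
                    and set(props[0].attrib) == {f"{{{OFFICEOOO}}}rsid"}):
                names.add(st.get(f"{{{STYLE}}}name"))
    return names


def _append_before(parent, idx, s):
    """Append s to the text that precedes position idx in parent."""
    if not s:
        return
    if idx == 0:
        parent.text = (parent.text or "") + s
    else:
        prev = parent[idx - 1]
        prev.tail = (prev.tail or "") + s


def unwrap(parent, idx):
    """Replace parent[idx] by its text and children; return the child count."""
    span = parent[idx]
    kids = list(span)
    del parent[idx]
    _append_before(parent, idx, span.text)
    for offset, kid in enumerate(kids):
        parent.insert(idx + offset, kid)
    _append_before(parent, idx + len(kids), span.tail)
    return len(kids)


def strip_rsid_spans(elem, names):
    """Recursively unwrap every <text:span> whose style name is in names."""
    i = 0
    while i < len(elem):
        child = elem[i]
        strip_rsid_spans(child, names)  # clean nested spans first
        if child.tag == SPAN and child.get(STYLE_NAME) in names:
            i += unwrap(elem, i)  # promoted children are already clean
        else:
            i += 1


def clean_fodt(src, dst):
    """Write a copy of the .fodt at src to dst without rsid-only spans."""
    # Keep the original prefixes (office:, text:, ...) instead of ns0:, ns1:
    for _, (prefix, uri) in ET.iterparse(src, events=("start-ns",)):
        ET.register_namespace(prefix, uri)
    tree = ET.parse(src)
    names = rsid_only_styles(tree.getroot())
    strip_rsid_spans(tree.getroot(), names)
    tree.write(dst, encoding="UTF-8", xml_declaration=True)
    return names
```

For example, `clean_fodt("test.fodt", "test-clean.fodt")` would write a cleaned copy and return the style names it treated as rsid-only. The now-unused T styles are left in `office:automatic-styles`, and ElementTree may drop namespace declarations that are only referenced inside attribute values (such as formula prefixes), which is another reason to try it on a copy first.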