I am the editor of a document [the IEEE 754-2008 standard] that was created around 15 years ago (using OpenOffice), and has had nearly 200 drafts, a number of editors, and countless edits. It was last changed in 2008, but is now about to go through a revision cycle.
I was delighted to find that LibreOffice handled the 2008 .odt file almost perfectly, with only 7 errors (all were weird spurious empty reference tags, of unknown provenance, that OpenOffice quietly ignored).
While identifying and removing those from the content.xml, I noticed that there are hundreds (possibly thousands) of redundant tags. They typically take the form <span whatever>text1</span><span whatever>text2</span>, where ‘whatever’ is identical in both and either or both of ‘text1’ and ‘text2’ may be empty.
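For illustration, here is a minimal sketch of that kind of cleanup. It assumes Python's standard `xml.etree.ElementTree` and treats a pair as redundant only when both are ODF `text:span` elements with identical attributes and nothing at all (not even a space) between them. `merge_adjacent_spans` is a hypothetical helper, not an existing tool. It works on a small fragment and is untested on a full content.xml: ElementTree renames any namespace prefix it has not been told about (to `ns0:` and so on), so a real round-trip would need every prefix in the file registered.

```python
import xml.etree.ElementTree as ET

# ODF "text" namespace, as used in content.xml.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
SPAN = f"{{{TEXT_NS}}}span"
STYLE = f"{{{TEXT_NS}}}style-name"

# Keep the "text:" prefix on output instead of ElementTree's default "ns0:".
ET.register_namespace("text", TEXT_NS)


def merge_adjacent_spans(parent):
    """Hypothetical helper (a sketch): merge sibling text:span elements that
    have identical attributes and no text between them."""
    i = 0
    while i < len(parent) - 1:
        a, b = parent[i], parent[i + 1]
        if (a.tag == SPAN and b.tag == SPAN
                and a.attrib == b.attrib
                and not a.tail):  # any text, even a space, between them blocks the merge
            # Append b's leading text after a's existing content.
            if len(a):
                a[-1].tail = (a[-1].tail or "") + (b.text or "")
            else:
                a.text = (a.text or "") + (b.text or "")
            # Move b's child elements (e.g. text:s, nested spans) into a.
            for child in list(b):
                a.append(child)
            a.tail = b.tail  # text that followed b now follows a
            parent.remove(b)
            # Do not advance i: the merged span may match its new neighbour.
        else:
            i += 1
    # Recurse afterwards, so spans brought together by a merge are also checked.
    for child in parent:
        merge_adjacent_spans(child)


SAMPLE = (
    '<text:p xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">'
    '<text:span text:style-name="T1">text1</text:span>'
    '<text:span text:style-name="T1">text2</text:span>'
    '<text:span text:style-name="T1"/>'
    ' <text:span text:style-name="T2">kept</text:span>'
    '</text:p>'
)

root = ET.fromstring(SAMPLE)
merge_adjacent_spans(root)
print(ET.tostring(root, encoding="unicode"))
```

In the sample, the three consecutive `T1` spans (one of them empty) collapse into a single span containing `text1text2`, while the `T2` span stays separate because a space precedes it. Leaving such spans alone matters in ODF, where whitespace between spans inside a paragraph is part of the text.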
Is there a tool to clean these up? I could write one myself (I recently wrote an XML parser) but if one already exists …
Many thanks – Mike Cowlishaw