Clean up/optimize an odt file?

asked 2015-09-23 18:09:31 +0200

this post is marked as community wiki


I am the editor of a document [the IEEE 754-2008 standard] that was created around 15 years ago (using OpenOffice), and has had nearly 200 drafts, a number of editors, and countless edits. It was last changed in 2008, but is now about to go through a revision cycle.

I was delighted to find that LibreOffice handled the 2008 .odt file almost perfectly, with only 7 errors (all were weird spurious empty reference tags, of unknown provenance, that OpenOffice quietly ignored).

While identifying and removing those from the content.xml, I noticed that there are hundreds (possibly thousands) of redundant tags. These typically take the form <span whatever>text1</span><span whatever>text2</span>, where 'whatever' is identical in both spans and either or both of 'text1' and 'text2' may be empty.

Is there a tool to clean these up? I could write one myself (I recently wrote an XML parser) but if one already exists ...
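For anyone taking the write-it-yourself route, here is a minimal sketch in Python (standard library only) of the merge described above. It assumes the redundant tags are ODF `text:span` elements in content.xml, and it treats two spans as mergeable only when they are direct siblings with identical attributes and nothing (not even a space) between them. The function names `merge_adjacent_spans` and `count_redundant_spans` are hypothetical, not part of LibreOffice or any existing tool, and the file-level helper only counts merges; it does not write the .odt back.

```python
import zipfile
import xml.etree.ElementTree as ET

# ODF text namespace, as declared in content.xml.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
SPAN = f"{{{TEXT_NS}}}span"


def _append_text(elem, s):
    """Append string s after all existing content of elem."""
    if not s:
        return
    if len(elem):
        last = elem[-1]
        last.tail = (last.tail or "") + s
    else:
        elem.text = (elem.text or "") + s


def merge_adjacent_spans(parent):
    """Merge sibling <text:span> elements with identical attributes and no
    text between them, then recurse into the children. Returns merge count."""
    merges = 0
    i = 0
    while i < len(parent) - 1:
        a, b = parent[i], parent[i + 1]
        if a.tag == SPAN and b.tag == SPAN and a.attrib == b.attrib and not a.tail:
            _append_text(a, b.text)      # b's leading text follows a's content
            for child in list(b):        # then b's child elements (with tails)
                a.append(child)
            a.tail = b.tail              # text after b now follows a
            parent.remove(b)
            merges += 1                  # stay at i: a may also merge with the next span
        else:
            i += 1
    for child in parent:                 # recurse after merging, so nested spans
        merges += merge_adjacent_spans(child)  # that became neighbours are merged too
    return merges


def count_redundant_spans(odt):
    """Read-only: count the merges the pass would make in an .odt's content.xml.
    `odt` may be a path or a binary file-like object."""
    with zipfile.ZipFile(odt) as zf:
        root = ET.fromstring(zf.read("content.xml"))
    return merge_adjacent_spans(root)


if __name__ == "__main__":
    snippet = (
        f'<text:p xmlns:text="{TEXT_NS}">'
        '<text:span text:style-name="T1">Hello, </text:span>'
        '<text:span text:style-name="T1">world</text:span>'
        '<text:span text:style-name="T2">!</text:span>'
        "</text:p>"
    )
    root = ET.fromstring(snippet)
    print(merge_adjacent_spans(root))  # 1
    ET.register_namespace("text", TEXT_NS)
    print(ET.tostring(root, encoding="unicode"))
```

Empty spans are covered by the same rule, since merging an empty span into its neighbour just drops it. Writing the result back needs more care than this sketch takes: ElementTree does not keep the original namespace declarations or prefixes when serializing (a parser such as lxml, which preserves them, would be safer), and the rebuilt package must keep `mimetype` as its first, uncompressed entry. Try it on a copy first.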

Many thanks -- Mike Cowlishaw


Comments

It's an old question, but I'm here to say I am also looking for ODT/FODT optimizers (like the optimizers that exist for SVG).

João Paulo (2018-06-14 00:50:07 +0200)

I would like this too.

lomacar (2019-03-08 07:08:43 +0200)