When I:
- make a copy of one of my ODF/ODT documents,
- rename it as a zip file,
- unzip it, and
- look inside the data in
  - .//content.xml
  - .//meta.xml
  - .//settings.xml
  - .//styles.xml

I see a lot of extra, superfluous markup such as:
I would wonder if th<text:span text:style-name="T92">o</text:span>s<text:span text:style-name="T92">e</text:span> people notice
All that unnecessary, funky markup is carried over into the formatting of the new file when you “save as” RTF.
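As a side note, the copy/rename/unzip steps can probably be skipped, because an ODF file is already a ZIP archive. Here is a minimal sketch, using only Python's standard library, that counts how many text:span elements use each style name in content.xml. The function name count_span_styles and the file name in the usage note are my own placeholders, and the regular expression is a shortcut for counting, not a real XML parse.

```python
import re
import zipfile
from collections import Counter


def count_span_styles(odt_path):
    """Count text:span elements per style name in content.xml (hypothetical helper)."""
    # An .odt file is a ZIP archive, so it can be opened directly.
    with zipfile.ZipFile(odt_path) as archive:
        xml = archive.read("content.xml").decode("utf-8")
    # Regex shortcut: good enough for counting, but a proper XML parser
    # would be more robust against unusual attribute quoting.
    names = re.findall(r'<text:span\b[^>]*\btext:style-name="([^"]*)"', xml)
    return Counter(names)
```

Calling count_span_styles("mydoc.odt").most_common(10) (with a placeholder file name) should show which automatic styles, such as T92, account for most of the spans.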
Even though I have only declared “Liberation Serif”, I see all these font-face declarations in styles.xml:
<office:font-face-decls>
<style:font-face style:name="Lohit Devanagari1" svg:font-family="'Lohit Devanagari'"/>
<style:font-face style:name="Liberation Serif" svg:font-family="'Liberation Serif'" style:font-family-generic="roman" style:font-pitch="variable"/>
<style:font-face style:name="Liberation Sans" svg:font-family="'Liberation Sans'" style:font-family-generic="swiss" style:font-pitch="variable"/>
<style:font-face style:name="Lohit Devanagari" svg:font-family="'Lohit Devanagari'" style:font-family-generic="system" style:font-pitch="variable"/>
<style:font-face style:name="WenQuanYi Zen Hei" svg:font-family="'WenQuanYi Zen Hei'" style:font-family-generic="system" style:font-pitch="variable"/>
</office:font-face-decls>
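To check this across many documents without unzipping each one by hand, something like the following sketch could list the declared font faces. It assumes the standard ODF style namespace URI; declared_fonts is a name I made up for illustration.

```python
import zipfile
import xml.etree.ElementTree as ET

# Standard ODF namespace bound to the "style:" prefix.
STYLE_NS = "urn:oasis:names:tc:opendocument:xmlns:style:1.0"


def declared_fonts(odt_path):
    """Return the style:name of every style:font-face in styles.xml (hypothetical helper)."""
    with zipfile.ZipFile(odt_path) as archive:
        root = ET.fromstring(archive.read("styles.xml"))
    # ElementTree addresses namespaced names as "{uri}local-name".
    face_tag = f"{{{STYLE_NS}}}font-face"
    name_attr = f"{{{STYLE_NS}}}name"
    return [face.get(name_attr) for face in root.iter(face_tag)]
```

For the declarations shown above, this should return the five names from "Lohit Devanagari1" to "WenQuanYi Zen Hei", in document order.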
I thought all that extra fluff might be kept in documents in case you need to undo changes to the text, but what I see doesn’t make any sense.
There should be a “final save” option that writes your document with a minimal, lint-free, canonical set of styles.
I do corpus research with lots of text files. Having to deal with all that extra junk doesn’t make any sense.
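For corpus work, one possible workaround (a sketch, not a real fix for the markup itself) is to ignore the spans entirely and pull out only the text of each paragraph and heading. The helper name plain_paragraphs is hypothetical. The sketch assumes the standard ODF text namespace, and it does not expand text:s, text:tab or text:line-break elements, so runs of spaces, tabs and line breaks are lost. Paragraphs nested inside other paragraphs (for example in text frames) would show up twice.

```python
import zipfile
import xml.etree.ElementTree as ET

# Standard ODF namespace bound to the "text:" prefix.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
BLOCK_TAGS = {f"{{{TEXT_NS}}}p", f"{{{TEXT_NS}}}h"}


def plain_paragraphs(odt_path):
    """Return the text of each paragraph/heading in content.xml (hypothetical helper)."""
    with zipfile.ZipFile(odt_path) as archive:
        root = ET.fromstring(archive.read("content.xml"))
    # itertext() joins the text of all nested elements, so the text:span
    # wrappers simply disappear from the result.
    return ["".join(el.itertext()) for el in root.iter() if el.tag in BLOCK_TAGS]
```

With the example sentence above, the spans around “o” and “e” vanish and the paragraph comes back as ordinary text.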
Why is all that junk in ODF-formatted files?
Is there a way to “lint” ODF-formatted files?
lbrtchx
(reformatted by ajlittoz for better readability)
(One single semicolon added to get an opening angle bracket after the orphaned “e”. @Lupp)