I have a LibreOffice Writer document saved as a “flat” XML file (*.fodt file type). I am trying to apply regular expressions to it using an external text editor.
My efforts are hampered because the document is littered with dozens of <text:span text:style-name="Tnn"> ... </text:span> wrappers. They seem to appear haphazardly, even between characters of a single word, without any apparent change of “style” in the Writer view of the document itself. The Tnn numbers (e.g. T10) appear to be related to style declarations that include something like officeooo:rsid="009e4655" numbers.
Of course, this makes it impossible to construct any regex that works across the document as a whole.
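To illustrate the problem, here is a minimal Python sketch. The fragment is an invented example, not copied from the actual file: the word, the paragraph style name P1, and the variable names are all assumptions made for illustration. Stripping every tag is only a way to show what the Writer view displays, not a proposed fix.

```python
import re

# Hypothetical .fodt fragment (invented for illustration): the word
# "punctuation" is split in two by an automatically inserted span.
fragment = (
    '<text:p text:style-name="P1">punct'
    '<text:span text:style-name="T10">uation</text:span> matters</text:p>'
)

word = re.compile(r"punctuation")

# Searching the raw markup fails, because the span interrupts the word.
found_in_markup = word.search(fragment) is not None

# Removing all tags (illustration only) recovers the text as displayed.
visible_text = re.sub(r"<[^>]+>", "", fragment)
found_in_text = word.search(visible_text) is not None

print(found_in_markup, found_in_text)
```

The same pattern misses the word in the raw markup but finds it in the displayed text, which is the mismatch described above.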
So, two questions:
- What are these wrappers and officeooo:rsid numbers?
- Is there an easy way to get rid of them?
Trying to remove them manually would be ridiculously difficult.
Note: this Q&A is related to the following:
· “Regular expressions to move punctuation from after to before superscripts”