
Where do the `text:style-name="Tnn"` span tags come from, and how do I get rid of them?

asked 2018-03-09 13:24:42 +0200

David

updated 2018-03-09 13:29:09 +0200

I have a LibreOffice Writer document saved as a "flat" XML file (*.fodt file type). I am trying to apply regular expressions to it, using an external text editor.

My efforts are hampered because the document is littered with dozens of `<text:span text:style-name="Tnn"> ... </text:span>` wrappers. They seem to appear haphazardly, even between the characters of a single word, without any apparent change of "style" in the Writer view of the document itself. The Tnn names (e.g. T10) appear to refer to automatic style declarations containing attributes such as `officeooo:rsid="009e4655"`.

Of course, this makes it practically impossible to construct a regex that works across the document as a whole, since a wrapper can interrupt any word or phrase.
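To illustrate the problem, here is a minimal Python sketch using a made-up fragment (not taken from a real document) in which one wrapper splits a single word. A plain search fails, and the only regex-level workaround is to allow arbitrary tags between every pair of letters, which quickly becomes unmanageable for real patterns:

```python
import re

# Hypothetical fragment: the word "Hello" split by an automatic span, as in a .fodt file
fragment = 'Hel<text:span text:style-name="T10">lo</text:span> world'

# A plain search for the word fails because markup sits inside it
print(re.search(r"Hello", fragment))  # → None

# Workaround: allow any run of tags between consecutive letters
tags = r"(?:<[^>]+>)*"
pattern = tags.join(re.escape(ch) for ch in "Hello")
print(re.search(pattern, fragment) is not None)  # → True
```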

So, two questions:

  1. What are these wrappers and officeooo:rsid numbers?
  2. Is there an easy way to get rid of them?

Trying to remove them manually would be ridiculously difficult.


Note: this Q&A is related to the following:

· "Regular expressions to move punctuation from after to before superscripts"

· "Writer: clarification needed about character attributes"
 


1 Answer


answered 2018-03-09 13:43:21 +0200

David

updated 2018-03-09 13:46:32 +0200

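This matter was discussed in a bug report raised in 2013, fdo#68183. The comment trail explains that the `officeooo:rsid` attribute of `<style:text-properties>` (and one or two other related attributes) is part of the "OASIS OpenDocument 1.2 extended" format rather than plain ODF 1.2. These numbers appear to be revision save IDs (RSIDs), which record the editing session in which a piece of text was entered. Text entered in different sessions therefore ends up in different automatic Tnn styles even when it looks identical, which is why the wrappers can split a single word.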
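A Tnn style that exists only to carry an RSID looks roughly like `<style:style style:name="T10" style:family="text"><style:text-properties officeooo:rsid="009e4655"/></style:style>`, whereas a style with real formatting carries other attributes too. The following minimal Python sketch (standard library only; the sample document is invented for illustration) separates the two kinds by checking whether a text style's properties consist of nothing but rsid attributes. Treating `officeooo:paragraph-rsid` as ignorable as well is an assumption:

```python
import xml.etree.ElementTree as ET

# Namespace URIs as declared at the top of a LibreOffice .fodt file
NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "style": "urn:oasis:names:tc:opendocument:xmlns:style:1.0",
}
OOO = "http://openoffice.org/2009/office"
# Assumption: these two attributes are the only "revision" attributes worth ignoring
RSID_ATTRS = {f"{{{OOO}}}rsid", f"{{{OOO}}}paragraph-rsid"}
STYLE = "{" + NS["style"] + "}"


def rsid_only_styles(root):
    """Names of automatic text styles whose properties are nothing but rsid attributes."""
    names = set()
    for st in root.iterfind("office:automatic-styles/style:style", NS):
        if st.get(STYLE + "family") != "text":
            continue
        props = list(st)
        if props and all(
            p.tag == STYLE + "text-properties" and set(p.attrib) <= RSID_ATTRS
            for p in props
        ):
            names.add(st.get(STYLE + "name"))
    return names


# Invented sample: T1 carries real formatting (italic), T10 only an rsid
sample = """<office:document
 xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
 xmlns:style="urn:oasis:names:tc:opendocument:xmlns:style:1.0"
 xmlns:fo="urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0"
 xmlns:officeooo="http://openoffice.org/2009/office">
 <office:automatic-styles>
  <style:style style:name="T1" style:family="text">
   <style:text-properties fo:font-style="italic" officeooo:rsid="00a1b2c3"/>
  </style:style>
  <style:style style:name="T10" style:family="text">
   <style:text-properties officeooo:rsid="009e4655"/>
  </style:style>
 </office:automatic-styles>
</office:document>"""

print(sorted(rsid_only_styles(ET.fromstring(sample))))  # → ['T10']
```

To inspect a real document, pass `ET.parse(path).getroot()` for your own .fodt file instead of the sample.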

Although a "fix" was committed as long ago as version 4.5.0 (in "tdf#68183 sw: config option for disabling the creation of automatic RSID marks"), it isn't immediately clear how to use it to get rid of these marks.

I believe there are two approaches, one partial and one more thorough:

  1. Partial: select the text and use Format > Clear Direct Formatting. This will get rid of many (perhaps not all) of these wrappers, but it will also remove any genuine direct formatting (e.g. italics you have applied), so even this partial solution may come at a cost.

  2. Thorough: save the document with a different ODF format version. Go to Tools > Options > Load/Save > General to get to this dialog:

[Screenshot: the Tools > Options > Load/Save > General dialog]

You want one of the options that does not include "Extended": choose, e.g., 1.2 (plain), then save your document. This should get rid of most of the offending `<text:span text:style-name="Tnn"> ... </text:span>` wrappers, and in my tests, at least, it produces a document that can be meaningfully manipulated with regular expressions.
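If changing the save format is not an option, the wrappers can also be removed outside Writer. The following is a minimal Python sketch (standard library only), not a tested tool: it unwraps only those `text:span` elements whose automatic Tnn style contains nothing but rsid attributes, so spans carrying real formatting survive. Treating `officeooo:paragraph-rsid` as ignorable and the two-argument command line are assumptions for illustration. Work on a copy of the document.

```python
import sys
import xml.etree.ElementTree as ET

OFFICE = "{urn:oasis:names:tc:opendocument:xmlns:office:1.0}"
STYLE = "{urn:oasis:names:tc:opendocument:xmlns:style:1.0}"
TEXT = "{urn:oasis:names:tc:opendocument:xmlns:text:1.0}"
OOO = "{http://openoffice.org/2009/office}"
RSID_ATTRS = {OOO + "rsid", OOO + "paragraph-rsid"}  # assumption: ignorable attributes


def rsid_only_styles(root):
    """Names of automatic text styles that set nothing but rsid attributes."""
    names = set()
    auto = root.find(OFFICE + "automatic-styles")
    for st in ([] if auto is None else auto.findall(STYLE + "style")):
        props = list(st)
        if st.get(STYLE + "family") == "text" and props and all(
            p.tag == STYLE + "text-properties" and set(p.attrib) <= RSID_ATTRS
            for p in props
        ):
            names.add(st.get(STYLE + "name"))
    return names


def add_text(parent, index, text):
    """Append text right after child index-1, or to parent.text when index is 0."""
    if not text:
        return
    if index == 0:
        parent.text = (parent.text or "") + text
    else:
        prev = parent[index - 1]
        prev.tail = (prev.tail or "") + text


def unwrap(parent, span):
    """Replace span with its own text and child elements, in order."""
    idx = list(parent).index(span)
    kids = list(span)
    add_text(parent, idx, span.text)
    parent.remove(span)
    for offset, kid in enumerate(kids):
        parent.insert(idx + offset, kid)
    add_text(parent, idx + len(kids), span.tail)


def strip_rsid_spans(root):
    """Unwrap every text:span whose style only carries rsids; return how many."""
    names = rsid_only_styles(root)
    pairs = [
        (parent, child)
        for parent in root.iter()
        for child in parent
        if child.tag == TEXT + "span" and child.get(TEXT + "style-name") in names
    ]
    # Reverse document order: inner spans are unwrapped before the spans containing them
    for parent, span in reversed(pairs):
        unwrap(parent, span)
    return len(pairs)


def main(src, dst):
    # Reuse the file's own prefixes so the output still says text:span, style:style, ...
    for _, (prefix, uri) in ET.iterparse(src, events=("start-ns",)):
        ET.register_namespace(prefix, uri)
    tree = ET.parse(src)
    count = strip_rsid_spans(tree.getroot())
    tree.write(dst, encoding="UTF-8", xml_declaration=True)
    print(f"unwrapped {count} span(s)")


if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], sys.argv[2])
```

Usage (script and file names hypothetical): `python strip_rsid_spans.py input.fodt output.fodt`. Unlike Clear Direct Formatting, this leaves spans with real formatting (such as italics) in place; the now-unused Tnn declarations stay in `office:automatic-styles`, which should be harmless. ElementTree does not reproduce the file byte for byte (for example, it drops namespace declarations not used by any element or attribute name), so open the result in Writer and check it before relying on it.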


Comments

The above is as far as I have got; if there are other, and especially better, solutions, it would be good to know.

David (2018-03-09 13:43:57 +0200)

I'm coming at this from wanting to use an external revision control tool on .fodt files. That requires style names to be "stable": sequentially numbered anonymous styles are likely to cause storms of differences in the saved file, even for minor edits, whenever an edit causes another "P" style to be created or destroyed. The solution suggested elsewhere is "don't use anonymous styles", which works only if the anonymous styles are not being silently auto-generated.

oliverb (2018-10-31 10:02:59 +0200)