Regular expressions to move punctuation from after to before superscripts

I am editing a manuscript with many references formatted as superscripts before the punctuation. However, the journal style guide requires that references in superscripts come after punctuation.

I would like to know how regular expressions can move the punctuation so that it comes before the superscripts. In most cases, the punctuation is periods, but there may also be some semi-colons and commas.

The very first question: are the superscripts real superscripts or not?

Also important to know which version of LibO you’re using (and on what platform) – something using this approach might work for you, but the AltSearch extension doesn’t work for everyone.

Good questions. Gabix, if you are asking if the superscripts are styles or directly formatted, the answer is that they are directly formatted.

I am using LibreOffice 5.4 and Windows 10.

David, I have not had problems using the Alternate Find and Replace extension, if that is what you mean. The problem is that I don’t know the codes or expressions needed to do what I want.

@catbill - the syntax for searching for footnote anchors is in that Q&A I linked, but here’s the AltSearch help page which also provides examples. If you could convince AltSearch to work (I can’t on the machine I’m presently using), then it should be possible to construct some regex to help you. Let us know how you get on!

Fortunately, searching for superscripts finds each set of references. For example, it highlights (5,7,9-11). That is great. However, I am stuck on how to specify that the superscripts should be followed by a period (or semi-colon). It seems that this should be relatively simple, but I can’t figure out how to do it.
Also, once this is figured out, is there anything tricky about specifying that the replacement should have the period before the superscript?

The problem with the inbuilt search engine is that if you specify superscript, it applies that attribute to every item in the search term. So I don’t see a way to search for a superscript number followed by a normal punctuation mark, which suggests a macro is necessary.

Thanks, paul1149. I am not quite sure what you are saying, but to be clear, I want the search to apply to all of the items in the search term. I do not want to search for a specific number.
Currently, the text looks like this: (5,7,12-14).
The parentheses and numbers are superscripts. The period is normal text, not superscript. All I want to do is move the period so that it comes before the superscript, not after.

This is how I see it. Because the LO search engine applies formatting to the whole search string, you cannot search for a superscript followed by anything that is not superscript, so that route is out. If the trailing character were superscript too, you could use this:

With Regex enabled,

Search string: (.*)([\.?!;,])

search string properties: None,
superscript, automatic

Replace string: $2$1

But that won’t work in your case, because it isn’t.

Now, there may be a workaround. If the only numbers in parentheses and followed by punctuation are the references you want to hit, you could use this search:

Search string: (\([0-9,-]*\))([\.?!,;])

Replace string: $2$1

[x] Regex enabled.

But if there are other numbers in parentheses followed by those punctuations, then it will catch them as well.
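
To see what that search and replace pair does to the characters alone, here is a minimal sketch in Python rather than in LibreOffice. It is only an illustration: plain strings carry no superscript formatting, Python writes the back references as \2\1 instead of $2$1, and the function name and sample sentence are made up for the example.

```python
import re

# Same pattern as the search string above: a parenthesised run of digits,
# commas and hyphens, followed by one punctuation mark.
REF_THEN_PUNCT = re.compile(r"(\([0-9,-]*\))([\.?!,;])")

def move_punctuation(text: str) -> str:
    """Swap each '(refs)' group with the punctuation mark that follows it."""
    # Python spells the back references \2\1 where LibreOffice uses $2$1.
    return REF_THEN_PUNCT.sub(r"\2\1", text)

# Hypothetical sample: two genuine references and one ordinary year in parentheses.
sample = "Earlier work agrees(5,7,12-14). Others differ(3); see also (2018)."
print(move_punctuation(sample))
# prints: Earlier work agrees.(5,7,12-14) Others differ;(3) see also .(2018)
```

The last match, (2018), shows the false positive mentioned above: the pattern cannot tell a reference from any other number in parentheses.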

Unless I’m missing something, or someone knows how to conjure the Alternative Search engine to do this, the only other way is by macro.

Thank you for looking into this some more. When I tried this in LO Find and Replace, it selected what was needed. After the replacement, the period was in the right place, to the left of the superscript, but the period itself had also become superscript.
With Alt Search, the replacement was literally $2$1.
Any other thoughts?

Alt Search uses more standard nomenclature. For Replace you would use \2\1.

If the superscript numbers are autogenerated and use the “Footnote Anchor” character style, then a “simple” regex ([0-9]) does not find them – at least not on 6.0.2.1 under Ubuntu. The AltSearch [::Footnote::] search will find them, but searches of this kind cannot be used with back references, or “subexpressions”, as the AltSearch help calls them. Catch 22.

In my tests on this intriguing problem, I have run up against an obstacle for each of the most obvious solutions:

  • when using built-in regex search for footnote anchor numbers, [0-9] does not find them;
  • and although AltSearch’s [::Footnote::] search does find them, this kind of search cannot be used with “back references”.

I have tested a solution that works OUTSIDE of LibreOffice. It requires:

  1. Saving the file as an *.fodt, or “flat” XML file;
  2. Using a text editor with regex capabilities to modify that “flat” XML file; and
  3. Starting from a “clean” source file, since the chances of success VERY MUCH depend on the nature of the code in this file.

I have put a copy of my test fodt file on Google Drive if you would like to see or use it. It’s just “Lorem ipsum” with strategically placed footnotes, mostly before the set ,;:.!?, but also a couple that should not be changed.

My text editor of choice is the lovely Textadept (cross platform) which uses the TRE regex engine. Loading my test-fn-punct0.fodt file in Textadept, and using this regular expression search string:

(<text:note .*text:note-class="foot.*<text:note-body>\n.*</text:note>)([,;:\.!?])

(including the \n code is essential) with this as the replace phrase:

\2\1

converts all the pre-punctuation footnotes into post-punctuation footnotes. This file can then be re-opened in LibreOffice Writer for further editing, or “save-as” .odt or whatever you like. The search in the editor looks like this:

[Screenshot: the search and replace fields in Textadept]
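
For anyone who would rather script this than use an editor, the same swap can be sketched in Python. This is an adaptation, not the exact expression above: it uses Python's re module instead of TRE, and non-greedy .*? in place of .* so that a match stays as short as possible. The NOTE fragment is a hypothetical footnote written in the shape I see in my test file (a line break straight after <text:note-body>); real files may differ.

```python
import re

# Hypothetical footnote element; the body starts on a new line, as in the .fodt file.
NOTE = (
    '<text:note text:id="ftn1" text:note-class="footnote">'
    '<text:note-citation>1</text:note-citation><text:note-body>\n'
    '<text:p text:style-name="Footnote">First note.</text:p>'
    '</text:note-body></text:note>'
)
SAMPLE = "Lorem ipsum" + NOTE + ". Dolor sit amet"

# Group 1: one whole footnote element; group 2: the punctuation right after it.
# "." does not match a line break here, so the explicit \n is still required.
NOTE_THEN_PUNCT = re.compile(
    r'(<text:note .*?text:note-class="foot.*?<text:note-body>\n.*?</text:note>)'
    r'([,;:.!?])'
)

def move_punctuation_before_notes(xml: str) -> str:
    """Put the punctuation mark in front of the footnote that precedes it."""
    return NOTE_THEN_PUNCT.sub(r"\2\1", xml)

result = move_punctuation_before_notes(SAMPLE)
print(result.startswith("Lorem ipsum.<text:note "))
```

To run it on a real file you would read the .fodt as UTF-8 text, pass the contents through the function, and write the result to a new file, keeping the original as a backup. A footnote whose body spans several lines will not match this pattern.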

Caveat - I have tried this with a “real world” academic article, and … the source code in the fodt file was a mess. There were loads and loads of <text:span text:style-name="Tnn"> ... </text:span> wrappers, and I have no idea where they came from or how they’re being used.[1] That means my regex, above, is pretty useless for this file. If there were a way of cleaning out those “text:span” wrappers (which are everywhere!), then it would be possible.
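
One possible way to clean out those wrappers, offered as a rough sketch only, is to unwrap every span whose style name is "T" plus a number and keep the text inside it. Big assumptions: the spans are not nested, and the Tnn styles in the file carry nothing you want to keep. Automatic styles can also hold real direct formatting such as italics or superscript, and this would silently drop it, so check the style definitions and work on a copy. The function name is hypothetical.

```python
import re

# Matches a span with an automatic "T<number>" style and captures its contents.
# Assumes spans are not nested inside one another.
T_SPAN = re.compile(r'<text:span text:style-name="T\d+">(.*?)</text:span>')

def unwrap_t_spans(xml: str) -> str:
    """Remove T-numbered span wrappers but keep the text they enclose."""
    return T_SPAN.sub(r"\1", xml)

# Hypothetical paragraph with one automatic span and one named-style span.
messy = (
    '<text:p>Lo<text:span text:style-name="T12">rem</text:span> '
    '<text:span text:style-name="Emphasis">ipsum</text:span></text:p>'
)
print(unwrap_t_spans(messy))
```

Spans that use a named character style, like the "Emphasis" one in the sample, are left alone because their style name does not match T plus digits.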

I don’t know if this is any use for @catbill, but it is at least one “solution”.


[1] A little more investigating shows these to be related to officeooo:rsid numbers and autocreated “styles” – for a description of the “issue” see fdo#68183 – I have posted a Q&A about them which offers a bit of help for getting rid of them.