Finding text in multiple formats

wf · September 5, 2018, 7:12pm

I have a document with text in italics followed by a period, a variable number of spaces, and then non-italics text. Sometimes the period and the spaces are in italics and sometimes not. I want to find all of the italics text followed by a period and spaces and replace the period and spaces with a non-italics colon and a single space.

The regular expression to do this is pretty simple: I find ([a-z])\. + and replace it with $1: . The problem is the formatting. If I set the find term to no format, then it picks up every sentence of non-italics text that ends with a period and a space. If I set the find term to Italics, then it only picks up those places where the period and spaces are in italics, which fails to catch a lot of what I want to change. I have tried selecting just the ([a-z]) and setting that to italics, and turning off formatting for the \. + part, but LibreOffice does not seem to support that functionality (or I am going about it incorrectly).

Furthermore, I cannot figure out how to get the replacement text to be partially in italics (the $1 match) and partially not (the : ). Either the colon appears in italics or the whole replacement is set to non-italics.

Here is an example where you can test this. (As you’ll see, the crucial point is that the italics sometimes include the period and spaces and sometimes don’t.) Thank you in advance for any ideas.

Here is an example. This is the first one. And another. And Another.

*Another example. * With. multiple. sentences.

And a third example for good measure. Try it. You’ll see.

mikekaganski · September 5, 2018, 7:50pm

Unfortunately, it seems that when using (negative)? look-(ahead|behind) assertions, we cannot use back references in the replacement box. So my initial idea to use a search string like ([a-z])(?=\. ) failed.

Anyway, you could use a three-stage procedure.

1. Find all occurrences of ([a-z])\b formatted as italics and replace with $1{enditalics} (the string “{enditalics}” is expected to be absent from the text prior to the operation).
2. Find all occurrences of \{enditalics\}[.] + with formatting turned off and replace with a colon and a single space (: ) in non-italics formatting.
3. Find all \{enditalics\} left (or simply {enditalics} with regexes disabled) and replace with an empty string.
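
To see why the three stages work, here is a rough Python simulation. It is only a sketch, not LibreOffice code: the text is modelled as runs of (string, is_italic), and the helper names replace_all and three_stage are made up for illustration. It assumes that a formatted search only matches text that is entirely in italics, and that Python's \b is close enough to the word boundary LibreOffice uses.

```python
import re

MARKER = "{enditalics}"  # placeholder from the procedure; must not occur in the text


def replace_all(text, flags, pattern, repl, repl_italic, require_italic=False):
    """Regex replace over text that carries one italic flag per character.

    If require_italic is set, a match counts only when every matched
    character is italic (a stand-in for a search with Italic format set).
    """
    out_text, out_flags, pos = [], [], 0
    for m in re.finditer(pattern, text):
        if require_italic and not all(flags[m.start():m.end()]):
            continue  # match is not fully italic: leave it alone
        out_text.append(text[pos:m.start()])
        out_flags.extend(flags[pos:m.start()])
        new = m.expand(repl)
        out_text.append(new)
        out_flags.extend([repl_italic] * len(new))
        pos = m.end()
    out_text.append(text[pos:])
    out_flags.extend(flags[pos:])
    return "".join(out_text), out_flags


def three_stage(runs):
    """runs: list of (string, is_italic). Returns (text, per-character flags)."""
    text = "".join(s for s, _ in runs)
    flags = [italic for s, italic in runs for _ in s]
    # Stage 1: mark the end of every italic word with the placeholder.
    text, flags = replace_all(text, flags, r"([a-z])\b", r"\1" + MARKER,
                              True, require_italic=True)
    # Stage 2: placeholder + period + spaces (any format) -> non-italic ": ".
    text, flags = replace_all(text, flags, r"\{enditalics\}[.] +", ": ", False)
    # Stage 3: remove the placeholders that were not followed by a period.
    text, flags = replace_all(text, flags, r"\{enditalics\}", "", False)
    return text, flags


if __name__ == "__main__":
    # Period and spaces inside the italics in one case, outside in the other.
    samples = [
        [("Here is an example", True), (". ", False), ("This is the first one.", False)],
        [("Another example.  ", True), ("With. multiple.", False)],
    ]
    for runs in samples:
        text, flags = three_stage(runs)
        print(text)
        print("".join("i" if f else "-" for f in flags))
```

The placeholder is what makes this work: stage 1 records "italics ended here" as plain text, so stage 2 no longer has to care whether the period and spaces themselves are italic.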

wf · September 6, 2018, 1:12pm

Thank you, your suggestion worked. I do wish that LO had the native ability to specify different formats for different portions of the find field. Perhaps that is worth a feature request.