Select portion of text with F&R and Regex

simurq · January 11, 2021, 11:30am

Hi there,

I’m trying to select the following portion of formatted text between 2 chapters (Тема ХХ) with F&R and Regex. But I think I’m missing something in the pattern. Can you please advise? Thanks!

So far I’ve been using this simple greedy pattern, to no avail: (тема \d{,2}).*(тема \d{,2})

simurq · January 11, 2021, 4:59pm

I’ve attached a sample document to this post for testing. I’ve also tried all your suggestions, but nothing works yet. Thank you anyway!

sample.odt

Lupp · January 11, 2021, 8:52pm

Please edit the question for such purposes (additional information, uploads, …).
The answer tool should only be used for answers, and no user is allowed a second answer to the same question here. (Again: use edits and comments.)

Lupp · January 11, 2021, 12:35pm

-1- The {,n} quantifier (with a default lower bound - which one?) is not supported. Use \d{1,2} (or \d+ with its more general but probably acceptable matching).

-2- F&R cannot search beyond the next paragraph break (with a few specialized exceptions).

The REGEX() function (available in LibO v 6.2 or higher) will accept the .String content of a complete .Text object (unless there is a limit on the number of characters that I don’t know of), but using it in Writer requires a bit of programming.

[edit 2020-01-11 about 13:40 UTC]
Applied the strikeout since I don’t feel sure about the relevant part of the statement.
Will probably try later to get it right.
Killed the strikeout again. Explanation below.
[/edit]
[edit 2020-01-11 about 20:45 UTC]
Attached a very raw and sloppy “proof of concept” for what I meant when talking of the REGEX() function.
It may be a bit tricky anyway because REGEX() does not seem to let the special character . (“any character”) match CR (or CR LF). A substitution is needed.
I wouldn’t want to base serious work on that routine, but you may enhance it.
ask286951findAccrossBreaksSample_re.odt
[/edit]
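
To illustrate the idea of matching against the complete text instead of single paragraphs, here is a minimal sketch in Python. It is not the attached macro and not LibreOffice code: the paragraph list is made up, and Python's re module stands in for the ICU engine. Where REGEX() needs a substitution for the line breaks, Python can let . match them with the DOTALL flag.

```python
import re

# Hypothetical paragraph texts standing in for a Writer document.
paragraphs = [
    "Тема 1",
    "First body paragraph.",
    "Second body paragraph.",
    "Тема 2",
    "Body of the next chapter.",
]

# Join the paragraphs into one string so a single match can span the breaks.
text = "\n".join(paragraphs)

# Without DOTALL, "." stops at every line break, so nothing matches.
plain = re.search(r"(тема \d{1,2}).*(тема \d{1,2})", text, re.IGNORECASE)

# With DOTALL, "." also matches "\n"; the lazy ".*?" stops at the next heading.
spanning = re.search(
    r"(тема \d{1,2}).*?(тема \d{1,2})", text, re.IGNORECASE | re.DOTALL
)

print(plain)
print(spanning.group(0))
```

The lazy .*? is a deliberate choice here: the greedy .* from the question would run on to the last heading in the text rather than the next one.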

keme · January 11, 2021, 1:48pm

The regex implementations I have seen will assume default from = zero, so:

(тема \d{0,2}).*(тема \d{0,2})

However, as @Lupp indicated, Writer does not perform matching across paragraph boundaries, so the pattern will never match a heading + body + heading sequence (which will cross at least two paragraph boundaries, probably more).
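
Since a single match cannot cross paragraph boundaries, one alternative is to test each paragraph on its own and then pick the range between two heading hits. A rough sketch in Python, assuming the paragraphs are already available as a list of strings (the function name and sample data are hypothetical, and nothing here touches the Writer API):

```python
import re

# Heading test applied to one paragraph at a time, like F&R does.
HEADING = re.compile(r"тема \d{1,2}", re.IGNORECASE)


def select_between_headings(paragraphs):
    """Return the paragraphs from the first heading up to, but not including, the next one."""
    heading_idx = [i for i, p in enumerate(paragraphs) if HEADING.match(p)]
    if len(heading_idx) < 2:
        return []  # fewer than two headings: nothing to select
    start, end = heading_idx[0], heading_idx[1]
    return paragraphs[start:end]


sample = ["Intro.", "Тема 1", "Body A.", "Body B.", "Тема 2", "Body C."]
selected = select_between_headings(sample)
print(selected)
```

This only works if the word ТЕМА is actually typed into the heading text; if it comes from the numbering scheme, the paragraph string would not contain it.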

Also, if the leading ТЕМА is not explicitly typed into the heading, but rather part of the numbering scheme, it is not detected by find/replace.

It may be possible to offer better help if you:

- supply a sample document, not just a picture of the document. A copy of your document, with only a page or two, should suffice.
- tell us why. What is it you are trying to do? Why do you need that particular match?

Lupp · January 11, 2021, 2:04pm

I tried the {,n} quantifier in LibreOffice recently, and it failed.
LibO uses the ICU regex engine, and its documentation (Regular Expressions - ICU Documentation) doesn’t list the construct either.
See also Regex Tutorial - Repetition with Star and Plus.
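
To show why the lower bound matters, here is a small Python sketch. Note that Python's re module is a different engine from ICU: it does accept {,n} and treats it as {0,n}, so this says nothing about LibreOffice's behavior and only illustrates what each bound matches. The sample strings are made up.

```python
import re

samples = ["тема 7", "тема 12", "тема "]

shorthand = re.compile(r"тема \d{,2}")       # Python reads this as {0,2}
explicit_zero = re.compile(r"тема \d{0,2}")  # zero to two digits
explicit_one = re.compile(r"тема \d{1,2}")   # one or two digits

for s in samples:
    print(
        repr(s),
        bool(shorthand.fullmatch(s)),
        bool(explicit_zero.fullmatch(s)),
        bool(explicit_one.fullmatch(s)),
    )
```

With a lower bound of zero, the bare word followed by a space already counts as a match; \d{1,2} requires at least one digit, and writing both bounds explicitly avoids depending on how an engine treats the shorthand.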