Regular expression references not working [closed]

asked 2013-12-31 05:43:56 +0200

eaglgenes101

updated 2014-01-02 03:01:28 +0200

([:print:]+)\p\1

I have the alt find and replace extension for LibreOffice. As I understand regexes, the expression above should match duplicate paragraphs like:

This is a paragraph.

This is a paragraph.

But it doesn't. Yet ([:print:]+)\p([:print:]+) matches any two consecutive paragraphs.
Am I understanding regex references wrong?

Closed for the following reason: the question is answered, right answer was accepted by Alex Kemp
close date 2015-11-17 01:42:03.204825

2 Answers

answered 2013-12-31 16:04:20 +0200

David

updated 2014-01-01 11:48:19 +0200

An interesting question. For clarification, are your "duplicate paragraphs" also contiguous (i.e., next to each other), or are they separated in the file? I'm assuming the former (e.g., in a sorted file, where duplicates end up together).

In fact, I can't seem to get your "working" expression to find anything in the AltSearch extension (assuming that is what you're using?).

The help file for AltSearch suggests there might be issues in using back-references (and more widely on that page).

Some older OpenOffice forum threads suggest that searching across paragraph boundaries is impossible (also, an older one), but I'm not exactly sure if that's the issue here.

After much experimenting, both with regular expressions in the "normal" CTRL-H dialog and with the AltSearch extension, I couldn't manage to find duplicate paragraphs. I would be fascinated to see a solution to this one!

Update: On a different machine now, and the expression eaglgenes101 provided does "work" - it finds two consecutive paragraphs. The explanation for why it is finding any two contiguous paragraphs, and not two successive identical paragraphs, is that ([:print:]+)\p([:print:]+) does not provide a "back reference".

In other words, it finds one sequence of printable characters ([:print:]), followed by an end-of-paragraph, followed by another sequence of printable characters, but nothing in the expression requires those two sequences to be the same. That's the job of "grouping and back-references": you would normally use \1 to refer back to the first grouped sequence (and \2 for the second grouped sequence, and so on). The expression ought, then, to look something like ([:print:]+)\p(\1) ... but that doesn't work in AltSearch.
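To make the difference between the two expressions concrete, here is a minimal sketch in Python's re module. This is not AltSearch or LibreOffice's engine, so the syntax is adapted: as an assumption for illustration, paragraphs are modelled as lines, "[^\n]+" stands in for [:print:]+, and "\n" stands in for \p. The sample text and the helper name find_pairs are made up.

```python
import re

# Assumption: one paragraph per line; "\n" plays the role of AltSearch's \p
# and "[^\n]+" plays the role of [:print:]+.
text = "Alpha.\nBeta.\nBeta.\nGamma."

# Two independent groups: matches ANY two consecutive paragraphs.
any_pair = re.compile(r"^([^\n]+)\n([^\n]+)$", re.MULTILINE)

# Back-reference \1: the second paragraph must repeat the first exactly.
duplicate_pair = re.compile(r"^([^\n]+)\n\1$", re.MULTILINE)


def find_pairs(pattern, s):
    """Return each non-overlapping match as a [first, second] paragraph pair."""
    return [m.group(0).split("\n") for m in pattern.finditer(s)]


any_matches = find_pairs(any_pair, text)
dup_matches = find_pairs(duplicate_pair, text)

print(any_matches)
print(dup_matches)
```

With the group-only pattern, unrelated neighbours such as "Alpha." and "Beta." are paired up; only the pattern with \1 restricts the match to the repeated "Beta." paragraphs.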

So there's a bit of a "Catch-22" here. AltSearch can find matches across paragraph boundaries, but it seems that its back-references are broken (well, "limited") in searches in some situations, including this scenario. On the other hand, back references work fine in LibO's CTRL-H + regex searching, but in this case the limitation is that you can't (apparently) search across paragraph boundaries.

It looks to me as though this problem has been registered in the bug tracker as fdo#58744 (to which I've added a comment and a link to this thread). It would be VERY good to have this fixed, or developed as an enhancement, whatever you want to call it. Maybe in 2014? (Updating on New Year's Day!)

Comments

Tried my working regex again. It still works. Just to clarify, the paragraphs have to be next to each other to match. http://imgur.com/0d3DVkG

eaglgenes101 (2014-01-01 07:14:51 +0200)

Just looking at this now in Writer Version: 4.3.0.4, and it seems I need to use $1 style back references, as the \1 style doesn't seem to work. FWIW.

David (2014-08-23 23:01:46 +0200)

answered 2013-12-31 07:38:44 +0200

L-user

updated 2013-12-31 07:41:47 +0200

I can't answer your question directly (I am not using the replace extension, as I have no such requirements)...

The funny thing about regular expressions is that there are many variants, so learning and understanding one system in one piece of software is no guarantee that the same logic can be applied in another product. LibreOffice uses regular expressions from this system: http://userguide.icu-project.org/strings/regexp#TOC-Regular-Expression-Metacharacters
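As a small illustration of how the same characters can mean different things in different variants, here is a sketch using Python's re module (chosen only as a convenient example; it is neither the ICU engine nor AltSearch). In Python, a bare [:print:] is not a "printable characters" class at all: it is an ordinary set containing the characters ':', 'p', 'r', 'i', 'n' and 't'. The variable names are made up for the example.

```python
import re

# In Python's re module, "[:print:]" is just a character set made of
# the literal characters ':', 'p', 'r', 'i', 'n', 't'.
pattern = re.compile(r"[:print:]+")

# "print" consists only of letters from that set, so it matches in full.
matches_literal_letters = pattern.fullmatch("print") is not None

# "Hello" is printable text, but none of its letters are in the set.
matches_general_text = pattern.fullmatch("Hello") is not None

print(matches_literal_letters, matches_general_text)
```

So an expression copied from one tool can be accepted by another without error and still match something quite different, which is worth checking before concluding that back-references are at fault.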
