Regular expression not working

asked 2019-11-17 12:19:08 +0100

torreone

updated 2019-11-17 14:38:49 +0100

I'm running a search with each of the following regex patterns separately:

a) .$ (analogous to .?$)

b) $

Pattern a finds every paragraph that contains at least one character (a space counts), wherever it is, including paragraphs in nested table cells. But it does NOT find empty paragraphs.

Pattern b finds the end of every paragraph, including empty ones, but it does NOT find an empty paragraph that precedes a table, nor the paragraph in a table cell that contains only a single paragraph.

I wrote a simple regex that in theory should cover both patterns, namely (.$|$). I also tried ($|.$), (^$|.$) and ^$|.$

But this compound pattern returns only the results of pattern a, so it does not find the empty paragraphs.

Why?

Attachment: dumDaCanc.odt
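For comparison, the expectation behind the question can be checked in a general-purpose regex engine. The following is a minimal sketch using Python's `re` module, which is not the engine behind Writer's Find & Replace, so it only shows what "a union b" would look like in ordinary regex terms and says nothing about how LibreOffice behaves. The sample paragraphs and the helper name `matching_indexes` are hypothetical.

```python
import re

# Hypothetical sample: each string stands for one paragraph (no trailing newline).
paragraphs = ["first paragraph", "", "x", " ", ""]


def matching_indexes(pattern, paras):
    """Return the indexes of the paragraphs in which `pattern` finds a match."""
    rx = re.compile(pattern)
    return [i for i, p in enumerate(paras) if rx.search(p) is not None]


nonempty = matching_indexes(r".$", paragraphs)   # pattern a: needs one character
all_ends = matching_indexes(r"$", paragraphs)    # pattern b: bare end anchor
compound = matching_indexes(r".$|$", paragraphs) # pattern c: a or b

print(nonempty)  # [0, 2, 3]
print(all_ends)  # [0, 1, 2, 3, 4]
print(compound)  # [0, 1, 2, 3, 4]
```

In this engine the compound pattern does reach the empty paragraphs, but only through a zero-length match, which is the case the comments below identify as being handled differently in LibreOffice.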


Comments

I cannot easily understand the problem from the description, because it lacks one simple thing: a sample document in which I could easily see which of the expected places are not found by either of the regexes (which, by the way, should have been marked as preformatted text to avoid deformation by the site's formatting engine). Please provide a sample (with inline comments) so that testing is easier.

Mike Kaganski ( 2019-11-17 13:22:28 +0100 )

@Mike Kaganski, thank you very much. The regexes are (with the regular expression check box enabled):

a) .$   (dot + dollar)   //  Pattern 'a' works, both alone and as part of pattern 'c'
b)  $   (dollar only)    //  Pattern 'b' works only alone; it does not work inside pattern 'c'
c)  .$|$   or, equivalently,   (.$|$)   //  In compound pattern 'c' only the first alternative works

I have attached the file. In the post that opens the discussion I explained what it contains, and what is found and what is not found with the three patterns.

However, running the searches separately, first with only pattern 'a' and then with only pattern 'b', shows very clearly that pattern 'c', which should cover both 'a' and 'b', returns only the results of 'a', not of 'a' union 'b'. In addition, putting pattern 'b' between round brackets

($)

doesn't work either!

torreone ( 2019-11-17 14:01:20 +0100 )

I have attached the file

Unfortunately I don't see it

Mike Kaganski ( 2019-11-17 14:33:31 +0100 )

https://ask.libreoffice.org/upfiles/1...

I work with Version: 6.2.6.2

torreone ( 2019-11-17 14:39:25 +0100 )

I don't even think the problem is the pipe. Even the

 .?$

pattern finds all paragraphs containing at least one character, but does not find empty paragraphs (you can check this in the sample file I attached), nor single paragraphs in table cells.

torreone ( 2019-11-17 19:07:37 +0100 )

What do you want your regex to match, exactly?

  • Space at the end of any paragraph?
  • Paragraphs which are, or appear, empty (only whitespace)?
  • Something else?

keme ( 2019-11-17 23:24:06 +0100 )

As I mentioned, $ alone is treated specially.

When your pattern is "something and then $", the "$" is not something to match, but rather a postcondition: "that something must immediately precede end-of-paragraph". The end of paragraph itself is not included in the match, even if the preceding pattern is something like .? or \s?. Note this!

But when the pattern is a lone $, it is treated specially (^$ gets a similar special treatment, by the way), and in that case the end-of-paragraph itself is matched. You cannot get that with anything other than the single-character pattern $.
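The "postcondition" reading of $ can be seen by looking at what a match actually spans. This is a minimal sketch in Python's `re` module, used here only as a stand-in for generic anchor semantics; the special handling of a lone $ (and of ^$) described above is specific to LibreOffice's search and is not reproduced by it. The helper name `match_span` and the sample strings are hypothetical.

```python
import re


def match_span(pattern, paragraph):
    """Return (start, end, matched_text) of the first match, or None."""
    m = re.search(pattern, paragraph)
    return None if m is None else (m.start(), m.end(), m.group())


# The anchor adds no width: only the characters before it are in the match.
print(match_span(r".$", "abc"))     # (2, 3, 'c')
print(match_span(r".?$", "abc"))    # (2, 3, 'c')
print(match_span(r"\s?$", "abc "))  # (3, 4, ' ')

# With nothing to consume, the match is zero-length at the paragraph end.
print(match_span(r"\s?$", "abc"))   # (3, 3, '')
print(match_span(r".?$", ""))       # (0, 0, '')
print(match_span(r"$", "abc"))      # (3, 3, '')
```

The last three results are empty matches: the engine reports a position, not any text. That is the kind of result that, per the explanation above, the search returns only for the specially treated lone $.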

Mike Kaganski ( 2019-11-18 00:01:16 +0100 )

@keme and @Mike Kaganski, thanks. The real objective of this request is to use the search to quickly find all the paragraphs, empty and non-empty, in large pieces of text without using the enumeration object. By applying the textParagraph property to each instance found, I can obtain a reference to the paragraph, its position in the document, etc.

I'm testing code for selecting multiple paragraphs, starting from the help that Mike gave me in another discussion a few days ago. Most of my questions are related to this problem.

The search recognizes all the paragraphs even if there are tables or nested tables. The enumeration works well on plain text, but it stops and requires further processing when it encounters tables. The most important thing is to find the non-empty paragraphs, and for that pattern "a" is enough for me.
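The approach described here could look roughly like the following Python macro sketch. It assumes `doc` is a Writer document (for example the object returned by `XSCRIPTCONTEXT.getDocument()` inside a macro) and uses the document's search interface (`createSearchDescriptor`, `findAll`) together with the `TextParagraph` property of each found range. The function name `paragraphs_matching` is hypothetical, and the sketch has not been checked against the attached sample, so whether every found range exposes `TextParagraph` inside nested tables is an assumption taken from the description above.

```python
def paragraphs_matching(doc, pattern=".$"):
    """Collect the paragraph object behind every search hit.

    `doc` is assumed to be a Writer document supporting the UNO search
    interface; the default pattern is pattern "a" from the question,
    which hits the last character of each non-empty paragraph.
    """
    desc = doc.createSearchDescriptor()
    desc.SearchString = pattern
    desc.SearchRegularExpression = True  # same as ticking "Regular expressions"

    found = doc.findAll(desc)  # index container of text ranges

    paragraphs = []
    for i in range(found.getCount()):
        text_range = found.getByIndex(i)
        # Assumption: each hit exposes the paragraph that contains it.
        paragraphs.append(text_range.TextParagraph)
    return paragraphs
```

Because pattern "a" yields at most one hit per paragraph, the returned list would have one entry per non-empty paragraph, in the order the search reports them.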

But I'm also interested in the why of ...

torreone ( 2019-11-18 00:14:04 +0100 )

@Mike Kaganski: I have to think about your answer. Today I read something similar on other sites that explain how the regex engine works; these are not immediately obvious concepts. Thanks for now.

Furthermore, I still don't understand how this forum works. I discovered your answers only by chance: in the list of discussions sorted by date or activity, only my last contribution from several hours ago was shown.

How do I sort the discussions by latest update, so that new comments or edits to existing comments are taken into account?

torreone ( 2019-11-18 00:19:55 +0100 )