Can Regex help find missing closing »?

I have been counting the « and » in a big document and the numbers do not match.
« 3576
» 3565
I have tried this ([«|„|"|“])(\s*)(.+?)(\s*)([»|“|"|”]) to change the font of every matched quotation, and then searched for « and » that were still in the automatic font. I found a few, but as you can see some closing » are still missing.
Is there a way to look for « not followed by »?

I have tried this ([«|„|"|“])(\s*)(.+?)(\s*)([»|“|"|”])

[«|„|"|“] does not mean “any of the listed quote characters”, but “any of «, „, ", “, or |”. Note that the pipe symbol is included too, because inside the [] set syntax many special characters are treated as literals. You simply needed [«„"“].
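A quick way to see the difference outside Writer is a small Python sketch. It uses Python's re module rather than Writer's own regex engine, so take it only as an illustration of how a character set treats the pipe; the sample text is made up.

```python
import re

# Hypothetical sample text containing a literal pipe and a few quote marks.
text = 'a|b «c» „d“'

# Inside [...] the pipe is an ordinary character, so it is matched too.
with_pipes = re.findall(r'[«|„"“]', text)

# Listing the characters without pipes matches only the quote marks.
without_pipes = re.findall(r'[«„"“]', text)

print(with_pipes)
print(without_pipes)
```

The first list contains the stray |, the second does not.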

You likely need something like


to exclude the closing quotations in the text … but then nested quoting would need special treatment.
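One way to exclude the closing quotation marks from the quoted text is a negated character class. The pattern below is an assumption about what such an expression could look like, tried in Python's re module (not in Writer) on the sample sentence from this thread.

```python
import re

# Assumed pattern: opener, optional spaces, text without any closing mark,
# optional spaces, closer. The negated class [^»“"”] keeps a match from
# running past the first closing mark.
pair = re.compile(r'([«„"“])(\s*)([^»“"”]+?)(\s*)([»“"”])')

sample = 'Bla bla «abc», bla «abc ‹ab›bla»abc.'
matches = [m.group(0) for m in pair.finditer(sample)]
print(matches)
```

Each quotation is matched on its own. A quotation nested inside another one of the same kind would still end the match early, which is the special treatment mentioned above.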

Thank you, I tried ([«„"“])(\s*)(.*)(\s*)([»“"”]) and it does work. Now, if in a sentence I have:
Bla bla «abc», bla «abc ‹ab›bla»abc. This expression will select everything until the last » before the end of the paragraph. Is it possible to tell it to stop after each complete set: «abc»?

Thank you, I just realized, while trying your suggested regex, that you had already solved this last question. Now I get «abc» selected individually. Any ideas on how to search for « not followed by »?

I assume that “« not followed by »” means “« that is followed by another «, or by the end of the paragraph, before any »”:

([«„"“])\s*([^»“"”]+?)\s*([«„"“]|$)

but likely you would need to look for individual pairs separately, so as not to interfere with nested quotations:

«\s*([^»]+?)\s*(«|$)
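To check the idea on plain text, here is a minimal Python sketch for the «…» pair only. It assumes each paragraph is handled as a separate string, so that $ means the end of the paragraph; the function name find_unclosed and the sample paragraphs are made up for illustration, and Python's re module may differ in details from Writer's engine.

```python
import re

# « followed by text containing no », up to the next « or the end of the string.
UNCLOSED = re.compile(r'«\s*([^»]+?)\s*(«|$)')

def find_unclosed(paragraph):
    """Return the text following each « that is not closed by a »."""
    return [m.group(1) for m in UNCLOSED.finditer(paragraph)]

# Hypothetical paragraphs: one with an unclosed « in the middle,
# one that is balanced, one with an unclosed « running to the end.
print(find_unclosed('Bla «abc», bla «def bla «ghi» end.'))
print(find_unclosed('Fine «abc» here.'))
print(find_unclosed('Open «abc to the end'))
```

Since the match consumes the following «, two unclosed « in a row are reported only once. In engines with lookahead support, ending the pattern with (?=«|$) instead of («|$) should avoid that.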


I think that regexps have too little expressive power to handle nested quotations and unbalanced marks correctly at the same time. Also, + and * have the “greedy” property, meaning they will try to match as many characters as possible instead of stopping at the shortest match. IMHO you need a different tool than mere regexps. Unfortunately, external tools work on plain text (losing formatting). They can still be used to tell you where the plain text has an error, which you then fix in Writer (repeating the export-and-test cycle until nothing is reported).
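As an example of such an external check, here is a minimal Python sketch that walks through exported plain text with a stack. It is only one possible approach: the function name check_quotes is made up, and only the «» and ‹› pairs are handled, because straight quotes like " look the same when opening and closing.

```python
# Opening mark -> matching closing mark (assumed set; extend as needed).
PAIRS = {'«': '»', '‹': '›'}
CLOSERS = set(PAIRS.values())

def check_quotes(text):
    """Return (offset, mark, problem) tuples for unbalanced quote marks."""
    problems = []
    stack = []  # openers still waiting for their closer, as (mark, offset)
    for offset, ch in enumerate(text):
        if ch in PAIRS:
            stack.append((ch, offset))
        elif ch in CLOSERS:
            if stack and PAIRS[stack[-1][0]] == ch:
                stack.pop()  # closes the most recent opener
            else:
                problems.append((offset, ch, 'unexpected closer'))
    # Whatever is left on the stack was never closed.
    problems.extend((offset, ch, 'never closed') for ch, offset in stack)
    return sorted(problems)

print(check_quotes('«a ‹b› c» «d'))
print(check_quotes('a» b'))
```

Running it paragraph by paragraph keeps the reported offsets easy to locate again in Writer.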

Also + and * have the “greedy” property

… which is why @Earendil used the ? in the (.+?), to make it not greedy.
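The effect of the ? can be seen in a short Python sketch on the sample sentence from above (Python's re module, used here only to illustrate greedy versus non-greedy matching):

```python
import re

sample = 'Bla bla «abc», bla «abc ‹ab›bla»abc.'

# Greedy: .+ runs to the last » it can still match.
greedy = re.search(r'«(.+)»', sample).group(0)

# Non-greedy: .+? stops at the first » that completes the match.
lazy = re.search(r'«(.+?)»', sample).group(0)

print(greedy)
print(lazy)
```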

Thank you @mikekaganski This is working for me: ([«„"“])\s*([^»“"”]+?)\s*([«„"“]|$)
and I hope to be able to have a try at «\s*([^»]+?)\s*(«|$) too
I am really grateful for your help. I am not sure whether I marked your answer correctly as the accepted one.

@mikekaganski: thanks for correcting me, I was not careful enough when reading the regexp. Anyway, I still feel that regexps are not the best tool when trying to check for balanced delimiters. Unfortunately there is nothing else inside Writer.

I confirm that both ([«„"“])\s*([^»“"”]+?)\s*([«„"“]|$) and «\s*([^»]+?)\s*(«|$) worked very well for me! Thanks a lot!