Ask Your Question
0

Calc: regex function

asked 2019-06-30 12:34:52 +0200

McMc gravatar image

updated 2019-06-30 13:07:39 +0200

Dennis R. gravatar image

In Calc (Version 6.2.4.2) there is a specific function available which is called »regex«. (https://help.libreoffice.org/6.2/en-U...)

Basically this is a neat feature, but the documentation doesn't help much of WHICH format the regex should be. The link to the ICU irritates more than it helps.

What I want to accomplish: Searching for a string (DOI format) in a cell.

The regex in Python 3.7 for this:

\b(10[.][0-9]{4,}(?:[.][0-9]+)*\/(?:(?![\"&\'<>])\S)+)\b

Test strings:

Comput Methods Programs Biomed. 2018 Nov;166:33-38. doi: 10.1016/j.cmpb.2018.09.006. Epub 2018 Sep 12. Forensic Sci Int Genet. 2019 Jan;38:39-47. doi: 10.1016/j.fsigen.2018.10.005. Epub 2018 Oct 9. Surv Ophthalmol. 2019 Mar - Apr;64(2):233-240. doi: 10.1016/j.survophthal.2018.09.002. Epub 2018 Sep 22. Review.

Regex101 matches correctly. Calc returns error code 508 after this formula: =REGEX(D2;"\b(10[.][0-9]{4,}(?:[.][0-9]+)*\/(?:(?![\"&\'<>])\s)+)\b"))

Which »Regex language« is needed for this function?

edit retag flag offensive close merge delete

1 Answer

Sort by » oldest newest most voted
0

answered 2019-06-30 13:06:59 +0200

Dennis R. gravatar image

Works like a charm, simply put your regex in another cell and use the refence to that in the formula. The \" part is a bit misleading for the engine...

Regex

edit flag offensive delete link more

Comments

The \" part is a bit misleading for the engine...

Why is this working and not as a string directly in the formula? I didn't try that though. Thank you very much!

McMc gravatar imageMcMc ( 2019-06-30 13:58:46 +0200 )edit

Because you use " in the middle of a string that you pass to a function, and that character naturally ends the string, everything else being trailing garbage giving wrong syntax for the spreadsheet formula. You need to escape the double quote in that case, using second double-quote, so that the formula looks like

=REGEX(D2;"\b(10[.][0-9]{4,}(?:[.][0-9]+)*\/(?:(?![\""&\'<>])\S)+)\b")

This is not about specific REGEX function, this is basic spreadsheet usage question about using double quotes in formula strings.

The link to the ICU irritates more than it helps

Well - LibreOffice uses ICU regex engine. So - that is the authoritative reference to the supported syntax.

Mike Kaganski gravatar imageMike Kaganski ( 2019-06-30 16:06:21 +0200 )edit

Thank you for the clarification.

McMc gravatar imageMcMc ( 2019-06-30 17:20:25 +0200 )edit
Login/Signup to Answer

Question Tools

1 follower

Stats

Asked: 2019-06-30 12:34:52 +0200

Seen: 78 times

Last updated: Jun 30