Lookup matching regular expression in Calc - Addition

This is an addition to the similarly named but Closed Question

I managed to get this working with LibreOffice Version: 6.0.7.3 and wanted to share the solution in more detail.

The Original Poster’s Question:

I want to implement classification based on regular expressions.
In sheet1 I have

A       B       C
Item    Class   Wanted Class
apples1 =???    apples
apples2         apples
apples3         apples
Pear A          pears
Pear B          pears
Melon           other

In sheet2 I have:

A           B
Regex       Class
.*apples.*  apples
.*Pear.*    pears
.*          other

What formula to put in sheet1.B so they have the same values like C?
Does Calc have a function that can be used in formula that makes regular expression replace?

The Solution:

On sheet1 in Cell B2 Put the following:
=INDIRECT("$Sheet2.B"&MATCH(1,COUNTIF(A2,$Sheet2.$A$1:$A$4),0))

*Note: I don’t have the header row “Regex | Class” in my sheet2. I recommend deleting this header row so that the COUNTIF function doesn’t try to evaluate the header row A1


Explanation of what is happening:

COUNTIF() is one of the few functions that can evaluate a regular expression.
Substituting an array of regular expressions doesn’t seem to give COUNTIF() any trouble, by it’s self
=COUNTIF(A2,$Sheet2.$A$1:$A$4) just evaluates the A2 against Sheet2.A1 but when MATCH() which can take an array as input, is added; COUNTIF() evaluates each expression in turn and MATCH() returns the first that matches. i.e. The index of the first Regular Expression that Matches the string will be returned

If shee1.A3 contained apples and pears then apples would be returned because .*apples.* is the first regular expression to match.

The INDIRECT() function just takes the result of the MATCH() function, which is the index (row) of the regular expression that matched and returns the corresponding value in sheet2 column B.

=INDEX($Sheet2.$B$2:$B$4,MATCH(1,COUNTIF(A2,$Sheet2.$A$2:$A$4),0))

INDIRECT is evil!

What about OFFSET()?

(And: We still should use the semicolon as the default -globally understood- argument separator, imo.)

because the pair of INDEX|MATCH seems more straight-forward and easier!

of course we should, but I was not sure what happens if OP trows a semicolon-seperated Formula on his own comma-seperated Locale-Setting

My related comments here should still be valid.

Wow what a lively community we have here!

I prefer a semicolon, but don’t have the time or motivation to find out where to set it. I’m in Australia and use UK English, date and 24h time setting.

I found the original post and had to do some work to figure out how to get it to work. I was just sharing that work for others.

Lovely to see so many comments. :slight_smile:

The locale is the language which is used to transfer input into decimals, dates, times etc. See Tools>Options>LanguageSettings>Languages>Locale.

Semicolon as list separator is unambiguous and works with any locale. Comma as list separator works only if the decimal separator is a point as in English, Chinese, Japanese,… but not in French, Russian, German,… where the comma is the decimal separator.
If I paste a formula with comma separated arguments into my Calc, I get an error value because of my German locale. We can exchange semicolon formulas without any conflict. Internally, the formula is stored with semicolon anyway.