Ask Your Question
2

Lookup matching regular expression in Calc [closed]

asked 2013-03-15 15:22:52 +0100

nb gravatar image

updated 2013-05-27 16:25:32 +0100

oweng gravatar image

I want to implement classification based on regular expressions. In sheet1 I have

A       B       C
Item    Class   Wanted Class
apples1 =???    apples
apples2         apples
apples3         apples
Pear A          pears
Pear B          pears
Melon           other

In sheet2 I have:

A           B
Regex       Class
.*apples.*  apples
.*Pear.*    pears
.*          other

What formula to put in sheet1.B so they have the same values like C? Does Calc have a function that can be used in formula that makes regular expression replace?

edit retag flag offensive reopen merge delete

Closed for the following reason the question is answered, right answer was accepted by Alex Kemp
close date 2016-02-19 05:51:09.391355

Comments

Do you mean this?

JohnSUN gravatar imageJohnSUN ( 2013-03-15 16:11:30 +0100 )edit

Yes, I want to use Calc exactly for the same purpose :)<br>However the solution there is not what is needed as far as I understand. It matches a pattern made of all words of the item against the class. When the item and class contain common words there will be wrong classifications. What is better the class to be a pattern that can be tuned to be unique even containing common words. I managed to get what i want using the array formula:

nb gravatar imagenb ( 2013-03-15 22:21:04 +0100 )edit

=INDEX(classes.$B$2:$B$10000,MATCH(0,ISERROR(SEARCH(regex,C406)), 0),1). The idea is that we have an array of regular expressions tested against the current row and then the row of the array element that produced a match is returned as a single result. regex is a named range defined as $classes.$A$2:$A$17, the class names are in the B column

nb gravatar imagenb ( 2013-03-15 22:32:27 +0100 )edit

Hi @nb, Could you please post the same content into an Answer? That would help us to resolve this question, and make it easier for any user who reads your question to find your Answer.

Thanks!

qubit gravatar imagequbit ( 2013-03-17 05:00:09 +0100 )edit
1

Does the INDEX method proposed in comment #3 above work? I couldn't get it to work here with the named ranges indicated.

oweng gravatar imageoweng ( 2013-05-27 16:29:09 +0100 )edit

1 Answer

Sort by » oldest newest most voted
1

answered 2014-01-29 22:07:44 +0100

hunteke gravatar image

updated 2014-01-29 22:09:24 +0100

I do not know how @nb solved the problem, but I offer a solution to the problem as described with COUNTIF. There are only a handful of functions that interpret strings as regular expressions, and COUNTIF is one of them. So, treating 1 as a TRUE value, and 0 as a FALSE value:

  A           B
Item        Class
apples1  =IF(COUNTIF(A2, ".*apple.*"), "apples", IF(COUNTIF(A2, ".*pear.*"), "pears", "other")
apples2  =IF(COUNTIF(A3, ".*apple.*"), "apples", IF(COUNTIF(A3, ".*pear.*"), "pears", "other")
apples3  =IF(COUNTIF(A4, ".*apple.*"), "apples", IF(COUNTIF(A4, ".*pear.*"), "pears", "other")
Pear A   =IF(COUNTIF(A5, ".*apple.*"), "apples", IF(COUNTIF(A5, ".*pear.*"), "pears", "other")
Pear B   =IF(COUNTIF(A6, ".*apple.*"), "apples", IF(COUNTIF(A6, ".*pear.*"), "pears", "other")
Melon    =IF(COUNTIF(A7, ".*apple.*"), "apples", IF(COUNTIF(A7, ".*pear.*"), "pears", "other")

The basic IF statement format is TEST, TRUE_ACTION, FALSE_ACTION. Since Calc doesn't care if we use multiple lines in the formula, we can break it up into a more manageable version with nested IF functions. Consider:

=IF(TEST1, TRUE_TEST1,
 IF(TEST2, TRUE_TEST2,
 IF(TEST3, TRUE_TEST3, ... )))

This makes writing the above formula more approachable, in, say, a text editor (Notepad, Gedit, TextWrangler, etc.). For example:

=IF(COUNTIF(A2, ".*apple.*"), "ANSWER IF CONTAINS APPLES",
 IF(COUNTIF(A2, ".*pear.*"), "ANSWER IF CONTAINS PEARS",
 "UNKNOWN FALLBACK"))

Then, once the formula is written, just fill down.

edit flag offensive delete link more

Comments

You are missing parenthesis in the first example.

Nicolas Raoul gravatar imageNicolas Raoul ( 2018-01-26 06:07:14 +0100 )edit

Question Tools

Stats

Asked: 2013-03-15 15:22:52 +0100

Seen: 5,241 times

Last updated: Jan 29 '14