Ask LibreOffice - RSS feedhttps://ask.libreoffice.org/en/questions/Questions and answers for LibreOfficeenTue, 21 Nov 2017 10:51:44 +0100Counting rows where specific combination of values from two columns occurhttps://ask.libreoffice.org/en/question/138593/counting-rows-where-specific-combination-of-values-from-two-columns-occur/Hi!
I will try to explain what I'm looking for with an example.
Given the table below
| A B C
-|---------------------
1| AA- XYZ a
2| AD- ZYX b
3| AA- AB- c
4| XYZ ZYX d
5| ZYX DB_ e
6| AC- DA_ f
7| DA_ AB- g
(see CSV at the end)
I need to count all the rows where the first (A) and second (B) column **both** contain a value that belongs to the set `{AA-, AB-, AC-, AD-, DA_, DB_, DC_, DD_}`. For the example above only row 3, 6 and 7 will be taken into consideration because both of the "important" columns contain values that are from that set.
Using `COUNTIFS` I tried to come up with a regular expression. I've used the `?` since the values are limited and there is not chance of having something like `AX-` or `DG_`. Even the values that are not allowed are a part of a set on their own but since it's a larger set I have decided to use inclusion instead of exclusion:
=COUNTIFS(A1:A7, "[A?-|D?_]", B1:B7, "[A?-|D?_]")
Since I'm new to LibreOffice and Calc in particular I was hoping to get an explanation why this expression always returns and how to fix it.
I need this in order to verify some results from a software I'm working on which uses a similar table to extract the respective rows and check some of their values. Manual verification is really tedious. Just one table only contains ~90K rows and the set of allowed values is much larger which leads to a huge pool of possible combinations. That is why I decided to check some of the stats my software prints out and compare those with the results from LibreCalc.
---
The example table from above as CSV (delimiter is space):
AA- XYZ a
AD- ZYX b
AA- AB- c
XYZ ZYX d
ZYX DB_ e
AC- DA_ f
DA_ AB- gTue, 21 Nov 2017 09:44:10 +0100https://ask.libreoffice.org/en/question/138593/counting-rows-where-specific-combination-of-values-from-two-columns-occur/Answer by pierre-yves samyn for <p>Hi!</p>
<p>I will try to explain what I'm looking for with an example.</p>
<p>Given the table below</p>
<pre><code> | A B C
-|---------------------
1| AA- XYZ a
2| AD- ZYX b
3| AA- AB- c
4| XYZ ZYX d
5| ZYX DB_ e
6| AC- DA_ f
7| DA_ AB- g
</code></pre>
<p>(see CSV at the end)</p>
<p>I need to count all the rows where the first (A) and second (B) column <strong>both</strong> contain a value that belongs to the set <code>{AA-, AB-, AC-, AD-, DA_, DB_, DC_, DD_}</code>. For the example above only row 3, 6 and 7 will be taken into consideration because both of the "important" columns contain values that are from that set.</p>
<p>Using <code>COUNTIFS</code> I tried to come up with a regular expression. I've used the <code>?</code> since the values are limited and there is not chance of having something like <code>AX-</code> or <code>DG_</code>. Even the values that are not allowed are a part of a set on their own but since it's a larger set I have decided to use inclusion instead of exclusion:</p>
<pre><code>=COUNTIFS(A1:A7, "[A?-|D?_]", B1:B7, "[A?-|D?_]")
</code></pre>
<p>Since I'm new to LibreOffice and Calc in particular I was hoping to get an explanation why this expression always returns and how to fix it.</p>
<p>I need this in order to verify some results from a software I'm working on which uses a similar table to extract the respective rows and check some of their values. Manual verification is really tedious. Just one table only contains ~90K rows and the set of allowed values is much larger which leads to a huge pool of possible combinations. That is why I decided to check some of the stats my software prints out and compare those with the results from LibreCalc.</p>
<hr>
<p>The example table from above as CSV (delimiter is space):</p>
<pre><code>AA- XYZ a
AD- ZYX b
AA- AB- c
XYZ ZYX d
ZYX DB_ e
AC- DA_ f
DA_ AB- g
</code></pre>
https://ask.libreoffice.org/en/question/138593/counting-rows-where-specific-combination-of-values-from-two-columns-occur/?answer=138600#post-id-138600Hi
**[EDIT]**
I delete my answer that does not correspond to the need of the OP
<strike>You have to escape the character `-` ("between" operator, see [help](https://help.libreoffice.org/Common/List_of_Regular_Expressions))
=COUNTIFS(A1:A7; "[A?\-|D?_]"; B1:B7; "[A?\-|D?_]")
</strike>
RegardsTue, 21 Nov 2017 10:04:15 +0100https://ask.libreoffice.org/en/question/138593/counting-rows-where-specific-combination-of-values-from-two-columns-occur/?answer=138600#post-id-138600Comment by Mike Kaganski for <p>Hi</p>
<p><strong>[EDIT]</strong></p>
<p>I delete my answer that does not correspond to the need of the OP</p>
<p><strike>You have to escape the character <code>-</code> ("between" operator, see <a href="https://help.libreoffice.org/Common/List_of_Regular_Expressions">help</a>)</strike></p><strike>
<pre><code>=COUNTIFS(A1:A7; "[A?\-|D?_]"; B1:B7; "[A?\-|D?_]")
</code></pre>
</strike><p><strike></strike></p>
<p>Regards</p>
https://ask.libreoffice.org/en/question/138593/counting-rows-where-specific-combination-of-values-from-two-columns-occur/?comment=138603#post-id-138603This regex is incorrect wrt what OP wants: it returns true only if the cell contain only one character in the set `A, ?, -, |, D, _`Tue, 21 Nov 2017 10:20:14 +0100https://ask.libreoffice.org/en/question/138593/counting-rows-where-specific-combination-of-values-from-two-columns-occur/?comment=138603#post-id-138603Comment by rbaleksandar for <p>Hi</p>
<p><strong>[EDIT]</strong></p>
<p>I delete my answer that does not correspond to the need of the OP</p>
<p><strike>You have to escape the character <code>-</code> ("between" operator, see <a href="https://help.libreoffice.org/Common/List_of_Regular_Expressions">help</a>)</strike></p><strike>
<pre><code>=COUNTIFS(A1:A7; "[A?\-|D?_]"; B1:B7; "[A?\-|D?_]")
</code></pre>
</strike><p><strike></strike></p>
<p>Regards</p>
https://ask.libreoffice.org/en/question/138593/counting-rows-where-specific-combination-of-values-from-two-columns-occur/?comment=138602#post-id-138602Hmm, doesn't work. I even removed the `---` and `___` from a couple of rows and still nothing. Then I went further and created 5 rows with only `AAA` and `BBB` in each column (of course I adapted the reg ex you've kindly provided to fit the new pattern) - nothing.Tue, 21 Nov 2017 10:19:00 +0100https://ask.libreoffice.org/en/question/138593/counting-rows-where-specific-combination-of-values-from-two-columns-occur/?comment=138602#post-id-138602Comment by rbaleksandar for <p>Hi</p>
<p><strong>[EDIT]</strong></p>
<p>I delete my answer that does not correspond to the need of the OP</p>
<p><strike>You have to escape the character <code>-</code> ("between" operator, see <a href="https://help.libreoffice.org/Common/List_of_Regular_Expressions">help</a>)</strike></p><strike>
<pre><code>=COUNTIFS(A1:A7; "[A?\-|D?_]"; B1:B7; "[A?\-|D?_]")
</code></pre>
</strike><p><strike></strike></p>
<p>Regards</p>
https://ask.libreoffice.org/en/question/138593/counting-rows-where-specific-combination-of-values-from-two-columns-occur/?comment=138604#post-id-138604I'm looking for matching the complete AA-, AB- etc. and not just a chunk of it.Tue, 21 Nov 2017 10:23:26 +0100https://ask.libreoffice.org/en/question/138593/counting-rows-where-specific-combination-of-values-from-two-columns-occur/?comment=138604#post-id-138604Comment by pierre-yves samyn for <p>Hi</p>
<p><strong>[EDIT]</strong></p>
<p>I delete my answer that does not correspond to the need of the OP</p>
<p><strike>You have to escape the character <code>-</code> ("between" operator, see <a href="https://help.libreoffice.org/Common/List_of_Regular_Expressions">help</a>)</strike></p><strike>
<pre><code>=COUNTIFS(A1:A7; "[A?\-|D?_]"; B1:B7; "[A?\-|D?_]")
</code></pre>
</strike><p><strike></strike></p>
<p>Regards</p>
https://ask.libreoffice.org/en/question/138593/counting-rows-where-specific-combination-of-values-from-two-columns-occur/?comment=138606#post-id-138606@Mike Kaganski - You're right of course, I read the question too quickly :(
@rbaleksandar - Sorry for the loss of time.Tue, 21 Nov 2017 10:31:05 +0100https://ask.libreoffice.org/en/question/138593/counting-rows-where-specific-combination-of-values-from-two-columns-occur/?comment=138606#post-id-138606Comment by rbaleksandar for <p>Hi</p>
<p><strong>[EDIT]</strong></p>
<p>I delete my answer that does not correspond to the need of the OP</p>
<p><strike>You have to escape the character <code>-</code> ("between" operator, see <a href="https://help.libreoffice.org/Common/List_of_Regular_Expressions">help</a>)</strike></p><strike>
<pre><code>=COUNTIFS(A1:A7; "[A?\-|D?_]"; B1:B7; "[A?\-|D?_]")
</code></pre>
</strike><p><strike></strike></p>
<p>Regards</p>
https://ask.libreoffice.org/en/question/138593/counting-rows-where-specific-combination-of-values-from-two-columns-occur/?comment=138608#post-id-138608No problem. :DTue, 21 Nov 2017 10:33:12 +0100https://ask.libreoffice.org/en/question/138593/counting-rows-where-specific-combination-of-values-from-two-columns-occur/?comment=138608#post-id-138608Answer by Mike Kaganski for <p>Hi!</p>
<p>I will try to explain what I'm looking for with an example.</p>
<p>Given the table below</p>
<pre><code> | A B C
-|---------------------
1| AA- XYZ a
2| AD- ZYX b
3| AA- AB- c
4| XYZ ZYX d
5| ZYX DB_ e
6| AC- DA_ f
7| DA_ AB- g
</code></pre>
<p>(see CSV at the end)</p>
<p>I need to count all the rows where the first (A) and second (B) column <strong>both</strong> contain a value that belongs to the set <code>{AA-, AB-, AC-, AD-, DA_, DB_, DC_, DD_}</code>. For the example above only row 3, 6 and 7 will be taken into consideration because both of the "important" columns contain values that are from that set.</p>
<p>Using <code>COUNTIFS</code> I tried to come up with a regular expression. I've used the <code>?</code> since the values are limited and there is not chance of having something like <code>AX-</code> or <code>DG_</code>. Even the values that are not allowed are a part of a set on their own but since it's a larger set I have decided to use inclusion instead of exclusion:</p>
<pre><code>=COUNTIFS(A1:A7, "[A?-|D?_]", B1:B7, "[A?-|D?_]")
</code></pre>
<p>Since I'm new to LibreOffice and Calc in particular I was hoping to get an explanation why this expression always returns and how to fix it.</p>
<p>I need this in order to verify some results from a software I'm working on which uses a similar table to extract the respective rows and check some of their values. Manual verification is really tedious. Just one table only contains ~90K rows and the set of allowed values is much larger which leads to a huge pool of possible combinations. That is why I decided to check some of the stats my software prints out and compare those with the results from LibreCalc.</p>
<hr>
<p>The example table from above as CSV (delimiter is space):</p>
<pre><code>AA- XYZ a
AD- ZYX b
AA- AB- c
XYZ ZYX d
ZYX DB_ e
AC- DA_ f
DA_ AB- g
</code></pre>
https://ask.libreoffice.org/en/question/138593/counting-rows-where-specific-combination-of-values-from-two-columns-occur/?answer=138598#post-id-138598E.g., your reference values `AA-, AB-, AC-, AD-, DA_, DB_, DC_, DD_` are in D1-D8. Then, formula
=SUMPRODUCT(NOT(ISNA(VLOOKUP(A1:A7;$D$1:$D$8;1;0)));NOT(ISNA(VLOOKUP(B1:B7;$D$1:$D$8;1;0))))
does what you need.
If you enable regular expressions in formulas, then
=SUMPRODUCT(ISNUMBER(SEARCH("^(A.-|D._)$";A1:A7));ISNUMBER(SEARCH("^(A.-|D._)$";B1:B7)))
will also work. As well as this:
=COUNTIFS(A1:A7;"A.-|D._";B1:B7;"A.-|D._")
The explanation why the `[A?-|D?_]` expression is wrong:
1. The [] brackets in regex define a **set** - i.e., "in this place any of the following is allowed". So, in your example (`=COUNTIFS(A1:A7, "[A?-|D?_]", B1:B7, "[A?-|D?_]")`), you looked for cells containing one-char values in the set defined in the square brackets. Aside: the expression in square brackets has special treatment of `-`, as @pierre-yves samyn noted: it means range. So, your charset included A, *all chars* between ? and | inclusive, D, and _.
2. The intended usage of ? is also wrong. You implied it to mean "any character in this position; mandatory present". If used in regexes (outside of sets), the meaning of unescaped `?` is to mean "zero or one of previous expression".
See [here](http://userguide.icu-project.org/strings/regexp) the full list of regular expressions in ICU (used in LO).Tue, 21 Nov 2017 09:58:18 +0100https://ask.libreoffice.org/en/question/138593/counting-rows-where-specific-combination-of-values-from-two-columns-occur/?answer=138598#post-id-138598Comment by rbaleksandar for <p>E.g., your reference values <code>AA-, AB-, AC-, AD-, DA_, DB_, DC_, DD_</code> are in D1-D8. Then, formula</p>
<pre><code>=SUMPRODUCT(NOT(ISNA(VLOOKUP(A1:A7;$D$1:$D$8;1;0)));NOT(ISNA(VLOOKUP(B1:B7;$D$1:$D$8;1;0))))
</code></pre>
<p>does what you need.</p>
<p>If you enable regular expressions in formulas, then</p>
<pre><code>=SUMPRODUCT(ISNUMBER(SEARCH("^(A.-|D._)$";A1:A7));ISNUMBER(SEARCH("^(A.-|D._)$";B1:B7)))
</code></pre>
<p>will also work. As well as this:</p>
<pre><code>=COUNTIFS(A1:A7;"A.-|D._";B1:B7;"A.-|D._")
</code></pre>
<p>The explanation why the <code>[A?-|D?_]</code> expression is wrong:</p>
<ol>
<li>The [] brackets in regex define a <strong>set</strong> - i.e., "in this place any of the following is allowed". So, in your example (<code>=COUNTIFS(A1:A7, "[A?-|D?_]", B1:B7, "[A?-|D?_]")</code>), you looked for cells containing one-char values in the set defined in the square brackets. Aside: the expression in square brackets has special treatment of <code>-</code>, as <a href="/en/users/15459/pierre-yves-samyn/">@pierre-yves samyn</a> noted: it means range. So, your charset included A, <em>all chars</em> between ? and | inclusive, D, and _.</li>
<li>The intended usage of ? is also wrong. You implied it to mean "any character in this position; mandatory present". If used in regexes (outside of sets), the meaning of unescaped <code>?</code> is to mean "zero or one of previous expression".</li>
</ol>
<p>See <a href="http://userguide.icu-project.org/strings/regexp">here</a> the full list of regular expressions in ICU (used in LO).</p>
https://ask.libreoffice.org/en/question/138593/counting-rows-where-specific-combination-of-values-from-two-columns-occur/?comment=138599#post-id-138599Is it possible to really use `A?-`? As I wrote this is just a simple example but in reality the number of allowed values is rather big.Tue, 21 Nov 2017 10:00:54 +0100https://ask.libreoffice.org/en/question/138593/counting-rows-where-specific-combination-of-values-from-two-columns-occur/?comment=138599#post-id-138599Comment by rbaleksandar for <p>E.g., your reference values <code>AA-, AB-, AC-, AD-, DA_, DB_, DC_, DD_</code> are in D1-D8. Then, formula</p>
<pre><code>=SUMPRODUCT(NOT(ISNA(VLOOKUP(A1:A7;$D$1:$D$8;1;0)));NOT(ISNA(VLOOKUP(B1:B7;$D$1:$D$8;1;0))))
</code></pre>
<p>does what you need.</p>
<p>If you enable regular expressions in formulas, then</p>
<pre><code>=SUMPRODUCT(ISNUMBER(SEARCH("^(A.-|D._)$";A1:A7));ISNUMBER(SEARCH("^(A.-|D._)$";B1:B7)))
</code></pre>
<p>will also work. As well as this:</p>
<pre><code>=COUNTIFS(A1:A7;"A.-|D._";B1:B7;"A.-|D._")
</code></pre>
<p>The explanation why the <code>[A?-|D?_]</code> expression is wrong:</p>
<ol>
<li>The [] brackets in regex define a <strong>set</strong> - i.e., "in this place any of the following is allowed". So, in your example (<code>=COUNTIFS(A1:A7, "[A?-|D?_]", B1:B7, "[A?-|D?_]")</code>), you looked for cells containing one-char values in the set defined in the square brackets. Aside: the expression in square brackets has special treatment of <code>-</code>, as <a href="/en/users/15459/pierre-yves-samyn/">@pierre-yves samyn</a> noted: it means range. So, your charset included A, <em>all chars</em> between ? and | inclusive, D, and _.</li>
<li>The intended usage of ? is also wrong. You implied it to mean "any character in this position; mandatory present". If used in regexes (outside of sets), the meaning of unescaped <code>?</code> is to mean "zero or one of previous expression".</li>
</ol>
<p>See <a href="http://userguide.icu-project.org/strings/regexp">here</a> the full list of regular expressions in ICU (used in LO).</p>
https://ask.libreoffice.org/en/question/138593/counting-rows-where-specific-combination-of-values-from-two-columns-occur/?comment=138605#post-id-138605I have enabled reg ex in the preference of my LibreCalc but the given pattern you have added in your answer returns 0 matches. I have used both reg ex versions - with `SUMPRODUCT` and with `COUNTIFS`.Tue, 21 Nov 2017 10:28:41 +0100https://ask.libreoffice.org/en/question/138593/counting-rows-where-specific-combination-of-values-from-two-columns-occur/?comment=138605#post-id-138605Comment by Mike Kaganski for <p>E.g., your reference values <code>AA-, AB-, AC-, AD-, DA_, DB_, DC_, DD_</code> are in D1-D8. Then, formula</p>
<pre><code>=SUMPRODUCT(NOT(ISNA(VLOOKUP(A1:A7;$D$1:$D$8;1;0)));NOT(ISNA(VLOOKUP(B1:B7;$D$1:$D$8;1;0))))
</code></pre>
<p>does what you need.</p>
<p>If you enable regular expressions in formulas, then</p>
<pre><code>=SUMPRODUCT(ISNUMBER(SEARCH("^(A.-|D._)$";A1:A7));ISNUMBER(SEARCH("^(A.-|D._)$";B1:B7)))
</code></pre>
<p>will also work. As well as this:</p>
<pre><code>=COUNTIFS(A1:A7;"A.-|D._";B1:B7;"A.-|D._")
</code></pre>
<p>The explanation why the <code>[A?-|D?_]</code> expression is wrong:</p>
<ol>
<li>The [] brackets in regex define a <strong>set</strong> - i.e., "in this place any of the following is allowed". So, in your example (<code>=COUNTIFS(A1:A7, "[A?-|D?_]", B1:B7, "[A?-|D?_]")</code>), you looked for cells containing one-char values in the set defined in the square brackets. Aside: the expression in square brackets has special treatment of <code>-</code>, as <a href="/en/users/15459/pierre-yves-samyn/">@pierre-yves samyn</a> noted: it means range. So, your charset included A, <em>all chars</em> between ? and | inclusive, D, and _.</li>
<li>The intended usage of ? is also wrong. You implied it to mean "any character in this position; mandatory present". If used in regexes (outside of sets), the meaning of unescaped <code>?</code> is to mean "zero or one of previous expression".</li>
</ol>
<p>See <a href="http://userguide.icu-project.org/strings/regexp">here</a> the full list of regular expressions in ICU (used in LO).</p>
https://ask.libreoffice.org/en/question/138593/counting-rows-where-specific-combination-of-values-from-two-columns-occur/?comment=138607#post-id-138607Please attach a spreadsheet with the formula giving 0 (here is mine: [regex.ods](/upfiles/15112567537325861.ods))Tue, 21 Nov 2017 10:32:48 +0100https://ask.libreoffice.org/en/question/138593/counting-rows-where-specific-combination-of-values-from-two-columns-occur/?comment=138607#post-id-138607Comment by rbaleksandar for <p>E.g., your reference values <code>AA-, AB-, AC-, AD-, DA_, DB_, DC_, DD_</code> are in D1-D8. Then, formula</p>
<pre><code>=SUMPRODUCT(NOT(ISNA(VLOOKUP(A1:A7;$D$1:$D$8;1;0)));NOT(ISNA(VLOOKUP(B1:B7;$D$1:$D$8;1;0))))
</code></pre>
<p>does what you need.</p>
<p>If you enable regular expressions in formulas, then</p>
<pre><code>=SUMPRODUCT(ISNUMBER(SEARCH("^(A.-|D._)$";A1:A7));ISNUMBER(SEARCH("^(A.-|D._)$";B1:B7)))
</code></pre>
<p>will also work. As well as this:</p>
<pre><code>=COUNTIFS(A1:A7;"A.-|D._";B1:B7;"A.-|D._")
</code></pre>
<p>The explanation why the <code>[A?-|D?_]</code> expression is wrong:</p>
<ol>
<li>The [] brackets in regex define a <strong>set</strong> - i.e., "in this place any of the following is allowed". So, in your example (<code>=COUNTIFS(A1:A7, "[A?-|D?_]", B1:B7, "[A?-|D?_]")</code>), you looked for cells containing one-char values in the set defined in the square brackets. Aside: the expression in square brackets has special treatment of <code>-</code>, as <a href="/en/users/15459/pierre-yves-samyn/">@pierre-yves samyn</a> noted: it means range. So, your charset included A, <em>all chars</em> between ? and | inclusive, D, and _.</li>
<li>The intended usage of ? is also wrong. You implied it to mean "any character in this position; mandatory present". If used in regexes (outside of sets), the meaning of unescaped <code>?</code> is to mean "zero or one of previous expression".</li>
</ol>
<p>See <a href="http://userguide.icu-project.org/strings/regexp">here</a> the full list of regular expressions in ICU (used in LO).</p>
https://ask.libreoffice.org/en/question/138593/counting-rows-where-specific-combination-of-values-from-two-columns-occur/?comment=138610#post-id-138610Ah, I've just noticed that I'm using `\-` instead of `-`. I guess I took something from Pierre's answer after all. :D I will now adapt your answer to my specific needs. :PTue, 21 Nov 2017 10:51:44 +0100https://ask.libreoffice.org/en/question/138593/counting-rows-where-specific-combination-of-values-from-two-columns-occur/?comment=138610#post-id-138610