# Counting Distinct Values With Filters

Hello.

I have found the following array formula that counts how many distinct values appear on a range of cells.

{=SUMPRODUCT((Data<>"")/(COUNTIF(Data;Data)+(Data="")))}

where DATA is the range of cells.

My questions are:

1) This formula is static in relation to the filters: if one applies a filter to the range of cells, the outcome of the formula does not change. Is there any formula that would adjust itself automatically according to the filter results shown, like SUBTOTAL does?

2) What is the rationale behind this formula? How is it built? I understand what it does but I'd like to understand how it does it. In particular the COUNTIF and DATA="" part.

Thank You.

edit retag close merge delete

I would prefer "distinct" or "different" instead of "unique" here. In my understanding a unique value is one occurring exactly once anyway.

( 2017-04-28 11:10:09 +0200 )edit

Thank You. "Unique" refers to the type or kind of value not the occurrence. Yet, I suppose you are right and "distinct" may create less confusion. Question modified. Tx!

( 2017-04-28 14:08:44 +0200 )edit

Sort by » oldest newest most voted

A) Since SUMPRODUCT forces its parameters under array-evaluation anyway, there should be no difference if you enter the formula in standard mode. You need Ctrl+Shift+Enter only if you use SUM instead of SUMPRODUCT.
B) A slightly more explicit variant is =SUMPRODUCT(IF(A2:A21="";0;1)/COUNTIF(A2:A21;A2:A21))) where I replaced the placeholder 'Data' by a simple 1D-range as an example. (Editing 2017-05-02) The formula given above contained a wrongly placed closing parenthese. Rectified:
=SUMPRODUCT(IF(A2:A21="";0;1/COUNTIF(A2:A21;A2:A21)))
(End edit)

1) I do not clearly understand the term "static" here. What are the alternatives? Filtering a range is executed by hiding the rows not complying with the condition. Formulae always evaluate also hidden cells contained in their referenced ranges. This is essential for many evaluations. I confidently hope there is no option to change this behaviour. The 'Copy' tool ignores cells if hidden by applying a filter. Same with formatting, deleting, moving. How to get results restricting the evaluation of ranges to the visible part depends on the kind of evaluation. In your specific case you may have a filter condition which already was evaluated to a logical (Boolean) result in a helper column, say C per row. In this case the visible rows should show a TRUE there while the hidden rows have FALSE. If so, you can use =SUMPRODUCT(IF((A2:A21="")*(C2:C21=0);0;1)/COUNTIF(A2:A21;A2:A21))). Not shying at complicated formulae you may also evaluate the condition by a subexpression in the parameter place. There is no standard function testing rows for being hidden. If you coded a user function ROWISHIDDEN for the purpose based on row numbers yourself, you have the option
=SUMPRODUCT(IF((A2:A21="")*ROWISHIDDEN(ROW(A2:A21);0;1)/COUNTIF(A2:A21;A2:A21))). (Editing 2017-05-02) The formula given above did not correctly combine the conditions. Rectified:
=SUMPRODUCT(IF(((A2:A21="")+ROWISHIDDEN(ROW(A2:A21);1;NOW()))>0;0;1/COUNTIF(A2:A21;A2:A21)))
The funny way the logical OR is transcribed here is a workaround needed due to the fact that the OR() function cannot return an array.
(End edit)

2)Under array-evaluation the divisor COUNTIF(A2:A21;A2:A21) is evaluated for each element of A1:A21 in the second place counting over the complete range given in the first place. If a specific value or content inside 'Data' occurs n times it therefore contributes 1/n n times to the result of SUMPRODUCT 1/n * n = 1. You see: You get a mandatorily integer result (=1) by adding nonintegers. That's the spell. Your original version of the formula is specifically veiling this by its way to exclude "empty".

If you want to study alternatives you may start with the formula in cell P3 of the first sheet of the attached demo. There are also some explanations. (As of 2014-07-14 22:31)

(Editing:)
Just found the time to write a piece of ...

more

Thank You!

To clarify, the "static" refers to the fact the the formula, given the same data, gives the same result regardless of the interaction with the filters.

There are formulas, though, that interact with the filters. As mentioned, SUBTOTAL is one of these. The outcome of the formula, given the same data, changes accordingly to the filter applied to the same cell range.

( 2017-04-28 14:34:54 +0200 )edit

Sorry1: I missed the mention of SUBTOTAL in the question.
Sorry2: I missed to think of these filter-aware functions accessing other functions via an index (also excludung visible rows under certain conditions) because I never use them. The concept is too much skewed for my poor old brain. I wouldn't succeed with getting reliable results. At least I wouldn't trust in the results and check them with the pencil on a sheet of paper.

( 2017-04-28 16:07:15 +0200 )edit

Thank You.

( 2017-05-01 11:29:14 +0200 )edit

There is no standard function testing rows for being hidden

I've found a workaround using aggregate:

AGGREGATE (3,5,CELL:CELL)

This would give 1 (or TRUE) as result if CELL is shown and 0 (or FALSE) is CELL is hidden.

=SUMPRODUCT(IF(A2:A21="";0;1)/COUNTIF(A2:A21;A2:A21)))

Please note that this formula would result in a "Div/0" error if empty cells are present in the A2:A21 range. Thus, in the denominator, the expression +(A2:21="") must be added so that the empty cell are counted as one

( 2017-05-01 20:30:35 +0200 )edit
more

Thank You. The AGGREGATE function counts the values and can omit the hidden cells but unfortunately it does not count the distinct values. So for instance for the following list of value:

Value A Value B Value C Value A

AGGREGATE with the COUNTA function would give me 4 as result but the result I'm looking for would be 3 because I do not want "Value A" to be counted twice.

( 2017-05-01 12:16:04 +0200 )edit

You've given me an idea, though. I was eventually able to find a workaround using AGGREGATE. I'll post it in the answer.

( 2017-05-01 20:32:32 +0200 )edit

I have found a possible solution gathering ideas from some of the comments to my question here and elsewhere.

Here it goes:

1) Using the function AGGREGATE (3,5,Cell) for each single cell in the "Data" range the result would be an array of cells containing 1 (or TRUE) if Cell is shown or 0 (or FALSE) is Cell is hidden. This array can be placed aside (in a spare column) to be used as a reference.

2) Now, there's only need to add *(Cell-array=1) to the above formula, which would become:

=SUMPRODUCT((Data<>"")*(Cell-array=1)/(COUNTIF(Data;Data)+(Data="")))

where "Data" is the range from which the non-empty/distinct values are to be counted and "Cell-array" is the range of cells from which the shown cells (non-hidden) are to be counted.

more