May if be that FALSE values are considered 0, thus numeric?
Your expression A3:A16<10 will return a vector of TRUE’s and FALSE’s. How is this vector reduced to a scalar? Through AND or OR? It looks like it is with OR, giving a final TRUE result. The “then”-expression is chosen in IF.
I think we should re-read the documentation about array-expressions and their relation to IF() function.
Yes, this warning is not entirely clear after the first reading. But if we compare this with the results obtained, then the behavior of the formula becomes clearer.
By the way, {=AVERAGE(IF(A3:A16<10;A3:A16;))} and {=AVERAGE(IF(A3:A16<10;A3:A16))} differ only by one semicolon, but this is enough to get 3.21429 and 5
Hello @Lupp!
In Calc, non-blank cells that do not contain formulas can contain string and number. (Excel adds Boolean and Error values to these types.)
Accordingly, the formula
=0=FALSE()
returns the value 1 (which, if the cell has a boolean format, will be shown as TRUE). (In Excel, the result of a formula is the boolean value FALSE).
The cells in the range E12:E16 in your example contain 0 (due to the number format, they are displayed as FALSE).
The IF function returns 0 when the condition is false and the third parameter is omitted.
This is not about any help concerning a result I want to get. It’s about understanding a bad (by my judgement) behaviour of Calc.
…
If you take a second glance on that sheet you may mainly look at the cells A3 and G3 and on the range E3:E16.
Comparing the formulas you should find that the range E3:E16 exactly contains the values to be expected as the result of the IF() part in A3 under array-evaluation - and that G3 applies a formula to it that should be equivalent to the formula in A3. .
To be clear now: I see the fact of getting different results in A3 and in G3 as a serious -and hardly understandable- bug. Can you tell me with good reasons that I am wrong?
No, my friend, you’re absolutely right - it’s really hardly understandable.
On the other hand, @ajlittoz is also right when he unobtrusively hints at working with arrays.
Remember the difference between getData() and getDataArray()? In the second case, we can get “empty cells”. This is exactly what happens when we use IF() with two parameters - the array that will be passed further to AVERAGE() does not contain FALSE, but empty cells.
Let’s not call it a “bug”, let’s call it “a great opportunity for skilled hackers”
Looked, I ask to forgive for a carelessness (to a semicolon).
In Excel, this effect is also observed (difference in the results of formulas), but it is clear there.
Looks like a bug in Calc.
Formula
=ISNUMBER(IF(1<=0; 0))
shows TRUE.
Array formula
=ISNUMBER(IF({1;0}<=0; 0))
shows FALSE; TRUE.
I don’t use IF with 2 parameters (unless by mistake).
Well, I need to draw my conclusions.
Two are clear to me:
The different working of an incomplete IF() expression depending on its being embedded in a surrounding expression is inacceptable.
The disregarding of an explicit specification concerning incomplete IF() expressions under an unclear condition is also inacceptable.
Unclear still is if I should take the trouble to report a bug - and probably get a result like “Regardless of the specification - what really counts is compatibility.” If users of Excel or Gsheets decided to find that AsIs convenient, reasons are out of the game. WhatYouSeeIsWhatYou… should have expected. A serious and reliable software for everybody may not survive in this world.
It seems to me that we can wait a bit - perhaps gurus will join our discussion … This will allow us to more accurately formulate the problem (if any).
I don’t think it’s about Excel compatibility. For example, the above formula
=0=FALSE()
gives different results in Calc and Excel and this is accepted.
I have reread carefully your spreadsheet and I’d say the results are correct.
In B3: your A3:A16<10 will select cell range A3:A11, ignoring A12:A16. Since there is no “else” alternative, the returned matrix is A3:A9 augmented by “void” entries. Therefore, the average is computed on 9 values giving 5 as you can manually check.
B4 defers from B3 only in providing a string for discarded values. This doesn’t change the previous result. Either the returned matrix is limited to A3:A11 as filtered by the boolean expression, or the same plus 5 string entries which are ignored by AVERAGE(). In any case, the average is computed only on 9 values.
In B5, you provide a 0 for discarded entries. 0 is a number. Thus A3:A11 extended with 5 numbers contains 14 numeric entries. This changes the denominator in the computation.
B6 is troublesome. Discarded entries are replaced by FALSE(). “Boolean” is not a separate datatype for Calc. Buit-in help says that 0 is FALSE and all other values are TRUE. Consequently, your A3:A11 is again extended with numeric entries which results in the matrix having 14 “valid” entries.
Behaviour may look complicated but results are consistent. The matrix expression always returns 14 entries. Those which pass the boolean test are taken from the “then” argument, those which don’t from the “else” argument. The tricky point is to design the “else” argument so that it returns invalid entries if the goal is to filter the source matrix. Void values seem to be best. However if the goal is to eliminate the influence of non-passing entries without changing the total number, then the “else” argument must return values which are neutral for the outside computation.
First sorry: I more than once named cells olf column B as if they were in column A. Silly mistake.
Next sorry: I very much appreciate that you have studied the matter again thoroughly, but I cannot agree with the result.
Since the results for incomplete IF() expressions are explicitly specified in 6.15.4 of OASIS … ODF 1.3 Part 4 I insist on the respective result to be returned by the inner expression in cell B3 and by the overall expression in cell G3 (both under array-.evaluation) as well. This result must be an array with one column and 14 rows. The last 5 rows must return the logical value FALSE in both cases. The subexpression in B3 is not allowed to omit these results for whatever reasons.
I personally never use incomplete IF() expressions. I made an exception in this special case coming from a thread started by somebody in a different forum. The thread itself didn’t make much sense, but it used incomplete IF() and made me consider the matter once more.
I don’t know if Excel and/or Gsheets are specified to treat Boolean values in AVERAGE() like texts (ignore them). Calc is not.
The treatment of Boolean (logical) in Calc (and by ODF) is bad and inconsistent. However, even if regarding 8.2 and related parts of the mentioned specification I couldn’t yet find a justification for the experienced behaviour.
Maybe, maybe not. The behaviour may be unexpected though and is a limitation of the non-availability of a distinct boolean type.
That is because the final result as seen by ISNUMBER() of the empty IfFalse (ELSE) path is an implicit FALSE and thus 0 as there is no distinct boolean type. If there was a distinct boolean type then the result would be FALSE.
That is because in array context for interim results there is a distinction between empty IfFalse path and a given (boolean or numeric) result. The ISNUMBER() evaluates “empty path” instead of 0 as in the other case, and that is not a numeric value. The distinction in array context is necessary for some functionality e.g. of MATCH() and LOOKUP() functions and as mentioned data arrays.
If the differing behaviour is considered a bug then a solution could be to introduce distinct boolean at least for interim results, but then there still would be a discrepancy between interim and final results. Introducing fully distinct boolean would change even more calculation results. It’s complicated…
This is kind of “DeepThought” for me. In fact I had considered a possible distinction between interim and final results, but hadn’t assumed such a distinction was actually made except there was also a respective “bill”.
I personally don’t trust in software anyway, but as a part of this society I am forced to indirectly do it nonetheless.
The only undoubtable fact here?
Though better distinguished types would be advantageous (and true complex numbers in addition) there is no hope. An independent “NewSpreadsheetSoftwareFromScratch” will not be possible, and the existing variants are all bad, even if in slightly different ways.
Unfortunately I’m also not good in praying. What to do?
Let me state a commandment: Thou shalt never use incomplete IF() expressions?
Same from me, but may I suggest to give the thread a more informative title than “Request for explanation” to mention return of incomplete if(). May be useful, if the thread comes up in searches.
I would like to understand these reasons - especially why they are considered compelling.
Also, I would be interested in the “clause” in the ODF documents that allows the corresponding behavior of Calc NOT to be considered a bug.
(Please, forgive my stubbornness.)
Well, they exist, so probably for a reason… which most times is for Excel interoperability. If someone wants to dive in: inspect use of ScMatrix::IsEmptyPath() IsEmptyPath (reference) in projects: core - OpenGrok search results and the interpreter’s lcl_GetLastMatch() that is using it, i.e. lcl_GetLastMatch (reference) in projects: core - OpenGrok search results which is for iterating and comparing sorted elements for lookups. But that may be as well be a legacy leftover as this seems to be the only real use of empty path, one would have to try and compare behaviour with and without and between implementations/applications.
Thanks a lot once more - and I have to apologise to have you driven into some wasting of time where probably no actual result can be expected. I had read 8.2 of the mentioned document, but couldn’t get a justification for the obesrved behaviour from it (nor from 6.18.3). Interoperability is a valid point of view on a certain level. Compliance with specifications also is. Seems LibreOffice couldn’t come to a decision whether or not it has “Distinct Logical.”
BTW 18.6.3 of the OASIS document uses “{NumberSequence N }+” as the parameter type and I couldn’t find a clear explanation.