Rounding error when using IF + STYLE on an array of percentages

gobi · July 19, 2018, 5:08pm

Hello,

I’m trying to achieve something pretty simple: I have an array of percentages and I want to make sure they all sum up to 1 (100%).

In the cell after the end of my array I enter the following formula: SUM(my_array) + STYLE(IF(SUM(my_array) <> 1, "ERROR", "DEFAULT")).

Unfortunately my condition always evaluates to true, indeed using SUM on an array of floats will never give you an exact result, this is how computers work. What’s the proper way to deal with that implementation detail? I have tried to define an epsilon constant with a very small value, and changed my predicate to ABS(SUM(my_array) - 1) > epsilon, but the condition always evaluates to false.

What’s the proper way to do this?

Lupp · July 19, 2018, 6:48pm

-1- For what reason do you expect that a sum of 100% might occur with a reasonable probability?
-2- A desision about any tolerance to apply in a test fur “SUM = 100%” depends profoundly on the circumstances.
-3- There may also be reasons to judge a SUM to be “roughly equal 100%” if at least 97% while any excess is seen as absolutely intolerable.
-4- You need to define your limits depending on your needs.

gobi · July 19, 2018, 11:17pm

The percentages in the array are entered manually and must all sum up to 100% (it’s part of a resource allocation table), I’m trying to prevent against human error.

mikekaganski · July 20, 2018, 4:02am

Just an optimization: instead of repeating the SUM in the IF parameter, you could use CURRENT() to re-use the already calculated sum value. No idea if Calc would optimize this itself otherwise.

Lupp · July 20, 2018, 8:44am

If you manually enter values that should total to any specific sum, you need (in a sense) a cell accumulating the already made entries, or - better - to show the current lack/excess.
(Best do this in a frozen row above the entries.)

Lupp · July 19, 2018, 7:11pm

If the array of values regarded as rates (percentages) is derived from calculations based on data in a way that a sum of 100% is expectable, and no rounding did occur explicitly or implicitly (The dangerous ‘Precision as shown’ setting!) beyond the inavoidable roundings in Double areithmetic , you can - depending in detail on the complexity of calculations - assume that no single result is wrong by more than a digit in the fifteenth place. The consequences of the statement arent really clear, however.
If your percentages are of similar order of magnitude, and no statistical bias is to be expected for the rounding in machine arithmetic, you can apply a condition like
=(ABS(SUM(my_array)-1)<0.0000001).

erAck · July 19, 2018, 8:03pm

Then your epsilon is too large (you didn’t say what “a very small value” is). Epsilon should somehow match to what precision the single percentage values are and what accuracy you expect from the sum, which again may depend on how many values you have (the error that may carry on when summing). For example, single percentage values with a precision of 2 decimals means 4 decimals precision (e.g. 1.23% == 0.0123) of which summing a bunch may lead to 1.00001 instead of 1 (just an example). If you’d accept that but not more then the epsilon should be 0.00001 (or 1E-5) to yield TRUE in your comparison for error. As an aside, note that literally 1.00001-1>0.00001 still yields true due to the inexact binary floating point representation…

gobi · July 19, 2018, 11:29pm

My epsilon was actually too small, I was able to catch it by looking at the result of ABS(SUM(my_array) - 1). Thanks.

erAck · July 23, 2018, 1:11pm

I thought that with ABS(SUM(my_array) - 1) > epsilon always evaluated to FALSE you were referring that condition, but it seems you meant the place in the IF condition where it is used instead… anyway, glad you solved it.

erAck · July 23, 2018, 1:11pm