calc: are decimal correct calculations possible?

newbie-02 · February 1, 2021, 9:50am

hi @all,

!!! Caution! this is ‘in progress’, and I just noticed errors where I have to sort what are thinking errors of me and what are inadequacies of calc or tools (rawsubtract), see comments, I will adjust the question and example sheets based on my progress, sorry, maybe I started the discussion too early, but I need it - the discussion, and the results so far look so good that they certainly still have errors, this is normal, but probably will not fail completely … !!!

there are a few - imho too many - questions on the internet ‘why doesn’t my spreadsheet calculate correctly?’

a few of the ‘classics’ are:
'=0,1 + 0,2 ’ results in 0,300000000000004 instead of 0,3,
‘=1234,12 - 1234’ results in 0,119999999989100000 instead of 0,12 and
‘=0.043 - 0.042’ results in 0.000999999999999400 instead of 0.001,

attached is a sheet which calculates (more) correctly with a few primitive tricks, I would be glad if people with knowledge of the matter would have a look at it,

sample sheet, click for download

[edit - adding another version of the sheet, added a macro formula for calculation, and saved in tdf 1.2 extended format]

sample sheet with macro and in 1.2 format, click to download

i hope i included all neccessary macro help functions …

[/edit]

mostly such questions are answered with ‘that’s how floating point maths is, but it’s fast’, ‘learn first what that is (floating point mathematics)’, ‘it must be like that’, ‘we can’t change it’, ‘look at Weitz (www.weitz. de/ieee), “the error is correct”’, ‘ex$el calculates the same way’ and so on, the claim that a spreadsheet should and could calculate as ‘correctly’ as we once learned it in school is simply negated, either with ‘impossible’, or with ‘performance’,

I find this after 35 years of standard (IEEE 754) and 35 years of development (of LO calc) ‘a bit weak’, so it gets on my nerves so much that even comments like ‘spam’, ‘ignorant person’, ‘your pet peeves’ etc. didn’t deter me to think about it,

I think at least many of the problems ‘work better’, and users should be allowed to get real results even if they are different from ex$el,

attached a sheet what shows that, red marked: wrong, green marked: better,

I don’t say that this is one or ‘the!’ final solution or ‘the last word’, but it is a clear proof that:

for many cases better results are possible,
they can be achieved with relatively little effort,
for the users much more comfortable and
are much less error-prone than if the users always have to stumble over errors first and then have to manage the rounding themselves after recommendations for workarounds,

I would like to have an open discussion about whether it would be better to implement such calculation - at least as an option - in calc, and if not then why not,

in this discussion could be discussed / clarified among other things:

that there are completely different problems if decimals do not (can not) exactly match the IEEE binaries in which they are mapped, in contrast to the fact that certain calculations are difficult (cancellation in subtraction and absorption in addition), this is often ‘lumped together’ even by ‘professionals’, which hinders suitable problem solutions, the first is relatively painless, it’s just the way it is, and does no harm as long as the ways to (to binaries) and back (to decimals) match, ‘cancellation’ on the other hand can inject significant! errors, but you can ‘calculate out’ most of them !!!if you do it right away (on the intermediate result)!!!,
that these errors may be marginal, but make significant differences e.g. for comparison operations (=?, >?, <?) and cause wrong results for MOD, number series, downfills etc.,
whether it is right to fight the errors with ‘arbitrary untargeted rounding’ which in my opinion causes errors in other places, and rather bigger ones,

so … I am happy to receive opinions, and also criticism …

reg.

b.

ml9104 · February 1, 2021, 8:08pm

The problem is, that you expect insane decimal precision from an IEEE-754 40-bit or 80-bit FPU. Even though 80-bit floating-point has a high degree of precision, it still is limited to, you guessed it, 80 bits.
No one would use that many digits in practice, we’re talking about decimal fractions that are completely impossible in the real world.
This is why exponential notation was invented.
It reminds me a bit of very green engineering students measuring a 5% resistor and then calculating the value to the 20th decimal.

Funny that you send us an Excel spreadsheet on a Libre Office site, though.

newbie-02 · February 1, 2021, 9:42pm

hello @ml9104,
that you expect insane decimal precision … no!, what i’m fighting is use of such hardware in a way producing ‘visible’ fails,
64-bit doubles have space for 1 micrometer accuracy for the distance earth - moon, calc fails on 0,043 - 0,042,
we're talking about decimal fractions that are completely impossible in the real world … no!, i’m talking about calculating money to cents, distances to millimeters, times to seconds, correct!,
measuring a 5% resistor and then calculating the value to the 20th decimal. … bridge construction: calculate to the seventh decimal and then take it ‘times two’, that schould hold,
Funny that you send us an Excel spreadsheet on a Libre Office site, though. … häh? it’s an *.ods made with LO calc 7.2.0.0.a0+, perhaps it’s saved as ‘tdf1.3’, would you like it in 1.2?

ajlittoz · February 2, 2021, 8:15am

A spreadsheet program may be used for a variety of applications unforeseeable from the developers. Therefore the specification must be generic since it is impossible to predict the numerical range of the calculations. Introducing a limitation would severely impact the usefulness of the program. Therefore to widen the range, floating-point is used.

One of the frequent applications is business: accounting, invoices, asset tracking, … This application is fundamentally based on integer arithmetic. This is quite antagonist with the use of floating-point.

Yes, accounting is integer, even if you see what seems to be decimals. The smallest “quantum” is a cent (or other minimal currency subunit) and all amounts are multiple of this quantum. Only integer arithmetic can guarantee no rounding is introduced and lead to the “least astonishment” of the customer (for instance the sum of raw price and VAT should be equal to the final price even in the last decimal).

Consequently, all your calculations should be done in integer mode.

This is possible over IEEE-754 within a certain range because the 80-bit variant offers 64 bits integer part. Some precautions must be provisioned to avoid “integer overflow”:

addition and subtraction are quite safe as numbers in the higher magnitude end of the interval are quite rare
multiplication leaves you 32 bits before entering the “floating-point domain”; remember you must scale the result because is a now a multiple of a cent of a cent (i.e. you must drop 2 least significant decimal digits and perhaps introduce a rounding there to revert to your conventional unit, aka. divide by 100)

This may be insufficient if you’re tracking a country budget to the cent or big civil engineering projects. In this case, you need to change your basic unit and accept some inaccuracy in the descriptive power of your sheet.
division is the most difficult operation because the hardware will nearly systematically give you a floating point result; you must first convert to an integer with CEILING(), FLOOR(), INT(), ROUND(), ROUNDDOWN() or ROUNDUP() and do same scaling

Details of computations are described extensively in The Art of Computer Programming, Vol. 2, Seminumerical Algorithms (by Donald Knuth).

As said, all calculations are done in integer mode. Display (fixed point with 2 decimals) and internal representation (integer) are separate, though related, things. Formatting will place the decimal separator where you want it so that sums look like what you’re accustomed to.

In case you find this is a lot of fuss, the alternative is to use ad-hoc accounting dedicated applications. But check that the developers did things correctly, i.e. didn’t use IEEE-754 but implemented multi-precision arithmetic. Decades ago, the COBOL language was invented to address all these accounting issues (because in the end there are legal and tax mandatory rules to abide by). In COBOL, you could require that all computations be done in decimal mode to stick as nearly as possible to real life constraints. But even with this feature, you could botch your results.

To show the community your question has been answered, click the ✓ next to the correct answer, and “upvote” by clicking on the ^ arrow of any helpful answers. These are the mechanisms for communicating the quality of the Q&A on this site. Thanks!

mikekaganski · February 2, 2021, 8:25am

Note also that there’s no “universal integer mode”. There’s an integer mode of some programming language; an integer mode of some ALU; and - suddenly integer modes of accounting rules (which state e.g. what to do when you calculate percentages, and need not to be consistent there even with themselves - they may apply one way when you calculate taxes, and another for other calculations, and even for different taxes, inside one legislation, not mentioning differences between countries).

So dedicated accounting softwares “do things correctly” because they are working in the specific known constraints. There’s no single universal rule, and hence no silver bullet for general-purpose software like Calc.

newbie-02 · February 2, 2021, 12:46pm

@ajlittoz:
thank you very much! i need additional ideas and and that other people ‘poke around in mine …’

integer arithmetik: yes, did so sometimes,have seen even banks switching to ‘cent amounts’ in reporting,
all is integer: no, once you reach ‘uneven fractions’ (working 9 hours for 100 bucks, bucks per hour?, 19% VAT) you need either fractional numbers, or approximations with e.g. decimal or binary values,
target: is not! to get my special calculations solved, but to improve calc in general in a way that it produces less irritation for ‘normal’ users with ‘normal’ math knowledge, or ‘better results for a wide range of different tasks’,
@mikekaganski: yes, that’s additional problems, my target is not ‘correct accounting (for a special country)’ but ‘correct math’, once we get that everybody can apply it to his legal rules as neccessary,

ajlittoz · February 2, 2021, 1:06pm

As long as you remain inside the “validity domain” of your [arithmetic] assumptions, Calc delivers fairly good results. The “validity domain” is voluntarily not defined because it depends on numerous factors related to your data and intent.

Calc always does “correct math” over the subset of Q (the set of all fractions) corresponding to the IEEE-754 range. This subset has well known numerical analysis properties, some of which are not obvious and intuitive for the majority of users (and, yes, this leads to irritation and frustration). For example, the 4 arithmetic operations are not internal laws over the set (results cannot always be represented by a member of the set).

In your example of 9 hrs for 100€, read what I wrote about division. For 19%, do not use the %-unit, by write 19 followed by a division by 100. Yes, this is tedious. But a sheet may have hundreds rows and thousand columns. You can use “technical” cells (later made 0-size) to convert the values to integer and …

ajlittoz · February 2, 2021, 1:16pm

… base all your computations on the “technical cells”.

If your goal is ultimate accuracy, this is the cost. You can’t do without a careful analysis of your application and formulas (as in any production-quality application).

The problem is the long-standing belief that computers are always right. Right for integer calculation provided no overflow happened. Wrong for floating-point because precision is limited and you incur a radix conversion (we are used to think in 1/10 radix while IEEE-754 uses 1/2 radix). Note carefully I mentioned a fractional radix and not an integer radix because this is the way IEEE-754 is built! This is the origin of the difficulty because 5 (half of 10) and 2 are primes.

As a first approach, “normal” math knowledge is sufficient if you realise fraction 1/3 cannot be exactly represented in base 1/10. It is an infinite sequence 0.333… It will necessarily be truncated.

Advanced math is needed if you want to cope with floating-point with limited accuracy loss.

mikekaganski · February 2, 2021, 6:58am

In your sample document, you are rounding the result of subtraction to the position of 15th decimal digit of the greater of the original values. So your idea is to limit the precision of calculation to “fix” catastrophic cancellation.

Your idea is not new; also it is not correct. Your idea is about assumption that what is displayed in cell is closer to “ideal” value than what is actually there. It might be true only with very limited set of cases, where the values in cells are manually typed there by user, and only addition/subtracted is used.

But it would make most real calculations worse. Consider any trigonometric functions where you use radians, or any logarithmic functions, or even simple multiplication/division. There you get more “significant” digits than were in initial numbers that user had entered. And arbitrarily limiting the precision of those numbers would result in much worse catastrophes universally.

For instance, take this simple calculation of well-known triangle and its scaled variant:

        A                  B
1       3               =A1*PI()
2       4               =A2*PI()
3       5               =A3*PI()
4 =SQRT(A3^2-A2^2) =SQRT(B3^2-B2^2)

In ideal world, A4 must be equal to A1, and B4 to B1. In Calc, although the value of PI() is naturally imperfect, RAWSUBTRACT(B1;B4) also shows 0. But if in B4 we replace SQRT(B3^2-B2^2) with your SQRT(ROUND(RAWSUBTRACT(B3^2;B2^2);15-MAX(ROUND(LOG(ABS(B3^2);10);0);ROUND(LOG(ABS(B2^2);10);0)))), we suddenly get the difference between B1 and B4 by 1.77635683940025E-15.

Carefully prepared limited-precision input in your sample is not representative. Note that I don’t talk about performance here, I only talk about precision that you try to achieve. Your approach would fail miserably in all but simplest cases. Current implementation of subtract is of course not ideal (in this non-ideal world), but you may notice that it still tries to be reasonably smart using rtl::math::approxSub which handles near-zero result (unlike the implementation of RAWSUBTRACT, which tries to be simple).

newbie-02 · February 2, 2021, 12:15pm

hello @mikekaganski,
thanks for your help,
rounding acc. the greater operand may be wrong, acc. the smaller may be better, it’s experimental,
i just stumbled over 29,999999999999900000 if i wrap rawsubtract in a macro:

function rawsubtract_a (ByVal darg1 as Double, darg2 as Double)
oFunctionAccess = createUnoService( "com.sun.star.sheet.FunctionAccess")
rawsubtract_a = oFunctionAccess.callFunction( "RAWSUBTRACT", array(darg1, darg2)) 
end function

and call ‘=RAWSUBTRACT_A(33,0000000000001;3,0000000000001)’,

i used it in an attempt to stay away from 4-bit truncations and similar, will try to improve,
it’s a general problem being limited to use imperfect tools to improve just these tools …
‘trigonometric functions’: yes, but:

the average user wants correct subtractions first,
the fail may result from other shortcomings than my idea? see above,
I don’t refuse to improve my idea, I just refuse to carry on too glaring errors in calc,

newbie-02 · February 2, 2021, 12:17pm

my samples are targetting ‘massive cancellation’ when subtracting similar values,
@mikekaganski’s sample deals with values with significant difference,
adapting the formula to apply less tight rounding to ‘more different’ values should be simple …
as - perhaps that’s a good idea - don’t touch integers?
in the end it may require a ‘case differentiation’ between ‘one operand zero’, ‘addition’, ‘subtraction of identic’, ‘subtraction with very small difference’, ‘subtraction with small difference’, ‘subtraction’,
why not, progress justifies some effort …
the performance part will come later, it is! important, i think i can! solve,

mikekaganski · February 2, 2021, 12:25pm

‘trigonometric functions’: yes, but:

the average user wants correct subtractions first,

No. Please write “I want that first”. It’s the usual thing, that you claim something to be general rule when you have no idea about demographics of usage of the product. There’s no “But” here, it must be correct everywhere.

the fail may result from other shortcomings than my idea? see above,

Absolutely unclear what you mean.

I don’t refuse to improve my idea, I just refuse to carry on too glaring errors in calc,

There’s nothing to discuss until there’s an idea. There is clear understanding that what you ask is generally impossible, among those who have substantial knowledge in the topic. You insist on “it must be possible, I don’t believe you, I’m not listening lalala”. Ok, that’s fine, but that’s your task to come with the viable idea. You came with something that is absolutely old and wrong; I took time to explain why. Up to you to come with something else.

mikekaganski · February 2, 2021, 12:49pm

There’s a general rule: extraordinary claims require extraordinary evidence. Your claim (“there is some method to make it perfect or at least generally better using IEEE numbers”) is extraordinary one. It doesn’t mean it’s necessarily wrong, (although I’m sure it is) but it needs extraordinary proof. E.g., a great analysis which would prove it’s (more) correct in general case, or that tests the whole range to confirm that claim. Yes, testing square of 4 billion possible numbers (to at least prove it for 32-bit float case) is impossible. But then at least use some possible number of uniformly spread numbers, and make an analysis on those.

newbie-02 · February 2, 2021, 8:45pm

@mikekaganski:

you’re right, rounding results makes sense with ‘round’ operands, and may harm ‘special values’ (besides that the accuracy you use is outside the ‘15 digit range’),
importance of targets … imho 95% of calculations with calc are accounting and similar, less than 5% of users will use pi() or similar anywhere,
my wish: fulfill both! targets,
in an attempt to decide where to round and where not and to use ‘-’ as you said better than rawsubtract i stumbled over an old problem:
@ajlittoz: i doubt in calcs “correct math” (over the subset of Q …), try following calculation:

A1: =22
B1: =63,999999999999+A1*10^-14
C1: =-63,999999999999
D1: =A2+A3
E1: =RAWSUBTRACT(A2;-A3)

vary A1 between 30 and -30 and observe results in D1 … other people had a similar wish - correct math - and to achieve that for some special cases (result ‘0’) they äh … (brutally raped?) … calcs precision … is there any option to switch that off?

mikekaganski · February 2, 2021, 8:56pm

imho 95% of calculations with calc are accounting and similar, less than 5% of users will use pi() or similar anywhere,

Ah that “imho” again. But if your estimation is correct, then LibreOffice is perfectly fine for most of its userbase as is, otherwise they wouldn’t use it there. Your changes would benefit no significant percentage of userbase, and will hurt most (even if you invent something working).

rounding results makes sense with ‘round’ operands

There’s no way to know which operand is “round”. The very idea is flawed. The problem you try to workaround is exactly that the numbers have some data in any bit of its mantissa, and there’s no property that could tell “round” numbers from “special”. Even trying to count zeroes before the tail in decimal representation would fail, and break deliberate operation with small deltas.

ajlittoz · February 2, 2021, 10:26pm

@newbie-02: what are A2 and A3 in your sample?

Anyway, I won’t try your sample. Adding numbers which differ by 14 orders of magnitude is bound to give inaccurate results. 14 orders is ~46 bits. When the mantissas are scaled before addition, only ~10 bits remain is shifted A1. This means shifted A1 is known only to ~1/1000=0.1%. What do you expect with such a loss of accuracy?

This is what mathematics tell you when you take into account IEEE-754 specification.

I unfortunately lost the reference to an algorithm dealing with such a case (huge difference of order of magnitude between numbers in addition/subtraction). The algorithm proceeded by careful order of operations. You pay the improved accuracy (but not exact mathematical precision) with an increase in computation time.

newbie-02 · February 2, 2021, 10:42pm

sorry, i failed in numbering:

A1: =22
A2: =63,999999999999+A1*10^-14
A3: =-63,999999999999
A4: =A2+A3
A5: =RAWSUBTRACT(A2;-A3)

vary A1, observe A4,

“There’s no way to know which operand is “round”.” - 0,5 0,25, 0,125, 0,1 : round, pi() odd, 0,07 should be round, but calcs ‘specialities’ makes it difficult for me to calculate,

“and break deliberate operation with small deltas.” - but that’s exactly what was done with the 4-bit cut, and it bothers me immensely … the sample above is a try to show that …

for every try to triage between meaningful content and fp-artifacts i need to get a handle on it, that’s blocked in calc

mikekaganski · February 3, 2021, 5:16am

“There’s no way to know which operand is “round”.” - 0,5 0,25, 0,125

These numbers are already “round”. So they need no more rounding. The text was: “rounding results makes sense with ‘round’ operands”, and I assumed that the quotes around ‘round’ meant that you talked about numbers that are close to round. If you have truly round operands, you need no rounding of results. If your operands are not round, as in 0.1, you can’t tell if they are supposed to be round.

I unfortunately lost the reference to an algorithm

@ajlittoz: tdf#137679.

ajlittoz · February 3, 2021, 6:59am

@mikekaganski: Thanks, it was indeed Kahan algorithm.

@newbie-02: you don’t put yourself in the correct math context. You stay in the assumption that the radices are 10 and 2 (integer radices) which is what we are taught in college classes. The hardware truth is the radices are indeed 1/10 and 1/2 (fractional radices) and the story is totally different. It is impossible, except for rare cases, to convert exactly from one base to another one. It is a matter of primality. Read Knuth for mathematical details.

All you can do is to exchange the hardware idiosyncrasies for smart chain calculation. Your sample dives head first into the traps: once you computed A2, you can no longer compensate. You lost A1 precision and anyway it is likely that the result of the addition does not exist in the IEEE-754 subset, needing therefore a rounding to a close element (up, down, closest, furthest, zero, infinity whichever mode is in effect).

newbie-02 · February 3, 2021, 8:51am

hello guys, let me once again say i thank for your help!
and please stay with me a little further …

‘rounding’ - i didn’t want to round 0,5 any further, i want to check if i have operands as such and then be sure that a rounding of the result doesn’t harm,
‘the computing of A2’ is ‘not so bad’ (one might discuss the rounding for x5 in A1), bad is the result of the calculation in A4, ‘weitz’ can do better, ‘rawsubtract’ can do better (see A5), calc can compute 16 digit resolution outside the range -/+ 22 in A1, precision is not! lost in ‘IEEE’ or ‘doubles’ but in calcs implementation of subtraction,
‘It is a matter of primality’ - won’t say it’s trivial but i know about such, about 80% of 1-decimals-digit figures lack enough '5’es in the enumerator to cancel out the '5’es in the enumerator, and already 99,84% of 4-decimals-digits, they stay ‘odd fractions’ for binary representation and become ‘endless’ …
… continued in next comment …