
Calc: are decimally correct calculations possible?

asked 2021-02-01 10:50:42 +0200 by newbie-02

updated 2021-02-02 09:58:40 +0200

Hi all,

!!! Caution: this is 'in progress'. I just noticed errors, and I still have to sort out which are mistakes in my own thinking and which are shortcomings of Calc or of the tools I used (RAWSUBTRACT); see the comments. I will adjust the question and the example sheets as I make progress. Sorry, maybe I started the discussion too early, but I need the discussion. The results so far look so good that they certainly still contain errors; that's normal, but the approach will probably not fail completely ... !!!

There are a few (imho too many) questions on the internet like 'why doesn't my spreadsheet calculate correctly?'

A few of the 'classics' are:
'=0,1 + 0,2' results in 0,30000000000000004 instead of 0,3,
'=1234,12 - 1234' results in 0,11999999999989086 instead of 0,12, and
'=0.043 - 0.042' results in 0.000999999999999994 instead of 0.001.
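These results are not peculiar to Calc; they follow from storing numbers as IEEE-754 64-bit doubles. A minimal Python sketch (assuming Python floats are IEEE-754 binary64, which holds on mainstream platforms) reproduces the three classics and prints the exact binary value stored for the literal 0.1:

```python
from decimal import Decimal

# Python floats are IEEE-754 binary64 doubles on mainstream platforms,
# the same format Calc uses for cell values.
classics = {
    "0.1 + 0.2": (0.1 + 0.2, 0.3),
    "1234.12 - 1234": (1234.12 - 1234, 0.12),
    "0.043 - 0.042": (0.043 - 0.042, 0.001),
}

for label, (got, expected) in classics.items():
    # repr() prints the shortest decimal string that round-trips to the same double
    print(f"{label:>15} = {got!r:<22} expected {expected!r}, equal: {got == expected}")

# Decimal(float) converts without rounding, exposing the exact binary value behind 0.1
print(Decimal(0.1))
```

The last line prints 0.1000000000000000055511151231257827021181583404541015625, which is what a cell holding 0,1 contains, assuming correct decimal-to-binary conversion on input.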

Attached is a sheet that calculates (more) correctly using a few primitive tricks. I would be glad if people with knowledge of the matter would have a look at it.

Sample sheet, click for download

[edit: added another version of the sheet with a macro formula for the calculation, saved in ODF 1.2 extended format]

Sample sheet with macro, in ODF 1.2 format, click to download

I hope I included all the necessary macro helper functions ...

[/edit]

Such questions are mostly answered with 'that's how floating-point math is, but it's fast', 'first learn what that is (floating-point mathematics)', 'it must be like that', 'we can't change it', 'look at Weitz (www.weitz.de/ieee), "the error is correct"', 'Excel calculates the same way', and so on. The claim that a spreadsheet should and could calculate as 'correctly' as we once learned in school is simply dismissed, either as 'impossible' or because of 'performance'.

After 35 years of the standard (IEEE 754) and 35 years of development (of LO Calc), I find this 'a bit weak'. It gets on my nerves so much that even comments like 'spam', 'ignorant person', 'your pet peeves' etc. didn't deter me from thinking about it.

I think at least many of the problems can be handled better, and users should be allowed to get correct results even if they differ from Excel's.

Attached is a sheet that shows this: marked red = wrong, marked green = better.

I don't say that this is one or 'the!' final solution or 'the last word', but it is clear proof that:
- better results are possible for many cases,
- they can be achieved with relatively little effort,
- they are much more comfortable for users, and
- they are much less error-prone than making users first stumble over errors and then manage the rounding themselves following recommended workarounds.

I would like to have an open discussion about whether it would be better to implement such calculation in Calc, at least as an option, and if not, why not.

In this discussion, the following could be discussed/clarified ...


Comments


The problem is that you expect insane decimal precision from an IEEE-754 40-bit or 80-bit FPU. Even though 80-bit floating point has a high degree of precision, it is still limited to, you guessed it, 80 bits.
No one would use that many digits in practice; we're talking about decimal fractions that are completely impossible in the real world.
This is why exponential notation was invented.
It reminds me a bit of very green engineering students measuring a 5% resistor and then calculating its value to the 20th decimal.

Funny that you send us an Excel spreadsheet on a Libre Office site, though.

ml9104 (2021-02-01 21:08:00 +0200)

Hello @ml9104,
'that you expect insane decimal precision' ... no! What I'm fighting is the use of such hardware in a way that produces 'visible' failures:
64-bit doubles have enough room for 1 micrometer accuracy over the Earth-Moon distance, yet Calc fails on 0,043 - 0,042.
'we're talking about decimal fractions that are completely impossible in the real world' ... no! I'm talking about calculating money to the cent, distances to the millimeter, and times to the second, correctly.
'measuring a 5% resistor and then calculating the value to the 20th decimal' ... bridge construction: calculate to the seventh decimal and then take it 'times two'; that should hold.
'Funny that you send us an Excel spreadsheet on a Libre Office site, though.' ... Huh? It's an *.ods made with LO Calc 7.2.0.0.a0+. Perhaps it's saved as ODF 1.3; would you like it in 1.2?

newbie-02 (2021-02-01 22:42:53 +0200)
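The Earth-Moon figure can be checked with Python's `math.ulp` (Python 3.9+), which returns the gap between a double and the next larger one. The 384,400 km distance is an assumed round value:

```python
import math

earth_moon_m = 384_400_000.0          # assumed mean Earth-Moon distance in metres

step_m = math.ulp(earth_moon_m)       # gap to the next representable double
print(f"double spacing near the Moon: {step_m:.3g} m ({step_m * 1e6:.3g} micrometres)")

# The error in 0,043 - 0,042 is of the same kind: tiny in absolute terms,
# but visible because 0.043, 0.042 and 0.001 are not exactly representable.
err = (0.043 - 0.042) - 0.001
print(f"error of 0.043 - 0.042: {err:.3g}")
```

The spacing comes out at 2^-24 m, about 0.06 micrometre, so the precision is there; the 0,043 - 0,042 discrepancy is a representation error of a few times 1e-18, not a lack of digits.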

2 Answers


answered 2021-02-02 09:15:57 +0200 by ajlittoz

A spreadsheet program may be used for a variety of applications the developers cannot foresee. Therefore the specification must be generic, since it is impossible to predict the numerical range of the calculations. Introducing a limitation would severely reduce the usefulness of the program. Floating point is therefore used, to widen the range.

One of the frequent applications is business: accounting, invoices, asset tracking, ... This application is fundamentally based on integer arithmetic, which is at odds with the use of floating point.

Yes, accounting is integer, even if you see what seem to be decimals. The smallest "quantum" is a cent (or another minimal currency subunit), and all amounts are multiples of this quantum. Only integer arithmetic can guarantee that no rounding is introduced and lead to the "least astonishment" of the customer (for instance, the sum of the net price and the VAT should equal the final price, even in the last decimal).

Consequently, all your calculations should be done in integer mode.
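A minimal Python sketch of why integer cents avoid the problem (hypothetical invoice data): adding ten 0.10 amounts as doubles drifts, while the same amounts in integer cents add exactly. The loop is written out explicitly because recent Python versions use compensated summation in the built-in `sum()`, which would hide the effect:

```python
# Hypothetical invoice: ten lines of 0.10 EUR each.
lines_cents = [10] * 10          # integer cents: the "quantum" is one cent
lines_float = [0.10] * 10        # the same amounts as binary doubles

total_cents = 0
total_float = 0.0
for c, f in zip(lines_cents, lines_float):
    total_cents += c             # integer addition never rounds
    total_float += f             # each float addition rounds to the nearest double

print(total_cents)               # 100, shown as 1.00 by formatting
print(total_float)               # 0.9999999999999999
print(total_float == 1.0)        # False
```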

This is possible over IEEE-754 within a certain range because the 80-bit variant offers a 64-bit integer part. Some precautions are needed to avoid "integer overflow":

  • addition and subtraction are quite safe, as numbers at the high-magnitude end of the interval are quite rare
  • multiplication leaves you 32 bits before entering the "floating-point domain"; remember that you must scale the result because it is now a multiple of a cent of a cent (i.e. you must drop the 2 least significant decimal digits, perhaps with rounding, to revert to your conventional unit, aka divide by 100)

    This may be insufficient if you're tracking a country's budget to the cent or big civil engineering projects. In that case, you need to change your basic unit and accept some inaccuracy in the descriptive power of your sheet.

  • division is the most difficult operation because the hardware will nearly always give you a floating-point result; you must first convert to an integer with CEILING(), FLOOR(), INT(), ROUND(), ROUNDDOWN() or ROUNDUP() and do the same scaling
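To make the range argument concrete: a Calc cell holds a 64-bit double, whose 53-bit significand represents every integer exactly only up to 2^53 (about 9.007e15), not the 64-bit range of the 80-bit extended format. The following Python sketch keeps amounts as integer cents, guards that limit, and does the scaled multiplication and rounded division described above. The helper names and the rounding rule (half away from zero) are assumptions for illustration; real accounting rules vary:

```python
MAX_EXACT = 2 ** 53   # every integer with |n| <= 2**53 is exactly representable as a double


def check_exact(cents: int) -> int:
    """Reject amounts that a 64-bit double could no longer store exactly."""
    if abs(cents) > MAX_EXACT:
        raise OverflowError(f"{cents} exceeds 2**53 and would lose precision as a double")
    return cents


def round_half_up_div(numerator: int, denominator: int) -> int:
    """Integer division rounding half away from zero (an assumed rule)."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    q, r = divmod(abs(numerator), denominator)
    if 2 * r >= denominator:
        q += 1
    return q if numerator >= 0 else -q


def mul_percent(cents: int, percent: int) -> int:
    """cents * percent / 100: the raw product is in hundredths of a cent, so scale back by 100."""
    return check_exact(round_half_up_div(check_exact(cents * percent), 100))


net = 1999                              # 19.99 EUR
vat = mul_percent(net, 19)              # 19 % written as the integer 19, then divided by 100
gross = net + vat
hourly = round_half_up_div(10_000, 9)   # 100.00 EUR for 9 hours, rounded to the cent
print(net, vat, gross, hourly)          # 1999 380 2379 1111
```

Python integers never overflow, so `check_exact` is only there to flag values that would stop being exact once they are written back into a cell.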

Details of computations are described extensively in The Art of Computer Programming, Vol. 2, Seminumerical Algorithms (by Donald Knuth).

As said, all calculations are done in integer mode. Display (fixed point with 2 decimals) and internal representation (integer) are separate, though related, things. Formatting will place the decimal separator where you want it so that sums look like what you're accustomed to.
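Keeping display and storage apart can be sketched outside Calc as well. This hypothetical helper renders integer cents with a chosen decimal separator; the stored value never leaves integer arithmetic:

```python
def format_cents(cents: int, sep: str = ",") -> str:
    """Render integer cents with two decimals; `sep` is the (locale-dependent) decimal separator."""
    sign = "-" if cents < 0 else ""
    units, rest = divmod(abs(cents), 100)
    return f"{sign}{units}{sep}{rest:02d}"


print(format_cents(2379))          # 23,79
print(format_cents(-5, sep="."))   # -0.05
```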

If you find this is a lot of fuss, the alternative is to use dedicated accounting applications. But check that the developers did things correctly, i.e. didn't use IEEE-754 but implemented multi-precision arithmetic. Decades ago, the COBOL language was invented to address all these accounting issues (because in the end there are mandatory legal and tax rules to abide by). In COBOL, you could require that all computations be done in decimal mode to stick as closely as possible to real-life constraints. But even with this feature, you could botch your results.
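Python's standard `decimal` module offers base-10 arithmetic roughly comparable to the decimal mode described above (an analogy, not a claim about COBOL internals). The sketch shows both sides: the classic examples come out exact, yet finite precision still rounds 1/3:

```python
from decimal import Decimal, getcontext

# Decimals built from strings hold the typed base-10 value exactly.
print(Decimal("0.1") + Decimal("0.2"))        # 0.3
print(Decimal("1234.12") - Decimal("1234"))   # 0.12
print(Decimal("0.043") - Decimal("0.042"))    # 0.001

# Precision is still finite (28 significant digits by default),
# so 1/3 is rounded and the error shows up again after multiplying back.
print(getcontext().prec)                      # 28
third = Decimal(1) / Decimal(3)
print(third * 3)                              # 0.9999999999999999999999999999
```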

To show ...


Comments

Note also that there's no "universal integer mode". There's the integer mode of some programming language, the integer mode of some ALU, and, on top of that, the integer modes of accounting rules (which state e.g. what to do when you calculate percentages, and need not be consistent even with themselves: they may apply one way when you calculate taxes and another for other calculations, and may even differ between taxes within one legislation, not to mention differences between countries).

So dedicated accounting software "does things correctly" because it works within specific, known constraints. There's no single universal rule, and hence no silver bullet for general-purpose software like Calc.

Mike Kaganski (2021-02-02 09:25:52 +0200)

@ajlittoz: :-) :-) :-)
Thank you very much! I need additional ideas, and I need other people to 'poke around in mine ...'
- integer arithmetic: yes, I did that sometimes, and I have even seen banks switching to 'cent amounts' in reporting,
- all is integer: no, once you reach 'uneven fractions' (working 9 hours for 100 bucks: bucks per hour? 19% VAT) you need either fractional numbers or approximations with e.g. decimal or binary values,
- target: is not! to get my special calculations solved, but to improve Calc in general so that it produces less irritation for 'normal' users with 'normal' math knowledge, or 'better results for a wide range of different tasks',
@Mike Kaganski: yes, those are additional problems; my target is not 'correct accounting (for a specific country)' but 'correct math'. Once we have that, everybody can apply it to their legal rules as necessary.

newbie-02 (2021-02-02 13:46:12 +0200)

As long as you remain inside the "validity domain" of your [arithmetic] assumptions, Calc delivers fairly good results. The "validity domain" is deliberately not defined because it depends on numerous factors related to your data and intent.

Calc always does "correct math" over the subset of Q (the set of all fractions) corresponding to the IEEE-754 range. This subset has well-known numerical analysis properties, some of which are not obvious or intuitive for the majority of users (and, yes, this leads to irritation and frustration). For example, the 4 arithmetic operations are not closed over the set (the result of an operation on two members cannot always be represented by a member of the set).

In your example of 9 hrs for 100€, read what I wrote about division. For 19%, do not use the % unit, but write 19 followed by a division by 100. Yes, this is tedious. But a sheet may have hundreds ...

ajlittoz (2021-02-02 14:06:01 +0200)
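The non-closure point can be shown in a few lines of Python (assuming IEEE-754 doubles): the double nearest 100/9 is not 100/9 itself, and because every result is rounded to a nearby double, addition is not even associative:

```python
from fractions import Fraction

# Fraction(x) converts a double exactly, exposing the value actually stored.
hourly = 100 / 9                              # 100 EUR for 9 hours
print(Fraction(hourly) == Fraction(100, 9))   # False: 100/9 has no exact double

# Each intermediate result is rounded, so grouping changes the answer.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left, right, left == right)             # 0.6000000000000001 0.6 False
```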

… base all your computations on the "technical cells".

If your goal is ultimate accuracy, this is the cost. You can't do without a careful analysis of your application and formulas (as in any production-quality application).

The problem is the long-standing belief that computers are always right. They are right for integer calculations, provided no overflow happened. They are wrong for floating point because precision is limited and you incur a radix conversion (we are used to thinking in radix 1/10, while IEEE-754 uses radix 1/2). Note carefully that I mentioned a fractional radix and not an integer radix, because this is the way IEEE-754 is built! This is the origin of the difficulty, because 5 (half of 10) and 2 are primes.

As a first approach, "normal" math knowledge is sufficient if you realise that the fraction 1/3 cannot be exactly represented in base 1/10: it is an infinite sequence 0.333… It ...

ajlittoz (2021-02-02 14:16:03 +0200)
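The primality argument can be made concrete: a fraction in lowest terms has a finite expansion in a given base only if every prime factor of its denominator divides that base. So 1/10 terminates in base 10 but not in base 2, while 1/4 terminates in both. A small Python sketch (the function name is illustrative):

```python
from fractions import Fraction
from math import gcd


def terminates_in_base(value: Fraction, base: int) -> bool:
    """True if `value` has a finite digit expansion in the given integer base."""
    q = value.denominator            # Fraction always stores lowest terms
    g = gcd(q, base)
    while g > 1:                     # strip every prime factor q shares with the base
        q //= g
        g = gcd(q, base)
    return q == 1                    # finite iff no other prime factor is left


for text in ["1/3", "1/10", "1/4", "7/100"]:
    f = Fraction(text)
    print(text, "base 10:", terminates_in_base(f, 10), "base 2:", terminates_in_base(f, 2))

# The double stored for 0.1 is a fraction whose denominator is a power of two (2**55).
print(Fraction(0.1))                 # 3602879701896397/36028797018963968
```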

answered 2021-02-02 07:58:14 +0200

updated 2021-02-02 09:00:35 +0200

In your sample document, you are rounding the result of subtraction to the position of 15th decimal digit of the greater of the original values. So your idea is to limit the precision of calculation to "fix" catastrophic cancellation.

Your idea is not new; it is also not correct. It rests on the assumption that what is displayed in a cell is closer to the "ideal" value than what is actually stored there. That might be true only in a very limited set of cases, where the values in cells were typed manually by the user and only addition/subtraction is used.

But it would make most real calculations worse. Consider any trigonometric functions using radians, any logarithmic functions, or even simple multiplication/division. There you get more "significant" digits than were in the initial numbers the user entered, and arbitrarily limiting the precision of those numbers would result in much worse catastrophes across the board.

For instance, take this simple calculation of the well-known 3-4-5 triangle and its variant scaled by π:

        A                  B
1       3               =A1*PI()
2       4               =A2*PI()
3       5               =A3*PI()
4 =SQRT(A3^2-A2^2) =SQRT(B3^2-B2^2)

In an ideal world, A4 must equal A1, and B4 must equal B1. In Calc, although the value of PI() is naturally imperfect, RAWSUBTRACT(B1;B4) also shows 0. But if in B4 we replace SQRT(B3^2-B2^2) with your SQRT(ROUND(RAWSUBTRACT(B3^2;B2^2);15-MAX(ROUND(LOG(ABS(B3^2);10);0);ROUND(LOG(ABS(B2^2);10);0)))), we suddenly get a difference between B1 and B4 of 1.77635683940025E-15.
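The comparison can be rerun outside Calc with a short Python sketch. `rounded_sub_sqrt` is a hypothetical helper mimicking the question's formula; Python's `round()` and Calc's ROUND() are implemented differently, so the last digit may not match Calc exactly:

```python
import math


def rounded_sub_sqrt(c2: float, b2: float) -> float:
    """Mimic SQRT(ROUND(RAWSUBTRACT(c2;b2); 15-MAX(ROUND(LOG(c2;10);0); ROUND(LOG(b2;10);0))))."""
    digits = 15 - max(round(math.log10(abs(c2))), round(math.log10(abs(b2))))
    return math.sqrt(round(c2 - b2, digits))


a, b, c = 3 * math.pi, 4 * math.pi, 5 * math.pi   # the 3-4-5 triangle scaled by pi
plain = math.sqrt(c ** 2 - b ** 2)                  # like =SQRT(B3^2-B2^2)
rounded = rounded_sub_sqrt(c ** 2, b ** 2)          # like the question's formula

print("plain   - a:", plain - a)
print("rounded - a:", rounded - a)
```

Here both squares lie between 100 and 1000, so the formula rounds the difference to 13 decimals, about seven units in the last place of a value near 88.8; that extra rounding is where the additional error can enter.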

The carefully prepared limited-precision input in your sample is not representative. Note that I'm not talking about performance here, only about the precision you are trying to achieve. Your approach would fail miserably in all but the simplest cases. The current implementation of subtraction is of course not ideal (in this non-ideal world), but you may notice that it still tries to be reasonably smart, using rtl::math::approxSub, which handles near-zero results (unlike the implementation of RAWSUBTRACT, which tries to be simple).
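The idea behind a near-zero-aware subtraction, as described here, is to return exactly 0 when two same-signed operands agree to within a tiny relative tolerance. The following Python function is only a rough model of that idea, not a port of LibreOffice's rtl::math code; the relative threshold 2**-48 and the function names are assumptions:

```python
REL_EPS = 2.0 ** -48   # assumed relative tolerance for "approximately equal"


def approx_equal(a: float, b: float) -> bool:
    """True if a and b differ by less than REL_EPS relative to both operands."""
    if a == b:
        return True
    diff = abs(a - b)
    return diff < abs(a) * REL_EPS and diff < abs(b) * REL_EPS


def approx_sub(a: float, b: float) -> float:
    """Subtract, but snap near-total cancellation of same-signed operands to exactly 0.0."""
    same_sign = (a > 0 and b > 0) or (a < 0 and b < 0)
    if same_sign and approx_equal(a, b):
        return 0.0
    return a - b


print((0.1 * 3) - 0.3)              # tiny nonzero residue from plain subtraction
print(approx_sub(0.1 * 3, 0.3))     # 0.0: treated as cancellation noise
print(approx_sub(0.043, 0.042))     # unchanged: not a near-zero case
```

The last line shows the limitation: such a rule only cleans results that are close to zero relative to the operands; the 0,043 - 0,042 case from the question is untouched.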


Comments

Hello @Mike Kaganski,
thanks for your help.
Rounding according to the greater operand may be wrong; according to the smaller one may be better. It's experimental.
I just stumbled over 29,999999999999900000 when I wrap RAWSUBTRACT in a macro:

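Function rawsubtract_a(ByVal darg1 As Double, ByVal darg2 As Double) As Double
    ' Forward both arguments to Calc's built-in RAWSUBTRACT via the FunctionAccess service
    Dim oFunctionAccess As Object
    oFunctionAccess = createUnoService("com.sun.star.sheet.FunctionAccess")
    rawsubtract_a = oFunctionAccess.callFunction("RAWSUBTRACT", Array(darg1, darg2))
End Function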

and call '=RAWSUBTRACT_A(33,0000000000001;3,0000000000001)'.

I used it in an attempt to stay away from the 4-bit truncations and similar; I will try to improve.
It's a general problem: being limited to imperfect tools to improve just those tools ...
'trigonometric functions': yes, but:
1. the average user wants correct subtractions first,
2. the failure may result from other shortcomings than my idea? See above,
3. I don't refuse to improve my idea; I just refuse to carry on with such glaring errors in Calc.

newbie-02 (2021-02-02 13:15:40 +0200)
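For comparison, the same subtraction in plain double arithmetic (Python floats, assumed to behave like the doubles Calc stores) lands within a few units in the last place of 30 and prints as 30 at 15 significant digits. That suggests the larger deviation seen through the macro arises somewhere else along the way; this sketch cannot show where:

```python
# Plain IEEE-754 double subtraction of the two literals used with the macro.
result = 33.0000000000001 - 3.0000000000001

print(repr(result))          # full round-trip representation
print(f"{result:.15g}")      # rounded to 15 significant digits
print(abs(result - 30))      # distance from the exact decimal answer 30
```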

My samples target 'massive cancellation' when subtracting similar values;
@Mike Kaganski's sample deals with values that differ significantly.
Adapting the formula to apply less tight rounding to 'more different' values should be simple ... as should not touching integers at all (perhaps that's a good idea)?
In the end it may require a 'case differentiation' between 'one operand zero', 'addition', 'subtraction of identical values', 'subtraction with a very small difference', 'subtraction with a small difference', and 'subtraction'.
Why not? Progress justifies some effort ...
The performance part will come later; it is! important, and I think I can! solve it.

newbie-02 (2021-02-02 13:17:51 +0200)

'trigonometric functions': yes, but:
1. the average user wants correct subtractions first,

No. Please write "I want that first". It's the usual thing: you claim something is a general rule when you have no idea about the demographics of the product's usage. There's no "but" here; it must be correct everywhere.

  2. the fail may result from other shortcomings than my idea? see above,

Absolutely unclear what you mean.

  3. I don't refuse to improve my idea, I just refuse to carry on too glaring errors in calc,

There's nothing to discuss until there's an idea. Among those who have substantial knowledge of the topic, there is a clear understanding that what you ask is generally impossible. You insist on "it must be possible, I don't believe you, I'm not listening, lalala". OK, that's fine, but then it's your task to come up with the ...

Mike Kaganski (2021-02-02 13:25:57 +0200)

There's a general rule: extraordinary claims require extraordinary evidence. Your claim ("there is some method to make it perfect, or at least generally better, using IEEE numbers") is an extraordinary one. That doesn't mean it's necessarily wrong (although I'm sure it is), but it needs extraordinary proof: e.g. a thorough analysis proving that it is (more) correct in the general case, or tests over the whole range confirming that claim. Yes, testing all pairs of the 4 billion possible values (the square of 4 billion, even just for the 32-bit float case) is impossible. But then at least use a feasible number of uniformly spread values, and analyse those.

Mike Kaganski (2021-02-02 13:49:49 +0200)
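A minimal Python harness along the lines suggested here: draw many value pairs, compute the exact difference of the intended values with `fractions.Fraction`, and count how often a candidate subtraction beats or loses to plain `a - b`. The function names, input ranges and the two generators ("typed" short decimals vs. "computed" full-precision doubles) are assumptions, and the candidate only approximates the question's sheet formula:

```python
import math
import random
from fractions import Fraction


def round_sig_sub(a: float, b: float) -> float:
    """Hypothetical candidate: a - b rounded to 15 significant digits of the larger operand."""
    mag = max(abs(a), abs(b))
    if mag == 0:
        return 0.0
    return round(a - b, 15 - round(math.log10(mag)))


def typed_pair(rng):
    """Values as a user might type them: at most 4 decimals, |x| < 10,000, y close to x."""
    x = Fraction(rng.randrange(-10**8, 10**8), 10**4)
    y = x + Fraction(rng.randrange(-10**4, 10**4), 10**4)
    return x, y


def computed_pair(rng):
    """Values produced by earlier formulas: every bit of each double is assumed meaningful."""
    x = rng.uniform(-1e4, 1e4) * math.pi
    y = x + rng.uniform(-1.0, 1.0)
    return Fraction(x), Fraction(y)


def score(candidate, make_pair, samples=20_000, seed=1):
    """Count (better, worse) for candidate vs. plain a - b, measured against
    the exact difference of the intended values x and y."""
    rng = random.Random(seed)            # fixed seed keeps runs reproducible
    better = worse = 0
    for _ in range(samples):
        x, y = make_pair(rng)
        a, b = float(x), float(y)        # what the cells actually store
        target = x - y                   # exact rational difference
        e_plain = abs(Fraction(a - b) - target)
        e_cand = abs(Fraction(candidate(a, b)) - target)
        better += e_cand < e_plain
        worse += e_cand > e_plain
    return better, worse


print("typed inputs:   ", score(round_sig_sub, typed_pair))
print("computed inputs:", score(round_sig_sub, computed_pair))
```

Plain subtraction of two doubles is correctly rounded, so for "computed" inputs no candidate can ever come out better; any gain depends on assuming the inputs are short decimals as typed, which is exactly the point of contention in this thread.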

@Mike Kaganski:
- you're right: rounding results makes sense with 'round' operands and may harm 'special values' (besides, the accuracy you use is outside the '15-digit range'),
- importance of targets ... imho 95% of calculations in Calc are accounting and similar; less than 5% of users will ever use PI() or similar,
- my wish: fulfil both! targets,
- in an attempt to decide where to round and where not, and to use '-', which as you said is better than RAWSUBTRACT, I stumbled over an old problem:
@ajlittoz: I doubt Calc's "correct math" (over the subset of Q ...); try the following calculation:

A1: =22
B1: =63,999999999999+A1*10^-14
C1: =-63,999999999999
D1: =A2+A3
E1: =RAWSUBTRACT(A2;-A3)

Vary A1 between 30 and -30 and observe the results in D1 ... other people had a similar wish (correct math), and to achieve it for some special cases (result '0') they, uh ... (brutally raped ...

newbie-02 (2021-02-02 21:45:52 +0200)

imho 95% of calculations with calc are accounting and similar, less than 5% of users will use pi() or similar anywhere,

Ah, that "imho" again. But if your estimate is correct, then LibreOffice is perfectly fine as it is for most of its user base, otherwise they wouldn't use it there. Your changes would benefit no significant share of the user base and would hurt most of it (even if you invent something that works).

rounding results makes sense with 'round' operands

There's no way to know which operand is "round". The very idea is flawed. The problem you are trying to work around is exactly that the numbers have data in every bit of their mantissa, and there's no property that could distinguish "round" numbers from "special" ones. Even trying to count zeroes before the tail of the decimal representation would fail, and would break deliberate operations with small deltas.

Mike Kaganski (2021-02-02 21:56:39 +0200)
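The point about deliberate small deltas can be illustrated in Python (3.9+ for `math.nextafter`). One value is 0.1 nudged up by one unit in the last place on purpose, the other is rounding noise from 0,1 + 0,2. Both print with 17 significant digits, and a hypothetical cleaner that rounds to 15 significant digits maps both to 'round' numbers, so it cannot tell them apart:

```python
import math

deliberate = math.nextafter(0.1, 1.0)   # 0.1 plus one unit in the last place, on purpose
noise = 0.1 + 0.2                        # rounding noise from the classic example


def clean(x: float) -> float:
    """Hypothetical 'cleaner': round to 15 significant digits."""
    return float(f"{x:.15g}")


for name, x in [("deliberate", deliberate), ("noise", noise)]:
    print(name, repr(x), "->", repr(clean(x)))
```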

@newbie-02: what are A2 and A3 in your sample?

Anyway, I won't try your sample. Adding numbers that differ by 14 orders of magnitude is bound to give inaccurate results. 14 orders is ~46 bits. When the mantissas are aligned before addition, only ~10 bits of the shifted A1 remain. This means the shifted A1 is known only to ~1/1000 = 0.1%. What do you expect with such a loss of accuracy?

This is what mathematics tell you when you take into account IEEE-754 specification.

I unfortunately lost the reference to an algorithm dealing with such a case (a huge difference in order of magnitude between numbers in addition/subtraction). The algorithm works by carefully ordering the operations. You pay for the improved accuracy (but not exact mathematical precision) with an increase in computation time.

ajlittoz (2021-02-02 23:26:06 +0200)

Sorry, I got the cell numbering wrong; corrected:

A1: =22
A2: =63,999999999999+A1*10^-14
A3: =-63,999999999999
A4: =A2+A3
A5: =RAWSUBTRACT(A2;-A3)

Vary A1 and observe A4.

"There's no way to know which operand is "round"." - 0,5, 0,25, 0,125, 0,1: round; PI(): odd; 0,07 should be round, but Calc's 'specialities' make it difficult for me to calculate.

"and break deliberate operation with small deltas." - but that's exactly what was done with the 4-bit cut, and it bothers me immensely ... the sample above tries to show that.

For every attempt to triage between meaningful content and FP artifacts I need to get a handle on it, and that's blocked in Calc :-(

newbie-02 (2021-02-02 23:42:48 +0200)
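Reproducing the corrected sample in Python (plain IEEE-754 addition, i.e. what RAWSUBTRACT in A5 corresponds to; 1e-14 stands in for 10^-14) shows why A4 cannot follow A1 smoothly: A2 and A3 lie between 32 and 64, where doubles are spaced 2^-47 (about 7.1e-15) apart, so A4 can only move in those steps while each unit of A1 is meant to add 1e-14. Calc's "+" additionally applies near-zero handling, so Calc's A4 may show 0 for small A1 where this sketch shows a small value:

```python
BASE = 63.999999999999
ULP = 2.0 ** -47        # spacing of doubles in [32, 64), the range of A2 and A3

for a1 in range(-30, 31, 5):
    a2 = BASE + a1 * 1e-14            # A2: =63,999999999999+A1*10^-14
    a3 = -BASE                        # A3: =-63,999999999999
    a4 = a2 + a3                      # A4: exact here (Sterbenz lemma), but a2 was already rounded
    print(f"A1={a1:4d}  A4={a4!r:<24}  steps of 2**-47: {a4 / ULP:+.0f}  intended: {a1 * 1e-14:+.1e}")
```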

"There's no way to know which operand is "round"." - 0,5 0,25, 0,125

These numbers are already "round", so they need no more rounding. The text was "rounding results makes sense with 'round' operands", and I assumed that the quotes around 'round' meant that you were talking about numbers that are close to round. If you have truly round operands, you need no rounding of results. If your operands are not round, as with 0.1, you can't tell whether they are supposed to be round.

I unfortunately lost the reference to an algorithm

@ajlittoz: tdf#137679.

Mike Kaganski (2021-02-03 06:16:52 +0200)

@Mike Kaganski: Thanks, it was indeed the Kahan algorithm.

@newbie-02: you don't put yourself in the correct mathematical context. You stay with the assumption that the radices are 10 and 2 (integer radices), which is what we are taught in college classes. The hardware truth is that the radices are 1/10 and 1/2 (fractional radices), and the story is totally different. Except in rare cases, it is impossible to convert exactly from one base to the other. It is a matter of primality. Read Knuth for the mathematical details.

All you can do is counter the hardware idiosyncrasies with smart chaining of calculations. Your sample dives head first into the traps: once you have computed A2, you can no longer compensate. You have lost A1's precision, and anyway the result of the addition likely does not exist in the IEEE-754 subset, so it must be rounded to a close ...

ajlittoz (2021-02-03 07:59:35 +0200)
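Compensated (Kahan) summation, which tdf#137679 refers to, can be sketched in a few lines of Python. This is the textbook algorithm, not LibreOffice's implementation; the input list is contrived so that plain left-to-right addition loses every small term:

```python
def naive_sum(values):
    """Left-to-right addition; each step rounds to the nearest double."""
    total = 0.0
    for x in values:
        total += x
    return total


def kahan_sum(values):
    """Compensated (Kahan) summation: carry each step's rounding error into the next."""
    total = 0.0
    compensation = 0.0                    # what was added minus what should have been added
    for x in values:
        y = x - compensation              # re-inject what was lost in the previous step
        t = total + y                     # low-order bits of y may be lost here ...
        compensation = (t - total) - y    # ... and are recovered algebraically
        total = t
    return total


values = [1.0] + [1e-16] * 10             # true sum is about 1.000000000000001
print(naive_sum(values))                  # 1.0: each 1e-16 is below half an ulp of 1.0
print(kahan_sum(values))
```

The compensated result stays within about one unit in the last place of 1.000000000000001, at the cost of roughly four floating-point operations per term instead of one: the time-for-accuracy trade-off described above.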