How to find how "implementation-defined" things should work?

Lupp · April 8, 2021, 11:06am

There are specifications for ODF (V 1.3) leaving things to be “implementation-defined”. An interesting example is the specification of the TEXT() function in part 4, seriously complicated by the overwhelming treasure introduced by “l10n”.

6.20.23 TEXT
Summary: Return the value converted to a text.
Syntax: TEXT( Scalar X ; Text FormatCode )
Returns: Text
Constraints: The FormatCode is a sequence of characters with an implementation-defined meaning.
Semantics: Converts the value X to a Text according to the rules of a number format code passed as FormatCode and returns it.

I still need to guess concerning what’s the “implementation-defined meaning” for LibO Calc. I know the format codes, of course - for what I would assume the “null-locale”, and the codes for actual locales (number >> 100) should be found in the help systems per locale.
However, I don’t know if or in what way TEXT() looks at the application-locale or a kind of document-locale, or at the locale chosen under Numbers/Language for a CellStyle or a cell.

Some formulas/functions guess themselves a format for the result, sometimes depending on the format(s?) of referenced cells…

Where is all the related behavior specified?

Is there a rule like “Don’t implement features with an intended behavior too complicated for a clear specification”?

mikekaganski · April 8, 2021, 11:07am

for PDF (V 1.3)

Likely a typo (ODF is likely meant)

Lupp · April 8, 2021, 11:32am

Rectified. Thanks. Accuracy of sighting may even more decline with age.

erAck · April 10, 2021, 8:54pm

The TEXT() function is an interoperability and portability nightmare, specifically because MS is/was en-US centric but also chose localized number format code keywords for some locales and separators may differ between locales and there’s no official way to tell the function which locale was used. It’s one of the most crappy things MS-Excel came up with (though probably due to their intention to kill Lotus 1-2-3 and thus implement everything it had but never ever did a cut to stop their own nonsense additions). If you can avoid the TEXT() function then avoid it. But it’s so widely used that resistance is futile…

That being said, implementation is utterly atrocious trying to determine in what locale the format code could actually be and whether it uses localized keywords or not.

If the formula cell has a number format assigned, that format’s locale is taken as a hint for a first guess, else the LibreOffice default locale (which may follow the system locale or be set fixed to a specific locale under Tools->Options). Let’s call this Locale1.

If the format code argument exists already as a number format code for that determined Locale1 then everything is fine and that number format is used [case1].

Else if Locale1 is en-US then the format code argument is scanned in the en-US locale [case2]. If that fails then no other case is tried and an error returned.

Otherwise [all other cases], the format code argument is tried whether it already exists for the en-US locale [condition3].

Also tried is to scan the format code argument in en-US and convert to Locale1 [try4]. From that a valid format entry may result [case5] (or not).

If condition3 is true and try4 was successful, then the resulting format case5 is used.

If condition3 is false and one of try4 failed or the resulting format code string is equal (no syntactical changes) [condition6], then the format code argument is tried to be scanned in Locale1 with locale specific keywords. The result may be a valid format entry [case7], replacing case5 (or not).

Otherwise, if condition3 is false and condition6 is false, then the format code argument is tried to be scanned in Locale1 and converted to en-US. If that is successful and the resulting format code is not equal [condition8], then the format code argument is scanned in Locale1 with localized keywords. From that a valid format entry may result [case9], replacing case5 (or not). If condition8 is false, then there’s still the case5 format entry.

Now if there is a valid number format it is either case5, case7 or case9 (case1 and case2 were already used and ended). No valid number format returns error.

Geez, I’d rather have pointed you to the source code instead of writing this down in prosa from reading the source code

And no, the behaviour is not specified, it has grown empirically trying to suffice the needs of users. There’s also no rule to not implement features that would be too complicated to specify (or rather, even if such thing like this behaviour would had been specified it would had turned out to not be sufficiently specified). You’d also not want to not have TEXT() implemented at all… Btw, some OpenOffice.org feature specifications are still available.

So, to recap: If you have to use the TEXT() function then try to avoid locale dependent separators. Also, try to stick to English format code keywords (i.e. date YMD instead of JMT or AMG or …); which helps in LibreOffice but maybe not in Excel or even does not work there.

Some formulas/functions guess themselves a format for the result, sometimes depending on the format(s?) of referenced cells…

That’s mostly for arithmetic operations (+,-,*,/) i.e. adding currency values or times or adding days to dates and if the result carries a value type a corresponding number format is applied if the cell did not have one applied before.

Lupp · April 11, 2021, 11:24am

A first “Thanks a lot!” for this very valuable “digital lecture”. Despite you profund knowledge it surely cost you some time. (Getting such facts from C++ code is next to impossible for me. There are even cases where I no longer understand my own old Pascal code - not for bad structure but for my age probably.)
I wiil need to post another comment later. Je me sens un peu faible…
Concerning TEXT() a function TEXT.EXPL() with a third parameter for the locale (3 semicolon-separated parts) would help for LibO-internal use by those who like unambiguous terms. Concerning interoperability MegaShame will always find ways to sap it.

How to find how "implementation-defined" things **should** work?

How to find how "implementation-defined" things should work?