Calc data sorting - not following ascii/unicode order?

demo.ods (8.6 KB)

As the attached ods file. When we try to sort column B with ascending, the result would be

A B
1 ,
4 ;
2 .
3 /

But according to the order of ascii/unicode, the order should be

A B
1 ,
2 .
3 /
4 ;

Is this a bug that I should report, or is there any options I have missed?

The sorting algorithm does not only take into account the order of Unicode points, but maybe also linguistic|locale -dependend rules.

If thats a serious issue for you, you may sort by python:

def demo_uni_sort(*_): 
    doc = XSCRIPTCONTEXT.getDocument()
    sel = doc.CurrentSelection
    sheet = sel.Spreadsheet
    cursor = sheet.createCursorByRange(sel)
    cursor.collapseToCurrentRegion()
    data = sorted(cursor.DataArray, key=lambda row: f"{row[1]}")
    cursor.DataArray = data

One of our local users reported this problem. In Excel it seems to follow the unicode order, but when he used Calc the result was not what he wanted.

Sorting string data following unicode order seems to be a reasonable behavior. Your example sorting with Python approves this. Is there any option in Calc to define the sorting algorithm?

Did he specify, what he wanted for Åäöü, accents etc. ? As the use of . and , in numbers also vary, even there may exist different ideas on sorting.
.
We may need a locale us-excel for US-Software, as defined by Microsoft for compatibility…

No, he didn’t specify anything. I’m not saying this is a bug, but I want to know what the default sorting algorithm is since I couldn’t find any document mentioning it.

At first I thought that following the ASCII/Unicode order is quite reasonable default behavior for sorting string data. But… on second thought, our locale is zh_TW.UTF-8, and the unicode order for Chinese numbers does not really follow the number order, so maybe it is really a language issue and can be costumed by the Sort List. I just need a reasonable explanation about the default sorting algorithm and a user friendly way to set the sorting algorithm to follow the unicode/ASCII order.

Have to check when I’m opening my Laptop next time. As a starting point there is the help on sort-options. The german umlauts are mentioned at “language”. Your local user also has locale zh_TW.UTF-8. ?
https://help.libreoffice.org/latest/ro/text/scalc/01/12030200.html

Our local users should have locale zh_TW.UTF-8, and in the Data - Sort - Options - Language the default value is Traditional Chinese. In the help document you mentioned it explained a bit about the different behavior in Asian language. Thank you.

However, even if I set the language to English(US), the result is still the same (not following ASCII). Even if I open Calc with locale en_US.UTF-8, the result is still the same too.

I tried opening it under en_US.UTF-8 locale and English language environment (i.e., no “Asian” language issue affecting), and still got the same result (first , second ; third . fourth /). How does the result come from? It’s not explained in the help document.

I spoke to soon - it seems Excel also uses locale-dependend collation, but maybe not the same as Calc or settings may differ…

@mikekaganski @Wanderer
Not exactly related, but discovered meanwhile working on this topic calc sort-dialog crash with Language-option
can somebody confirm or does it happen only to me with:

Version: 7.0.4.2
Build ID: 00(Build:2)
CPU threads: 4; OS: Linux 5.15; UI render: default; VCL: gtk3
Locale: de-DE (de_DE.UTF-8); UI: de-DE
Raspbian package version: 1:7.0.4-4+rpi1+deb11u3
Calc: threaded

@franklin has also reported
https://bugs.documentfoundation.org/show_bug.cgi?id=153387

I guess ICU-libs are used for collation, but where the used tables are set I don’t know.

and a lot of background and links from people with more knowledge of LibreOffice -sources are here:

I could verify the sequence you found

checking the “locale explorer” mentioned in the link above for my language:

This seems to be a complex question. One workaround might be if you have a third column, C, you could enter in C1 the following =UNICODE(B1), drag down and sort on that.

Only you maybe. Quick test with 7.3.6.2 and 6.4.7.2 (german to US/french sorting)

That may had been tdf#152950.

1 Like

It’s not a bug, it’s a feature ;-)

Sorting order is defined by the Unicode Collation Algorithm (UCA) and LibreOffice uses the https://icu.unicode.org/ implementation. Collation details may even depend on locale. See ICU Collation Demo .

For zh-TW there is some specific tailoring for the different algorithms, see i18npool/source/collator/data/ (path) in projects: core - OpenGrok search results the zh_TW_*.txt files.