
Locale independent lexicographical sort

asked 2016-06-14 15:42:54 +0200

hpekristiansen

updated 2016-06-16 01:00:06 +0200

In my language (Danish), we have some digraphs (ae, oe, aa) that are sorted last in the alphabet. This badly affects the sort order when you are trying to sort technical terms. Is there some way to make LibreOffice sort purely lexicographically by ASCII or Unicode order?

Edit (example sort):

This list:

Randers Aalborg København Ribe Arslev Malling

Sorted with Danish rules:

Arslev København Malling Randers Ribe Aalborg

Notice the problematic "Aa" in Aalborg.

Edit:

The list is correctly sorted by Danish rules. But I want a language-independent sort.
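For comparison, here is what a purely code-point ("Unicode order") sort of the same list looks like outside LibreOffice. This is a minimal Python sketch, not a Calc feature: Python's built-in string comparison orders strings character by character by Unicode code point, with no locale rules, so "Aa" is simply "A" followed by "a".

```python
# Sort strings character by character by Unicode code point,
# with no locale-specific collation rules applied.
words = "Randers Aalborg København Ribe Arslev Malling".split()

codepoint_sorted = sorted(words)  # plain str comparison = code-point order
print(" ".join(codepoint_sorted))
# Aalborg Arslev København Malling Randers Ribe
```

Note that code-point order also puts every uppercase ASCII letter before every lowercase one, and places "ø" (U+00F8) after all ASCII letters, which may or may not be what is wanted for a given set of technical terms.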


Comments

You have an "ø" in your example as well as an "Aa" that presumably replaces an "Å". What if "København" were written as "Koebenhavn"? Should the digraph "oe" then be treated differently from "ø"? I do not know anything about Danish, but I think "Aalborg" is rightly placed at the end regardless of whether it is written as above or as "Ålborg". What did I misunderstand?

Lupp (2016-06-16 00:25:31 +0200)

@Lupp: The list is correctly sorted by Danish rules. As written, I do not want Danish rules. I do not want any locale-dependent sorting, just a pure sort by ASCII or Unicode order.

hpekristiansen (2016-06-16 00:55:41 +0200)

Should Danish special characters, other non-alphabetic characters, or special characters from other languages nonetheless be allowed? How should digits, spaces, etc. be treated? Should upper and lower case be distinguished? As there is no sort option like "lexicographic by Unicode", these questions are relevant. In any case, you will have to accept some preparation effort for a workaround. Please understand that I want to know the actual conditions before I spend more time on this.

Lupp (2016-06-16 01:19:35 +0200)

@Lupp: I was hoping that there was some option I had overlooked that would allow me to do this. I do not have answers to all of your questions, as I need the option for several different tasks, but always with technical words of various kinds (file names, hashes, generated passwords: mixed letters, numbers and special characters). I can say for sure that I do not wish to sort whitespace or non-printable characters. If the option does not exist, please do not spend time on it.

hpekristiansen (2016-06-16 14:01:25 +0200)

@Lupp: Maybe you have some insight into how English sorting compares to what I want? It could be the same, but right now I have no way to know or find out.

hpekristiansen (2016-06-16 14:03:27 +0200)

@hpekristiansen: The Unicode range you refer to is simply "too big" for a generally applicable workaround. You should try to reduce the range of code points to something like 32 through 255 decimal ("20" through "FF" hex), or to an explicitly given set of characters. In any case, I am afraid you may need a piece of user code to solve the problem reasonably efficiently, unless you can guarantee a (rather small) maximum number of significant characters at the start of any generalized "word".

Lupp (2016-06-16 14:15:52 +0200)
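One way to read this suggestion: restrict the characters to code points 32 through 255, consider only a fixed number of leading characters, and encode each character as a fixed-width number so that a plain text sort of the encoded key reproduces code-point order. In a spreadsheet this would presumably live in a helper column; the Python sketch below only illustrates the encoding. The function name `codepoint_key` and the default width of 8 significant characters are assumptions for illustration, not something from this thread.

```python
def codepoint_key(text: str, n_chars: int = 8) -> str:
    """Hypothetical helper: encode the first n_chars characters of text as
    3-digit decimal code points (allowed range 32..255), padded to a fixed width."""
    codes = []
    for ch in text[:n_chars]:
        cp = ord(ch)
        if not 32 <= cp <= 255:
            raise ValueError(f"character {ch!r} (U+{cp:04X}) is outside 32..255")
        codes.append(f"{cp:03d}")
    # Pad short strings with "000", which sorts below every allowed code,
    # so a prefix sorts before any longer string that extends it.
    codes.extend(["000"] * (n_chars - len(codes)))
    return "".join(codes)


words = ["Randers", "Aalborg", "København", "Ribe", "Arslev", "Malling"]
print(sorted(words, key=codepoint_key))
```

The trade-off Lupp mentions shows up directly: strings that agree in their first `n_chars` characters get identical keys, so any difference after that point is ignored (Python's stable sort then keeps their input order).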

Since every locale has to use collators that meet the expectations of "standard users", there will be no simple remedy. The question is interesting, however, and I do not mind the time I spent. I just did not want to waste time without knowing the exact goal.

Lupp (2016-06-16 14:18:59 +0200)

2 Answers


answered 2016-06-14 23:03:27 +0200

m.a.riosv

Maybe I'm wrong, but I can't see those characters at the end when using Danish as the language for the default style.

DanishSort.ods

It's possible to create your own sort order: put the characters in the order you like in a column and select it. In Menu/Tools/Options/LibreOffice Calc/Sort Lists, the "Copy list from" field shows the selected range; click Copy and the list is added.
To use it, select it in Menu/Data/Sort - Options - Custom sort order.
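The idea of a user-defined order can also be applied character by character outside Calc. The sketch below is a hypothetical Python illustration of ranking each character by its position in an alphabet you choose; it is not a description of how Calc applies its sort lists, and `ALPHABET`, `RANK` and `custom_key` are made-up names.

```python
# Hypothetical user-defined alphabet: characters earlier in the string sort first.
ALPHABET = "AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZzÆæØøÅå"
RANK = {ch: i for i, ch in enumerate(ALPHABET)}


def custom_key(text: str) -> list[int]:
    """Map each character to its rank in ALPHABET. Characters not in the
    alphabet sort after it, ordered among themselves by code point."""
    return [RANK.get(ch, len(ALPHABET) + ord(ch)) for ch in text]


# List comparison makes a prefix sort before any longer string that extends it.
print(sorted(["Banana", "apple", "Aalborg"], key=custom_key))
```

With this particular alphabet, "apple" sorts before "Banana" (unlike plain code-point order), and "aa" is never merged into a single letter, because each character is ranked on its own.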


Comments

@hpekristiansen's concern was the letters Æ, æ; Ø, ø; Å, å. He gave a Latin transcription.

Lupp (2016-06-15 01:00:41 +0200)

No, my concern is actually not æ, ø and å, but the digraphs (ae, oe and aa); see my edit.

hpekristiansen (2016-06-15 12:28:05 +0200)

You do not see it because the problem is not single characters (glyphs); see my edit.

hpekristiansen (2016-06-15 14:43:51 +0200)

@hpekristiansen: I still think I understood, but expressed my remark badly. To correct it: you gave a transcription that you actually use for technical purposes.
I do not think the above answer will help much; in fact, I do not even understand it clearly. The workaround I designed myself is rather complicated, however. I will try to simplify it and come back to this as soon as I can find the time.

Lupp (2016-06-15 15:23:53 +0200)

answered 2016-06-15 18:48:25 +0200

Jens S

You should look at http://www.kat-format.dk/alfaregler.html. As far as I can see, only "aa", which is sorted like "å", will give you problems, so why not just use English as the sorting language for your special purpose?

Jens S


Comments

All I want is a language-independent sort. I do not want or care about the Danish sorting rules; they are wrong for my purpose. Maybe the best I can do is to use English, but I believe that could affect the sorting in other ways.

hpekristiansen (2016-06-16 01:07:07 +0200)
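One concrete way English collation can differ from pure code-point order is letter case: locale collators typically ignore case when first comparing letters, whereas code-point order puts every uppercase ASCII letter before every lowercase one. The Python sketch below contrasts the two; the `str.casefold` key is only a rough stand-in for that one aspect of English collation, not an implementation of it.

```python
names = ["beta", "Alpha", "alpha", "Gamma", "delta"]

# Code-point order: uppercase ASCII (65..90) always precedes lowercase (97..122).
code_point = sorted(names)
print(code_point)
# ['Alpha', 'Gamma', 'alpha', 'beta', 'delta']

# Case-insensitive key, roughly like the first comparison level of many
# locale collators; ties ("Alpha"/"alpha") keep their input order.
case_insensitive = sorted(names, key=str.casefold)
print(case_insensitive)
# ['Alpha', 'alpha', 'beta', 'delta', 'Gamma']
```

For file names, hashes and generated passwords, this is the kind of difference to check before settling on English as the sort language.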