0

Who is the author of en_GB.dic?

asked 2015-09-03 20:47:35 +0100

Nemzag

updated 2015-09-09 21:34:19 +0100

Alex Kemp

Hi, I would like to know who the author of the English spelling dictionary file (en_GB.dic) is, because I would like to ask him whether he can replace the standard "-" (U+002D) with the non‑breaking "‑" (U+2011) variant, which would be much better: the words would not be split at the end of the line / phrase...

Please could you also do the same in the "fr.dic" file and in those of any other languages using "-" (U+002D)...

Thanks, good day.


4 Answers

1

answered 2015-09-05 13:31:40 +0100

petermau

updated 2015-09-06 12:32:03 +0100

I believe changing the hyphen in the dictionary would be technically difficult. HYPHEN-MINUS (U+002D) is valid in every common character set, not only Unicode but also US-ASCII, ISO-8859, etc., and is encoded as a single byte (2D is less than 80, so it stays a single byte even in UTF-8). However, NON-BREAKING HYPHEN (U+2011) exists only in Unicode, requires three bytes in UTF-8, and is therefore not supported in US-ASCII or ISO-8859. This would stop the dictionaries from being general.

You could, however, try using AutoCorrect to modify the text on your system... Peter

Unicode includes both US-ASCII (U+0000 to U+007F) and ISO-8859-1 (U+0000 to U+00FF) as its first code points. Those two character sets use a single byte per character, giving 128 or 256 code points. In Unicode, however, a character can take one, two, three or four bytes depending on the character and on the encoding form (UTF-8, UTF-16 or UTF-32): in UTF-8 only the US-ASCII range is a single byte, and everything above U+007F takes two to four bytes. LibreOffice documents and most of the Internet default to UTF-8, in which U+2011 requires three bytes (E2 80 91). Hence, what appears to be a simple single-character change, 2D to 2011, is not. The Unicode website http://www.unicode.org/ is a good source of online information. The Unicode Standard manual I use gives a succinct 1400-page summary of the standard and is well worth reading to get a basic understanding of the issues.
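To make the byte counts concrete, here is a minimal Python sketch that encodes both characters in several encodings. The helper name `describe` is made up for this illustration; it only uses Python's standard `unicodedata` module and `str.encode`, and says nothing about how LibreOffice or the spell checker handle the characters internally.

```python
import unicodedata

def describe(ch):
    """Return the Unicode name, code point and encoded byte length of one character."""
    info = {"name": unicodedata.name(ch), "codepoint": f"U+{ord(ch):04X}"}
    for enc in ("ascii", "latin-1", "utf-8", "utf-16-le", "utf-32-le"):
        try:
            info[enc] = len(ch.encode(enc))
        except UnicodeEncodeError:
            info[enc] = None  # the character does not exist in this encoding
    return info

hyphen_minus = describe("\u002d")   # the ordinary "-" typed on a keyboard
non_breaking = describe("\u2011")   # the non-breaking hyphen

for row in (hyphen_minus, non_breaking):
    print(row)
```

The entries for `ascii` and `latin-1` come out as `None` for U+2011, which is the compatibility problem described above: a dictionary file stored in ISO-8859-1 simply cannot contain that character.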


Comments

Hi, thanks for the technical explanation. I am still asking to replace the simple minus with the non-breaking one... And if you want to keep compatibility, then make two versions: one Unicode-only and the other with full support for ISO, ASCII and Unicode... If I was able to make the edit on my own for LibreOffice, then you can do it too...

Nemzag ( 2015-09-08 18:25:51 +0100 )

In addition to the answer by @petermau, refer to this question and my answer.

oweng ( 2016-01-03 12:32:21 +0100 )
0

answered 2015-09-05 18:11:03 +0100

Nemzag

updated 2015-12-12 14:11:22 +0100

Hi Shm_get, I don't understand what you are trying to say... It is not very clear, because a compound word always uses a "-", but it would be much better to use the non-breaking one (U+2011), so that the compound word always remains complete if it is placed at the end of the phrase.

You are confusing syllabic separation with compound terms made of two or more complete words... For the kind of separation shown in the link you gave, it is better to use the hyphenation point "‧⁠" (U+2027).

Like this: "PEUT‑Ê‧⁠TRE"... But no one uses the "PEUT‑Ê‑⁠TRE" kind of separation in real life, only this one: "PEUT‑ÊTRE".

As for compatibility, the Unicode Standard is best... I can't keep replacing the minus with the non-breaking one in Notepad, in a web browser or elsewhere... I hope that you drop support for the ISO and ASCII versions and use only the Unicode one. We can't limit ourselves because of primitive technology... The same problem exists with old music that was compressed as MP3 and is now archived in that format with a considerable loss of quality...

In my case I have already replaced the minus in the DIC file with U+2011, but I can't change the Firefox one: I don't know where it is, and I believe it is compressed with an unknown tool...

At least the creator could make two versions, one with U+2011 and the other with the standard minus... and the user chooses which one to install...

Creating the U+2011 version takes 10 seconds: "CTRL+H", replace "U+002D" with "U+2011"... Then compress it for use in Firefox...
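The search-and-replace described above could be scripted roughly as follows. This is only a sketch under assumptions that are not confirmed in this thread: that the .dic file follows the usual Hunspell layout (first line is an entry count, each entry is `word/flags`), and that hyphens inside the flags must be left alone. The function names `convert_entry` and `convert_dic` are invented for the example; the result would have to be saved as UTF-8, and the matching .aff file would probably need to declare that encoding too.

```python
import re

HYPHEN_MINUS = "\u002d"
NON_BREAKING_HYPHEN = "\u2011"

# Assumption: the first "/" not escaped by a backslash separates word from flags.
_FLAG_SEP = re.compile(r"(?<!\\)/")

def convert_entry(line):
    """Replace hyphens in the word part of one entry, leaving the flags untouched."""
    parts = _FLAG_SEP.split(line, maxsplit=1)
    parts[0] = parts[0].replace(HYPHEN_MINUS, NON_BREAKING_HYPHEN)
    return "/".join(parts)

def convert_dic(text):
    """Convert the whole text of a .dic file, keeping the first (count) line as is."""
    lines = text.splitlines(keepends=True)
    if not lines:
        return text
    return "".join([lines[0]] + [convert_entry(line) for line in lines[1:]])

sample = "3\nwell-known/M\ne-mail\nco-op/-S\n"
print(convert_dic(sample))
```

In the sample, the hyphen inside the flag `-S` of the last entry is preserved while the hyphens in the words themselves are swapped, which a blind "replace all" in a text editor would not guarantee.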


Comments

No, in French

peut-
être

is a _valid_ césure (line-end word break), but

peut-ê-
tre

is not.

"the compounded word always remain complete if it is placed at the end of the phrase": that is _not_ a requirement in French. As I said, in French, _if_ a word containing a '-' is to be cut, it _has_ to be cut at the existing '-' and nowhere else. The rules for cutting words differ from language to language, so what you propose may be fine in en_GB, but "Please also you can do the same in the "fr.dic"" => No.

shm_get ( 2015-09-06 19:53:54 +0100 )
0

answered 2015-09-04 13:07:48 +0100

Regina

Do you mean this dictionary? http://marcoagpinto.cidadevirtual.pt/...


Comments

Maybe. I don't know which one it is or who its author is; I am just asking him to replace the minus (U+002D) with the non-breaking hyphen (U+2011)...

Nemzag ( 2015-09-05 01:21:24 +0100 )
0

answered 2015-09-05 06:28:52 +0100

shm_get

I do not know about English, but in French that would be a mistake; see, for example, http://www.plumefrancaise.fr/fr/c%C3%...

The rule in French is: in compound words using a '-', the only possible cutting point is the '-' itself, without inserting another one.
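As a small illustration of the French rule as stated in this answer (and only as stated here, not a full hyphenation algorithm), the following Python sketch lists the positions where a hyphenated compound may be cut. The function names `allowed_break_points` and `split_at` are hypothetical.

```python
def allowed_break_points(word, hyphen="-"):
    """Positions where the word may be cut: directly after an existing hyphen."""
    return [i + 1 for i, ch in enumerate(word) if ch == hyphen]

def split_at(word, index):
    """Cut the word at the given position without inserting another hyphen."""
    return word[:index], word[index:]

word = "peut-être"
for point in allowed_break_points(word):
    print(split_at(word, point))
```

For "peut-être" this yields the single cut "peut-" / "être". Replacing the hyphen with U+2011 would leave the list of allowed cuts empty, which is why the change would not suit French.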


Stats

Asked: 2015-09-03 20:47:35 +0100

Seen: 286 times

Last updated: Dec 12 '15