
Is my understanding of how spelling dictionaries work correct?

asked 2018-02-25 19:36:35 +0200

catbill

I am trying to get an overall understanding of how the spelling dictionaries work and how they interact with each other. I have not come across one place that explains all of this. It would be helpful to know whether the following assumptions are correct and whether I am missing any important facts. Here is what I understand:

For each language, there is a main dictionary (containing many words) that we do not have access to.

In addition, under “User-defined dictionaries” in the Writing Aids dialogue, there is a dictionary for a specific language, such as en-US [English (USA)], and another dictionary called Standard.

By default, the en-US dictionary comes with some words, but not as many as in the main dictionary. We can edit, add, and delete words in that dictionary.

By default, when we add words while conducting a spell check, the words are added to the Standard dictionary. We can also edit, add, and delete words in that dictionary. (So why are both en-US dictionary and Standard dictionary needed?)

We can create a new user-defined dictionary with a specialized vocabulary, such as for IT or medicine.

When we run a spell check after creating a new dictionary, we are given a choice of which dictionary to save a new word to.

The new “Grammar By” feature in LibreOffice 6.0 is only available after we create a new dictionary.

All user-defined dictionaries are associated with either one language or with all languages. We cannot choose to associate one with two languages at the same time, such as both UK and US English. However, we can readily change the language the dictionary is associated with.

All dictionaries associated with a specific language are in operation when that language is being used.
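To make the language-association assumptions concrete, here is a minimal Python sketch of how a user-defined dictionary file might be read and matched to a document language. It assumes the plain-text layout LibreOffice typically writes for files such as standard.dic: a first line "OOoUserDict1", a "lang:" line where "<none>" means All languages, a "type:" line, then "---" and one word per line. The class and function names are hypothetical, and the sketch stores each word line as-is without interpreting any extra syntax.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class UserDictionary:
    name: str
    lang: Optional[str]  # None models the "All" (<none>) language setting
    negative: bool       # True for an exception list ("type: negative")
    words: Set[str] = field(default_factory=set)

    def applies_to(self, lang: str) -> bool:
        # A dictionary set to All is consulted for every document language.
        return self.lang is None or self.lang == lang


def parse_user_dictionary(name: str, text: str) -> UserDictionary:
    """Parse the assumed OOoUserDict1 text layout (hypothetical helper)."""
    lines = text.splitlines()
    if not lines or lines[0] != "OOoUserDict1":
        raise ValueError(f"{name}: unexpected header")
    lang: Optional[str] = None
    negative = False
    i = 1
    # Header lines run until the "---" separator.
    while i < len(lines) and lines[i] != "---":
        key, _, value = lines[i].partition(":")
        value = value.strip()
        if key == "lang":
            lang = None if value == "<none>" else value
        elif key == "type":
            negative = value == "negative"
        i += 1
    # Everything after the separator is one word per line.
    words = {line for line in lines[i + 1:] if line.strip()}
    return UserDictionary(name, lang, negative, words)


def active_dictionaries(dicts: List[UserDictionary], lang: str) -> List[UserDictionary]:
    """User dictionaries consulted for a document in the given language."""
    return [d for d in dicts if d.applies_to(lang)]
```

In this model a standard.dic tagged <none> is active for both en-US and en-GB documents, while a dictionary tagged en-US is only active for en-US text.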

What do I have wrong? What am I missing?


Comments

What is not working as you think it should?

robleyd (2018-02-26 08:47:38 +0200)

Nothing at the moment. I am just trying to understand how it works.

catbill (2018-02-26 17:59:20 +0200)

2 Answers


answered 2018-02-26 07:22:03 +0200

gabix

main dictionary (containing many words) that we do not have access to.

Not really. If it is a Hunspell dictionary, you can edit a main dictionary as it is merely a text file. Besides, you can have several main dictionaries for one language/locale. That's why I don't bother with user dictionaries.
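Since a Hunspell .dic file is plain text, adding a word by hand amounts to appending a line. The Python sketch below assumes the usual Hunspell layout (first line: approximate entry count; then one stem per line, optionally followed by /FLAGS) and UTF-8 encoding; the real encoding is declared by the SET line of the matching .aff file. add_word is a hypothetical helper, and it would be wise to back up the file first, since an extension update may overwrite it.

```python
from pathlib import Path


def add_word(dic_path: Path, word: str, encoding: str = "utf-8") -> bool:
    """Append word to a Hunspell .dic file (hypothetical helper).

    Returns False if the stem is already listed, True if it was added.
    """
    lines = dic_path.read_text(encoding=encoding).splitlines()
    # Line 1 is the approximate entry count; every other non-empty line is
    # "stem" or "stem/FLAGS", optionally followed by morphological fields.
    entries = [line for line in lines[1:] if line.strip()]
    stems = {entry.split()[0].split("/", 1)[0] for entry in entries}
    if word in stems:
        return False
    entries.append(word)
    # Rewrite the file with an updated count line followed by all entries.
    dic_path.write_text(
        f"{len(entries)}\n" + "\n".join(entries) + "\n", encoding=encoding
    )
    return True
```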


Comments

Thank you. I still need a bit more clarification. Perhaps I don't understand what you mean by "main dictionary." Since there are many words that are not in the en-US English dictionary (for example), there must be another, larger dictionary with many more words. That is what I was referring to as the main dictionary. If that is not the right name, what is it called?

Or am I completely misunderstanding something?

catbill (2018-02-26 18:04:45 +0200)

I don't know the right name. I call them main dictionaries, too :) What I mean is that a spellcheck extension normally includes one big dictionary for a particular language, i.e. one main dictionary. However, you can package several dictionaries into one extension for one language, and all of them will be main.

gabix (2018-02-26 18:41:39 +0200)

Sorry, but I am still not quite clear.

There must be a big dictionary containing lots of words that we do not have access to. The en_US English dictionary simply does not have enough words. Are you saying that you call that bigger dictionary the main dictionary?

What do you mean by packaging several dictionaries into one extension? How can they be main if there is already a main dictionary?

catbill (2018-02-26 19:28:07 +0200)

For each language, there is one big dictionary. It's installed via an extension (an extension may be bundled with the LO installer or may be downloaded and installed separately). I don't know, however, what you mean by the en_US English dictionary.

gabix (2018-02-26 19:45:41 +0200)

Hmm, still trying to be clear. Let's try this:

What makes a dictionary a "main" dictionary in the sense that you are using "main"?

en_US dictionary is just an example. The point is that, presumably, such a dictionary is installed under “User-defined dictionaries” for whatever languages are installed. I am trying to understand the purpose of such dictionaries. How are they different from the Standard dictionary?

catbill (2018-02-26 22:56:24 +0200)

By main I mean dictionaries bundled with LO or installed via extensions, not user dictionaries. See robleyd's answer.

gabix (2018-02-27 06:55:51 +0200)

Thank you for bearing with my questions. One question remains, however: why is there both a dictionary named Standard and a language-specific dictionary such as the en_UK English dictionary?

By default, words added during spell checking are added to the Standard dictionary.

The en_UK English dictionary comes with some words. Why aren't those words in the main dictionary? Why a separate one?

I am trying to understand the purpose of each type of dictionary.

catbill (2018-02-28 22:11:09 +0200)

answered 2018-02-26 23:59:18 +0200

robleyd

This is based on a default install of LO 5.3.7 for Windows 7; file locations for other OSes will be different.

Go to Tools | Extension Manager and select the option to display the extensions bundled with LO. In my install I then see extensions for English, French and Spanish spelling dictionaries, hyphenation rules, etc. (and a few other non-dictionary extensions). These are what you might call the xxx language dictionaries, i.e. English language, French language.

These extensions can be found in C:\Program Files\LibreOffice 5\share\extensions, each in its own dict-xx directory, where xx represents the language. If you look in dict-en you will see a number of en_XX.dic and en_XX.aff files, the XX representing the various regional English versions (we all speak the same English, no?). The .dic files contain the base words used by spell check, and the .aff files contain the affix rules for the particular region.
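To illustrate how the .dic and .aff files work together, here is a much simplified Python sketch that expands a .dic entry using the SFX (suffix) rules of an .aff file. It assumes single-character flags and handles only suffixes; real Hunspell also supports prefixes, other flag formats, continuation classes, compounding and more. The function names are hypothetical, and this is not how LibreOffice or Hunspell implement the lookup internally.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

# (text to strip, text to add, compiled condition) for one suffix rule
SuffixRule = Tuple[str, str, re.Pattern]


def parse_suffix_rules(aff_text: str) -> Dict[str, List[SuffixRule]]:
    """Collect SFX rules from .aff text, keyed by their (single-character) flag."""
    rules: Dict[str, List[SuffixRule]] = defaultdict(list)
    for line in aff_text.splitlines():
        parts = line.split()
        # Header lines look like "SFX S Y 3" (4 fields); rule lines look like
        # "SFX S y ies [^aeiou]y" (flag, strip, add, condition).
        if len(parts) >= 5 and parts[0] == "SFX":
            _, flag, strip, add, condition = parts[:5]
            strip = "" if strip == "0" else strip
            # Drop any continuation flags ("add/FLAGS") in this simplified model.
            add = add.split("/", 1)[0]
            add = "" if add == "0" else add
            # The condition must match the end of the stem before stripping.
            rules[flag].append((strip, add, re.compile(f"(?:{condition})$")))
    return rules


def expand(entry: str, rules: Dict[str, List[SuffixRule]]) -> List[str]:
    """Return the stem of a .dic entry plus every suffixed form its flags allow."""
    stem, _, flags = entry.partition("/")
    forms = [stem]
    for flag in flags:  # assumes the default single-character flag format
        for strip, add, condition in rules.get(flag, []):
            if condition.search(stem) and stem.endswith(strip):
                forms.append(stem[: len(stem) - len(strip)] + add)
    return forms
```

For example, with a made-up rule "SFX S y ies [^aeiou]y", the entry city/S would expand to city and cities, which is why a .dic file can be much shorter than the list of words it accepts.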

These are the "main" or language/region-specific dictionaries; perhaps Primary would be a better word. The user-defined dictionaries that are activated and associated with the Primary dictionary's language are used in addition to the Primary one to spell check your document.
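The "in addition to the Primary" behaviour can be modelled roughly as follows. This is a hypothetical Python sketch, not LibreOffice's actual code: a word counts as correct if the Primary dictionary for the document language contains it, or if any activated user-defined dictionary tagged with that language (or with All, modelled here as None) contains it. It deliberately ignores exception dictionaries, capitalisation rules and affix expansion.

```python
from typing import Dict, Iterable, Optional, Set, Tuple

# Hypothetical in-memory model of one activated user-defined dictionary:
# (language tag, or None for "All"; the words it contains)
UserDictionary = Tuple[Optional[str], Set[str]]


def is_correct(
    word: str,
    lang: str,
    primary: Dict[str, Set[str]],
    user_dicts: Iterable[UserDictionary],
) -> bool:
    # 1. The Primary (main) dictionary for the document language.
    if word in primary.get(lang, set()):
        return True
    # 2. Any activated user dictionary tagged with this language or with All.
    return any(
        word in words for dict_lang, words in user_dicts if dict_lang in (None, lang)
    )
```

In this model a word you add to Standard (tagged All) is accepted in every language, while a word in an en-US-only user dictionary is still flagged in an en-GB document.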

I think every file in the dict-xx directories is basically a plain text file, so feel free to investigate the contents with a text editor such as Notepad.

Why do some dictionaries have more words than others? Ask the people who put the dictionaries together; these are sourced from outside LO, and the README files may be interesting.


Comments

If you are actually having an issue with spell check, this troubleshooting guide may help.

robleyd (2018-02-27 01:00:13 +0200)

extensions can be found in C:\Program Files\LibreOffice 5\share\extensions

That is true only for bundled extensions. Dictionaries installed by the user, like any other user-installed extensions, are in <LO user profile directory>/user/uno_packages/cache/uno_packages/ (and you have to browse through all that stuff).

file in the dict-xx directories are basically plain text files

Not really. Hunspell dictionary and affix files are indeed plain-text files. Other files may be XML or something else.

gabix (2018-02-27 06:59:17 +0200)
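If you do want to browse through the user-installed packages, a short Python sketch like the one below can list the Hunspell dictionaries under a directory. It assumes that a Hunspell dictionary is a .dic file with a same-named .aff file next to it (LibreOffice's own user-defined dictionaries also end in .dic but have no .aff partner). find_hunspell_dictionaries is a hypothetical helper, and the root you pass in would be your own profile's uno_packages directory.

```python
from pathlib import Path
from typing import List, Tuple


def find_hunspell_dictionaries(root: Path) -> List[Tuple[Path, Path]]:
    """Return (dic, aff) pairs for every .aff file under root with a .dic twin."""
    pairs: List[Tuple[Path, Path]] = []
    for aff in sorted(root.rglob("*.aff")):
        dic = aff.with_suffix(".dic")
        # Only .aff files are visited, so a lone .dic (for example a
        # LibreOffice user dictionary) is never reported.
        if dic.is_file():
            pairs.append((dic, aff))
    return pairs
```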

I am referring only to the bundled extensions - hence use of the phrase "these extensions".

Python, XML and the like are plain text that can be opened with a text editor, but yes, the th_enXXX files are not text and wouldn't open happily in Notepad.

Answer edited

robleyd (2018-02-27 07:27:02 +0200)

Stats

Asked: 2018-02-25 19:36:35 +0200

Seen: 345 times

Last updated: Feb 26 '18