I want to create a spellchecker for the Bengali language. How can I do that?

asked 2019-02-03 09:20:43 +0200

shahrior

updated 2019-02-03 09:29:30 +0200

I'm from Bangladesh and a native Bengali speaker. I use LibreOffice to write my official documents, and I noticed that there is no spellchecker for the Bengali language, so I decided to create one. However, I don't know how to do that. Can someone help me find out how to create a spellchecker?


3 Answers


answered 2019-04-07 00:15:38 +0200

Richard Wordingham

The .dic and .aff files can be created using a plain text editor - I use Emacs. Hunspell is supposed to support 'morphologically complex languages, such as Hungarian', but it has trouble with languages like Sanskrit and Pali, and that's without considering sandhi between words. For compound affixes, I ended up writing a program to merge repeated applications of affixes. Hunspell can handle one prefix plus one suffix on a word, or two suffixes on a word, but fails at three suffixes.
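One possible reading of "merging repeated application of affixes" is to precompute combined suffixes so that a chain of three or more suffix layers collapses into at most two. The following is only a rough sketch of that idea, not the program mentioned above: the function names and the placeholder suffix strings are hypothetical, and it simply concatenates strings without modelling sandhi or any other sound change at the joins.

```python
from itertools import product


def merge_suffix_layers(inner, outer):
    """Concatenate every inner suffix with every outer suffix."""
    return [a + b for a, b in product(inner, outer)]


def collapse_layers(layers, max_layers=2):
    """Merge the outermost suffix layers until at most max_layers remain.

    `layers` is ordered from the suffix closest to the stem to the
    outermost one. Plain concatenation is an assumption; real languages
    may need rules for what happens where two suffixes meet.
    """
    layers = [list(layer) for layer in layers]
    while len(layers) > max_layers:
        outer = layers.pop()
        layers[-1] = merge_suffix_layers(layers[-1], outer)
    return layers


# Placeholder suffixes, for illustration only.
three_layers = [["s1"], ["t1", "t2"], ["u1"]]
two_layers = collapse_layers(three_layers)
```

The merged lists could then be written out as ordinary suffix rules. The cost is that the number of rules grows multiplicatively with each merged layer.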

As to packaging, if you can sort out Bengali affixes, you can probably do as RGB-es suggests. There is a packaging tool at https://github.com/silnrsi/oxttools ; however, I couldn't get it to install.


answered 2019-02-03 11:36:55 +0200

RGB-es

There is a dictionary editing tool called Proofing Tool GUI, by Marco Pinto, but I've never used it.


Comments

Thanks. With this tool I can create the dictionary's .dic and .aff files. Any idea how I can pack them into a LibreOffice extension?

shahrior (2019-02-03 12:01:19 +0200)

You can always take an existing dictionary extension, unpack it (it's a zip file with a modified file extension) and replace all the files in it. You also need to edit description.xml and dictionaries.xcu so that they point to the new files, set the language, etc.

RGB-es (2019-02-03 15:27:16 +0200)
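The unpack-and-repack step described in the comment above can be sketched with Python's standard zipfile module. This is a minimal sketch under assumptions: the function names are made up for illustration, and editing description.xml and dictionaries.xcu is still left to be done by hand between the two calls.

```python
import zipfile
from pathlib import Path


def unpack_oxt(oxt_path, target_dir):
    """Extract an existing extension and return the sorted list of its entries."""
    with zipfile.ZipFile(oxt_path) as archive:
        archive.extractall(target_dir)
        return sorted(archive.namelist())


def pack_oxt(source_dir, oxt_path):
    """Zip every file under source_dir into oxt_path, keeping relative paths.

    Write the output outside source_dir, otherwise the half-written
    archive would be picked up as one of its own entries.
    """
    source = Path(source_dir)
    with zipfile.ZipFile(oxt_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(source.rglob("*")):
            if path.is_file():
                archive.write(path, path.relative_to(source).as_posix())
    return oxt_path
```

A typical round trip would be to call unpack_oxt on a downloaded dictionary extension, swap in the new .dic and .aff files, adjust the XML files, and then call pack_oxt on the edited folder with an output name ending in .oxt.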

answered 2019-02-04 08:04:26 +0200

gabix

I don't think there is any simple guidance on creating spellcheck dictionaries for LibreOffice. You might want to consult Hunspell documentation, but it is full of technicalities. So, here are some quick tips:

  1. Dictionaries are packaged as extensions, which are ZIP archives (the .zip suffix is replaced with .oxt for convenience, though this is not strictly required). So, you can use any archiving/compression program for the task.
  2. Extensions have a certain structure, so, as RGB-es points out, download a couple of existing extensions to understand it.
  3. Dictionary/affix files are plain-text files, so you can use virtually any text editor to edit them. As I understand, Bengali uses a complex script (the Bengali script, not Devanagari), so the text editor you use must support UTF-8; that is the only requirement. Highlighting of XML markup is a plus when you edit XML files (an extension must include at least three of them).
  4. For a totally new language, I would start from a mere word list, i.e. simply collect all words and their forms in the .dic file and leave the .aff file empty except for a single line with the encoding declaration, as follows:

    SET UTF-8

If Bengali is not a highly inflected language, that's enough. Otherwise, you should consider building word paradigms (this will reduce the dictionary size and make it easier to add new words), but that's for later, once you begin to understand the syntax.
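The word-list approach from tip 4 could be sketched roughly as below. The function name and the output file names are hypothetical. The one format detail it relies on is that the first line of a Hunspell .dic file holds the (approximate) number of entries, followed by one word per line.

```python
from pathlib import Path


def build_wordlist_dictionary(words, dic_path, aff_path):
    """Write a flat Hunspell dictionary: one word per line, no affix rules."""
    # Drop blank entries and duplicates; sort for a stable, diff-friendly file.
    entries = sorted({w.strip() for w in words if w.strip()})
    # First line of the .dic file: the number of entries that follow.
    dic_lines = [str(len(entries))] + entries
    Path(dic_path).write_text("\n".join(dic_lines) + "\n", encoding="utf-8")
    # The .aff file carries only the encoding declaration for now.
    Path(aff_path).write_text("SET UTF-8\n", encoding="utf-8")
    return len(entries)
```

For example, build_wordlist_dictionary(["বাংলা", "ভাষা"], "bn_BD.dic", "bn_BD.aff") would write a two-word dictionary. Affix rules can be added to the .aff file later without changing this step.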

