How to make a thesaurus [closed]

asked 2013-12-21 19:20:31 +0100

bgliyanage

We are a group from an Oriental Institution in Sri Lanka, and we are developing a perfect Hunspell dictionary for the Sinhala language. The available dictionary is of no use at all, since it contains thousands of spelling mistakes. We also need to develop a thesaurus for Sinhala that works with LibreOffice and other Linux-based software. Can someone please explain how to make such a thesaurus?

Closed for the following reason: the question is answered, right answer was accepted by Alex Kemp
close date 2016-03-01 16:56:45.017942

answered 2014-08-01 14:13:09 +0100

dnaber

You could set up my thesaurus web-app from . It can export its data to the format that LibreOffice understands.

answered 2013-12-23 13:15:35 +0100

oweng

updated 2013-12-23 13:21:13 +0100

I probably can't explain all the available options in the thesaurus file format, but you can get a fair idea of what the DAT and IDX files contain by examining the en_US version:

$ head /opt/libreoffice4.1/share/extensions/dict-en/th_en_US_v2.dat
's gravenhage|1
(noun)|The Hague|'s Gravenhage|Den Haag|city (generic term)|metropolis (generic term)|urban center (generic term)
'tween decks|1
(adv)|between decks
(noun)|twenty-two|firearm (generic term)|piece (generic term)|small-arm (generic term)
(adj)|.22 caliber|.22-caliber|.22 calibre|diameter|diam (related term)
.22 caliber|1
$ head /opt/libreoffice4.1/share/extensions/dict-en/th_en_US_v2.idx
's gravenhage|10
'tween decks|140
.22 caliber|353
.22 calibre|438
.38 caliber|693
.38 calibre|778
$ grep -re th_en_US /opt/libreoffice4.1/
/opt/libreoffice4.1/share/extensions/dict-en/dictionaries.xcu:                <value>%origin%/th_en_US_v2.dat</value>

As the grep output shows, dictionaries and thesauri are packaged as extensions: the thesaurus data file is registered in the extension's dictionaries.xcu. More information on hacking these files here.
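To make the layout of the two files more concrete, here is a minimal Python sketch that builds a .dat and a matching .idx from a small word list. It rests on my understanding of the format rather than on anything confirmed in this thread. Each .dat entry is a `word|number_of_meanings` line followed by one `(pos)|synonym|...` line per meaning. Each .idx line is `word|byte offset of that entry in the .dat`. Both files start with an encoding line, and the index adds an entry count after it. Those header lines are not visible in the excerpts above, although a first offset of 10 would be consistent with a header such as `ISO8859-1`. The function names, the UTF-8 encoding, and the sample entries are hypothetical placeholders, not real Sinhala data.

```python
# Sketch only: builds thesaurus data (.dat) and index (.idx) contents in memory.
# Assumptions: UTF-8 encoding, header lines as described in the lead-in,
# index entries sorted by byte order, and made-up sample entries.

def build_thesaurus(entries, encoding="UTF-8"):
    """Return (dat_bytes, idx_bytes) for a dict: word -> [(pos, [synonyms])]."""
    dat = bytearray((encoding + "\n").encode(encoding))  # assumed header line
    offsets = []
    for word in sorted(entries, key=lambda w: w.encode(encoding)):
        meanings = entries[word]
        # The index stores the byte offset (not character offset) of the entry line.
        offsets.append((word, len(dat)))
        dat += f"{word}|{len(meanings)}\n".encode(encoding)
        for pos, synonyms in meanings:
            dat += ("|".join([f"({pos})", *synonyms]) + "\n").encode(encoding)
    idx_lines = [encoding, str(len(offsets))]  # assumed header: encoding, count
    idx_lines += [f"{word}|{offset}" for word, offset in offsets]
    idx = ("\n".join(idx_lines) + "\n").encode(encoding)
    return bytes(dat), idx


def lookup(dat, idx, word, encoding="UTF-8"):
    """Find a word through the index and return its meaning lines, or None."""
    for line in idx.decode(encoding).splitlines()[2:]:  # skip the two header lines
        entry, _, offset = line.rpartition("|")
        if entry == word:
            block = dat[int(offset):].decode(encoding).splitlines()
            count = int(block[0].rpartition("|")[2])
            return block[1:1 + count]
    return None


# Placeholder entries; headwords are lower case, as in the en_US sample.
sample = {
    "fast": [("adj", ["quick", "rapid"]), ("adv", ["quickly"])],
    "big": [("adj", ["large", "huge"])],
}
dat_bytes, idx_bytes = build_thesaurus(sample)
print(idx_bytes.decode("UTF-8"), end="")
print(lookup(dat_bytes, idx_bytes, "fast"))
```

The resulting bytes could then be written to a pair of files named after the pattern of the en_US sample (th_..._v2.dat and th_..._v2.idx) and registered in an extension's dictionaries.xcu. Counting offsets in bytes matters for Sinhala, since each character takes several bytes in UTF-8.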

Thank you very much for your help. We will check several thesauri for better understanding.

bgliyanage ( 2013-12-27 16:48:25 +0100 )

