how to make a thesaurus [closed]

asked 2013-12-21 19:20:31 +0200

bgliyanage

We, a group from an Oriental Institution of Sri Lanka, are developing a Hunspell dictionary for the Sinhala language. The available dictionary is of no use at all, since it contains thousands of spelling mistakes. We also need to develop a thesaurus for Sinhala that works with LibreOffice and other Linux-based software. Can someone please explain how to make such a thesaurus?

Closed for the following reason: the question is answered, right answer was accepted by Alex Kemp
close date 2016-03-01 16:56:45.017942

2 Answers

2

answered 2014-08-01 14:13:09 +0200

dnaber

You could set up my thesaurus web-app from https://github.com/danielnaber/openth... . It can export its data to the format that LibreOffice understands.

1

answered 2013-12-23 13:15:35 +0200

oweng

updated 2013-12-23 13:21:13 +0200

I probably can't explain all the available options in the thesaurus file format, but you can get a fair idea of what the DAT and IDX files contain by examining the en_US version:

$ head /opt/libreoffice4.1/share/extensions/dict-en/th_en_US_v2.dat
ISO8859-1
's gravenhage|1
(noun)|The Hague|'s Gravenhage|Den Haag|city (generic term)|metropolis (generic term)|urban center (generic term)
'tween decks|1
(adv)|between decks
.22|1
(noun)|twenty-two|firearm (generic term)|piece (generic term)|small-arm (generic term)
.22-calibre|1
(adj)|.22 caliber|.22-caliber|.22 calibre|diameter|diam (related term)
.22 caliber|1
$ head /opt/libreoffice4.1/share/extensions/dict-en/th_en_US_v2.idx
ISO8859-1
145866
's gravenhage|10
'tween decks|140
.22|175
.22 caliber|353
.22 calibre|438
.22-calibre|268
.38 caliber|693
.38 calibre|778
$ grep -re th_en_US /opt/libreoffice4.1/
/opt/libreoffice4.1/share/extensions/dict-en/dictionaries.xcu:                <value>%origin%/th_en_US_v2.dat</value>

As can be seen, dictionaries and thesauri are extensions. More information on hacking these files here.
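To make the relationship between the two files concrete, here is a minimal Python sketch that writes and reads the layout visible in the en_US sample above. It is not taken from LibreOffice; the function names `build_thesaurus` and `lookup` are hypothetical. It rests on what the sample output suggests: each DAT entry is a `word|count` line followed by `count` meaning lines, each IDX line stores the byte offset of that `word|count` line (10 is exactly the length of the `ISO8859-1` header line), and the IDX entries are sorted. Using `UTF-8` as the header for a Sinhala thesaurus is an assumption, not something confirmed in this thread.

```python
def build_thesaurus(entries, encoding="UTF-8"):
    """Return (dat_bytes, idx_bytes) for {headword: [(pos, [synonyms]), ...]}.

    Assumed layout, inferred from the th_en_US_v2 sample:
      DAT: encoding line, then per entry "word|N" plus N meaning lines.
      IDX: encoding line, entry count, then sorted "word|byte_offset" lines.
    """
    dat = bytearray((encoding + "\n").encode(encoding))
    offsets = []
    for word, meanings in entries.items():
        # The offset points at the start of the "word|N" line in the DAT file.
        offsets.append((word, len(dat)))
        dat += f"{word}|{len(meanings)}\n".encode(encoding)
        for pos, synonyms in meanings:
            dat += ("|".join([f"({pos})"] + list(synonyms)) + "\n").encode(encoding)
    # Sort by encoded bytes, which matches the ordering seen in the sample index.
    offsets.sort(key=lambda item: item[0].encode(encoding))
    idx_lines = [encoding, str(len(offsets))]
    idx_lines += [f"{word}|{offset}" for word, offset in offsets]
    return bytes(dat), ("\n".join(idx_lines) + "\n").encode(encoding)


def lookup(dat, idx, word, encoding="UTF-8"):
    """Return the meaning lines for `word`, each split on '|', or [] if absent."""
    lines = idx.decode(encoding).split("\n")
    # Skip the two header lines (encoding, entry count) and any empty tail.
    index = dict(line.rsplit("|", 1) for line in lines[2:] if line)
    if word not in index:
        return []
    rest = dat[int(index[word]):].decode(encoding).split("\n")
    count = int(rest[0].rsplit("|", 1)[1])
    return [line.split("|") for line in rest[1:1 + count]]
```

Writing the two byte strings to files named after the `th_en_US_v2.dat` / `.idx` pattern and comparing them with an existing thesaurus is a reasonable sanity check before packaging anything as an extension.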


Comments

Thank you very much for your help. We will check several thesauri for better understanding.

bgliyanage (2013-12-27 16:48:25 +0200)
