How to make a thesaurus

We, a group from an Oriental Institution of Sri Lanka, are developing a reliable Hunspell dictionary for the Sinhala language. The existing dictionary is unusable because it contains thousands of spelling mistakes. We also need to develop a thesaurus for Sinhala that works with LibreOffice and other Linux-based software. Could someone please explain how to make such a thesaurus?

You could set up my thesaurus web app, OpenThesaurus (danielnaber/openthesaurus on GitHub), which provides web-based thesaurus search and management. It can export its data in the format that LibreOffice understands.

I probably can’t explain all the available options in the thesaurus file format, but you can get a fair idea of what the DAT and IDX files contain by examining the en_US version:

$ head /opt/libreoffice4.1/share/extensions/dict-en/th_en_US_v2.dat
ISO8859-1
's gravenhage|1
(noun)|The Hague|'s Gravenhage|Den Haag|city (generic term)|metropolis (generic term)|urban center (generic term)
'tween decks|1
(adv)|between decks
.22|1
(noun)|twenty-two|firearm (generic term)|piece (generic term)|small-arm (generic term)
.22-calibre|1
(adj)|.22 caliber|.22-caliber|.22 calibre|diameter|diam (related term)
.22 caliber|1
$ head /opt/libreoffice4.1/share/extensions/dict-en/th_en_US_v2.idx
ISO8859-1
145866
's gravenhage|10
'tween decks|140
.22|175
.22 caliber|353
.22 calibre|438
.22-calibre|268
.38 caliber|693
.38 calibre|778
$ grep -re th_en_US /opt/libreoffice4.1/
/opt/libreoffice4.1/share/extensions/dict-en/dictionaries.xcu:                <value>%origin%/th_en_US_v2.dat</value>

As the dictionaries.xcu entry shows, dictionaries and thesauri are packaged as extensions. More information on hacking these files here.
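To make the layout in the listings above more concrete, here is a rough Python sketch that writes the same structure: the .dat file starts with an encoding line, each entry is a "word|number of meanings" line followed by one "(pos)|synonym|synonym..." line per meaning, and the .idx file lists the encoding, the entry count, and each word with the byte offset of its entry in the .dat file. The function name build_thesaurus and the shape of its input are made up for illustration, and sorting the index by raw byte order is my assumption based on the en_US index shown above. For Sinhala you would most likely want UTF-8 rather than ISO8859-1.

```python
def build_thesaurus(entries, encoding="UTF-8"):
    """Return (dat_bytes, idx_bytes) for a thesaurus in the .dat/.idx layout.

    entries maps each headword to a list of (part_of_speech, [synonyms]).
    This helper is hypothetical; it is not part of LibreOffice or OpenThesaurus.
    """
    # First line of the .dat file names the character encoding.
    dat = bytearray((encoding + "\n").encode(encoding))
    index = []
    for word, meanings in entries.items():
        # The index stores the byte offset where this entry starts.
        index.append((word, len(dat)))
        dat += f"{word}|{len(meanings)}\n".encode(encoding)
        for pos, synonyms in meanings:
            line = "|".join([f"({pos})"] + list(synonyms))
            dat += (line + "\n").encode(encoding)

    # Assumption: the index is sorted by the encoded bytes of the headword.
    index.sort(key=lambda item: item[0].encode(encoding))
    idx_lines = [encoding, str(len(index))]
    idx_lines += [f"{word}|{offset}" for word, offset in index]
    idx = ("\n".join(idx_lines) + "\n").encode(encoding)
    return bytes(dat), idx


if __name__ == "__main__":
    # Two entries copied from the en_US listing above.
    sample = {
        "'tween decks": [("adv", ["between decks"])],
        ".22": [("noun", ["twenty-two", "firearm (generic term)"])],
    }
    dat_bytes, idx_bytes = build_thesaurus(sample)
    print(dat_bytes.decode("UTF-8"))
    print(idx_bytes.decode("UTF-8"))
```

You would then write the two byte strings to a pair of files named like the en_US ones (something along the lines of th_si_LK_v2.dat and th_si_LK_v2.idx, which is only a guess at a sensible name) and reference the .dat file from the extension's dictionaries.xcu. Open the files in binary mode so the byte offsets stay exact.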

Thank you very much for your help. We will check several thesauri for better understanding.