Modifying a thesaurus

How do you add words and connections to a thesaurus?

For example, when looking for a synonym for "school", as in an educational institution,
LO gives me: institution or establishment
Meanwhile, in my Roget’s International Thesaurus (5th ed.), in entry group 567.1, I get:
School, educational institution, teaching institution, academic or scholastic institution, teaching and research institution, institute, academy, seminary, schule (Ger), ecole (Fr), escuela (Sp); alternative school; magnet school
In addition, there are 15 other entry groups related to schools, of which 14, 15, and 16 are lists of school types or school classifications.
Yes, there is some overlap, but the number of possible alternatives is much higher than what LO presents.

for that nitpicker: win10, LO 7.6
not that it matters, it’s an external resource referenced by LO

Please take a look at Adding/Updating bundled Dictionaries

While I had not progressed to the point of even considering publishing the modified thesaurus, this is a good point, and I will definitely include the link at the top of the instructions

Not easily, unfortunately. There is no built-in method for doing so in LibreOffice, but if you’re up to building your own language files to use with LibreOffice, you could start here: GitHub - silnrsi/oxttools: Tools for creating language support oxt extensions for LibreOffice

Thank you. I figured it wasn’t going to be easy; thesauruses (thesauri?) are notably complex. 467 pages of my 1,299-page, standard-sized thesaurus are the index.

So I found the th_en_us_V2 dat & idx files.
It looked fairly easy, then I realized that the idx file indexes by character offset, not by line number. So if I add entries to the dat, I need to increase every index value that points past the line I altered.
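To make the renumbering concrete, here is a minimal Python sketch; it is not anything LibreOffice ships. It assumes the idx values are byte offsets into the dat file and that the index is held as (word, offset) pairs. The function name `insert_entry` and the sample offsets are made up for illustration.

```python
def insert_entry(index, word, insert_at, block, encoding="utf-8"):
    """Return a new index after `block` is inserted into the dat at byte `insert_at`.

    index is a list of (word, byte_offset) pairs. Every offset at or after
    the insertion point moves by the byte length of the inserted block.
    """
    added = len(block.encode(encoding))  # newlines count toward the length
    shifted = [(w, off + added if off >= insert_at else off) for w, off in index]
    shifted.append((word, insert_at))    # the new entry sits at the insertion point
    return sorted(shifted)               # keep the index ordered by word


# Hypothetical three-entry index.
index = [("able", 11), ("school", 40), ("zebra", 95)]

# New dat entry: one header line plus one meaning line, 24 bytes in total.
new_block = "academy|1\n(noun)|school\n"

updated = insert_entry(index, "academy", 40, new_block)
```

Entries before the insertion point keep their offsets; everything from the insertion point onward moves by the same amount, which is why a single added line touches so much of the idx.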

There are 349,814 lines in the dat file, and 145,868 in the idx file,
wonder if I can make a calc book to manage the heavy lifting (1,048,576 rows available)
:tired_face: I really need to quit while I’m ahead sometimes

Little suggestion for the next major revision, (I mean, when you change how the language files are formatted or addressed), maybe index by line number next time, if that’s possible. I know that is not something to do now; it’s way too much for something that’s really not meant to be user configured.

Since you’re already half way down the rabbit hole you might find this interesting also.

https://wiki.documentfoundation.org/Language

Also, if you want to make a suggestion (or file bug reports etc.), you can do so here:

https://bugs.documentfoundation.org/

actually…
started working on a calc sheet
it’s more approachable to casual users, even though a Base database would have been a better fit

the dat file was easy, just had to remember to count the non-printing char, which I was already aware of.
the idx file isn’t hard, I’ve got calc to generate the values for the idx file
all I need to do is go through and pull the indexed values into a different column
right now, they’ve still got the spacing from the dat sheet
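For comparison, the same bookkeeping can be sketched in Python outside of Calc. This is a rough sketch under assumptions: the dat file starts with an encoding line, each entry is a `word|count` line followed by `count` meaning lines, lines end in a single LF, and the idx is an encoding line, an entry-count line, then `word|offset` lines sorted in plain byte order. `build_idx` and the sample data are hypothetical.

```python
def build_idx(dat_bytes):
    """Build idx file content (bytes) from dat file content (bytes)."""
    lines = dat_bytes.split(b"\n")
    encoding_line = lines[0]
    offset = len(encoding_line) + 1      # +1 for the non-printing newline
    entries = []
    i = 1
    while i < len(lines) and lines[i]:
        word, count = lines[i].rsplit(b"|", 1)
        entries.append((word, offset))   # offset of the "word|count" line
        n = int(count.decode("ascii"))
        # Step over the header line and its n meaning lines.
        for line in lines[i : i + 1 + n]:
            offset += len(line) + 1
        i += 1 + n
    entries.sort()
    out = [encoding_line, str(len(entries)).encode("ascii")]
    out += [word + b"|" + str(off).encode("ascii") for word, off in entries]
    return b"\n".join(out) + b"\n"


# Tiny made-up dat file; "café" shows that offsets count bytes, not characters.
sample = (
    "UTF-8\n"
    "school|1\n"
    "(noun)|institution|establishment\n"
    "café|1\n"
    "(noun)|coffee shop\n"
).encode("utf-8")

idx = build_idx(sample)
```

Working on bytes rather than decoded text keeps the newline and any multi-byte characters counted correctly.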

I was supposed to go to bed a few hours ago, maybe some sleep will help me figure out the last bits
I think I’m going to have to resort to macros, but I’ll work on it some more later

I can’t find anything in core Calc, short of macros, that lets me manipulate an array the way I need to.

Just to keep those following my progress up to date:
-I have extracted the dat entries that need to be indexed
-however, the way I have done so has left me with blank cells that need to be removed
-As I see it, I need to read the column as a one dimensional array
-and either remove the null cells while reading, or remove them after I’ve created the array
-then paste the data into column A
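The steps in that list boil down to a filter over a one-dimensional list. A minimal Python sketch, assuming the column has already been read into a list of strings with blanks as empty strings (`compact_column` is a made-up name):

```python
def compact_column(cells):
    """Drop blank cells while keeping the remaining values in order."""
    return [cell for cell in cells if cell != ""]


# Hypothetical column contents with gaps left over from the dat sheet.
column = ["able|11", "", "", "school|40", "", "zebra|95"]
packed = compact_column(column)
```

Filtering while reading and filtering afterwards give the same result; the only thing that matters is that the order of the surviving values is preserved.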

Once that’s done, creating or modifying a thesaurus becomes much more accessible

But… what if multiple people are editing the same thesaurus
so I’m adding a page for combining two dat files
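One way such a combiner could work, sketched in Python. The merge policy here is my assumption, not something decided in this thread: entries for the same word are unioned, meaning lines are compared as whole lines, and the first file’s order wins. It also assumes the `word|count` plus meaning-lines layout and matching encodings. `parse_dat` and `merge_dat` are hypothetical names.

```python
def parse_dat(text):
    """Split dat text into its encoding line and a {word: [meaning lines]} dict."""
    lines = text.splitlines()
    encoding, entries, i = lines[0], {}, 1
    while i < len(lines) and lines[i]:
        word, count = lines[i].rsplit("|", 1)
        n = int(count)
        entries[word] = lines[i + 1 : i + 1 + n]
        i += 1 + n
    return encoding, entries


def merge_dat(text_a, text_b):
    """Union two dat files; meaning lines already present are not repeated."""
    enc_a, merged = parse_dat(text_a)
    enc_b, other = parse_dat(text_b)
    if enc_a != enc_b:
        raise ValueError("encodings differ; convert one file first")
    for word, meanings in other.items():
        target = merged.setdefault(word, [])
        for meaning in meanings:
            if meaning not in target:
                target.append(meaning)
    out = [enc_a]
    for word, meanings in merged.items():
        out.append(f"{word}|{len(meanings)}")  # count is recomputed after merging
        out.extend(meanings)
    return "\n".join(out) + "\n"


a = "UTF-8\nschool|1\n(noun)|institution\n"
b = ("UTF-8\nschool|2\n(noun)|institution\n(noun)|academy|seminary\n"
     "academy|1\n(noun)|school\n")
combined = merge_dat(a, b)
```

Since merging changes line lengths and positions, the idx would need to be regenerated from the combined dat afterwards.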

So, I guess I’m back to writing VB, and figuring out the idiosyncrasies of LO macros.
Hopefully this should only take a few hours. :crossed_fingers:

Would love to see what you’ve done if you’re willing to share. It might make language customization more accessible at least in the short term.

That is the intent. Many people find the idea of installing GitHub tools and working with repositories daunting, which keeps them from contributing.
Data entry is tedious, but if there are those that would be willing if there was an easier way, well…

Here’s what I’ve got so far.
Real life interrupted, so I still don’t have the arrays set up yet
There is a lot of material and fields that are there for error checking; the data used is from the stable th_en_US files, so if done right, column A of the idx sheet should match column P, with particular attention to the number of lines in the header
I think I’ve got enough documentation, at least for those who have an idea of what’s going on
I think I do need to delve into the language community to check my terminology on the parts of the files
I may also need to add a primer section, breaking down the syntax, so those not familiar with any coding can understand what’s going on. It’s possible that restructuring the pages may prove helpful as well
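The column A versus column P comparison has a rough equivalent outside the spreadsheet: check that every offset in an idx actually lands on its own word in the dat. A Python sketch under assumptions: a two-line idx header (encoding, entry count), `word|offset` lines, and byte offsets. `check_idx` is a made-up helper.

```python
def check_idx(dat_bytes, idx_bytes):
    """Return a list of problems found; an empty list means the idx looks consistent."""
    problems = []
    idx_lines = idx_bytes.split(b"\n")
    if idx_lines and idx_lines[-1] == b"":
        idx_lines.pop()                      # drop the piece after the final newline
    entries = idx_lines[2:]                  # skip the two header lines
    declared = int(idx_lines[1].decode("ascii"))
    if declared != len(entries):
        problems.append(f"header says {declared} entries, found {len(entries)}")
    for line in entries:
        word, off = line.rsplit(b"|", 1)
        # The dat entry line must begin with "word|" at exactly this byte offset.
        if not dat_bytes.startswith(word + b"|", int(off.decode("ascii"))):
            problems.append(f"bad offset for {word.decode('utf-8', 'replace')}")
    return problems


dat = b"UTF-8\nschool|1\n(noun)|institution\n"
good_idx = b"UTF-8\n1\nschool|6\n"
bad_idx = b"UTF-8\n1\nschool|7\n"
```

Here `check_idx(dat, good_idx)` reports nothing, while the off-by-one in `bad_idx` is flagged, which is the kind of slip a miscounted header line would cause.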

files too large to do a direct upload, even when zipped, so I’ve put it in my stash as an ods:
https://www.deviantart.com/stash/01bpsy6usk7p
edit:
I didn’t realize how long it has been since I did any coding, including VB or VBA, so I vastly underestimated my ramp-up time.
I know what I need the macro to do; it’s just a matter of figuring out how best to do it, and how to do it as a macro

Here’s the working version, again, in my stash cause it’s still big:

https://www.deviantart.com/stash/0112b7z941d5.
.
It took a bit over ten minutes to recreate the idx for the English (US) thesaurus
I don’t like having to go through the menus to activate the macro; it’s just something else to fluster a casual contributor
So I’m going to add controls :tada:; go figure
since I’m going to do that, figure I’d redesign the document
Sheet 1: instructions
Sheet 2: buttons (generate idx, export files(?)), dat entries, idx output
Sheet 3: (stub right now) dat combiner w/ any controls it needs, possibly one to copy the combined dat to
Sheet 4: calc data manipulation; locked to prevent casual users from gumming up the works, but with no password so others can maintain it should the standards for the files change
.
Here’s my Basic code; any ideas on how to streamline it?

Sub Main
	Dim aIndexRawM() As String 'strings extracted from the idx sheet are collected in this array
	Dim oSheet As Object 'reference to the idx sheet
	Dim oCell As Object 'cell reference, reused for reading and writing
	Dim lngArrayLimit As Long 'last row to scan, read from cell L11
	Dim strCell As String 'string from the current cell
	Dim I As Long 'next free slot in the array
	Dim R As Long 'row being read from Calc
	Dim A As Long 'array slot being written back to Calc

	oSheet = ThisComponent.getSheets().getByIndex(2) 'third sheet: idx
	oCell = oSheet.getCellByPosition(11, 10) 'L11 (column 11, row 10, both zero-based)
	lngArrayLimit = oCell.getValue()

	ReDim aIndexRawM(lngArrayLimit) As String

	I = 0
	For R = 2 To lngArrayLimit 'iterate down column J (index 9), starting at row 3
		strCell = oSheet.getCellByPosition(9, R).getString()
		If Len(strCell) > 0 Then 'skip blank cells
			aIndexRawM(I) = strCell 'store the string in the array
			I = I + 1 'advance to the next free slot
		End If
	Next R

	'I now holds the number of strings collected, so the last filled slot is I - 1
	For A = 0 To I - 1
		oCell = oSheet.getCellByPosition(0, A + 2) 'column A, starting at row 3
		oCell.setString(aIndexRawM(A))
	Next A
End Sub

It will get moved into a ‘generateIdx’ sub later
.
Just realized that I forgot to add the part about activating the macro.
Oh well, I’ll start changing the layout Sunday or Monday, and it won’t matter once I add buttons
.
Found a few issues, mainly with formula-to-cell linking, so I will likely have to do more of the operations through macros.
Issues:
-when adding lines, you have to select and drag columns B, C & D from a few lines above the addition to a few lines below
-likewise, the formulas on the idx page remain linked to the cells that have shifted, which throws the table out of alignment. Currently, the only fix is to copy the formula in cell F3 and paste it back into the column
.
My solution to these is going to be to paste the relevant formulas using a macro as the first set of operations

I can’t view the file in your stash despite having logged in. It’s telling me only people with the link can view it despite me using your link.

I’ll see if it will let me add it to a gallery
Got it in the LibreOffice Projects
click on the DA block in the comment
this will take you to another page
then click on the three dots and an option to download should show up
.
There’s probably an easier way to do it, but I haven’t needed to link to non-art files before

The 3 dots don’t show, but it gives me a download link now :+1:

Nice work @NanoEther. Just had a quick look and you’ve explained it well and it’s easy to follow. That’s as far as I’ve gotten for now though. Hoping to put some time aside this week to have a more in depth look and play around with it.