How do I get a word index to show up in my document? I managed to get a concordance file created but I can't get the index to display

Kurt_Matthys · March 22, 2022, 2:42pm

I have a 120 page philosophical document and want to add a word index with page numbers for each entry in the index. I have another document with the words that I want to index; they are all transliterated Sanskrit words if that makes any difference. I copied all of them into Notepad and saved it as an .sdi file. I’ve inserted an index into the document using Insert → Table of Contents and Index → Table of Contents, Index, or Bibliography. When I right click on the index in the document and select ‘Edit’ from the dropdown, it brings up the appropriate pop-up. When I select the ‘File’ button on the pop-up, it shows my list of words, which is great. But I can’t figure out how to get the index of the words in the document. Remember that I just want to add an appendix to my document with a list of important words and where they are referenced. I don’t want any fancy formatting, just a list of words with the references. Is what I want possible? If so, can someone please help? I’ve been reading the help for days but can’t figure it out.

ajlittoz · March 22, 2022, 3:53pm

An .sdi concordance file is not just a list of words, one per line. It also contains other information needed to drive index build-up.

Ideally, the concordance file is built from the Insert>TOC & Index>TOC, Index or Bibliography dialog through the File menu, New or Edit item.

As a first aid remedy, open your concordance file in Notepad and paste ;;;;0;0 (4 semicolons, a zero, one semicolon, a zero) at the end of every line. Save the file and update the index.

Once you have your index, tune the concordance file parameters to get the index the way you like it.
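For a long word list, the first-aid step above can be scripted instead of done by hand. The following is only a minimal sketch: the function names, the sample words, and the choice of UTF-8 as output encoding are assumptions for illustration, and which encoding Writer actually accepts depends on the LibreOffice version and platform.

```python
from pathlib import Path

# Suffix quoted in the post above: 4 semicolons, a zero, one semicolon, a zero.
SUFFIX = ";;;;0;0"


def to_concordance_lines(words):
    """Append the suffix to bare words; skip blanks, keep lines that already have fields."""
    lines = []
    for raw in words:
        entry = raw.strip()
        if not entry:
            continue  # ignore empty lines in the word list
        # A line that already contains semicolons is assumed to be complete.
        lines.append(entry if ";" in entry else entry + SUFFIX)
    return lines


def write_concordance(words, path, encoding="utf-8"):
    """Write the lines as a plain text file (hypothetical helper, encoding is an assumption)."""
    text = "\n".join(to_concordance_lines(words)) + "\n"
    Path(path).write_text(text, encoding=encoding)


# Sample words are made up for the demonstration.
print("\n".join(to_concordance_lines(["ātman", "dharma", "", "senses"])))
```

Calling `write_concordance(words, "Word Index.sdi")` would then produce a plain text file that can be selected through the File button of the index dialog.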

Kurt_Matthys · March 22, 2022, 4:46pm

I guess that I wasn’t clear in my previous post that the .sdi file looks exactly as you mentioned. However, even saving it after I’ve edited it to make sure that it’s correct, and then doing an update doesn’t cause anything to be displayed in my document.

The second problem here is that I can’t figure out how to get the configuration UI to even format the index as I want.

ajlittoz · March 22, 2022, 5:16pm

Attach a 1- or 2-page sample file to your question with the .sdi file so that I can analyse the case.

Kurt_Matthys · March 22, 2022, 5:49pm

sample file for libre support.odt (21.4 KB)

second sample file for libre support.odt (13.4 KB)

ajlittoz · March 22, 2022, 6:17pm

The concordance file must be stored as a plain text file. You created it with Writer. Writer saves its .odt files as zip-compressed archives. Inside such an archive, you have a directory made of several ancillary files.

When you open such a file, even after changing its extension to .sdi, it remains a zip file which is not interpreted as plain text (or rather, it is accepted as garbage).
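A quick way to tell whether a file with an .sdi extension is really plain text or a renamed .odt is to test it for a zip archive. This is a rough sketch using Python's standard `zipfile` module; the helper name and the in-memory sample data are assumptions made for the demonstration.

```python
import io
import zipfile


def looks_like_zip(data: bytes) -> bool:
    """Return True if the bytes form a zip archive, as an .odt file does."""
    return zipfile.is_zipfile(io.BytesIO(data))


# Build a tiny zip in memory to stand in for a renamed .odt file.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("content.xml", "<placeholder/>")
fake_odt = buffer.getvalue()

# A real concordance file is plain text, one entry per line.
plain_text = "ātman;;;;0;0\ndharma;;;;0;0\n".encode("utf-8")

print(looks_like_zip(fake_odt))    # True
print(looks_like_zip(plain_text))  # False
```

For a file on disk, `zipfile.is_zipfile("Word Index.sdi")` performs the same check directly on the path.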

To fix your problem, open your concordance file in Writer. Select all and copy. Paste clipboard contents into a text editor (I don’t know if Notepad qualifies as a simple text editor or a document app) and save it.

In the index dialog, use the file drop-down menu to open the file you just created. Now, your index is built as you expect it.

Kurt_Matthys · March 22, 2022, 7:05pm

I think that I had done what you suggested before. I had to store the sdi file as odt because I couldn’t upload an sdi file. Earlier, Writer opened the original text file and added all the semicolons and the zeroes and saved it.

However, I started again as you suggested. I have uploaded a screen capture of the file in edit mode from the index UI. As you can see, the words are there even though they look all screwed up, I think because of the font. Anyway, they all look like they are there. I created the file via Notepad, which is just a simple text editor as far as I know, and named it Word Index.sdi. So, when I opened up the .sdi file from the Edit Index UI, it shows the file with the words. There’s no save button on that UI that I can see though. When I click OK, my document doesn’t get updated. When I then click on Update Index in the Index entity in my document, nothing happens. I’m sorry if I’m really stupid but I still can’t get it to work.

ajlittoz · March 22, 2022, 7:17pm

There may be another issue at stake: character encoding. Writer expects a Unicode plain text file. Notepad may output the file as some ISO-8859-x or Windows-legacy CPxxx encoding. See if you can force Unicode output (I have no Windows machine, I can’t check this hypothesis).

You can also try a much simpler procedure than the convoluted one I described in the answer. Open your .odt concordance file with Writer. Verify the words are correct. File>Save As and select Text (.txt) in the Filter: menu below the file list. This will create a Unicode plain text file (don’t forget to give it .sdi extension).
Use this file when creating your index.

Kurt_Matthys · March 22, 2022, 7:54pm

SDI file opened in Writer.odt (13.4 KB)

We’re making progress. If you look in the first file you’ll see that there are no question marks in the words. And in the second file I’ve done what you suggested in your last response and I’m now getting an index. However, as you can see, the words have question marks in this file. The only three entries in the generated index are ones that don’t have question marks.

I tried saving it as 8 bit and also as 16 bit Unicode text. The 8 bit one gives the question marks. The 16 bit one gives garbage. I think that you are right in that it’s character encoding, but the sdi file shows question marks both when opened in the Index UI in my document and when opened in Notepad. The sdi file before I saved it in Unicode format looks fine. Thoughts? Thanks for all your help.

ajlittoz · March 23, 2022, 9:15am

Since you seem to be in trouble getting the concordance file correct, I attach the one I extracted from your previous attachment. It works fine here. Check it anyway.

second sample file for libre support.odt (2.1 KB)

Change extension from .odt to .txt.

mikekaganski · March 23, 2022, 9:43am

This is not correct.
Before the fix to tdf#106899, Writer used the current system encoding (usually UTF-8 on non-Windows systems; almost never UTF-8 on Windows, unless one uses some experimental mode). After the fix (7.3.1+), Writer tries hard to detect the encoding.

Just to avoid confusion

Kurt_Matthys · March 23, 2022, 6:48pm

I’m still stuck. I saved the file you sent, pasted it without opening it in the right directory, and renamed it as .txt. When I go to the Index UI, I can’t open it because the UI only allows .sdi files. When I changed the extension to .sdi, it opened. It doesn’t have the question marks, but it does have lots of weird characters, which I assume is a character encoding issue since my words are transliterated Sanskrit. However, when I OK it, I don’t get any index at all. I took a screen shot of what I see in the Edit screen. Then I added the word ‘senses’ at the beginning to see if I got anything. I get an index but only for the word ‘senses’. Thoughts?

ajlittoz · March 23, 2022, 7:00pm

Something is broken on your machine (by the way, you didn’t mention your OS name and LO version) because the file you sent opens like a charm on my Linux machine.

If you still have “strange” characters in the list, there is definitely a problem of encoding. The file I sent you is UTF-8. Apparently your Windows expects some other encoding (see @mikekaganski’s comment). The differences may occur on accented letters because these can rarely be translated into 256-character sets. The accented characters are replaced with question marks and this ?-sequence is looked for in your document. This is why “senses” (no accented character, only ASCII) is found and creates an index entry. The sequences with “?” don’t occur in your text and won’t create entries.
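The effect described above can be reproduced outside Writer. This is only an illustrative sketch: it assumes the legacy code page is Windows-1252 (`cp1252`) and uses Python's `replace` error handler as a stand-in for whatever conversion actually happened on the affected machine. The sample words are made up.

```python
def as_legacy(word, codepage="cp1252"):
    """Round-trip a word through a 256-character code page, as a lossy save would."""
    # Characters missing from the code page are replaced with "?".
    return word.encode(codepage, errors="replace").decode(codepage)


def survives(word, codepage="cp1252"):
    """True if the word is unchanged, so a search for it can still match the document."""
    return as_legacy(word, codepage) == word


for word in ["ātman", "mokṣa", "senses"]:
    print(word, "->", as_legacy(word), "| still matches:", survives(word))
# ātman -> ?tman | still matches: False
# mokṣa -> mok?a | still matches: False
# senses -> senses | still matches: True
```

Only the pure-ASCII entry comes through intact, which matches the observation that "senses" is indexed while the transliterated words are not.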

If you can’t solve this encoding issue, the workaround is to manually add the index entries.

Kurt_Matthys · March 23, 2022, 7:12pm

I’m running Windows 10 Pro. I don’t know how to get the version of the OS. I tried doing a copy/paste from my original list of transliterated words, but I still get the question marks in the new words and no index for them. I still get the index for ‘senses’ which I left there so I know if it ‘worked’ or just failed completely. So I can’t copy/paste and I don’t know how to type them since I just copied them from another web page. Am I just screwed here?

ajlittoz · March 23, 2022, 7:37pm

If you copied from a web page, you may have yet another encoding conversion.

I was asking about LO version. Mike Kaganski mentioned a bug report affecting Windows users. A fix was applied in version 7.3.1 and 7.4.0 but these are still development versions.

My suggestion:
Find the first occurrences of the concordance file words in your document. Select the word and insert an index entry (since the word is selected, it will be entered as key). Tick the box Apply to all similar texts to index all other occurrences.

This will bypass the concordance file at the cost of manually inserting the entries.

EarnestAl · March 23, 2022, 8:38pm

I can confirm that LibreOffice 7.2.5.2 will accept only an ANSI-encoded concordance file. Entries from a file in any other encoding are not added to the index. Of course, some Unicode characters are then not saved (they are replaced with ?). Notepad gives me a warning about this (screenshot: UnicodeAsANSI).

If I switch to LibreOffice 7.3.1.3 then the file provided by @ajlittoz , with extension renamed to .sdi, works correctly with full index. Cheers, Al

Version: 7.3.1.3 (x64) / LibreOffice Community
Build ID: a69ca51ded25f3eefd52d7bf9a5fad8c90b87951
CPU threads: 8; OS: Windows 10.0 Build 22000; UI render: Skia/Vulkan; VCL: win
Locale: en-NZ (en_NZ); UI: en-GB
Calc: CL

BTW 7.3.1.3 is Fresh, not Development.

Kurt_Matthys · March 24, 2022, 2:15am

The version of Libre that I have installed is 6.4.7.2. The version that’s available is 7.2.5.2. When will 7.3.1.3 be available for download?

I did try doing them one at a time, and it does work, but that’s going to be really painful since I’ve got over 100 words to do. If 7.3.1.3 is available somewhere, I’ll download that and try it. Can you tell me where I can get it?

EarnestAl · March 24, 2022, 2:29am

I stayed with 6.4.7.2 for quite a while too. I now use 7.2.5.2 for work and have installed 7.3.1.3 in parallel for testing but it seems quite stable.

7.3.1.3 has been available for around a month now on the Download LibreOffice page.

There are two versions on the page: 7.3.1 which is regarded as Fresh, and 7.2.6 which is regarded as Stable.

Kurt_Matthys · March 24, 2022, 2:33am

I’ve downloaded it and I’ll try installing it tomorrow. It’s late here and I need some sleep. Thanks very much, both of you!

Kurt_Matthys · March 24, 2022, 1:36pm

It works!!! Thank you so very much!!!

I do have another question dealing with this though. While it does alphabetize correctly according to the ASCII codes, if you look at the attached screen shot you can see that, due to the special characters, words that start with ‘ā’, etc. come after words that start with ‘y’. Is there a way that I can manually, or automatically, reorganize the list so that the ‘ā’ words are in order with the words that start with ‘a’? I can’t find out how to do this in the help.
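The ordering described here is what a plain code-point sort produces: ‘ā’ has a higher code point than any unaccented Latin letter, so it lands after ‘y’. The sketch below only illustrates the difference between that ordering and one that ignores diacritics; it is not a Writer setting, and the helper name and sample words are assumptions.

```python
import unicodedata


def fold_accents(word):
    """Strip combining marks so that 'ā' compares like 'a' (hypothetical sort key)."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Normalize to precomposed form so the comparison below is well defined.
words = [unicodedata.normalize("NFC", w) for w in ["yoga", "ātman", "artha", "ānanda"]]

by_code_point = sorted(words)
by_base_letter = sorted(words, key=fold_accents)

print(by_code_point)   # ['artha', 'yoga', 'ānanda', 'ātman']
print(by_base_letter)  # ['ānanda', 'artha', 'ātman', 'yoga']
```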