Problems finding all occurrences of a text string

jeshkhol · April 1, 2022, 5:57am

While copyediting in a large ODT file, I have problems finding certain character strings that indeed are present in the file. I noticed this after doing a large series of text replacements using “Replace All” across the whole document – when done I thought a task was completed, but a few hours later I found out that at some places, the character strings that had been replaced elsewhere in the file had remained unchanged at other places.

A character string may not be found if the cursor is in a different section of the document, for example at the very beginning, when the search is entered in the Find field. Also, a character string in a footnote may not be found if the cursor is in the main text when the search is entered in the Find field. – I say “may not be found” because strangely, sometimes a character string may indeed be found even under these circumstances.

The file was created as a DOCX file, but when I received it I saved it as an ODT file to work with it. View > Formatting Marks is on so I can confirm it’s not some hidden character causing the issue. Also, no formatting or attributes are involved when doing the searches.

It’s irrelevant whether I use the Find toolbar or the Find & Replace dialog. The problem also occurs when none of the options in the Find & Replace dialog (incl. “Match case” and “Whole words only”) are selected.

The only workaround I have found so far is to save the file as HTML and open it in a browser, where the character string can easily be found. I then go back to the ODT file and scroll to the vicinity of the string using the vertical scroll bar. From there, the string is found via the Find field, as expected.
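As an alternative to the HTML round trip, occurrences can be counted directly in the ODT file, which is a ZIP archive whose document text lives in content.xml. The following is only a minimal Python sketch under assumptions: the function names (count_occurrences, _paragraph_text) are made up for illustration, and it ignores complications such as text boxes and tracked changes, whose text may be counted twice or unexpectedly.

```python
import zipfile
import xml.etree.ElementTree as ET

# ODF "text" namespace; element names below are from the ODF format.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
P, H, NOTE, S, TAB = (f"{{{TEXT_NS}}}{name}" for name in ("p", "h", "note", "s", "tab"))


def _paragraph_text(elem):
    """Join the text of a paragraph, skipping embedded foot-/endnotes.

    Joining across <text:span> boundaries matters: a string that looks
    contiguous on screen may be split over several spans in the XML.
    """
    parts = [elem.text or ""]
    for child in elem:
        if child.tag == S:        # <text:s text:c="n"/> stands for n spaces
            parts.append(" " * int(child.get(f"{{{TEXT_NS}}}c", "1")))
        elif child.tag == TAB:    # <text:tab/> stands for a tab character
            parts.append("\t")
        elif child.tag != NOTE:   # notes are counted separately
            parts.append(_paragraph_text(child))
        parts.append(child.tail or "")
    return "".join(parts)


def count_occurrences(odt_file, needle):
    """Return {'body': n, 'notes': m} for needle in an ODT path or file object."""
    with zipfile.ZipFile(odt_file) as zf:
        root = ET.fromstring(zf.read("content.xml"))
    counts = {"body": 0, "notes": 0}

    def walk(elem, in_note):
        for child in elem:
            child_in_note = in_note or child.tag == NOTE
            if child.tag in (P, H):
                key = "notes" if child_in_note else "body"
                counts[key] += _paragraph_text(child).count(needle)
            walk(child, child_in_note)

    walk(root, False)
    return counts
```

Calling count_occurrences("mydoc.odt", "xyz") (the file name is a placeholder) before and after a “Replace All” gives an independent check of whether any occurrences in the body or in the notes were missed.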

Is this a bug, or am I missing an option in Preferences or the like, which would let Writer “Find All” (and “Replace All”) when told to do so?

Thanks
Sam

7.2.6.2
macOS Monterey 12.3

ajlittoz · April 1, 2022, 6:54am

Are you 100% sure that the occurrences don’t contain ancillary marks like ZERO-WIDTH JOINER or other special characters affecting word division or wrapping? The only way to be sure is to save the file as .fodt (flat XML format) and to look at the offending occurrences in a text editor (where the special characters won’t display either) or a binary editor.
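Instead of reading a hex dump by eye, such marks can also be listed with a short script. This is a minimal Python sketch, not a definitive check: the function name find_invisible is hypothetical, and the choice of what to flag (Unicode category "Cf" plus two no-break spaces) is an assumption about which characters are likely to interfere.

```python
import unicodedata

# Assumption: besides format characters (category "Cf"), no-break spaces
# are worth flagging because they look like ordinary spaces on screen.
EXTRA_SUSPECTS = {"\u00a0", "\u202f"}


def find_invisible(text):
    """Return (index, 'U+XXXX', name) for each suspicious character in text."""
    hits = []
    for index, ch in enumerate(text):
        if unicodedata.category(ch) == "Cf" or ch in EXTRA_SUSPECTS:
            name = unicodedata.name(ch, "<unnamed>")
            hits.append((index, f"U+{ord(ch):04X}", name))
    return hits
```

Pasting the paragraph that fails to match (or the text read from the .fodt file) into find_invisible() returns an empty list when none of the flagged characters are present.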
Also, if your target string contains accents (more generally diacritical marks), Unicode offers two encodings for some combinations: a “precomposed” character (a single codepoint integrating the base character and the diacritics) or a sequence of several codepoints, i.e. the base character followed by codepoints for the diacritics. Visually they are the same but the byte representations are different.
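To illustrate the difference, here is a small Python sketch using the standard unicodedata module. The sample word and the helper name same_text are only examples, not something taken from the document in question.

```python
import unicodedata

precomposed = "caf\u00e9"   # "é" as one codepoint (U+00E9)
decomposed = "cafe\u0301"   # "e" followed by COMBINING ACUTE ACCENT (U+0301)


def same_text(a, b):
    """Compare two strings after normalizing both to NFC (composed form)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# The two strings render identically but differ codepoint by codepoint,
# so a plain comparison (or a literal search) treats them as different.
print(precomposed == decomposed)           # False
print(len(precomposed), len(decomposed))   # 4 5
print(same_text(precomposed, decomposed))  # True
```

If a search string typed in one form fails to match text stored in the other, normalizing both sides as above shows whether that is the cause.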
A possible other cause may be the approximations introduced by the .docx conversion to native format.

jeshkhol · April 1, 2022, 9:11am

Thank you for your thoughts.

Yes, I’m 100% sure that there are no ancillary marks: saving the ODT file as FODT and opening it in a hex editor I can see that the searched-for string and the target string are identical.
I’m aware of potential problems caused by precomposed characters; there are none involved here.
I’ve now backsaved the file as DOCX and opened it in LO; the same string is not found, so there is no difference.

I can find the target string in footnotes if the cursor is somewhere near the text above the footnote, but I cannot find it with the same search string if the cursor is further away from it (say, at the beginning of the document).

mikekaganski · April 1, 2022, 9:18am

I suppose that it could only have a useful answer if you provide a sample file and a search string, so that it could be reproduced and investigated.

jeshkhol · April 1, 2022, 10:20am

All the search strings that I used a few hours ago and yielded unreliable results back then are working perfectly fine now. The file is unchanged. And I’m seriously confused.

jeshkhol · April 2, 2022, 1:09pm

The problem is back again: I open a file and enter the string “xyz” in the search field. Although “xyz” is in the document (maybe in a footnote), I am told “Search key not found”. But if I go (via Edit → Go to Page…) to the page where I know the string is, and if I reenter the identical string of characters as before in the search field, it is found in the document, as expected.

Now what is confusing is: Once the character string has been found in the document as described above, it will always be found – as long as the document is open. If I close the document and open it later, the same applies as described above.

sokol92 · April 2, 2022, 2:58pm

We are waiting…

jeshkhol · April 2, 2022, 4:29pm

I’m beginning to doubt that the file is the culprit here. I’ve done a great number of Find/Replace operations today and noted down each string that was not found (and which I only found using an HTML version of the file in a browser, as described above). And now that I want to upload the file I can find these strings without any problem … This issue is so very unpredictable.

Since I cannot simply upload the whole file here I need to delete large parts of it, and change the text. By doing so I may inadvertently also delete what causes the problem in the first place, making the sample file unusable for the purpose. As soon as I have prepared a sample file in which some text cannot be found I will upload it.

sokol92 · April 2, 2022, 4:38pm

I understand you very well, it’s a difficult task to catch and fix a “floating” error.

ChrisZ16 · April 3, 2022, 1:41am

You said it is a large file. What you describe sounds a little like the file not being fully loaded into a cache. That would explain why you find the target when you are near it and why you find it on subsequent runs. Could that be the reason?

What happens if you open the file, page through to the end, jump back to the beginning and then do the search?

jeshkhol · April 3, 2022, 5:36am

Thanks! The file not being fully loaded in the cache would explain this behaviour, absolutely. (And this makes filing a bug report and providing a sample file a difficult task.) Can I as a user do something about it (maybe change a setting)?

What I describe (an identical character string not being found at first, then being found without any problems) has happened many, many times with this large file so it is not a user error or a one-time thing.

I remember that I was not able to find a certain string even when I first went to the end of the file, or did “Find Previous” from there: I had to be near the location of the target string to be successful. Also, it concerns predominantly (or only?) target strings in the footnotes.

ajlittoz · April 3, 2022, 6:41am

You say you have a “large” file. What does this mean?

- What is the size reported by the OS file browser?
- In Writer, the bottom status bar displays a summary of pages, words and characters; what are the figures?
- Another relevant factor is whether you style your text: do you apply paragraph styles? Character styles? How often do you direct format?

mikekaganski · April 3, 2022, 7:07am

There’s no “cache” of that kind in Writer.

jeshkhol · April 3, 2022, 7:36am

620 KB

381 pages
182,286 words
1,178,109 characters

Some paragraph and character styles are used, but very inconsistently. A lot of the formatting is direct formatting.

ajlittoz · April 3, 2022, 9:29am

Apart from the number of pages, this looks to me like a “not-so-large” document (I have much larger ones, essentially due to graphics material). The critical factor is probably the amount of direct formatting.