I have created a document by scanning pages from various printed documents. As a result, there are paragraph breaks ‘hard coded’ where I do not want them. When I delete them, words split (see below for an example). I noticed that the style for the scanned text was Pre-formatted, so I changed it to Default using Edit>Replace. However, this made no difference to the word splitting. Any ideas as to why this happens and how to correct it so that the text wraps properly?

PS: I noticed a previous question saying that paragraph breaks (generated by copying text from other documents) should be removed using Tools>Autocorrect, but the solution there didn’t work for me.

I forgot to mention that there are line breaks (without splitting a word) in the middle of lines. This is also shown in the example quoted.

I’ve just installed LibreOffice and still have the same problem. To me, this seems like a bug. Is it?

The example (as initially displayed) does not indicate the splitting of words. To be clear, there is a carriage return (manual break) between “2♦” and “response.” and when I edit your post it seems the word “cards” is split, such that it displays as “6+ card” and then “s and a new” on the next line. Is this correct? I will amend your post to improve the formatting if this is the case, but want to be certain I have this correct first.

It’s difficult to show the presence of non-breaking spaces on text pasted into this website’s questions & comments. I’ve only just realised that files can be attached. I’ve attached a screenshot of LibreOffice Writer using a small file containing just one paragraph.

I think there is a bug in the way that non-breaking space characters cause arbitrary line feeds on very long lines. If they worked correctly, then offending paragraphs would only be on one line rather than multiple lines.

Please see fdo#68924 for more detail.
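
Since no-break spaces look identical to ordinary spaces on screen, one way to confirm they are present is to export the paragraph as plain text and inspect it outside Writer. The following is only a minimal sketch in Python; the helper names `count_nbsp` and `reveal_nbsp` and the `[NBSP]` marker are made up for illustration and are not part of LibreOffice.

```python
# Hypothetical helpers for spotting no-break spaces (U+00A0) in exported text.
NBSP = "\u00a0"


def count_nbsp(text: str) -> int:
    """Return how many no-break space characters the text contains."""
    return text.count(NBSP)


def reveal_nbsp(text: str, marker: str = "[NBSP]") -> str:
    """Return a copy of the text with each no-break space shown as a visible marker."""
    return text.replace(NBSP, marker)


# Invented sample text with two no-break spaces in it.
sample = "6+\u00a0cards and\u00a0a new suit"
print(count_nbsp(sample))   # 2
print(reveal_nbsp(sample))  # 6+[NBSP]cards and[NBSP]a new suit
```

A count well above zero in a paragraph that breaks in odd places would be consistent with the behaviour described here.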

You should now have enough karma to attach files to your posts. From the bug report, it is now obvious that the problem is the no-break space (U+00A0) being used extensively throughout the OCR text. There should be a setting in your OCR software where you can adjust this, but if not, you will be faced with performing a global find/replace. The Unicode line-breaking algorithm is a complex piece of logic with sometimes unpredictable results.
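
If the OCR output is available as plain text, the global find/replace could also be done before the text ever reaches Writer. The following is a rough sketch in Python, assuming UTF-8 encoded text files; the function names `replace_nbsp` and `clean_file` are hypothetical and chosen only for this example.

```python
from pathlib import Path

NBSP = "\u00a0"  # the no-break space discussed above


def replace_nbsp(text: str) -> str:
    """Swap every no-break space for an ordinary space so lines can wrap normally."""
    return text.replace(NBSP, " ")


def clean_file(src: Path, dst: Path, encoding: str = "utf-8") -> int:
    """Write a cleaned copy of src to dst and return the number of replacements.

    The UTF-8 default is an assumption; OCR tools may export in other encodings.
    """
    text = src.read_text(encoding=encoding)
    dst.write_text(replace_nbsp(text), encoding=encoding)
    return text.count(NBSP)


# Small in-memory demonstration on an invented sample string.
print(replace_nbsp("6+\u00a0cards and\u00a0a new suit"))  # 6+ cards and a new suit
```

Writing to a separate output file rather than overwriting the original keeps the raw OCR text available in case some no-break spaces turn out to be intentional.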

