Text not formatting correctly

OCR text does not fit on the page correctly: it retains the original line length of the scanned document. I have tried ‘Clear Formatting’ and ‘Text Body’, but to no avail.

I tried to attach a sample but was told ‘>3 points required to upload files’, whatever that means.

Below is an example of the problem:

A 26 inch diameter lift fan turning at 4500 RPM has a blade tip speed of 348 mph. For this reason
it should be guarded by
covering the duct with 1/2 inch grid screen wire. The wire should be attached well enough to suppor
t a person who may fall against it.

This is happening in both LibreOffice and OpenOffice.

Turn on View | Nonprinting Characters (or Ctrl-F10) and you will see that the breaks are actually end-of-paragraph marks (the pilcrow, ¶, which looks like a backward P), as contained in the document produced by your OCR software. Effectively, each line in the OCR document is treated as a paragraph.

You need to remove the paragraph markers where you don’t want to start a new paragraph.
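If the OCR output is also available as plain text, the same clean-up can be sketched outside Writer. This is only a minimal illustration, assuming that real paragraphs are separated by blank lines and that every other line break is an unwanted one; the function name `join_hard_wrapped` and the sample text are made up for the example.

```python
import re

def join_hard_wrapped(text: str) -> str:
    """Join the lines inside each paragraph with single spaces.

    Assumption: a blank line marks a real paragraph boundary;
    every other line break is treated as an unwanted hard wrap.
    """
    paragraphs = re.split(r"\n\s*\n", text.strip())
    joined = [
        " ".join(line.strip() for line in p.splitlines() if line.strip())
        for p in paragraphs
    ]
    return "\n\n".join(joined)

sample = (
    "A 26 inch diameter lift fan turning at 4500 RPM\n"
    "has a blade tip speed of 348 mph.\n"
    "\n"
    "The wire should be attached well."
)
print(join_hard_wrapped(sample))
```

If the OCR file has no blank lines between paragraphs, this would merge everything into one paragraph, so a different rule for detecting paragraph starts would be needed.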


Update:

I looked at your sample file. The issue seems to be that the spaces are not regular spaces but non-breaking spaces (Ctrl-Shift-Space will create one).
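For anyone who prefers to clean the text before it reaches Writer, here is a small sketch of the idea: swap every non-breaking space (U+00A0) for an ordinary space so lines can wrap at any word. It assumes the OCR text is available as a plain string; `normalize_spaces` is just an illustrative name.

```python
NBSP = "\u00a0"  # non-breaking space, the character Ctrl-Shift-Space inserts

def normalize_spaces(text: str) -> str:
    """Replace every non-breaking space with a regular space."""
    return text.replace(NBSP, " ")

sample = "blade\u00a0tip\u00a0speed of 348 mph"
cleaned = normalize_spaces(sample)

# Count non-breaking spaces before and after the replacement.
print(sample.count(NBSP), cleaned.count(NBSP))
```

With regular spaces in place, the text can break between any two words instead of keeping the scanned line length.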

It may be line breaks rather than paragraph breaks. I would recommend trying OOoFBTools to brush up scanned documents.

Thank you robleyd. I am conversant with the Pilcrow and that is not the problem. I can delete it, but the text still does not flow properly; it stays at the original break. If I then delete the next character, it jumps down to the continuation text and deletes the first character there.

Hi gabix, will try the OOoFBTools and will report back.

Perhaps you could upload a sample LibreOffice file and source OCR file to a service like dropbox and give us a URL to look at them?

I thought the problem was solved using OOoFBTools ‘Join broken lines of a paragraph’, but unfortunately not.
Will upload a sample and post link.
Many thanks for your help.

Link to sample
https://www.sendspace.com/file/6y9w3q

Hi Robleyd,
Thank you for your input. I downloaded the ‘Alternative Find and Replace’ Extension and used this to replace the ‘offending’ character. All is now OK.

…reposting as an answer.

Using ‘Join broken lines of a paragraph’ in OOoFBTools, I successfully fixed your sample. You just need to switch the radio button ‘Start a new paragraph when the following is detected’ to ‘Sentences in paragraphs split on lowercase letters…’. The fixed file is attached: Text Sample.odt. By the way, your sample contains non-breaking spaces instead of conventional ones. They may cause unwanted text behavior.
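To give a rough idea of what a "split on lowercase letters" rule does, here is a short sketch. It is not the OOoFBTools implementation, only a guess at the general heuristic: a line that begins with a lowercase letter is assumed to continue the previous line. The function name `join_on_lowercase` is hypothetical.

```python
def join_on_lowercase(lines):
    """Merge each line that starts with a lowercase letter into the previous one.

    Assumption: a lowercase first letter means the line continues the
    sentence above; anything else starts a new paragraph.
    """
    paragraphs = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip empty lines
        if paragraphs and line[0].islower():
            paragraphs[-1] += " " + line
        else:
            paragraphs.append(line)
    return paragraphs

broken = [
    "For this reason",
    "it should be guarded by",
    "covering the duct with 1/2 inch grid screen wire.",
    "The wire should be attached well.",
]
for paragraph in join_on_lowercase(broken):
    print(paragraph)
```

A rule like this misses continuation lines that happen to start with a capital letter or a digit, so the result still needs a quick read-through.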