text not formatting correctly

asked 2017-06-29 07:03:15 +0200

Bule

The OCR text does not fit the page correctly: it retains the original line length of the scanned document. I have tried 'Clear Formatting' and 'Text Body', but to no avail.

I have tried to attach a sample but am informed '>3 points required to upload files', whatever that means.

Below is an example of the problem:

A 26 inch diameter lift fan turning at 4500 RPM has a blade tip speed of 348 mph. For this reason it should be guarded by covering the duct with 1/2 inch grid screen wire. The wire should be attached well enough to suppor t a person who may fall against it.

This is happening in both LibreOffice and OpenOffice.


2 Answers


answered 2017-07-01 11:13:43 +0200

gabix

updated 2017-07-01 11:18:56 +0200

Using 'Join broken lines of a paragraph' in OOoFBTools, I successfully fixed your sample. You just need to switch the radio button 'Start a new paragraph when the following is detected' to 'Sentences in paragraphs split on lowercase letters…'. The fixed file is attached: Text Sample.odt. By the way, your sample contains non-breaking spaces instead of conventional ones. They may cause unwanted text behavior.
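For anyone who wants to see the idea outside the extension, here is a minimal Python sketch of this kind of line joining. It is not OOoFBTools' actual algorithm; the rule (a line continues the previous paragraph if it starts with a lowercase letter, or if the previous line does not end with sentence-closing punctuation), the function name `join_broken_lines`, and the line breaks in the sample are all assumptions made for illustration.

```python
# Hypothetical heuristic, not the OOoFBTools implementation.
SENTENCE_END = (".", "!", "?", ":")


def join_broken_lines(lines):
    """Merge OCR lines into paragraphs.

    A blank line always starts a new paragraph. Otherwise a line is glued
    to the previous one when it starts with a lowercase letter or when the
    previous line does not end with sentence-closing punctuation.
    """
    paragraphs = []
    force_break = True  # the first non-empty line always opens a paragraph
    for raw in lines:
        line = raw.strip()
        if not line:
            force_break = True
            continue
        if force_break or not paragraphs:
            paragraphs.append(line)
        else:
            prev = paragraphs[-1]
            continues = line[0].islower() or not prev.endswith(SENTENCE_END)
            if continues:
                paragraphs[-1] = prev + " " + line
            else:
                paragraphs.append(line)
        force_break = False
    return paragraphs


# Line breaks below are invented to imitate OCR output.
ocr_lines = [
    "A 26 inch diameter lift fan turning at 4500 RPM",
    "has a blade tip speed of 348 mph. For this reason",
    "it should be guarded by covering the duct with",
    "1/2 inch grid screen wire.",
    "",
    "The wire should be attached well enough.",
]
for paragraph in join_broken_lines(ocr_lines):
    print(paragraph)
```

A rule like this will wrongly split a paragraph whenever an OCR line happens to end on a full stop and the next one starts with a capital, so the result still needs a quick read-through.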


answered 2017-06-29 07:40:00 +0200

robleyd

updated 2017-07-01 05:49:44 +0200

Turn on View | Nonprinting Characters (or Ctrl-F10) and you will see that the breaks are actually end-of-paragraph marks (the symbol that looks like a backward P, called a pilcrow), as contained in the document produced by your OCR software. Effectively, each line in the OCR document is treated as a paragraph.

You need to remove the paragraph markers where you don't want to start a new paragraph.

If this answer helped you, please accept it by clicking the check mark ✔ to the left and, karma permitting, upvote it. If this resolves your problem, close the question; that will help other people with the same question.

Update:

I looked at your sample file. The issue seems to be that the spaces are not regular spaces but non-breaking spaces (Ctrl-Shift-Space will create one).
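Because a non-breaking space forbids a line break at that position, text that uses them between words has very few places where it can wrap. If the OCR output is available as plain text, a small Python sketch like the one below can count and replace them before the text goes into Writer. The function names and the sample string are made up for illustration, and only U+00A0 is handled.

```python
NBSP = "\u00a0"  # NO-BREAK SPACE (U+00A0)


def count_nbsp(text):
    """Return how many non-breaking spaces the text contains."""
    return text.count(NBSP)


def normalize_spaces(text):
    """Replace every non-breaking space with a regular space."""
    return text.replace(NBSP, " ")


# Invented sample: three of the spaces are non-breaking.
sample = "blade\u00a0tip\u00a0speed of 348\u00a0mph"
print(count_nbsp(sample))
print(normalize_spaces(sample))
```

The same replacement can be done inside the document with a find-and-replace tool, which is the route taken further down in the comments.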


Comments

It may be line breaks rather than paragraph breaks. I would recommend trying OOoFBTools to brush up scanned documents.

gabix ( 2017-06-29 08:19:36 +0200 )

Thank you robleyd. I am conversant with the pilcrow and that is not the problem. I can delete it, but that does not allow the text to flow properly; it still stays at the original break. If I then delete the next character, it jumps down to the continuation text and deletes the first character there.

Hi gabix, will try the OOoFBTools and will report back.

Bule ( 2017-06-29 11:13:24 +0200 )

Perhaps you could upload a sample LibreOffice file and the source OCR file to a service like Dropbox and give us a URL to look at them?

robleyd ( 2017-06-29 13:42:06 +0200 )

I thought the problem was solved using OOoFBTools 'Join broken lines of a paragraph'; unfortunately not so. I will upload a sample and post a link. Many thanks for your help.

Bule ( 2017-07-01 04:57:51 +0200 )

Hi Robleyd, Thank you for your input. I downloaded the 'Alternative Find and Replace' Extension and used this to replace the 'offending' character. All is now OK.

Bule ( 2017-07-01 10:57:38 +0200 )

…reposting as an answer.

gabix ( 2017-07-01 11:10:54 +0200 )