How to copy/select all words from a PDF file (LibO-writer)

asked 2016-09-04 18:16:07 +0200

snn47

Is there a way to link, e.g. by formatting, all words/sentences in a PDF file opened in LibO-writer together? I often need to copy and correct garbled, OCR-generated text in PDF files. While I can copy text from a PDF file, the often mysterious way text and columns are selected/bound together makes me copy text from all over the page, and in many cases I found no way to select the next word/sentence. Since I found no way to correct text in Adobe, I was astonished to see that opening a PDF in LibO-writer shows me the associated text. In most documents, however, I found no way to copy more than a single line, at worst only a single word, at a time, which is why retyping is still the most efficient way I have found so far to copy text from PDF files.


Comments

(Not an answer, I know.)
I do not know a way to open a PDF in LibO Writer. Selecting a PDF from the 'Open' dialogue of LibO will open it in 'Draw'. I once wrote and tested a "macro" for the purpose of collecting the texts... Wouldn't recommend it.
I would actually prefer to run my OCR (ReadIris 14 in my case; sorry, it has that idiotic ribbon-type interface) on that PDF and to rectify all the bad guesses concerning the object type before starting the final recognition.

Lupp ( 2016-09-04 20:08:02 +0200 )

PDF is a good file format for presentation, but a bad format for editing. I think you should try opening these PDFs in Draw and editing the garbled words there. It probably won't be as smooth as it would be in Writer, but you shouldn't try to copy an OCR-generated PDF into Writer the way you would copy between two ODT files.

Kruno ( 2016-09-05 09:14:55 +0200 )

2 Answers


answered 2016-09-05 08:35:54 +0200

floris v

Opening the PDF in a PDF reader would probably be easier. It would be better still to scan to a plain ASCII text file if you have that option.


answered 2016-09-05 13:50:24 +0200

Lupp

updated 2018-09-11 17:22:37 +0200

===Edit1 2018-09-11===
I accidentally came back to this even older thread today and edited it, attaching a demo I had made in the meantime, which contains enhanced code for the task discussed here. Please visit that thread if interested.
===End Edit1===

Since the experimental subroutine mentioned in my comment on the question was still in my collection of BASIC subs, I made a demo.
You may unpack the attached zip file and check the example files and the code. The SomeName.odt files were generated by calling the Sub from an open 'Draw' file. If you actually decide to do things this way, however, you will have to rework the code to get a more-than-experimental version.
When the original 'Draw' file was first exported to PDF and then opened with LibO again (making a new 'Draw' representation of it), the Sub created additional empty paragraphs. I did not analyse this phenomenon.
As this askbot does not allow attaching zip files, I had to add a fake extension. Please remove the ".fake.odt" part after your download.
ask76555TextFromDrawShapesToWriterDoc_1.odg.pdf.zip.fake.odt
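
For readers who cannot open the attachment, the general idea (collecting the text of separate shapes on a page and joining it in reading order) might be sketched roughly as follows. This is not the BASIC Sub from the demo. It is a minimal Python sketch that assumes each text shape has already been reduced to a left/top position plus a string, and that the page has a single column. The `Fragment` class, the `fragments_to_lines` function and the `y_tolerance` parameter are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    """One text box on a page (hypothetical structure, not a LibreOffice type)."""
    x: float   # left edge of the box
    y: float   # top edge of the box, growing downwards
    text: str


def fragments_to_lines(fragments, y_tolerance=2.0):
    """Join text fragments into lines, top to bottom and left to right."""
    # Drop fragments without visible text; they would only yield empty lines.
    usable = [f for f in fragments if f.text.strip()]
    usable.sort(key=lambda f: (f.y, f.x))

    rows = []  # each row holds the fragments that sit on roughly the same line
    for frag in usable:
        # Compare against the first fragment of the current row.
        if rows and abs(frag.y - rows[-1][0].y) <= y_tolerance:
            rows[-1].append(frag)
        else:
            rows.append([frag])

    lines = []
    for row in rows:
        row.sort(key=lambda f: f.x)
        lines.append(" ".join(f.text.strip() for f in row))
    return lines


if __name__ == "__main__":
    page = [
        Fragment(50, 10.5, "world"),
        Fragment(10, 10, "Hello"),
        Fragment(10, 30, "second line"),
        Fragment(10, 20, "  "),
    ]
    for line in fragments_to_lines(page):
        print(line)
```

The sketch skips whitespace-only fragments, which is one simple way to avoid empty paragraphs in the result; whether that matches the cause of the empty paragraphs mentioned above is not known. Multi-column pages would need an extra step that splits the fragments by column before sorting.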

