How do I remove many individual text frames/boxes along the left edge of many pages?

EigentlichWizard · May 11, 2019, 10:59am

Hi!

I am on LO Writer 6.1.5.2.

I scanned a lot of pages (~500) with OCR into a DOCX document. The perforation (the big holes on the left side; example) was interpreted as text and put into text frames/boxes (about 12 per page, on ~500 pages).

Here are some example pages (on Imgur).

Is it possible to remove these text frames without doing it manually? Maybe by cutting off a certain area/edge of all pages?

Thanks in advance for any help.

Grantler · May 11, 2019, 7:51pm

Without a macro I can’t see a way to delete frames in bulk. I could not find a procedure in the Navigator that does it, so I propose another strategy; see my answer.

Lupp · May 11, 2019, 9:29pm

If the artefacts actually are frames or a specific type of shape that otherwise does not occur in the document, it’s simple to remove them by user code.
When scanning sheets that have systematic defects of this kind, it’s advisable to cover the defects with a (folded) strip of white paper.

Lupp · May 11, 2019, 9:36pm

If artefacts of this kind actually are text frames, or shapes of a specific kind that otherwise do not occur in the document, it’s easy to remove them by user code, because they are listed either in the document’s .TextFrames property or as elements of its .DrawPage.
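A minimal sketch of such user code, written in Python for LibreOffice’s Python-UNO bridge rather than in Basic. It assumes `doc` is the open Writer document (for example `XSCRIPTCONTEXT.getDocument()` when run as a Python macro) and that *every* text frame in it is an artefact. Shapes on the DrawPage are only removed if their `getShapeType()` string is in `shape_types`, because the DrawPage may also hold legitimate objects such as images; which type string the imported artefacts carry is an assumption you would have to check first. The function name `remove_artefacts` is hypothetical.

```python
def remove_artefacts(doc, shape_types=()):
    """Remove every text frame, plus DrawPage shapes of the given types.

    doc: a Writer document model obtained via Python-UNO (assumption).
    shape_types: shape type strings (as returned by getShapeType()) that are
    known to occur only as artefacts; empty means "text frames only".
    Returns (number of frames removed, number of shapes removed).
    """
    # 1) Text frames are listed by name in the document's TextFrames collection.
    frames = doc.getTextFrames()
    names = frames.getElementNames()  # a tuple snapshot, so disposing is safe
    for name in names:
        frames.getByName(name).dispose()

    # 2) Other shapes are elements of the document's DrawPage. Walk it
    #    backwards so removals don't shift the indices not yet visited.
    draw_page = doc.getDrawPage()
    shapes_removed = 0
    for i in range(draw_page.getCount() - 1, -1, -1):
        shape = draw_page.getByIndex(i)
        if shape.getShapeType() in shape_types:
            draw_page.remove(shape)
            shapes_removed += 1
    return len(names), shapes_removed
```

Save a copy of the document before trying anything like this; disposing frames cannot be undone once the file is saved.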

EigentlichWizard · May 11, 2019, 10:25pm

Thank you for this hint. Do you see a way to narrow them down by their position?

Grantler · May 11, 2019, 7:42pm

Scan your pages into TIFF/PNG images; then you can crop the left margin in batch mode, in one step for many images. XnView can probably do this.

For better OCR, have the images processed online at www.pdf24.org, or use (non-free) OCR applications such as ABBYY FineReader or IRIS. They do not put your text into frames but generate plain text, possibly preserving some text formatting. Good OCR preserves paragraphs and does not insert carriage returns/line feeds at the end of each text line. Either way, you will inevitably have to rework the recognized text to get a satisfactory result.

EigentlichWizard · May 11, 2019, 8:07pm

Wow! This could work! Thank you so much for your answer! I will try it and come back with a result. You’re a godsend! It is a shame I don’t have enough karma to upvote your answer; I will do so as soon as I can! One question: where do I find the “Stapelverarbeitung” (batch processing) function, and how can I select more than one graphic in Writer?

Lupp · May 11, 2019, 10:41pm

(In reply to the comment by the OQ (original questioner) responding to my second comment on his question:)
TextFrames and Shapes expose many properties on which you can base a decision whether or not they should be disposed. There is an .Anchor, for example, which is a TextRange object. Like all objects in LibO text documents, however, they don’t know which page they are placed on.
You may play with this simple example.
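As a starting point for such experiments, here is a minimal Python-UNO sketch that narrows the frames down by horizontal position, which sidesteps the page question because only the offset from the left edge is needed. It assumes the frames are positioned relative to the page (via `HoriOrientRelation`) with no automatic horizontal alignment, so that `HoriOrientPosition` is their distance from the page’s left edge in 1/100 mm. The 25 mm threshold `LEFT_STRIP` and the function name are hypothetical; measure the real width of the perforation strip first.

```python
# Hypothetical threshold: assume the perforation artefacts start within the
# leftmost 25 mm of the page. UNO lengths are in 1/100 mm, so 2500 = 25 mm.
LEFT_STRIP = 2500


def remove_frames_near_left_edge(doc, max_x=LEFT_STRIP):
    """Dispose text frames whose horizontal offset is below max_x.

    Assumes HoriOrientPosition is measured from the page's left edge.
    Returns the names of the removed frames.
    """
    frames = doc.getTextFrames()
    removed = []
    for name in frames.getElementNames():  # tuple snapshot of all frame names
        frame = frames.getByName(name)
        if frame.getPropertyValue("HoriOrientPosition") < max_x:
            frame.dispose()
            removed.append(name)
    return removed
```

Printing the names and positions before calling `dispose()` is a cheap way to check whether the threshold separates the holes from the real text.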

EigentlichWizard · May 11, 2019, 11:50pm

Do you know where in the file this information is stored? Could you take a look at the original DOCX file? https://send.firefox.com/download/ba53555b9bad1f6b/#BxWD-P4w05TJGeWuI87-LA Password: Heaven123!

Please tell me what you can identify.

Grantler · May 12, 2019, 8:28am

I checked your DOCX file. Save it as an ODT file, then rename the extension to .zip so that you can open it as a ZIP archive. Work on content.xml: you will find several elements beginning with <draw:text-box>; delete them. Godspeed.
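The same edit can be scripted. Below is a minimal sketch using only the Python standard library; it assumes the document has already been saved as ODT and that the text boxes use the usual `draw:` prefix. In ODF a `<draw:text-box>` sits inside a `<draw:frame>`, so this sketch removes the enclosing frame as well rather than leaving an empty one behind. It writes a new file instead of touching the original; the function name `strip_text_boxes` is hypothetical.

```python
import zipfile
from xml.dom import minidom


def strip_text_boxes(src_path, dst_path):
    """Copy an ODT, dropping every <draw:frame> that directly contains a
    <draw:text-box> from content.xml. Returns the number of frames removed."""
    removed = 0
    with zipfile.ZipFile(src_path) as src, zipfile.ZipFile(dst_path, "w") as dst:
        for info in src.infolist():
            data = src.read(info.filename)
            if info.filename == "content.xml":
                # minidom keeps the xmlns:* declarations as ordinary attributes,
                # so they are written back unchanged.
                dom = minidom.parseString(data)
                for frame in dom.getElementsByTagName("draw:frame"):
                    if any(child.nodeType == child.ELEMENT_NODE
                           and child.tagName == "draw:text-box"
                           for child in frame.childNodes):
                        frame.parentNode.removeChild(frame)
                        removed += 1
                data = dom.toxml(encoding="UTF-8")
            # Reusing the original ZipInfo keeps entry order and compression,
            # so "mimetype" stays first and uncompressed, as ODF requires.
            dst.writestr(info, data)
    return removed
```

For example, `strip_text_boxes("scan.odt", "scan-clean.odt")` would leave `scan.odt` untouched and write the cleaned copy next to it. Note that this removes *all* text boxes, including any legitimate ones.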

cover these defects with a (folded) strip of white paper. (Lupp)

+1

Your undertaking is interesting, but it has to be planned in detail from the beginning. IMHO, DOCX files are the wrong basis for working effectively.

By the way: why don’t you ask the question in the German branch of this website (https://ask.libreoffice.org/c/german/6)? Lupp and I seem to be bloody Germans, just like you.

Lupp · May 12, 2019, 9:11am

@EigentlichWizard: I just tried to download the file you uploaded and linked to in your comment above, but the link had already expired.

@Grantler concerning the “German branch”: Yes, it’s annoying to have to do all these foreign-language handsprings even when all the (contributing) participants in a thread are German. On the other hand, the German branch has had many near-death experiences already, and may be best reserved for those few Germans, Austrians, and Swiss (de-CH) friends who really aren’t able to discuss topics in English at all. The LibO project now counts more than 200 locales. Valuable communities for every single one?
And labour invested in an English answer reaches a much larger group of potential silent beneficiaries.
And Westerners should be interested in having (and further developing) a common lingua franca that uses Latin letters. The next global alternative will be neither German nor русский nor català nor suomalainen. Guess the candidates yourself.

Grantler · May 12, 2019, 9:29am

OT: @Lupp, your reply concerning a lingua franca makes sense, thanks a lot.