Edit Searchable PDF in Draw

Hi,

I was looking for a tool to edit a poorly recognized PDF that I had scanned. I scanned it with an OCR tool that didn't let me properly review the recognized text. So I was looking for a tool in which I can correct the recognized text in the document and save it again as a searchable PDF, with the image as the background and the overlaid text "invisible".

Draw can import these PDF files and lets me edit the corresponding text fields. Now I have two problems:

  1. The font is not available.

This is not really a problem, because the fallback font seems to fit much better. The original font is "Times New Roman", which is not available but is still shown in the font selector. When I hover over the font selector, the tooltip reads "Font Name. The current font is not available and will be substituted." However, I can't figure out which font it uses as the fallback, and I'd like to use that font for all the text in this document.

So how can I see which font is actually in use?

  2. Export as searchable PDF again

Saving this document as a searchable PDF again gives a different result than I want: the recognized text in the text fields is shown on top of the image text. I'd like it to behave like a real searchable PDF.
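For context on what "a real searchable PDF" usually means: OCR tools commonly draw the recognized text on top of the page image using PDF text rendering mode 3 (the `3 Tr` operator), which paints neither fill nor stroke, so the text is invisible but can still be searched and selected. The following is a minimal Python sketch of that idea, not something Draw does and not any OCR tool's actual code. `build_pdf` and `escape_pdf_string` are hypothetical helpers; the sketch omits the background image, uses the built-in Times-Roman font, and assumes Latin-1 text.

```python
def escape_pdf_string(text: str) -> str:
    """Escape the characters that are special inside a PDF literal string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def build_pdf(text: str) -> bytes:
    """Build a one-page PDF whose only content is `text`, drawn invisibly.

    Hypothetical helper for illustration: a real OCR tool would also draw
    the scanned page image underneath this text layer.
    """
    # Text rendering mode 3 ("3 Tr") paints neither fill nor stroke, so the
    # glyphs are invisible but remain searchable and selectable.
    content = f"BT /F1 12 Tf 3 Tr 72 720 Td ({escape_pdf_string(text)}) Tj ET"
    stream = content.encode("latin-1")  # assumption: Latin-1 text only

    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Resources << /Font << /F1 4 0 R >> >> /Contents 5 0 R >>",
        # Times-Roman is one of the standard 14 fonts, so nothing is embedded.
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Times-Roman >>",
        b"<< /Length %d >>\nstream\n" % len(stream) + stream + b"\nendstream",
    ]

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset recorded in the xref table
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"

    xref_start = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # every xref entry is exactly 20 bytes
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_start
    return bytes(out)
```

Writing the bytes returned by `build_pdf("Recognized text from the scan")` to a `.pdf` file should give a page that looks blank, while a viewer's search still finds the text. Draw text boxes, by contrast, are exported with normal (visible) rendering, which matches the behaviour described above.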

Any suggestions?

I think you can do this work in LibO and export to PDF at the end. A PDF exported from LibO can be searched for strings.

I can't judge what your OCR tool can do, but I expect it can produce some kind of text file from which you can copy and paste text into LibO.

As you want to have images in the background, Draw appears to be a good choice because of the layers you can use: one layer for images and another layer for text. Then you only need to copy and paste the OCR text into Draw text boxes and correct it there, and your text is done. Text and images can be moved around the page independently because they are on different layers.

I would save in ODG format (Draw's native format) and then export the file to PDF. That way you can always modify your file easily in Draw and create a new PDF version with a click of a button.

As for font recognition, there are websites that can help you identify fonts. I once used: Identifont - Find Replace

The idea works, but LO has bugs and faults in handling certain characters, as demonstrated in https://bugs.freedesktop.org/show_bug.cgi?id=85174

Thanks, but that doesn't really answer my question. LibO uses another font in place of the one it doesn't have. I want to know which one it uses in that case, but I can't figure it out because it only shows the name of the font from the file, which is not available.

"…because it only shows the name of the font from the file, which is not available." This problem exists in all text software. That's why I gave you the link for font identification.

I know. But that site identifies (or at least tries to identify) the font used in the original document, not the one LibO chose instead.
And the fact that all text software behaves like this doesn't make it any better.