Preserve text and vector images when printing to file

zak-mckracken · May 26, 2014, 2:17pm

I’ve got a rather roundabout process (in LO 4.1 but used since 3.x) to have EPS images be preserved as vector images in a PDF:
Print to a PS file and then convert it to PDF using ps2pdf -dAutoFilterColorImages=false -dColorImageFilter=/FlateEncode. The regular PDF export cannot be convinced not to convert the EPS images to nicely anti-aliased raster images (whose resolution I have no control over), which of course look crappy in print.
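For anyone who wants to script the conversion step, here is a minimal sketch in Python. It only wraps the ps2pdf call quoted above; the function names and the default output name (same base name with a .pdf suffix) are assumptions made for illustration, and ps2pdf (Ghostscript) is assumed to be on the PATH.

```python
import subprocess
from pathlib import Path

# Options quoted in the post: turn off automatic filter selection for
# colour images and store them Flate-compressed (lossless) instead.
PS2PDF_OPTIONS = [
    "-dAutoFilterColorImages=false",
    "-dColorImageFilter=/FlateEncode",
]


def build_ps2pdf_command(ps_file, pdf_file=None):
    """Return the ps2pdf argument list for one PostScript file.

    Hypothetical helper: if no output name is given, the input name
    with a .pdf suffix is used.
    """
    ps_path = Path(ps_file)
    pdf_path = Path(pdf_file) if pdf_file else ps_path.with_suffix(".pdf")
    # ps2pdf expects its options before the input and output file names.
    return ["ps2pdf", *PS2PDF_OPTIONS, str(ps_path), str(pdf_path)]


def convert(ps_file, pdf_file=None):
    """Run ps2pdf and return the path of the PDF it was asked to write."""
    cmd = build_ps2pdf_command(ps_file, pdf_file)
    subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return Path(cmd[-1])
```

Calling convert("document.ps") would run the same command line as in the post and write document.pdf next to the input; the file name is only an example.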

So far so good, but now (with printing already underway) I realized that text in the PDFs (and already in the PS files) has been turned into vector objects and can neither be selected and copied nor searched for. This is only true for some fonts (Bitstream Vera becomes vector drawings, Linux Biolinum and Helvetica don’t!). Sadly, I can’t change the fonts now as that would mess up the layout.
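One way to check which fonts survived as real text might be to list the fonts the PDF still references. The sketch below assumes the pdffonts tool (from poppler-utils or Xpdf) is installed and that its output starts with a header line and a dashed separator line, followed by one row per font. The reasoning, which is itself an assumption, is that a font whose characters were all converted to outlines should no longer show up in that list. The function names and the idea of matching by name fragment are hypothetical.

```python
import subprocess


def strip_subset_tag(name):
    """Drop a subset tag such as 'ABCDEF+' from a PDF font name."""
    if len(name) > 7 and name[6] == "+" and name[:6].isalpha() and name[:6].isupper():
        return name[7:]
    return name


def parse_pdffonts(output):
    """Return the font names found in the text printed by pdffonts.

    Assumes two header lines (column titles and dashes) and that the
    font name is the first whitespace-separated field of each row.
    """
    names = []
    for line in output.splitlines()[2:]:
        if line.strip():
            names.append(strip_subset_tag(line.split()[0]))
    return names


def missing_fonts(output, expected_fragments):
    """Return the fragments that match none of the listed font names."""
    present = [name.lower() for name in parse_pdffonts(output)]
    return [
        fragment
        for fragment in expected_fragments
        if not any(fragment.lower() in name for name in present)
    ]


def check_pdf(pdf_file, expected_fragments):
    """Run pdffonts on a PDF and report which expected fonts are absent."""
    result = subprocess.run(
        ["pdffonts", pdf_file], capture_output=True, text=True, check=True
    )
    return missing_fonts(result.stdout, expected_fragments)
```

For the document described here, check_pdf("document.pdf", ["Vera", "Biolinum"]) would be expected to return ["Vera"] if only Bitstream Vera was outlined. The exact font names stored in the PDF may differ, so the fragments are guesses.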

=> Does anyone know a way to keep EPS images as vector objects, PNG images as uncompressed bitmaps and characters as characters? Or is there a way to add the lost information to the PDF after the fact? Some sort of OCR process that works on vector images instead of scanned bitmaps?

ROSt53 · May 27, 2014, 1:07am

As I am under the impression that you work on Linux, I searched the web for “free OCR linux” and got several hits. This one drew my attention: Tesseract (software) on Wikipedia, because it can even handle Japanese. But there are many more.

The ABBYY Finereader that @mahfiaz recommended works very well. I don’t have any experience (yet) with Tesseract or other OCR software.

zak-mckracken · May 31, 2014, 2:24pm

Tesseract is a back-end for OCR software, and it seems to be made to recognize characters in raster images. One program that builds on Tesseract is gOCR, but that requires images and will produce text files. I have no images but a PDF file with glyphs, and I don’t want a text file but a new PDF file with proper (searchable) characters.

mahfiaz · May 26, 2014, 4:21pm

ABBYY Finereader would do that really well if you have spare bucks. Adobe Acrobat Professional would do that somewhat okay (also costs money). But please write a bug report and ask for improvements on this front.

zak-mckracken · May 31, 2014, 1:11pm

There have been bug reports and feature requests since OpenOffice 2.x, along with promises of features. I have more or less given up hope by now. I used to have an openoffice.org account, I made a Launchpad account to submit the same thing for LibreOffice, and now I’d have to make a Bugzilla account for LO, while my hope keeps fading …
I had lots of long discussions on mailing lists (and I hate mailing lists!) three years ago, and since then I’ve just used the workaround. It’s easier.

AlexKemp · February 23, 2016, 8:35pm