0

Preserve text and vector images when printing to file [closed]

asked 2014-05-26 16:17:06 +0200

zak-mckracken

updated 2015-09-05 23:56:06 +0200

Alex Kemp

I've got a rather roundabout process (in LO 4.1, but used since 3.x) to preserve EPS images as vector images in a PDF: print to a PS file, then convert it to PDF using ps2pdf -dAutoFilterColorImages=false -dColorImageFilter=/FlateEncode. The regular PDF export cannot be convinced not to convert them to nicely anti-aliased raster images (whose resolution I have no control over), which of course look crappy in print.
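The conversion step described above could be scripted roughly as follows. This is only a sketch: the two Ghostscript options are the ones quoted in the question, while the function names and the file names `document.ps` / `document.pdf` are made up for illustration, and it assumes `ps2pdf` (part of Ghostscript) is on the PATH.

```python
import shutil
import subprocess

# Options quoted in the question: do not auto-select the colour image
# filter, and use lossless Flate compression for colour images instead.
PS2PDF_OPTIONS = [
    "-dAutoFilterColorImages=false",
    "-dColorImageFilter=/FlateEncode",
]


def build_ps2pdf_command(ps_path, pdf_path):
    """Return the ps2pdf argument list (options first, then input and output)."""
    return ["ps2pdf", *PS2PDF_OPTIONS, ps_path, pdf_path]


def convert(ps_path, pdf_path):
    """Run ps2pdf on a PostScript file printed from LibreOffice."""
    if shutil.which("ps2pdf") is None:
        raise RuntimeError("ps2pdf (Ghostscript) was not found on PATH")
    subprocess.run(build_ps2pdf_command(ps_path, pdf_path), check=True)


if __name__ == "__main__":
    # Hypothetical file names; replace with the PS file you printed to.
    print(" ".join(build_ps2pdf_command("document.ps", "document.pdf")))
```

Calling `convert("document.ps", "document.pdf")` would then perform the actual conversion.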

So far so good, but now (with printing already underway) I realized that text in the PDFs (and already in the PS files) has been turned into vector objects and can neither be selected and copied nor searched. This is only true for some fonts (Bitstream Vera becomes vector drawings; Linux Biolinum and Helvetica don't!). Sadly, I can't change the fonts now, as that would mess up the layout.
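One way to check which passages survived as real text might be to extract the text layer and look for a phrase you know is set in a given font. The sketch below assumes the `pdftotext` tool from poppler-utils is installed; the helper names, the file name and the sample phrase are hypothetical.

```python
import shutil
import subprocess


def normalize(text):
    """Collapse whitespace and lowercase, so line breaks do not hide a match."""
    return " ".join(text.split()).lower()


def phrase_is_searchable(extracted_text, phrase):
    """True if the phrase occurs in the text extracted from the PDF."""
    return normalize(phrase) in normalize(extracted_text)


def extract_text(pdf_path):
    """Return the PDF's text layer via pdftotext ('-' sends output to stdout)."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext (poppler-utils) was not found on PATH")
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


if __name__ == "__main__" and shutil.which("pdftotext"):
    # Hypothetical file name and phrase: pick words set in the suspect font.
    text = extract_text("document.pdf")
    print(phrase_is_searchable(text, "some heading set in Bitstream Vera"))
```

A phrase that is visible on the page but not found this way has most likely been converted to outlines.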

=> Does anyone know a way to keep EPS images as vector objects, PNG images as uncompressed bitmaps and characters as characters? Or is there a way to add the lost information to the PDF after the fact? Some sort of OCR process that works on vector images instead of scanned bitmaps?


Closed for the following reason: the question is answered, right answer was accepted by Alex Kemp
close date 2016-02-23 21:35:24.785214

2 Answers

1

answered 2014-05-27 03:07:51 +0200

ROSt52

updated 2014-05-27 03:14:34 +0200

As I am under the impression that you work on Linux, I searched the web for "free OCR linux" and got several hits. This one drew my attention: https://en.wikipedia.org/wiki/Tessera... because it can even handle Japanese. But there are many more.

The ABBYY FineReader that @mahfiaz recommended works very well. I don't yet have experience with Tesseract or other OCR software.


Comments

Tesseract is a back-end for OCR software, and it seems to be made to recognize characters in raster images. One program that builds on Tesseract is gOCR, but that requires images and produces text files. I have no images, only a PDF file with glyphs, and I don't want a text file but a new PDF file with proper (searchable) characters.

zak-mckracken ( 2014-05-31 16:24:50 +0200 )
0

answered 2014-05-26 18:21:07 +0200

mahfiaz

ABBYY FineReader would do that really well if you have spare bucks. Adobe Acrobat Professional would do it reasonably well (it also costs money). But please write a bug report and ask for improvements on this front.


Comments

There have been bug reports and feature requests since OpenOffice 2.x, along with promises of features. I have more or less given up hope by now. I used to have an openoffice.org account, I made a Launchpad account to submit the same thing for LibreOffice, and now I'd have to make a Bugzilla account for LO, while my hope keeps fading ... I had lots of long discussions on mailing lists (and I hate mailing lists!) three years ago, and since then I've just used the workaround. It's easier.

zak-mckracken ( 2014-05-31 15:11:20 +0200 )
