Replace fonts in whole PDF through command line?

Hello,

To make PDFs easier to read on a 6" e-reader, I’d like to replace the serif fonts with sans-serif ones, e.g. Verdana or Arial, throughout the document.

Is there a way to do this through the command line?

Thank you.

mutool.exe info -F input.pdf
Retrieving info from pages 1-571...
Fonts (4):
        3       (22 0 R):       Type0 'AAAAAA+LiberationSerif' Identity-H (14 0 R)
        3       (22 0 R):       Type0 'AAAAAB+LiberationSerif-Bold' Identity-H (17 0 R)
        3       (22 0 R):       Type0 'AAAAAC+LiberationSerif-BoldItalic' Identity-H (20 0 R)
        5       (29 0 R):       Type0 'AAAAAD+LiberationSerif-Italic' Identity-H (27 0 R)
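For reference, the listing above can be turned into a clean list of font names with a few lines of Python. This is only a sketch: the regular expression assumes the column layout shown in this particular output (page number, page object, font type, quoted font name), and `parse_fonts` is a hypothetical helper name, not part of mutool. The six capital letters followed by `+` are the subset tag that PDF producers put in front of embedded font subsets, so it is stripped.

```python
import re

# Assumed layout of one font line in the "mutool info -F" output above:
#   <page>  (<obj> <gen> R):  <type> '<font name>' <encoding> (<obj> <gen> R)
FONT_LINE = re.compile(
    r"^\s*(?P<page>\d+)\s+\(\d+ \d+ R\):\s+(?P<type>\S+)\s+'(?P<name>[^']+)'"
)
# Subset tag: six uppercase letters and a plus sign, e.g. "AAAAAA+".
SUBSET_PREFIX = re.compile(r"^[A-Z]{6}\+")


def parse_fonts(info_text):
    """Return (page, font type, font name without subset tag) tuples."""
    fonts = []
    for line in info_text.splitlines():
        m = FONT_LINE.match(line)
        if m:  # header lines such as "Fonts (4):" do not match and are skipped
            name = SUBSET_PREFIX.sub("", m["name"])
            fonts.append((int(m["page"]), m["type"], name))
    return fonts


sample = """Fonts (4):
        3       (22 0 R):       Type0 'AAAAAA+LiberationSerif' Identity-H (14 0 R)
        5       (29 0 R):       Type0 'AAAAAD+LiberationSerif-Italic' Identity-H (27 0 R)
"""
for page, kind, name in parse_fonts(sample):
    print(page, kind, name)
```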

Edit: It looks like Draw seriously messes with the contents when editing and saving a PDF…

Draw is not a PDF editor. It opens PDF files as a set of graphic objects. In a PDF, text is broken into a collection of text boxes, each containing a small number of homogeneously formatted characters (all having the same properties) and never exceeding one line, so that there is no uncontrolled line wrap. All these boxes are positioned directly on the page without any consideration for reading continuity (your eyes are fooled into thinking that the words are sorted and ordered as you would have written them).

Things get complicated when the necessary fonts are not embedded in the PDF document. The text boxes are still rendered individually without being moved. This is what causes the visual mess: PDF is a “frozen” format with no text-flow directives, which makes it impossible to reorganise the layout.

Replacing the fonts in Draw has to be done individually on each text box. You can easily imagine how tedious that is. I don’t think it is feasible from the command line considering how PDF is encoded. A single file even mixes two kinds of data: plain-text objects and compressed binary streams (typically Flate-encoded), the latter hiding what’s inside until you decode them.

You also run into another issue: ideally, you should embed the replacement font for the glyphs actually used. This could be the simplest solution, but it requires digging deep into the PDF format and specification.

I think this is far beyond Draw’s capabilities. I repeat: Draw is not a PDF editor, simply a program for handling ODF graphics. It has filters to import other formats, which are converted to ODF as far as possible. Don’t overestimate Draw.


Thanks much. (why does this forum delete carriage returns?)

Alternatively, is there a tool (PyMuPDF?) that can 1) loop through every page, 2) loop through every “text object” within, and 3) edit its font name?
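For what it’s worth, the first two steps (every page, every text object) might look like the sketch below with PyMuPDF. It relies on `page.get_text("dict")`, which returns blocks, lines and spans, each span carrying a `font` name and its `text`. The helper names `count_span_fonts` and `scan_pdf` are made up for illustration. Note that this only reads the font names: the extracted dictionary is a copy, so step 3 (actually changing the font) cannot be done by editing it and would need a separate approach that is not shown here.

```python
from collections import Counter


def count_span_fonts(page_dict):
    """Count characters per font name in one page's text dictionary.

    page_dict is assumed to have the shape returned by PyMuPDF's
    page.get_text("dict"): blocks -> lines -> spans.
    """
    counts = Counter()
    for block in page_dict.get("blocks", []):
        # Image blocks have no "lines" key; only text blocks do.
        for line in block.get("lines", []):
            for span in line.get("spans", []):
                counts[span["font"]] += len(span["text"])
    return counts


def scan_pdf(path):
    """Total the per-font character counts over every page of a PDF."""
    import fitz  # PyMuPDF; imported here so the helper above needs no install

    totals = Counter()
    doc = fitz.open(path)
    try:
        for page in doc:
            totals.update(count_span_fonts(page.get_text("dict")))
    finally:
        doc.close()
    return totals


# Example call (needs PyMuPDF installed and a real file):
# print(scan_pdf("input.pdf"))
```

Counting characters per font at least shows how much of the text each serif variant covers before attempting any replacement.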

It doesn’t “delete carriage returns”. It follows HTML rules: spaces and carriage returns are considered whitespace. Any sequence of whitespace is collapsed into a single whitespace. Therefore, several carriage returns typed in the hope of vertically spacing paragraphs end up in a single CR, i.e. no extra space.

If you want some spacing between paragraphs, enter <br> at the start of a paragraph.
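The collapsing rule described above can be imitated in a couple of lines of Python. This is only an illustration of the principle, not the forum software’s actual code; `collapse_whitespace` is a made-up name.

```python
import re


def collapse_whitespace(text):
    """Replace every run of spaces, tabs and line breaks with one space."""
    return re.sub(r"\s+", " ", text)


typed = "First paragraph.\n\n\nSecond paragraph."
print(collapse_whitespace(typed))
# prints: First paragraph. Second paragraph.
```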

No idea, because I usually avoid editing PDFs, or I convert them back (manually) into a Writer document, which requires restyling the whole thing.

But you can do it in Draw. The Navigator will list all graphic objects. Right-click on the name and Edit. Very tedious anyway.

Thanks for the info. I’ll see if it can be done by going through each page, and each “text object” within.