Print to PDF Using FoxIt in Writer - Text not Selectable

I’m trying to understand what you are saying, but the reality is these documents are not printed to PDF as an image in the general case. They contain text with formatting which can be selected and copied from the PDF document. Is that not correct?

The image is formed by the PDF reader you choose, no?

Here, maybe this will help. The file that presents in the reader as an image is 1.4 MB. The file that presents as text is only 59 kB. So it would seem the text selectable PDF file is not an image at all. It contains text with font descriptions and location information.

I have seen PDF files where the software that generated it was not at all sophisticated and placed each and every letter as a separate entity. When the text is selected in the PDF viewer, the selection is not exactly contiguous, either selecting other text as if it were part of the string seen, or adding spaces between each letter, or both. This would seem to be further evidence that the PDF file is not an image file like a JPG or PNG, but a text file with formatting.

In the general case, yes, you are correct: the functions used for printing may send text to the device using functions that take text. That way, the printer (in this case, a virtual printer) can then put the text into the resulting media (a PDF in this case) as text.

But in this specific case, it is obvious that the print procedure for some reason preprocesses the data into a raster image before sending it to the device. As said, it’s difficult to say what the reason is (which is your question, and which can only be answered with a sample of the problematic document), but that’s different from the idea that the problem is on the Foxit side.

Yes and no.
PDF is a complex format, and it may contain many kinds of data: text, lines, raster images, fonts… And when displaying, your PDF reader indeed outputs that data, that way forming the final image. But it does different things to output the data: e.g., when it deals with an embedded raster, it simply copies pixels from the internal raster to the monitor (I’m simplifying). To show text, it would do more complex things, and that would depend on whether that text refers to embedded fonts or not (but this is unrelated to this problem).
In your case, LibreOffice forms a raster image and sends it to the printer. That image has no text data, only a picture, which gets embedded in the resulting PDF. Then your reader will simply show that raster.
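A rough way to tell the two kinds of PDF apart without a viewer is to look for image and font dictionaries in the raw bytes. The following is only a heuristic sketch: the function names are made up for illustration, and it assumes the object dictionaries are stored uncompressed. PDFs that pack objects into compressed object streams can hide these markers, so a count of zero proves nothing on its own.

```python
import re


def summarize_pdf_bytes(data: bytes) -> dict:
    """Count uncompressed image and font dictionary markers in raw PDF bytes."""
    # Image XObjects are declared with "/Subtype /Image".
    images = len(re.findall(rb"/Subtype\s*/Image\b", data))
    # Font dictionaries are declared with "/Type /Font"; the \b keeps
    # "/Type /FontDescriptor" from being counted as a font.
    fonts = len(re.findall(rb"/Type\s*/Font\b", data))
    return {"images": images, "fonts": fonts}


def looks_rasterized(data: bytes) -> bool:
    """Guess that a PDF is picture-only: it has images but no fonts at all."""
    info = summarize_pdf_bytes(data)
    return info["images"] > 0 and info["fonts"] == 0


def check_file(path: str) -> dict:
    """Hypothetical convenience wrapper: read a PDF from disk and summarize it."""
    with open(path, "rb") as handle:
        return summarize_pdf_bytes(handle.read())
```

Calling `check_file` on the large and the small PDF should, if the assumption holds, show images and no fonts for the first and at least one font for the second.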

Ok, so I think I got an acknowledgement that the printing software is not the cause of the problem and so there is no reason to discuss it further.

The problem is that some setting associated with the LibreOffice Writer document (not Writer itself, not the printing software, but the document) is controlling the printing to make it an image and not text.

Do you have any idea what this setting might be? I can’t find it.

I already wrote:

So you don’t know what settings in LibreOffice could impact this? Here is a test file if you wish to play with it.

What are you going to look at?

Credit_Statement_Feb_2022_.doc (27.5 KB)

First of all, I opened that file in LibreOffice, and printed it to a virtual PDF printer (Microsoft Print to PDF). And it printed as text, not as image.

Here is the PDF (renamed to DOC to allow attaching here).
problem.pdf.doc (197.8 KB)

Version: 7.3.1.3 (x64) / LibreOffice Community
Build ID: a69ca51ded25f3eefd52d7bf9a5fad8c90b87951
CPU threads: 12; OS: Windows 10.0 Build 19044; UI render: default; VCL: win
Locale: en-US (ru_RU); UI: en-US
Calc: CL

So the next step would be to install the Foxit PDF printer to test.

And here is the result of printing to Foxit PDF Editor Printer Version 11.2.1.4537.

Credit_Statement_Feb_2022_.pdf.doc (57.2 KB)

I can’t repro the problem - so either the attached file is not problematic, or something else is affecting the result…

Importantly, it is not a Writer document; it is Word Document 8, so it looks like a conversion issue.

I could not reproduce @mikekaganski’s PDF even after installing 7.3.1.3 (and turning off Skia). However, I see there is a white page background, and I think this is what is blocking the text underneath from being selectable, as well as increasing the file size.

Click Format > Page style > Area and click the button None (even though it appears to be selected already). Then print to PDF, and you get text without a problem.

Or instead, in the Print dialogue, on the Writer tab, untick Page background to prevent it from being printed, which I suspect Mike has unticked by default.

Interesting!
I have tried under safe mode, and I repro the raster image in it. I will try to compare my profile settings with the default ones, to see what could be the affecting difference.

Could you also provide YOUR PDF created with Foxit, so we can check it against the findings of @EarnestAl?

Maybe the text is actually there in your PDF, but not selectable in your viewer, because of another invisible object at the same place.

I’m asking about settings in the tool, LibreOffice. If you don’t know, that’s an ok answer.

The file I provided does show the problem. The file you linked to above is the .doc file, not the .pdf file. Is that what you intended?

@EarnestAl that seems to have done the trick. What exactly is the Area feature for? It’s not clear to me exactly what it does or how it is used. Is it supposed to provide a background? But somehow it is in the foreground in the PDF?

Now for the findings.
The two following lines in registrymodifications.xcu make LO print the file as vector for me:

<item oor:path="/org.openoffice.Office.Views/Windows/org.openoffice.Office.Views:WindowType['swriter/10336']/UserData"><prop oor:name="Data" oor:op="fuse" oor:type="xs:string"><value>V2,V,0,AL:(5,16,0/0/260/450,260;683)</value></prop></item>
<item oor:path="/org.openoffice.Setup/Office/Factories/org.openoffice.Setup:Factory['com.sun.star.frame.StartModule']"><prop oor:name="ooSetupFactoryWindowAttributes" oor:op="fuse"><value>625,65,1129,677;1;0,0,0,0;</value></prop></item>

No idea yet what they mean.
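To compare two profiles more systematically, the `item` lines can be parsed and diffed. This is a minimal sketch, not a LibreOffice tool: the helper names are invented here, and it assumes the lines sit inside a root element that declares the `oor` and `xs` namespaces, as a full registrymodifications.xcu normally does.

```python
import xml.etree.ElementTree as ET

# Namespace behind the "oor:" prefix used on the attributes.
OOR = "http://openoffice.org/2001/registry"


def rows_from_root(root: ET.Element) -> list[tuple[str, str, str]]:
    """Return (path, property name, value) for every prop under each item."""
    rows = []
    for item in root.findall("item"):
        path = item.get(f"{{{OOR}}}path")
        for prop in item.findall("prop"):
            name = prop.get(f"{{{OOR}}}name")
            value = prop.findtext("value", default="")
            rows.append((path, name, value))
    return rows


def parse_fragment(fragment: str) -> list[tuple[str, str, str]]:
    """Parse bare <item> lines by wrapping them in a namespaced root element."""
    wrapped = (
        f'<oor:items xmlns:oor="{OOR}" '
        'xmlns:xs="http://www.w3.org/2001/XMLSchema">'
        + fragment
        + "</oor:items>"
    )
    return rows_from_root(ET.fromstring(wrapped))


def diff_fragments(old: str, new: str) -> dict:
    """Map (path, name) to (old value, new value) wherever the two differ."""
    before = {(p, n): v for p, n, v in parse_fragment(old)}
    after = {(p, n): v for p, n, v in parse_fragment(new)}
    keys = before.keys() | after.keys()
    return {k: (before.get(k), after.get(k)) for k in keys if before.get(k) != after.get(k)}
```

For a whole profile file, `rows_from_root(ET.parse(path).getroot())` should give the same kind of rows, where `path` points at the registrymodifications.xcu of the profile being examined.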

This kind of answer to a specific question, which was intended to help you, shows that I was right initially. Well, sigh.

@Wanderer That appears to be what is going on. Here is a bad PDF file.

I just found out why Mike was posting PDF files as .doc files.

Credit_Statement_Feb_2022__.pdf.doc (135.5 KB)

This file was created in Microsoft PDF Writer

@Mike What were you right about exactly? This is past me.

tdf#147811

Area gives a background. You have stumbled across a problem with the conversion process from a Word document. The proprietary .doc format is not disclosed by Microsoft, so conversion has been a process of discovery. The background from Word has not been converted exactly to the equivalent LibreOffice Area. This can be seen in the Area tab where, although None is selected already, a colour palette with ffffff (white fill) is displayed.

Instead of pressing None, you could add any background available in the LibreOffice dialogue box under Area and you would still get text in your printed PDF, because it would replace the Word layer with a background that sits in the background.

Sorry Al, I don’t follow that. When this file is updated (very infrequently), a new copy is used. I checked and the old copies also have the Area set to Color.

What would be different that having the Area set to Color would make one file mess up and the other file not?

Just to be clear, this file has been in use since 2001 and it was created in LibreOffice, not Word, if that makes any difference.

So when you say the background area was not converted properly from Word, I don’t follow that. It has always been LibreOffice generating the files. So if something is wrong with the file format, it is LibreOffice not reading properly what LibreOffice has written.

More importantly, LibreOffice has been writing this file with the Area set to a color since 2001, and always interpreting the command correctly when printing. But something happened with this latest version of the file that it did not interpret the setting correctly when printing.

Perhaps you are aware of this, but I just noticed, LibreOffice does not keep the page setting as None. I change it to None, print and it prints correctly. Save it and when reopened, the Area setting is back to Color. I think this pretty clearly points to a bug in the file format between writing and reading.

So, seems to me there are two bugs. One is that the Area set to Color is mucking up when printing. The other is reading and writing .doc formats is not consistent.

I did find that using the .docx format will save the Area as None properly. When the Area is set to Color, it also prints incorrectly.