Print to PDF Using Foxit in Writer - Text Not Selectable

OK, so I think we have established that the printing software is not the cause of the problem, so there is no reason to discuss it further.

The problem is some setting associated with the LibreOffice Writer document (not Writer itself, and not the printing software, but the document) that makes it print as an image rather than as text.

Do you have any idea what this setting might be? I can’t find it.

I already wrote:

So you don’t know what settings in LibreOffice could impact this? Here is a test file if you wish to play with it.

What are you going to look at?

Credit_Statement_Feb_2022_.doc (27.5 KB)

First of all, I opened that file in LibreOffice and printed it to a virtual PDF printer (Microsoft Print to PDF). It printed as text, not as an image.

Here is the PDF (renamed to DOC to allow attaching here).
problem.pdf.doc (197.8 KB)

Version: 7.3.1.3 (x64) / LibreOffice Community
Build ID: a69ca51ded25f3eefd52d7bf9a5fad8c90b87951
CPU threads: 12; OS: Windows 10.0 Build 19044; UI render: default; VCL: win
Locale: en-US (ru_RU); UI: en-US
Calc: CL

So the next step was to install the Foxit PDF printer to test.

And here is the result of printing to Foxit PDF Editor Printer Version 11.2.1.4537.

Credit_Statement_Feb_2022_.pdf.doc (57.2 KB)

I can’t repro the problem, so either the attached file is not problematic, or something else is affecting the result…

Importantly, it is not a Writer document; it is a Word Document 8 (.doc) file, so it looks like a conversion issue.

I could not reproduce @mikekaganski’s PDF even after installing 7.3.1.3 (and turning off Skia). However, I see there is a white page background, and I think this is what blocks the text underneath from being selected, as well as increasing the file size.

Click Format > Page Style > Area and click the None button (even though it already appears to be selected). Then print to PDF, and you should get text.

Alternatively, in the Print dialog, on the Writer tab, untick Page background to prevent it from being printed. I suspect Mike has it unticked by default.

Interesting!
I have tried in safe mode, and I can repro the raster image there. I will compare my profile settings with the default ones to see which difference could be affecting the result.

Could you also provide YOUR PDF created with Foxit, so we can check it against the findings of @EarnestAl?

Maybe the text is actually there in your PDF, but not selectable in your viewer because of another invisible object at the same position.

I’m asking about settings in the tool, LibreOffice. If you don’t know, that’s an OK answer.

The file I provided does show the problem. The file you linked to above is the .doc file, not the .pdf file. Is that what you intended?

@EarnestAl that seems to have done the trick. What exactly is the Area feature for? It’s not clear to me what it does or how it is used. Is it supposed to provide a background? But somehow it ends up in the foreground in the PDF?

Now for the findings.
The two following lines in registrymodifications.xcu make LO print the file as vector for me:

<item oor:path="/org.openoffice.Office.Views/Windows/org.openoffice.Office.Views:WindowType['swriter/10336']/UserData"><prop oor:name="Data" oor:op="fuse" oor:type="xs:string"><value>V2,V,0,AL:(5,16,0/0/260/450,260;683)</value></prop></item>
<item oor:path="/org.openoffice.Setup/Office/Factories/org.openoffice.Setup:Factory['com.sun.star.frame.StartModule']"><prop oor:name="ooSetupFactoryWindowAttributes" oor:op="fuse"><value>625,65,1129,677;1;0,0,0,0;</value></prop></item>

No idea yet what they mean.
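The profile comparison mentioned above (your own profile against a fresh default one) can be partly automated instead of eyeballing registrymodifications.xcu by hand. Below is a minimal Python sketch, assuming the usual layout of that file: an oor:items root whose item elements carry an oor:path attribute and contain prop elements with a value child. It is only a rough diff: it compares direct prop children, ignores nested node entries, and treats list values as empty. The function names are hypothetical helpers, not part of LibreOffice.

```python
import xml.etree.ElementTree as ET

# Namespace of the oor: prefix used in LibreOffice .xcu configuration files.
OOR = "http://openoffice.org/2001/registry"


def load_settings(xcu_data):
    """Map (item path, prop name) -> value text for every <item> in an .xcu document.

    Accepts str or bytes. Only direct <prop> children of <item> are read;
    nested <node> entries are skipped, and list values (<it> children) come back as None.
    """
    root = ET.fromstring(xcu_data)
    settings = {}
    # <item> and <prop> are unprefixed (no namespace); their attributes use oor:.
    for item in root.iter("item"):
        path = item.get(f"{{{OOR}}}path")
        for prop in item.findall("prop"):
            name = prop.get(f"{{{OOR}}}name")
            value = prop.find("value")
            settings[(path, name)] = value.text if value is not None else None
    return settings


def diff_settings(user_xcu, default_xcu):
    """Return {key: (user value, default value)} for entries that differ or exist in only one file."""
    user = load_settings(user_xcu)
    default = load_settings(default_xcu)
    diffs = {}
    for key in user.keys() | default.keys():
        if user.get(key) != default.get(key):
            diffs[key] = (user.get(key), default.get(key))
    return diffs


if __name__ == "__main__":
    import sys

    # Usage: python diff_xcu.py <your registrymodifications.xcu> <default registrymodifications.xcu>
    if len(sys.argv) == 3:
        # Read as bytes so the parser honours the file's own encoding declaration.
        with open(sys.argv[1], "rb") as f_user, open(sys.argv[2], "rb") as f_default:
            for (path, name), (mine, dflt) in diff_settings(f_user.read(), f_default.read()).items():
                print(f"{path} :: {name}\n  yours:   {mine}\n  default: {dflt}")
```

Since a fresh profile usually contains far fewer entries, most of the output will be settings that exist only in your profile (shown with a default of None); the two window-layout lines quoted above are the kind of entry you would expect to see listed.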

This kind of answer to a specific question, one intended to let me help you, shows that I was right initially. Well, sigh.

@Wanderer That appears to be what is going on. Here is a bad PDF file.

I just found out why Mike was posting PDF files as .doc files.

Credit_Statement_Feb_2022__.pdf.doc (135.5 KB)

This file was created in Microsoft PDF Writer.

@Mike What were you right about, exactly? This is beyond me.

tdf#147811


Area sets a background. You have stumbled across a problem in the conversion process from Word documents. The proprietary .doc format is not disclosed by Microsoft, so conversion has been a process of discovery. The background from Word has not been converted exactly to the equivalent LibreOffice Area. You can see this in the Area tab: although None is already selected, a colour palette with ffffff (white fill) is displayed.

Instead of pressing None, you could add any background available in the LibreOffice dialog under Area, and you would still get text in your printed PDF, because it would replace the Word layer with a background that actually stays in the background.

Sorry Al, I don’t follow that. When the file is updated (very infrequently), a new copy is used. I checked, and the old copies also have the Area set to Color.

What would be different, such that having the Area set to Color makes one file mess up and the other not?

Just to be clear, this file has been in use since 2001, and it was created in LibreOffice, not Word, if that makes any difference.

So when you say the background area was not converted properly from Word, I don’t follow that. It has always been LibreOffice generating the files. So if something is wrong with the file format, it is LibreOffice not properly reading what LibreOffice has written.

More importantly, LibreOffice has been writing this file with the Area set to a color since 2001 and has always interpreted the setting correctly when printing. But something happened with this latest version of the file, and the setting was not interpreted correctly when printing.

Perhaps you are aware of this, but I just noticed that LibreOffice does not keep the page setting as None. I change it to None, print, and it prints correctly. When I save it and reopen it, the Area setting is back to Color. I think this pretty clearly points to a bug in the file format between writing and reading.

So it seems to me there are two bugs. One is that the Area set to Color messes up printing. The other is that reading and writing the .doc format is not consistent.

I did find that the .docx format saves the Area as None properly. When the Area is set to Color, though, it also prints incorrectly.

It is a problem with using a proprietary closed foreign format with undocumented structure to save your work in.

Always save in native ODF (.odt), and only use Save As .doc or .docx (better) for export if somebody cannot open the native file format. Do not build on the exported .doc file.

It would be a problem if the file were being read or written by Word. If LibreOffice is writing the file, it certainly should be able to read and interpret the file it just wrote! Matching the Microsoft specification doesn’t matter if Microsoft is not involved. What matters is that LibreOffice interprets the file the same way it wrote it!!! This is no different from someone not being able to read their own handwriting. It doesn’t matter how closely it follows the writing standards; you should be able to read what you write!

How to get LibreOffice to fix these two bugs?