Print to PDF Using FoxIt in Writer - Text not Selectable

@EarnestAl that seems to have done the trick. What exactly is the Area feature for? It’s not clear to me exactly what it does or how it is used. Is it supposed to provide a background? But somehow it is in the foreground in the PDF?

Now for the findings.
The two following lines in registrymodifications.xcu make LO print the file as vector for me:

<item oor:path="/org.openoffice.Office.Views/Windows/org.openoffice.Office.Views:WindowType['swriter/10336']/UserData"><prop oor:name="Data" oor:op="fuse" oor:type="xs:string"><value>V2,V,0,AL:(5,16,0/0/260/450,260;683)</value></prop></item>
<item oor:path="/org.openoffice.Setup/Office/Factories/org.openoffice.Setup:Factory['com.sun.star.frame.StartModule']"><prop oor:name="ooSetupFactoryWindowAttributes" oor:op="fuse"><value>625,65,1129,677;1;0,0,0,0;</value></prop></item>

No idea yet what they mean.
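
For anyone who wants to compare profiles the same way, here is a minimal sketch that lists the path, property name, and value of each item in a registrymodifications.xcu fragment. It only reads the entries and does not interpret them. The `oor:items` wrapper with its namespace declarations and the helper name `list_items` are my assumptions, added so the two lines above parse as a standalone document.

```python
import xml.etree.ElementTree as ET

# Namespace bound to the "oor:" prefix (assumed to match the real profile file).
OOR = "http://openoffice.org/2001/registry"

# The two lines quoted above, wrapped in an assumed root element so they parse.
SAMPLE = """<oor:items xmlns:oor="http://openoffice.org/2001/registry" xmlns:xs="http://www.w3.org/2001/XMLSchema">
<item oor:path="/org.openoffice.Office.Views/Windows/org.openoffice.Office.Views:WindowType['swriter/10336']/UserData"><prop oor:name="Data" oor:op="fuse" oor:type="xs:string"><value>V2,V,0,AL:(5,16,0/0/260/450,260;683)</value></prop></item>
<item oor:path="/org.openoffice.Setup/Office/Factories/org.openoffice.Setup:Factory['com.sun.star.frame.StartModule']"><prop oor:name="ooSetupFactoryWindowAttributes" oor:op="fuse"><value>625,65,1129,677;1;0,0,0,0;</value></prop></item>
</oor:items>"""


def list_items(xml_text):
    """Return (path, property name, value) for every <prop> in every <item>."""
    root = ET.fromstring(xml_text)
    rows = []
    for item in root.iter("item"):
        path = item.get(f"{{{OOR}}}path")
        for prop in item.iter("prop"):
            name = prop.get(f"{{{OOR}}}name")
            value = prop.findtext("value")
            rows.append((path, name, value))
    return rows


for path, name, value in list_items(SAMPLE):
    print(name, "=", value)
    print("   at", path)
```

Running the same function over the full text of two profiles and diffing the results is one way to narrow down which entries change the behaviour.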

This kind of answer to a specific question, one that was intended to make it possible to help you, shows that I was right initially. Well, sigh.

@Wanderer That appears to be what is going on. Here is a bad PDF file.

I just found out why Mike was posting PDF files as .doc files.

Credit_Statement_Feb_2022__.pdf.doc (135.5 KB)

This file was created in Microsoft PDF Writer

@Mike What were you right about exactly? This is past me.

tdf#147811

Area gives a background. You have stumbled across a problem with the conversion process from a Word document. The proprietary .doc format is not disclosed by Microsoft so conversion has been a process of discovery. The background from Word has not been converted exactly to the equivalent LibreOffice Area. This can be seen in the Area tab where, although None is already selected, a colour palette with ffffff (white fill) is displayed.

Instead of pressing None, you could add any background available in the LibreOffice dialogue box under Area and you would still get text in your printed PDF, because it would replace the Word layer with a background that actually sits in the background.

Sorry Al, I don’t follow that. When this file is updated (very infrequently), a new copy is used. I checked and the old copies also have the Area set to Color.

What would be different that having the Area set to Color would make one file mess up and the other file not?

Just to be clear, this file has been in use since 2001 and it was created in LibreOffice, not Word, if that makes any difference.

So when you say the background area was not converted properly from Word, I don’t follow that. It has always been LibreOffice generating the files. So if something is wrong with the file format, it is LibreOffice not reading properly what LibreOffice has written.

More importantly, LibreOffice has been writing this file with the Area set to a color since 2001, and always interpreting the command correctly when printing. But something happened with this latest version of the file that it did not interpret the setting correctly when printing.

Perhaps you are aware of this, but I just noticed, LibreOffice does not keep the page setting as None. I change it to None, print and it prints correctly. Save it and when reopened, the Area setting is back to Color. I think this pretty clearly points to a bug in the file format between writing and reading.

So, seems to me there are two bugs. One is that the Area set to Color is mucking up when printing. The other is reading and writing .doc formats is not consistent.

I did find that using the .docx format will save the Area as None properly. When the Area is set to Color, it also prints incorrectly.

It is a problem with using a proprietary closed foreign format with undocumented structure to save your work in.

Always save as native ODF (.odt) and only Save As .doc or .docx (better) for export if somebody cannot open the native file format. Do not build on the exported .doc file.

It would be a problem if the file were being read or written by Word. If LibreOffice is writing the file, it certainly should be able to read and interpret the file it just wrote! Matching the Microsoft specification doesn’t matter if Microsoft is not involved. What matters is that LibreOffice interprets the file the same way it wrote it!!! This is no different from someone not being able to read their own handwriting. It doesn’t matter how much it looks like the writing standards, you should be able to read what you write!

How to get LibreOffice to fix these two bugs?

It is incorrect. Though it is true that if LibreOffice writes the file, it should be able to read it, it does not imply that it must be able to read it back as it was, without data loss.

Consider this example (that I know very well, fixing some bugs around the feature).
In Writer and in ODF, there is no limit on the position of page borders, other than being non-negative. So you are able to create, say, margins of 5 cm, and borders with padding of another 5 cm, making an effective margin of 10 cm and a border offset of 5 cm. OTOH, Word (and its binary file format) does not allow you to offset borders more than 31 pt (~11 mm) from either page edge or text. So if you create the abovementioned 5 cm+border+5 cm layout in Writer, it is entirely possible; but when you then decide to save it to DOC, Writer will not be able to save it as it was created (just because the file format has no way of storing that); thus it will not be able to restore it in the initial form when read later. We will try our best to make the result as close to the original as possible, but in this case, the border will become 11 mm from text. Any feature that is present in Writer but not available in the target file format will need to be approximated on write, and then there’s no way to restore the original information on read.
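
To make the border example concrete, here is a toy sketch of a lossy write followed by a read. It is not Writer's actual export code: the function names are hypothetical, and the only figure taken from the explanation above is the 31 pt limit.

```python
PT_PER_CM = 72 / 2.54            # points per centimetre
DOC_MAX_BORDER_OFFSET_PT = 31.0  # limit quoted above for Word's binary format


def save_border_offset_to_doc(offset_cm: float) -> float:
    """Hypothetical writer: store the offset in points, clamped to the limit."""
    return min(offset_cm * PT_PER_CM, DOC_MAX_BORDER_OFFSET_PT)


def load_border_offset_from_doc(offset_pt: float) -> float:
    """Hypothetical reader: convert the stored points back to centimetres."""
    return offset_pt / PT_PER_CM


original_cm = 5.0
stored_pt = save_border_offset_to_doc(original_cm)
restored_cm = load_border_offset_from_doc(stored_pt)
print(f"{original_cm} cm -> {stored_pt} pt -> {restored_cm:.2f} cm")
```

The 5 cm offset comes back as roughly 1.09 cm, the ~11 mm mentioned above. The reader does nothing wrong here; the original value was simply never stored. An offset below the limit, such as 0.5 cm, survives the round trip.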

Consider another, extreme example. TXT is just another foreign file format. It is even more restricted than DOC. You would not expect Writer to restore page format, text styles, font sizes, tables, etc. when you use TXT, would you?

It is interesting that you say that you were using the file in question for some time without this problem; indeed, that needs investigation (what has happened? Had you added some little thing to the file that made the problem appear? Or had you upgraded LibreOffice in the meantime?) But while the problem you show does indeed look like a bug, I just wanted to repeat once again that it is incorrect in the general case to expect any program to be able to write and read its own data into foreign file formats without data loss.

Additional consideration WRT use of external file format, and expecting consistency with its handling.
Any external/foreign file format support is necessarily incomplete and imperfect. That, in turn, means that there is everlasting work to improve such format support in the program. And any improvement in that area may result in a changed way of importing some features of that format, which might mean that something that we wrote into the format in a previous version starts being imported in a different way in a later Writer version, because that allows us to map that format feature better to our features. So here may be another source of inconsistencies of “what we read is not what we wrote”, which would not be a bug at all (other than known worse format support in earlier versions).

FTR: I made a brief test using versions 3.3.0.4, 4.0.0.3, 5.0.0.5, 6.0.0.3, and 7.0.0.3, and all of them print your sample as raster. So it is unlikely that this is an import regression (although I didn’t test the multiple versions released between the mentioned ones); it is still possible that this is an export regression (which can’t be ruled out without a file saved in a previous version that was working fine there).

I don’t want to argue with you about this. I don’t see this as a matter of data loss, unless you are saying there is no way to indicate that this background Area feature is turned off in the .doc file format? It would seem just the opposite: if the format does not support the Area feature, it would not be possible to indicate it was on.

Regardless, Al has been indicating this is a problem with understanding the .doc file format. That’s not the problem you are describing. You seem to be saying there simply is no support for the feature, so it mucks up.

It is correct that earlier versions of this same file, under my current version of Writer, will print correctly with the Area set to Color. I verified this before I made my previous post. I am running
Version: 7.2.2.2 (x64) / LibreOffice Community
Build ID: 02b2acce88a210515b4a5bb2e46cbfb63fe97d56
CPU threads: 16; OS: Windows 10.0 Build 19042; UI render: Skia/Raster; VCL: win
Locale: en-US (en_US); UI: en-US
Calc: CL

I am uploading copies of both the .doc file and the .pdf file.
Credit_Statement_2106__ - Copy.doc (26.5 KB)
Credit_Statement_2106__ - Copy.pdf.doc (52.5 KB)

Heh, fun. I tried the Credit_Statement_2106__ - Copy.doc, and it prints raster here with both 7.3.1.1 and 7.2.0.4 - both using clean profiles.

Just to be clear, there are two separate bugs, no? One is the failure to allow the Area feature to be “controlled” when using the .doc file format, which may be repairable or not. The other bug is the failure to properly print the document when the Area is set to Color, but only in the one file and not previous versions of that same file.

This is no longer correct. FTR, MS does a great job nowadays publishing their file formats; [MS-DOC] is here.

This is tdf#124548, and is unrelated to the format, and is only a cosmetic issue of “Focus vs Selection”.

I do not quite understand what you are talking about here. What specifically do you mean? You can control this AFAICT. Or what are the steps to see this problem?

It’s the same problem we’ve been talking about. I’m saying the feature can be set, but it can’t be saved. Every time the file is opened its Area is set to Color.

I’m just making the distinction that one problem is the inability to set and save this feature. The other problem is when this feature is set to Color, the PDF printing mucks up.

Ah! Now I see. Yes, reproducible; it indeed must be considered a separate bug, maybe until we find them to be caused by some single root cause.

tdf#147819

It is different. It is more like translating into a different language, one not too close to your own, say French to German or French to Egyptian. Losses and imprecise meanings occur. As a human being I may remember my translations and get the right text back; an algorithm for translation won’t manage that unless it is cheating (like embedding the whole odt in a pdf during export to simulate an “editable” pdf).

The best tool to handle .doc/.docx is Microsoft Office, and even they had problems migrating older documents in several newer versions. (I sometimes had to help integrate “compatibility packs”…)

If you need to convert to MS files, keep the originals in ODF format and edit only these. That way you avoid adding “multiple translations”.

As you seem to believe only 100% perfect support of all M$-Files during import/export is acceptable you should assume LO as not supporting M$. You may claim your money back.