Problems with line feeds when exporting as PDF/A

Hello,

We use LibreOffice to convert Microsoft Office documents to PDF, and we want to create PDF/A by default. We found that line feeds are sometimes lost in the text layer. That does not happen if we convert the document to normal PDF. It makes no difference whether we convert the documents manually or via the UNO API.
All kinds of Microsoft Office documents (Word, PowerPoint, Excel) are affected; sometimes subtly, sometimes very noticeably. I attached a PowerPoint document where it is very noticeable. If you convert this document to PDF/A-2b, for example, and copy the text layer into an editor, the whole text is on one line. It should be 5 lines.
Is this a bug in LibreOffice? Is there a known workaround?
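For reference, when exporting via the UNO API, the PDF/A variant is selected through the `SelectPdfVersion` entry of the export filter's `FilterData`. The Python sketch below assumes the document has already been loaded (for example with `loadComponentFromURL`) and that it runs where the `uno` module is importable, such as LibreOffice's bundled Python. The helper names `export_pdf` and `pdf_filter_data` are hypothetical; only the filter names, `FilterData`, `SelectPdfVersion` and `storeToURL` come from LibreOffice itself.

```python
# Minimal sketch, not a complete converter: store an already-loaded
# LibreOffice document as plain PDF or PDF/A through UNO.
# Assumption: runs where the "uno" module is importable (e.g. LibreOffice's
# bundled Python); the uno imports are deferred so the helpers below can be
# inspected without a running office.

# "SelectPdfVersion" values of the PDF export filter (0 = plain PDF).
PDF_VERSION = {"pdf": 0, "pdfa1b": 1, "pdfa2b": 2}

# PDF export filter name per application.
EXPORT_FILTER = {
    "impress": "impress_pdf_Export",
    "writer": "writer_pdf_Export",
    "calc": "calc_pdf_Export",
}


def pdf_filter_data(variant):
    """FilterData entries as plain (name, value) pairs (hypothetical helper)."""
    return [("SelectPdfVersion", PDF_VERSION[variant])]


def export_pdf(doc, out_url, app="impress", variant="pdfa2b"):
    """Store doc to out_url (a file:/// URL) using the chosen PDF variant."""
    import uno
    from com.sun.star.beans import PropertyValue

    def prop(name, value):
        p = PropertyValue()
        p.Name = name
        p.Value = value
        return p

    filter_data = tuple(prop(n, v) for n, v in pdf_filter_data(variant))
    args = (
        prop("FilterName", EXPORT_FILTER[app]),
        # Typed explicitly so it arrives as a sequence of PropertyValue.
        prop("FilterData",
             uno.Any("[]com.sun.star.beans.PropertyValue", filter_data)),
    )
    doc.storeToURL(out_url, args)
```

Exporting the same loaded document once with `variant="pdf"` and once with `variant="pdfa2b"` (to two different, illustrative output URLs) is one way to produce the pair of files whose text layers are compared in this thread.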

Version: 7.2.4.1 (x86) / LibreOffice Community
Build ID: 27d75539669ac387bb498e35313b970b7fe9c4f9
CPU threads: 16; OS: Windows 10.0 Build 19042; UI render: Skia/Vulkan; VCL: win
Locale: en-US (de_DE); UI: en-US
Calc: CL

PowerPointDifferentLanguages.ppt (122 KB)

No problem here with the PDF's appearance with any of the PDF/A settings. However, the text does copy from Adobe Reader as one line rather than 5.

Version: 7.2.5.2 (x64) / LibreOffice Community
Build ID: 499f9727c189e6ef3471021d6132d4c694f357e5
CPU threads: 8; OS: Windows 10.0 Build 22000; UI render: default; VCL: win
Locale: en-NZ (en_NZ); UI: en-GB
Calc: CL

I don't think this is a bug at the moment; I guess we are outside documented behaviour. PDF is meant to transport the visual impression of a document so that it can be displayed or printed identically. It does not preserve document structure. In fact, neither line feeds nor lines are required elements of PDF.
Since @RobertG could not reproduce the reported behaviour on Linux in the German version of this question (confirmed by @EarnestAl on Win10), it may even come down to the old question of how to handle a single LF instead of CR/LF.
To check, I'd ask @MMr for the PDF so we all look at the same file, and which software he used to copy and paste.
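One way to test the LF versus CR/LF idea is to count the lines in the pasted text under both conventions. The Python sketch below takes the text as copied from the PDF viewer; the function names `line_counts` and `diagnose` are made up for illustration. If even the permissive count is 1, the copied text contains no line-break characters at all, so the line-ending convention cannot be the cause.

```python
def line_counts(text):
    """Count lines in copied text under two line-break conventions."""
    # Permissive: \n, \r\n, \r and other Unicode line separators all end a line.
    any_break = len(text.splitlines())

    # Strict: only Windows-style CR/LF ends a line.
    crlf_parts = text.split("\r\n")
    if crlf_parts[-1] == "":  # a trailing CR/LF does not start a new line
        crlf_parts.pop()
    return {"any_break": any_break, "crlf_only": len(crlf_parts)}


def diagnose(text, expected):
    """Rough verdict for text that should contain `expected` lines."""
    counts = line_counts(text)
    if counts["any_break"] < expected:
        return "line breaks missing from the copied text"
    if counts["crlf_only"] < expected:
        return "breaks present, but not as CR/LF"
    return "line breaks intact"


if __name__ == "__main__":
    lf_only = "One\nTwo\nThree\nFour\nFive"
    flat = "One Two Three Four Five"
    print(line_counts(lf_only))  # {'any_break': 5, 'crlf_only': 1}
    print(diagnose(lf_only, 5))  # breaks present, but not as CR/LF
    print(diagnose(flat, 5))     # line breaks missing from the copied text
```

The "one line rather than 5" result reported above would correspond to the `flat` case, not the LF-only case.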


I open the PDF in Acrobat Reader and paste the result into Notepad++. We also import the text layer with a library (ImageGear), with the same results. What puzzles us is that the behaviour differs depending on whether you export as PDF or as archive PDF.

There is no bug as long as the PDF displays the data correctly. Everything else is irrelevant: the purpose of PDF is archiving, printing, displaying and the like, not providing editable content.

Just picking at this a little longer.
If I select and copy the text of PowerPointDifferentLanguages.ppt from Impress into Notepad, I get 5 lines of text.
If I copy the text from Impress into Writer I get a single line of text.
This difference in output by itself looks like a bug.

If I open the file in PowerPoint and copy from PowerPoint to Writer, I get 5 lines of text.


That (missing line breaks in the RTF clipboard format) could possibly be filed as a bug.

@mikekaganski I do not agree that everything else is irrelevant. We analyse the documents with our software, and one relevant piece of information is the line, i.e. the position of the text.
In any case, there is a difference between exporting some documents from LibreOffice as PDF and exporting them as PDF/A. From my point of view, there shouldn't be a difference.

@Wanderer, I'd love to upload the resulting PDF, but unfortunately, as far as I know, files with the extension “pdf” cannot be uploaded.

Just rename it to xxx.pdf.odt; that usually works with not-too-clever forum upload filters.

As suggested, I have uploaded the resulting PDFs with the extension “odt”. Hopefully I am not breaking any rules on this forum.

ExportedAsNormalPdf.pdf.odt (41.6 KB)
PowerPointDifferentLanguages.pdf.odt (56.4 KB)

Try:
Save the document in LibreOffice format.
Open the document saved in LibreOffice format.
Check that it is correct.
Export to PDF.
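The suggested steps can also be scripted with LibreOffice's headless command-line conversion, which helps when testing many files. This is a sketch under stated assumptions: `soffice` is on the PATH, the export is plain PDF as in the steps (choosing PDF/A from the command line is not covered here), and the helper names `convert_cmd` and `roundtrip_then_pdf` are hypothetical.

```python
import subprocess
from pathlib import Path


def convert_cmd(src, fmt, outdir, soffice="soffice"):
    """Command line for one headless LibreOffice conversion."""
    return [soffice, "--headless", "--convert-to", fmt,
            "--outdir", str(outdir), str(src)]


def roundtrip_then_pdf(src, outdir, native_ext="odp", soffice="soffice", run=True):
    """Save src in LibreOffice format, then export that copy to PDF.

    native_ext: "odp" for presentations, "odt" for text, "ods" for spreadsheets.
    With run=False the commands are only built and returned.
    """
    src, outdir = Path(src), Path(outdir)
    native = outdir / f"{src.stem}.{native_ext}"
    cmds = [
        convert_cmd(src, native_ext, outdir, soffice),  # step 1: save as ODF
        # step 3 (checking the ODF copy) is not automated here
        convert_cmd(native, "pdf", outdir, soffice),    # steps 2 and 4: reopen and export
    ]
    if run:
        for cmd in cmds:
            subprocess.run(cmd, check=True)
    return cmds
```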

Hello,

Thanks for the suggestion. I tested it, but unfortunately it does not help. It still behaves differently depending on whether you export as PDF or as archive PDF.