How can I prevent long lines from being split during export to PDF

Hi,

the problem is that all long paragraphs in the text are broken into several separate lines of text during export to PDF. This breaks, for example, long shell commands, so they cannot be copy-pasted from the PDF into a shell window. How can I keep a wrapped paragraph as one continuous line of text?

I attached the sample text (test.odt) and the result of export (test.pdf.odt).
test.odt (18.3 KB)
test.pdf.odt (25.1 KB)

Version: 7.6.6.3 (X86_64) / LibreOffice Community
Build ID: 60(Build:3)
CPU threads: 24; OS: Linux 6.5; UI render: default; VCL: kf5 (cairo+xcb)
Locale: en-US (en_US.UTF-8); UI: en-US
Ubuntu package version: 4:7.6.6-0ubuntu0.23.10.1
Calc: threaded

WMBR, George Hazan.

Hi, it is typical for export to PDF that each line is generated as a separate object. Sometimes one line in Writer is exported as 2 objects. In your case we have 3 objects, because the “$” is in an extra text object.


Workaround
What you could do is open the PDF file in masterPDF (version 4, which has no watermarks for some functions) and combine the lines into one object. It’s feasible but cumbersome.


test-1_masterPDF.pdf (8.5 KB)


What the hell is WMBR? (I am not a native speaker)

The problem is:

  • this is part of a manual with about 300 such commands;
  • there are about 10 such manuals in total;
  • this job would have to be done each time the *.odt files are exported to *.pdf.

So I’d definitely prefer to find an automated solution: a hidden setting in LibreOffice, or another tool to export *.odt → *.pdf.

With My Best Regards, George Hazan.

A PDF is effectively a drawing with objects placed on a page. There is no text wrap, no styles, just disconnected objects.

You could export as Hybrid pdf, that is with the .odt embedded inside the pdf.
If such a document is double clicked it will open in the default pdf reader. If opened from within LibreOffice it will open the Writer document ready for further editing.

Sorry, I misunderstood the problem, I was thinking of editing the pdf later.

If your pdf is opened in Adobe Reader or another pdf reader (Draw is not a pdf reader), then the text is wrapped and can be copied and pasted without breaks. I suggest checking with a different pdf reader.

I’ve tried Okular and some other readers under Linux, they all insert line breaks. Also, during text selection, I can see a certain gap between those lines.

WMBR, George Hazan

Well, when I read books in PDF format, I always see long paragraphs of text that behave as a single line. At least when I copy text from them and then paste it, that text remains a single line, and that’s all I want from my PDFs.

My goal is to ship PDFs with my software as manuals for end users, who may use any reader. The text must be copied without any problems as a single line, as it is written in the original ODT document.

I notice too that Firefox also has a line break. Regardless, Adobe Reader wraps the text and is still the most popular pdf reader.

In the image below, I have copied the text from your pdf opened in Adobe Reader to Writer as a single paragraph.

I also tested with a longer paragraph, the Dummy Text from Writer, which I exported to pdf, opened in Reader and pasted back to writer as a single paragraph.

What is the pdf document generator in document properties? Please report back here, but see my search.

Or are the books in epub format which is designed to wrap at different places depending on the user’s reader?

Nope. They are paragraph breaks. Line breaks and paragraph breaks are two different things in Writer. They are probably treated as the same thing in text editors used for code entry, such as a shell window (which seems to be your purpose).

Which clearly indicates “lines” are not terminated by a line break but by a paragraph break.


To get a “single paragraph” during the copy operation, you must select “text mode” in your pdf reader; otherwise it gathers graphical objects (text is drawn on the page as a collection of graphical shapes). Enabling “text mode” allows the PDF reader to try to rebuild the text flow.

Excellent, but there’s no paragraph break (Pi) between lines in the original ODT document.

I’d definitely prefer to find a way not to insert any breaks that do not exist in the original document I’m exporting. What is the problem with reproducing the document’s structure as written?

You won’t in the current state. This is how the export is done. The line frames in Writer are output one by one, and the PDF’s ability to just wrap to the next line isn’t used. You may file an enhancement request, and you are also welcome to contribute.

The problem is making sure that the different graphical objects in LibreOffice (the rendering primitives, created from the document model, to have line-by-line stripes, each consisting of several spans) in fact form a logically continuous text (i.e., you need to re-construct the things that were decomposed for rendering).

P.s. Are you by chance the ghazan from Miranda NG?

PDF and ODF are fundamentally different formats. PDF is a display “dead end” format. By dead end, I mean no further processing will be done on data. Consequently, any artefact can legitimately be used to position the bits where they should. Since it is only a display format, there is no commitment to respect the logical consistency of original data. This (absence of) rule makes it easy to output the PDF and allows for “minimal” size of the file.

There is a possibility but it isn’t the easiest; you could use a form Text field, set to multi-line. The problem is that the reader can overwrite the field but you could go out of design mode, enter your desired text, go back into Design Mode and set the control to Read-only. The users will still be asked if they want to save the form when they close out but what they choose would be irrelevant.

In the pdf below, only the text in the form will wrap when copied from a different pdf reader.

MultilineFormField.odt (34.2 KB)
MultilineFormField.pdf.odt (26.8 KB)

I have to say that it’s a very strange way to do the export, when a program thinks it knows what to do better than I do…

  • You wouldn’t, by any chance, be the son of old man Kowalski?
  • I am his son, but this is the first time I’ve heard it was by chance :smirk:

Well, at least that’s the first workaround I have found in a long time, but in my Okular the text from a memo field is still pasted into Konsole (bash) as two separate lines, which breaks the shell command. Of course, I can simply put a backslash at the end of each line, so that the command isn’t broken, but it makes the text unreadable…

WMBR, George Hazan

:slight_smile: The architecture of the PDF export in LibreOffice is: take the render of the document (something that is created in the same way for everything from display to printing, PNG export, etc.), and convert these graphical primitives (which already have no knowledge of their origins) to PDF primitives. That has a very significant upside: it makes the different kinds of output very similar (of course, there are imperfections everywhere, but …). Doing this differently would mean that output to the format primarily intended for a correct visual representation of the document would have a separate implementation, and so would differ in many ways from what one sees on screen.
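
To make that concrete, here is a toy model in Python. It is not LibreOffice code: `TextObject`, `render_paragraph`, `copy_as_objects` and `copy_with_reflow` are hypothetical names invented for this illustration. One paragraph is "rendered" into positioned line objects, and then copied back out in two ways: object by object (what some readers appear to do) and with a naive reflow (roughly what a reader's "text mode" has to guess at).

```python
import textwrap
from dataclasses import dataclass


@dataclass
class TextObject:
    """One positioned run of text; it keeps no link to its source paragraph."""
    x: float
    y: float
    text: str


def render_paragraph(paragraph: str, chars_per_line: int,
                     line_height: float = 14.0) -> list[TextObject]:
    """Toy layout step: one paragraph becomes one object per visual line."""
    lines = textwrap.wrap(paragraph, width=chars_per_line,
                          break_long_words=False, break_on_hyphens=False)
    return [TextObject(x=0.0, y=i * line_height, text=line)
            for i, line in enumerate(lines)]


def copy_as_objects(objects: list[TextObject]) -> str:
    """Naive copy: every object ends with a newline, as if it were a paragraph."""
    return "\n".join(obj.text for obj in objects)


def copy_with_reflow(objects: list[TextObject]) -> str:
    """Heuristic copy: assume consecutive objects belong to one paragraph."""
    return " ".join(obj.text for obj in objects)


command = "$ tar --create --gzip --file backup.tar.gz --exclude cache /home/user/projects"
objects = render_paragraph(command, chars_per_line=30)
print(copy_as_objects(objects))   # several lines: pasting this breaks the command
print(copy_with_reflow(objects))  # one line again, but only because we guessed
```

Note that the reflow variant only works here because there is a single paragraph. Once the objects are on the page, nothing tells the reader whether the end of a line was a soft wrap or a real paragraph break, which is the reconstruction problem described above.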

Improvements are possible, but they need someone interested - not much demand, no improvements.

:smiley: Welcome!

This is exactly what I did for coding examples in several manuals I edited. For aesthetic reasons, I broke long lines into several shorter ones. Fearing unfamiliar readers would copy/paste the examples without pondering over them, I added the backslash at line ends.

I think this is a sensible way to go. You control exactly the length of code lines in the original ODF and this is transferred “as is” into the PDF which is then an exact copy. You also suppressed any hazard.
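
With about 300 commands per manual, adding the backslashes by hand may be tedious, so here is a minimal Python sketch of how the wrapping could be automated before the text goes into the ODT. The function name `wrap_shell_command` and its parameters are made up for this example, and it assumes the command contains no quoted arguments with spaces in them, since it breaks lines at any whitespace.

```python
def wrap_shell_command(command: str, width: int = 72, indent: str = "    ") -> str:
    """Break a long shell command into backslash-continued lines.

    Assumption: the command has no quoted strings containing whitespace,
    because lines are broken at any whitespace between words.
    """
    lines = []
    current = ""
    for word in command.split():
        # Continuation lines (every line after the first) are indented.
        prefix = indent if lines else ""
        candidate = word if not current else current + " " + word
        # Reserve two columns for the trailing " \" on non-final lines.
        if current and len(prefix) + len(candidate) + 2 > width:
            lines.append(prefix + current)
            current = word
        else:
            current = candidate
    if current:
        lines.append((indent if lines else "") + current)
    # The shell removes each backslash-newline pair and joins the lines.
    return " \\\n".join(lines)


long_command = ("rsync -av --delete --exclude cache "
                "/home/user/projects/ backup@example.com:/srv/backups/projects/")
print(wrap_shell_command(long_command, width=40))
```

A single word longer than `width` is left on its own line rather than split, so such a line can still overflow. The indentation on continuation lines is only cosmetic: after the shell joins the lines it is just extra whitespace between arguments.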