The formatting of pasted text

Hi to all. First time here.

Some of you are probably familiar with Archive.Org and their different formats for downloading whatever material. Personally, I like the text format simply because I can then format it as I like rather than, say, the PDF format which often has fonts that are somewhat unreadable. So, if I find a PDF that I like I will copy the text version and then make my own PDF via Libre.

The quirk, at least for me, is that with some fonts the pasting goes well while with others there is the issue of spacing. Since an image is worth a thousand words - may have gone up more with inflation :slight_smile: - I illustrate with the following.

This is the paragraph in text:

Only by descending into the canyon may one arrive at anything like comprehension 
of its proportions, and the descent cannot be too urgently commended to every visitor 
who is sufficiently robust to bear a reasonable amount of fatigue. There are four paths 
down the southern wall of the canyon in the granite gorge district Bass, Bright Angel, 
Grand View and Red Canyon trails. The following account of a descent of the 
old Hance trail, near Grand View, will serve to indicate the nature of such an experience 
to-day, except that the trip may now be safely made with greater comfort.

Sometimes the formatting translates well but there are instances where the initial pasting, or a font change, will make the text look like this:

image description

If it's only a paragraph or two I can fix it by hand, but with several paragraphs it becomes a line-by-line chore.

What would I do in LibreOffice to address this?

Thank you for any assistance.

Daniel

So what specifically is wrong on the screenshot you've provided?

The spacing from line to line rather than single line as shown in the text example.

The text lines appear to be too long for your page settings. Can you either lengthen the available line by reducing the margins in your page settings, or just reduce the font size?

The cause of the problem is copying from a PDF. PDF is a layout format: initial text is broken into lines and each line is positioned on its own. When you copy a block of text from the PDF viewer, you end up with a set of paragraphs: each line is terminated by a paragraph mark.

When you paste into LO Writer these paragraphs are correctly laid out as long as the font size does not cause the paragraph to exceed a line width. Otherwise you get what you show in your screenshot.

Fixing it in Writer is not easy because the built-in Find & Replace does not look beyond a single paragraph, and the paragraph mark is conceptually located between two paragraphs. The AltSearch extension lets you work around some of these limitations, but you must be careful with your regular expression (as with any regexp).

I'd rather recommend processing the text file first in a text editor (not a document processor like Writer) and finding all single end-of-lines (\n) to remove them. Pay special attention to two consecutive end-of-lines, which are usually the "signature" of the end of a paragraph. These double end-of-lines should be kept as a single end-of-line to mark the end of the paragraph.

After that, you can import your processed text into Writer and format it.
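If you would rather script that clean-up than do it by hand in an editor, here is a minimal Python sketch of the same idea. It is only an illustration, not a tested tool: the function name `unwrap_hard_lines` is made up for this example, and it assumes paragraphs in the source text are separated by at least one blank line. It does not try to rejoin words hyphenated across lines.

```python
import re


def unwrap_hard_lines(text: str) -> str:
    """Join hard-wrapped lines; keep one end-of-line per paragraph.

    Assumption: a paragraph ends at one or more blank lines
    (i.e. two or more consecutive end-of-lines).
    """
    # Normalize Windows and old Mac line endings to "\n".
    text = text.replace("\r\n", "\n").replace("\r", "\n")

    # Split wherever an end-of-line is followed by at least one
    # blank (or whitespace-only) line.
    blocks = re.split(r"\n(?:[ \t]*\n)+", text.strip())

    paragraphs = []
    for block in blocks:
        # Drop the single end-of-lines inside the paragraph and
        # join its lines with one space each.
        lines = [line.strip() for line in block.split("\n")]
        joined = " ".join(line for line in lines if line)
        if joined:
            paragraphs.append(joined)

    # One end-of-line per paragraph, as described above.
    return "\n".join(paragraphs) + "\n"
```

To use it, read the downloaded text with something like `pathlib.Path("input.txt").read_text(encoding="utf-8")`, pass the result to `unwrap_hard_lines`, and write the output to a new file before opening it in Writer. The file name and encoding here are placeholders; adjust them to your download.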

To show the community your question has been answered, click the ✓ next to the correct answer, and "upvote" by clicking on the ^ arrow of any helpful answers. These are the mechanisms for communicating the quality of the Q&A on this site. Thanks!

In case you need clarification, edit your question (not an answer) or comment the relevant answer.

ajlittoz

Thank you for the response. :slight_smile:

Yes, I am aware about this issue with copying from PDFs. But in this instance it was copied from text on the website. Here is the link to it:

link text

I gather then that the editor instructions you give would also apply to the text?

Daniel

The actual content of that page is simply raw text; try @ajlittoz's suggestion to use a plain text editor. Or you could look at the options available from the link at the top right of the page: See other formats.

That page was generated from a djvu file (either by OCR, or using text overlay of the original file); the text there was split into lines, and so is the resulting plain text on the site.

Thanks to all who have helped with this. :slight_smile:

It would seem, as Mike Kapanski said, that the page comes from a djvu file and thus the lines are split. I confirmed this by copying text from somewhere else, pasting it, and getting no 'wrap' problems.

As ajlittoz and petermau suggested, it seems a matter of the font size. For example, if I want to use Verdana then I have to go to a size 10 or 10.5pt at most; anything above that and the lines split. But if another document (djvu) is used then you can go up to 12pt with no problem. Apparently, the OCR aspect changes from one doc to the next.

Since most PDF viewers have zoom - on browsers too - then I'm not going to worry too much about the size of the font unless for some reason it looks way too small. As you can see from the images, 11pt looks decent enough at 100%. May even go to 10pt with some material.

On LibreOffice:

image description

On PDF (browser)

image description

Anyway, thanks again to all of you.

Daniel V.