Find & Replace carriage returns and line feeds with spaces?

Marianillo · June 25, 2020, 9:59am

I am copying/pasting text from a pdf into a Writer document and I have a problem.

When I look at the pasted text, it seems that the blank space where the line break used to be in the PDF has disappeared, which makes the text difficult to read.

But actually, there is some “residue” left of the line breaks from the PDF text (CR LF).

I checked here (View non-printable Unicode characters) and saw that it was a CR LF pair.

The character is invisible even if I toggle the non-printable characters on. But when moving the cursor with the arrow keys, I can clearly “feel” that there’s something there. I tried to copy it and paste it into the Find & Replace dialog, but it didn’t work.

I tried in Find & Replace with $ and \r\n, as proposed in other questions, but it doesn’t work for me.

I tried pasting both RTF and unformatted text, but there’s no difference.

In the sample text I post here, the text is pasted both as RTF and as unformatted text. You can see that the two CR LF are between “a” and “rythm”, and between “moments” and “of”.

Does someone have a suggestion on how I can get rid of the CR LF and replace it with blank spaces?

Thank you so much for your help!
Sample_CR_LF.docx

EarnestAl · June 25, 2020, 10:25am

Did you tick the Regular Expressions checkbox in the Find & Replace dialog? You will need to click the + next to Other options to see it. For me, using Find & Replace (Ctrl+H) with $ in the Find box and just a space in the Replace box works well at replacing the paragraph break with a space in your .docx.

The $ matches the end of a line. Line terminating characters are \u000a, \u000b, \u000c, \u000d, \u0085, \u2028, \u2029 and the sequence \u000d \u000a. See Regular Expressions - Old location of the ICU User Guide (linked from the Help).
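
For anyone who prefers to clean the text before pasting it, here is a minimal sketch of the same replacement outside Writer. It is an illustration only: Python's re module is not the ICU engine Writer uses and has no \R, so the terminators listed above are spelled out by hand. The names LINE_BREAK and breaks_to_spaces and the sample string are made up for this example.

```python
import re

# Python's `re` has no \R, so the ICU line terminators are listed explicitly.
# CR LF comes first so that the pair is consumed as one break, not two.
LINE_BREAK = re.compile("\r\n|[\n\x0b\x0c\r\x85\u2028\u2029]")

def breaks_to_spaces(text: str) -> str:
    """Replace every line terminator (or CR LF pair) with a single space."""
    return LINE_BREAK.sub(" ", text)

# Hypothetical sample, not the text from the attached document.
sample = "first line\r\nsecond line\u2028third line"
print(breaks_to_spaces(sample))
```

Putting the CR LF alternative first matters: otherwise the CR and the LF would each be replaced separately and you would get two spaces.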

Marianillo · June 25, 2020, 1:32pm

Yes, I had checked the Regular Expressions… but the thing is that I was looking for the wrong one… I was searching for \r\n… and thanks to your link I realized that I had to search for \R.

With this, Find & Replace worked like a charm, so THANK YOU very much for your help. Problem solved!

ajlittoz · June 25, 2020, 2:46pm

@EarnestAl: FYI, $ at the end of a regexp matches the end of a line; this is a position and can’t be replaced. $ as a complete regexp (nothing else added) matches a paragraph mark; the semantics of $ are different in this case. This is the only way to replace or modify a paragraph mark in the standard Find & Replace.
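
The "position" half of that distinction can be seen in any regex engine. The sketch below uses Python's re module purely as an analogy: it is not Writer's engine and it has no special case for a lone $, so it only shows why matching a position cannot remove anything. The variable names are illustrative.

```python
import re

text = "line one\nline two"

# With MULTILINE, `$` is a zero-width position before each "\n" and at the
# very end. Substituting there inserts text; the line break itself stays.
at_position = re.sub(r"$", " ", text, flags=re.MULTILINE)

# To get rid of the break, the pattern has to consume the character itself.
consumed = re.sub(r"\n", " ", text)

print(repr(at_position))
print(repr(consumed))
```

Writer's lone $ behaves like the second substitution (the paragraph mark is really replaced), which is why it is a special case.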

ajlittoz · June 25, 2020, 10:19am

First, you must be aware that copying text from a “foreign” document, i.e. one which is not .odt, will always cause compatibility problems because of the original encoding.

In your case, you copied from PDF. PDF is not a document processing format but a page description language. What you take for a “continuous” text in PDF is in fact a collection of homogeneous text boxes accurately positioned to give the illusion of continuous text.

When you copy text in a PDF viewer, the viewer will reconstitute lines and separate them with the CR-LF pair. This is what goes into the clipboard as “unformatted text”.

You then paste the clipboard data into Writer. Writer will translate this unformatted text as best it can. There is no real problem with character data. The CR-LF pair is converted into a formatting directive known to Writer, a paragraph break in this case. As a result, what looked like a single paragraph in the PDF is now a sequence of one-line paragraphs in Writer.

It is then legitimate to try to rebuild the original paragraphs. But there are no longer any CR-LF pairs in Writer.

For Edit>Find & Replace, a paragraph mark is represented by $ only if you checked the Regular expressions box (otherwise it is simply the U+0024 DOLLAR SIGN).

Since nothing differentiates the PDF-inserted paragraph marks from your own intentional paragraph marks, the replacement in Find & Replace must be manually controlled to avoid replacing real paragraph marks.
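
If manual control inside Writer is too tedious, one alternative is to rejoin the lines in the raw clipboard text before pasting. The sketch below is only a heuristic and rests on a loud assumption: that the PDF viewer leaves an empty line between real paragraphs, which will not hold for every PDF. The function name rejoin_paragraphs is made up for this example.

```python
import re

def rejoin_paragraphs(raw: str) -> str:
    """Join line-wrapped text back into paragraphs.

    ASSUMPTION: real paragraphs are separated by at least one empty line;
    every other line break is treated as a wrap inserted by the PDF viewer.
    """
    # Normalise CR LF pairs and lone CRs to LF first.
    text = raw.replace("\r\n", "\n").replace("\r", "\n")
    # Split where one or more empty (or whitespace-only) lines occur.
    paragraphs = re.split(r"\n\s*\n", text)
    # Inside each paragraph, turn every remaining break into one space.
    joined = [re.sub(r"\s*\n\s*", " ", p).strip() for p in paragraphs]
    # One LF per paragraph, so Writer creates one paragraph for each.
    return "\n".join(p for p in joined if p)

# Hypothetical clipboard content.
print(rejoin_paragraphs("one\r\ntwo\r\n\r\nthree\r\nfour"))
```

When the empty-line assumption fails, the result is one long paragraph, and you are back to splitting by hand.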

Notes:

The character is invisible even if I toggle the non-printable characters on

Have you activated View>Formatting Marks? If so, when it reports “nothing”, there is nothing.

Unless you have a very good reason not to, save your document in the native .odt format; this will prevent formatting loss.

Marianillo · June 25, 2020, 1:34pm

Even with the Formatting Marks on, it reported “nothing”. Still, there was “something”… so that was weird…

But now it’s solved. Thanks for your input!

Marianillo · June 25, 2020, 1:35pm

I realized that I had to search for \R in the Find & Replace, and then it worked.