Find characters before paragraph mark and replace with the found characters without the paragraph mark

dsrekjw · October 22, 2018, 8:35pm

Frequently I highlight and copy from a source (PDF, etc.) and paste into LibreOffice Writer.

Often the source has a LF/CR at the end of each line instead of just at the end of a paragraph. This is because the source did not support a line wrap function so they just hard break at the end of each line.
This means that there are hundreds of paragraph marks in the document, many in the middle of sentences.

I wish to get rid of them using Find & Replace. A rogue paragraph mark is usually preceded by a lowercase letter, so I search for [a-z]$ and replace with $0 (or an ampersand) followed by a space. This should substitute a space for the paragraph mark, but it does not work.

When I search for just a paragraph mark ($), Find locates the mark and highlights it.

But when I search for a specific case, such as a letter just before a paragraph mark, it highlights only the letter and not the paragraph mark.

OpenOffice and LibreOffice both have this weakness. So every time I need to do this type of edit, which is actually quite often, I have to go to a computer with MS Word.

In MS Word I search for [a-z]^p and it highlights both the letter and the paragraph mark. So the replace works just fine.
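
The difference described above can be illustrated outside Writer. This is only a sketch using Python's re module as an analogy; it is not LibreOffice's own engine, and the sample text is made up. It shows that `$` is a zero-width anchor, so the match covers the letter alone, while a pattern that names the break character covers both.

```python
import re

# Made-up sample: a sentence with a hard break in the middle.
text = "This line was broken\nin the middle of a sentence.\n"

# `$` only asserts "end of line here"; it consumes no characters,
# so the match is just the lowercase letter.
anchored = re.search(r"[a-z]$", text, flags=re.MULTILINE)

# `\n` is a real character in the pattern, so the match covers
# the letter and the line break together.
consuming = re.search(r"[a-z]\n", text)

print(repr(anchored.group()))
print(repr(consuming.group()))
```

A replace based on the first match has no break to remove, which is consistent with what Writer's Find shows when only the letter is highlighted.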

RGB-es · October 22, 2018, 9:07pm

The built-in RegExp engine cannot select text together with the paragraph mark. You need to use the Alternative Find & Replace extension instead.

dsrekjw · October 23, 2018, 5:26pm

I tried that but the REPLACE step crashes. My REPLACE step uses & or $0 to replace what was found in the FIND wildcard.

dajare · October 23, 2018, 1:31pm

The regex engine used in LibO (and AOO) does not operate over “line” boundaries – represented by ¶ if you toggle formatting marks. So, in LibreOffice, to remove the paragraph marks:

Ensure [✓] Regular expressions is ticked in the drop-down below “Other options” in the Find/Replace window.
Search for $ by itself;
Replace by · (by which I mean “space”).

Of course, this will remove all paragraph marks in the document, and reflow the whole thing. If you have no “real” paragraphs to preserve, you’re done.

But, if your document (or block of copy/pasted text) includes blank paragraphs between the paragraphs you wish to preserve, you can use these steps to remove only the interrupting/rogue paragraph marks:

1. Search for ^$, which finds all “empty” paragraphs, and Replace with ####, or some other distinctive, unique string. This also removes the “empty” paragraph mark.
2. Now get rid of the rest of the paragraph marks, following the bullet points above.
3. The document now has no paragraph marks (except the last one in the document, which will persist).
4. Now, to restore the desired “real” paragraphs: put your “unique string” in the Search field: ####, and in the Replace field, put \n\n. This represents “newline”, and adding two of them will restore the paragraphing from the original document.

To tidy up, you might want to search for ·· (by which I mean “two spaces”), and replace that with a single space.
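
For anyone who wants to see the logic of the placeholder steps in one place, here is a minimal sketch in Python. It mirrors the procedure on plain text rather than driving Writer; the function name `reflow` and the sample text are made up for illustration, and `####` is assumed not to occur in the text.

```python
import re

PLACEHOLDER = "####"  # assumed not to appear anywhere in the text


def reflow(text: str) -> str:
    """Join hard-broken lines, keeping blank-line paragraph breaks."""
    # Mark each empty line with the placeholder; the empty line's own
    # break is consumed by the pattern, as in the first step above.
    text = re.sub(r"^\n", PLACEHOLDER, text, flags=re.MULTILINE)
    # Turn every remaining line break into a space.
    text = text.replace("\n", " ")
    # Restore the real paragraph breaks as two newlines.
    text = text.replace(PLACEHOLDER, "\n\n")
    # Tidy up: collapse runs of two or more spaces into one.
    return re.sub(r" {2,}", " ", text)


sample = "First para line one\nline two.\n\nSecond para line one\nline two."
print(repr(reflow(sample)))
```

As in the manual procedure, a single space is left where the break before an empty paragraph used to be.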

Of course, if your source does not have empty paragraphs to separate “real” paragraphs … then you’re stuck. Maybe.

You can use the reverse of your lower-case-at-end-of-paragraphs, to look for upper-case-at-start-of-paragraphs, and add the extra blank, before going on to the “reflow” described as ##1-4, above:

Search for ^([A-Z]) and ensure [✓] Match case (below the search field) is ticked.
Replace with \n$1 - this adds a new line before that character, and the $1 is the “back-reference” which preserves the initial capital. Now you’ve got a blank line between each “real” paragraph (assuming your “rogue” paragraph markers don’t coincide with the beginning of sentences).
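
The same idea as a small Python sketch, again only as an analogy on plain text and with made-up sample text. Python's re is case-sensitive by default, which plays the role of “Match case”, and `\1` is Python's spelling of the `$1` back-reference.

```python
import re

# Made-up sample: two paragraphs with no blank line between them.
text = (
    "The first paragraph starts here\n"
    "and continues on this line.\n"
    "A second paragraph starts here\n"
    "and also continues."
)

# Insert an extra line break before any capital letter that starts a line;
# \1 puts the captured capital back.
marked = re.sub(r"^([A-Z])", r"\n\1", text, flags=re.MULTILINE)

print(marked)
```

Note that the very first line also starts with a capital, so it gets a blank line in front of it too.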

If you do a lot of this, it’s a very fast process and should help you clean up your copy/paste docs fairly efficiently.

dsrekjw · October 23, 2018, 4:04pm

Thanks. This seems like it should work. It’s a lot of hoops to jump through compared to the MS Word solution but I will try it.
I think I need to study up on writing scripts for Writer and see if I can just create one that does the task in one click. I’ve gotten pretty good at VBA over the years so I should be able to develop some similar skill for Libre.
BTW, Just tried your full solution and it worked. Thanks David.

dajare · October 23, 2018, 10:12pm

@dsrekjw - Glad that worked. You could record these steps to a macro, and attach to menu or key: see my answer(s) to an earlier “regex” question where I detailed the steps. Hope that helps. If you need more guidance, please ask a new question.

just.mikey · October 24, 2018, 3:14am

For something like this, it’s probably better to save as a flat file (.fodt) and edit your (.fodt) file in a plain text editor. That way you can see everything going on as to exactly what appears before the mark. Since everything is present as text in this view, regex gets a lot simpler.

TimEarl · May 6, 2019, 6:15pm

Excuse me jumping on the end of this one, but I think it’s relevant. I have a couple of questions about the answers given here:

Search for $ by itself; Replace by · (by which I mean “space”). Of course, this will remove all paragraph marks

If $ works in the find field (to find a paragraph mark), why doesn’t it also work in the replace field, to replace with a paragraph mark?

and in the Replace field, put \n\n. This represents “newline”, and adding two of them will restore the paragraphing from the original document.

It seems like there is no distinction between “newline” and “newparagraph”, and you’re just “creating” a paragraph break by inserting two blank lines. However, when I did this and then displayed the control characters, what appeared was a paragraph mark and not two newlines (there is a separate newline character which can be inserted by typing SHIFT+ENTER).

Search for ^([A-Z]) and ensure [✓] Match case (below the search field) is ticked. Replace with \n$1 - this adds a new line before that character, and the $1 is the “backreference” which preserves the initial capital.

Are these codes/commands shown somewhere in a list/reference, and if so, where can I find it?

Thanks for any help.

Lupp · May 6, 2019, 7:27pm

“…why doesn’t it also work in the replace field,…”
No answer possible. It’s a bad idea, but it’s a fact. The same applies to the use of \n in the replace string.
There is the help text on RegEx. What do you miss there? LibO basically uses the RegEx engine by ICU. See their user guide. (This is not concerning the replace strings which are a stubborn extra in LibO.)
You can overcome many of the problems with special features of F&R by using the extension Alt-Search. I don’t use it much, but a superficial test showed me that it still works with LibO Writer V 6.2.3. Alt-Search can search beyond paragraph breaks…

TimEarl · May 12, 2019, 3:21pm

I still can’t get Alt-Search to work. I’ve upgraded to LibO 6.2.3.2 (x64) and re-installed Alt-Search, but it still won’t run.
Thanks for the pointer to the list of “regular expressions”, I wasn’t aware of that name for them. I hadn’t seen the RegEx help as I’m not that technical a user … yet. At least now I can do what I want in separate commands. I then tried to record a macro, but LibO told me I didn’t have a Java environment. I went to install one on Firefox, but it doesn’t support them, so it’s back to the drawing board.
Thanks for all the help here.

Lupp · May 12, 2019, 5:16pm

I hadn’t used AltSearch for a while. It had even been removed. Having come back to this thread I freshly installed it (from an old .oxt I had kept) and it worked as expected.
Concerning the “separate command” and the recorded macro, I don’t understand. You surely didn’t try to record a call to AltSearch?
I will attach an example to an otherwise empty answer.

TimEarl · May 14, 2019, 6:47pm

Hi Lupp,
No, I’ve given up on AltSearch, at least for now. What I meant is:

I can now find and replace regular expressions by using separate commands for each one.
I wanted to combine different F&R commands in a single macro, using the macro recorder (which remembers commands as they’re typed and creates a macro) but it wouldn’t start.

Lupp · May 12, 2019, 5:17pm

This isn’t actually an answer, but simply an anchor for this attachment announced in my comment of today.

Well, to be more clear:
The attached example addresses the mentioned comment. The answer to the original question was already posted by @RGB-es long ago: Use AltSearch.

However, I would add that the single “&” as the replace string would also re-insert the found paragraph break. To avoid this, the text found (accepted as matching the respective part of the RegEx) in front of the paragraph break needs to be referenced as a group. To allow for this the RegEx must contain that respective part enclosed in parentheses. Assuming that’s the first or the only group, the accepted string can be referred to in the replace string as “\1” in AltSearch.
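
To make the whole-match versus group distinction concrete, here is a sketch in Python's re module. It is an analogy only, not AltSearch itself: `\g<0>` stands in for “&” (the whole match) and `\1` for the first group, and the sample text is made up.

```python
import re

text = "a hard\nbreak"
pattern = r"([a-z])\n"  # group 1 is the letter; the break lies outside the group

# Whole match: the line break is part of it, so it gets re-inserted.
kept = re.sub(pattern, r"\g<0> ", text)

# Group 1 only: the letter comes back, the line break does not.
dropped = re.sub(pattern, r"\1 ", text)

print(repr(kept))
print(repr(dropped))
```

Only the second form joins the two lines, which is why the part in front of the break needs to be enclosed in parentheses.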