I get a lot of documents with a paragraph break at the end of each line and a double break (an empty line) between paragraphs. I would like to convert these to wrapped text. I can figure out how to remove ALL paragraph breaks through Regular Expressions, and I can figure out how to convert double breaks to some other character such as #, but I haven’t figured out how to remove only the single breaks, or how to convert the marked double breaks from a character back to a paragraph break. Ideas?
Yes, the originals were ASCII, and they mostly have a consistent double spacing between paragraphs. If they lacked that, I think gabix and ajlittoz would be right and I’d need an outside fix. But with it, and with Mike’s help, I’ve been able to clean it up inside LibO.
For anyone else struggling with this, here’s my step-by-step breakdown:
1. Ctrl+H. Open “Other Options.” Check the box for “Regular Expressions.”
2. Find ^$, replace with # or some other meaningless character (if your text were full of, say, Twitter handles, you’d obviously want a different key).
3. Find $, replace with a single space (in case the spaces were missing from the ends of the original lines).
4. Find #, replace with \n
5. Find double spaces, replace with a single space (in case the spaces were present at the ends of the original lines).
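For anyone who wants to see the logic of these steps outside the Find & Replace dialog, here is a rough Python sketch of the same marker trick. It is only an illustration, not what Writer does internally: the function name `unwrap_with_marker` and the `marker` parameter are made up for this example, and the final trimming of spaces around each paragraph break is an extra tidy-up that the steps above do not include.

```python
import re


def unwrap_with_marker(text: str, marker: str = "#") -> str:
    """Join hard-wrapped lines, keeping empty lines as paragraph breaks.

    `marker` is a hypothetical placeholder and must not occur in the text.
    """
    if marker in text:
        raise ValueError("marker already occurs in the text; pick another one")

    lines = text.split("\n")
    # Like step 2: turn every empty line into the marker.
    lines = [marker if line == "" else line for line in lines]
    # Like step 3: replace every remaining line end with a single space.
    joined = " ".join(lines)
    # Like step 4: turn the marker back into a paragraph break.
    joined = joined.replace(marker, "\n")
    # Like step 5: collapse runs of spaces into one space.
    joined = re.sub(r" {2,}", " ", joined)
    # Extra tidy-up (not in the steps): trim spaces around each break.
    return "\n".join(part.strip() for part in joined.split("\n"))
```

The marker does not have to be one character; any string that never appears in the text will do.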
Results aren’t bad. Thank you all!
A hint: you are not limited to a single-character replacement at step 2. If your document is complex enough that no single character is safe to use, you can replace empty paragraphs with a string that doesn’t occur anywhere in it, e.g. [ParagraphHere], and then replace that string back at step 4.
“... or convert marked double breaks from a character back to a paragraph”
You have almost figured everything out yourself.
As mentioned on the List of Regular Expressions help page:
\n in the Replace text box stands for a paragraph break that can be entered with the Enter or Return key.
Perfect. You rock
You might want to try OOoFBTools. This extension has a lot of features to process texts, including processing line ends.
You didn’t mention the origin of your text. It smells as if it is a copy/paste from some PDF or plain ASCII file. In this case, it may be simpler to use a macro-generator outside LO Writer to do your filtering first before importing the result. Perl or m4 under Linux/Unix are your friends.
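As a rough idea of what such an outside filter could look like, here is a minimal sketch in Python rather than Perl or m4. It assumes the source is plain text in which one or more empty lines separate paragraphs; the names `unwrap_paragraphs` and `unwrap_file`, and the UTF-8 encoding default, are assumptions made for this example only.

```python
import re
from pathlib import Path


def unwrap_paragraphs(text: str) -> str:
    """Return text with one line per paragraph.

    Assumes paragraphs are separated by one or more empty lines.
    """
    # Normalise Windows and old Mac line ends first.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Split wherever there is at least one empty (or whitespace-only) line.
    blocks = re.split(r"\n\s*\n", text.strip())
    # Inside each block, collapse line breaks and repeated spaces
    # into single spaces.
    return "\n".join(" ".join(block.split()) for block in blocks)


def unwrap_file(src: str, dst: str, encoding: str = "utf-8") -> None:
    """Hypothetical helper: read `src`, unwrap it, write the result to `dst`."""
    cleaned = unwrap_paragraphs(Path(src).read_text(encoding=encoding))
    Path(dst).write_text(cleaned + "\n", encoding=encoding)
```

The resulting file has one line per paragraph, so it can be opened in Writer without any further Find & Replace passes.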