Keep what's there without the non-printing chars

bkpsusmitaa · May 15, 2017, 6:07pm

I have strings searched by the form [a-z,A-Z,0-9]$, e.g., apple$, to$, etc.

I want to keep the words with a space char instead of the paragraph char $.

apple$ would be apple,

to$ would be to,

mat$ , mat,

and so on and so forth, i.e., the result would be without the non-printing paragraph mark at the end of the words that were earlier there.

How could I?

The editing was for Regina, who had difficulty understanding my requirements. Thanks, Regina, for pointing out!

This is possible by deleting one character from the end. But is there any better way?

Regina, please check with the text, Russell’s In Praise Of Idleness

Bertrand.odt

I could have done it by changing empty paragraph marks ^$ into something like !!!, then converting the remaining single paragraph marks into one blank space, and then converting the !!! back into a single paragraph mark. But is there an elegant way?
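For readers who want to try the placeholder idea outside LibreOffice, here is a minimal Python sketch of the three steps described above. It assumes the text is a plain string in which every paragraph mark is a single "\n"; the function name and the sample text are made up for illustration, and this is not what Find & Replace does internally.

```python
def join_single_breaks(text: str, placeholder: str = "!!!") -> str:
    """Join lines split by a single break; keep blank-line breaks as one break."""
    if placeholder in text:
        # The trick only works if the placeholder is unique in the document.
        raise ValueError("placeholder already occurs in the text")
    # Step 1: protect empty paragraphs (two consecutive breaks).
    protected = text.replace("\n\n", placeholder)
    # Step 2: every remaining single break becomes one blank space.
    joined = protected.replace("\n", " ")
    # Step 3: turn the placeholder back into a single paragraph break.
    return joined.replace(placeholder, "\n")


sample = "In Praise\nof Idleness\n\nLike most of\nmy generation"
print(join_single_breaks(sample))
```

The uniqueness check matters: if !!! already occurred in the text, step 3 would introduce breaks that were never there.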

For Mike:

Please don’t use confusing phrases. Return/Enter key creates ¶, searched by $ is the paragraph-break, whereas Shift+Enter keys create line breaks displayed by ↵. I had already posted the clerical solution above, but I needed an elegant solution.

Please download the file and activate the non-printing characters button ¶ and then peruse the file. Remember that the clerical solution is already posted above.

In Microsoft Word there was one, called Find What Text, but I stopped using MS Word ten years ago!

Sample was not required. I posted a summary of what I actually needed. Your and Regina’s questions made me extend my question via explanations.

Mike, your solution is incorrect. Look at my solution above! Would you like to edit your post?

Mike, again, have you checked your solution with the file? Does it work? Now I am giving you the real task, with the clue where the paragraphs were, here:

Toughening up.odt

This was the reason for my first unedited post. How to remove those paragraph marks when there are no double paragraphs? Please note that this is just a sample file, and indeed it is a plain text file.

[Response to Mike’s 3rd edit]

No, Mike, you can’t rename the file, since this is just a sample file (the real file has already been hand-edited: please refer to my reply to Regina). Keep it as an odt file, and then follow your steps.

But my thanks is a common constant for your efforts.

[Response to Mike]
Mike, you said:

You just don’t want to listen. I still
don’t know what actual format your
real file is. If you are just clueless
about formats, or deliberately post
wrong-named files to confuse those who
try to help. However, you don’t take
time to stop and think. Even if you
just have an ODF…

My answer was a response to only these lines:

here is the exact sequence to do with
the file Toughening up.odt (that is
actually another plain text file,
again): Rename it to Toughening up.txt
Open LibreOffice Start Center,
File-Open, and choose Text - Choose
Encoding (*.txt) in File Type
drop-down list. Select and open the
Toughening up.txt. In ASCII Filter
Options dialog, choose CR & LF as
Paragraph break

[Response to Luppe]

You said:

In advance of another editing of your
ques … $$ is not searchable after
all. (Please note that \n has a
different meaning in ‘Replace With:’).

Yes, I admit, and I am sorry! I just posted it as a solution as I had already solved my problem manually without an appropriate solution. I did not put in much thought.

You can see the number of interactions I have already had without any solution, like an alternative to MsWord’s Find What Text, that too, in 1997!

I believe the thread has become complicated and convoluted and there is a need to close the thread. I will wait for an appropriate solution at my emailbox with my same username at gmail dot com

Thanks, everyone!!!

Regina · May 15, 2017, 6:53pm

It is not clear what you want to do. Please give an example of the current text and of how you want it to look afterwards.

Lupp · May 17, 2017, 6:26am

The standard ‘F&R’ does not accept the search expression $$.
Only the single $ will work for the purpose. You may, of course, use an intermediary placeholder to prepare for the second step.

mikekaganski · May 17, 2017, 6:49am

No, I don’t want to edit my post. In it, I tried to address your initial requirement, and in the edit, I addressed the attachment you’ve added. So, there are no “confusing phrases”: the file definitely contains line breaks, not paragraph breaks, and my edit fully covers the file’s case. If the file is irrelevant to the question, then it’s the file that confuses, not my phrases.

mikekaganski · May 17, 2017, 6:55am

Ah! I see. The file isn’t ODF; actually, it’s plain text that contains LFs, and they would be treated differently depending on LO settings (and they in turn depend on OS by default); that’s why it opens as a document with line breaks. So, please don’t assign arbitrary extensions to files regardless of their format; this specific file needs a TXT extension.
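A quick way to check such a file before opening it is sketched below in Python. It relies on the fact that ODF documents such as .odt are ZIP packages, so a file that is not a ZIP archive cannot be a real .odt. The function names are made up for this sketch, and being a ZIP archive is only a necessary condition for ODF, not a sufficient one.

```python
import zipfile


def is_zip_package(path: str) -> bool:
    """True if the file is a ZIP archive, as every real ODF document is."""
    return zipfile.is_zipfile(path)


def count_line_endings(data: bytes) -> dict:
    """Count CRLF, bare LF and bare CR line endings in raw file content."""
    crlf = data.count(b"\r\n")
    return {
        "CRLF": crlf,
        "LF": data.count(b"\n") - crlf,  # LFs that are not part of a CRLF
        "CR": data.count(b"\r") - crlf,  # CRs that are not part of a CRLF
    }


print(count_line_endings(b"first line\nsecond line\n\nnew block\n"))
```

If `is_zip_package` returns False and the counts show only LFs, the file is plain text with LF line endings, whatever its extension says.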

mikekaganski · May 17, 2017, 6:58am

For this file, if you open it as formatted text and set LF to be the paragraph break, I’d stick to your solution, because it really is the most elegant: replace double breaks with something unique, then replace the rest with spaces, then replace the unique markers back with paragraph breaks. But opening the file so that LFs are line breaks would make my solution the better one (more straightforward), and it only requires choosing CRLFs for paragraphs in the formatted text filter.

mikekaganski · May 17, 2017, 8:08am

You just don’t want to listen.

I still don’t know what actual format your real file is. If you are just clueless about formats, or deliberately post wrong-named files to confuse those who try to help.

However, you don’t take time to stop and think. Even if you just have an ODF, and it already has its paragraph breaks, you might use a preliminary step to convert $ to line break (see here) and follow.

mikekaganski · May 16, 2017, 5:44am

“Non-printing paragraph mark” creates paragraph break. So, if you want to get rid of this mark, you effectively remove new paragraph break in this place. I suppose it’s what you actually want to do; e.g., this might happen if you import some “formatted” plain text, or a scan result, and you assume that if there’s not a dot before the paragraph break, then this paragraph break should be eliminated.

LO’s own Find and Replace dialog is unable to remove paragraph breaks in Regex mode (it operates in one paragraph’s boundaries only; the only explicit exception is removing empty paragraphs by using ^$). So, you might consider AltSearch extension for that.

The proper find string for that task would be ([a-zA-Z0-9]+)$ (note the parentheses), and replace would be \1 (first back-reference plus a space).
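To see what this find/replace pair is meant to achieve, here is a small Python sketch that emulates it on a plain string. It is only a model: it assumes each paragraph mark is a single "\n", and because $ in Python's re module does not consume the line end, the newline is written explicitly in the pattern. AltSearch's own behaviour may differ.

```python
import re

# Model of "([a-zA-Z0-9]+)$" with the paragraph mark written as "\n".
PATTERN = re.compile(r"([a-zA-Z0-9]+)\n")


def drop_breaks_after_words(text: str) -> str:
    # "\1 " is the first back-reference followed by one space.
    return PATTERN.sub(r"\1 ", text)


sample = "apple\nto\nmat\nThe end.\nNext"
print(repr(drop_breaks_after_words(sample)))
```

Breaks that follow a letter or digit are replaced by a space, while the break after "The end." survives because a dot precedes it.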

EDIT (2017-05-17): the file that you finally attached is a clear demonstration of why samples must be provided to make it clear what the problem is.

The file has only one paragraph in it, and multiple line breaks (not paragraph breaks!). Each “wanna-be-paragraph” is delimited by two successive line breaks; all other line breaks should be converted to spaces. So, taking into account LibreOffice’s List of Regular Expressions (saying: \n in the Find text box stands for a line break that was inserted with the Shift+Enter key combination; \n in the Replace text box stands for a paragraph break that can be entered with the Enter or Return key), we need just these two steps with usual Find and Replace dialog and Regular Expressions checked:

Search for \n\n and replace with \n.
Search for \n and replace with (single space).
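The same two steps can be sketched in Python on a plain string. This is only an illustration under the assumption that every line break in the file is a single "\n"; since a plain string cannot tell a line break from a paragraph break, the sketch returns a list with one entry per resulting paragraph. The function name and sample text are invented.

```python
def rebuild_paragraphs(text: str) -> list:
    """Return one string per paragraph, with inner line breaks turned into spaces."""
    # Step 1: two successive line breaks delimit a paragraph.
    chunks = text.split("\n\n")
    # Step 2: every remaining line break becomes a single space.
    return [chunk.replace("\n", " ") for chunk in chunks]


sample = "line one\nline two\n\nsecond paragraph\ncontinues here"
for paragraph in rebuild_paragraphs(sample):
    print(paragraph)
```

The order matters in both versions: the double breaks have to be handled first, otherwise the second step would also swallow them.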

That’s it. If you had provided the sample right at the start (e.g. by sharing it on a public service like DropBox), the solution would have been really quick.

EDIT 2: the attached file, despite being named Bertrand.odt, is actually a plain text file that has LFs as line breaks. On opening it with LO, the breaks are treated according to LO’s formatted text settings, where you may choose whether LFs, CRLFs or CRs are treated as paragraph breaks. So, opening it so that LFs stay as line breaks allows for my solution above.

EDIT 3: here is the exact sequence to do with the file Toughening up.odt (that is actually another plain text file, again):

1. Rename it to Toughening up.txt
2. Open LibreOffice Start Center, File-Open, and choose Text - Choose Encoding (*.txt) in File Type drop-down list. Select and open the Toughening up.txt. In ASCII Filter Options dialog, choose CR & LF as Paragraph break
3. Search for ([.])\n and replace with $1\n.
   - Check if all generated paragraph breaks are correct.
   - You might want to extend the repertoire of characters in square brackets.
   - Another (equivalent) find-replace pair is (?<=[.])\n → \n - using Look-behind assertion.
4. Search for \n and replace with (single space).
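The two search-and-replace steps at the end of this sequence can be tried out in Python as follows. This is a sketch under assumptions: the text is a plain string with "\n" for every line break, and the Unicode PARAGRAPH SEPARATOR (U+2029) is used as a stand-in for LibreOffice's paragraph break, since a string has no separate paragraph mark. The function name and the `enders` parameter are hypothetical.

```python
import re

PARA = "\u2029"  # stand-in for a paragraph break; assumed absent from the text


def split_on_sentence_ends(text: str, enders: str = ".") -> list:
    """Return one string per paragraph, splitting only after sentence-ending characters."""
    # A line break right after a sentence-ending character becomes a
    # paragraph break (look-behind variant of the find string).
    pattern = r"(?<=[" + re.escape(enders) + r"])\n"
    marked = re.sub(pattern, PARA, text)
    # All remaining line breaks become a single space.
    joined = marked.replace("\n", " ")
    return joined.split(PARA)


sample = "Toughening up\nis hard.\nIt takes\ntime."
print(split_on_sentence_ends(sample))
```

Passing for example `enders=".?!"` corresponds to extending the repertoire of characters in the square brackets.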

bkpsusmitaa · May 16, 2017, 6:18am

You mean, use the ([a-zA-Z0-9]+)$ (note the parentheses), and replace code \1 (first back-reference plus a space) after I have installed AltSearch Extn?

Thanks in advance!

mikekaganski · May 16, 2017, 6:24am

Yes, these are for AltSearch. For usual Search and Replace, the backreference would be $1, not \1 (but it will not work in this case, as I mentioned).

bkpsusmitaa · May 16, 2017, 6:31am

Sorry, there is a problem. My ver is Version 4.0.3.3 (Build ID: 400m0(Build:3)) within Knoppix 7.2.0 and I am not in a position to upgrade the version or Knoppix (old laptop).
The extension is problematic, it loops endlessly, it seems, when I have ticked regex and clicked Find!

mikekaganski · May 16, 2017, 6:40am

It might be that it just takes a long time to finish (on a big document). Try to test it on a new document with a few lines to see whether it works or not.

If it doesn’t, then you could make a multi-step replacement. For example, you could replace all $s with e.g. {NEWLINE}, then search for those \{NEWLINE\}s that come after your pattern and replace them with spaces, and then replace the \{NEWLINE\}s that are left with \n.
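As a rough illustration of this multi-step idea, here is a Python sketch on a plain string. It assumes each paragraph end is a single "\n" and that the literal text {NEWLINE} does not occur in the document; the function name is invented, and in LibreOffice the same steps would be three separate Find & Replace runs.

```python
import re

MARK = "{NEWLINE}"  # visible marker; assumed not to occur in the document


def multi_step_replace(text: str) -> str:
    # Step 1: replace every paragraph end with the visible marker.
    marked = text.replace("\n", MARK)
    # Step 2: markers that directly follow a letter or digit become a space.
    marked = re.sub(r"(?<=[a-zA-Z0-9])\{NEWLINE\}", " ", marked)
    # Step 3: the markers that are left become paragraph breaks again.
    return marked.replace(MARK, "\n")


sample = "apple\nto\nmat.\nnext"
print(repr(multi_step_replace(sample)))
```

Because the marker is ordinary text, step 2 stays within one paragraph, which is the point of this workaround for the regular Find & Replace dialog.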

bkpsusmitaa · May 16, 2017, 6:45am

Okay, I will try, and then get back.

mikekaganski · May 16, 2017, 6:53am

I forgot to add that if you try to remove all paragraph breaks, you must remember that there’s a limit on the length of a single paragraph (for that old version, it’s 64K characters IIRC). So, for a big document, it might not work.

bkpsusmitaa · May 16, 2017, 8:20am

The extension is showing this error:
FindNextInBlock: Error 9: Index out of defined range. (line: 2561)
So do you wish me to try one paragraph at a time?
I would not like to make it complicated with double replacements. It is logically difficult to visualise.

bkpsusmitaa · May 16, 2017, 8:27am

I tried with ([a-zA-Z0-9]+)$ on a selection of text that contains the pattern multiple times.
I had the result:

Next occurrence of searched expression  "([a-zA-Z0-9]+)$"  not found. 
 
 Searching inside of selection has been finished.

mikekaganski · May 16, 2017, 8:29am

Well, I’m out of ideas then.

bkpsusmitaa · May 16, 2017, 10:00am

Thank you for your benevolent gesture of trying to help me.

Is there no simple macros? Not elaborate codes?

Regina · May 16, 2017, 5:58pm

So you do not mean the sign $ but the sign ¶?

If your task is to join single-line paragraphs, then you can do it with “AutoCorrect”.

First set the option: menu Tools > AutoCorrect Options, then tab Options. Find the entry Combine single line paragraphs if length greater than 50%. Check its checkbox in column [M]. Double-click on the term 50%, then you can edit it. A value of 1% should work for you. OK. OK.

Now select the lines you want to join. Then use menu Format (in case of LO4) > AutoCorrect > Apply.

bkpsusmitaa · May 17, 2017, 2:28am

Thanks Regina! I completed the work manually by find option which worked. But I will remember to check and report.