Keep what's there without the non-printing chars [closed]

asked 2017-05-15 20:07:36 +0100

bkpsusmitaa

updated 2017-05-17 11:30:50 +0100

I have strings matched by the pattern [a-zA-Z0-9]$, e.g., apple$, to$, etc.

I want to keep the words, with a space character in place of the paragraph mark $.

apple$ would become "apple ",

to$ would become "to ",

mat$ would become "mat ",

and so on; i.e., the result would be the words without the non-printing paragraph mark that was previously at their end.

How can I do this?

I made this edit for Regina, who had difficulty understanding my requirements. Thanks, Regina, for pointing that out!

This is possible by deleting one character from the end of each word, but is there a better way?


Regina, please check with this text, Russell's In Praise of Idleness:

Attachment: Bertrand.odt


I could have done it by changing the empty paragraph marks ^$ into something like !!!!, then converting the remaining single paragraph marks into one blank space, and then converting the !!!! back into single paragraph marks. But is there a more elegant way?
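The three-step placeholder workaround described above can be sketched outside LibreOffice as a minimal Python function. This is only a model under stated assumptions: each paragraph mark is represented as "\n", an empty paragraph therefore appears as "\n\n", and the placeholder !!!! is assumed never to occur in the real text.

```python
PLACEHOLDER = "!!!!"  # assumption: this token never occurs in the real text

def join_wrapped_lines(text: str) -> str:
    """Join hard-wrapped lines while keeping blank-line paragraph separators.

    Model (assumption): a paragraph mark is '\n'; an empty paragraph is '\n\n'.
    """
    if PLACEHOLDER in text:
        raise ValueError("placeholder already present in text")
    # Step 1: protect the real paragraph boundaries (the empty paragraphs).
    text = text.replace("\n\n", PLACEHOLDER)
    # Step 2: every remaining single paragraph mark becomes one blank space.
    text = text.replace("\n", " ")
    # Step 3: convert the placeholders back into single paragraph marks.
    return text.replace(PLACEHOLDER, "\n")

print(join_wrapped_lines("The quick\nbrown fox\n\njumps over\nthe dog"))
```

The order of the steps matters: if single marks were converted first, the double marks would be lost before they could be protected.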

For Mike:

Please don't use confusing phrases. The Return/Enter key creates the paragraph break (shown as ¶ and searched for by $), whereas Shift+Enter creates a line break, which is displayed by a different formatting mark. I had already posted the clerical solution above, but I needed an elegant one.

Please download the file and activate the non-printing characters button and then peruse the file. Remember that the clerical solution is already posted above.

Microsoft Word had one, called Find What Text, but I stopped using MS Word ten years ago!

A sample was not required. I posted a summary of what I actually needed; your and Regina's questions made me extend my question with explanations.

Mike, your solution is incorrect. Look at my solution above! Would you like to edit your post?

Mike, again, have you checked your solution with the file? Does it work? Now I am giving you the real task, with a clue to where the paragraphs were, here:

Attachment: Toughening up.odt

This was the reason for my first, unedited post. How do I remove those paragraph marks when there are no double paragraph marks? Please note that this is just a sample file, and it is indeed a plain text file.

[Response to Mike's 3rd edit]

No, Mike, you can't rename the file: this is just a sample file (the real file has already been hand-edited; please refer to my reply to Regina). Keep it as an ODT file, and then follow your steps.

But you have my constant thanks for your efforts.

[Response to Mike] Mike, you said:

You just don't want to listen. I still don't know what actual format your real file is. If you are just clueless about formats, or deliberately post wrong-named files to confuse those who try to help. However, you don't take time to stop and think. Even if you just have an ODF...

My answer was a response to only these lines:

here is the exact sequence to do with the file Toughening up.odt (that is actually another plain text file, again ...

Closed for the following reason: too subjective and argumentative, by bkpsusmitaa
Close date: 2017-05-17 11:31:47

Comments

It is not clear what you want to do. Please give an example of the current text and of how you want it to look afterwards.

Regina (2017-05-15 20:53:42 +0100)

The standard 'F&R' does not accept the search expression $$. Only the single $ will work for this purpose. You may, of course, use an intermediary placeholder to prepare for the second step.

Lupp (2017-05-17 08:26:42 +0100)

No, I don't want to edit my post. In it, I tried to address your initial requirement, and in the edit, I addressed the attachment you added. So there are no "confusing phrases": the file definitely contains line breaks, not paragraph breaks, and my edit fully covers the file's case. If the file is irrelevant to the question, then it's the file that confuses, not my phrases.

Mike Kaganski (2017-05-17 08:49:19 +0100)

Ah! I see. The file isn't ODF; actually, it's plain text that contains LFs, and they would be treated differently depending on LO settings (and they in turn depend on OS by default); that's why it opens as a document with line breaks. So, please don't assign arbitrary extensions to files regardless of their format; this specific file needs a TXT extension.

Mike Kaganski (2017-05-17 08:55:34 +0100)

For this file, if you open it as formatted text and set LF to be the paragraph break, I'd stick to your solution, because it's really the most elegant one: replace double breaks with something unique, then replace the rest with spaces, then replace the unique markers back with paragraph breaks. But opening the file so that LFs are line breaks would make my solution the better (more straightforward) one, and it only requires choosing CRLF for paragraphs in the formatted text filter.

Mike Kaganski (2017-05-17 08:58:50 +0100)

You just don't want to listen.

I still don't know what actual format your real file is. If you are just clueless about formats, or deliberately post wrong-named files to confuse those who try to help.

However, you don't take time to stop and think. Even if you just have an ODF, and it already has its paragraph breaks, you might use a preliminary step to convert $ to line break (see here) and follow.

Mike Kaganski (2017-05-17 10:08:17 +0100)

3 Answers

answered 2017-05-16 07:44:04 +0100

updated 2017-05-17 09:35:42 +0100

The "non-printing paragraph mark" is a paragraph break. So if you want to get rid of this mark, you effectively remove the paragraph break at that place. I suppose that's what you actually want to do; e.g., this might happen if you import some "formatted" plain text, or a scan result, and you assume that if there's no period before a paragraph break, then that paragraph break should be eliminated.

LO's own Find and Replace dialog is unable to remove paragraph breaks in Regex mode (it operates only within the boundaries of one paragraph; the only explicit exception is removing empty paragraphs using ^$). So you might consider the AltSearch extension for that.

The proper find string for that task would be ([a-zA-Z0-9]+)$ (note the parentheses), and the replacement would be \1 followed by a space (the first back-reference plus a space).
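As an illustration of what that find/replace pair is meant to do, here is a minimal Python sketch, assuming the paragraph break is modeled as "\n". One difference from the AltSearch expression is worth noting: in Python, $ is a zero-width anchor and would leave the break in place, so the sketch matches the "\n" explicitly in order to consume and replace it. The function name is hypothetical.

```python
import re

# Letters or digits immediately followed by a break. The break is matched
# explicitly so that it is consumed and replaced (Python's '$' is zero-width).
WORD_AT_BREAK = re.compile(r"([a-zA-Z0-9]+)\n")

def join_after_word(text: str) -> str:
    # \1 re-inserts the captured word, followed by a single space
    return WORD_AT_BREAK.sub(r"\1 ", text)

print(join_after_word("apple\nto\nmat.\nEnd"))
```

Because a period is not in [a-zA-Z0-9], the break after "mat." survives, which matches the assumption that a break preceded by a period is a real paragraph end.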

EDIT (2017-05-17): the file that you finally attached is a clear demonstration of why samples must be provided to make clear what the problem is.

The file has only one paragraph in it, and multiple line breaks (not paragraph breaks!). Each "wanna-be paragraph" is delimited by two successive line breaks; all other line breaks should be converted to spaces. So, taking into account LibreOffice's List of Regular Expressions (which says: \n in the Find text box stands for a line break that was inserted with the Shift+Enter key combination; \n in the Replace text box stands for a paragraph break that can be entered with the Enter or Return key), we need just these two steps in the usual Find and Replace dialog, with Regular Expressions checked:

  1. Search for \n\n and replace with \n.
  2. Search for \n and replace with   (single space).
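The two steps above can be modeled in Python as a rough sketch. Because Python cannot tell a line break from a paragraph break, the sketch makes an assumption: "\n" stands for the line break (what the Find field calls \n), and the Unicode PARAGRAPH SEPARATOR U+2029 is used as a hypothetical stand-in for the paragraph break that \n inserts in the Replace field.

```python
LINE_BREAK = "\n"           # what the Find field calls \n (Shift+Enter)
PARAGRAPH_BREAK = "\u2029"  # hypothetical stand-in for a real paragraph break

def rebuild_paragraphs(text: str) -> str:
    # Step 1: two successive line breaks delimit a "wanna-be paragraph"
    text = text.replace(LINE_BREAK * 2, PARAGRAPH_BREAK)
    # Step 2: every remaining line break was only wrapping; make it a space
    return text.replace(LINE_BREAK, " ")

result = rebuild_paragraphs("one\ntwo\n\nthree\nfour")
print(result.split(PARAGRAPH_BREAK))  # ['one two', 'three four']
```

As in the dialog, the double breaks must be handled before the single ones; reversing the steps would turn every break into a space.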

That's it. If you had provided the sample right at the start (e.g., by sharing it via a public service like Dropbox), the solution would have been really quick.

EDIT 2: the attached file, despite being named Bertrand.odt, is actually a plain text file that uses LFs as line breaks. On opening in LO, the breaks are treated according to LO's formatted text settings, where you may choose whether LFs, CRLFs or CRs are treated as paragraph breaks. So opening it in a way that keeps the LFs as line breaks makes my solution above applicable.
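Why the choice of paragraph break matters can be shown with a deliberately simplified, hypothetical Python model of the import filter (not LibreOffice's actual code): the raw text is split into paragraphs on the chosen sequence, and any other newline characters stay inside a paragraph, where they would appear as line breaks.

```python
def import_plain_text(raw: str, paragraph_break: str) -> list[str]:
    """Hypothetical model: split raw text into paragraphs on the chosen sequence.

    Newlines that do not match the chosen sequence remain inside a paragraph,
    where they would be shown as line breaks.
    """
    return raw.split(paragraph_break)

raw = "first line\nsecond line\n\nnext block"  # LF-only text, like the sample
print(len(import_plain_text(raw, "\n")))    # LF as paragraph break: 4 paragraphs
print(len(import_plain_text(raw, "\r\n")))  # CR & LF chosen: 1 paragraph
```

With CR & LF chosen, an LF-only file comes in as one paragraph full of line breaks, which is the situation the two-step \n replacement expects.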

EDIT 3: here is the exact sequence to do with the file Toughening up.odt (that is actually another plain text file, again):

  1. Rename it to Toughening up.txt
  2. Open LibreOffice Start Center, File-Open, and choose Text - Choose Encoding (*.txt) in the File Type drop-down list. Select and open Toughening up.txt. In the ASCII Filter Options dialog, choose CR & LF as the Paragraph break.
  3. Search for ([.])\n and replace with $1\n.
    • Check if all generated paragraph breaks are correct.
    • You might want to extend the repertoire of characters in square brackets.
    • Another (equivalent) find-replace pair is (?<=[.])\n and \n, using a look-behind assertion.
  4. Search for \n and replace with   (single space).
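Steps 3 and 4 can be sketched in Python under the same assumptions as before: "\n" models the line break, and U+2029 is a hypothetical stand-in for the paragraph break that \n inserts in the Replace field. The sketch shows that the capture-group form and the look-behind form give the same result; note that Python writes the back-reference as \1 where LibreOffice's Replace field uses $1.

```python
import re

LINE_BREAK = "\n"           # what the Find field calls \n (Shift+Enter)
PARAGRAPH_BREAK = "\u2029"  # hypothetical stand-in for a real paragraph break

def split_after_periods(text: str, use_lookbehind: bool = False) -> str:
    if use_lookbehind:
        # (?<=[.])\n: match only the line break; the period is merely asserted
        text = re.sub(r"(?<=[.])\n", PARAGRAPH_BREAK, text)
    else:
        # ([.])\n: capture the period and write it back ($1 in LO, \1 in Python)
        text = re.sub(r"([.])\n", r"\1" + PARAGRAPH_BREAK, text)
    # Step 4: every remaining line break was only wrapping, so it becomes a space
    return text.replace(LINE_BREAK, " ")

sample = "It was late.\nWe walked\nhome."
print(split_after_periods(sample).split(PARAGRAPH_BREAK))
```

A line that happens to end in an abbreviation such as "Dr." would also become a paragraph end here, which is one reason to check the generated paragraph breaks as step 3 advises.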

Comments

You mean I should use ([a-zA-Z0-9]+)$ (note the parentheses) as the search and \1 (first back-reference plus a space) as the replacement, after I have installed the AltSearch extension?

Thanks in advance!

bkpsusmitaa (2017-05-16 08:18:53 +0100)

Yes, these are for AltSearch. For the usual Find and Replace, the back-reference would be $1, not \1 (but it will not work in this case, as I mentioned).

Mike Kaganski (2017-05-16 08:24:13 +0100)

Sorry, there is a problem. My version is 4.0.3.3 (Build ID: 400m0(Build:3)) within Knoppix 7.2.0, and I am not in a position to upgrade LibreOffice or Knoppix (old laptop). The extension is problematic: it seems to loop endlessly when I tick regex and click Find!

bkpsusmitaa (2017-05-16 08:31:00 +0100)

It might be that it just takes a long time to finish (on a big document). Try testing it on a new document with a few lines to see whether it works.

If it doesn't, then you could make a multi-step replacement. E.g., you could try to replace all $s with e.g. {NEWLINE}, then search for those \{NEWLINE\}s that are after your pattern and replace them with spaces, and then replace \{NEWLINE\}s left with \n.

Mike Kaganski (2017-05-16 08:40:36 +0100)
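The multi-step {NEWLINE} replacement suggested in the comment above can be sketched in Python, assuming paragraph breaks are modeled as "\n" and that the token {NEWLINE} never occurs in the real text. The "pattern" the placeholder must follow is taken here to be a letter or digit, as in the original question; that choice is an assumption for illustration.

```python
import re

PLACEHOLDER = "{NEWLINE}"  # token from the comment; assumed absent from the text

def join_via_placeholder(text: str) -> str:
    # Step 1: turn every paragraph break into the visible placeholder
    text = text.replace("\n", PLACEHOLDER)
    # Step 2: placeholders right after a letter or digit become a space.
    # The braces are escaped (\{ \}) as in the comment's search string.
    text = re.sub(r"([a-zA-Z0-9])\{NEWLINE\}", r"\1 ", text)
    # Step 3: every remaining placeholder goes back to a paragraph break
    return text.replace(PLACEHOLDER, "\n")

print(join_via_placeholder("apple\nto\nmat.\nEnd"))
```

Working on a visible token keeps each step inside a single paragraph, which is what the standard dialog can handle.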

Okay, I will try, and then get back.

bkpsusmitaa (2017-05-16 08:45:15 +0100)

I forgot to add that if you try to remove all paragraph breaks, you must remember that there's a limit on the length of one paragraph (for that old version, it's 64K characters, IIRC). So for a big document, it might not work.

Mike Kaganski (2017-05-16 08:53:53 +0100)

The extension is showing this error: FindNextInBlock: Error 9: Index out of defined range. (line: 2561) So do you want me to try one paragraph at a time? I would not like to make it complicated with double replacements; it is logically difficult to visualise.

bkpsusmitaa (2017-05-16 10:20:15 +0100)

I tried ([a-zA-Z0-9]+)$ on a selection of text containing the pattern multiple times. I got this result:

Next occurrence of searched expression  "([a-zA-Z0-9]+)$"  not found. 

 Searching inside of selection has been finished.
bkpsusmitaa (2017-05-16 10:27:17 +0100)

Well, I'm out of ideas then.

Mike Kaganski (2017-05-16 10:29:15 +0100)

Thank you for your benevolent gesture of trying to help me.

Is there no simple macro? Nothing elaborate?

bkpsusmitaa (2017-05-16 12:00:01 +0100)

answered 2017-05-16 19:58:15 +0100

Regina

So you do not mean the sign $ but the sign ¶?

If your task is to join single-line paragraphs, then you can do it with "AutoCorrect".

First set the option: menu Tools > AutoCorrect Options, then the Options tab. Find the entry Combine single line paragraphs if length greater than 50%. Check its checkbox in column [M]. Double-click the term 50% to edit it. A value of 1% should work for you. Click OK, then OK again.

Now select the lines you want to join. Then choose menu Format (in the case of LO 4) > AutoCorrect > Apply.

Comments

Thanks, Regina! I completed the work manually with the Find option, which worked. But I will remember to check your method and report back.

bkpsusmitaa (2017-05-17 04:28:22 +0100)

My solution assumes that there are paragraph breaks, but the text you finally provided contains line breaks, not paragraph breaks. So my solution will not work for your text. Your task would have been much clearer if you had provided an example file from the beginning.

Regina (2017-05-17 09:49:54 +0100)

answered 2017-05-17 08:22:03 +0100

Lupp

The placeholder for 'EndOfParagraph' ($) placed as the only character in 'Search For:' and a space in 'Replace With:', with RegEx enabled, will do the first step. This will work not only for empty paragraphs or single-line paragraphs. If additional replacements are wanted, a second step is necessary.

You may also check whether the extension 'AltSearch' (Alternative Search & Replace; installation from .oxt needed) has more capabilities. It is not very efficient with this kind of task.

Comments

Lupp, I had already posted the inelegant solution. Perhaps you did not read this thread carefully. Unlike you! Please take a look at an earlier thread: using regular expression to clean up texts?

bkpsusmitaa (2017-05-17 08:45:40 +0100)

Quoting @bkpsusmitaa: "Unlike you!"
I don't understand.
Well, I read the posts, and I only added an answer because I got the impression that posters, in particular yourself, mixed up the $ as used in 'Search For:', the pilcrow glyph used as a 'Formatting Aid' in 'Writer' if enabled, and the paragraph break itself. My answer and my comment on your question are correct in this respect.
I may only add that a search for a hard line break (which does not create a new paragraph) is done with \n.

Lupp (2017-05-17 10:19:54 +0100)

@bkpsusmitaa again:
Before a later edit of your question, you had written "I could have done it by changing double para marks $$ into something like !!!!". My comment was meant to help with understanding in this respect: $$ is not searchable after all.
(Please note that \n has a different meaning in 'Replace With:'.)
Feel free to downvote my answer.

Lupp (2017-05-17 10:29:01 +0100)

Stats

Asked: 2017-05-15 20:07:36 +0100

Seen: 174 times

Last updated: May 17 '17