How do I remove some line/paragraph breaks from text using Find/Replace?

asked 2018-04-10 10:44:08 +0200

Der Alte Fritz

I have a Contents page from a book; however, the lines are broken up by paragraph breaks. How do I remove some of these non-printing characters so that each entry fits on one line?

I was thinking of using Find/Replace or the Find/Replace extension to find:

[:alpha:]$ (an alphabetic character with a paragraph break after it)

and then to remove JUST the paragraph break. Is this possible? Or is there an alternative?

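Example:

2 ПРИКАЗ НАРОДНОГО КОМИССАРИАТА ПО ВОЕННЫМ ДЕЛАМ ОБ ОРГШТАТНЫХ [$]

ИЗМЕНЕНИЯХ УПРАВЛЕНИЙ ВОЕННОГО ВЕДОМСТВА 6

should be turned into:

2. ПРИКАЗ НАРОДНОГО КОМИССАРИАТА ПО ВОЕННЫМ ДЕЛАМ ОБ ОРГШТАТНЫХ ИЗМЕНЕНИЯХ УПРАВЛЕНИЙ ВОЕННОГО ВЕДОМСТВА 6

(In English: "2. Order of the People's Commissariat for Military Affairs on organizational and staffing changes to the directorates of the military department ... 6"; [$] marks the paragraph break.)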
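What the question asks for, removing only the break that follows a letter, can be sketched in plain-text terms with a lookbehind: the letter is checked but not consumed, so only the break itself is replaced. This is a sketch using Python's re module, not Writer's Find & Replace (where, as the answers below note, a search cannot span a paragraph break); it assumes the Contents page is available as a string in which "\n" stands for a paragraph break, and the sample entries are hypothetical, shortened from the example above. The break is replaced with a space so the two halves of a title stay separate words.

```python
import re

# Hypothetical plain-text copy of the Contents page; "\n" stands for a paragraph break.
toc = (
    "2 ПРИКАЗ ОБ ОРГШТАТНЫХ\n"
    "ИЗМЕНЕНИЯХ ВЕДОМСТВА 6\n"
    "3 ДРУГОЙ\n"
    "ПУНКТ 9\n"
)

# [^\W\d_] approximates [:alpha:] (a letter in any script).
# The lookbehind only checks for the letter; it is not part of the match,
# so the substitution replaces just the "\n" and leaves the letter alone.
joined = re.sub(r"(?<=[^\W\d_])\n", " ", toc)
print(joined)
```

Lines ending in a page number are left alone because a digit is not a letter, so each entry ends up on its own line.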

Thanks


3 Answers


answered 2018-04-10 13:44:13 +0200

Der Alte Fritz

The main issue seems to be how to avoid overwriting the text you have found when dealing with paragraph breaks (the little blue ¶ symbol at the end of a line).

An expression such as \<[1-9][0-9]*\> should find a number. This works! \<[1-9][0-9]*\> $ should find a number with a paragraph break following it. This works too. (\<[1-9][0-9]*\>)\1 $, grouping the number with (......)\1, should let it be protected in the Replace box, but this expression does not work.

So, using the working expression, Find: \<[1-9][0-9]*\> $ Replace: 0$@@@@ should give me the original string (protected) with @@@@ added at the end. This does not work either.
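The back-reference behaviour can be checked outside Writer. The sketch below uses Python's re module rather than LibreOffice's ICU engine, but both treat \1 inside the search pattern the same way: it matches the text captured by group 1 a second time, so (\<[1-9][0-9]*\>)\1 looks for a number written twice in a row (such as 4545) rather than protecting the number. To reuse found text, you refer to it in the replacement instead; LibreOffice's help documents $1 for that in the Replace box (and $0 or & for the whole match), where Python writes \1 and \g<0>. The sample line is hypothetical, and Python's \b stands in for Writer's \< and \>.

```python
import re

# Hypothetical Contents line ending in a page number.
line = "2 ПРИКАЗ ОБ ОРГШТАТНЫХ ИЗМЕНЕНИЯХ 45"

# \1 inside the pattern means "group 1's text again", so this only
# matches a number that is immediately repeated, e.g. "4545".
doubled = re.compile(r"\b([1-9][0-9]*)\1\b")
print(doubled.search(line))                   # None
print(doubled.search("page 4545").group(1))   # 45

# To keep the found number, refer to the group in the replacement.
# Python spells group 1 as \1; LibreOffice's Replace box uses $1.
marked = re.sub(r"\b([1-9][0-9]*)$", r"\1@@@@", line)
print(marked)                                 # 2 ПРИКАЗ ОБ ОРГШТАТНЫХ ИЗМЕНЕНИЯХ 45@@@@
```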

I tried this alternative: [:digit:] $ finds a digit followed by a space and a paragraph break. This works and finds all the lines with numbers at the end.

Step 1: Find: [:digit:]$ (a digit with a paragraph break after it). Replace: $0@@@@ (keeps all the found text but adds the string @@@@ after the digit). Result: 45@@@@[paragraph break]

Step 2: Find: $ (all paragraph breaks). Replace: a single space. Result: all paragraph breaks are removed, giving one long text.

Step 3: Find: @@@@. Replace: \n (in the Replace box, \n inserts a paragraph break). Result: 45[paragraph break]; a break is restored wherever the marker @@@@ was.
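The three-step marker workaround can be modelled on plain text to see why it works. This is a sketch, assuming the Contents page is available as a string in which "\n" stands for each paragraph break; rebuild_toc is a hypothetical helper name, the sample entries are hypothetical (shortened from the question's example), and the final strip() is an extra tidy-up that is not part of the three steps.

```python
import re

def rebuild_toc(text: str, marker: str = "@@@@") -> list[str]:
    """Hypothetical helper: join broken TOC entries using the marker trick."""
    # Step 1 (Writer: Find [:digit:]$, Replace $0@@@@): append the marker to
    # every digit that ends a paragraph. \g<0> is Python's name for $0.
    step1 = re.sub(r"[0-9]$", r"\g<0>" + marker, text, flags=re.MULTILINE)
    # Step 2 (Writer: Find $, Replace with a space): remove every break.
    step2 = step1.replace("\n", " ")
    # Step 3 (Writer: Find @@@@, Replace \n): put a break back at each marker.
    step3 = step2.replace(marker, "\n")
    # Extra tidy-up, not one of the three steps: drop the space that step 2
    # left at the start of each entry, and any empty trailing piece.
    return [entry.strip() for entry in step3.split("\n") if entry.strip()]


toc = (
    "2 ПРИКАЗ ОБ ОРГШТАТНЫХ\n"
    "ИЗМЕНЕНИЯХ ВЕДОМСТВА 6\n"
    "3 ДРУГОЙ\n"
    "ПУНКТ 9\n"
)
for entry in rebuild_toc(toc):
    print(entry)
```

The marker only has to be a string that never occurs in the real text; the trick relies on every entry ending in a page number and no title line ending in a digit.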

So mission accomplished, but it is slightly disappointing that the ()\1 grouping does not seem to work in the Find box.


Comments

It doesn't work because regex searches do not work across paragraph breaks with LibO's regex engine. Quite frustrating.

David (2018-04-10 20:36:33 +0200)

I am glad that you say that, because I thought it was just me. It does not seem to work with the Alternative Find and Replace for Writer extension either, which is a bore. However, the little workaround I posted above does work, so I got this job done in a number of stages, when in reality it should be able to do this sort of thing in one go.

Der Alte Fritz (2018-04-12 17:43:37 +0200)

@Der Alte Fritz - Yeah, it's a drag. This has come up before, and see also the bugs listed against regex and paragraphs. This has been a pain for a long time, unfortunately. There was a change in regex engines at some point (I believe -- I'm sure I read about that somewhere), but it didn't help with this. :(

David (2018-04-12 20:53:32 +0200)

answered 2018-04-10 13:57:28 +0200

gabix

Comments

Sorry, this is not relevant since I am not working with an e-book, but thank you for the suggestion all the same.

Der Alte Fritz (2018-04-12 17:40:49 +0200)

answered 2018-04-10 13:32:46 +0200

Lupp

updated 2018-04-12 21:44:02 +0200 by David

You may adapt the code I posted in this recent thread to your needs.

Edit 1:
If the task can be done interactively with the help of a specialised Sub, it should be simple:
-1- Use Find All to select the characters next to the paragraph breaks you want to delete, based on a suitable regular expression such as [:alpha:]$ in the given case.
-2- With the selection made this way, run the following Sub:

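Sub removeParagraphBreaksFound()
    ' Deletes the character (normally the paragraph break) that directly
    ' follows each range selected by Find All.
    Dim theSel As Object, oneRange As Object, theCursor As Object
    Dim j As Long
    theSel = ThisComponent.CurrentSelection
    ' Find All produces a multi-range text selection; do nothing otherwise
    If Not theSel.supportsService("com.sun.star.text.TextRanges") Then Exit Sub
    For j = 0 To theSel.getCount() - 1
        oneRange = theSel.getByIndex(j)
        ' Use the range's own Text so matches in tables or frames work too;
        ' the cursor starts collapsed at the end of the found text
        theCursor = oneRange.getText().createTextCursorByRange(oneRange.getEnd())
        ' Select the next character (the paragraph break) and delete it;
        ' assign " " instead of "" if the joined lines need a separating space
        theCursor.goRight(1, True)
        theCursor.setString("")
    Next j
End Sub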

-3- Repeat -1- and -2- with different regular expressions if needed.
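The macro works in two phases: Find All first fixes where the matches are, then the Sub deletes the character after each one. The Python sketch below models that on a plain string, assuming "\n" stands for a paragraph break and [^\W\d_] approximates [:alpha:]; remove_breaks_after_matches is a hypothetical name, not part of any LibreOffice API. Because plain string offsets shift after a deletion, the sketch works from the last match backwards; Writer's text ranges are anchored in the document and presumably follow such edits, which is why the Basic Sub can simply loop forwards.

```python
import re

def remove_breaks_after_matches(text: str, pattern: str) -> str:
    # Phase 1, like Find All: record where every match ends.
    # With re.MULTILINE, a pattern ending in $ ends right before a "\n".
    ends = [m.end() for m in re.finditer(pattern, text, flags=re.MULTILINE)]
    chars = list(text)
    # Phase 2, like the Sub's loop: delete the character after each match
    # if it is a break. Going backwards keeps the earlier offsets valid.
    for end in reversed(ends):
        if end < len(chars) and chars[end] == "\n":
            del chars[end]
    return "".join(chars)


# The space before $ matches titles that end in a space before the break,
# as in the question's example; the space then separates the joined words.
print(remove_breaks_after_matches("ОРГШТАТНЫХ \nИЗМЕНЕНИЯХ 6\n", r"[^\W\d_] $"))
```

With a pattern such as [^\W\d_]$ (no trailing space) the halves are joined with nothing in between, which mirrors the macro's String = "".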

Test the suggestion thoroughly. There have been many requests of this kind, but I have not seen this answer (or a similar one) yet, so I doubt it can really be that simple.

I did not explore what complications may occur if formatted text portions are split. Expect errors!


Comments

Excellent, I will have a look at this.

Der Alte Fritz (2018-04-12 17:41:12 +0200)

Stats

Asked: 2018-04-10 10:44:08 +0200

Seen: 382 times

Last updated: Apr 12 '18