
How do I turn this HTML into formatted text from which I can remove text using regex? [closed]

asked 2013-04-22 21:49:35 +0200

biofaust

I have been trying to use regex in Find & Replace to remove all text between parentheses from this page, as explained in other answers. However, even when I copy and paste the page into a new document as RTF or LibreOffice Writer format, the function does not work, possibly because the program keeps the tables. Is there a way to turn the HTML into a document with two columns of text in which regex can be used to remove all text between parentheses or, alternatively, at least all text in italics?


Closed for the following reason: question is not relevant or outdated (closed by Alex Kemp)
close date 2015-10-30 19:40:14.602856

Comments

What version of LibreOffice are you using? It matters somewhat because the regex engine was changed in LibreOffice 4.0 (the new one is the same as the one in OpenOffice 3.4.1 and is based on an ICU library). There is a known issue with the new engine.

CyanCG (2013-04-23 01:17:51 +0200)

1 Answer


answered 2013-04-22 23:51:24 +0200

mahfiaz

Doesn't this regex work for you?

\([^)]+\)

It does work for me. I just copied and pasted the text from your site; it remained in the table, and the regex finds the text enclosed in parentheses. Make sure you have "Regular expressions" checked and, just to be sure, click the "No formatting" button while the cursor is in the "Search for" box (the one you put the regex into).

What does this regex do?

\( matches one left parenthesis (the backslash is needed because a parenthesis without it has a special meaning)

[^)] is any character that is NOT a right parenthesis; inside the brackets, a leading ^ stands for "not"

+ makes the preceding class match at least once, and possibly more, so () won't match, while (a) and (a2.3b) will. If you want to match empty parentheses as well, use * instead, which means zero or more

\) stands for a right parenthesis, which needs to be escaped as well
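
To try the pattern outside of Writer, here is a minimal sketch in Python. It uses Python's re module, not the ICU-based engine that LibreOffice uses, but this particular pattern only relies on basic syntax and should behave the same way in both. The helper name strip_parentheses and the sample string are made up for illustration.

```python
import re

# The pattern from the answer: a literal "(", one or more characters that
# are not ")", then a literal ")".
PAREN = re.compile(r"\([^)]+\)")

# Variant with * instead of +, which also matches empty parentheses "()".
PAREN_OR_EMPTY = re.compile(r"\([^)]*\)")


def strip_parentheses(text, pattern=PAREN):
    """Return text with every match of pattern removed (hypothetical helper)."""
    return pattern.sub("", text)


# Made-up sample text, not taken from the page in the question.
sample = "alpha (a) beta (a2.3b) gamma () delta"
print(strip_parentheses(sample))                  # "()" is left in place
print(strip_parentheses(sample, PAREN_OR_EMPTY))  # "()" is removed too
```

Two things to keep in mind: removing a match leaves the spaces around it untouched, so you may end up with double spaces, and the pattern does not handle nested parentheses, because it stops at the first right parenthesis it meets.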

