How do I turn this HTML into formatted text from which I can remove text using regex?

I have been trying to use regex in Find & Replace to remove all text between parentheses from this page, as explained in other answers. However, even after copying and pasting the page into a new document as RTF or in LibreOffice Writer format, the function does not work, possibly because the program keeps the tables.
Is there a way to turn the HTML into a document file with 2 columns of text in which regex can be used to remove all text between parentheses or, alternatively, at least all text in italics?

What version of LibreOffice are you using? It matters somewhat because the regex engine was changed in LibreOffice 4.0 (the new one is the same as the one in OpenOffice 3.4.1 and is based on an ICU library). There is a known issue with the new engine.

Doesn’t this regex work for you?

\([^)]+\)

It does work for me. I just copied and pasted the text from your site; it remained in the table, and the regex finds text enclosed in parentheses. Make sure you have “Regular expressions” checked and, just to be sure, click the “No formatting” button while the cursor is in the “Search for” box (the one you put the regex into).

What does this regex do?

\( matches one left parenthesis (the backslash is needed because a parenthesis without it has a special meaning)

[^)] matches any single character which is NOT a right parenthesis; ^ at the start of a character class stands for “not”

+ makes the preceding class match at least one time, but possibly more, so () won’t match, while (a) and (a2.3b) will. If you want to match empty parentheses as well, use * instead, which means zero or more

\) matches one right parenthesis, which needs to be escaped as well
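
If you want to check what the pattern will and won’t match before running Replace All, here is a minimal sketch in Python. It assumes Python’s re module as a stand-in for LibreOffice’s ICU-based engine, which should behave the same way for this simple pattern. The function name strip_parentheses and the sample text are made up for illustration.

```python
import re

# The pattern from the answer: "(" + one or more non-")" characters + ")".
PAREN = re.compile(r"\([^)]+\)")
# Variant with * instead of +, so empty parentheses "()" match as well.
PAREN_OR_EMPTY = re.compile(r"\([^)]*\)")


def strip_parentheses(text, allow_empty=False):
    """Remove every parenthesized run from text (hypothetical helper)."""
    pattern = PAREN_OR_EMPTY if allow_empty else PAREN
    return pattern.sub("", text)


sample = "casa (house) e () gatto (a2.3b)"
print(repr(strip_parentheses(sample)))                    # "()" is left alone
print(repr(strip_parentheses(sample, allow_empty=True)))  # "()" is removed too
```

Note that the spaces around each removed block stay in place, and that nested parentheses are not handled: in “(a (b) c)” the match stops at the first right parenthesis, so “ c)” is left behind.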