Regex for replace with new line

I am trying to change a bunch of unordered lists in HTML in a column to plain lines. I have the following search and replace regex:

Search: <li>(.+?)</li>

Replace: $1\n

However it just puts “\n” instead of creating a new line.

I can’t find any reference in List of Regular Expressions - LibreOffice Help it says that it should put in a new paragraph, (which I wouldn’t want anyway), but it just puts in the actual characters.

It’s Calc; citation:

Using ‘\n’ in the ‘Replace with’ box will replace with the literal characters ‘\n’, not a hard line break.

Source: Documentation/How Tos/Regular Expressions in Calc - Apache OpenOffice Wiki

Is using Writer an option for you? There you may use a more comfortable FIND&REPLACE or probably the AltSearch.oxt extension.

For your purpose it could accidentially do the tag <br> instead of a new paragraph which seems to be meant by \n. Be aware that paragraphs and manual line breaks have tags like <p> text </p> or simple <br>.

In some cases the tags may differ so have a general look on the entire HTML source code.

Oh, that’s stupid, what a weird decision to take! Thanks, yeah, I wrote the code for import via csv into a WordPress website I built, so it’s literally just the ul and li tags. Now somebody wants to create a text document, so I figured I’d replace the code with new lines, so they can just copy and paste from each cell. I think I’ll export as csv and do the regex in BBEdit.

That’s not some “weird decision”, that’s the difference between Writer (where paragraphs exist), and Calc (where no paragraphs exist at all - only line breaks are possible inside cells, but the \n replacement is not about inserting line breaks in any way).

Since when was \n not a new line in regex? That’s news to me.

Since forever in OOo/LO. And that’s what written in the help page that you cited :wink:

Or … what specifically did you mean by “new line” term? line break? paragraph? let’s talk in this app’s terminology - Writer/Calc do not work it terms of plain text (where the difference may depend on flawor).

Literally a line break. I want to get rid of everything between li tags and make each one a new line within its cell.

In LibreOffice, \n in replacement field is paragraph break. And when there’s no notion of paragraphs, that is not applicable. That is consistent - however unexpected. See tdf#43107 (that I just have fixed).

The different issue is that there’s no way to insert line break in F&R dialog - neither in Writer, nor in Calc. That’s tdf#58744.

Thanks for clarifying. As you say, very unexpected. I’m just trying to work out some sort of way around this, but not coming up with too many answers. If I use BBEdit and reimport, it imports each new line as a new record.

while OP complicated this with the parsing of HTML tags. the Question Title: “Regex for replace with new line”, remains apparently impossible without resorting to a function or macro. I am baffled as to why “\n” means one thing in Find, but something different in Replace.
My use is pasting a range of cells into Calc, then doing find and replace to swap a placeholder character for new line-feed(s) within each cell (same as CTL+Enter from the keyboard).
My source is a table displayed in a browser, so Tab separated columns, and Rows delimited by LF.
Since I control the source, I simply output a tilde character where I later intend a LF, then do the find replace in Calc. Or, so I had hoped. I’m now off to write a Macro to do this. Seems Find and Replace should be capable of this.

Using the REGEX() function you can compose your replacement strings with CHAR(10) (instead of the messed-up \n) to get the desired result.
However, I don’t think the RegEx given by the OQer will work as expected.
Generally you cannot reliably analyze HTML / XML by RegEx. That’s not a bug but a fundamental fact concerning the “language” of such texts.
If the texts to be reworked are well-formed HTML / XML, there may be a better way based on the functions FILTERXML() and the XPATH syntax. I cannot tell more because I rarely used it myself.
Also see this attached example.

In the end, I used the original regex. I then put the following formula in column ‘I2’ where the original column starts ‘F2’: =SUBSTITUTE(F2,"\n",CHAR(10))

Dragged it down the column for the required number of rows, converted formula to value, then copied and pasted to over the original column.

1 Like

Of course, you need to assure that \n cannot occur as a part of the content of any line.
Using the REGEX() function as shown in the attachment to my answer, you can do it without an intermediary replacement.

Thanks! This helped!

just made an account to say thanks and that this will improve visibility to this solution