Regular expression replacement affects the formatting

I use regular expressions to convert straight quotation marks to the correct ones, i.e. "…" to “…”.
My LibreOffice version is 4.0.0.3 on Mac OS X 10.6.8.

To tackle the opening quotation mark, I use the following regex in Find and Replace:

"([:alnum:])

Replace with

“$1
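For reference, the same search and replace can be sketched outside LibreOffice on plain text. This is only an illustration of the pattern, not of LibreOffice's engine: Python's re module does not support [:alnum:], so [^\W_] is used here as an assumed stand-in, and the function name is made up for the example. Plain text carries no italics, so this sketch cannot reproduce the formatting problem described here.

```python
import re

# Assumption: [^\W_] (any letter or digit) stands in for [:alnum:],
# which Python's re module does not support.
OPENING_QUOTE = re.compile(r'"([^\W_])')


def curl_opening_quotes(text: str) -> str:
    """Turn a straight double quote directly before a letter or digit into “."""
    # \1 re-inserts the captured character, like $1 in Find and Replace.
    return OPENING_QUOTE.sub(r"“\1", text)


if __name__ == "__main__":
    print(curl_opening_quotes('"L’homme oisif tue le temps," he said.'))
```

Only quotes followed directly by a letter or digit are changed, so a closing quote before a space or punctuation mark is left alone.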

However, I found that although it works, it affects the formatting of the text, e.g. italics. Here is an example:

Even when you are filling the war
chest at the edge of the pavement it
is not impossible, I find, to spare a
little pity for those who pass as well
as for those who are passed by.
L’homme oisif tue le temps; le temps
tue l’homme oisif,

After running the above regex, one can see that the italic ‘L’ is now a regular ‘L’. Please see the attached screenshot. Is there any way this can be avoided?

[Screenshot: the L is not in italics]

Are you using LibreOffice’s built-in Find and Replace dialog, or the “Alternative Find and Replace Dialog” extension? I once had similar problems with the latter. If you are using the built-in Find and Replace, then perhaps the problem is linked to LibO 4.0’s new regexp engine. I’ll investigate, brb

@CyanCG it is the built-in version, I haven’t installed any F&R related plugin.

I just tried the exact same regexp replacement in OpenOffice 3.4.1, which has the same new regexp engine as LibO 4.0. I got the same strange result. In fact, italics were removed from the whole quotation. There are known bugs in this new ICU engine; the AOO developers are aware of them:

The bug is marked as “Resolved Fixed” with a target milestone of AOO 4.0, which means it will take some time for the fix to actually appear in a release version. Now I hope that the fix will be in the next feature release of LibO.

On a side note, the new regexp engine is, as far as I know, not yet properly documented in the official AOO and LibO documentation. Here is the ICU regexp guide. This guide unfortunately has three major issues:

  • It is heavily developer-oriented, though the metacharacters and operators lists are useful for users;
  • It is incomplete and has many spelling/language errors;
  • The HTML document itself is the worst kind of non-standard, invalid, <font> tag-laden atrocity I have ever come across on the Web. It is ridiculous. How can C++/Java developers fail so badly at producing adequate and well-formed documentation?

So for now, refer to the user-destined lists and tables in the guide, but do not even try to make sense of the HTML. Eventually, the AOO and LibO documentation should have an updated, sensible and clean guide that fully documents this new engine. The ICU project should also clean up their mess, but that is another story.

Edit: This just came back to me and it is very important for the current case. Regular expressions are not needed to replace quotes and apostrophes! The AutoCorrect function exists for exactly that kind of scenario. “Format > AutoCorrect > AutoCorrect Options…” allows you to define replacements for various characters, including quotation marks, and decide what they will be replaced with. The correction can be applied automatically during typing, but can also be applied afterwards with “Format > AutoCorrect > Apply”. The AutoCorrect Options dialog has many options; adding the desired definitions for all replacements makes the tool very powerful and versatile.

That’s disappointing. I need to use this function every day, so I guess I’d better search for a new platform then. Thanks for the reply anyway!

It would be very sad if this made you switch to another office suite! Perhaps you could try a simpler search, such as replacing a double quote that occurs after a space with a space and a double opening curly quote. I use regex a lot myself; only a few specific operations cause problems.
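The simpler replacement suggested here (a double quote after a space becomes a space plus an opening curly quote) might look like this on plain text. It is a rough sketch, not a LibreOffice recipe: the function name is hypothetical, and also matching a quote at the very start of the text is an added assumption that the suggestion itself does not mention.

```python
import re

# A straight double quote at the start of the text or right after whitespace.
# Assumption: the start-of-text case is added here for convenience.
QUOTE_AFTER_SPACE = re.compile(r'(^|\s)"')


def curl_quotes_after_space(text: str) -> str:
    """Replace whitespace + straight quote with the same whitespace + “."""
    # \1 keeps whatever whitespace (or nothing, at the start) preceded the quote.
    return QUOTE_AFTER_SPACE.sub(r"\1“", text)


if __name__ == "__main__":
    print(curl_quotes_after_space('He said "hello" and left.'))
```

Closing quotes are not touched by this pattern, so they would need a second pass.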

@CyanCG Oh I didn’t know the autocorrection has options, will check it out, many thanks!

@CyanCG Tried it out with simple quotation marks and it works fine, but if there are complex structures, e.g. a quotation within a quotation or quotation marks next to em dashes, it gets confused…
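For cases like these, a single pattern is usually not enough, because the same straight quote can be opening or closing depending on what surrounds it. Below is a minimal sketch of one possible context-based heuristic on plain text. The rule (a quote is opening when it follows whitespace, an opening bracket, a dash or another opening quote, and is not itself followed by whitespace) is an assumption made for this example, not how AutoCorrect works, and the names are hypothetical.

```python
# Characters after which a quote is treated as a candidate opening quote:
# whitespace, opening brackets, em/en dash, and already-converted opening quotes.
OPENERS = " \t\n([{\u2014\u2013\u201c\u2018"


def curl_quotes(text: str) -> str:
    """Convert straight quotes to curly ones using the neighbouring characters."""
    out = []
    for i, ch in enumerate(text):
        if ch not in "\"'":
            out.append(ch)
            continue
        # Previous character as already converted; start of text counts as a space.
        prev = out[-1] if out else " "
        # Next character in the original text; end of text counts as a space.
        nxt = text[i + 1] if i + 1 < len(text) else " "
        opening = prev in OPENERS and not nxt.isspace()
        if ch == '"':
            out.append("\u201c" if opening else "\u201d")
        else:
            out.append("\u2018" if opening else "\u2019")
    return "".join(out)


if __name__ == "__main__":
    print(curl_quotes('"He said \'no\'," she replied.'))
    print(curl_quotes('"Wait\u2014" he said'))
```

The heuristic still guesses wrong in some cases, for instance a leading apostrophe as in 'tis, which it treats as an opening single quote, so the result should be proofread.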

I found that this problem exists even in Dreamweaver. Not sure why. Is it the way I wrote my regex? I do need this function desperately. Do you have any suggestions for how I can get around this? Many thanks! @CyanCG

I confess that I do not understand the example regexp you give in your question (my knowledge of the syntax is too limited right now). However, I think this how-to guide from the OO wiki will help you.

Update:

AOO bug 121482 - Regular expressions Search for bold word select all paragraph, originally mentioned by @CyanCG as Resolved Fixed, is now back in a Reopened state.

It sounds like we really should open a LO bug for this one and track the progress on our bug tracker.

@shiyuan, @CyanCG – Could one of you please file a bug and post a link to it in a comment below? That would allow us to resolve this question as BUG FILED, and get this issue over to the LO devs for a fix.

Thanks!

Here is the bug report with number 62603. Something strange happened: I cited an AOO issue, but it links to a totally unrelated issue about another piece of software. Can I edit my comment to fix this?

@CyanCG – Sorry, but I believe you can only edit comments on the Ask site. Comments on Bugzilla can’t be edited, but you can just put a 2nd comment correcting the first one.

Okay, will do. Otherwise, does the comment seem relevant? Is this the proper way to describe a problem? I am not a developer, but I now know enough about the inner workings of LibO that I felt like my guesses about the source of the problem would be helpful.

@CyanCG – Your comments look pretty good to me. I’m sure that the devs will appreciate your attempt to try to track down what’s going wrong here, even if your guesses turn out to be incorrect. And if they get pissy at you, don’t take it personally: Anyone who has to work with this codebase probably gets a bit frustrated from time to time :)

Cheers,