
How can I replace non-breaking hyphens? [closed]

asked 2014-07-02 17:29:48 +0200

this post is marked as community wiki


This is the link: http://www.authorama.com/we-the-media...

These are the paragraphs:

Journalists can use SMS in any number of ways; again, this is much more common outside the U.S. The first inkling among journalists of China’s SARS epidemic came in an SMS from sources inside the medical profession there. Was this signifi­cantly different than simple phone calls in its fundamental nature? Not really. But in a place where being overheard can lead to big trouble, it’s much safer—as long as one’s messages aren’t being intercepted—to simply send a quick SMS.

Over time, perhaps the most important value of SMS will be of the kind described by Howard Rheingold in his prescient book Smart Mobs:52 a self-organizing information system in which individuals and small groups tell each other important news. Rheingold relates, among other examples, how citizens in the Philippines used SMS to organize and overthrow a corrupt government.53 On a more prosaic level, young people in coun­tries with advanced wireless communications have used SMS for social organization. We’re just at the beginning of this tech-nology’s development. As networks and handsets improve, SMS will give way to video messaging, with yet to be understood implications.

Question

I've managed to replace non-breaking spaces using [:space:] / [:space:]+, and hyphens / non-breaking hyphens using [:hyphen:] / [:hyphen:]+.

BUT, if you paste those two paragraphs into Writer, you'll find two words, "signifi­cantly" and "coun­tries", containing a non-breaking hyphen that, compared with a "normal" non-breaking hyphen, is "a little more indented". I've tried pasting as unformatted text, but I still get the "a little more indented" non-breaking hyphen.

I haven't managed to replace it, and I've searched all over the web.

Is there any way to replace it, or to avoid getting it at all?


Closed for the following reason: the question is answered, right answer was accepted by Alex Kemp
close date 2016-02-27 12:05:04.269741

2 Answers


answered 2014-07-04 15:08:47 +0200

oweng

Under GNU/Linux x86_64 running v4.1.6.2, v4.2.5.2, and v4.3.0.2 (current regular expression engine), these forms work for finding "J": \x4A, \x{004A}, and \0112. These forms do not: \x004A and \112.

These forms work for soft hyphen (U+00AD / 0173): \xAD and \x{00AD}. These forms do not: \x00AD, \0173, and \173.

Under GNU/Linux x86_64 running v3.5.7.2 (old regular expression engine), these forms work for finding "J": \x4A and \x004A. These forms do not: \x{004A}, \0112, and \112.

This form works for soft hyphen (U+00AD / 0173): \x00AD. These forms do not: \xAD, \x{00AD}, \0173, and \173.

As is often the case, there appears to be little that is regular about the accepted expression form. This is also in conflict with what is stated on the List of regular expressions wiki page. Other operating systems may differ again. IMO all the indicated forms should work (although the curly braces are superfluous), regardless of the character in question.
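Part of the pattern above may come down to number bases rather than engine bugs. ICU's regex documentation describes \0ooo as an octal escape (the leading zero is required), so \0112 finds "J" because 112 is octal for U+004A (decimal 74). Read the same way, \0173 is octal for U+007B "{", not for the soft hyphen; the octal form of U+00AD (decimal 173) would be \0255, which was not among the forms tested above. The small Python sketch below only does the arithmetic and does not run LibreOffice's search; the helper name code_point_forms is made up for illustration.

```python
import unicodedata


def code_point_forms(ch: str) -> dict[str, str]:
    """Return several numeric spellings of one character's code point.

    The octal escape is only meaningful in ICU up to \\0377.
    """
    cp = ord(ch)
    return {
        "unicode": f"U+{cp:04X}",              # e.g. U+004A
        "decimal": str(cp),                     # e.g. 74
        "octal": f"{cp:o}",                     # e.g. 112
        "icu_octal_escape": f"\\0{cp:o}",       # ICU octal form: \0 plus octal digits
        "icu_hex_escape": f"\\x{{{cp:04X}}}",   # ICU hex form with braces
    }


# "J", the soft hyphen, and "{" (the character that octal 173 actually denotes)
for ch in ("J", "\u00ad", "{"):
    print(unicodedata.name(ch), code_point_forms(ch))
```

Running it shows that decimal 173 and octal 173 name two different characters, which is easy to mix up when writing \0173.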


answered 2014-07-02 21:53:05 +0200

Regina

It is not a "non-breaking" hyphen but an "optional" hyphen, also called "SOFT HYPHEN" or "SHY". It has code point U+00AD.
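If the goal is to avoid getting the soft hyphen at all, one option is to clean the text before pasting it into Writer. A minimal Python sketch, assuming the copied text is available as a string; the function names are hypothetical. It removes U+00AD with a plain string replacement rather than a regular expression, so none of the escape-syntax differences discussed in this thread come into play.

```python
import re

SOFT_HYPHEN = "\u00ad"  # U+00AD, also called SHY


def strip_soft_hyphens(text: str) -> str:
    """Remove every soft hyphen, leaving the surrounding word intact."""
    return text.replace(SOFT_HYPHEN, "")


def find_soft_hyphens(text: str) -> list[str]:
    """Return the whitespace-separated words that contain a soft hyphen."""
    return [word for word in re.findall(r"\S+", text) if SOFT_HYPHEN in word]


# Two fragments from the quoted paragraphs, with the soft hyphens written explicitly
sample = "Was this signifi\u00adcantly different? young people in coun\u00adtries"
print(find_soft_hyphens(sample))   # the affected words (U+00AD shows up as \xad)
print(strip_soft_hyphens(sample))  # the same text with the soft hyphens removed
```

Inside Writer itself, the literal-character approach described in the comments below avoids the need for any escape syntax.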


Comments

@Regina, how are these characters found? Using \xNNNN for the code point has never worked, as far as I can tell. It seems to be a broken aspect of the ICU regex engine.

oweng (2014-07-03 06:05:04 +0200)

Don't use regular expressions; use the character literally. Right-click the input field to open its context menu, click "Special Character", and select the character from the table. It is between ¬ and ®.

Regina (2014-07-03 09:42:10 +0200)

Regex itself works with Unicode code points, but not for SOFT HYPHEN. For example, type ABCDEFGHIJKLM. You can then find the character J with any of these forms: \u004a, \x{004a}, \x4a, or \0112.

Regina (2014-07-03 19:16:00 +0200)

@Regina, I will provide a separate answer for the regex forms that work here, as it takes more space than a comment allows.

oweng (2014-07-04 15:03:44 +0200)


Stats

Seen: 1,931 times

Last updated: Jul 04 '14