Writer: word break without hyphen insertion at end of line

LO 25.8.4.2, Fedora 43, KDE Plasma desktop

In technical documents you often have to insert long “technical” strings of characters, such as a URL. Writer sees these “words” as atomic, i.e. not breakable. As a result, justification can leave large gaps of white space in lines. Also, these strings are not made of human-language syllables, so if hyphenation is enabled, they break at arbitrary positions.

My approach is to mark preferred break locations with soft hyphens (the formatting mark inserted with Ctrl+-). Unfortunately, when the line breaks at a soft hyphen, a hyphen is added at the end of the line.
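Placing soft hyphens by hand in long strings is tedious. The following minimal Python sketch, assuming the string is available as plain text before it goes into Writer, inserts U+00AD SOFT HYPHEN (the character behind Ctrl+-) after common URL separators. The helper name `add_break_points` and the default separator set are hypothetical choices for illustration, not Writer features.

```python
SOFT_HYPHEN = "\u00ad"  # U+00AD SOFT HYPHEN, the character inserted by Ctrl+-


def add_break_points(s: str, mark: str = SOFT_HYPHEN, after: str = "/._?&=") -> str:
    """Insert `mark` after each separator in `after` (hypothetical set).

    No mark is added at the very end of the string, nor between two
    consecutive separators (so "https://" gets one mark, after the "//").
    """
    out = []
    for i, ch in enumerate(s):
        out.append(ch)
        is_last = i == len(s) - 1
        if ch in after and not is_last and s[i + 1] not in after:
            out.append(mark)
    return "".join(out)


if __name__ == "__main__":
    marked = add_break_points("https://example.org/a_b")
    # Show the invisible marks explicitly for inspection.
    print(marked.replace(SOFT_HYPHEN, "<SHY>"))
```

The `mark` parameter makes it easy to try another break character without changing the separator logic.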

When dealing with technical information (such as the URLs mentioned above), this addition completely changes the meaning of the sequence by inserting a spurious character. Readers cannot tell whether the hyphen must be removed or is really part of the data.

Is there a setting to disable hyphen insertion in such specific cases? Can it be recorded in a character style, even though hyphenation is configured in the paragraph style?

Zero-width space - Wikipedia ?

85731 – Allow setting different hyphenation characters


Gotcha! I must have been particularly tired today. The workaround seems obvious a posteriori. Thanks @fpy.

Since the goal is purely visual and the “sequences” are not processed or parsed automatically, I replaced the soft hyphens with ZWSP (U+200B ZERO WIDTH SPACE). Unicode states “this character is intended for invisible word separation and for line break control; it has no width, but its presence between two characters does not prevent increased letter spacing in justification.”
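For text that already contains soft hyphens, the switch is a one-to-one character substitution. A minimal Python sketch on plain strings, assuming the text has been copied or exported out of Writer (the function names are illustrative only):

```python
SOFT_HYPHEN = "\u00ad"  # U+00AD SOFT HYPHEN
ZWSP = "\u200b"         # U+200B ZERO WIDTH SPACE


def soft_hyphens_to_zwsp(text: str) -> str:
    """Replace every soft hyphen with a zero-width space."""
    return text.replace(SOFT_HYPHEN, ZWSP)


def strip_zwsp(text: str) -> str:
    """Remove zero-width spaces, recovering the original character sequence."""
    return text.replace(ZWSP, "")


if __name__ == "__main__":
    s = "example.\u00adorg/\u00adpath"
    converted = soft_hyphens_to_zwsp(s)
    print(repr(converted))
    print(strip_zwsp(converted))
```

Because ZWSP has no glyph, deleting it restores the data exactly, whereas a hyphen rendered at a line end leaves the reader guessing.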

The last part warns about a possible undesirable effect when the line does not break at the mark: increased letter spacing between the surrounding characters in justified lines. Writer does not exhibit this effect.

I also experimented with ZWNJ (U+200C ZERO WIDTH NON-JOINER), but it does not reliably give the expected result. Unicode rather intends it to prevent ligatures during rendering, which is different semantics.
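Since ZWSP, ZWNJ and the soft hyphen are all invisible in the rendered text, it can be hard to check which one actually ended up in a string. A minimal Python sketch, assuming the text is available as plain text, lists every character in the Unicode "Cf" (format) category with its position, code point and official name. The helper name `describe_invisibles` is hypothetical.

```python
import unicodedata


def describe_invisibles(text: str) -> list[tuple[int, str, str]]:
    """Return (index, code point, Unicode name) for each format (Cf) character.

    U+00AD SOFT HYPHEN, U+200B ZERO WIDTH SPACE and U+200C ZERO WIDTH
    NON-JOINER all belong to the Cf category.
    """
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for i, ch in enumerate(text)
        if unicodedata.category(ch) == "Cf"
    ]


if __name__ == "__main__":
    for index, code, name in describe_invisibles("a\u200bb\u200cc\u00add"):
        print(index, code, name)
```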

To summarise, ZWSP is a fairly good workaround but not the definitive solution (at least under the present Unicode specification).