Hide dotted circle when using a combining character

Hello,

I tried to use the combining overline (U+0305) instead of the overline character ‾ (U+203E), together with the underscore _ character, in LibreOffice Writer and Calc, in order to write annotations like ̅_|→.

Since LibreOffice does not seem to support the Ctrl+Shift+U shortcut on my Linux computer, I made the character combination in the Geany text editor and copy-pasted the result into LibreOffice. On Linux, the character combination can be done by pressing Ctrl+Shift+U, typing the 4 hexadecimal digits of the Unicode code point, pressing Space or Enter, and then typing the character you want to combine (see "Enter special characters").

LibreOffice displays the combined characters correctly with a font like DejaVu Sans Mono, but the dotted circle is still displayed, even in a generated PDF file. How can I hide this character? It should be invisible, and other text editors such as FeatherPad hide it correctly with the same DejaVu Sans Mono font.

Update: I noticed that LibreOffice does not display the dotted circle when I copy-paste a character combination like X̅ from the Overline Wikipedia article.

Update 2: If I add a space before the overline/underscore combination, the dotted circle is hidden, but I would like to avoid having a space before the combined characters.

Update 3: A workaround is to insert a zero width space (U+200B) before the overline combining character, as mentioned by @LeroyG, or to use LibreOffice's overline/underline font effects (which I prefer to use only for non-meaningful styling, since they are lost when exporting to CSV).

Update 4: I was in fact typing the overline combining character followed by the underscore character, instead of letting the underscore precede it (_̅|→), which is why the dotted circle appeared, as pointed out by @ajlittoz. To display the overline right on top of the underscore, I had to change the font from DejaVu Sans Mono to another one such as DejaVu Sans. In this case, the zero width space trick is not needed.
Fonts like Unidings align the pipe (preceded by 2 spaces) correctly on a line without overline or underscore:

‾|
 |
_|→ annotation 1

More information:

Here are my LibreOffice details (using the package from Lubuntu 20.04):

Version: 6.4.6.2
Build ID: 1:6.4.6-0ubuntu0.20.04.1
CPU threads: 4; OS: Linux 5.4; UI render: default; VCL: kf5; 
Locale: en-IE (en_IE.UTF-8); UI-Language: en-US
Calc: threaded

Why not use an em quad (U+2001) or em space (U+2003) with overlining and underlining?


@LeroyG I tried adding U+2001 or U+2003 before the overline/underscore combination. The result is similar to the space character: it hides the dotted circle but adds a space before the combined characters. I would like to avoid unnecessary space. Is it possible?

But why do you need to use U+203E or U+0305 when you can use the menu Format - Character (Writer) or Format - Cells (Calc), tab Font Effects, and check Overlining and Underlining?

You can also try:

  • U+2009 thin space
  • U+200A hair space
  • U+200B zero width space
  • U+200C zero width non-joiner
  • U+200D zero width joiner

@LeroyG thanks, U+200B zero width space worked for me. Good to know about the built-in overlining/underlining font effects; I did not use them because I started in a plain text editor instead of LibreOffice. But using the overline and underscore characters has the advantage of being more portable, since they are preserved when exporting to CSV (it would be useful if LibreOffice converted font effects to these Unicode characters when exporting to CSV). You can post your comment as an answer if you want. But is it not possible to hide the dotted circle without adding an invisible character, as some other software does? Let me know if I should file a bug report.

I am on LibreOffice 6.4.7.2 (x86); OS: Windows 6.1, and I don’t see the dotted circle.

If I understand the issue correctly, you start your sequence with a combining character. By Unicode design, a combining character modifies the preceding one. Consequently, a “prefix” character is necessary as the base to which the combining mark is applied. SPACE or one of its variants is a means of “neutralizing” a combining character.
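A minimal Python sketch of the rule above, using only the standard `unicodedata` module. The helper name `starts_with_combining_mark` is hypothetical; it treats any character whose General Category is a Mark (Mn, Mc, Me) as a combining character. It only checks the Unicode property and says nothing about how LibreOffice itself decides to draw the dotted circle.

```python
import unicodedata


def starts_with_combining_mark(text: str) -> bool:
    """Return True if text begins with a combining character.

    Such a mark has no preceding base character to modify, which is the
    situation where a renderer may substitute a dotted circle.
    """
    if not text:
        return False
    # General categories Mn, Mc and Me are the Unicode "Mark" categories.
    return unicodedata.category(text[0]).startswith("M")


print(starts_with_combining_mark("\u0305_|\u2192"))  # overline first: True
print(starts_with_combining_mark("_\u0305|\u2192"))  # underscore is the base: False
print(starts_with_combining_mark("\u200b\u0305_"))   # zero width space first: False
```

Note that the zero width space (category Cf) is not a mark, so it counts as a base here, which matches the workaround discussed in this thread.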

Note that conceptually an overline character is not the same as an overline font effect. The latter is a “decoration” outside text content. An overline character may itself receive an overline font effect. Formatting is stripped when exporting to plain text (CSV is plain text). You must then decide if your overline is formatting (just like bold or italic) or part of the meaning as in math notation.

@ajlittoz thanks, other software like FeatherPad doesn’t display a dotted circle, which suggests it depends on the software. It would be nice to have an option in LibreOffice to hide the dotted circle without having to insert an invisible space wherever a dotted circle appears. Since @LeroyG cannot see the dotted circle on Windows, maybe there is already an option to get the same behavior on Linux?

I think Writer’s behaviour is the right one. If a dotted circle is displayed when you add the combining character, this is visual feedback that something is wrong in your sequence. Unicode is not only a catalogue of characters; it also specifies rules about their relationships so that a text can be parsed without interpretation errors on any compliant system.

It all boils down to the Unicode rule: a combining character cannot (in fact must not) be used alone. It is a modifier for the preceding character. If there is no preceding character (this can happen only at the beginning of your document or after a paragraph mark in Writer), then your Unicode sequence is faulty.

Just for fun, I entered a line break (not a para break) and the combining overline. It was accepted but the rendering is probably nonsensical because the “position” of line break (as a glyph) is not well defined.

I’ll make this an answer.

@ajlittoz for information, I used the overline combining character followed by the underscore character. When combining the overline with an underscore, Geany and LibreOffice do not display the combination correctly (̅) if I apply it to the preceding character, but display it correctly (̅) when I apply it to the following character (which is strange, because combining the overline with a letter like X as the preceding character displays correctly). On a web browser like Firefox, the result is the other way around: the combination displays correctly when applied to the preceding character.

@baptx, the original idea of U+0305 (which connects on the left and right) is not to sit “right on top”, but to connect two characters (like a little umbrella for two people: half in, half out).

Other ideas:

  • ⎽̅⎽ (U+23BD U+0305 U+23BD: the overline connects two HORIZONTAL SCAN LINE-9 characters) ⎽̅⎽|→
  • (U+2ACE SQUARE LEFT OPEN BOX OPERATOR) I have no idea what to use it for.
  • (U+27E5 WHITE SQUARE WITH RIGHTWARDS TICK) a bit small.
  • (U+2290 SQUARE ORIGINAL OF) also small.

Aren’t you rather trying to get something like:

The annotation number can also be automatically retrieved/created from the annotation section in your document.

@ajlittoz Is it possible to do this in LibreOffice Calc? I have a spreadsheet where I want to annotate lines. Ideally I would like to keep the annotations when exporting to CSV so I used characters.
Annotations could also be more complex:

‾|
_̅|_̅|→ annotation for row 2
 |
_|→   annotation for row 1, 3 and 4

In Calc, use (partial) borders around cells dedicated to this purpose. You might need to insert new columns to “draw” your “brackets”, plus optionally other columns for the . But anyway, it is an ugly workaround.

Maybe your best option is a callout: a kind of text box with an “arrow” that can point to a cell. But I don’t know how to anchor it to a specific cell.

IMHO you’re trying to abuse two features: Unicode semantics and formatting.

## Unicode

Unicode defines a catalogue of characters and semantic properties of groups of characters. One of these is the combining diacritics group. They are marks intended to modify any base glyph to provide variations not deemed to deserve a code point of their own (either because they are rather “rare” or because this would lead to “combinatorial explosion”). Think of the various “accents” needed to correctly write non-English languages.

Diacritics have no value by themselves and require a “letter” to be applied to. Several diacritics may be applied to the same base “letter”.

Unicode chose to store a combined sequence as base + diacritic1 + diacritic2 + …, whereas common input methods (such as dead keys) use diacritic + base order.

Consequently, no conforming Unicode text can begin with a diacritical mark. This is what Writer shows with the dotted circle, which stands in for the missing “letter”. This occurs at the start of the document or after a paragraph mark. If you insert a combining character anywhere else, it is applied to the character preceding it.
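To illustrate the base-first order with Python's standard `unicodedata` module (the `describe` helper is hypothetical, for illustration only): canonical decomposition (NFD) of a precomposed letter yields the base first and the mark after it, and several marks can follow one base.

```python
import unicodedata


def describe(text: str) -> list:
    """List the code points of text with their Unicode names."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]


# A precomposed letter decomposes (NFD) into the base first, then the mark.
print(describe(unicodedata.normalize("NFD", "\u00e9")))
# ['U+0065 LATIN SMALL LETTER E', 'U+0301 COMBINING ACUTE ACCENT']

# Several marks may follow one base; both apply to X.
print(describe("X\u0305\u0332"))
# ['U+0058 LATIN CAPITAL LETTER X', 'U+0305 COMBINING OVERLINE', 'U+0332 COMBINING LOW LINE']
```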

To display a diacritical mark without applying it to another “letter”, apply it to a space (any will do).

Occurrence of the dotted circle is a courtesy warning by Writer that your Unicode sequence is invalid and may cause problems if handed over to other applications.

## Formatting

Formatting is a way to add “decoration” to a text. The run of characters in the text bears significance when read by a human. This significance is fully contained in the raw sequence of characters.

Emphasis can be brought through various devices like italics, bold, underline, colour and any other variation. This layer should not interfere with significance, only help to understand author’s intent. Significance must be kept when the document is stripped to its characters.

Consequently, I consider (this is a personal opinion and you may disagree) that trying to mimic formatting through the use of diacritics is an abuse which can change the significance. You may argue that plain text has no formatting at all and you try to compensate. Plain text is there effectively to transmit data without formatting. In principle, adding some formatting to plain text requires prior agreement between sender and recipient about the format encoding so that an automaton can parse the file and be able to reconstruct significance. But even in this case, using diacritics is faulty because letter+diacritics (even unused combinations) are supposed to represent some sign in a human language (maybe unknown to you).

## Summary

The dotted circle cannot be disabled. It is a warning against an invalid Unicode sequence.

Don’t use diacritical marks as substitute for formatting.

If you want to build a combined symbol for a special purpose, this is legal, but do it with a valid sequence, and first search the Unicode repertoire to check whether your symbol already exists (Unicode currently has ~150k characters!).

PS: Ctrl+Shift+U is probably Ubuntu-specific, because it does not work on my Fedora box. Also, keyboard handling is usually taken over by GUI applications, which interface directly with the graphics subsystem (whereas text user interface (TUI) apps receive the characters unaltered from the OS).

I updated the last comment of my question. Do you have an idea how to display _̅ correctly in LibreOffice (an overline combining character preceded by an underscore), so that the overline sits right on top of the underscore, as it does in my comment in a web browser like Firefox? The only workaround I currently have is to put a zero width space before the overline and have the underscore follow it instead of preceding it.

Update: I managed to do it by changing the font from DejaVu Sans Mono to another one like DejaVu Sans.

I experimented and I’m not sure whether there is a bug in LO or in HarfBuzz (the text shaping engine). When I try to combine a wide diacritic (like COMBINING OVERLINE U+0305 or COMBINING LOW LINE U+0332), the diacritic is offset to the right or even placed completely beside the glyph.

I submitted tdf#139210 for analysis by developers.

I made some research about semantics of the combining overline and underline. This is what the Unicode standard says on p. 331:

However, because of their interaction with other combining marks and other layout considerations such as intercharacter spacing, their use for underlining or overlining of text is discouraged in favor of using styled text.

Clearly, overlining should be achieved through formatting, i.e. style Font Effects.

Creating a combined glyph is another story, and Unicode does not say whether it is legitimate. It gives no indication about the placement of the combining overline (no mention of offset, so the question remains open).

The only recommendation is to use U+00A0 NO-BREAK SPACE to display a combining mark instead of U+0020 SPACE.
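A minimal Python sketch of that recommendation, assuming the goal is simply to give a lone combining mark a base. The helper `standalone_mark` and the constant names are hypothetical; the NO-BREAK SPACE default follows the recommendation above, while passing ZERO WIDTH SPACE reproduces the workaround used earlier in this thread (how the mark then renders still depends on font and shaping).

```python
import unicodedata

NO_BREAK_SPACE = "\u00a0"    # carrier recommended by the Unicode standard
ZERO_WIDTH_SPACE = "\u200b"  # workaround used earlier in this thread


def standalone_mark(mark: str, carrier: str = NO_BREAK_SPACE) -> str:
    """Attach a single combining mark to a carrier so it is not left without a base."""
    if len(mark) != 1 or not unicodedata.category(mark).startswith("M"):
        raise ValueError(f"not a single combining mark: {mark!r}")
    # The base (carrier) comes first; the combining mark follows it.
    return carrier + mark


print([f"U+{ord(c):04X}" for c in standalone_mark("\u0305")])
# ['U+00A0', 'U+0305']
```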

Use the Alt+X Unicode toggle in LibreOffice.

Toggling a U+2001U+0305U+0332 sequence will give desired ̲̅ result
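For readers outside LibreOffice, a hypothetical Python helper that expands the same `U+XXXX` notation into the characters it names. This is only a sketch of the idea and not LibreOffice’s Alt+X implementation; it assumes each code point is written as `U+` followed by 4 to 6 hexadecimal digits and silently skips anything else.

```python
import re
import unicodedata


def from_u_notation(notation: str) -> str:
    """Turn a run like 'U+2001U+0305U+0332' into the characters it names."""
    # Each code point is written as 'U+' followed by 4 to 6 hex digits.
    code_points = re.findall(r"U\+([0-9A-Fa-f]{4,6})", notation)
    return "".join(chr(int(cp, 16)) for cp in code_points)


sequence = from_u_notation("U+2001U+0305U+0332")
for ch in sequence:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
# U+2001 EM QUAD
# U+0305 COMBINING OVERLINE
# U+0332 COMBINING LOW LINE
```

The em quad acts as the base, so both marks have something to attach to; their final position still depends on the font.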

The U+0305 and U+0332 overline and low line can connect left or right, while the double combining marks U+035E and U+035F may need some padding space after them so they do not overlap the following glyph, depending on font metrics.

The end result will depend on the font. Give the Libertinus fonts a try; their metrics were crafted to behave correctly with the HarfBuzz-based shaping used in LibreOffice. Noto is pretty good as well. In general, I find the default Liberation fonts pretty poor on these metrics.

Here is a clip from recent master/7.2.0alpha0

Not convinced by the results. I think this is a borderline use of the Unicode combining feature, and Unicode semantics are not very clear on the subject.

If the idea is to introduce an annotation, a well chosen bullet is preferable. The “bullet” can be any graphic art icon or shape and is inserted as a character. This will be less difficult to handle and be immune against font metrics/rendering issues.