LTR Words inside an RTL sentence

AliBaghernejad · November 4, 2018, 5:30pm

Hello Everybody,

I am working on a new text document in LibreOffice Writer.
I have an RTL sentence that contains some LTR words. The LTR words are displayed in the wrong order when they contain characters like + and #, as shown in the attached image. When I type these characters (+ or #), the selected font changes to the RTL font of the surrounding context.
C# and C++ are the correct words here!

I have changed the language settings in LibreOffice Writer as follows:
Tools > Options > Language Settings > Languages:
Locale setting and Complex text layout: Persian.

Version: 6.1.2.1
Source: Snap

Thank you.

ajlittoz · November 4, 2018, 7:03pm

Don’t post as wiki for one-person questions. It will prevent you from receiving karma points to access higher features on this site.

mikekaganski · November 5, 2018, 8:34pm

Don’t Ctrl+Shift+A/Ctrl+Shift+D help?

AliBaghernejad · November 6, 2018, 1:29am

@mikekaganski
No, it doesn’t work. Ctrl+Shift+A makes the C# and C++ words correct, but then the Persian words in the sentence appear in the wrong order.

mikekaganski · November 6, 2018, 4:39am

Then please try Left-to-right mark/Right-to-left mark under Insert-Formatting Mark (sorry for being unable to test myself - I can’t use any RTL). They are a different thing, and if they help, they may be assigned dedicated key bindings.
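For reference, a minimal sketch (in Python, outside LibreOffice) of what these two marks are at the Unicode level. Each mark is a single invisible character that carries a strong direction of its own, unlike the embedding controls, which open a nested run. The helper name `describe` is only illustrative, and whether inserting a mark fixes the display in Writer is not shown here.

```python
import unicodedata

LRM = "\u200e"  # LEFT-TO-RIGHT MARK
RLM = "\u200f"  # RIGHT-TO-LEFT MARK
LRE = "\u202a"  # LEFT-TO-RIGHT EMBEDDING, shown for comparison

def describe(ch):
    """Return (code point, Unicode name, bidirectional class) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.bidirectional(ch))

for ch in (LRM, RLM, LRE):
    print(describe(ch))
```

The marks report the strong classes L and R, while the embedding control reports its own class LRE.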

AliBaghernejad · November 6, 2018, 5:10am

@mikekaganski
I have attached a doc file here with all required fonts embedded.
link to doc

You can try it.

ajlittoz · November 4, 2018, 7:02pm

Punctuation and symbols like # and + have no intrinsic directionality; they use the one in effect in the surrounding context.

In your case, default language is Persian. Consequently, # and + behave RTL.

C++ or C# are not seen as words, but as a “Latin” word C (thus LTR) followed by punctuation. You have in fact 2 sequences, and the result is as expected. To get the usual C++ formatting, you must tell LO Writer that the whole sequence is LTR.
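The point above can be checked against the Unicode character database. The following is a minimal sketch in Python (not something Writer exposes) that lists the bidirectional class of each character. The helper name `bidi_classes` and the sample string are only illustrative. In Unicode terms, # and + are "weak" types (ET and ES) rather than strong ones such as L or AL, which is why they depend on their context.

```python
import unicodedata

def bidi_classes(text):
    """Pair every character with its Unicode bidirectional class."""
    return [(ch, unicodedata.bidirectional(ch)) for ch in text]

# "C", "#", "+" and one Arabic-script letter (U+0628 BEH) as a stand-in for Persian text
SAMPLE = "C#+\u0628"

for ch, cls in bidi_classes(SAMPLE):
    print(f"U+{ord(ch):04X} {cls}")
```

Only C (class L) and the Arabic-script letter (class AL) have a strong direction of their own.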

You do that with a character style. Create one you’ll call Technical. In the Font tab, force Language to None from the drop-down menu (or English in case it doesn’t work as expected – None is better because it also disables spell-checking, which is preferable for acronyms).

After that, select all your LTR sequences and give them character style Technical. The important point is to select together C and the symbols so that it is seen as a single sequence.

To show the community your question has been answered, click the ✓ next to the correct answer, and “upvote” by clicking on the ^ arrow of any helpful answers. These are the mechanisms for communicating the quality of the Q&A on this site. Thanks!

EDIT 2018-11-5 (after discussion with @Ali Baghernejad)

The cause seems to be a symbol (# or +) at the boundary between RTL and LTR text. This symbol, which can be used in any language, adopts the directionality of its context. Here we have a mixed context, and the tie is resolved in favour of the “main” context, i.e. Persian, because it is the document language.

Forcing the language through a character style proves ineffective because Writer seems to decide which part of the style to use (Western or CTL) based on the current character. Here, + and # have no intrinsic directionality, so we fall back on the context (see the previous argument).

The only way to force directionality is to bound the problematic sequence with an LTR character. C++ is typed as C++a: now we have an LTR run, but with an extra a. This extra a is formatted as Hidden so that it is not displayed.

This is an ugly workaround. There should be a way to tell LO Writer we want an LTR sequence (or RTL when writing Western), no matter the internal Writer rules, i.e. unconditionally force directionality.

I filed a bug as tdf#121182 (closed because it is intended behaviour; work around with explicit directionality marks)

EDIT 2018-11-6 after information from the bug site

I had completely forgotten about U+202A to U+202C (bidirectionality control characters), sorry for the fuss.

I had a look at the sample file with added bidirectionality controls that is attached to the bug report. It matches what is required, but from a user perspective, I could not find a way to edit these controls. Backspace and Delete do not seem to act upon them. Instead, they erase the nearest graphic character. Consequently, you can’t easily manage the marks, other than by selecting a wider sequence, erasing it and retyping the missing part.

This is not user-friendly. Have I missed something?
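Outside Writer, plain text that contains these controls can at least be inspected. The following is a minimal Python sketch that makes the invisible characters visible or strips them from a string. The names `CONTROL_NAMES`, `reveal` and `strip_controls` are only illustrative, and the table covers just the marks, embeddings and overrides from U+200E to U+202E. It does nothing inside Writer itself.

```python
# Short labels for the invisible directional characters handled here
CONTROL_NAMES = {
    "\u200e": "LRM",  # LEFT-TO-RIGHT MARK
    "\u200f": "RLM",  # RIGHT-TO-LEFT MARK
    "\u202a": "LRE",  # LEFT-TO-RIGHT EMBEDDING
    "\u202b": "RLE",  # RIGHT-TO-LEFT EMBEDDING
    "\u202c": "PDF",  # POP DIRECTIONAL FORMATTING
    "\u202d": "LRO",  # LEFT-TO-RIGHT OVERRIDE
    "\u202e": "RLO",  # RIGHT-TO-LEFT OVERRIDE
}

def reveal(text):
    """Replace each directional control with a visible <LABEL> tag."""
    return "".join(f"<{CONTROL_NAMES[ch]}>" if ch in CONTROL_NAMES else ch for ch in text)

def strip_controls(text):
    """Remove all directional controls listed in CONTROL_NAMES."""
    return "".join(ch for ch in text if ch not in CONTROL_NAMES)

print(reveal("\u202aC++\u202c"))
print(strip_controls("\u202aC++\u202c"))
```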

EDIT 2018-11-7

Feature request submitted as tdf#121256: provide visual feedback for directional marks, but it may be some time, if ever, before it is implemented. Marked as duplicate of tdf#58434.

AliBaghernejad · November 5, 2018, 1:33am

@ajlittoz Thank you for the response.
I have created a new character style as you described above.
It does not work when the whole sentence is RTL (with either None or English). When I change the direction from RTL to LTR, the Latin words are displayed correctly but the Persian ones appear in the wrong order.

ajlittoz · November 5, 2018, 9:56am

I can’t reproduce this here, but I’m not familiar at all with RTL alphabets. Please attach a one-line sample document so that I can experiment with it (you may bump into a karma problem; in this case, report here). Make sure that the Persian sequence is easily identified by someone who can’t read the language, i.e. that I can see character permutation when it occurs.

AliBaghernejad · November 5, 2018, 6:00pm

I have attached a sample file and embedded all required Persian fonts in it.
Because of that, the file is a bit larger than normal.

link text

The expected result is shown in the image below. Note that C# and C++ are the correct words, not #C and ++C:
link text

Hope this file can help.
Thanks, Ali

ajlittoz · November 5, 2018, 6:39pm

I fear this is a bug or a Writer shortcoming/misconception. If I paste C++a, everything is OK. As soon as the ++ signs are not bounded by Latin chars, the sequence is visually formatted as ++C, despite being forced to None or English by the character style. I’ll submit a bug.

ajlittoz · November 5, 2018, 6:43pm

The only workaround I found is to paste C++a, select the a, then apply Format>Character, Font Effects>Hidden. This way, Writer treats C++a as one long sequence bounded by Latin chars but displays only C++ in Western order.

Quite ugly, but works meanwhile.

AliBaghernejad · November 5, 2018, 7:05pm

Does LibreOffice have a GitHub page or any other issue tracker?
If you have filed a bug, please put the link here. Maybe this is a problem for others too.

mikekaganski · November 6, 2018, 6:03am

Well, these macros allow me to do what I suppose you need (with my limited ability to test, and no knowledge about RTL workflow - sorry):

Sub InsertUnicode(ch As String)
  ' Insert ch at the view cursor; False means the current selection is not replaced
  Dim oDoc As Object
  oDoc = ThisComponent
  oDoc.Text.insertString(oDoc.CurrentController.ViewCursor, ch, False)
End Sub

Sub InsertLTR
  InsertUnicode(CHR$(8234)) ' U+202A LEFT-TO-RIGHT EMBEDDING
End Sub

Sub InsertRTL
  InsertUnicode(CHR$(8235)) ' U+202B RIGHT-TO-LEFT EMBEDDING
End Sub

Sub InsertPOP
  InsertUnicode(CHR$(8236)) ' U+202C POP DIRECTIONAL FORMATTING
End Sub

You might put these into your macro library, and assign key bindings to e.g. InsertLTR (which starts an embedded LTR run) and InsertPOP (which ends such a run). Then you should be able to execute InsertLTR at the start of, say, “C++”, and InsertPOP after it. Hope it helps.

Reference: UAX #9: Unicode Bidirectional Algorithm, 2.1 Explicit Directional Embeddings
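As a cross-check of the code points used in the macros, here is a minimal Python sketch that wraps a run of text in an explicit embedding, the way InsertLTR and InsertPOP would do by hand. The function names `embed_ltr` and `embed_rtl` are only illustrative. The code builds the string and does not show how any particular application renders it.

```python
LRE = "\u202a"  # LEFT-TO-RIGHT EMBEDDING, decimal 8234 as in CHR$(8234)
RLE = "\u202b"  # RIGHT-TO-LEFT EMBEDDING, decimal 8235
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING, decimal 8236

def embed_ltr(run):
    """Wrap run so that it forms an explicit left-to-right embedding."""
    return f"{LRE}{run}{PDF}"

def embed_rtl(run):
    """Wrap run so that it forms an explicit right-to-left embedding."""
    return f"{RLE}{run}{PDF}"

wrapped = embed_ltr("C++")
print([f"U+{ord(ch):04X}" for ch in wrapped])
```

Each opening control needs its matching POP, which is why the macros are meant to be used in pairs.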

AliBaghernejad · November 8, 2018, 7:15am

@mikekaganski,
Thanks. It works for me too!

As this seems to be the standard way, it will most probably work in other office suites like MS Word too (I haven’t tested that).
Why doesn’t LibreOffice Writer do what you describe (or something similar) automatically as part of the Unicode bidirectional algorithm?
If LibreOffice did that implicitly when the context of a sentence changes, life would be easier!

mikekaganski · November 8, 2018, 7:56am

I don’t believe it’s possible to come up with a universal algorithm for where to insert such embeddings; if it were possible, the embedding marks would not need to be explicit, and the rule would be included in the Unicode algorithm without the need for any marks (like “when there’s a sequence with properties XXX, then do AAA, else BBB…”). It might be possible to have some heuristics in some special cases, but (1) they wouldn’t cover every use case; and (2) they would make things harder for other users…

mikekaganski · November 8, 2018, 8:37am

Still, if you have specific proposals for where LibreOffice could improve in this area (e.g., because the LibreOffice application has more information available than the Unicode algorithm, say, the current keyboard layout, and you suppose that when conditions A, B, C are true, it makes sense to insert such a mark pair), then please file separate enhancement requests clearly describing each proposal in our bug tracker.