Replace Full Space with Zero Width Space using Regular Expressions

asked 2020-07-21 11:23:08 +0200

jasonlibreoffice

I need to replace every full space between می and the following word with a zero-width space such as U+FEFF (Zero Width No-Break Space). For example, می روم must change to می‌روم (the space is entered on the keyboard by pressing Shift + Space on Fedora Linux). Any similar combination must follow the same rule.

I am trying to use the Find & Replace dialog with the following fields: [image]

This just changes sentences to: [image]

How can I tell it to just replace the full-space with zero-width space?


1 Answer


answered 2020-07-21 11:36:22 +0200

Lupp

updated 2020-07-21 15:57:59 +0200

Regular expressions don't help in this case. Disable that option, and enable Current Selection Only if needed. Enter one ordinary space (the character you want to replace) into Find: and a literal ZERO WIDTH SPACE into Replace:, then run Replace All.

You can get a ZERO WIDTH SPACE by typing 200B somewhere in the document and pressing Alt+X immediately afterwards. Then select and cut that invisible character and paste it into Replace:.

Sorry, I don't know whether 200B followed by Alt+X works exactly the same way in a right-to-left layout.
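For readers who want to check which character a hexadecimal code such as 200B stands for, here is a small Python sketch. It has nothing to do with LibreOffice itself, and `from_hex` is a hypothetical helper name introduced only for this illustration; it simply mirrors the "type the code, then convert" idea.

```python
import unicodedata

def from_hex(code: str) -> str:
    """Return the character for a hexadecimal code point such as '200B'."""
    return chr(int(code, 16))

# The two code points mentioned in this thread.
for code in ("200B", "FEFF"):
    ch = from_hex(code)
    print(f"U+{code}", unicodedata.name(ch))
```

The resulting one-character string can be copied to the clipboard by whatever means is convenient and pasted into Replace:.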

=== Edit 2020-07-21, about 14:00 UTC, after additional considerations ===
I studied the images in more detail.
The Arabic word (or particle) in front of the space and the opening square bracket will be part of the match, and is therefore re-inserted by the "&". In addition, you insert it explicitly together with a trailing ZERO WIDTH SPACE (supposedly). Consequently it is doubled.
Here is a RegEx and a replacement, written in Western (left-to-right) style, that should do what you seem to want if you correctly transliterate them to right-to-left for your needs.
RegEx for Find: (myWord) ([:alpha:])
Note the space between (myWord) and ([:alpha:])!
Replace: $1​$2
This looks a bit misleading, because there must be, and actually is, a ZERO WIDTH SPACE between $1 and $2, in the correct place.
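As a rough illustration outside LibreOffice, the capture-group replacement above can be sketched in Python. This is only a sketch under stated assumptions: `join_prefix`, `PATTERN`, `PREFIX` and `ZWSP` are hypothetical names, Python's `re` module has no `[:alpha:]` class so `[^\W\d_]` stands in for it, and the groups are re-inserted via `m.group(1)` and `m.group(2)` instead of `$1` and `$2`.

```python
import re

ZWSP = "\u200b"           # ZERO WIDTH SPACE
PREFIX = "\u0645\u06cc"   # the particle from the question, written as escapes

# Group 1 is the particle, group 2 the first letter of the following word.
# [^\W\d_] is a stand-in for [:alpha:], which Python's re does not support.
PATTERN = re.compile("(" + re.escape(PREFIX) + r") ([^\W\d_])")

def join_prefix(text: str, joiner: str = ZWSP) -> str:
    """Replace the space after PREFIX with joiner, keeping both neighbours."""
    return PATTERN.sub(lambda m: m.group(1) + joiner + m.group(2), text)

# Two occurrences of the particle, separated by an unrelated one-letter word.
sample = "\u0645\u06cc \u0631\u0648\u0645 \u0648 \u0645\u06cc \u0622\u06cc\u0645"
result = join_prefix(sample)
print(result.count(ZWSP), result.count(" "))
```

Like the pattern in the answer, this also matches a longer word that merely ends in the particle; putting a word boundary in front of group 1 would be one way to narrow it.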


Comments

Thanks for the input. I usually use Ctrl + Shift + U + [Unicode code] to enter Unicode characters on Linux. Alt + X does not seem to work for me.

jasonlibreoffice (2020-07-21 13:17:53 +0200)

I'm concerned that replacing all spaces will connect all my words in the document, which is not pleasant. This should be doable via regex, e.g. every space that is surrounded by می and another word should be replaced with the desired character.

jasonlibreoffice (2020-07-21 13:20:23 +0200)

Of course you can find the specific spaces you want to replace with a RegEx.
My earlier statement should probably have read "RegEx cannot help you with getting a ZERO WIDTH SPACE into Replace:". A case of "narrow thinking" on my part.
However, I can hardly help you with a proper RegEx without clearly seeing what you tried, without an example document for testing, and without any experience with right-to-left texts (in documents and/or UI dialogs).
From https://www.regular-expressions.info/... you can learn how to define the needed Find: context with a RegEx. If you prefer not to use lookbehind/lookahead, an alternative is to enclose the context strings in parentheses and to re-insert them with references in Replace:, using the $ character for this purpose, which is supported only there.

If you supply an example document with a sufficient set of clear examples, it may ...

Lupp (2020-07-21 13:41:31 +0200)
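The lookbehind/lookahead variant mentioned in this comment can be sketched in Python as follows. This is an illustration under assumptions, not something tested in the Find & Replace dialog: `join_with_lookaround`, `LOOKAROUND`, `PREFIX` and `ZWSP` are hypothetical names, and `[^\W\d_]` is used as a stand-in for "a letter". Only the space itself is matched, so nothing has to be re-inserted through references.

```python
import re

ZWSP = "\u200b"           # ZERO WIDTH SPACE
PREFIX = "\u0645\u06cc"   # the particle from the question, written as escapes

# Match a single space that is preceded by PREFIX (lookbehind) and
# followed by a letter (lookahead). Neither neighbour is consumed.
LOOKAROUND = re.compile("(?<=" + re.escape(PREFIX) + r") (?=[^\W\d_])")

def join_with_lookaround(text: str, joiner: str = ZWSP) -> str:
    """Swap only the matched space for joiner; the context stays untouched."""
    return LOOKAROUND.sub(lambda _m: joiner, text)

sample = "\u0645\u06cc \u0631\u0648\u0645 \u0648 \u0645\u06cc \u0622\u06cc\u0645"
converted = join_with_lookaround(sample)
print(converted.count(ZWSP), converted.count(" "))
```

Python's `re` requires the lookbehind to have a fixed width, which holds here because the particle is a fixed string.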

I studied the images in more detail.
The Arabic word (or particle) in front of the space and the opening square bracket will be part of the match, and is therefore re-inserted by the "&". In addition, you insert it explicitly together with a trailing ZERO WIDTH SPACE (supposedly). Consequently it is doubled.
Here is a RegEx and a replacement, written in Western (left-to-right) style, that should do what you seem to want if you correctly transliterate them to right-to-left for your needs.
RegEx for Find: (myWord) ([:alpha:])
Note the space between (myWord) and ([:alpha:])!
Replace: $1​$2
This looks a bit misleading, because there must be, and actually is, a ZERO WIDTH SPACE between $1 and $2, in the correct place.

Lupp (2020-07-21 14:49:15 +0200)

Thanks Lupp. I just added a sample with Persian text. In fact, Arabic and Persian use a similar script, but they are different languages. See https://pastebin.com/tAZkgLix

jasonlibreoffice (2020-07-21 15:24:50 +0200)

Wow you did it Lupp! It works. Thanks. Please add to the answer.

jasonlibreoffice (2020-07-21 15:27:51 +0200)

Just used this technique to replace 500 combinations in a long text!

jasonlibreoffice (2020-07-21 15:30:08 +0200)

Yes. I knew that Farsi/Persian belongs to the "Indogermanic" group of languages. I didn't know how to distinguish them. Arabic and similar scripts are just like a ravel of nematodes to me, though I respect the fact that they can be used superbly to create works of art. The world is so rich. But these many scripts are annoying nonetheless ...

Lupp (2020-07-21 15:54:35 +0200)
