Looking at the regular expression syntax used in LibreOffice, you will find that you can define sets of matching characters (using square brackets []). In a set, you may put character ranges and character properties. You can then search the Unicode Character Database for the properties you need, and also examine the list of character properties.
I’m not an expert here, but a brief look at these gives this first proposal: search for Ideographic characters, and for the Halfwidth And Fullwidth Forms block (which includes punctuation used in CJK, like ？ and ，). The former is \p{Ideographic}; the latter is either \p{Block=Halfwidth_And_Fullwidth_Forms} (or its short alias \p{Block=Half_And_Full_Forms}), or [\uFF00-\uFFEF]. So, the shortest combined regex for these would be
[\p{Ideographic}\uFF00-\uFFEF]
Of course, searching for it requires that [x] Regular expression be checked in the Find & Replace dialog.
Note that halfwidth/fullwidth punctuation would not be found unless [x] Match character width is checked in the dialog, too. With both of them checked, put the regular expression from above into the Find: box, leave the Replace: box empty, and click the Replace All button to remove all these characters from the document.
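To preview the effect of such a replacement outside LibreOffice, here is a minimal Python sketch. It is only an approximation: Python's built-in re module does not support \p{...}, so I assume a handful of explicit ideograph blocks in place of \p{Ideographic}. The names CJK_PATTERN and remove_cjk are made up for this illustration.

```python
import re

# Rough stand-in for [\p{Ideographic}\uFF00-\uFFEF]. The listed blocks are an
# assumption that covers common cases, not the full Ideographic property.
CJK_PATTERN = re.compile(
    "["
    "\u3400-\u4DBF"  # CJK Unified Ideographs Extension A
    "\u4E00-\u9FFF"  # CJK Unified Ideographs
    "\uF900-\uFAFF"  # CJK Compatibility Ideographs
    "\uFF00-\uFFEF"  # Halfwidth and Fullwidth Forms
    "]"
)


def remove_cjk(text: str) -> str:
    """Delete every character matched by the approximate class."""
    return CJK_PATTERN.sub("", text)


print(remove_cjk("abc中文，def"))  # prints "abcdef"
```

The fullwidth comma is removed here because it lies in the U+FF00 to U+FFEF range; plain ASCII letters are left alone.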
But this has at least one problem: Chinese text might have non-Chinese parts. E.g., to experiment, I google-translated your question:
How can I remove the Chinese characters without removing the English alpha characters? Also how the same be done to remove the English alpha characters and leaving the Chinese characters?
I got this:
如何在不删除英文字母字符的情况下删除中文字符？ 另外，如何删除英文字母字符并保留中文字符呢？
It contained Ideographic characters and full-width punctuation, but also one normal space character. Performing the replacement described above on this text would leave that space behind. That might not be a problem for spaces (one could then check for leading/trailing/repeated spaces); but there’s a possibility that characters from general-use blocks were also used for punctuation or numbers. I cannot advise how to overcome this.
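One way to at least see what such a replacement would leave behind is to list the surviving characters by their Unicode names. This is just a sketch under the same assumption as before (explicit ranges standing in for \p{Ideographic}, since Python's re has no property classes); leftovers is a hypothetical helper name.

```python
import re
import unicodedata

# Approximate class: a few ideograph blocks plus Halfwidth and Fullwidth Forms.
CJK_PATTERN = re.compile("[\u3400-\u4DBF\u4E00-\u9FFF\uF900-\uFAFF\uFF00-\uFFEF]")


def leftovers(text: str) -> list[tuple[str, str]]:
    """Return (character, Unicode name) for everything the class does not match."""
    remaining = CJK_PATTERN.sub("", text)
    return [(ch, unicodedata.name(ch, "UNKNOWN")) for ch in remaining]


sample = "如何在不删除英文字母字符的情况下删除中文字符？ 另外，如何删除英文字母字符并保留中文字符呢？"
print(leftovers(sample))        # [(' ', 'SPACE')]
print(leftovers("第3章, 完"))    # digit, comma and space from general-use blocks
```

For the translated sentence, only the normal space survives; for text that mixes in ASCII digits or punctuation, those survive too, which is exactly the part I don't know how to handle in general.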
And taking into account the problem mentioned above, I don’t know how to perform the opposite task: remove everything non-Chinese. Simply negating the regular expression above to be
[^\p{Ideographic}\uFF00-\uFFEF]
would also find (and remove) all non-full/halfwidth punctuation, numbers, spaces, and who knows what from Chinese text.
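The side effect of the negated set is easy to reproduce in a small Python sketch. Again, this assumes explicit ranges as an approximation of \p{Ideographic} (Python's re module lacks \p{...}), and keep_only_cjk is a name invented for the example.

```python
import re

# Negated approximate class: matches everything that is NOT in the listed
# ideograph blocks or in Halfwidth and Fullwidth Forms.
NON_CJK_PATTERN = re.compile("[^\u3400-\u4DBF\u4E00-\u9FFF\uF900-\uFAFF\uFF00-\uFFEF]")


def keep_only_cjk(text: str) -> str:
    """Delete every character outside the approximate class."""
    return NON_CJK_PATTERN.sub("", text)


print(keep_only_cjk("Chapter: 第3章, 完"))  # prints "第章完"
```

The English word goes away as intended, but so do the digit, the ASCII comma and the spaces that belonged to the Chinese part.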
Another approach would seem to be to search by text language, but in my testing I didn’t find a way to locate Chinese text using this method, even in Writer (while the topic here is Calc).