(LibreCalc) How to Change the Font of all Text in a Certain Language

Lupp · September 28, 2018, 10:16pm

The simplest case of ‘number’ as a part of a text is the unsigned decimal integer described by the regular expression [1-9][0-9]*. English words cannot be distinguished by syntactical means alone from words in other languages that are also written with Latin letters. You need a huge dictionary. But dictionaries may overlap (they are not disjoint). A word written in Japanese characters is likely to be a Japanese word. Even so, that doesn’t make it really simple.
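For illustration, here is a minimal Python sketch of that pattern. The helper names `is_unsigned_integer` and `find_numbers` are made up for this example, and Python's `re` module only stands in for the ICU engine LibreOffice uses, so details may differ there.

```python
import re

# Pattern from the post: a non-zero leading digit followed by any digits.
UNSIGNED_INT = re.compile(r"[1-9][0-9]*")

def is_unsigned_integer(text):
    """Return True if the whole string matches [1-9][0-9]*."""
    return UNSIGNED_INT.fullmatch(text) is not None

def find_numbers(text):
    """Return the non-overlapping matches of the pattern inside text."""
    return UNSIGNED_INT.findall(text)
```

Note that the pattern as written rejects a lone "0" and anything with leading zeros, which may or may not be what is wanted.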

Eriias · September 30, 2018, 10:22pm

Styles was definitely the answer. I experienced very odd inconsistencies in trying to modify the default style, but just creating new ones got them to work almost perfectly. There were a few times where it would only modify part of the Japanese text in a line, but that was easy enough to fix “by hand” the one time.

Thank you all for the suggestions and the information. I really appreciate all of your efforts to get me through this.

satkomuni · December 24, 2018, 12:40am

Setting a style does not always work, at least in Writer. I apparently cannot include a screenshot here, but selecting a block of text in SimSun into which I have typed Latin alphanumerics (I am translating the document) leaves them in SimSun. I would like all western text in this paragraph to use e.g. Book Antiqua, so following the advice of this post I defined a style in which western text should use Book Antiqua while Asian text should use SimSun. Selecting a block of text and applying the style sometimes produces the desired result and other times does not; the western text remains in SimSun. I just did this in a Writer document table in which each cell contained Chinese characters and western letters, and the western text changed to Book Antiqua in only some of the cells (the table is 4x3 plus a top title box: the title box, C1, D2, & D3 remained in SimSun). Before upgrading to the latest version earlier this week, I did not have this problem.

gabix · September 28, 2018, 7:30am

Modify the Default style and/or other styles accordingly.

Sarum4n · September 28, 2018, 11:52am

I agree with gabix that this problem calls for Styles to be applied. Except that the use of two languages may require that new styles are to be created besides the existing ones. So if English is the prevalent language, I’d modify Default style to use that language & the accompanying font (Calibri or something), so that all text (and thus all English texts) have the desired looks. Then for each type of Japanese text create a copy of the applied style that has the font and language settings changed. If it’s only one or two types of styles, then creating new styles is easy. If it’s more, then you might want to recreate a part of the styles tree, so that future maintenance of the styles gets easier.

gabix · September 28, 2018, 2:12pm

Good point, but for a case when two languages use different scripts creating different styles is superfluous. Just set different fontfaces in the font properties tab.

Eriias · September 28, 2018, 9:18pm

Ok so I found how to modify styles, and that seems like it should work. The odd thing is, it doesn’t. The English letters etc. change, but the Japanese characters do not. I believe this might be an error that came with the file, namely that the Japanese characters were somehow coming out as Chinese. So perhaps changing the Asian font to Japanese isn’t effecting a change because the program thinks I’ve been typing in Chinese?

Eriias · September 28, 2018, 9:52pm

It in fact seems to act VERY inconsistently. It’ll apply the font size and alignment without fail but will sometimes not even change roman letters, and seems to have no effect at all on Japanese text, despite selecting a Japanese font under Asian fonts for the style…I’m very confused…

Lupp · September 28, 2018, 10:05pm

(There are Japanese characters assigned to Unicode code points. But there can also be fonts that assign Japanese glyphs to code points officially used for other purposes. That’s a technical matter; software is a technical thing.)
I don’t know anything about Japanese. Just looked around a bit.

Japanese is written in Hiragana and/or Katakana characters (together with Kanji). Unicode has assigned code points in the range U+3000 to U+31FF to these characters, although that range also holds CJK punctuation and some non-Japanese blocks such as Bopomofo and Hangul Compatibility Jamo. If “Japanese text” is written exclusively with these characters, you can in principle find it with the help of a regular expression (RegEx). Constructing an actually working search expression that accepts exactly the non-empty sequences of the mentioned characters, in a form usable in LibO, requires some knowledge both about writing in Japanese and about technical aspects of the Unicode system. The “first idea” [\u3000-\u31FF]+ doesn’t work as expected. In Calc, such an expression will in any case only find the cells containing a match. You can(?) use

[ 、。〃〄々〆〇〈〉《》「」『』【】〒〓〔〕〖〗〘〙〚〛〜〝〞〟〠〡〢〣〤〥〦〧〨〩〪〭〮〯〫〬〰〱〲〳〴〵〶〷〸〹〺〻〼〽〾〿぀ぁあぃいぅうぇえぉおかがきぎくぐけげこごさざしじすずせぜそぞただちぢっつづてでとどなにぬねのはばぱひびぴふぶぷへべぺほぼぽまみむめもゃやゅゆょよらりるれろゎわゐゑをんゔゕゖ゗゘゙゚゛゜ゝゞゟ゠ァアィイゥウェエォオカガキギクグケゲコゴサザシジスズセゼソゾタダチヂッツヅテデトドナニヌネノハバパヒビピフブプヘベペホボポマミムメモャヤュユョヨラリルレロヮワヰヱヲンヴヵヶヷヸヹヺ・ーヽヾヿ㄀㄁㄂㄃㄄ㄅㄆㄇㄈㄉㄊㄋㄌㄍㄎㄏㄐㄑㄒㄓㄔㄕㄖㄗㄘㄙㄚㄛㄜㄝㄞㄟㄠㄡㄢㄣㄤㄥㄦㄧㄨㄩㄪㄫㄬㄭㄮㄯ㄰ㄱㄲㄳㄴㄵㄶㄷㄸㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅃㅄㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣㅤㅥㅦㅧㅨㅩㅪㅫㅬㅭㅮㅯㅰㅱㅲㅳㅴㅵㅶㅷㅸㅹㅺㅻㅼㅽㅾㅿㆀㆁㆂㆃㆄㆅㆆㆇㆈㆉㆊㆋㆌㆍㆎ㆏㆐㆑㆒㆓㆔㆕㆖㆗㆘㆙㆚㆛㆜㆝㆞㆟ㆠㆡㆢㆣㆤㆥㆦㆧㆨㆩㆪㆫㆬㆭㆮㆯㆰㆱㆲㆳㆴㆵㆶㆷㆸㆹㆺㆻㆼㆽㆾㆿ㇀㇁㇂㇃㇄㇅㇆㇇㇈㇉㇊㇋㇌㇍㇎㇏㇐㇑㇒㇓㇔㇕㇖㇗㇘㇙㇚㇛㇜㇝㇞㇟㇠㇡㇢㇣㇤㇥㇦㇧㇨㇩㇪㇫㇬㇭㇮㇯ㇰㇱㇲㇳㇴㇵㇶㇷㇸㇹㇺㇻㇼㇽㇾㇿ
]

as a Regex, but it doesn’t look like a final solution. Try a web-search for “regular expression for japanese characters”.
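As a rough sketch of what such a search might look like outside LibreOffice, the Python snippet below matches runs of kana using only the Hiragana (U+3040 to U+309F) and Katakana (U+30A0 to U+30FF) blocks instead of the whole U+3000 to U+31FF range. The function name `kana_runs` is hypothetical, Kanji and CJK punctuation are deliberately not covered, and Python's `re` is not the engine LibO uses, so this is an illustration rather than a drop-in search string.

```python
import re

# Hiragana (U+3040-U+309F) and Katakana (U+30A0-U+30FF) blocks only.
# Kanji and CJK punctuation are deliberately left out of this sketch.
KANA_RUN = re.compile(r"[\u3040-\u30FF]+")

def kana_runs(text):
    """Return (start, end, run) for each non-empty kana sequence in text."""
    return [(m.start(), m.end(), m.group()) for m in KANA_RUN.finditer(text)]
```

Because Kanji are outside these two blocks, a string such as 日本語のテキスト yields only the kana part, which shows why a kana-only expression is not a final solution either.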

To first mix up text pieces in two languages inside single cells and then to expect them to be separately treatable by the software is somehow bold. You may find a solution to separate English from Japanese thinking the technical way, and I may try to help you again with it when I find time. Alas! It’s completely useless if you just want to base an even worse mistake on it: Using different fonts inside of single cells as a rule instead of as a rare exception.

Eriias · September 28, 2018, 10:25pm

I see. Thank you for the explanation, that helps clear some things up. I’m wondering if somehow the text in this document got assigned the wrong language code to begin with? When I first sent it to a friend for proofreading, she said the font was in Chinese, which made no sense to me. That’s why I’m messing with fonts at all: I changed it from the default to MS Gothic because I knew that was a Japanese font I’d used…

Eriias · September 28, 2018, 10:27pm

Is it possible that this file just got so garbled that the displayed language is not the same as the coded one? I say this because in trying to use cell styles to fix this, I’m noticing that “western” fonts defined in the style apply fine (as do other parameters) but the Japanese font I have defined under “Asian” fonts does not.

Lupp · September 28, 2018, 10:56pm

-1- Japanese characters go back to Chinese logographic glyphs. In a sense they are the same, and are also used in that way in specific cases (I was told once). “Katakana looks Chinese.”
-2- You cannot assign languages to text pieces (‘Selection’) in Calc. Cells have a language property which only should influence numeric default formats and input.
-3- Where did you define a “Japanese font”?

Lupp · September 28, 2018, 11:01pm

To find text portions - not just cells - with F&R you need to transfer the contents to a text table in Writer. Use Rich Text Format for the move in both directions…

Eriias · September 29, 2018, 1:15am

Yeah, kanji are (basically) Chinese characters, though I was warned by the friend doing the proofreading that they are “somewhat different” and that I should change the font because it “looks weird”.
I went Styles (F11) > default > modify > Font, and the bottom portion of that is “Asian Text Font” and it has a box to define which language is being used. It had been set to Chinese, I changed that to Japanese while also changing the font itself.

Eriias · September 29, 2018, 1:16am

Throwing it into Writer as a table might be a good idea. I had tried just saving as xlsx and opening it in Excel again but that messed up a good chunk of formatting so I thought it wasn’t worth the hassle there to solve this hassle. But if I can pass it back and forth from Writer without things getting mangled, that could work…

mikekaganski · September 29, 2018, 6:40am

As explained in the ICU User Guide’s Regular Expressions page, you can use \p{script=foo} or [:script=foo:] syntax to filter by “script” Unicode property. Combining this with information from Wikipedia, you can use regular expressions like [[:script=Han:][:script=Hiragana:][:script=Katakana:]]+ to search for these characters.

However, this won’t include the spaces and other non-specific characters into the result, which might give you less-than-ideal results.
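To make the effect of a script-based match more concrete, here is a small Python sketch that splits a mixed string into Japanese and non-Japanese runs. Python's built-in `re` module does not support script properties, so this sketch approximates them by Unicode character name prefix; that shortcut, and the names `is_japanese_char` and `split_runs`, are assumptions of this example and not how ICU decides script membership.

```python
import unicodedata
from itertools import groupby

# Name prefixes used here as a rough stand-in for the Han, Hiragana and
# Katakana script properties (an assumption of this sketch, not ICU's data).
JAPANESE_PREFIXES = ("CJK UNIFIED IDEOGRAPH", "HIRAGANA", "KATAKANA")

def is_japanese_char(ch):
    """Approximate script test based on the Unicode character name."""
    return unicodedata.name(ch, "").startswith(JAPANESE_PREFIXES)

def split_runs(text):
    """Split text into (is_japanese, run) pairs of consecutive characters."""
    return [(flag, "".join(chars))
            for flag, chars in groupby(text, key=is_japanese_char)]
```

In this sketch the ideographic comma and ordinary spaces end up in the non-Japanese runs, which is the kind of less-than-ideal result mentioned above: a Japanese sentence with punctuation is cut into several pieces.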

Lupp · September 29, 2018, 1:13pm

Thanks. Another thing I should have known.

Eriias · September 30, 2018, 9:27pm

That seems like it would be a better solution if I knew how to use it, but with more experimenting, I found that just creating a new style seemed to get rid of the odd inconsistencies I was experiencing. The styles option worked wonders after that.

mikekaganski · October 1, 2018, 6:23am

“The styles option worked wonders after that.”

Congratulations! That really is the better and correct way.

AlexKemp · August 8, 2020, 8:50am