
Chinese text file doesn't display correctly on opening in Writer [closed]

asked 2013-04-15 10:24:07 +0200

emulti

updated 2015-08-28 10:02:55 +0200

Alex Kemp

When opening a Chinese Traditional text file with Writer on LO 4.0.2.2 Linux, the characters don't display correctly, even if I 'select all' and change the font to a Chinese font such as SimSun or SimHei.

The file opens correctly in GEdit and Chinese text displays, using the system default font 'Monospace', which is DejaVu Sans Mono.

Initially I thought that this was because Chinese fonts were not correctly installed. But the following happens too:

  • If I copy and paste all the text from Gedit into a Writer document, Chinese characters do display correctly, apparently in 'Times New Roman'.

  • If the text file is imported into a Calc spreadsheet, the 'encoding' is recognised as 'Unicode', and the characters do display correctly.

Asian language options are selected in Tools, Options, Language Settings. However, this doesn't seem to matter: on LO 4.0.2 with Windows, the file can be opened and displayed with Writer even though these options are not selected! So this seems to be related to LO configuration on Linux.

A sample file is attached. It is a .zip file (to preserve the original file), but I had to give it a fake .jpg extension name, as .zip is not an allowed extension type!

New sample (the original, unedited file as received) is attached: C:\fakepath\fullsamplefile.zip.jpg

$ file Full\ sample\ Chinese.txt
Full sample Chinese.txt: Little-endian UTF-16 Unicode text, with CRLF, CR line terminators
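
The "with CRLF, CR line terminators" part of that output means the file mixes two kinds of line ending. A minimal Python sketch for checking this on already-decoded text follows; the function name `count_line_endings` is made up for illustration, and it has not been run against the attached sample.

```python
import re


def count_line_endings(text: str) -> dict:
    """Count CRLF, bare CR and bare LF line endings in decoded text."""
    counts = {"CRLF": 0, "CR": 0, "LF": 0}
    names = {"\r\n": "CRLF", "\r": "CR", "\n": "LF"}
    # Alternation order matters: "\r\n" must be tried before a lone "\r".
    for match in re.finditer(r"\r\n|\r|\n", text):
        counts[names[match.group()]] += 1
    return counts
```

The input has to be decoded first (for example with `data.decode("utf-16")`), because in UTF-16 each line-ending character occupies two bytes.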


Closed by Alex Kemp for the following reason: question is not relevant or outdated
Close date: 2015-10-30 17:33:13.065841

Comments

@emulti – Upvote = 10 "karma points" to upload a sample file ...

manj_k (2013-04-15 11:38:53 +0200)

@emulti this is possibly an encoding issue similar to that which we are attempting to sort out in this thread. As @manj_k indicates, a sample file would be a great help.

oweng (2013-04-15 11:48:15 +0200)

1 Answer


answered 2013-04-16 01:34:29 +0200

oweng

updated 2013-04-18 16:23:01 +0200

Thanks for the example file. Your problem is one of character encoding. Here is a quick test:

$ file -bi Chinese\ sample.txt 
text/plain; charset=iso-8859-1

ISO-8859-1 is an 8-bit encoding that supports only a selection of European languages. I recommend you use UTF-8 in your files to obtain Chinese support. Here is an example file (ODT), taken from the Chinese version of the linked Wikipedia page, that is encoded using UTF-8.

To be clear, the text does not display Chinese characters here under gedit.

EDIT: That second example is now showing up as UTF-8 and displaying the Chinese characters as expected. I think this confirms it was a character encoding issue.

2nd EDIT: The third example (as indicated in the comments below) is now UTF-16LE encoded. Using either the "Select which types of files are shown" pull-down list or the File Type selection when opening the text file allows correct display within Writer (after choosing the appropriate filter options).
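
If converting the file is preferable to choosing the filter each time, the UTF-8 recommendation above can be applied to a UTF-16 file with a few lines of Python. This is only a sketch under assumptions: the function name `utf16_to_utf8` is made up for illustration, the input is assumed to start with a byte order mark, and it has not been tested against the attached sample.

```python
def utf16_to_utf8(data: bytes) -> bytes:
    """Re-encode UTF-16 bytes (with a byte order mark) as UTF-8."""
    # The "utf-16" codec reads the byte order mark to choose between
    # little-endian and big-endian, and drops it from the decoded text.
    text = data.decode("utf-16")
    # Normalise CRLF and bare CR line endings to LF (an optional step,
    # assumed here because the sample mixes both).
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    return text.encode("utf-8")
```

It could be applied to a file with something like `Path("out.txt").write_bytes(utf16_to_utf8(Path("in.txt").read_bytes()))`, using `pathlib.Path`; both file names are placeholders.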


Comments

I think that maybe Gedit (which I used to extract the sample) is saving with that encoding. I have added the original file as received into the zip file (named as .jpg) instead.

$ file Full\ sample\ Chinese.txt
Full sample Chinese.txt: Little-endian UTF-16 Unicode text, with CRLF, CR line terminators

emulti (2013-04-16 11:45:13 +0200)

Possibly the issue is one of character encoding, but in that case, why does the file open correctly on LO Windows, but not in LO Linux? According to the 'file' command, it is "Little-endian UTF-16 Unicode text, with CRLF, CR line terminators"

emulti (2013-04-18 09:30:38 +0200)

Some progress: double-clicking on the file opens it in Writer as 'text', and the Chinese characters are not shown correctly. However, if File > Open is used and the 'file type' filter is set manually to 'Text Encoded', then it's possible to select 'Unicode' as the Character Encoding (and also whether the Line Separator includes Line Feed, as Windows text files do). In this case, the file is displayed correctly. The 'Byte Order Mark' (the first 2 bytes of the file) is not automatically recognised on LO Linux.
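
For anyone wanting to check the byte order mark by hand, here is a rough Python sketch of the kind of detection involved. It is not how LibreOffice does it; `sniff_bom` and `_BOMS` are names invented for this example, and the returned values are the Python codec names that consume the mark when decoding.

```python
import codecs

# Longer marks are checked first, because the UTF-32 LE mark (FF FE 00 00)
# begins with the UTF-16 LE mark (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_bom(head: bytes):
    """Return a codec name if `head` starts with a known BOM, else None."""
    for bom, codec in _BOMS:
        if head.startswith(bom):
            return codec
    return None
```

Passing the first four bytes of the file is enough, e.g. `sniff_bom(open(path, "rb").read(4))`, where `path` is a placeholder. A result of `None` only means no mark was found, not that the file is in an 8-bit encoding.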

emulti (2013-04-18 09:58:50 +0200)

Confirmed. Sorry, my initial reply / test was erroneously done using an old version of LO. GEdit opens the file fine via double-click; however, LO v4.0.2.2 relies on the filter facility to determine the encoding. The default behaviour (apparently now) is to ignore the encoding information, or at least to rely on it being set manually via the File Display / File Type pull-down list. I've updated the answer above to reflect these findings. I used File Display rather than File Type, but the result is the same.

oweng (2013-04-18 15:54:56 +0200)
