
Converting odt to txt

asked 2016-02-15 09:33:19 +0100

InvisibleUn1corn

I am trying to convert an .odt file that contains Hebrew letters to .txt, and I get a .txt file with lots of question marks instead of the actual letters. How do I fix this? Do I need to change the encoding?


4 Answers


answered 2016-02-20 18:33:45 +0100

petermau

A text (.txt) file contains no formatting information apart from the line-end markers at the end of each line: Carriage Return plus Line Feed (CR LF) on Windows, or Line Feed (LF) alone on Linux. It records nothing about the character set, font, language or country, so it is up to the user to make sure the file is written and read with matching settings. LibO uses Unicode (UTF-8), which is the international standard used on the Internet and also the default on Linux systems.
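To make the "no information on character set" point concrete, here is a small Python sketch (Python is our choice for illustration; nothing in the thread uses it). It shows that plain ASCII letters are stored identically in several encodings, that a Hebrew letter is stored as different bytes in UTF-8 and ISO-8859-8, and that one and the same byte reads as a different letter depending on which encoding the reader assumes:

```python
# Compare how the same characters are stored under different encodings.
latin = "S"
shin = "\u05e9"  # HEBREW LETTER SHIN

for enc in ("utf-8", "iso-8859-1", "iso-8859-8"):
    print(enc, latin.encode(enc))  # b'S' in all three

print(shin.encode("utf-8"))       # b'\xd7\xa9' (two bytes)
print(shin.encode("iso-8859-8"))  # b'\xf9' (one byte)

# ISO-8859-1 has no Hebrew letters, so encoding shin there fails.
try:
    shin.encode("iso-8859-1")
except UnicodeEncodeError as exc:
    print("iso-8859-1 cannot store it:", exc.reason)

# A .txt file carries no encoding label, so the byte 0xF9 becomes whatever
# the reader assumes: shin in ISO-8859-8, u with grave accent in ISO-8859-1.
print(b"\xf9".decode("iso-8859-8"), b"\xf9".decode("iso-8859-1"))
```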

The character set used by Windows, for example, partly depends on the country. Unicode is compatible with the American US-ASCII (1968) standard and ISO 8859-1 (1987): its first 128 code points match US-ASCII and its first 256 match ISO 8859-1. Country-specific sets such as ISO-8859-8 (Hebrew) or ISO-8859-9 (Turkish) are not compatible with Unicode in this way. The .odt file can represent any of the roughly 111,000 characters in common use, Hebrew included, so it contains everything needed to re-create the text. LibO Writer expects a text file to be in Unicode (or US-ASCII or ISO 8859-1) format. If you store a text file in ISO-8859-8, the Hebrew letters will not be recognized unless the program reading it defaults to that character set. Here are three examples:

Shoshi קשםאצץ (.odt original, Unicode UTF-8): the original file.

Shoshi קשםאצץ (TXT, default Unicode UTF-8): the .odt file stored as Unicode TXT and reopened.

Shoshi ������ (TXT, Hebrew ISO-8859-8): the .odt file stored as ISO-8859-8 and reopened.
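The third example can be reproduced outside LibreOffice with a few lines of Python. This is a sketch of the mismatch only, assuming the reader falls back to UTF-8 (LibreOffice's own import dialog is not modelled): the text is saved as ISO-8859-8, then decoded once as UTF-8 with `errors="replace"` and once with the correct character set:

```python
# Reproduce the third example: save as ISO-8859-8, reopen as UTF-8.
text = "Shoshi \u05e7\u05e9\u05dd\u05d0\u05e6\u05e5"  # "Shoshi קשםאצץ"

stored = text.encode("iso-8859-8")                  # the bytes the .txt holds
misread = stored.decode("utf-8", errors="replace")  # reader assumes UTF-8
correct = stored.decode("iso-8859-8")               # reader uses the right charset

print(misread)          # Hebrew letters become U+FFFD replacement characters
print(correct == text)  # True
```

The Latin part survives either way because it is plain ASCII; only the Hebrew bytes are invalid as UTF-8 and get replaced.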

I am not certain what steps you went through to get your question marks. Why are you using a text file, and which encoding is it in? The more information you give us, the more we can help, but this is a start.


Comments

Good point. I have given a similar example for MacOS in this question.

oweng (2016-02-21 08:31:08 +0100)

answered 2016-02-15 18:10:44 +0100

oweng

First check the font used in the document for the Hebrew characters. Make sure it is a valid Unicode-conformant font that is not using the Private Use Area (PUA) for encoding the Hebrew characters.

The Text export file format is UTF-8, so it should support Hebrew characters. Perhaps run a test with one of the Text Encoded file formats that supports Hebrew characters.
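If the font does map Hebrew into the Private Use Area, the exported text will contain PUA code points instead of real Hebrew letters. A hypothetical helper like the one below (not a LibreOffice feature; the name `find_pua` and the sample string are ours) lists any such characters in text read from the export. It checks only the Basic Multilingual Plane PUA, U+E000 to U+F8FF, which is where symbol-style fonts usually place glyphs:

```python
# Hypothetical helper: list Private Use Area code points in exported text.
def find_pua(text: str) -> list[tuple[int, str]]:
    """Return (index, 'U+XXXX') for each character in the BMP Private Use Area."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if 0xE000 <= ord(ch) <= 0xF8FF
    ]

sample = "\u05e9\u05dc\u05d5\u05dd \uf0e9"  # real Hebrew letters plus one PUA character
print(find_pua(sample))  # [(5, 'U+F0E9')]
```

An empty list means the Hebrew letters in the sample are genuine Unicode Hebrew (U+05D0 to U+05EA) rather than font-specific PUA codes.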


answered 2016-02-15 16:28:02 +0100

paul1149

I would try changing the encoding in the txt document.
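If the .txt file was already saved in ISO-8859-8, it can be re-encoded to UTF-8 instead of being exported again. A minimal Python sketch, assuming the source really is ISO-8859-8 (the function name `reencode` and the file names are hypothetical). It works on raw bytes so that line endings are left exactly as they were:

```python
import tempfile
from pathlib import Path


def reencode(src: Path, dst: Path, src_encoding: str = "iso-8859-8") -> None:
    """Decode src from src_encoding and write the same text to dst as UTF-8."""
    data = src.read_bytes()
    # Strict decoding: raises UnicodeDecodeError if src is not in src_encoding.
    text = data.decode(src_encoding)
    dst.write_bytes(text.encode("utf-8"))


# Usage with throwaway files (hypothetical names).
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "hebrew-iso.txt"
    dst = Path(tmp) / "hebrew-utf8.txt"
    src.write_bytes("\u05e9\u05dc\u05d5\u05dd\n".encode("iso-8859-8"))
    reencode(src, dst)
    result = dst.read_text(encoding="utf-8")

print(result)
```

Strict decoding is deliberate: if the guess about the original encoding is wrong, the script stops with an error rather than silently writing question marks or replacement characters.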


answered 2019-06-28 19:31:51 +0100

norma

Please provide a guide to converting .odt to .txt. Thank you.


Comments

RTM. That's it. And don't answer a question unless it is an answer.

gabix (2019-06-28 21:05:21 +0100)

Stats


Seen: 1,208 times

Last updated: Jun 28 '19