Converting odt to txt

InvisibleUn1corn · February 15, 2016, 8:33am

I am trying to convert an .odt file that contains Hebrew letters to .txt, and I get a .txt file with lots of question marks instead of the actual letters. How do I fix this? Do I need to change the encoding?

petermau · February 20, 2016, 5:33pm

A text (.txt) file contains no formatting information except for Carriage Return (CR) and Line Feed (LF) at the end of each record. Thus there is no information on character set, font, language or country. Matching these characteristics when creating and reading the file is down to the user. LibO uses Unicode (UTF-8), which is the international standard used by the Internet and also the default for Linux systems.

The character set used by Windows, for example, is partially country dependent. UTF-8 is byte-compatible with the American US-ASCII (1968) standard, and the first 256 Unicode code points match ISO 8859-1 (1987), although UTF-8 stores the characters above 127 as two bytes. Country-specific variants such as ISO-8859-8 (Hebrew) or ISO-8859-9 (Turkish) are not byte-compatible with UTF-8. The .odt file contains all the information needed to re-create a file with any of the characters in common use (about 111,000), which includes Hebrew. LibO Writer expects the file to be in Unicode (or US-ASCII or ISO 8859-1) format. If you store a text file in ISO-8859-8, the Hebrew letters will be unreadable unless the reader defaults to that character set. So here are three examples:

Shoshi קשםאצץ .odt original Unicode (UTF-8) - the original file.

Shoshi קשםאצץ TXT default Unicode (UTF-8) - The .odt file stored as Unicode TXT and reread.

Shoshi ��- TXT using Hebrew (ISO-8859-8) - The .odt file stored as ISO-8859-8 and reread.
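The three cases above can be reproduced outside LibO with a minimal Python sketch. Python is not part of the original thread, and the helper name round_trip is made up for illustration. It writes the same string with one encoding and reads it back with another, replacing anything that cannot be represented or decoded.

```python
# Sample text taken from the examples above.
text = "Shoshi קשםאצץ"


def round_trip(s, write_encoding, read_encoding):
    """Encode s as a file writer would, then decode it as a reader would.

    Characters that cannot be encoded, or bytes that cannot be decoded,
    are replaced instead of raising an error.
    """
    raw = s.encode(write_encoding, errors="replace")
    return raw.decode(read_encoding, errors="replace")


# Written and read as UTF-8: the Hebrew letters survive.
same = round_trip(text, "utf-8", "utf-8")

# Written as ISO-8859-8 but read as UTF-8: the Hebrew bytes are not
# valid UTF-8, so the reader substitutes replacement characters.
mismatch = round_trip(text, "iso8859_8", "utf-8")

# Written with an encoding that has no Hebrew letters at all: the
# writer substitutes a question mark for each one, and the letters are lost.
ascii_only = round_trip(text, "ascii", "ascii")

print(same)
print(mismatch)
print(ascii_only)
```

The last case is probably the closest to the question marks in the original question: once the letters have been written as "?", no change of encoding on the reading side can bring them back, so the file has to be exported again from the .odt.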

Now, I am not certain what steps you went through to get your question marks, but why are you using a text file, and what format is it in? The more information we have, the more we can help, but this is a start.

oweng · February 21, 2016, 7:31am

Good point. I have given a similar example for MacOS in this question.

paul1149 · February 15, 2016, 3:28pm

I would try changing the encoding in the txt document.

oweng · February 15, 2016, 5:10pm

First check the font used in the document for the Hebrew characters. Make sure it is a valid Unicode-conformant font that is not using the Private Use Area (PUA) for encoding the Hebrew characters.

The Text export file format is UTF-8 and so should support Hebrew characters. Perhaps perform a test using one of the Text Encoded file formats that supports Hebrew characters?
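As a rough way to see which encodings are worth testing, here is a small Python sketch (not LibO itself; the sample string and the helper name supports are assumptions for illustration). It checks whether every character of a sample can be stored in each candidate encoding, using Python's own codec names, which may differ from the names shown in the LibO export dialog.

```python
# Hypothetical sample standing in for the document's contents.
sample = "Shoshi קשםאצץ"

# Candidate encodings to try, given as Python codec names.
candidates = ["utf-8", "utf-16", "iso8859_8", "cp1255", "latin-1", "ascii"]


def supports(text, encoding):
    """Return True if every character of text can be stored in encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


results = {enc: supports(sample, enc) for enc in candidates}

for enc, ok in results.items():
    print(f"{enc:10} {'ok' if ok else 'loses characters'}")
```

An encoding that passes this check can hold the Hebrew letters, but the program that later opens the .txt file still has to be told, or has to guess, the same encoding.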

norma · June 28, 2019, 5:31pm

Please provide a guide to converting .odt to .txt. Thank you.

gabix · June 28, 2019, 7:05pm

RTM. That’s it. And don’t answer a question unless it is an answer.