proper export of weird text data

Neverman · June 30, 2019, 11:35am

Sorry, I just don’t know how to describe my problem better.
To show the correct letters of my foreign-language vocabulary set, I have to change the font of the column in question. For example:
In Arial (UTF-8, I suppose) the cell shows “la2r+S4F”, and only when I change the cell font to “Shalom Old Style” do I get the desired יִשְׂרָאֵל.
Now I want to import this data into my vocabulary program, but whenever I export or save in a different file format, the Hebrew column changes back to the mixed-up UTF-8 letters. Is there a way to keep both the Hebrew and the Latin characters when exporting?

EDIT: Windows 8.1 N, LibreOffice 5.3.4.2. I have the file in several formats: .ods, .xml, .csv, .xhtml, etc.
I use Anki. I save in different formats and try importing them into Anki, or open them in Editor or Notepad++. In no case does it show me the Hebrew letters. I don’t get it.

EDIT2: I should have mentioned that I extracted the data from another vocabulary program into a .txt file and imported that into LO. By then the Hebrew letters were already converted to UTF-8, and I could only make them appear correctly by setting the column font to “Shalom Old Style”. It’s the only way it shows Hebrew at all. What I’m looking for is a way to merge these two formats into UTF-8, i.e. to convert the Hebrew ones to their corresponding UTF-8/Unicode characters. On screen the data is fine; I just can’t use it elsewhere, which sucks.

ajlittoz · June 30, 2019, 12:31pm

Since LO is Unicode-based, you should have no problem, provided the base file is itself Unicode. In which format do you save your file: .ods or some foreign format like .xls(x)?

Edit your question (don’t use an answer reserved for solution) to provide more information: OS, LO version, saved file format. Also describe your working procedure for exporting data (and mention the “vocabulary program” in case it is known).

ajlittoz · June 30, 2019, 3:31pm

I made a quick test in Calc. I inserted Hebrew letters in a cell without changing my default font (Tex Gyre Hermes). Since I used a “character selector”, I am sure the letters were in the U+05D0-U+05EA Hebrew section of Unicode.
They display correctly without any “trick”. I typed standard Latin words in nearby cells.
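
A quick way to check which code points a cell really contains is to paste its text into a small script. This is only a sketch using Python's standard `unicodedata` module; the helper names `describe` and `is_hebrew_letter` are made up for illustration.

```python
import unicodedata


def describe(text):
    """Return (code point, Unicode name) for every character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in text]


def is_hebrew_letter(ch):
    """True if ch lies in the Hebrew letter range U+05D0 (ALEF) .. U+05EA (TAV)."""
    return 0x05D0 <= ord(ch) <= 0x05EA


if __name__ == "__main__":
    # Real Hebrew text versus the Latin string from the question.
    for sample in ("\u05d0\u05dc", "la2r+S4F"):
        for code, name in describe(sample):
            print(code, name)
        print("all Hebrew letters:", all(is_hebrew_letter(ch) for ch in sample))
```

If the second sample is what the cell holds, the "Hebrew" only exists in the font's glyphs, not in the data.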

Anki official site says that LO Calc spreadsheet content can be exported as plain text in CSV format.

Therefore, the main question is: are you sure your Windows 8 is configured for Unicode and not for some Windows codepage (which could explain why you must change the font to see Hebrew chars)? Check how you type or otherwise import Hebrew words into Calc.

ajlittoz · July 1, 2019, 3:00pm

The first step is to make sure your extracted .txt file really contains correct UTF-8. Try to find a hexadecimal editor/dumper for Windows. In UTF-8, each Hebrew letter is a 2-byte sequence: 0xD7 0x90 for ALEF through 0xD7 0xAA for TAV. If the initial file doesn’t contain these sequences, none of the later steps can work correctly.
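
If no hex editor is at hand, the same check can be sketched in a few lines of Python. The function names here are hypothetical, and the sample bytes are built in memory; for a real file you would read it in binary mode (for example `open(path, "rb").read()`, where `path` is wherever your extraction file lives).

```python
def count_hebrew_utf8_letters(data: bytes) -> int:
    """Count the 2-byte sequences 0xD7 0x90 .. 0xD7 0xAA (ALEF .. TAV)."""
    count = 0
    for i in range(len(data) - 1):
        if data[i] == 0xD7 and 0x90 <= data[i + 1] <= 0xAA:
            count += 1
    return count


def hex_dump(data: bytes, width: int = 16) -> str:
    """Format bytes as offset plus hex pairs, one row per `width` bytes."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02X}" for b in chunk)
        lines.append(f"{offset:08X}  {hex_part}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Five real Hebrew letters (yod, shin, resh, alef, lamed) encoded as UTF-8.
    real_hebrew = "\u05d9\u05e9\u05e8\u05d0\u05dc".encode("utf-8")
    # The Latin string from the question, which only looks Hebrew in a special font.
    font_trick = "la2r+S4F".encode("utf-8")

    for sample in (real_hebrew, font_trick):
        print(hex_dump(sample))
        print("Hebrew letters found:", count_hebrew_utf8_letters(sample))
```

A count of zero on the Hebrew column means the file holds plain Latin characters, whatever the font makes them look like.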

petermau · July 1, 2019, 2:00pm

There are several parts to this problem.

The data need to be Unicode. When you create this in LibreOffice it is in Unicode.
The font you use needs to have Hebrew Characters in it. If not there will be substitute characters, one for each Hebrew character.
You then need to store the spreadsheet as a native .ods file.
The user reading this file must have the Hebrew font available when reading the .ods file.

If you change to other file types, you will usually lose the Hebrew, for example when you copy and paste on a Windows system.
If each Hebrew character shows up as the raw bytes of its UTF-8 encoding (two bytes per Hebrew letter) instead of the letter itself, the encoding information has been lost. For example, .csv, .txt, .doc, etc. do not have the encoding parameters embedded in the file.

This is why, when you EXPORT or IMPORT files out of or into LibO, you are given the option to define the character set manually. Otherwise you will lose the Hebrew characters.
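
To see why the character set has to be stated explicitly, here is a small sketch in Python (not what LibO does internally, just an illustration). The same CSV bytes are read back twice: once as UTF-8 and once as Latin-1, which I picked only as an example of a wrong choice. The helper names are made up.

```python
import csv
import io


def write_csv_bytes(rows, encoding="utf-8"):
    """Serialize rows to CSV text, then encode with the given character set."""
    buffer = io.StringIO(newline="")
    csv.writer(buffer).writerows(rows)
    return buffer.getvalue().encode(encoding)


def read_csv_bytes(data, encoding):
    """Decode bytes with the given character set, then parse as CSV."""
    text = data.decode(encoding)
    return list(csv.reader(io.StringIO(text, newline="")))


if __name__ == "__main__":
    rows = [["israel", "\u05d9\u05e9\u05e8\u05d0\u05dc"]]
    data = write_csv_bytes(rows, encoding="utf-8")

    right = read_csv_bytes(data, "utf-8")
    wrong = read_csv_bytes(data, "latin-1")

    print(len(right[0][1]), "characters when read as UTF-8")
    print(len(wrong[0][1]), "characters when read as Latin-1")
```

The bytes in the file are identical in both cases; only the character set chosen at import decides whether five Hebrew letters or ten junk characters appear.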

Have you looked at any other questions on Hebrew use in LibO? For example, “Hebrew turns into rectangles”.

Hope this helps Peter.

Neverman · July 1, 2019, 2:56pm

Well, you helped me describe my problem better, because in fact the Hebrew letters are not ‘real’ Hebrew letters. They only work with this specific font, which merely changes the displayed symbols, like an aleph for an a and so on. I did, however, find a way to use that broken data in Anki nonetheless: I just set the font for the Hebrew words in Anki as well, and voilà.

ajlittoz · July 1, 2019, 3:58pm

Then your initial data is not Unicode!
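
Converting such font-encoded text to real Unicode then comes down to a lookup table from each Latin character to the Hebrew character the font draws for it. A rough sketch in Python follows. The table is hypothetical: only "a" for aleph is mentioned in this thread, the other entries are guesses for illustration, and the real table has to be read off the font's character map. Whether the string must also be reversed (left-to-right storage versus right-to-left Hebrew) is likewise an assumption, so it is left as an option.

```python
# HYPOTHETICAL mapping, NOT the verified layout of "Shalom Old Style".
HYPOTHETICAL_MAP = {
    "a": "\u05d0",  # ALEF (the thread mentions "an aleph for an a")
    "l": "\u05dc",  # LAMED (assumed)
    "r": "\u05e8",  # RESH (assumed)
}


def legacy_to_unicode(text, mapping, reverse=False, unknown="\ufffd"):
    """Map each legacy character to Unicode; unmapped ones become `unknown`."""
    converted = [mapping.get(ch, unknown) for ch in text]
    if reverse:
        # Assumption: the legacy text may be stored in visual (left-to-right) order.
        converted.reverse()
    return "".join(converted)


if __name__ == "__main__":
    result = legacy_to_unicode("lar", HYPOTHETICAL_MAP, reverse=True)
    print([f"U+{ord(ch):04X}" for ch in result])

    # Unmapped characters are flagged so gaps in the table are easy to spot.
    flagged = legacy_to_unicode("la2r", HYPOTHETICAL_MAP)
    print(flagged.count("\ufffd"), "unmapped character(s)")
```

Once the table is complete, the converted column can be saved as UTF-8 and no longer depends on the special font.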

Neverman · July 1, 2019, 3:43pm