LibreOffice: convert Writer document to text on Windows and Linux

When I run the following command on Windows:
soffice.exe --headless --convert-to txt:Text --outdir C:\C#\odt2txt "R:\Greek\Greek Lemon Roast Chicken.odt"

The UTF-8 characters in the output text are corrupted.

When I run the following command on Linux:
libreoffice --headless --convert-to txt:Text --outdir C:\C#\odt2txt "R:\Greek\Greek Lemon Roast Chicken.odt"

The UTF-8 characters are correct. Changing the Region setting on Windows to UTF-8 made no difference. When I run LibreOffice on Linux from Python, the output text files are correct. When I run LibreOffice on Linux from .NET (C#), the text output is corrupted.

Why is the text corrupted on Windows and under C#? I much prefer programming in C#, but will use Python if I have to.

Thank you.

I was using Notepad++, which for some reason did not work with UTF-8. After changing the Windows region to UTF-8, the characters display normally in Notepad.
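If the encoding of the converted file is in doubt, it may help to request it explicitly instead of relying on the platform default. The sketch below assumes the `Text (encoded)` export filter with the `UTF8` filter option suits this case; the helper names `build_convert_command` and `convert_and_read` and the example paths are hypothetical, and nothing is executed unless `convert_and_read` is called on a machine with LibreOffice installed.

```python
import subprocess
from pathlib import Path


def build_convert_command(soffice, source, outdir):
    """Return an argument list asking LibreOffice for UTF-8 text output."""
    return [
        str(soffice),
        "--headless",
        # "Text (encoded)" accepts a filter option; "UTF8" selects the encoding.
        "--convert-to", "txt:Text (encoded):UTF8",
        "--outdir", str(outdir),
        str(source),
    ]


def convert_and_read(soffice, source, outdir):
    """Run the conversion, then decode the result explicitly as UTF-8."""
    subprocess.run(build_convert_command(soffice, source, outdir), check=True)
    target = Path(outdir) / (Path(source).stem + ".txt")
    # "utf-8-sig" also drops a leading BOM if one was written.
    return target.read_text(encoding="utf-8-sig")


# Building the command only; this line does not start LibreOffice.
cmd = build_convert_command("soffice", "Greek Lemon Roast Chicken.odt", "out")
```

Passing the arguments as a list avoids shell quoting problems with the spaces in the file name. Whatever language launches the process, the step that decodes the file should name the encoding rather than use the system default.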

In which way? What is the original text, and what exactly (byte values) is saved? Could it be that it saves in UTF-16 instead of UTF-8, because that is the default text encoding for Unicode on Windows? (I am not using Windows myself, so I can't really tell.)
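One way to answer the byte-value question is to look at the raw bytes of the output file rather than at how an editor renders them. The following is a minimal sketch; the function names `sniff_bom` and `hex_dump` are illustrative, and the BOM check only covers UTF-8 and UTF-16 (a UTF-32 file would be misreported).

```python
import codecs


def sniff_bom(data: bytes) -> str:
    """Guess the encoding from a leading byte order mark, if there is one."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"
    return "unknown (no BOM)"


def hex_dump(data: bytes, limit: int = 32) -> str:
    """Return the first `limit` bytes as space-separated hex pairs."""
    return data[:limit].hex(" ")


# To inspect a real file, read it in binary mode first, for example:
#   data = open("output.txt", "rb").read()   # hypothetical file name
sample = "é".encode("utf-8")
print(sniff_bom(sample), "|", hex_dump(sample))
```

If the dump shows `c3 a9` where an é is expected, the file is valid UTF-8 and the problem lies in whatever is displaying or decoding it.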

UTF-8 is a compatible subset of UTF-16, so it should make no difference. The corrupted Unicode characters are

U+00E9, U+2309, U+2303.

No, it absolutely is not. UTF-8 and UTF-16 are different encodings of Unicode characters: UTF-16 uses 16-bit (two-byte) code units, hence the name, and UTF-8 uses 8-bit (one-byte) code units. In either encoding a character may take more than one code unit, and the resulting byte sequences differ. Both encodings can encode the full Unicode character range.
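The difference is easy to see by encoding the three code points mentioned in this thread both ways. This is a small illustrative sketch; the helper name `encodings_of` is made up for the example.

```python
def encodings_of(code_point: int) -> dict:
    """Return the UTF-8 and UTF-16 (little-endian) bytes of one code point as hex."""
    ch = chr(code_point)
    return {
        "utf-8": ch.encode("utf-8").hex(" "),
        "utf-16-le": ch.encode("utf-16-le").hex(" "),
    }


for cp in (0x00E9, 0x2309, 0x2303):
    enc = encodings_of(cp)
    print(f"U+{cp:04X}  utf-8: {enc['utf-8']:<9} utf-16-le: {enc['utf-16-le']}")
```

U+00E9 becomes `c3 a9` in UTF-8 but `e9 00` in little-endian UTF-16, so a program that reads one encoding as the other will show garbage.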

Notepad++ works with both UTF-8 and UTF-16; maybe the auto-detect was misleading. Was a BOM set?