Why do I get these strange symbols when uploading a file?

The downvote here looks unfair, so I upvote to compensate. If someone downvotes: please provide some explanation in cases where the reason isn’t obvious.

If the problem happens in ODT files

Exactly, if. I can see no proof of it. The problem may be on the server; we don’t know how files are processed there. In such a case, you can’t do much.

By the way, I tried to upload a sample ODT file produced with LO Writer 6.1.2.1 containing a single phrase:

You’re right.

It has been processed correctly.

I’ve created and uploaded another one with the same content using LO 5.4.2.7. Still processed correctly.

Can you share the file?

I have reinstalled LibreOffice. I have made a new file with different text and tried to copy and paste every way there is, not uploading to anywhere, and the problem remains. It happens wherever I copy and paste or upload, but NOT if I copy from elsewhere and paste. There is no point telling a Chrome user anything to do with LO 6 and beyond. We cannot get it. We are limited to LO 5. It cannot be the server, as the problem happens working in LO offline. I do not want to share a long file but will make and upload a short one. My current theory is that this is related to LO on a Chromebook, but I am away and cannot use other hardware to experiment.

the problem happens working in LO offline

Really? As I understand, the problem happens when you upload a file to the server.

@gabix: “and tried to copy and paste every way there is not uploading to anywhere and the problem remains”

I am beginning to think this is a Chromebook issue…

Your file already contains garbled text such as

Let’s

or

don’t

So, it’s not a conversion or encoding issue during upload. No wonder you see the problem wherever you copy and paste the text. How did you produce it? Yes, it might be a Chromebook issue (I have no experience with such an environment). Or not.

I suppose that it’s something about how LibreOffice treats clipboard data encoded as UTF-8. I remember another developer mentioning problems there the other day, saying that we only correctly handle UTF-16-encoded clipboard data (on some Apple OS? I don’t remember exactly). This might be a similar case…

Worth a bug report. Possibly with a screencast (video showing the problem; also would be great to see which different clipboard formats are offered in LO when pasting).

I doubt it is a reason for a report, at least at Bugzilla. Chrome OS is not an officially supported operating system, so the issue can hardly be qualified as a bug.

@mikekaganski: by default, stored plain text files are UTF-8 unless there’s a sequence 0xFE 0xFF or 0xFF 0xFE at the beginning to establish UTF-16 and its endianness. Consequently, if UTF-16 is the internal format, LO must convert input files at some point. After that, all internal data is UTF-16. A problem may occur when external data is passed from some app to LO via the clipboard. I don’t know how the encoding is advertised in the clipboard, if at all. Deciding between UTF-16 and UTF-8 is not obvious, especially on short sequences.
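To make the byte sequences above concrete, here is a minimal Python sketch of that kind of check. The function name `sniff_bom` and the "no BOM means UTF-8" fallback are my own assumptions for illustration; this is not how LO actually detects encodings.

```python
import codecs


def sniff_bom(data: bytes) -> str:
    """Guess an encoding label from a leading byte order mark.

    Hypothetical helper: it only looks at the first bytes and ignores
    UTF-32, whose little-endian BOM also starts with FF FE.
    """
    if data.startswith(codecs.BOM_UTF8):      # EF BB BF
        return "utf-8-sig"
    if data.startswith(codecs.BOM_UTF16_BE):  # FE FF
        return "utf-16-be"
    if data.startswith(codecs.BOM_UTF16_LE):  # FF FE
        return "utf-16-le"
    # Assumption: no BOM means UTF-8, as described in the comment above.
    return "utf-8"


print(sniff_bom(b"\xff\xfeA\x00"))
print(sniff_bom(b"plain text"))
```

Clipboard data usually carries no such marker, which is why the guess gets harder there.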

Again a very fascinating thread. The text from the example file can be converted back to readable text. Using Notepad++ with an empty new file (UTF-8), I copied and pasted a paragraph from the example file; it worked with ‘Convert to ANSI’ and afterwards changing the encoding back to UTF-8. Maybe this helps to shed some light on this mystery.
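The same round trip can be sketched in Python. I am assuming here that "ANSI" means Windows-1252 (`cp1252`), which is what Notepad++ uses on a Western European Windows setup; the function name `unscramble` is made up for this example.

```python
def unscramble(garbled: str) -> str:
    """Undo UTF-8 text that was wrongly decoded as Windows-1252.

    Step 1 ("convert to ANSI"): turn each character back into its cp1252 byte.
    Step 2 ("back to UTF-8"): read those bytes as the UTF-8 they really were.
    """
    return garbled.encode("cp1252").decode("utf-8")


# "Let" + the three stray characters (a-circumflex, euro sign, trademark) + "s"
garbled = "Let\u00e2\u20ac\u2122s"
print(unscramble(garbled))
```

This raises an error if the garbled text contains characters that do not exist in cp1252, so treat it as a rough repair tool rather than a general fix.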

I’d appreciate a clarification from @JackyAnn on what ‘upload an .odt file’ means exactly. Is it an explicit file upload of the local .odt file to prowritingaid.com, or is it copying and pasting the whole content of an .odt file into that website, which then generates an OO .odt file from it?

Heh, “fascinating thread”…

One person gets the idea that the question is about uploading. Another talks about plain-text files. Yet another advises how to reverse to readable text the utf-8-read-as-ISO-8859-1 garbage in an ODT file (!).

I need to repeat what I wrote above: a poor description confuses people, so it’s the utmost priority for the asker to try their best and describe the problem as clearly and as step-by-step as possible.

Whether the bug report should be filed against TDF or against ChromeOS is questionable; although many OSes are not officially supported, we do take patches for them, as for Haiku. So it’s better to have a report here than not to have one anywhere.

Clipboard management is strictly OS-dependent; and if I understand the question correctly, and typing “I’m OK”, copying it, and pasting back to the same LO breaks the apostrophe, it’s clearly LO-to-OS-clipboard problem… please file it.

I am lost. Do LO users not save a file as ODT? I do. Why would that confuse anyone?

A side comment: GalliumOS is a Xubuntu-based Linux distro for Chromebooks. Looks interesting and might be worth trying.

This looks like a confusion between Unicode and ISO-8859-x.

The source file is probably UTF-8 plain text. For some reason, the uploading process thought it was ISO-8859-x and converted it to Unicode giving the surprising text.

Edit your question to explain how you uploaded the text (which intermediate steps with which applications). Mention your OS, that could help to suggest tools.

EDIT 1

I’d like more technical details on the process of uploading.

How was the initial file content typed? Locally with a text editor (not a document processor like LO)?

Was this initial file uploaded via the Chrome browser? Or some other tool/protocol like ftp?

I guess that you had a plain text file (without any formatting effect like bold or italics) which was uploaded using an HTML tool. The HTTP protocol uses “headers” to describe the exchanged data. One of these headers tells the recipient the character encoding used at the source. It is ISO-8859-1 by default. When the source file is plain text, there is no marker inside it (*) to contradict this default. At the other end, this wrong encoding is remembered.

When the file is opened by LO, the byte stream is erroneously taken for ISO-8859-1 while it is in fact a UTF-8 stream. Your original text is probably “You’ve achieved” with a typographical apostrophe U+2019 RIGHT SINGLE QUOTATION MARK, whose UTF-8 encoding is 0xE2 0x80 0x99. 0xE2 is “â” in both ISO-8859-1 and Unicode, which explains the first strange character. 0x80 and 0x99 are control characters in the C1 set; they may display strangely, accounting for the other characters.
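For anyone who wants to see those bytes, here is a small Python sketch of the mix-up described above. The Windows-1252 variant at the end is my addition: on many systems the "ISO-8859-1" reading is in practice Windows-1252, which shows 0x80 and 0x99 as visible characters instead of C1 controls.

```python
apostrophe = "\u2019"  # RIGHT SINGLE QUOTATION MARK

# Encode the character as UTF-8 and show the three resulting bytes.
raw = apostrophe.encode("utf-8")
print(raw.hex(" "))  # e2 80 99

# Wrongly read the same bytes as ISO-8859-1: one character per byte.
as_latin1 = raw.decode("latin-1")
print([f"U+{ord(c):04X}" for c in as_latin1])  # ['U+00E2', 'U+0080', 'U+0099']

# Assumption: a Windows-1252 reader maps 0x80 and 0x99 to the euro sign
# and the trademark sign instead of control characters.
as_cp1252 = raw.decode("cp1252")
print([f"U+{ord(c):04X}" for c in as_cp1252])  # ['U+00E2', 'U+20AC', 'U+2122']
```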

You must find a way to force the uploading mechanism to transmit the file as a UTF-8 stream. If you can’t select the encoding in the utility, try to put a “BOM” (byte order mark or ZERO WIDTH NO-BREAK SPACE) at the start of your file.

The BOM is U+FEFF, but included as such in a UTF-8 stream, it may disrupt correct interpretation. Its UTF-8 encoding is 0xEF 0xBB 0xBF. This may be quite difficult to insert unless you have a hexadecimal editor.

(*) The only marker which can flag a plain text file is BOM appearing as the first character in the file.
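If no hexadecimal editor is at hand, a few lines of Python can prepend the three bytes. This is only a sketch for plain text files (it would corrupt an .odt, which is not plain text), and the function name `add_utf8_bom` is hypothetical.

```python
import codecs
from pathlib import Path


def add_utf8_bom(path: str) -> bool:
    """Prepend the UTF-8 BOM (EF BB BF) to a plain text file.

    Returns True if the file was changed, False if it already had a BOM.
    Assumes the file content is already valid UTF-8 plain text.
    """
    file = Path(path)
    data = file.read_bytes()
    if data.startswith(codecs.BOM_UTF8):
        return False  # nothing to do
    file.write_bytes(codecs.BOM_UTF8 + data)
    return True
```

Calling `add_utf8_bom("notes.txt")` on a hypothetical file rewrites it in place, so keep a copy of the original first.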

I uploaded from Google Chrome. It was an ODT file from LO 5, via my Chromebook.

It occurred to me that the source file was originally created in LO in ODT form, then uploaded to Google Docs as this was how I worked with my editor, then downloaded to LO, and formatting of any kind was cleared to begin formatting the document. I don’t understand all you say as it is beyond my current technical understanding, but I now get it that the download may have pulled in something from Google Docs. The same problem does not happen if the file is transferred to a different OS. I would assume that putting in a BOM would help, but I would need instructions on how to do that. However, it is an ODT, not a plain text file. I have no idea if I have a hexadecimal editor.

If the problem does not happen outside ChromeOS, forget my suggestion about BOM. It is valid only with plain text. And as you work with .odt, it will not help.

Thanks. I found it interesting to learn about anyway!

If you are using an editor or system that does not support Unicode, you will need to EXPORT your LibO file to downgrade the supported character set, for example to .txt. This will allow you to select ISO-8859-1, which should be supported by your prowritingaid. If not, you could try US-ASCII, as the system appears to only understand English. (I think we in Europe would say English-US, rather than English-GB.)
You will probably have the same problem if you COPY/PASTE to go over to Chrome.
What keyboard setting, language setup, data types does prowriting support?

LibreOffice and the Internet default to Unicode, the International standard since 1997 which supports about 138,000 characters.
Unicode includes ISO-8859-1 (1987 vintage), which corresponds to the first 256 code points and covers Western Europe including England and France. And it includes US-ASCII (1968 vintage), the first 128 characters, used in America.

As I mentioned above, if you copy and paste, the system must support Unicode. If not, you will have this problem. Also, if you EXPORT to an RTF file you must specify the character set your system understands. Otherwise the file will still be Unicode. Have you defined your Chrome system keyboard to support Unicode (UTF-8)?

It is important to understand that copy / paste from one file to another, even if both are LibreOffice .odt files, is controlled by the operating system settings, including language. The fact that both files use Unicode (UTF-8) is overruled by your operating system. A bit like having a colour camera and television but black and white film. I assume that one of your problems is there. Language is also important. If you are in Europe and use the € (Euro) sign, for example, you would have the same type of problem. This is why I asked what keyboard and language settings you have defined. You may not have set these things yourself, but they are important to sort out your problem, which must be frustrating for you.

By the way, most of my users are not programmers.

==========================
Updated to consider the Apostrophe and Quotation Mark situation

One thing adds slightly to this conversion problem and shows why it is important to know the operating system language, the LibreOffice language setting and the language used for cutting and pasting. To put it another way, what you think you are typing is not actually what you get.
I quote from the Unicode Manual.
“Most keyboard layouts support only the U+0022 ( " ) QUOTATION MARK therefore word processors commonly offer a facility for automatically converting the U+0022 ( " ) QUOTATION MARK to a contextually selected curly quote glyph.”
These conversions are language dependent. So, for example, English and Dutch are not the same as Danish and Finnish or French. Also, the apostrophe used in publishing is often “improved” by upgrading it to a quotation mark. As the old US-ASCII and ISO-8859-1 support only U+0022 and U+0027, you can see this leads to the problems highlighted in this question.
Apostrophe U+0027 ( ' )
Quotation Mark U+0022 ( " )
Left Single Quotation Mark U+2018 ( ‘ )
Right Single Quotation Mark U+2019 ( ’ )
Left Double Quotation Mark U+201C ( “ )
Right Double Quotation Mark U+201D ( ” )

On the Linux and Windows versions of LibreOffice you can display the character code by using ALT-X.
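Where ALT-X is not available, the same lookup can be done with a few lines of Python using the standard `unicodedata` module. This is just a sketch; the helper name `describe` is made up for the example.

```python
import unicodedata


def describe(ch: str) -> str:
    """Return the code point and official Unicode name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


# The straight and curly quote characters discussed above.
for ch in "'\"\u2018\u2019\u201c\u201d":
    print(describe(ch))
```

Pasting a suspicious character from a document into the string shows whether it is really a plain apostrophe or one of the curly variants.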

You do not have to be one to think like one. I did edit to explain I was talking of a mindset. I am now pretty sure this is a Chromebook issue. My theory is Google wants to force us all to Google docs…