Why do I get these strange symbols when uploading a file?

This looks like a confusion between UTF-8 and ISO-8859-x.

The source file is probably UTF-8 plain text. For some reason, the uploading process thought it was ISO-8859-x and converted it to Unicode on that assumption, giving the surprising text.

Edit your question to explain how you uploaded the text (which intermediate steps, with which applications). Mention your OS; that could help us suggest tools.


I’d like more technical details on the process of uploading.

How was the initial file content typed? Locally with a text editor (not a document processor like LO)?

Was this initial file uploaded via the Chrome browser? Or with some other tool or protocol, like FTP?

I guess that you had a plain text file (without any formatting such as bold or italics) which was uploaded using a web (HTTP) tool. The HTTP protocol uses “headers” to describe the exchanged data. One of these headers tells the recipient the character encoding used at the source. Historically, it is ISO-8859-1 by default for text. When the source file is plain text, there is no marker inside it (*) to contradict this default. At the other end, this wrong encoding is remembered.
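To illustrate the header mechanism, here is a minimal Python sketch of how a recipient might read the declared encoding from a `Content-Type` header value and fall back to ISO-8859-1 when none is given. The function name `declared_charset` and the sample header values are only illustrative assumptions; I do not know what the actual upload tool does.

```python
from email.message import Message


def declared_charset(content_type, default="iso-8859-1"):
    """Return the charset named in a Content-Type header value.

    Falls back to `default` when the header carries no charset parameter,
    which mimics the historical ISO-8859-1 assumption for text.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset(default)


# Hypothetical header values, as a sender might transmit them.
print(declared_charset("text/plain; charset=UTF-8"))  # charset is declared
print(declared_charset("text/plain"))                 # nothing declared: default
```

If the sender omits the `charset` parameter, the recipient has nothing but the default to go on, which is how a UTF-8 file can end up labelled as ISO-8859-1.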

When the file is opened by LO, the byte stream is erroneously taken for ISO-8859-1 while it is in fact a UTF-8 stream. Your original text is probably “You’ve achieved” with a typographical apostrophe U+2019 RIGHT SINGLE QUOTATION MARK, whose UTF-8 encoding is 0xE2 0x80 0x99. 0xE2 is “â” both in ISO-8859-1 and Unicode, which explains the first strange character. 0x80 and 0x99 are control characters in the C1 set; they may display strangely, accounting for the other characters.
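The misreading can be reproduced in a few lines of Python. This is only a sketch: the sample text is my guess at the original, and the Windows-1252 decoding is an assumption about what some displays do with the bytes 0x80 and 0x99 instead of treating them as C1 controls.

```python
# Guess at the original text, with U+2019 as the apostrophe.
original = "You\u2019ve achieved"
utf8_bytes = original.encode("utf-8")

# The three bytes that encode the typographical apostrophe.
apostrophe_bytes = "\u2019".encode("utf-8")
print(" ".join(f"0x{b:02X}" for b in apostrophe_bytes))

# Strict ISO-8859-1: 0xE2 becomes "â", 0x80 and 0x99 become C1 controls.
misread_latin1 = utf8_bytes.decode("iso-8859-1")
print(repr(misread_latin1))

# Windows-1252 (assumed display behaviour): 0x80 is "€" and 0x99 is "™".
misread_cp1252 = utf8_bytes.decode("cp1252")
print(misread_cp1252)
```

The last line shows the familiar three-character cluster in place of the apostrophe.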

You must find a way to force the uploading mechanism to transmit the file as a UTF-8 stream. If you can’t select the encoding in the utility, try to put a “BOM” (byte order mark, or ZERO WIDTH NO-BREAK SPACE) at the start of your file.

The BOM is U+FEFF, but it must not be inserted as the raw bytes 0xFE 0xFF in a UTF-8 stream, as that may disrupt correct interpretation. Its UTF-8 encoding is 0xEF 0xBB 0xBF. This may be quite difficult to insert unless you have a hexadecimal editor.
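If no hexadecimal editor is at hand, a short Python script could prepend the three bytes. This is a sketch under the assumption that the file is plain text already encoded as UTF-8; `add_utf8_bom` is a hypothetical helper name, and the file name in the usage note is only an example.

```python
import codecs
from pathlib import Path


def add_utf8_bom(path):
    """Prepend the UTF-8 BOM (0xEF 0xBB 0xBF) to a file.

    Returns True if the BOM was added, False if the file already had one.
    The file is assumed to be UTF-8 plain text; its content is not changed.
    """
    p = Path(path)
    data = p.read_bytes()
    if data.startswith(codecs.BOM_UTF8):
        return False
    p.write_bytes(codecs.BOM_UTF8 + data)
    return True
```

For example, `add_utf8_bom("mytext.txt")` would rewrite that file in place, so it is worth working on a copy first.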

(*) The only marker which can flag a plain text file is BOM appearing as the first character in the file.

I uploaded from Google Chrome. ODT file from LO 5 via my Chromebook.

It occurred to me the source file was originally in LO in ODT form, then uploaded to Google Docs as this was how I worked with my editor, then downloaded to LO, and all formatting cleared so I could begin formatting the document. I don’t understand all you say as it is beyond my current technical understanding, but I now get that the download may have pulled in something from Google Docs. The same problem does not happen if the file is transferred to a different OS. I would assume that putting in a BOM would help, but I would need instructions on how to do that. However, it is an ODT, not a plain text file. I have no idea if I have a hexadecimal editor.

If the problem does not happen outside ChromeOS, forget my suggestion about BOM. It is valid only with plain text. And as you work with .odt, it will not help.

Thanks. I found it interesting to learn about anyway!

If you are using an editor or system that does not support Unicode, you will need to EXPORT your LibO file to downgrade the supported character set, for example to .txt. This will allow you to select ISO-8859-1, which should be supported by your ProWritingAid. If not, you could try US-ASCII, as the system appears to only understand English. (I think we in Europe would say English-US, rather than English-GB.)
You will probably have the same problem if you COPY/PASTE to go over to Chrome.
What keyboard setting, language setup and data types does ProWritingAid support?

LibreOffice and the Internet default to Unicode, the International standard since 1997 which supports about 138,000 characters.
Unicode includes ISO-8859-1 (1987 vintage), which supplies its first 256 code points (0 to 255), used in Western Europe including England and France. And it includes US-ASCII (1968 vintage), the first 128 code points (0 to 127), used in America.
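The subset relationship can be checked with a short Python sketch. The helper name `fits` and the sample strings are my own illustrative choices; the point is that every ISO-8859-1 byte maps to the Unicode code point with the same number, while characters outside that range cannot be written in ISO-8859-1 at all.

```python
# Every ISO-8859-1 byte value equals the Unicode code point it stands for.
for byte in range(256):
    assert bytes([byte]).decode("iso-8859-1") == chr(byte)

# The same holds for US-ASCII over its own range.
for byte in range(128):
    assert bytes([byte]).decode("ascii") == chr(byte)


def fits(text, encoding):
    """Return True if every character of `text` exists in `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


print(fits("caf\u00e9", "iso-8859-1"))   # e-acute is in ISO-8859-1
print(fits("caf\u00e9", "ascii"))        # but not in US-ASCII
print(fits("\u20ac", "iso-8859-1"))      # the Euro sign is not in ISO-8859-1
print(fits("You\u2019ve", "iso-8859-1")) # nor is the curly apostrophe
```

A check like `fits` could be run on text before exporting it to a restricted character set, to see which characters would be lost.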

As I mentioned above, if you copy and paste, the system must support Unicode. If not, you will have this problem. Also, if you EXPORT to an RTF file you must specify the character set your system understands. Otherwise the file will still be Unicode. Have you defined your Chrome system keyboard to support Unicode (UTF-8)?

It is important to understand that copy / paste from one file to another, even if both are LibreOffice .odt files, is controlled by the operating system settings, including language. The fact that both files use Unicode (UTF-8) is overruled by your operating system. A bit like having a colour camera and television but black and white film. I assume that one of your problems is there. Language is also important. If you are in Europe and use the € (Euro) sign, for example, you would have the same type of problem. This is why I asked what keyboard and language settings you have defined. You may not have set these things yourself, but they are important to sort out your problem, which must be frustrating for you.

By the way, most of my users are not programmers.

Updated to consider the apostrophe and quotation mark situation

One thing that adds slightly to this conversion problem shows why it is important to know the operating system language, the LibreOffice language setting and the language used for cutting and pasting. To put it another way, what you think you are typing is not actually what you get.
I quote from the Unicode Manual.
“Most keyboard layouts support only the U+0022 ( " ) QUOTATION MARK therefore word processors commonly offer a facility for automatically converting the U+0022 ( " ) QUOTATION MARK to a contextually selected curly quote glyph.”
These conversions are language dependent. So, for example, English and Dutch are not the same as Danish and Finnish or French. Also, the apostrophe used in publishing is often “improved” by upgrading it to a quotation mark. As the old US-ASCII and ISO-8859-1 support only U+0022 and U+0027, you can see this leads to the problems highlighted in this question.
Apostrophe U+0027 ( ' )
Quotation Mark U+0022 ( " )
Left Single Quotation Mark U+2018 ( ‘ )
Right Single Quotation Mark U+2019 ( ’ )
Left Double Quotation Mark U+201C ( “ )
Right Double Quotation Mark U+201D ( ” )
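The code points and official names of these characters can be looked up with Python's standard `unicodedata` module. A minimal sketch, where `describe` is just a hypothetical helper name:

```python
import unicodedata


def describe(ch):
    """Return the code point and official Unicode name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


# Straight apostrophe and quote, then the four curly quotation marks.
for ch in "'\"\u2018\u2019\u201c\u201d":
    print(describe(ch), "(", ch, ")")
```

Pasting a suspicious character from a document into `describe` is a quick way to tell a straight apostrophe from a curly one when they look alike on screen.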

On the Linux and Windows versions of LibreOffice you can display the character code by using ALT-X.

You do not have to be one to think like one. I did edit to explain I was talking of a mindset. I am now pretty sure this is a Chromebook issue. My theory is Google wants to force us all to Google docs…

Can someone tell me why and what to do?

I can’t tell why. I can tell what to do: wipe that ChromeOS and install Linux. Will work.

Actually, I have done that before. The results were not great compared to installing on my laptop (sadly the hardware is so old it failed). I am used to Ubuntu and have always used the latest install for it. When I find a laptop with 64ARM that is sound but needs an OS, I will go back to that.

I did get LO 6, but one night the cat sat on the keyboard, and I awoke to find the CAT had reset the entire Chromebook! Both the Linux install and the Chromebook OS were wiped as if I had just bought the Chromebook.

I did not repeat the exercise as the results of installing Linux were not great. I decided to stick with the Chromebook Beta version of Linux.

Anyone taking your advice would need to be sure they were happy with voiding a guarantee. I may do it all again but decided to give the Chromebook Beta a chance seeing as the only reason I see for them not adding the repository for LO6 is they do not yet see it as stable. I am not yet ready to wipe the entire Chrome OS.

I get the strange characters when I copy from a web page and paste into LO. The character clusters represent the open and closed single and double quotes and the em dash. I have the AltSearch extension installed, and have saved a batch search-and-replace operation that returns these strange characters to what they should be. AltSearch has a Batch button that takes you to the Batch screen, which has an Edit button to edit the text file that holds saved batch operations. If you can insert these lines in that text file you can fix the pasted text in one operation.

Sorry but the following text should be on new lines and I can’t seem to get it to show properly. A new line before every open square bracket. It should look the same as the other batch commands in the text file. The text file you have to edit is AltSearchScript.txt and it is in .config/libreoffice/4/user/config/ in your home folder if you are using Linux.

 [Name] Fix Strange Characters
 [Parameters]   MsgOff  Regular  CurrSelection  
 [Command] ReplaceAll

 [Parameters]   MsgOff  Regular  CurrSelection  
 [Command] ReplaceAll

 [Parameters]   MsgOff  Regular  CurrSelection  
 [Command] ReplaceAll

 [Parameters]   MsgOff  Regular  CurrSelection  
 [Command] ReplaceAll

 [Parameters]   MsgOff  Regular  CurrSelection  
 [Command] ReplaceAll

(edited by ajlittoz for proper formatting)
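Outside LibreOffice, the same kind of repair can be sketched in Python. This is a different technique from the AltSearch batch above, not a reproduction of it: instead of replacing each cluster individually, it reverses the wrong decoding in one step. It assumes the text was UTF-8 misread as Windows-1252; `repair_mojibake` is a hypothetical helper name.

```python
def repair_mojibake(text, wrong_encoding="cp1252"):
    """Undo UTF-8 text that was wrongly decoded as `wrong_encoding`.

    Re-encodes the text with the wrong encoding to recover the original
    bytes, then decodes those bytes as UTF-8. If either step fails, the
    text is assumed not to be damaged in this way and is returned as is.
    """
    try:
        return text.encode(wrong_encoding).decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


# "Youâ€™ve" written with escapes: â (U+00E2), € (U+20AC), ™ (U+2122).
damaged = "You\u00e2\u20ac\u2122ve"
print(repair_mojibake(damaged))
```

One limitation of this sketch: a cluster containing a byte that Windows-1252 leaves undefined (such as 0x9D, the last byte of the closing double quote) cannot be re-encoded, so the whole string is returned unchanged. For such text, a per-cluster search and replace remains the more robust route.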

@GerardBuz: FYI use five spaces at start of line to disable line merging; lines will display as typed.

It is obviously a confusion between ISO-8859-x and UTF-8. What happens if you “paste special” as unformatted text?

use five spaces at start

A small correction: four spaces at start have this special meaning.

@mikekaganski: thanks, I did it from memory and preferred to play safe