Issues with Dropbox: "unicode encoding conflict" after saving files with LibreOffice

Hello,

I have been using LibreOffice since the beginning, and Dropbox since it came out too. Both worked smoothly for a very long time. However, a few months ago an issue emerged that’s giving me headaches:

I use LibreOffice and Dropbox under OS X, with filenames in Greek characters (that’s pure Unicode in the filenames). I have a lot of old files, i.e. over 3 years old, saved in my Dropbox folder. Almost every time I open one of those files and use the “save as…” option to create a copy from within LibreOffice, Dropbox adds the “(unicode encoding conflict)” suffix to the filename. I then have to rename the file in the Finder to remove the suffix. Sometimes LibreOffice reports that it’s unable to save the file, because the file (which is yet to be created) cannot be found.
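Removing the suffix by hand in the Finder gets tedious with many files. Below is a minimal Python sketch that only lists the renames it would make and changes nothing on disk. The suffix pattern is taken from the wording above and matched case-insensitively, which is an assumption about what Dropbox actually writes; `strip_conflict_suffix` and `planned_renames` are hypothetical helper names, not anything Dropbox provides.

```python
import os
import re

# Assumption: the suffix looks like " (unicode encoding conflict)" in any
# letter case, placed before the file extension.
CONFLICT_SUFFIX = re.compile(r"\s*\(unicode encoding conflict\)", re.IGNORECASE)


def strip_conflict_suffix(filename):
    """Return the filename with the conflict suffix removed, if present."""
    return CONFLICT_SUFFIX.sub("", filename)


def planned_renames(root):
    """Yield (old_path, new_path) pairs for files carrying the suffix.

    This is a dry run: nothing is renamed.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            cleaned = strip_conflict_suffix(name)
            if cleaned != name:
                yield os.path.join(dirpath, name), os.path.join(dirpath, cleaned)


if __name__ == "__main__":
    # The folder location is an assumption; adjust it to your own setup.
    for old, new in planned_renames(os.path.expanduser("~/Dropbox")):
        print(old, "->", new)
```

Check the printed list before renaming anything, since a file with the cleaned name may already exist next to the conflicted copy.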

I have found reports about this Dropbox filename suffix from non-English users of Latin scripts who use accented characters in their filenames (which also makes the filenames Unicode). However, these reports don’t seem to be related to LibreOffice; at least they don’t say so…

Dropbox support replies that this is normal Dropbox behavior that kicks in when a filename is entered in some way other than normal keyboard input, which produces some other form of Unicode filename that Dropbox doesn’t like, so it adds the suffix to protect against data loss or the like.

Since LibreOffice is the only application so far on the Mac that brings forward the issue, do you have any ideas of how I could avoid it?

I have gone so far as to re-download all my Dropbox files from scratch, by completely removing Dropbox from my machine and reinstalling it with a different Dropbox folder target.

This issue has been happening on two different Macs, one running 10.5.8 and the other running 10.8.5.

Hello, back again with a way to replicate the issue. I’m in touch with Dropbox support as well, so I’m getting a bit more technical here:

[using LibreOffice 4.1.4.2 under OS X 10.8.5]

  • create a new text document

  • save the file using filename “amélie” (é entered by alt-e+e) [LibreOffice saves the file using NFC unicode normalization, since it comes from the opensource world]

  • close the file

  • open the file by double-clicking it in the Finder [OS X possibly sends the file path to LibreOffice using NFD normalization]

  • “save as…” the file using filename “amélie amélie” (just add a space and a second “amélie”, do not overwrite the default filename) [It seems that when a file whose filename contains Unicode characters is opened, LibreOffice receives the name in NFD form. Adding the same accented text under OS X then results in the first “é” being in NFD and the second “é” being in NFC, which produces the suffix.]
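To make the NFC/NFD theory in the steps above concrete, here is a small Python sketch using only the standard `unicodedata` module. It builds the name the way I suspect it ends up; the variable names are made up for illustration, and whether LibreOffice really produces this mix is an assumption, not something I have confirmed.

```python
import unicodedata


def describe(name):
    """Return the code points of a name as a list of 'U+XXXX' strings."""
    return [f"U+{ord(ch):04X}" for ch in name]


# "amélie" as typed on the keyboard: precomposed é (NFC).
nfc = unicodedata.normalize("NFC", "am\u00e9lie")
# The same name in decomposed form (NFD): e followed by a combining accent.
nfd = unicodedata.normalize("NFD", nfc)

# Assumed "save as" name: the NFD part taken from the opened file,
# a space, then freshly typed NFC text.
mixed = nfd + " " + nfc

print(len(nfc), len(nfd))      # 6 7
print(nfc == nfd)              # False, although both display as "amélie"
print(describe(nfd)[2:4])      # ['U+0065', 'U+0301']

# The mixed name is in neither normalization form.
print(unicodedata.normalize("NFC", mixed) == mixed)   # False
print(unicodedata.normalize("NFD", mixed) == mixed)   # False
```

A name that is neither NFC nor NFD is exactly the kind of thing a sync client could plausibly treat as conflicting with its normalized twin.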

P.S.: In fact, it’s not just LibreOffice. There are also a couple of .pdfs (they may have been saved by LibreOffice too, I don’t remember, and I cannot replicate it using Preview) and LOTS of Windows AutoCAD 2009 .bak files (I run Windows through Parallels and have the Dropbox folder set up as a shared folder, i.e. Dropbox is not installed in the Windows VM).

Sorry for the fuss,
keep up the good work.

Please edit your question to include examples of the characters, filenames, and how the characters in these filenames were entered. Thanks.

updated with comments and a way to replicate on OS X

Thanks. I will update my answer with a clearer example of what I believe is likely to be the issue.

The official account from Dropbox help is:

In some instances, there are several ways to create the same character on your keyboard. Although the characters may look the same, they are not the same to operating systems and Dropbox.

Indeed. Given that you mention MacOS, it is likely that the characters have been created using a different encoding from UTF-8. This may be a LibreOffice issue, although it sounds to me more like an operating system setting or a method-of-entry problem. A good reference article on encoding can be found here; the part from the “Hello” example onwards demonstrates essentially the same thing as I do here (I am using the MacOS 10.6.8 Terminal for clarity).

MacOS X encoding example

This is a basic example to demonstrate, in a simple manner, the problem with encoding and why the character é needs to be qualified with an encoding before we can tell how it is represented.

Make sure both the UTF-8 and Western (Mac OS Roman) encodings are available, i.e. under Terminal > Preferences… > Encodings

[Screenshot: Terminal encodings list]

With the terminal set to UTF-8 (Terminal > Preferences… > Settings > Advanced > International) check the encoding:

[Screenshot: Terminal character encoding set to UTF-8]

… and the locale:

$ locale
LANG="en_AU.UTF-8"
LC_COLLATE="en_AU.UTF-8"
LC_CTYPE="en_AU.UTF-8"
LC_MESSAGES="en_AU.UTF-8"
LC_MONETARY="en_AU.UTF-8"
LC_NUMERIC="en_AU.UTF-8"
LC_TIME="en_AU.UTF-8"
LC_ALL=

Now we create a simple text example using the MacOS key combination Option-E+E to produce é (in all instances below). First place some UTF-8 text in a file:

$ echo amélié > utf8.txt
$ cat utf8.txt
amélié
$ file -b utf8.txt 
UTF-8 Unicode text

… now place some ASCII text in a file with a UTF-8 file name:

$ echo abc123 > amélié_ascii.txt
$ cat amélié_ascii.txt
abc123
$ file -b amélié_ascii.txt 
ASCII text

Change terminal encoding from UTF-8 to Western (Mac OS Roman):

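[Screenshot: Terminal character encoding set to Western (Mac OS Roman)]

We are now placing Western (Mac OS Roman) text, loosely called ANSI here, in a file (even though on screen it looks the same as the UTF-8 text):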

$ echo amélié > ansi.txt
$ cat ansi.txt 
amélié
$ file -b ansi.txt
Non-ISO extended-ASCII text
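As a rough cross-check of why `file` reports the two files differently, here is a short Python sketch comparing the bytes each terminal encoding would produce for the same text. It assumes Option-E+E yields the precomposed é (U+00E9); `is_valid_utf8` is just an illustrative helper.

```python
text = "am\u00e9li\u00e9"   # "amélié" with precomposed é (assumption)

utf8_bytes = text.encode("utf-8")        # bytes under a UTF-8 terminal
roman_bytes = text.encode("mac_roman")   # bytes under Western (Mac OS Roman)

print(utf8_bytes)    # b'am\xc3\xa9li\xc3\xa9'
print(roman_bytes)   # b'am\x8eli\x8e'


def is_valid_utf8(data):
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(is_valid_utf8(utf8_bytes))    # True
print(is_valid_utf8(roman_bytes))   # False: 0x8E cannot start a UTF-8 sequence
```

The single byte 0x8E per é is what makes the second file neither ASCII nor valid UTF-8.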

Check the previous UTF-8 file:

$ cat utf8.txt 
amélié

Create another ASCII file:

$ echo abc123 > amélié_ascii.txt
$ cat am

… at this point I press TAB (twice) to try and autocomplete the filename, which displays:

am%8Eli%8E_ascii.txt  ameÃÅlieÃÅ_ascii.txt

Check both files, starting with the Western encoding ASCII one:

$ cat am%8Eli%8E_ascii.txt 
abc123
$ file -b am%8Eli%8E_ascii.txt 
ASCII text
$ file -b ameÃÅlieÃÅ_ascii.txt 
ASCII text
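The two names offered by tab completion can be reproduced in Python, which suggests where each one comes from. This is a sketch under two assumptions: that the file system stored the first (UTF-8) name in decomposed form, and that it percent-escaped the Mac OS Roman bytes of the second name because they are not valid UTF-8. The variable names are made up for illustration.

```python
import unicodedata

typed = "am\u00e9li\u00e9_ascii.txt"   # "amélié_ascii.txt" as typed

# File created under UTF-8: assume the name is stored decomposed (NFD),
# then look at its UTF-8 bytes through a Mac OS Roman terminal.
stored_nfd = unicodedata.normalize("NFD", typed)
seen_in_roman = stored_nfd.encode("utf-8").decode("mac_roman")

# File created under Mac OS Roman: each é is the single byte 0x8E,
# shown here with non-ASCII bytes percent-escaped.
roman_bytes = typed.encode("mac_roman")
escaped = "".join(chr(b) if b < 0x80 else f"%{b:02X}" for b in roman_bytes)

print(seen_in_roman == "ame\u00c3\u00c5lie\u00c3\u00c5_ascii.txt")   # True
print(escaped)   # am%8Eli%8E_ascii.txt
```

So the odd-looking second name is the accented original in decomposed UTF-8, merely displayed with the wrong terminal encoding.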

Change terminal encoding from Western (Mac OS Roman) to UTF-8.

$ ls am*
am%8Eli%8E_ascii.txt amélié_ascii.txt
$ cat am%8Eli%8E_ascii.txt 
abc123
$ cat amélié_ascii.txt
abc123

Even though the same character (é) appears to be entered each time (using the same method), it is the encoding in use at the time that determines what gets stored.