Base not decoding Western charset correctly with sdbc:firebird

xenon1 · March 6, 2019, 9:36pm

I am working with LibreOffice 6.0.7.3.

[Details:
LibreOffice Base (Linux/Ubuntu): Version: 6.0.7.3
Build-ID: 1:6.0.7-0ubuntu0.18.04.2
OS: Linux 4.15; UI-Render: Standard; VCL: gtk3;
language German (de_DE.UTF-8)]

I connect to an existing Firebird 3.0 database via the sdbc driver (direct connection, no JDBC or ODBC). Connection string/database path: “file:///home/…/Databasename.fdb”. My Firebird database is encoded in ISO8859_1 (“Western”, an 8-bit charset), whereas my LibreOffice uses UTF-8.

Unfortunately, LibreOffice cannot read the special characters in the database (in German: the umlauts). Instead it shows question marks in black diamonds. It evidently does not convert between the database's ISO8859_1 charset and its own UTF-8 charset.

Note: with LibreOffice 5.x I had to connect to the database via the JDBC (Java) driver, which was more tedious to set up. But I could add “?encoding=ISO8859_1” to the connection string/database path and it would decode correctly. However, this does not work with the new sdbc connection.

How can I make it decode the ISO8859_1 character set correctly with the new sdbc connection? Or do I have to go back to the Java connection? I’d like to keep the new sdbc connection, as it is easy to set up (no Java drivers needed) and fast.

Thanks for an answer!

Xenon1

mikekaganski · March 8, 2019, 5:58am

By the way, how do you connect to FB using SDBC? I thought that was only possible using ODBC/JDBC. Or do you mean you open the FB database file in file mode, rather than connecting to an external FB server?

xenon1 · March 9, 2019, 12:40am

Mike, since LibreOffice 6.0 (or maybe 6.1) you can connect directly: open a new instance of LibreOffice Base, choose Connect to an existing database, click the downward triangle to open the list, and choose Firebird. There you are! It is really nice and quick.

mikekaganski · March 9, 2019, 11:26am

I’d say: file a bug report, attaching a sample database (both the ODB and the FDB, or whatever extension the Firebird DB has). That would allow developers to reproduce the problem, check for possible solutions, or fix it if it’s a bug. Please create a new sample DB, or anonymize the existing one, before uploading.

petermau · March 7, 2019, 4:04pm

Unicode (UTF-8) is the character set used by the Internet and LibreOffice. It supports about 138,000 characters. Unicode contains US-ASCII, which occupies the first 128 code points (0-127), and ISO-8859-1, which occupies the first 256 (0-255). So characters such as ç, Ä, ä etc. have exactly the same code points in Unicode as in genuine ISO-8859-1. You can see this via Insert > Special Character: the first groups in the table are Basic Latin (0-127, hex 00-7F) and Latin-1 (128-255, hex 80-FF).
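A short Python sketch can illustrate the code point claim (Python is chosen here only for illustration; nothing in this thread uses it). It decodes every possible byte value as ISO-8859-1 and checks that each byte maps to the Unicode code point with the same number.

```python
# Illustration only: ISO-8859-1 byte values coincide with the first 256 Unicode code points.
latin1_bytes = bytes(range(256))
decoded = latin1_bytes.decode("iso-8859-1")

# Every byte value b decodes to the character whose code point is b.
same_code_points = all(ord(ch) == b for b, ch in zip(latin1_bytes, decoded))
print(same_code_points)  # True

# Example: 'Ä' is 0xC4 both as an ISO-8859-1 byte and as a Unicode code point.
print(hex(ord("Ä")), "Ä".encode("iso-8859-1"))  # 0xc4 b'\xc4'
```

Note that this only shows the code points match; how those code points are turned into bytes is a separate question.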

The main character outside ISO8859-1 that you may use in Europe is the euro sign €. So if you get an error, check that it is not caused by the €.
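One way to check whether a piece of text fits into ISO-8859-1 at all is to try encoding it. This is a minimal Python sketch; `fits_latin1` is a hypothetical helper name used only for illustration, not part of LibreOffice or Firebird.

```python
def fits_latin1(text: str) -> bool:
    """Hypothetical helper: True if every character of text has an ISO-8859-1 byte."""
    try:
        text.encode("iso-8859-1")
        return True
    except UnicodeEncodeError:
        # Raised for any character outside U+0000..U+00FF, e.g. the euro sign U+20AC.
        return False

print(fits_latin1("Grüße"))       # True: ü and ß are in ISO-8859-1
print(fits_latin1("Preis: 5 €"))  # False: € has no ISO-8859-1 byte
```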

If you see � (Replacement Character, U+FFFD) in the text, that is the Unicode character used to replace an unknown character. You will not see it replacing an ISO-8859-1 character; the problem is elsewhere. Why is the Firebird database encoded in ISO-8859-1? That is 1987 technology and restricted to Western European languages. Linux is based on Unicode, as is the Internet, and Unicode supports all the major written languages and the € (euro).

If you are seeing the � character in LibreOffice, this implies that your ISO8859-1 data contains some unknown or erroneous characters. It is this data that is being transferred and then displayed as the � character. If the data is pure ISO8859-1 it will also be pure Unicode.

xenon1 · March 9, 2019, 12:43am

Thanks riosv and petermau. However, that was not my question. My Firebird database is already configured as ISO-8859-1. I want to connect to it as it is: not change my database’s character set to fit LibreOffice, but make LibreOffice work with my database. Thank you anyway.

mikekaganski · March 9, 2019, 11:22am

If the data is pure ISO8859-1 it will also be pure Unicode

Well, of course, it depends.
While the first 256 Unicode code points are exactly the same as in ISO8859-1, that doesn’t mean the 256 ISO8859-1 characters are encoded the same way in UTF-8 (which uses 2-byte sequences for ISO8859-1 characters 128-255). So if some setting is mismatched (e.g. LO treats the byte sequence stored in the DB as UTF-8 while it is in fact ISO8859-1), the string will contain invalid UTF-8 sequences, which are typically displayed as the � replacement character.
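The mismatch can be reproduced in a few lines of Python. This is a minimal sketch, not LibreOffice's or the Firebird driver's actual code: it assumes the ISO8859-1 bytes arrive unchanged and are then decoded as UTF-8, and the sample string is made up.

```python
# Illustration only: text stored as ISO-8859-1 bytes but read back as UTF-8.
text = "Grüße aus Köln"  # made-up sample

latin1_bytes = text.encode("iso-8859-1")  # ü -> b'\xfc', ß -> b'\xdf', ö -> b'\xf6' (1 byte each)
utf8_bytes = text.encode("utf-8")         # ü -> b'\xc3\xbc' etc. (2 bytes each)

# Decoding the ISO-8859-1 bytes as UTF-8: each lone byte >= 0x80 is not a valid
# UTF-8 sequence, so errors="replace" substitutes U+FFFD for it.
wrong = latin1_bytes.decode("utf-8", errors="replace")
print(wrong)  # Gr��e aus K�ln

# Decoding with the charset the data was actually written in restores the text.
right = latin1_bytes.decode("iso-8859-1")
print(right == text)  # True
```

The same umlauts therefore come out fine or as � depending only on which charset the reader assumes, which is why the connection's charset setting matters.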

mariosv · March 7, 2019, 9:44pm

If I’m not mistaken, Firebird can use UTF-8; take a look at this book: Firebird character set

Ratslinger · March 9, 2019, 6:39pm

Hello,

Try adding ?charSet=ISO8859_1 to the end of the Datasource URL.

xenon1 · March 9, 2019, 11:28pm

No, this does not work.