# Error: 'ascii' codec can't encode character u'\ufffd'

Hi,

For a few days I have been writing StarBasic custom functions for Calc which call external Python functions.

Yesterday I found this error:

-----------------------------------------------------------
BASIC runtime error.
An exception occurred
Type: com.sun.star.uno.RuntimeException
Message: <type 'exceptions.UnicodeEncodeError'>: 'ascii' codec can't encode character u'\ufffd' in position 3: ordinal not in range(128), traceback follows
------------------------------------------------------------


I bypassed it in Python with

tmp = dTlx[isin]['descrizione_masterchart']
tmp = tmp.replace(u'\ufffd',"?")


But I think the problem can show up in other forms, since the text in my database may contain other Unicode characters, and it looks as if StarBasic is trying to convert the string to ASCII.
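A more general form of the workaround above can be sketched as follows. The helper name `to_ascii_safe` is hypothetical. It replaces every non-ASCII character with `?`, not only U+FFFD, so it hides the symptom without recovering the lost characters. The sketch is written to run under Python 3; the same `encode`/`decode` calls also exist on `unicode` objects in Python 2, which the error message suggests is in use here.

```python
def to_ascii_safe(text):
    """Return text with every non-ASCII character replaced by '?'."""
    # encode(..., 'replace') substitutes '?' for anything outside ASCII;
    # decoding again gives back a text string instead of bytes.
    return text.encode('ascii', 'replace').decode('ascii')


sample = u'BTP\ufffdI 0.1% 15MG22'
print(to_ascii_safe(sample))  # BTP?I 0.1% 15MG22
```

This only guarantees that the returned string is pure ASCII; whether that is acceptable depends on how much of the original text needs to survive.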

Do you know if it is possible for StarBasic to accept unicode strings from Python ?

Here is the Basic function that receives the string:

Function msDescrizioneMasterchart(v1, v2) as String
    Dim oScriptProvider, oScript
    scriptUrl = "vnd.sun.star.script:masterchart.py$msDescrizioneMasterchart?language=Python&location=user"
    oScriptProvider = ThisComponent.getScriptProvider()
    oScript = oScriptProvider.getScript(scriptUrl)
    Dim out as String
    out = oScript.invoke(array(v1,v2), array(), array())
    msDescrizioneMasterchart = out
End Function

## Comments

What you describe looks like a bug; bugs are off-topic on this site and should be filed in the bug tracker. It's not clear how to reproduce the problem from your description, so when filing the bug report, please provide everything needed to reproduce it on a clean system: required data files, code, configuration, and specific steps.

( 2018-06-11 08:36:35 +0200 )

There are a number of articles on using Unicode with Python. For example https://docs.python.org/3/howto/unico.... I do not know if this will help. LibreOffice uses Unicode.

( 2018-06-11 10:49:26 +0200 )

Please also be more specific about the version of LO and the version of the operating system.

( 2018-06-11 23:25:23 +0200 )

For better help, show the lines of Python code where the problem occurs, along with example data that shows the problem. Also, be sure to post the entire error message. You left out the traceback that tells where the error occurred. See guidelines for asking.

( 2018-06-12 08:19:56 +0200 )

US-ASCII only covers the first 128 code points of Unicode, as it is a 7-bit encoding. The euro sign (€), a Unicode character, would not be recognised. The \uFFFD is in fact an attempt to show that a character is not supported (�); it is the Unicode replacement character.

( 2018-06-12 15:58:58 +0200 )

## 3 Answers

> Do you know if it is possible for StarBasic to accept unicode strings from Python ?

Yes, it is. For example, take the following code.
Sub call_msDescrizioneMasterchart
    MsgBox msDescrizioneMasterchart(1, 2)
End Sub

Function msDescrizioneMasterchart(v1, v2) as String
    Dim oScriptProvider, oScript
    scriptUrl = "vnd.sun.star.script:masterchart.py$msDescrizioneMasterchart?language=Python&location=user"
    oScriptProvider = ThisComponent.getScriptProvider()
    oScript = oScriptProvider.getScript(scriptUrl)
    Dim out as String
    out = oScript.invoke(array(v1,v2), array(), array())
    msDescrizioneMasterchart = out
End Function

def msDescrizioneMasterchart(v1, v2):
    return "%s\ufffd%s" % (v1, v2)


Executing call_msDescrizioneMasterchart produces the correct result.

The problem with your code seems to occur on the Python side. The error message is similar to https://stackoverflow.com/questions/9....

Also, U+FFFD is the Unicode replacement character, so it probably shouldn't be showing up at all. There is likely something wrong with your Python code and maybe the data as well.


I will do some more tests later, starting from jim_K's code.

Here are some more details in reply to some of your questions:

1] Unfortunately the garbage arrives with the data. Most probably it is a euro symbol encoded in some way.

Here is the hexdump of the original .csv:

00038a50  32 32 47 4e 32 32 22 2c  32 30 32 32 30 36 32 32  |22GN22",20220622|
00038a60  0d 0a 22 49 54 30 30 30  35 31 38 38 31 32 30 22  |.."IT0005188120"|
00038a70  2c 22 42 54 50 a4 49 20  30 2e 31 25 20 31 35 4d  |,"BTP.I 0.1% 15M|
00038a80  47 32 32 22 2c 22 49 54  30 30 30 35 31 38 38 31  |G22","IT00051881|


Then, the hexdump of the .csv converted to JSON, which is what Python gets:

0008d2d0  22 49 54 30 30 30 35 31  38 38 31 32 30 22 2c 22  |"IT0005188120","|
0008d2e0  64 65 73 63 72 69 7a 69  6f 6e 65 22 3a 22 42 54  |descrizione":"BT|
0008d2f0  50 ef bf bd 49 20 30 2e  31 25 20 31 35 4d 47 32  |P...I 0.1% 15MG2|


2] I am using LibreOffice 6.0.2.1.0 on FreeBSD 11.1, installed from the package.

3] It is definitely a Python error:

python> print u'\x50\xbd\x49\x20\x30\x2e'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xbd' in position 1: ordinal not in range(128)
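
The two hexdumps under point 1 fit together in a way that can be reproduced with a few lines of Python 3. The byte `a4` in the CSV is not valid UTF-8, and a decoder running in "replace" mode turns it into U+FFFD, whose UTF-8 form is `ef bf bd`. Two things here are assumptions not confirmed in this thread: that the CSV-to-JSON step decoded the file as UTF-8, and that the file was written as ISO-8859-15 (one encoding in which `a4` is the euro sign).

```python
# Bytes of the description field as they appear in the CSV hexdump.
raw = b'BTP\xa4I 0.1% 15MG22'

# Decoding as UTF-8 in "replace" mode: 0xa4 cannot start a UTF-8 sequence,
# so it becomes U+FFFD (the replacement character).
lossy = raw.decode('utf-8', 'replace')
print(lossy.encode('utf-8')[3:6])  # b'\xef\xbf\xbd'

# Assumption: the source encoding is ISO-8859-15, where 0xa4 is the euro sign.
guess = raw.decode('iso8859_15')
print(guess)  # BTP€I 0.1% 15MG22
```

Once the byte has been replaced by U+FFFD, the original character cannot be recovered from the JSON; it has to be decoded correctly from the CSV.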


You must first import the CSV using the appropriate encoding. If you start with the wrong encoding, everything that follows will be garbage. For example, there are two almost identical Greek encodings, ISO-8859-7 and Windows-1253, that differ in a few characters. If you use Windows-1253 to import text that contains Ά and was saved in ISO-8859-7, that character is converted to U+00B6 (the pilcrow sign, ¶) instead of Ά.
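
The Greek example can be checked with Python's built-in codecs, where `iso8859_7` and `cp1253` are the codec names for ISO-8859-7 and Windows-1253. A minimal Python 3 sketch, with variable names chosen only for illustration:

```python
# The single byte that ISO-8859-7 uses for the letter Ά.
byte = b'\xb6'

as_iso = byte.decode('iso8859_7')  # decoded with the encoding it was saved in
as_win = byte.decode('cp1253')     # decoded with the near-identical wrong one

print(as_iso)  # Ά (U+0386)
print(as_win)  # ¶ (U+00B6)
```

Neither call raises an error, which is why this kind of mistake usually goes unnoticed until someone reads the text.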

You probably converted the CSV using the wrong encoding, or (most probably) your CSV already contains text in different encodings for different rows or string values, which means lines were added from sources that use different encodings without converting them to a common one. Exporting to JSON then creates unpredictable characters. It is not a Python problem.
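
One way to look for rows in a different encoding is to decode the file line by line in strict mode and list the lines that fail. The sketch below (Python 3) assumes the target encoding is UTF-8 and that lines end in `\n`; the function name `find_non_utf8_lines` and the sample data are hypothetical.

```python
def find_non_utf8_lines(raw):
    """Return (line_number, line_bytes) for each line that is not valid UTF-8."""
    bad = []
    for number, line in enumerate(raw.split(b'\n'), start=1):
        try:
            line.decode('utf-8')  # strict mode: raises on any invalid byte
        except UnicodeDecodeError:
            bad.append((number, line))
    return bad


# Hypothetical sample: line 1 is valid UTF-8, line 2 uses a single-byte encoding.
sample = 'BTP\u20acI 0.1%'.encode('utf-8') + b'\n' + b'BTP\xa4I 0.1%'
print(find_non_utf8_lines(sample))  # [(2, b'BTP\xa4I 0.1%')]
```

A line that passes this check is not proven to be UTF-8 (pure ASCII lines pass under almost any encoding), but a line that fails is certainly something else.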

Edited as erAck correctly commented...


Nitpick: these are not locales, these are text encodings.

( 2018-06-15 10:47:52 +0200 )

I asked the data provider to save the files in UTF-8; that should fix it.

( 2018-06-15 11:33:52 +0200 )