
Error: 'ascii' codec can't encode character u'\ufffd'

asked 2018-06-11 08:22:25 +0200

nicola.di.bergantino

Hi,

For a few days I have been writing StarBasic custom functions for Calc which call external Python functions.

Yesterday I found this error:

-----------------------------------------------------------
BASIC runtime error.
An exception occurred 
Type: com.sun.star.uno.RuntimeException
Message: <type 'exceptions.UnicodeEncodeError'>: 'ascii' codec can't encode character u'\ufffd' in position 3: ordinal not in range(128), traceback follows
------------------------------------------------------------

I bypassed it in Python with

   tmp = dTlx[isin]['descrizione_masterchart']
   tmp = tmp.replace(u'\ufffd',"?")

But I think the problem can show up in other forms, since the text in my database can contain other Unicode characters, and here it seems that StarBasic is trying to convert the string to ASCII.

Do you know if it is possible for StarBasic to accept Unicode strings from Python?

Here is the Basic function that receives the string:

Function msDescrizioneMasterchart(v1, v2) as String
    Dim oScriptProvider, oScript
    scriptUrl = "vnd.sun.star.script:masterchart.py$msDescrizioneMasterchart?language=Python&location=user"
    oScriptProvider = ThisComponent.getScriptProvider()
    oScript = oScriptProvider.getScript(scriptUrl)
    Dim out as String
    out =  oScript.invoke(array(v1,v2), array(), array())
    msDescrizioneMasterchart = out
End Function

Comments

What you describe looks like a bug. Bugs are off-topic on this site and should be filed in the bug tracker.

It's not clear how to reproduce the problem from your description, so when filing the bug report on the tracker, please provide everything needed to reproduce it on a clean system: required data files, code, configuration, and specific steps.

Mike Kaganski ( 2018-06-11 08:36:35 +0200 )

There are a number of articles on using Unicode with Python. For example https://docs.python.org/3/howto/unico.... I do not know if this will help. LibreOffice uses Unicode.

petermau ( 2018-06-11 10:49:26 +0200 )

Please also specify your version of LO and of the operating system.

Xoristzatziki ( 2018-06-11 23:25:23 +0200 )

For better help, show the lines of Python code where the problem occurs, along with example data that shows the problem. Also, be sure to post the entire error message. You left out the traceback that tells where the error occurred. See guidelines for asking.

Jim K ( 2018-06-12 08:19:56 +0200 )

US-ASCII is a 7-bit encoding, so it covers only the first 128 Unicode code points (0 to 127). The euro sign (€), for example, is a Unicode character outside that range and would not be recognised. \uFFFD (�) is the Unicode replacement character: it is inserted to show that the original character is not supported.
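
A minimal Python 3 sketch of the ASCII limit described here. The helper name `ascii_encodable` and the sample strings are made up for illustration; the error in the question came from Python 2, where the same check fails with the `u'...'` form of the message.

```python
import unicodedata

def ascii_encodable(text):
    """Return True if every character of text fits in 7-bit ASCII (0..127)."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False

# Illustrative samples: plain ASCII, the euro sign, the replacement character.
samples = ["BTP", "\u20ac", "\ufffd"]
for s in samples:
    names = [unicodedata.name(c) for c in s]
    print(repr(s), ascii_encodable(s), names)
```

Anything that reports False here would raise the UnicodeEncodeError from the question if it were passed through an ASCII encode step.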

petermau ( 2018-06-12 15:58:58 +0200 )

3 Answers


answered 2018-06-15 06:08:15 +0200

Xoristzatziki

updated 2018-06-16 20:36:37 +0200

You must first import the CSV using the appropriate encoding. If you start with the wrong encoding, everything that follows will be garbage. Example: there are two almost identical Greek encodings, ISO-8859-7 and Windows-1253, that differ in some characters. If you use Windows-1253 to import text that contains Ά and was saved in ISO-8859-7, that character is converted to U+00B6 (¶), which is not the intended letter.

You probably converted the CSV using the wrong encoding or, more probably, your CSV already contains text in different encodings for different rows (or string values), which means lines were added as if they shared one encoding although they came from sources that use different encodings. Exporting to JSON then produces unpredictable characters. It is not a Python problem.
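
The Greek example can be checked with a small Python 3 sketch. `iso8859_7` and `cp1253` are Python's codec names for ISO-8859-7 and Windows-1253; the variable names are only illustrative.

```python
# "\u0386" is GREEK CAPITAL LETTER ALPHA WITH TONOS (Ά).
raw = "\u0386".encode("iso8859_7")   # the byte as saved in ISO-8859-7

wrong = raw.decode("cp1253")         # imported with the wrong encoding
right = raw.decode("iso8859_7")      # imported with the matching encoding

print(raw)                           # the single byte on disk
print("wrong: U+%04X" % ord(wrong))
print("right: U+%04X" % ord(right))
```

The same byte yields two different characters, which is why the import encoding has to match the one the file was saved in.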

Edited as erAck correctly commented...


Comments

Nitpick: these are not locales, these are text encodings.

erAck ( 2018-06-15 10:47:52 +0200 )

I asked the data provider to save the files in UTF-8; that should fix it.

nicola.di.bergantino ( 2018-06-15 11:33:52 +0200 )

answered 2018-06-12 10:11:02 +0200

nicola.di.bergantino

Thank you all for the many comments and answers.

I will run some more tests later, starting from Jim K's code.

Here are some more details in reply to your questions:

1] Garbage unfortunately arrives with the data. Most probably it is a euro symbol encoded in some way.

Here is the hexdump of the original .csv:

00038a50  32 32 47 4e 32 32 22 2c  32 30 32 32 30 36 32 32  |22GN22",20220622|
00038a60  0d 0a 22 49 54 30 30 30  35 31 38 38 31 32 30 22  |.."IT0005188120"|
00038a70  2c 22 42 54 50 a4 49 20  30 2e 31 25 20 31 35 4d  |,"BTP.I 0.1% 15M|
00038a80  47 32 32 22 2c 22 49 54  30 30 30 35 31 38 38 31  |G22","IT00051881|

Then the hexdump of the .csv converted to JSON, which is what Python gets:

0008d2d0  22 49 54 30 30 30 35 31  38 38 31 32 30 22 2c 22  |"IT0005188120","|
0008d2e0  64 65 73 63 72 69 7a 69  6f 6e 65 22 3a 22 42 54  |descrizione":"BT|
0008d2f0  50 ef bf bd 49 20 30 2e  31 25 20 31 35 4d 47 32  |P...I 0.1% 15MG2|

2] I am using LibreOffice 6.0.2.1.0 on FreeBSD 11.1, installed from the package.

3] It is definitely a Python error:

python> print u'\x50\xbd\x49\x20\x30\x2e'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xbd' in position 1: ordinal not in range(128)
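
The two hexdumps in point 1] can be related with a short Python 3 sketch. It rests on two assumptions that the thread does not confirm: that the CSV-to-JSON converter decoded the file as UTF-8 and replaced invalid bytes, and that the file was really saved in ISO-8859-15, where byte 0xA4 is the euro sign.

```python
# Bytes copied from the .csv hexdump: "BTP", 0xA4, "I 0.1% 15MG22".
original = b"BTP\xa4I 0.1% 15MG22"

# Assumption: the converter read the file as UTF-8 and replaced bad bytes.
# 0xA4 alone is not valid UTF-8, so it becomes U+FFFD.
as_utf8 = original.decode("utf-8", errors="replace")
json_bytes = as_utf8.encode("utf-8")
print(json_bytes)     # contains ef bf bd, as in the JSON hexdump

# Assumption: the real source encoding is ISO-8859-15 (0xA4 is the euro sign).
as_latin9 = original.decode("iso8859_15")
print(as_latin9)
```

If those assumptions hold, the original character is already lost by the time Python reads the JSON, so the fix belongs in the conversion step rather than in the macro.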

answered 2018-06-12 08:19:43 +0200

Jim K

updated 2018-06-12 08:25:12 +0200

"Do you know if it is possible for StarBasic to accept unicode strings from Python?"

Yes, it is. For example, take the following code.

Sub call_msDescrizioneMasterchart
    MsgBox msDescrizioneMasterchart(1, 2)
End Sub

Function msDescrizioneMasterchart(v1, v2) as String
    Dim oScriptProvider, oScript
    scriptUrl = "vnd.sun.star.script:masterchart.py$msDescrizioneMasterchart?language=Python&location=user"
    oScriptProvider = ThisComponent.getScriptProvider()
    oScript = oScriptProvider.getScript(scriptUrl)
    Dim out as String
    out =  oScript.invoke(array(v1,v2), array(), array())
    msDescrizioneMasterchart = out
End Function

def msDescrizioneMasterchart(v1, v2):
    return "%s\ufffd%s" % (v1, v2)

Executing call_msDescrizioneMasterchart produces the correct result:

1 replacement_char 2

The problem with your code seems to occur while in Python. The error message is similar to https://stackoverflow.com/questions/9....

Also, U+FFFD is the Unicode replacement character, so it probably shouldn't be showing up at all. There is likely something wrong with your Python code and maybe the data as well.
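
To check the data side, a rough Python 3 sketch like the following could list which records already contain U+FFFD. The function name `find_damaged_fields` and the sample dictionary are hypothetical; the sample is only shaped like the `dTlx[isin]['descrizione_masterchart']` lookup from the question.

```python
REPLACEMENT = "\ufffd"

def find_damaged_fields(records):
    """Return sorted (key, field) pairs whose text contains U+FFFD."""
    hits = []
    for key, fields in records.items():
        for name, value in fields.items():
            if isinstance(value, str) and REPLACEMENT in value:
                hits.append((key, name))
    return sorted(hits)

# Hypothetical sample data, not taken from the real database.
sample = {
    "IT0005188120": {"descrizione_masterchart": "BTP\ufffdI 0.1% 15MG22"},
    "XX0000000000": {"descrizione_masterchart": "plain ASCII text"},
}
print(find_damaged_fields(sample))
```

Every hit marks a place where a character was already lost before Python received the text.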

