Reading UTF-8 strings with Python

Georg Maubach-2

Hi All,

In my current project I need to read open-ended questions using Python. I used the spssdata class and discovered that the UTF-8 characters were not read properly. All characters outside the ASCII range (code points above 127) are replaced with question marks ("?????"). The spssdata module documentation says nothing about this.

Since I need to transform this text in Python, I would like to get the original characters in UTF-8 format.

Is there a way to read these UTF-8 characters with Python programmability?

I am looking forward to hearing from you.

Best regards

Georg Maubach
Research Manager

Re: Reading UTF-8 strings with Python

Peck, Jon

What is the scenario here? Are you fetching cases from SPSS into your Python program, reading external text and generating SPSS cases, or something else?

You should be using Unicode mode in SPSS for best behavior in this situation.  Python will get Unicode strings from SPSS in that case, and vice versa.
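
To illustrate why Unicode mode matters, here is a minimal sketch in plain Python (no SPSS involved). It assumes, without confirmation from the spssdata documentation, that the question marks come from a conversion to an encoding that cannot represent the characters. The sample text and the variable names are made up for the demonstration.

```python
# -*- coding: utf-8 -*-
# Illustration only: plain Python, not a reproduction of what SPSS does internally.

# Hypothetical open-ended answer: "Grüße aus Köln"
text = u"Gr\u00fc\u00dfe aus K\u00f6ln"

# ASCII cannot represent the three non-ASCII characters; with errors="replace"
# each one is turned into "?" and the original character is lost.
lossy = text.encode("ascii", "replace").decode("ascii")

# A UTF-8 round trip keeps every character intact.
utf8_bytes = text.encode("utf-8")
restored = utf8_bytes.decode("utf-8")

print(lossy)             # Gr??e aus K?ln
print(restored == text)  # True
```

Once a character has been replaced by "?", it cannot be recovered afterwards, so the text has to stay Unicode along the whole path.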

If you are reading raw UTF-8 text with Python, you need to use a codec to decode it into Unicode. I can supply details if that's the issue.
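
For the raw-text case, a minimal sketch might look like the following. It assumes the answers sit in a plain UTF-8 text file, one answer per line. The helper name `read_utf8_lines`, the sample answers, and the temporary file are hypothetical and only there to make the example self-contained. `io.open` with an `encoding` argument applies the UTF-8 codec while reading.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile


def read_utf8_lines(path):
    """Return the lines of a UTF-8 encoded text file as Unicode strings."""
    # The encoding argument makes io.open decode the bytes with the UTF-8 codec.
    with io.open(path, "r", encoding="utf-8") as handle:
        return [line.rstrip(u"\r\n") for line in handle]


# Demo data (hypothetical): two answers containing non-ASCII characters.
sample = u"Sch\u00f6n, aber zu teuer\nTr\u00e8s bien\n"

# Write the sample to a temporary UTF-8 file, read it back, then clean up.
fd, demo_path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
with io.open(demo_path, "w", encoding="utf-8") as handle:
    handle.write(sample)

answers = read_utf8_lines(demo_path)
os.remove(demo_path)

print(len(answers))  # 2
```

Each element of `answers` is a Unicode string, so any transformations can work on characters rather than raw bytes.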

Regards,

Jon Peck

SPSS, an IBM Company

From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Georg Maubach
Sent: Monday, October 05, 2009 11:02 AM
To: [hidden email]
Subject: [SPSSX-L] Reading UTF-8 strings with Python

 
