SPSSX Discussion

Reading UTF-8 strings with Python

Georg Maubach-2

Hi All,

In my current project I need to read open-ended questions using Python. I used the Spssdata class from the spssdata module and discovered that the UTF-8 characters were not read properly. All characters with codes above 127 (that is, outside the ASCII range) are transformed to "?????". The spssdata module documentation gave no information on this matter.

As I need the text in Python to do transformations on it, I would like to have the original characters in UTF-8 format.

Is there a way to read these UTF-8 characters with Python programmability?

I am looking forward to hearing from you.

Best regards

Georg Maubach
Research Manager

Peck, Jon

Re: Reading UTF-8 strings with Python

What is the scenario here? Are you fetching cases from SPSS into your Python program or reading external text and generating SPSS cases, or something else?

You should be using Unicode mode in SPSS for best behavior in this situation. Python will get Unicode strings from SPSS in that case, and vice versa.
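The "?????" symptom is consistent with text being pushed through an encoding that cannot represent the characters. The following is only a plain-Python sketch of that effect; it does not call SPSS, and the sample string and the use of ASCII as the lossy target are assumptions chosen for illustration, not a statement about what SPSS does internally.

```python
# Hypothetical sample text; not taken from the original data.
text = "Größe"

# Lossy path: characters the target encoding cannot represent
# are replaced by "?" when errors="replace" is used.
lossy = text.encode("ascii", errors="replace").decode("ascii")

# Lossless path: UTF-8 can represent every Unicode character,
# so encoding and decoding returns the original string.
utf8_bytes = text.encode("utf-8")
round_trip = utf8_bytes.decode("utf-8")

print(lossy)
print(round_trip == text)
```

Once a character has been replaced by "?", the original cannot be recovered, which is why the text needs to stay in Unicode along the whole path.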

If you are reading raw UTF-8 text with Python, you need to use a codec to decode it into Unicode. I can supply details if that’s the issue.
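For the raw-text case, a minimal sketch might look like the following. It uses the standard library's io.open with an explicit encoding (codecs.open is an older alternative). The helper name read_utf8_lines and the temporary demo file are hypothetical and exist only to make the example self-contained.

```python
import io
import os
import tempfile


def read_utf8_lines(path):
    """Return the lines of a UTF-8 text file as Unicode strings."""
    # The explicit encoding makes Python decode the bytes with the
    # UTF-8 codec instead of the platform default.
    with io.open(path, "r", encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in handle]


# Demo: write a small UTF-8 file, then read it back.
sample = "Zürich\nnaïve café\n"
fd, demo_path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as raw:
    raw.write(sample.encode("utf-8"))

lines = read_utf8_lines(demo_path)
os.remove(demo_path)
print(lines)
```

If the file was actually written in a different encoding, decoding it as UTF-8 may raise UnicodeDecodeError, which is usually preferable to silently reading garbled text.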

Regards,

Jon Peck

SPSS, an IBM Company

From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Georg Maubach
Sent: Monday, October 05, 2009 11:02 AM
To: [hidden email]
Subject: [SPSSX-L] Reading UTF-8 strings with Python
