Unicode error when importing SPSS data into python

Unicode error when importing SPSS data into python

Andy W
I get a pretty esoteric Unicode error from the code below, and I can only reproduce it when I import both the GeoID and the Side variables at the same time. Importing either one without the other works fine (see MyDat1 and MyDat2).

**********************************************************************************.
*Get the test file from online.
SPSSINC GETURI DATA URI="https://dl.dropboxusercontent.com/u/3385251/TestSet.sav" FILETYPE=SAV DATASET=TestSet.
*Note it is in Unicode.

*Now importing data into python.
BEGIN PROGRAM Python.
import spss

def AllSPSSdat(vars):
  if vars is None:
    varNums = range(spss.GetVariableCount())
  else:
    allvars = [spss.GetVariableName(i) for i in range(spss.GetVariableCount())]
    varNums = [allvars.index(i) for i in vars]
  data = spss.Cursor(varNums)
  pydata = data.fetchall()
  data.close()
  return pydata

#This works
MyDat1 = AllSPSSdat(vars=["Lat","Lon","Orient","CID","Order_","Side"])
print MyDat1[0]

#This works
MyDat2 = AllSPSSdat(vars=["Lat","Lon","Orient","CID", "Order_","GeoID"])
print MyDat2[0]

#But dumping all variables at once does not work
MyDat3 = AllSPSSdat(vars=["Lat","Lon","Orient","GeoID", "CID", "Order_", "Side"])
print MyDat3[0]

END PROGRAM.
**********************************************************************************.

The exact error is:

Traceback (most recent call last):
  File "<string>", line 20, in <module>
  File "<string>", line 10, in AllSPSSdat
  File "C:\PROGRA~1\IBM\SPSS\STATIS~1\24\PYTHON\Lib\site-packages\spss\cursors.py", line 1313, in fetchall
    data.append(self.binaryStream.fetchData())
  File "C:\PROGRA~1\IBM\SPSS\STATIS~1\24\PYTHON\Lib\site-packages\spss\binarystream.py", line 406, in fetchData
    activeCase = self.readcache(data)
  File "C:\PROGRA~1\IBM\SPSS\STATIS~1\24\PYTHON\Lib\site-packages\spss\binarystream.py", line 324, in readcache
    currentcase = self.unpackdata(binaryData)
  File "C:\PROGRA~1\IBM\SPSS\STATIS~1\24\PYTHON\Lib\site-packages\spss\binarystream.py", line 130, in unpackdata
    case[i] = unicode(case[i], encoding='utf-8')
UnicodeDecodeError: 'utf8' codec can't decode byte 0xbb in position 0: invalid start byte
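
For reference, the last line of the traceback can be reproduced outside SPSS. This is only a minimal sketch and not the spss module's own code; the helper name `try_utf8` is made up for illustration. It shows what "invalid start byte" means: a lone 0xbb is a UTF-8 continuation byte, so it cannot begin a character, which hints that the bytes being decoded may not be string data at all.

```python
def try_utf8(raw):
    """Decode raw bytes as UTF-8; return (text, None) or (None, reason)."""
    try:
        return raw.decode("utf-8"), None
    except UnicodeDecodeError as err:
        return None, "%s at position %d" % (err.reason, err.start)


# A lone 0xbb is a continuation byte, so it cannot start a character.
print(try_utf8(b"\xbb"))
# Preceded by the lead byte 0xc2 it forms a valid two-byte sequence (U+00BB).
print(try_utf8(b"\xc2\xbb"))
```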

I'm on V24 (no fix packs), 64-bit, on Windows 10, and my Python points to an Anaconda install (2.7).

The workaround is pretty simple here (if I convert GeoID to a number it reads in fine), but I would like some input on what is causing the issue.

I get a few different problems when I set SPSS to the locale encoding instead of Unicode (weird Unicode-escaped strings, and the variables come back in a different order). I'm pretty sure there isn't any odd Unicode lurking in my variables, though: all are numeric except GeoID, and that one imports fine by itself (see MyDat2).
Andy W
apwheele@gmail.com
http://andrewpwheeler.wordpress.com/

Re: Unicode error when importing SPSS data into python

Andy W
Well, maybe it has something to do with the order of the variables. Just changing the order to

MyDat4 = AllSPSSdat(vars=["Lat","Lon","Orient","CID", "Order_", "Side","GeoID"])

returns results fine as well. So it isn't GeoID plus another variable, as I thought at first.
Andy W
apwheele@gmail.com
http://andrewpwheeler.wordpress.com/

Re: Unicode error when importing SPSS data into python

Jon Peck
There is a bug in the binarystream reader used by the Cursor class, which appears when variables are fetched out of file order, as here. I have sent this to Development. If you read the variables in file order, it seems to be okay.
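
One way to act on the file-order observation is to fetch the columns in file order and then put each case back into the order that was asked for. The sketch below is plain Python and makes no spss calls; the helper names `file_order_plan` and `reorder_row` are hypothetical, and the file order in the example is an assumption, since the actual variable order of TestSet.sav is not shown in this thread.

```python
def file_order_plan(all_names, wanted):
    """Return (sorted_indices, positions) for fetching `wanted` in file order.

    sorted_indices: variable indexes to fetch, in ascending file order.
    positions[k]:   column of the fetched row that holds wanted[k].
    """
    wanted_idx = [all_names.index(name) for name in wanted]
    sorted_idx = sorted(wanted_idx)
    positions = [sorted_idx.index(i) for i in wanted_idx]
    return sorted_idx, positions


def reorder_row(row, positions):
    """Rearrange one file-order row into the originally requested order."""
    return tuple(row[p] for p in positions)


# Assumed file order, for illustration only.
all_names = ["Lat", "Lon", "Orient", "CID", "Order_", "Side", "GeoID"]
wanted = ["Lat", "Lon", "Orient", "GeoID", "CID", "Order_", "Side"]
sorted_idx, positions = file_order_plan(all_names, wanted)

# A made-up case as it would come back when fetched in file order.
file_order_row = (42.1, -73.7, 90.0, 7.0, 1.0, 2.0, "G1")
print(reorder_row(file_order_row, positions))
```

Inside SPSS, `sorted_idx` would be what gets passed to `spss.Cursor`, and `reorder_row` would be applied to each case returned by `fetchall()`.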

You can avoid this by adding isBinary=False to the Cursor call.
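
Applied to the AllSPSSdat function from the first post, that suggestion might look like the sketch below. The name `all_spss_dat` and the `try`/`finally` cleanup are additions for illustration, and the `import` sits inside the function only so the definition can be loaded outside an SPSS session; within BEGIN PROGRAM the usual top-level `import spss` works just as well.

```python
def all_spss_dat(names=None):
    """Fetch all cases for the named variables (or every variable if None)."""
    # Imported here so the function can be defined outside an SPSS session.
    import spss

    count = spss.GetVariableCount()
    if names is None:
        var_nums = list(range(count))
    else:
        all_names = [spss.GetVariableName(i) for i in range(count)]
        var_nums = [all_names.index(name) for name in names]

    # isBinary=False is the switch suggested above to avoid the
    # binarystream reader.
    cursor = spss.Cursor(var_nums, isBinary=False)
    try:
        return cursor.fetchall()
    finally:
        # Always release the cursor, even if fetchall() raises.
        cursor.close()
```

The failing call from the first post would then become `all_spss_dat(["Lat","Lon","Orient","GeoID","CID","Order_","Side"])`.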

On Wed, Dec 7, 2016 at 11:22 AM, Andy W <[hidden email]> wrote:
Well, maybe it is something to do with the order of the variables at least,
just changing the order to

MyDat4 = AllSPSSdat(vars=["Lat","Lon","Orient","CID", "Order_",
"Side","GeoID"])

will return results fine and dandy as well. So it isn't "GeoID" + another
variable as I thought at first.



=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD



--
Jon K Peck
[hidden email]
