I get a pretty esoteric complaint about a Unicode error for the code below, and I can only replicate it when I import both the GeoID and the Side variables at the same time (importing one or the other works fine, see MyDat1 and MyDat2).
**********************************************************************************.
*Get the test file from online.
SPSSINC GETURI DATA URI="https://dl.dropboxusercontent.com/u/3385251/TestSet.sav" FILETYPE=SAV DATASET=TestSet.
*Note it is in Unicode.
*Now importing data into python.
BEGIN PROGRAM Python.
import spss
def AllSPSSdat(vars):
  if vars == None:
    varNums = range(spss.GetVariableCount())
  else:
    allvars = [spss.GetVariableName(i) for i in range(spss.GetVariableCount())]
    varNums = [allvars.index(i) for i in vars]
  data = spss.Cursor(varNums)
  pydata = data.fetchall()
  data.close()
  return pydata
#This works
MyDat1 = AllSPSSdat(vars=["Lat","Lon","Orient","CID","Order_","Side"])
print MyDat1[0]
#This works
MyDat2 = AllSPSSdat(vars=["Lat","Lon","Orient","CID", "Order_","GeoID"])
print MyDat2[0]
#But dumping all variables at once does not work
MyDat3 = AllSPSSdat(vars=["Lat","Lon","Orient","GeoID", "CID", "Order_", "Side"])
print MyDat3[0]
END PROGRAM.
**********************************************************************************.

The exact error is:

> Traceback (most recent call last):
>   File "<string>", line 20, in <module>
>   File "<string>", line 10, in AllSPSSdat
>   File "C:\PROGRA~1\IBM\SPSS\STATIS~1\24\PYTHON\Lib\site-packages\spss\cursors.py", line 1313, in fetchall
>     data.append(self.binaryStream.fetchData())
>   File "C:\PROGRA~1\IBM\SPSS\STATIS~1\24\PYTHON\Lib\site-packages\spss\binarystream.py", line 406, in fetchData
>     activeCase = self.readcache(data)
>   File "C:\PROGRA~1\IBM\SPSS\STATIS~1\24\PYTHON\Lib\site-packages\spss\binarystream.py", line 324, in readcache
>     currentcase = self.unpackdata(binaryData)
>   File "C:\PROGRA~1\IBM\SPSS\STATIS~1\24\PYTHON\Lib\site-packages\spss\binarystream.py", line 130, in unpackdata
>     case[i] = unicode(case[i], encoding='utf-8')
> UnicodeDecodeError: 'utf8' codec can't decode byte 0xbb in position 0: invalid start byte

I'm on V24 (no fix packs) 64 bit on Windows 10, and my python points to an Anaconda install (2.7).
The workaround is pretty simple here (if I convert GeoID to a number it reads in fine), but I would like some input on what is causing the issue. I've generated a few different errors when setting SPSS to locale mode instead of Unicode (I get weird Unicode-escaped strings, and the variables are in a different order). I'm pretty sure there isn't any weirdo Unicode lurking in my variables though (all are numeric except for the GeoID variable, and that imports ok by itself, per "MyDat2").
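For anyone puzzling over the message itself: the snippet below is only an illustration of what "invalid start byte" means, not a diagnosis of what binarystream.py actually does. Byte 0xbb has the bit pattern 10xxxxxx, which UTF-8 reserves for continuation bytes, so it can never begin a character. The packed double at the end is a made-up example (the value 0.1 is not from the data set) showing that raw numeric bytes can trip the same error if they are ever decoded as text.

```python
import struct

def is_utf8_continuation_byte(value):
    # UTF-8 continuation bytes have the bit pattern 10xxxxxx (0x80..0xBF).
    return (value & 0xC0) == 0x80

def decodes_as_utf8(raw):
    # True if the byte string is valid UTF-8, False otherwise.
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

# The byte named in the traceback cannot start a UTF-8 sequence.
bb_is_continuation = is_utf8_continuation_byte(0xBB)
bb_decodes = decodes_as_utf8(b"\xbb")

# Hypothetical illustration: the 8 raw bytes of a little-endian double.
# 0.1 is an arbitrary value chosen for the example, not taken from the file.
packed = struct.pack("<d", 0.1)
first_byte = bytearray(packed)[0]
packed_decodes = decodes_as_utf8(packed)
```

So the error does not necessarily mean there is odd text in a string variable; it would also appear if bytes belonging to a numeric field were handed to the UTF-8 decoder.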
Well, maybe it is something to do with the order of the variables at least. Just changing the order to

MyDat4 = AllSPSSdat(vars=["Lat","Lon","Orient","CID", "Order_", "Side","GeoID"])

returns results fine and dandy as well. So it isn't "GeoID" + another variable as I thought at first.
There is a bug in the binarystream reader used by the Cursor class that appears when variables are fetched out of file order, as here. I have sent this to Development. If you read the variables in file order, it seems to be okay. You can also avoid the problem by adding isBinary=False to the Cursor call.

On Wed, Dec 7, 2016 at 11:22 AM, Andy W <[hidden email]> wrote:
> Well, maybe it is something to do with the order of the variables at least,
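A minimal sketch of the "read in file order" workaround from the reply, assuming you still want the columns back in the order you asked for. The helper names (file_order_plan, fetch_in_requested_order, fake_fetch), the file order in FILE_ORDER, and the sample row are all made up for illustration; the real variable order in TestSet.sav is not stated in the thread. The fetch step is passed in as a callable so the sketch runs without SPSS.

```python
def file_order_plan(all_vars, wanted):
    # Indices of the requested variables, in the order they were requested.
    requested = [all_vars.index(name) for name in wanted]
    # Indices to hand to the cursor: ascending, i.e. file order.
    fetch_indices = sorted(set(requested))
    # For each requested variable, its column position in the fetched rows.
    positions = [fetch_indices.index(i) for i in requested]
    return fetch_indices, positions

def fetch_in_requested_order(all_vars, wanted, fetch_rows):
    # fetch_rows is any callable that takes a list of variable indices
    # and returns a list of row tuples with columns in that index order.
    fetch_indices, positions = file_order_plan(all_vars, wanted)
    rows = fetch_rows(fetch_indices)
    # Permute each row back into the order the caller asked for.
    return [tuple(row[p] for p in positions) for row in rows]

# Stand-in data so the sketch runs outside SPSS (hypothetical values).
FILE_ORDER = ["Lat", "Lon", "Orient", "CID", "Order_", "Side", "GeoID"]
TABLE = [(40.1, -75.2, 90.0, 7.0, 1.0, 0.0, "G001")]

def fake_fetch(indices):
    # Mimics a cursor that returns only the selected columns.
    return [tuple(row[i] for i in indices) for row in TABLE]

result = fetch_in_requested_order(
    FILE_ORDER,
    ["Lat", "Lon", "Orient", "GeoID", "CID", "Order_", "Side"],
    fake_fetch,
)
```

Inside SPSS, fetch_rows would wrap the same spss.Cursor(varNums), fetchall() and close() calls used in AllSPSSdat, with all_vars built from spss.GetVariableName as in the original function. The other option mentioned in the reply, passing isBinary=False to the Cursor call, avoids the binary reader altogether and needs no reordering.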