Hi
We have isolated the following code as responsible for memory accumulating until the job eventually runs out of memory. It worked OK under SPSS 18, so I suspect a memory leak in spss.Cursor(). We are running the latest version of SPSS 20 (20.0.0.2) and the Python extensions.

Any help much appreciated.

    # checking to see if any variables with all values set to sysmis or a missing value
    # i.e. no valid data for any cases
    labf = open(labfile, 'w')
    firsterr=1 # flag to allow for printing of headings for first error
    numerr=0
    validdata=0 # default is that there is no valid data in variable
    for i in xrange(varcount):
        validdata=0 # default is that there is no valid data in variable
        dataCursor = spss.Cursor([i],'r')
        vallist = dataCursor.fetchall()
        dataCursor.close()
        for xval in xrange(len(vallist)) :
            # if var has a valid value (i.e. not none / false) for this case set validvalue flag to 1
            if vallist[xval-1][0] or vallist[xval-1][0]==0 :
                validdata=1
        if validdata==0 : # no valid values for this var for any cases
            if(firsterr==1):
                print "\n*******************************************************************************"
                firsterr=0
            print "Variable ",spss.GetVariableName(i)," has no valid values in the dataset "

thanks

Jon Johnson
Senior Database Manager
Centre for Longitudinal Studies
Email: [hidden email]
Tel: +44 (0)20 7612 6571
Follow CLS on Twitter www.twitter.com/clscohorts

Mailing Address
-------------------------------
Centre for Longitudinal Studies
Institute of Education
20 Bedford Way
London, WC1H 0AL

Office Location:
-------------------------------
55-59 Gordon Square
London WC1H 0NT
-------------------------------------------------------
Following lives from birth and through the adult years.

=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
hi,

perhaps running the garbage collector will help.

    import gc
    gc.collect()

Here are two versions of the same program. One requires spss, the other does not. Both use collections.Counter to do the counting of None.

    # version 1
    import collections
    import spss

    ncases = spss.GetCaseCount()
    for i in xrange(spss.GetVariableCount()):
        # open the cursor before the try block so close() is never called on an unbound name
        dataCursor = spss.Cursor([i],'r')
        try:
            column = dataCursor.fetchall()
        finally:
            dataCursor.close()
        # fetchall() returns one tuple per case, so count on the first element of each
        if collections.Counter(row[0] for row in column)[None] == ncases:
            print "Variable %r has no valid values in the dataset" % spss.GetVariableName(i)

    # version 2
    import collections
    from savReaderWriter import *

    reader = SavReader("Employee data.sav")
    ncases = len(reader)
    with reader:
        for i in range(reader.numVars):
            column = reader[..., i]
            count = collections.Counter(column)[None]
            if count == ncases:
                print "variable %r has %d missing values" % (reader.varNames[i], count)

Regards,
Albert-Jan

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
All right, but apart from the sanitation, the medicine, education, wine, public order, irrigation, roads, a fresh water system, and public health, what have the Romans ever done for us?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In reply to this post by Jon Johnson-3
So you are rewriting a tiny piece of AGGREGATE?
Rather silly approach IMO!
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"
In reply to this post by Jon Johnson-3
First, I would like to know which process
you are monitoring for a possible memory leak. It can be a little
difficult to tell from the Task Manager statistics what the actual memory
usage is because of the automatic garbage collection behavior of Python
(and Java).
Second, the posted code is extremely inefficient. It creates a cursor object and makes a separate pass over the entire dataset for each variable. A much more efficient approach would be the way that the spssaux2.FindEmptyVars function works. The spssaux2 module can be downloaded from the SPSS Community website in the Python utilities collection. Usage examples are in the module.

HTH,

Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
phone: 720-342-5621

From: Jon Johnson <[hidden email]>
To: [hidden email],
Date: 06/19/2013 03:51 AM
Subject: [SPSSX-L] Cursor memory leak Python Extension 20.0.0.2?
Sent by: "SPSSX(r) Discussion" <[hidden email]>
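The single-pass idea can be sketched roughly as follows. This is not the spssaux2.FindEmptyVars code, just a minimal illustration under stated assumptions: find_empty_columns is a made-up helper, it takes any iterable of rows so it can be tried without SPSS, and the stand-in data uses None for missing values the way a read cursor reports them.

```python
def find_empty_columns(rows, ncols):
    """Return the indexes of columns whose value is None in every row.

    rows  -- any iterable of equal-length sequences, one per case
    ncols -- number of columns in each row
    """
    still_empty = set(range(ncols))       # columns with no valid value seen so far
    for row in rows:
        # Keep only the columns that are still None in this case (0 counts as valid).
        still_empty = set(i for i in still_empty if row[i] is None)
        if not still_empty:               # every column has valid data, so stop early
            break
    return sorted(still_empty)


# Stand-in data: three cases, four variables.
cases = [
    (1.0, None, None, 0.0),
    (None, None, 5.0, 0.0),
    (3.0, None, None, None),
]
print(find_empty_columns(cases, 4))       # prints [1]

# Hypothetical use inside SPSS (untested here; assumes fetchone() returns
# None after the last case):
#   cur = spss.Cursor()
#   try:
#       empty = find_empty_columns(iter(cur.fetchone, None),
#                                  spss.GetVariableCount())
#   finally:
#       cur.close()
```

Only one cursor is opened and only one case is held in memory at a time, so the data are read once rather than once per variable.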