Cursor memory leak Python Extension 20.0.0.2?


Jon Johnson-3
Hi

We have isolated the following code as the cause of memory accumulating until the job eventually runs out of memory. It worked OK under SPSS 18, so I suspect a memory leak in spss.Cursor(). We are running the latest version of SPSS 20 (20.0.0.2) and of the Python extensions.

Any help much appreciated

#   check whether any variable has every value set to sysmis or a missing value,
#   i.e. no valid data for any case
import spss

varcount = spss.GetVariableCount()
firsterr = 1   # flag so the heading is printed only before the first error
for i in xrange(varcount):
    validdata = 0   # default is that there is no valid data in this variable
    dataCursor = spss.Cursor([i], 'r')
    vallist = dataCursor.fetchall()
    dataCursor.close()
    for row in vallist:
        # valid value = not None (0 counts as valid); stop at the first one
        if row[0] or row[0] == 0:
            validdata = 1
            break
    if validdata == 0:   # no valid values for this variable in any case
        if firsterr == 1:
            print "\n*******************************************************************************"
            firsterr = 0
        print "Variable", spss.GetVariableName(i), "has no valid values in the dataset"
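The validity test in the loop above can be isolated and checked without SPSS. A minimal sketch, assuming rows shaped like the result of `fetchall()` on a one-variable cursor (a sequence of 1-tuples, with a missing value returned as `None`); the helper names `has_valid_value` and `find_empty` are hypothetical. Note that the original `vallist[xval-1]` indexing starts at index `-1`, i.e. the last row, so it still visits every row, but iterating over the rows directly avoids that confusion:

```python
def has_valid_value(rows):
    """Return True if any row of a one-variable fetchall() result is non-missing.

    `rows` is assumed to look like [(1.0,), (None,), (0.0,)].
    Zero counts as valid, matching the original `v or v == 0` check
    for numeric values.
    """
    # any() stops at the first valid value instead of scanning every case
    return any(row[0] is not None for row in rows)


def find_empty(columns):
    """Return the names whose rows contain no valid value.

    `columns` is a hypothetical list of (name, rows) pairs standing in
    for one fetchall() per variable.
    """
    return [name for name, rows in columns if not has_valid_value(rows)]
```

Because `any()` stops at the first valid value, a variable with data in its first case costs one comparison rather than a full scan of its rows.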

thanks

Jon Johnson
Senior Database Manager
Centre for Longitudinal Studies
Email: [hidden email]
Tel: +44 (0)20 7612 6571
Follow CLS on Twitter www.twitter.com/clscohorts

Mailing Address
-------------------------------
Centre for Longitudinal Studies
Institute of Education
20 Bedford Way
London, WC1H 0AL

Office Location:
-------------------------------
55-59 Gordon Square
London
WC1H 0NT
-------------------------------------------------------
Following lives from birth and through the adult years.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Re: Cursor memory leak Python Extension 20.0.0.2?

Albert-Jan Roskam

Hi,

Perhaps running the garbage collector will help:
import gc
gc.collect()
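A caveat on this suggestion: `gc.collect()` can only reclaim Python objects that are kept alive by reference cycles; memory held by native code (for example inside the SPSS backend) is outside its reach. A minimal, SPSS-free sketch of what a collection actually frees; the `Node` class and `cycle_is_collected` function are hypothetical, used only to build and observe a cycle:

```python
import gc
import weakref


class Node(object):
    """Hypothetical object used only to build a reference cycle."""
    pass


def cycle_is_collected():
    """Build a two-object cycle, drop our references, and collect it.

    Returns (alive_before, alive_after, found), where `found` is the
    number of unreachable objects reported by gc.collect().
    """
    gc.disable()  # keep automatic collection from running mid-demo
    try:
        a, b = Node(), Node()
        a.partner, b.partner = b, a  # a <-> b: refcounts never reach zero
        probe = weakref.ref(a)       # lets us see whether `a` is freed
        del a, b
        alive_before = probe() is not None  # the cycle keeps `a` alive
        found = gc.collect()                # explicit collection still runs
        alive_after = probe() is not None
        return alive_before, alive_after, found
    finally:
        gc.enable()
```

If memory keeps growing even with a `gc.collect()` after each `dataCursor.close()`, that points away from Python reference cycles and toward memory held outside Python.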

Here are two versions of the same program. One requires spss, the other does not. Both use collections.Counter to count the None values.

# version 1
import collections
import spss
ncases = spss.GetCaseCount()
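for i in xrange(spss.GetVariableCount()):
    # create the cursor before try: if construction fails there is nothing to close
    dataCursor = spss.Cursor([i],'r')
    try:
        column = dataCursor.fetchall()
    finally:
        dataCursor.close()
    # fetchall() returns 1-tuples such as (None,), so unpack before counting
    if collections.Counter(row[0] for row in column)[None] == ncases:
        print "Variable %r has no valid values in the dataset" % spss.GetVariableName(i)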

# version 2
import collections
from savReaderWriter import *
reader = SavReader("Employee data.sav")
ncases = len(reader)
with reader:
    for i in range(reader.numVars):
        column = reader[..., i]
        count = collections.Counter(column)[None]
        if count == ncases:
            print "variable %r has %d missing values" % (reader.varNames[i], count)
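One detail to watch when counting with `collections.Counter`: `fetchall()` returns each case as a tuple, so a one-variable cursor yields items such as `(None,)` rather than bare `None`, and counting `None` directly on those rows gives 0. A minimal sketch on plain Python data, assuming that row shape; `count_missing` is a hypothetical helper:

```python
import collections


def count_missing(rows):
    """Count None values in fetchall()-style rows of a one-variable cursor."""
    # Unpack each 1-tuple so the Counter keys are the values themselves
    return collections.Counter(row[0] for row in rows)[None]


rows = [(None,), (None,), (3.0,)]
naive = collections.Counter(rows)[None]   # keys are tuples like (None,), so 0
fixed = count_missing(rows)               # 2
```

With that helper, the emptiness test for a variable becomes `count_missing(column) == ncases`.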


Regards,
Albert-Jan

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
All right, but apart from the sanitation, the medicine, education, wine, public order, irrigation, roads, a
fresh water system, and public health, what have the Romans ever done for us?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 


Automatic reply: Cursor memory leak Python Extension 20.0.0.2?

Michelle D. Zeidman

I am on vacation and will return to the office on Monday, July 1. If you need a response before then, please contact Celeste Gilman at [hidden email].

 

Thanks,

Michelle

 

Michelle Zeidman

Transit Program Operations Specialist

Transportation Services

University of Washington

Office: 206-616-6087

Cell: 206-518-1490

[hidden email]

 


Re: Cursor memory leak Python Extension 20.0.0.2?

David Marso
Administrator
In reply to this post by Jon Johnson-3
So you are rewriting a tiny piece of AGGREGATE?
Rather silly approach IMO!
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"
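David's point is that SPSS's own AGGREGATE procedure can already count missing values per variable in a single pass over the data. For readers who want the same idea in Python, a minimal sketch of a one-pass count, assuming rows shaped like `fetchall()` output from one cursor over all variables (one value per variable, `None` for missing); `missing_counts` and `empty_variables` are hypothetical names, not SPSS functions:

```python
def missing_counts(rows, nvars):
    """Count missing (None) values per variable in one pass over the cases.

    `rows` is assumed to be an iterable of case tuples, one value per
    variable, as a cursor over all variables would return.
    Returns (counts, ncases).
    """
    counts = [0] * nvars
    ncases = 0
    for case in rows:
        ncases += 1
        for j, value in enumerate(case):
            if value is None:
                counts[j] += 1
    return counts, ncases


def empty_variables(rows, names):
    """Names of variables that are missing in every case.

    Note: with zero cases, every variable is reported as empty.
    """
    counts, ncases = missing_counts(rows, len(names))
    return [name for name, n in zip(names, counts) if n == ncases]
```

The data are read once regardless of how many variables there are, instead of once per variable as in the original loop.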

Re: Cursor memory leak Python Extension 20.0.0.2?

Jon K Peck
In reply to this post by Jon Johnson-3
First, I would like to know which process you are monitoring for a possible memory leak.  It can be a little difficult to tell from the Task Manager statistics what the actual memory usage is because of the automatic garbage collection behavior of Python (and Java).

Second, the code below is extremely inefficient.  It creates a cursor object and passes the entire dataset separately for each variable.  A much more efficient approach would be the way that the spssaux2.FindEmptyVars function works.  The spssaux2 module can be downloaded from the SPSS Community website in the Python utilities collection.  Usage examples are in the module.
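The single-pass idea can be sketched as follows. This does not reproduce the code of `spssaux2.FindEmptyVars`; it is a minimal illustration, assuming one cursor over all variables whose `fetchall()`-style rows arrive as case tuples with `None` for missing values. The function name `find_empty_vars` is hypothetical. It stops reading as soon as every variable has shown a valid value:

```python
def find_empty_vars(rows, names):
    """Return names of variables with no non-missing value in any case.

    `rows` is any iterable of case tuples (one value per variable, None for
    missing). Reading stops early once every variable has a valid value,
    so a dataset with no empty variables is often not read to the end.
    """
    still_empty = set(range(len(names)))  # indexes not yet seen with data
    for case in rows:
        # keep only the indexes whose value in this case is still missing
        still_empty = {j for j in still_empty if case[j] is None}
        if not still_empty:
            break
    return [names[j] for j in sorted(still_empty)]
```

Passing an iterator rather than a full list lets the early exit avoid touching the remaining cases.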

HTH,

Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
phone: 720-342-5621



