SPSSX Discussion

Build variable list of variables containing small number of different values in Python


Ruben Geert van den Berg

Build variable list of variables containing small number of different values in Python


Jon K Peck

Re: Build variable list of variables containing small number of different values in Python

See below
Jon Peck
SPSS, an IBM Company
[hidden email]
312-651-3435

From:	Ruben van den Berg <[hidden email]>
To:	[hidden email]
Date:	07/15/2010 08:52 AM
Subject:	[SPSSX-L] Build variable list of variables containing small number of different values in Python
Sent by:	"SPSSX(r) Discussion" <[hidden email]>

Dear all,

I want to run some frequencies but I'd like to automatically filter out variables with more than 5 answer categories. I managed to construct Python code to do it but it's rather slow. I think it uses a data pass for every variable in the dataset in order to see how many values each variable contains. Is there any way to evaluate this condition for all variables simultaneously or speed up the code otherwise?

The syntax is:

begin program.
from __future__ import with_statement
import spss,spssdata
varlist=[]
for i in range(spss.GetVariableCount()):
    with spssdata.Spssdata(indexes=[i]) as curs:
        if len(set([j for j in curs])) <= 5:
            varlist.append(spss.GetVariableName(i))
spss.Submit("""fre %s"""%" ".join(varlist))
end program.

>>>You are doing a separate data pass for each variable. Here is an example that finds the variables in one data pass and then runs FREQUENCIES in one additional pass.
As is, it checks all variables, but you could limit this to a specified set in the vardict = line.

Note that another approach, suitable if data are set up appropriately, would be to filter based on variable measurement levels and/or to filter on the number of value labels.

import spss, spssaux, spssdata

maxvalues = 5 # criterion for value count

spssaux.OpenDataFile("c:/spss18/samples/english/employee data.sav")
vardict = spssaux.VariableDict() # could specify a specific list of variables to process
varcount = len(vardict.variables)
valuesets = [set() for i in range(varcount)] # a list of empty sets
curs = spssdata.Spssdata(vardict.variables) # to accommodate specific variables
for case in curs:
    for i in range(varcount):
        if len(valuesets[i]) <= maxvalues+1: # don't create unnecessary huge sets
            valuesets[i].add(case[i]) # accumulate list of values
curs.CClose()

varsforfreq = []
for i in range(varcount):
    if len(valuesets[i]) <= maxvalues:
        varsforfreq.append(vardict.variables[i])
if varsforfreq:
    spss.Submit("FREQ " + " ".join(varsforfreq))
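
For anyone who wants to try the one-pass logic outside SPSS, here is a minimal sketch in plain Python. It is only an illustration: the function name low_cardinality_columns, the variable names, and the sample rows are all made up, and a list of tuples stands in for the cases a cursor would deliver. Note that a missing value represented as None would count as one distinct value here.

```python
def low_cardinality_columns(rows, names, max_values=5):
    """Return the names of columns holding at most max_values distinct
    values, reading the rows only once. (Hypothetical helper.)"""
    seen = [set() for _ in names]  # one set of observed values per column
    for row in rows:
        for values, value in zip(seen, row):
            # Once a set exceeds the limit the column is already ruled out,
            # so stop adding to it and keep memory use small.
            if len(values) <= max_values:
                values.add(value)
    return [name for name, values in zip(names, seen)
            if len(values) <= max_values]


# Made-up data standing in for the cases read from the active dataset.
names = ["gender", "jobcat", "salary"]
rows = [
    ("m", 1, 57000),
    ("f", 1, 40200),
    ("f", 2, 21450),
    ("m", 3, 21900),
    ("m", 1, 45000),
    ("f", 2, 32100),
    ("m", 1, 36000),
]

selected = low_cardinality_columns(rows, names, max_values=5)
if selected:
    # Inside SPSS this string would be passed to spss.Submit.
    print("FREQUENCIES " + " ".join(selected))
```

The per-column sets are capped just above the limit, so a continuous variable such as salary costs only a handful of stored values however many cases there are.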

TIA!

Ruben van den Berg
Consultant Models & Methods
TNS NIPO
Email: [hidden email]
Mobiel: +31 6 24641435
Telefoon: +31 20 522 5738
Internet: www.tns-nipo.com
