Dear all,
I want to run some frequencies, but I'd like to automatically filter out variables with more than 5 answer categories. I managed to construct Python code to do it, but it's rather slow. I think it uses a data pass for every variable in the dataset in order to see how many values each variable contains. Is there any way to evaluate this condition for all variables simultaneously, or to speed up the code otherwise? The syntax is:

begin program.
from __future__ import with_statement
import spss,spssdata
varlist=[]
for i in range(spss.GetVariableCount()):
    with spssdata.Spssdata(indexes=[i]) as curs:
        if len(set([j for j in curs])) <= 5:
            varlist.append(spss.GetVariableName(i))
spss.Submit("""fre %s"""%" ".join(varlist))
end program.

TIA!

Ruben van den Berg
Consultant Models & Methods
TNS NIPO
Email: [hidden email]
Mobile: +31 6 24641435
Phone: +31 20 522 5738
Internet: www.tns-nipo.com
See below.

Jon Peck
SPSS, an IBM Company
[hidden email]
312-651-3435
You are doing a separate data pass for each variable. Here is an example that finds the variables in one data pass and then runs FREQUENCIES in one additional pass. As is, it checks all variables, but you could limit this to a specified set in the vardict = line. Note that another approach, suitable if the data are set up appropriately, would be to filter based on variable measurement levels and/or on the number of value labels.

import spss, spssaux, spssdata

maxvalues = 5  # criterion for value count
spssaux.OpenDataFile("c:/spss18/samples/english/employee data.sav")
vardict = spssaux.VariableDict()  # could specify a specific list of variables to process
varcount = len(vardict.variables)
valuesets = [set() for i in range(varcount)]  # a list of empty sets
curs = spssdata.Spssdata(vardict.variables)  # to accommodate specific variables
for case in curs:
    for i in range(varcount):
        if len(valuesets[i]) <= maxvalues+1:  # don't create unnecessary huge sets
            valuesets[i].add(case[i])  # accumulate list of values
curs.CClose()

varsforfreq = []
for i in range(varcount):
    if len(valuesets[i]) <= maxvalues:
        varsforfreq.append(vardict.variables[i])
if varsforfreq:
    spss.Submit("FREQ " + " ".join(varsforfreq))
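For anyone who wants to try the single-pass idea without SPSS at hand, here is a minimal sketch of the same logic in plain Python. It is not part of the spss or spssdata modules: the function name low_cardinality_columns and the toy rows and names are made up for illustration, and each row simply stands in for one case returned by the cursor.

```python
def low_cardinality_columns(rows, names, maxvalues=5):
    """Return the names of columns with at most maxvalues distinct values.

    Makes a single pass over rows; each row is a sequence with one
    value per entry in names (a stand-in for an SPSS case).
    """
    valuesets = [set() for _ in names]  # one set of seen values per column
    for row in rows:
        for i, value in enumerate(row):
            # Once a set holds more than maxvalues items the column is
            # already disqualified, so stop adding to it.
            if len(valuesets[i]) <= maxvalues:
                valuesets[i].add(value)
    return [name for name, values in zip(names, valuesets)
            if len(values) <= maxvalues]


# Hypothetical toy data: 20 cases, 4 variables
rows = [(i, i % 2, i % 3, "x") for i in range(20)]
names = ["id", "gender", "educ", "const"]
keep = low_cardinality_columns(rows, names)
print(keep)
```

Here id takes 20 different values and is dropped, while the other three variables stay. Capping each set just above the limit keeps memory use small even when a variable such as an ID has a distinct value in every case.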
