SPSSX Discussion

Python Question

Craig Johnson

Python Question

For the first case in a dataset, I want to look at a range of variables and generate a list of those variables that are set to 1. How would I do that? Below is my attempt in Python, which isn't working.

Thanks

begin program.
varlist=[]
test=[]

# Populate list
for i in range(838, 891):
   varlist.append(spss.GetVariableName(i))

#For Case 0 create a list of all variables equal to 1
for item in varlist[:]:
   if datasetObj.cases[0] and item==1:
      test.append(item)

print test
end program.

Craig Johnson

Re: Python Question

Oops, I found a mistake I could fix. Below is the updated code. It's still not working quite right.

begin program.
spss.StartDataStep()
varlist=[]
test=[]
datasetObj=spss.Dataset('DataSet1')

# Populate list
for i in range(838, 891):
   varlist.append(spss.GetVariableName(i))

#Convert branch variable list to string
#varstr=str(varlist).strip("[]").replace("\'","")
#print varstr

#For Case 0 create a list of all variables equal to 1
for item in varlist[:]:
   if datasetObj.cases[0] and item==1:
      test.append(item)

print test
spss.EndDataStep()
end program.

Jon K Peck

Re: Python Question

Here is an easier way to do this.
begin program.
import spss, spssdata
case1 = spssdata.Spssdata().fetchone()
isone = [spss.GetVariableName(i) for i in range(838,891) if case1[i] == 1]
print isone
end program.
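
For readers following along outside SPSS, here is a minimal sketch of why the test in the original attempt can never succeed and what the list comprehension above does differently. The variable names and values are invented stand-ins for the dataset, and treating a missing value as None is an assumption made only for this illustration.

```python
# Hypothetical stand-ins for SPSS data: names and first-case values are invented.
names = ["id", "q1", "q2", "q3", "q4"]
case1 = (101.0, 1.0, 0.0, 1.0, None)  # None stands in for a missing value (assumption)

start, stop = 1, 5  # positions of the variables to inspect (stop is exclusive)

# The original test compares the variable *name* (a string) with 1,
# so it is never true and nothing is selected.
by_name = [name for name in names[start:stop] if name == 1]

# Comparing the *value* stored at the same position in the case works.
isone = [names[i] for i in range(start, stop) if case1[i] == 1]

print(by_name)
print(isone)
```

The key point is that the position i is used twice: once to look up the value in the case and once to look up the matching name.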

Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
phone: 720-342-5621

John F Hall

Re: Python Question

In reply to this post by Craig Johnson

How about . . .? (untested)

temp.
select if $casenum = 1 .
mult resp groups y (v1 to vn (1))
 /freq y.

John F Hall (Mr)

[Retired academic survey researcher]

Email: [hidden email]

Website: www.surveyresearch.weebly.com

Start page: www.surveyresearch.weebly.com/spss-without-tears.html

Craig Johnson

Re: Python Question

In reply to this post by Jon K Peck

Hi Jon,

This is a continuation of a question I posted some time ago; I'm still working to get the thing coded (http://spssx-discussion.1045642.n5.nabble.com/Python-Question-td5716276.html).

The response you provided would work, but I'm trying to nest it into a bigger series of code. The brief version: in various datasets I have a range of variables coded 1/0, and I need to find the fewest cases that "trigger" every variable (i.e., variable = 1). The process I outlined was to create a list of variables, sum the list, sort, find the case with the largest number of triggered variables, mark it to keep, figure out which variables it triggered, remove those variables from the list, and loop until all variables have been covered. Below is the code I started writing; obviously it is still a work in progress. Perhaps there is a better way to go about this entirely.

begin program.
import spss, spssdata

#Create initial variables
spss.Submit("COMPUTE KEEP=0.")
spss.Submit("COMPUTE FILTER_VAR=0.")
spss.Submit("Execute.")

#While BranchSum>0:
#Create a list of branch variables
varlist=[]
for i in range(838, 891): #ideally you funnel in the start/stop variable name but I didn't get that far yet
   varlist.append(spss.GetVariableName(i))

#Convert branch variable list to string
varstr=str(varlist).strip("[]").replace("\'","")
print varstr

#Sum the branch variables
spss.Submit("COMPUTE BranchSum= Sum(" + varstr + ")." )
spss.Submit("Execute.")

#Sort
spss.Submit("Sort Cases by KEEP(A) BranchSum(D).")

#Keep first largest case
spss.Submit("DO IF $CASENUM=1.")
spss.Submit("COMPUTE KEEP=1.")
spss.Submit("END IF.")
spss.Submit("EXECUTE.")

#Identify variables to skip over summing
case1 = spssdata.Spssdata().fetchone()
isone = [spss.GetVariableName(i) for i in range(838,891) if case1[i] == 1]

#Remove list items
for item in isone[:]:
   varlist.remove(item)

#Return to top of loop until all the variables have been triggered.
end program.
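
The loop described above (pick the case that triggers the most still-uncovered variables, drop those variables, repeat) can be sketched in plain Python without any SPSS calls. This is only an illustration under stated assumptions: the function name greedy_cover and the small data table are hypothetical, the rows stand in for cases already read into memory, and a greedy pick like this is a heuristic that is not guaranteed to find the smallest possible set of cases.

```python
def greedy_cover(rows):
    """Repeatedly keep the row that triggers the most uncovered variables.

    `rows` is a list of equal-length 0/1 sequences (one per case).
    Returns the indexes of the kept rows, in the order they were chosen.
    """
    # Only variables set to 1 in at least one row can ever be covered.
    remaining = set(j for row in rows for j, v in enumerate(row) if v == 1)
    kept = []
    while remaining:
        # Re-count on every pass, looking only at variables still uncovered.
        best = max(range(len(rows)),
                   key=lambda r: sum(1 for j in remaining if rows[r][j] == 1))
        newly = set(j for j in remaining if rows[best][j] == 1)
        kept.append(best)
        remaining -= newly  # these variables no longer need a case
    return kept


# Invented example: 4 cases, 5 variables.
data = [
    (1, 1, 0, 0, 0),
    (0, 0, 1, 1, 0),
    (1, 1, 1, 0, 0),
    (0, 0, 0, 1, 1),
]
print(greedy_cover(data))
```

Doing the re-count in memory avoids a COMPUTE, SORT, and data pass for every kept case, which is what the Submit-based loop would otherwise need.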

Jon K Peck

Re: Python Question

Interesting. I think this is formally similar to the transpose of the problem that the SPSSINC TURF extension command solves. That command finds the set of variables that maximizes the reach in a dataset, where reach means the percentage of cases with a positive response to at least one of the variables. So if you transposed the dataset and applied TURF, you would get what you want. TURF expects to work with data with many more cases than variables, but if the number of cases is not too large or can be sampled, it might work.
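
To make the definition of reach concrete, here is a small plain-Python sketch. It is not the TURF implementation; the function name reach and the data are invented for illustration. It computes the percentage of rows with a 1 in at least one chosen column, and then applies the same function to the transposed table so that the "columns" being chosen are cases.

```python
def reach(rows, chosen):
    """Percentage of rows with a 1 in at least one of the chosen columns."""
    hits = sum(1 for row in rows if any(row[j] == 1 for j in chosen))
    return 100.0 * hits / len(rows)


# Invented example: 4 cases (rows) by 3 variables (columns).
data = [
    (1, 0, 0),
    (0, 1, 0),
    (0, 0, 0),
    (1, 1, 0),
]
print(reach(data, [0]))     # 50.0
print(reach(data, [0, 1]))  # 75.0

# Transposed: rows are now variables and columns are cases, so the reach of a
# set of cases is the percentage of variables that those cases trigger.
transposed = list(zip(*data))
print(reach(transposed, [3]))
```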

The thing to realize is that, either way, this is going to be a combinatorially hard problem. Using TURF, you would probably have to increase the Python recursion limit and set a very high maximum operations count. For just 100 cases, evaluating all the possibilities would require 10**20 set comparison operations.

If you are willing to go with a more heuristic approach, calculating the one's count, sorting, and adding cases until all the variables are covered seems reasonable. So first, do
ones = sum(a to z) # assuming these are all 0/1 variables
sort cases descending by ones.
Then pass the data to Python and use set union case by case until the length of the union matches the sample size.

My two cents worth.

Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
phone: 720-342-5621

David Marso

Re: Python Question

Administrator

In reply to this post by Craig Johnson

Perusing the cited thread:
That thread left off in November with a MATRIX program I created (why do I bother?).
You left the thread hanging. Did it NOT work?
You stated vaguely 'It is Not a matrix'
ANY rectangular numerical data structure can be treated as a matrix.
Have fun taming python!
Open the mind, the code will follow!

Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"

Craig Johnson

Re: Python Question

In reply to this post by Jon K Peck

Hi Jon,

I don't think the TURF approach will work, or I'm not using it correctly. My current test dataset has 37,000 cases and 18 variables, and it's been running for 45 minutes with no results but lots of "running." So it's back to the heuristic approach, which is essentially what I use when I do this by hand. I don't follow you on the "set union case by case until the length of the union matches the sample size." Any chance you could expand your explanation to 4 cents? There seems to be some expectation that I know what I'm doing in Python. :) I'm pretty green; all my more complex coding experience is with SAS macros. Below is what I'm interpreting your response to mean. It seems like I'm missing some big chunks of code that would make things loop to find the cases that trigger the logic, hence the confusion.

begin program.
#Sum the branch variables
spss.Submit("COMPUTE BranchSum= Sum( var1 to var18)." )
spss.Submit("Execute.")

#Sort
spss.Submit("Sort Cases by BranchSum(D).")

#Set union case by case until length of union matches sample size??? By sample size do you mean # of variables to trigger?
*magic

end program.

Craig

Jon K Peck

Re: Python Question

This is what I had in mind. It just prints a list of the selected cases, but you could save that in some more convenient way.

compute ones = sum(V1 to V18).
sort cases ones(d).

begin program.
import spss, spssdata
nvars = 18
curs = spssdata.Spssdata("V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18")
varhits = set()
caselist = set()
for i, case in enumerate(curs): # cases will be numbered from 0
    nvarhits = len(varhits) # variables accounted for so far
    if nvarhits >= nvars: # 100% coverage
        break
    onevars = [j for j in range(nvars) if case[j] == 1] # all the ones in this case
    # add case to list if it will add to the number of variable hits
    if len(varhits.union(onevars)) > nvarhits:
        varhits.update(onevars)
        caselist.add(i)
curs.CClose()
print "Variable hits:", len(varhits)
print "Cases Used:"
print caselist
end program.
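
For anyone who wants to trace the logic without an SPSS session, here is a rough stand-alone version of the same single-pass scan. It is a sketch, not a replacement for the program above: the function name scan_cover and the data are invented, the in-memory sort stands in for SORT CASES, and the kept cases are returned as a list in the order they were picked rather than printed as a set.

```python
def scan_cover(rows):
    """Scan rows from most ones to fewest, keeping each row that adds a new variable."""
    nvars = len(rows[0])
    # Row indexes ordered by number of ones, largest first (ties keep file order).
    order = sorted(range(len(rows)), key=lambda r: -sum(rows[r]))
    varhits = set()   # variables accounted for so far
    caselist = []     # original indexes of the kept rows
    for r in order:
        if len(varhits) >= nvars:  # 100% coverage
            break
        onevars = set(j for j in range(nvars) if rows[r][j] == 1)
        if not onevars <= varhits:  # this row triggers something new
            varhits |= onevars
            caselist.append(r)
    return caselist, varhits


# Invented example: 4 cases, 5 variables.
data = [
    (1, 1, 0, 0, 0),
    (0, 0, 1, 1, 0),
    (1, 1, 1, 0, 0),
    (0, 0, 0, 1, 1),
]
cases, hits = scan_cover(data)
print(cases)
print(len(hits))
```

In this invented data the scan keeps three cases, although cases 2 and 3 alone already cover every variable. A single pass can keep a case that a later one makes redundant, which is the price of not re-counting after each pick.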

Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
phone: 720-342-5621

Craig Johnson

Re: Python Question

Hi Jon,

Thanks a million for the sample code. One thing that immediately sticks out to me is the amount of work done directly in python rather than passing data back and forth between python and SPSS which was my approach. Perhaps I was trying to make things more difficult than needed. I'm still trying to wrap my head around how python is "holding" all the SPSS data and what it's possible to do.

Anyway, thanks again I really appreciate it. I've learned a lot through this but it's obvious I still have a very long, long, ways to go! I have however made the first baby steps in starting to use Python with SPSS.

On Mon, Apr 29, 2013 at 7:40 PM, Jon K Peck <[hidden email]> wrote:

This is what I had in mind. It just prints a list of the selected cases, but you could save that in some more convenient way.

compute ones = sum(V1 to V18).
sort cases ones(d).

begin program.
import spss, spssdata
nvars = 18
curs = spssdata.Spssdata("V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18")
varhits = set()
caselist = set()
for i, case in enumerate(curs): # cases will be numbered from 0
nvarhits = len(varhits) # variables accounted for so far
if nvarhits >= nvars: # 100% coverage
break;
onevars = [j for j in range(nvars) if case[j] == 1] # all the ones in this case
# add case to list if it will add to the number of variable hits
if len(varhits.union(onevars)) > nvarhits:
varhits.update(onevars)
caselist.add(i)
curs.CClose()
print "Variable hits:", len(varhits)
print "Cases Used:"
print caselist
end program.

Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
phone: <a href="tel:720-342-5621" value="+17203425621" target="_blank">720-342-5621

From: Craig J <[hidden email]>
To: [hidden email],
Date: 04/29/2013 06:20 PM
Subject: Re: [SPSSX-L] Python Question
Sent by: "SPSSX(r) Discussion" <[hidden email]>

Hi Jon,

I don't think the TURF approach will work or I'm not using it correctly. The current test dataset I have has 37,000 cases and 18 variables and its been running for 45 minutes with no results but lots of "running." Back to the heuristic approach which is essentially what I use when I do this by hand. I don't follow you on the "set union case by case until the length of the union matches the sample size. " Any chance you could expand your explanation to 4 cents? There are a seem to be some level of expectation that I know that I'm doing in python. :) I'm pretty green with all my more complex coding experience SAS macros. Below is what I'm interpreting your response to mean. It seems like I'm missing some big chunks of code which would make things loop to find cases which trigger the logic hence the confusion......

begin program.
#Sum the branch variables
spss.Submit("COMPUTE BranchSum= Sum( var1 to var18)." )
spss.Submit("Execute.")

#Sort
spss.Submit("Sort Cases by BranchSum(D).")

#Set union case by case until length of union matches sample size??? By sample size do you mean # of variables to trigger?
*magic

end program.

Craig

On Fri, Apr 26, 2013 at 10:21 AM, Jon K Peck <[hidden email]> wrote:
Interesting. I think this is formally similar to the transpose of the problem that the SPSSINC TURF extension command solves. That command finds the set of variables that maximizes the reach in a dataset, where reach means the percentage of cases with a positive response to at least one of the variables. So if you transposed the dataset and applied TURF, you would get what you want. TURF expects to work with data with many more cases than variables, but if the number of cases is not too large or can be sampled, it might work.

The thing to realize is that, either way, this is going to be a combinatorially hard problem. Using TURF, you would probably have to increase the Python recursion limit and set a very high maximum operations count. For just 100 cases, evaluating all the possibilities would require 10**20 set comparison operations.

If you are willing to go with a more heuristic approach, calculating the count of ones, sorting, and adding cases until all the variables are covered seems reasonable. So first, do
* Assuming these are all 0/1 variables.
compute ones = sum(a to z).
sort cases by ones (D).
Then pass the data to Python and use set union case by case until the length of the union matches the sample size.
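
A minimal sketch of that last step in plain Python, so it can be tried outside SPSS. It assumes "sample size" means the number of variables that need to be covered, which is an interpretation rather than something stated above. The `cases` list stands in for rows already read from the dataset, and `cover_by_sorted_union` is a hypothetical name.

```python
def cover_by_sorted_union(cases, variables):
    """Walk the cases in descending order of their ones count, keeping each
    case that adds at least one new variable, until everything is covered."""
    # Variables that at least one case sets to 1; the rest can never be covered.
    coverable = set(v for v in variables if any(c[v] == 1 for c in cases))
    # Equivalent of computing the ones count and sorting descending by it.
    order = sorted(range(len(cases)),
                   key=lambda i: sum(cases[i][v] == 1 for v in variables),
                   reverse=True)
    covered, kept = set(), []
    for i in order:
        ones = set(v for v in variables if cases[i][v] == 1)
        if ones - covered:          # this case triggers something new
            covered |= ones         # set union, case by case
            kept.append(i)
        if len(covered) == len(coverable):
            break                   # the union now spans every coverable variable
    return kept, covered

# Hypothetical toy data: four cases, four 0/1 variables.
cases = [
    {"a": 1, "b": 0, "c": 0, "d": 0},
    {"a": 1, "b": 1, "c": 1, "d": 0},
    {"a": 0, "b": 0, "c": 1, "d": 0},
    {"a": 0, "b": 0, "c": 0, "d": 1},
]
kept, covered = cover_by_sorted_union(cases, ["a", "b", "c", "d"])
print(kept)
```

The sort happens only once here, so a case with a high ones count can still be skipped later if everything it triggers is already covered.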

My two cents worth.

Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
phone: 720-342-5621

From: Craig J <[hidden email]>
To: Jon K Peck/Chicago/IBM@IBMUS,
Cc: [hidden email]
Date: 04/26/2013 10:33 AM
Subject: Re: [SPSSX-L] Python Question

Hi Jon,

This is a continuation of a question I posted some time ago; I’m still working to get the thing coded (http://spssx-discussion.1045642.n5.nabble.com/Python-Question-td5716276.html).
The response you provided would work, but I’m trying to nest it into a bigger series of code. The brief version: in various datasets I have a range of variables coded 1/0 and need to find the fewest number of cases that “trigger” each variable (i.e., variable = 1). The process I outlined was to create a list of variables, sum the list, sort, find the case with the largest number of triggered variables, mark it to keep, figure out the variables it triggered, remove the triggered variables from the list, and loop until all variables have been triggered. Below is the code I started writing... obviously still a work in progress. Perhaps there is a better way to go about this entirely.
begin program.
import spss, spssdata

#Create initial variables
spss.Submit("COMPUTE KEEP=0.")
spss.Submit("COMPUTE FILTER_VAR=0.")
spss.Submit("EXECUTE.")

#Create a list of branch variables once, before the loop, and remember each variable's index
#ideally you funnel in the start/stop variable name but I didn't get that far yet
varindex = dict((spss.GetVariableName(i), i) for i in range(838, 891))
varlist = sorted(varindex, key=varindex.get)

#Loop until all the variables have been triggered
while varlist:
   #Sum the branch variables that have not been triggered yet
   spss.Submit("COMPUTE BranchSum=SUM(" + ", ".join(varlist) + ").")

   #Sort so the unkept case with the most untriggered variables comes first
   spss.Submit("SORT CASES BY KEEP(A) BranchSum(D).")

   #Identify the untriggered variables that the first case triggers
   curs = spssdata.Spssdata()
   case1 = curs.fetchone()
   curs.CClose()
   isone = [name for name in varlist if case1[varindex[name]] == 1]
   if not isone:
      break   #no remaining case triggers any of the remaining variables

   #Keep first largest case
   spss.Submit("DO IF $CASENUM=1.")
   spss.Submit("COMPUTE KEEP=1.")
   spss.Submit("END IF.")
   spss.Submit("EXECUTE.")

   #Remove the triggered variables from the list
   for name in isone:
      varlist.remove(name)

end program.
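
The process described above (pick the case that triggers the most remaining variables, keep it, drop those variables, repeat) is a greedy heuristic. Here is a minimal sketch of just that logic in plain Python, with no SPSS calls, so the loop can be checked on its own. The toy `cases` data and the name `greedy_min_cases` are made up for illustration, and greedy selection is not guaranteed to find the true minimum number of cases.

```python
def greedy_min_cases(cases, variables):
    """Repeatedly keep the case that triggers the most still-untriggered variables."""
    # Only variables that some case sets to 1 can ever be triggered.
    remaining = set(v for v in variables if any(c[v] == 1 for c in cases))
    kept = []
    while remaining:
        # Equivalent of summing the remaining variables and sorting descending.
        best = max(range(len(cases)),
                   key=lambda i: sum(1 for v in remaining if cases[i][v] == 1))
        triggered = set(v for v in remaining if cases[best][v] == 1)
        kept.append(best)          # mark the case to keep
        remaining -= triggered     # remove triggered variables from the list
    return kept

# Hypothetical toy data: three cases, five 0/1 variables.
cases = [
    {"a": 1, "b": 1, "c": 1, "d": 0, "e": 0},
    {"a": 1, "b": 1, "c": 0, "d": 0, "e": 0},
    {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1},
]
print(greedy_min_cases(cases, ["a", "b", "c", "d", "e"]))
```

Because `remaining` starts with only the variables that some case actually triggers, every pass removes at least one variable and the loop always ends.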

On Thu, Apr 25, 2013 at 4:58 PM, Jon K Peck <[hidden email]> wrote:
Here is an easier way to do this.
begin program.
import spss, spssdata
case1 = spssdata.Spssdata().fetchone()
isone = [spss.GetVariableName(i) for i in range(838,891) if case1[i] == 1]
print isone
end program.
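
The same pattern with plain Python stand-ins, in case it helps to try it outside SPSS. The `names` list and the `case1` tuple below are hypothetical substitutes for `spss.GetVariableName(i)` and the fetched case. The key point is that the test is on the case's value at index i, not on the variable name.

```python
# Hypothetical stand-ins for the variable names and the first case's values.
names = ["id", "branch1", "branch2", "branch3"]
case1 = (101, 1, 0, 1)

# range(start, stop) visits indexes start .. stop - 1, like range(838, 891).
start, stop = 1, 4
isone = [names[i] for i in range(start, stop) if case1[i] == 1]
print(isone)

# Comparing the name itself to 1 never matches, so this list stays empty.
never = [name for name in names[start:stop] if name == 1]
print(never)
```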

Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
phone: 720-342-5621

From: Craig J <[hidden email]>
To: [hidden email],
Date: 04/25/2013 05:43 PM
Subject: Re: [SPSSX-L] Python Question
Sent by: "SPSSX(r) Discussion" <[hidden email]>

Oops, found a mistake I could fix. Below is the updated code. It's still not working quite right.

begin program.
spss.StartDataStep()
varlist=[]
test=[]
datasetObj=spss.Dataset('DataSet1')

# Populate list
for i in range(838, 891):
   varlist.append(spss.GetVariableName(i))

#Convert branch variable list to string
#varstr=str(varlist).strip("[]").replace("\'","")
#print varstr

#For Case 0 create a list of all variables equal to 1
for item in varlist[:]:
   if datasetObj.cases[0] and item==1:
      test.append(item)

print test
spss.EndDataStep()
end program.

On Thu, Apr 25, 2013 at 4:35 PM, Craig J <[hidden email]> wrote:
For the first case in a dataset, I want to look at a range of variables and generate a list of those variables that are set to 1. How would I do that? Below is my attempt in python which isn't working.

Thanks

begin program.
varlist=[]
test=[]

# Populate list
for i in range(838, 891):
   varlist.append(spss.GetVariableName(i))

#For Case 0 create a list of all variables equal to 1
for item in varlist[:]:
   if datasetObj.cases[0] and item==1:
      test.append(item)

print test
end program.