For the first case in a dataset, I want to look at a range of variables and generate a list of those variables that are set to 1. How would I do that? Below is my attempt in python which isn't working.
begin program.
varlist=[]
test=[]
# Populate list
for i in range(838, 891):
    varlist.append(spss.GetVariableName(i))
#For Case 0 create a list of all variables equal to 1
for item in varlist[:]:
    if datasetObj.cases[0] and item==1:
        test.append(item)
print test
end program.
|
Oops, found a mistake I could fix. Below is the updated code. It's still not working quite right.

begin program.
spss.StartDataStep()
varlist=[]
test=[]
datasetObj=spss.Dataset('DataSet1')
# Populate list
for i in range(838, 891):
    varlist.append(spss.GetVariableName(i))
#Convert branch variable list to string
#varstr=str(varlist).strip("[]").replace("\'","")
#print varstr
#For Case 0 create a list of all variables equal to 1
for item in varlist[:]:
    if datasetObj.cases[0] and item==1:
        test.append(item)
print test
spss.EndDataStep()
end program.
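One reason the `test` list stays empty: in `if datasetObj.cases[0] and item==1`, `item` is a variable name (a string), so `item==1` is never true whatever the data contain. The comparison has to be made against the value stored in the case at that variable's position. A minimal sketch of the difference in plain Python, where `names` and `first_case` are made-up stand-ins for the variable names and the first case's values (it runs without SPSS):

```python
# Hypothetical stand-ins: variable names and the first case's values,
# in the same order (in SPSS these would come from the active dataset).
names = ["br1", "br2", "br3", "br4"]
first_case = [1.0, 0.0, 1.0, None]  # None plays the role of a missing value

# Original logic: compares the *name* to 1, which is never true.
wrong = [name for name in names if first_case and name == 1]

# Intended logic: compare the *value* at the matching position to 1.
right = [name for name, value in zip(names, first_case) if value == 1]

print(wrong)  # []
print(right)  # ['br1', 'br3']
```

Pairing each name with its value is the key step; how the values are fetched from SPSS is a separate question.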
On Thu, Apr 25, 2013 at 4:35 PM, Craig J <[hidden email]> wrote:
|
Here is an easier way to do this.
begin program.
import spss, spssdata
case1 = spssdata.Spssdata().fetchone()
isone = [spss.GetVariableName(i) for i in range(838,891) if case1[i] == 1]
print isone
end program.

Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
phone: 720-342-5621

From: Craig J <[hidden email]>
To: [hidden email],
Date: 04/25/2013 05:43 PM
Subject: Re: [SPSSX-L] Python Question
Sent by: "SPSSX(r) Discussion" <[hidden email]>
|
In reply to this post by Craig Johnson
How about . . .? (untested)

temp.
select if $casenum = 1 .
mult resp groups y (v1 to vn (1)) /freq y.

John F Hall (Mr)
[Retired academic survey researcher]
Email: [hidden email]
Website: www.surveyresearch.weebly.com
Start page: www.surveyresearch.weebly.com/spss-without-tears.html
|
In reply to this post by Jon K Peck
Hi Jon,
The response you provided would work, but I’m trying to nest it into a bigger series of code. Brief version: in various data sets I have a range of variables coded 1/0 and need to find the fewest cases that “trigger” every variable (i.e., variable = 1). The process I outlined was trying to create a list of variables, sum the list, sort, find the case with the largest number of triggered variables, mark it to keep, figure out the variables it triggered, remove the triggered variables from the list, and loop until all variables have been finished. Below is the code I started writing… obviously still a work in progress. Perhaps there is a better way to go about this entirely.

begin program.
import spss, spssdata

#Create initial variables
spss.Submit("COMPUTE KEEP=0.")
spss.Submit("COMPUTE FILTER_VAR=0.")
spss.Submit("Execute.")

#While BranchSum>0:
#Create a list of branch variables
varlist=[]
for i in range(838, 891): #ideally you funnel in the start/stop variable name but I didn't get that far yet
    varlist.append(spss.GetVariableName(i))

#Convert branch variable list to string
varstr=str(varlist).strip("[]").replace("\'","")
print varstr

#Sum the branch variables
spss.Submit("COMPUTE BranchSum= Sum(" + varstr + ")." )
spss.Submit("Execute.")

#Sort
spss.Submit("Sort Cases by KEEP(A) BranchSum(D).")

#Keep first largest case
spss.Submit("DO IF $CASENUM=1.")
spss.Submit("COMPUTE KEEP=1.")
spss.Submit("END IF.")
spss.Submit("EXECUTE.")

#Identify variables to skip over summing
case1 = spssdata.Spssdata().fetchone()
isone = [spss.GetVariableName(i) for i in range(838,891) if case1[i] == 1]

#Remove list items
for item in isone[:]:
    varlist.remove(item)
#Return to top of loop until all the variables have been triggered.

end program.

On Thu, Apr 25, 2013 at 4:58 PM, Jon K Peck <[hidden email]> wrote:
|
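The loop described in this post (take the case that triggers the most untriggered variables, keep it, drop those variables, repeat) is the usual greedy heuristic for a set-cover problem. A minimal sketch in plain Python, assuming the 0/1 data have already been read into a list of rows. The function name `greedy_cover` and the sample `rows` are hypothetical, and nothing here calls SPSS:

```python
def greedy_cover(rows):
    """Pick cases one at a time, each time taking the case that triggers
    the most still-untriggered variables (ties go to the earliest case)."""
    nvars = len(rows[0])
    # Variables that are 1 in at least one case; others can never be triggered.
    remaining = set(j for j in range(nvars) if any(row[j] == 1 for row in rows))
    keep = []
    while remaining:
        # Count, for every case, how many remaining variables it would trigger.
        gains = [sum(1 for j in remaining if row[j] == 1) for row in rows]
        best = gains.index(max(gains))
        keep.append(best)
        remaining -= set(j for j in remaining if rows[best][j] == 1)
    return keep

# Hypothetical 0/1 data: 5 cases (rows) by 4 variables (columns).
rows = [
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 1],
    [1, 1, 1, 0],
    [0, 0, 0, 0],
]
print(greedy_cover(rows))  # [3, 2]
```

Doing the counting and removal inside Python avoids re-running COMPUTE and SORT on every pass. Being greedy, the result is a small set of cases but not necessarily the smallest possible one.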
Interesting. I think this is formally
similar to the transpose of the problem that the SPSSINC TURF extension
command solves. That command finds the set of variables that maximizes
the reach in a dataset, where reach means the percentage of cases with
a positive response to at least one of the variables. So if you transposed
the dataset and applied TURF, you would get what you want. TURF expects
to work with data with many more cases than variables, but if the number
of cases is not too large or can be sampled, it might work.
The thing to realize is that, either way, this is going to be a combinatorially hard problem. Using TURF, you would probably have to increase the Python recursion limit and set a very high maximum operations count. For just 100 cases, evaluating all the possibilities would require 10**20 set comparison operations.

If you are willing to go with a more heuristic approach, calculating the one's count, sorting, and adding cases until all the variables are covered seems reasonable. So first, do

ones = sum(a to z)  # assuming these are all 0/1 variables
sort cases descending by ones.

Then pass the data to Python and use set union case by case until the length of the union matches the sample size.

My two cents worth.

Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
phone: 720-342-5621

From: Craig J <[hidden email]>
To: Jon K Peck/Chicago/IBM@IBMUS,
Cc: [hidden email]
Date: 04/26/2013 10:33 AM
Subject: Re: [SPSSX-L] Python Question
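To see why an exact answer gets expensive, here is a minimal brute-force sketch in plain Python. It tries every subset of cases, smallest first, until one covers all the variables. The function name `smallest_cover` and the sample `rows` are hypothetical; this is an illustration of exhaustive search, not what TURF does internally.

```python
from itertools import combinations

def smallest_cover(rows):
    """Try every subset of cases, smallest first, and return the first one
    whose union of triggered variables covers everything coverable."""
    case_sets = [set(j for j, v in enumerate(row) if v == 1) for row in rows]
    target = set().union(*case_sets)
    for size in range(len(rows) + 1):
        for combo in combinations(range(len(rows)), size):
            covered = set().union(*(case_sets[i] for i in combo))
            if covered == target:
                return list(combo)

# Hypothetical 0/1 data: 5 cases (rows) by 4 variables (columns).
rows = [
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [1, 1, 1, 0],
]
print(smallest_cover(rows))  # [0, 1]
print(2 ** len(rows))        # 32 subsets in the worst case
```

The number of subsets of n cases is 2**n, so the worst-case work doubles with every extra case. That is what makes a sort-and-accumulate heuristic attractive for anything beyond a handful of cases.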
|
Administrator
|
In reply to this post by Craig Johnson
Perusing the cited thread:
That thread left off in November with a MATRIX program I created (why do I bother?). You left the thread hanging. Did it NOT work? You stated vaguely 'It is Not a matrix'. ANY rectangular numerical data structure can be treated as a matrix.
Have fun taming python! Open the mind, the code will follow!
----
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
In reply to this post by Jon K Peck
Hi Jon,
I don't think the TURF approach will work, or I'm not using it correctly. The current test dataset has 37,000 cases and 18 variables, and it's been running for 45 minutes with no results but lots of "running." Back to the heuristic approach, which is essentially what I use when I do this by hand. I don't follow you on the "set union case by case until the length of the union matches the sample size." Any chance you could expand your explanation to 4 cents? There seems to be some level of expectation that I know what I'm doing in Python. :) I'm pretty green; all my more complex coding experience is with SAS macros. Below is what I'm interpreting your response to mean. It seems like I'm missing some big chunks of code that would make things loop to find the cases that trigger the logic, hence the confusion......

begin program.
#Sum the branch variables
spss.Submit("COMPUTE BranchSum= Sum( var1 to var18)." )
spss.Submit("Execute.")

#Sort
spss.Submit("Sort Cases by BranchSum(D).")
#Set union case by case until length of union matches sample size??? By sample size do you mean # of variables to trigger?

*magic
end program.

Craig

On Fri, Apr 26, 2013 at 10:21 AM, Jon K Peck <[hidden email]> wrote:
|
This is what I had in mind. It just
prints a list of the selected cases, but you could save that in some more
convenient way.
compute ones = sum(V1 to V18).
sort cases ones(d).

begin program.
import spss, spssdata

nvars = 18
curs = spssdata.Spssdata("V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18")
varhits = set()
caselist = set()
for i, case in enumerate(curs):  # cases will be numbered from 0
    nvarhits = len(varhits)  # variables accounted for so far
    if nvarhits >= nvars:  # 100% coverage
        break
    onevars = [j for j in range(nvars) if case[j] == 1]  # all the ones in this case
    # add case to list if it will add to the number of variable hits
    if len(varhits.union(onevars)) > nvarhits:
        varhits.update(onevars)
        caselist.add(i)
curs.CClose()
print "Variable hits:", len(varhits)
print "Cases Used:"
print caselist
end program.

Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
phone: 720-342-5621

From: Craig J <[hidden email]>
To: [hidden email],
Date: 04/29/2013 06:20 PM
Subject: Re: [SPSSX-L] Python Question
Sent by: "SPSSX(r) Discussion" <[hidden email]>
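To follow what the single pass over the sorted cases does, it can help to run the same logic on a tiny in-memory table. A minimal sketch in plain Python, assuming made-up 0/1 `rows` in place of the SPSS cursor. The function name `single_pass_cover` is hypothetical, and a list is used for the kept cases (rather than a set) only so that the order of selection is visible:

```python
def single_pass_cover(rows):
    """One pass over cases already sorted by descending count of ones:
    keep a case only if it triggers a variable not yet accounted for."""
    nvars = len(rows[0])
    varhits = set()   # column indexes of variables triggered so far
    caselist = []     # positions (0-based, in sorted order) of kept cases
    for i, case in enumerate(rows):
        if len(varhits) >= nvars:      # every variable already covered
            break
        onevars = set(j for j in range(nvars) if case[j] == 1)
        if not onevars <= varhits:     # this case adds at least one new variable
            varhits |= onevars
            caselist.append(i)
    return caselist, varhits

# Hypothetical 0/1 rows standing in for the SPSS cases.
rows = [
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 1, 1, 0],
    [1, 1, 1, 0],
]
# Equivalent of "sort cases ones(d)": most ones first.
rows.sort(key=sum, reverse=True)
caselist, varhits = single_pass_cover(rows)
print(caselist)         # [0, 3]
print(sorted(varhits))  # [0, 1, 2, 3]
```

The positions in `caselist` refer to the sorted order, not the original case order. To map the selection back to the original file, an ID variable would have to be read along with the 0/1 variables.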
|
Hi Jon,
Thanks a million for the sample code. One thing that immediately sticks out to me is the amount of work done directly in Python rather than passing data back and forth between Python and SPSS, which was my approach. Perhaps I was trying to make things more difficult than needed. I'm still trying to wrap my head around how Python is "holding" all the SPSS data and what it's possible to do.
Anyway, thanks again, I really appreciate it. I've learned a lot through this, but it's obvious I still have a very long way to go! I have, however, made the first baby steps in starting to use Python with SPSS.
On Mon, Apr 29, 2013 at 7:40 PM, Jon K Peck <[hidden email]> wrote:
|