Python Question

Python Question

Craig Johnson
This is my first attempt at writing some Python code to work with SPSS. Big picture: I'm trying to set up a system to select cases meeting certain changing criteria (please note this is not for statistical purposes). All variables are binary. I'd like to sum a variable range, select the largest value, delete out all variables set to 1 in the selected case, re-sum, take the highest number, delete out all variables set to 1 in the selected case, re-sum, take the highest number, and so on.

I'm trying to break this down into baby steps I can handle. Here is the first piece:

1) Supply a text variable name (starting point)
2) Identify the index of that variable name
3) Select the variable AFTER that index (Start of the binary variables)
4) Select the last variable in the dataset (end of the binary variables)

I'm going to be playing around with this but if anyone has insight into the steps I'd be interested in knowing how you'd handle it. 

Thanks! 
Re: Python Question

Jon K Peck
I suggest that you study the things that the Dataset class can do.  You might also want to read some of the Python material in the Programming and Data Management book available from the SPSS Community site.
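
As a starting point for the four steps in the original question, here is a minimal sketch in plain Python. It assumes the variable names are already available as an ordered list; the helper name `binary_variable_range` and the sample names are hypothetical, chosen only for illustration. Inside SPSS the list could be built with `spss.GetVariableName(i)` for each `i` in `range(spss.GetVariableCount())`, but the sketch itself does not need the plug-in to run.

```python
def binary_variable_range(names, start_name):
    """Return (first, last) names of the block that follows start_name."""
    idx = names.index(start_name)      # step 2: raises ValueError if the name is absent
    if idx + 1 >= len(names):
        raise ValueError("no variables follow %r" % start_name)
    return names[idx + 1], names[-1]   # steps 3 and 4


# Hypothetical variable list, in file order (step 1 supplies "Region").
names = ["ID", "Region", "V001", "V002", "V003"]
first, last = binary_variable_range(names, "Region")
print("%s TO %s" % (first, last))      # prints: V001 TO V003
```

The resulting pair can be pasted into ordinary syntax as a `first TO last` range, which keeps the Python part small while the rest of the job stays in SPSS commands.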


Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
new phone: 720-342-5621




From:        Craig J <[hidden email]>
To:        [hidden email],
Date:        11/16/2012 06:23 PM
Subject:        [SPSSX-L] Python Question
Sent by:        "SPSSX(r) Discussion" <[hidden email]>




Re: Python Question

David Marso
Administrator
In reply to this post by Craig Johnson
Please reread your description and realize this is terribly vague.
What does "sum a variable range" mean?
What does "select the largest value" mean?
What does "delete out all variables set to 1 in the selected case" mean?
When do you decide to stop?
What is this supposed to achieve, i.e., what output?
Why are you presuming Python is the appropriate solution?
<edited:flipped the following two lines >
Have you looked at the SPSS MATRIX language?
See CSUM, RSUM, : indexing operator, LOOP END LOOP control .
<edited : upped the ante with the home-brew ;-)
I'll bet a home-brew that MATRIX will rip any python solution to pieces WRT processing efficiency!
<edited>:ADDED
Have you considered RANK?
 
--

Realize that my ESPss and InterneTelepathy gifts are legendary however the signal is weak.
---
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"
Re: Python Question

Craig Johnson
Please reread your description and realize this is terribly vague. 
* Purposely vague
What does "sum a variable range" mean?
* Compute Tot=Sum(VarA to VarZ).
What does "select the largest value" mean? 
* Sort Cases By Tot (D). Select If $casenum=1.
What does "delete out all variables set to 1 in the selected case" mean?
* If $casenum=1 and any of the variables for that case are set to one, delete those variables.
When do you decide to stop?
* When the range is null
What is this supposed to achieve, i.e., what output?
* A set of cases such that every binary variable is set to 1 in at least one selected case. This is not a statistical operation.
Why are you presuming Python is the appropriate solution?
* It's possible it could be done with SPSS syntax. However, "appropriate solutions" are usually in the eyes of the beholder. In this instance I'd like to use Python to start using the language.
See CSUM, RSUM, : indexing operator, LOOP END LOOP control .
*  Familiar with all of these.  
Have you looked at the SPSS MATRIX language?
* It's not a matrix.
I'll bet a MATRIX solution will rip any python solution to pieces WRT
processing efficiency!
* I'm using a dual quad-core PC on roughly 50k to 500k cases. I'm not exactly worried about sucking up processing power from a mainframe. If it takes longer to run, that's fine, especially since it will only be run once.

Re: Python Question

David Marso
Administrator
FWIW:
---
INPUT PROGRAM.
LOOP ID=1 TO 50000.
DO REPEAT V=V001 TO V100.
COMPUTE V=TRUNC(UNIFORM(2)).
END REPEAT.
END CASE.
END LOOP.
END FILE.
END INPUT PROGRAM.
EXE.

SET WORKSPACE=500000.
SET MXLOOPS=1000000.
MATRIX .

GET ID /FILE */ VAR ID .
GET V  /FILE */ VAR V001 TO V100.
COMPUTE #N=NROW(ID).
COMPUTE #P=NCOL(V).
COMPUTE EMPTY=MAKE(#N,1,0).
COMPUTE IDS={-9}.
COMPUTE MAXSUMS={-9}.

* Repeat until no 1s remain in V.
LOOP.
+  COMPUTE SumV=RSUM(V).
** I suspect the following line will bust python's caps **.
+  COMPUTE MAX=CMAX(SUMV).
+  COMPUTE #found=0.
+  DO IF (MAX GT 0).
* Take the first case whose row sum equals the current maximum.
+    LOOP #=1 TO #N.
+      DO IF (SumV(#) EQ MAX) AND NOT (#found).
+        COMPUTE #found=1.
+        COMPUTE IDS={IDS,ID(#)}.
+        COMPUTE MAXSUMS={MAXSUMS,MAX}.
* Zero every column in which the selected case has a 1.
+        LOOP ##=1 TO #P.
+          DO IF (V(#,##) EQ 1).
+            COMPUTE V(:,##)=EMPTY.
+          END IF.
+        END LOOP.
+      END IF.
+    END LOOP IF #found.
+  END IF.
END LOOP IF MAX=0.
COMPUTE IDS=IDS(2:NCOL(IDS)).
COMPUTE MAXSUMS=MAXSUMS(2:NCOL(MAXSUMS)).
PRINT IDS.
PRINT MAXSUMS.
END MATRIX.
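
For comparison, the same greedy procedure can be sketched in plain Python. This is only an illustration under assumptions: the data are taken to be already loaded as a list of IDs and a list of 0/1 rows (reading them out of SPSS is not shown), and the function name `greedy_select` and the small sample data are hypothetical. Like the MATRIX loop above, it takes the first case that reaches the current maximum row sum and then zeroes every column in which that case has a 1.

```python
def greedy_select(ids, rows):
    """Repeatedly pick the case with the largest row sum, then zero out
    every column in which that case has a 1, until no 1s remain."""
    rows = [list(r) for r in rows]           # work on a copy of the data
    picked_ids, picked_sums = [], []
    while True:
        sums = [sum(r) for r in rows]        # counterpart of RSUM(V)
        best = max(sums) if sums else 0      # counterpart of CMAX(SumV)
        if best == 0:                        # nothing left to cover
            break
        i = sums.index(best)                 # first case with the maximum
        picked_ids.append(ids[i])
        picked_sums.append(best)
        covered = [j for j, v in enumerate(rows[i]) if v == 1]
        for r in rows:                       # zero the covered columns everywhere
            for j in covered:
                r[j] = 0
    return picked_ids, picked_sums


# Hypothetical sample: four cases, four binary variables.
ids = [101, 102, 103, 104]
rows = [[1, 1, 0, 0],
        [0, 1, 1, 1],
        [1, 0, 0, 0],
        [0, 0, 0, 1]]
print(greedy_select(ids, rows))              # prints: ([102, 101], [3, 1])
```

Each round rescans the whole table, so the cost grows with cases times variables times rounds; for a one-off run that may well be acceptable, though no timing comparison with MATRIX is claimed here.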


