I have a table that contains repeating terms (term) by student ID (ID)
which also contains a binary (prob) of whether for a specific term the student has been on probation. I want to create two new columns one of which is a cumulative count (pcnt) and the other a cumulative percent (ppct) of all probation activity for the student from any term backwards in time. I can easily create the total count & percent for all time of the student via the aggregate command & append to each record but I would like a cumulative count/percent to appear at each term. I would appreciate any suggestions (no python please). Listed below are the 3 original columns & the two desired columns when coding is complete. ID, term, prob, pcnt, ppct 12 19973 0 0 0 12 19981 0 0 0 12 19982 1 1 .33 12 19993 0 1 .25 12 20001 1 2 .40 12 20003 1 3 .50 12 20011 0 3 .43 12 20023 1 4 .50 12 20031 0 4 .44 (term is defined as calendar year digits 1-4 and semester code 5th digit in which 1=spring, 2=summer,3=fall) ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
Administrator
|
DATA LIST FREE / ID term prob pcntX ppctX.
begin data 12 19973 0 0 0 12 19981 0 0 0 12 19982 1 1 .33 12 19993 0 1 .25 12 20001 1 2 .40 12 20003 1 3 .50 12 20011 0 3 .43 12 20023 1 4 .50 12 20031 0 4 .44 13 19973 0 0 0 13 19981 0 0 0 13 19982 1 1 .33 13 19993 0 1 .25 13 20001 1 2 .40 13 20003 1 3 .50 13 20011 0 3 .43 13 20023 1 4 .50 13 20031 0 4 .44 end data. SPLIT FILE BY ID. COMPUTE C=1. CREATE CN CP=CSUM(C prob). COMPUTE ppct=cp/cn. ** Alternatively **. DO IF $CASENUM=1 OR LAG(ID) NE ID. + COMPUTE CN=prob. + COMPUTE c=1. ELSE. + COMPUTE CN=SUM(LAG(CN),prob). + COMPUTE c=lag(c)+1. END IF. COMPUTE ppct=cn/c. exe.
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
In reply to this post by wsu_wright
Thanks David, that should do the trick. Happy Holidays...
On Sat, Dec 24, 2011 at 11:48 AM, David Marso wrote: > DATA LIST FREE / ID term prob pcntX ppctX. > begin data > 12 19973 0 0 0 > 12 19981 0 0 0 > 12 19982 1 1 .33 > 12 19993 0 1 .25 > 12 20001 1 2 .40 > 12 20003 1 3 .50 > 12 20011 0 3 .43 > 12 20023 1 4 .50 > 12 20031 0 4 .44 > 13 19973 0 0 0 > 13 19981 0 0 0 > 13 19982 1 1 .33 > 13 19993 0 1 .25 > 13 20001 1 2 .40 > 13 20003 1 3 .50 > 13 20011 0 3 .43 > 13 20023 1 4 .50 > 13 20031 0 4 .44 > end data. > SPLIT FILE BY ID. > COMPUTE C=1. > CREATE CN CP=CSUM(C prob). > COMPUTE ppct=cp/cn. > > ** Alternatively **. > DO IF $CASENUM=1 OR LAG(ID) NE ID. > + COMPUTE CN=prob. > + COMPUTE c=1. > ELSE. > + COMPUTE CN=SUM(LAG(CN),prob). > + COMPUTE c=lag(c)+1. > END IF. > COMPUTE ppct=cn/c. > exe. > > David Wright-6 wrote >> >> I have a table that contains repeating terms (term) by student ID >> (ID) >> which also contains a binary (prob) of whether for a specific term >> the >> student has been on probation. I want to create two new columns one >> of >> which is a cumulative count (pcnt) and the other a cumulative percent >> (ppct) of all probation activity for the student from any term >> backwards >> in time. I can easily create the total count & percent for all time >> of >> the student via the aggregate command & append to each record but I >> would like a cumulative count/percent to appear at each term. I >> would >> appreciate any suggestions (no python please). Listed below are the >> 3 >> original columns & the two desired columns when coding is complete. >> >> ID, term, prob, pcnt, ppct >> 12 19973 0 0 0 >> 12 19981 0 0 0 >> 12 19982 1 1 .33 >> 12 19993 0 1 .25 >> 12 20001 1 2 .40 >> 12 20003 1 3 .50 >> 12 20011 0 3 .43 >> 12 20023 1 4 .50 >> 12 20031 0 4 .44 >> >> (term is defined as calendar year digits 1-4 and semester code 5th >> digit >> in which 1=spring, 2=summer,3=fall) >> >> ===================== >> To manage your subscription to SPSSX-L, send a message to >> LISTSERV@.UGA (not to SPSSX-L), with no body text except the >> command. To leave the list, send the command >> SIGNOFF SPSSX-L >> For a list of commands to manage subscriptions, send the command >> INFO REFCARD >> > > > -- > View this message in context: > http://spssx-discussion.1045642.n5.nabble.com/create-cumulative-count-percent-for-a-repeating-term-table-tp5099383p5099491.html > Sent from the SPSSX Discussion mailing list archive at Nabble.com. > > ===================== > To manage your subscription to SPSSX-L, send a message to > [hidden email] (not to SPSSX-L), with no body text except > the > command. To leave the list, send the command > SIGNOFF SPSSX-L > For a list of commands to manage subscriptions, send the command > INFO REFCARD ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
Administrator
|
You are welcome!
Even leaner and meaner (if perhaps slightly obscure ;-) ** Alternatively **. COMPUTE CN=SUM(prob,LAG(CN)*(LAG(ID) EQ ID)). COMPUTE C= SUM(1, LAG(C) *(LAG(ID) EQ ID)). COMPUTE ppct=cn/c. exe.
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
In reply to this post by wsu_wright
David,
What does the "1," in the second compute line do? Is this serving the same as your earlier "DO IF $CASENUM=1" to get around the starting case in the file? If so, without the DO IF, how does SPSS treat the 1 after the first case processing? On Sun, Dec 25, 2011 at 4:52 PM, David Marso wrote: > You are welcome! > Even leaner and meaner (if perhaps slightly obscure ;-) > ** Alternatively **. > COMPUTE CN=SUM(prob,LAG(CN)*(LAG(ID) EQ ID)). > COMPUTE C= SUM(1, LAG(C) *(LAG(ID) EQ ID)). > COMPUTE ppct=cn/c. > exe. > > David Wright-6 wrote >> >> Thanks David, that should do the trick. Happy Holidays... >> >> >> On Sat, Dec 24, 2011 at 11:48 AM, David Marso wrote: >> >>> DATA LIST FREE / ID term prob pcntX ppctX. >>> begin data >>> 12 19973 0 0 0 >>> 12 19981 0 0 0 >>> 12 19982 1 1 .33 >>> 12 19993 0 1 .25 >>> 12 20001 1 2 .40 >>> 12 20003 1 3 .50 >>> 12 20011 0 3 .43 >>> 12 20023 1 4 .50 >>> 12 20031 0 4 .44 >>> 13 19973 0 0 0 >>> 13 19981 0 0 0 >>> 13 19982 1 1 .33 >>> 13 19993 0 1 .25 >>> 13 20001 1 2 .40 >>> 13 20003 1 3 .50 >>> 13 20011 0 3 .43 >>> 13 20023 1 4 .50 >>> 13 20031 0 4 .44 >>> end data. >>> SPLIT FILE BY ID. >>> COMPUTE C=1. >>> CREATE CN CP=CSUM(C prob). >>> COMPUTE ppct=cp/cn. >>> >>> ** Alternatively **. >>> DO IF $CASENUM=1 OR LAG(ID) NE ID. >>> + COMPUTE CN=prob. >>> + COMPUTE c=1. >>> ELSE. >>> + COMPUTE CN=SUM(LAG(CN),prob). >>> + COMPUTE c=lag(c)+1. >>> END IF. >>> COMPUTE ppct=cn/c. >>> exe. >>> >>> David Wright-6 wrote >>>> >>>> I have a table that contains repeating terms (term) by student ID >>>> (ID) >>>> which also contains a binary (prob) of whether for a specific term >>>> the >>>> student has been on probation. I want to create two new columns >>>> one >>>> of >>>> which is a cumulative count (pcnt) and the other a cumulative >>>> percent >>>> (ppct) of all probation activity for the student from any term >>>> backwards >>>> in time. I can easily create the total count & percent for all >>>> time >>>> of >>>> the student via the aggregate command & append to each record but I >>>> would like a cumulative count/percent to appear at each term. I >>>> would >>>> appreciate any suggestions (no python please). Listed below are >>>> the >>>> 3 >>>> original columns & the two desired columns when coding is complete. >>>> >>>> ID, term, prob, pcnt, ppct >>>> 12 19973 0 0 0 >>>> 12 19981 0 0 0 >>>> 12 19982 1 1 .33 >>>> 12 19993 0 1 .25 >>>> 12 20001 1 2 .40 >>>> 12 20003 1 3 .50 >>>> 12 20011 0 3 .43 >>>> 12 20023 1 4 .50 >>>> 12 20031 0 4 .44 >>>> >>>> (term is defined as calendar year digits 1-4 and semester code 5th >>>> digit >>>> in which 1=spring, 2=summer,3=fall) >>>> >>>> ===================== >>>> To manage your subscription to SPSSX-L, send a message to >>>> LISTSERV@.UGA (not to SPSSX-L), with no body text except the >>>> command. To leave the list, send the command >>>> SIGNOFF SPSSX-L >>>> For a list of commands to manage subscriptions, send the command >>>> INFO REFCARD >>>> >>> >>> >>> -- >>> View this message in context: >>> >>> http://spssx-discussion.1045642.n5.nabble.com/create-cumulative-count-percent-for-a-repeating-term-table-tp5099383p5099491.html >>> Sent from the SPSSX Discussion mailing list archive at Nabble.com. >>> >>> ===================== >>> To manage your subscription to SPSSX-L, send a message to >>> LISTSERV@.UGA (not to SPSSX-L), with no body text except >>> the >>> command. To leave the list, send the command >>> SIGNOFF SPSSX-L >>> For a list of commands to manage subscriptions, send the command >>> INFO REFCARD >> >> ===================== >> To manage your subscription to SPSSX-L, send a message to >> LISTSERV@.UGA (not to SPSSX-L), with no body text except the >> command. To leave the list, send the command >> SIGNOFF SPSSX-L >> For a list of commands to manage subscriptions, send the command >> INFO REFCARD >> > > > -- > View this message in context: > http://spssx-discussion.1045642.n5.nabble.com/create-cumulative-count-percent-for-a-repeating-term-table-tp5099383p5100722.html > Sent from the SPSSX Discussion mailing list archive at Nabble.com. > > ===================== > To manage your subscription to SPSSX-L, send a message to > [hidden email] (not to SPSSX-L), with no body text except > the > command. To leave the list, send the command > SIGNOFF SPSSX-L > For a list of commands to manage subscriptions, send the command > INFO REFCARD ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
Administrator
|
> COMPUTE C= SUM(1, LAG(C) *(LAG(ID) EQ ID)).
-- What I find interesting about this is that this particular construction (according to my intuitions) *SHOULD* fail with the LAG(C) because at that point C is not defined. SPSS must be doing something to initialize C at the onset of the COMPUTE . To unpack this a bit. Note 1. SUM(X,$SYSMIS)=X. 2. LAG(C) *(LAG(ID) EQ ID) = SYSMIS until ID changes and then = 0. ---- So first record gets assigned 1 [ SUM(1,SYSMIS) ], second record (assuming same value of ID) gets SUM(1, 1*1)=2 . AT the point where the value of ID changes: SUM(1,WhoCares*0)=1. So, we get a very compact technique for building counters within cases WITHOUT using $CASENUM or LAG(I) NE ID as means of reinitializing the value within each strata. Taking this approach one step further (let's say we have multiple strata): data list free /id id2 . begin data 1 1 1 1 1 2 1 2 1 3 1 3 2 1 2 1 2 2 2 2 2 2 2 3 end data. compute z=MAX(1, (id EQ LAG(id))*(id2 EQ LAG(id2)) * LAG(z)+1). list ID ID2 Z 1.00 1.00 1.00 1.00 1.00 2.00 1.00 2.00 1.00 1.00 2.00 2.00 1.00 3.00 1.00 1.00 3.00 2.00 2.00 1.00 1.00 2.00 1.00 2.00 2.00 2.00 1.00 2.00 2.00 2.00 2.00 2.00 3.00 2.00 3.00 1.00 Number of cases read: 12 Number of cases listed: 12
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
Administrator
|
My last example was *NOT* exactly illustrative of what I was bending towards.
I was playing around with different idioms ;-) -- This is more in line: data list free /id id2 . begin data 1 1 1 1 1 2 1 2 1 3 1 3 2 1 2 1 2 2 2 2 2 2 2 3 end data. compute z=SUM(1, (id EQ LAG(id))*(id2 EQ LAG(id2)) * LAG(z)).
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
In reply to this post by David Marso
The observed behavior follows from the
command order rules. COMPUTE is not executed until a data pass is
required, but a variable is created as soon as it is encountered in a data
definition context such as a formula target or a declaration such as NUMERIC,
although it has no values. So it actually exists before its values
are calculated.
Jon Peck (no "h") aka Kim Senior Software Engineer, IBM [hidden email] new phone: 720-342-5621 From: David Marso <[hidden email]> To: [hidden email] Date: 12/26/2011 05:26 AM Subject: Re: [SPSSX-L] create cumulative count & percent for a repeating term table Sent by: "SPSSX(r) Discussion" <[hidden email]> > COMPUTE C= SUM(1, LAG(C) *(LAG(ID) EQ ID)). -- What I find interesting about this is that this particular construction (according to my intuitions) *SHOULD* fail with the LAG(C) because at that point C is not defined. SPSS must be doing something to initialize C at the onset of the COMPUTE . To unpack this a bit. Note 1. SUM(X,$SYSMIS)=X. 2. LAG(C) *(LAG(ID) EQ ID) = SYSMIS until ID changes and then = 0. ---- So first record gets assigned 1 [ SUM(1,SYSMIS) ], second record (assuming same value of ID) gets SUM(1, 1*1)=2 . AT the point where the value of ID changes: SUM(1,WhoCares*0)=1. So, we get a very compact technique for building counters within cases WITHOUT using $CASENUM or LAG(I) NE ID as means of reinitializing the value within each strata. Taking this approach one step further (let's say we have multiple strata): data list free /id id2 . begin data 1 1 1 1 1 2 1 2 1 3 1 3 2 1 2 1 2 2 2 2 2 2 2 3 end data. compute z=MAX(1, (id EQ LAG(id))*(id2 EQ LAG(id2)) * LAG(z)+1). list ID ID2 Z 1.00 1.00 1.00 1.00 1.00 2.00 1.00 2.00 1.00 1.00 2.00 2.00 1.00 3.00 1.00 1.00 3.00 2.00 2.00 1.00 1.00 2.00 1.00 2.00 2.00 2.00 1.00 2.00 2.00 2.00 2.00 2.00 3.00 2.00 3.00 1.00 Number of cases read: 12 Number of cases listed: 12 David Wright-6 wrote > > David, > > What does the "1," in the second compute line do? Is this serving the > same as your earlier "DO IF $CASENUM=1" to get around the starting case > in the file? If so, without the DO IF, how does SPSS treat the 1 after > the first case processing? > > > On Sun, Dec 25, 2011 at 4:52 PM, David Marso wrote: > >> You are welcome! >> Even leaner and meaner (if perhaps slightly obscure ;-) >> ** Alternatively **. >> COMPUTE CN=SUM(prob,LAG(CN)*(LAG(ID) EQ ID)). >> COMPUTE C= SUM(1, LAG(C) *(LAG(ID) EQ ID)). >> COMPUTE ppct=cn/c. >> exe. >> >> David Wright-6 wrote >>> >>> Thanks David, that should do the trick. Happy Holidays... >>> >>> >>> On Sat, Dec 24, 2011 at 11:48 AM, David Marso wrote: >>> >>>> DATA LIST FREE / ID term prob pcntX ppctX. >>>> begin data >>>> 12 19973 0 0 0 >>>> 12 19981 0 0 0 >>>> 12 19982 1 1 .33 >>>> 12 19993 0 1 .25 >>>> 12 20001 1 2 .40 >>>> 12 20003 1 3 .50 >>>> 12 20011 0 3 .43 >>>> 12 20023 1 4 .50 >>>> 12 20031 0 4 .44 >>>> 13 19973 0 0 0 >>>> 13 19981 0 0 0 >>>> 13 19982 1 1 .33 >>>> 13 19993 0 1 .25 >>>> 13 20001 1 2 .40 >>>> 13 20003 1 3 .50 >>>> 13 20011 0 3 .43 >>>> 13 20023 1 4 .50 >>>> 13 20031 0 4 .44 >>>> end data. >>>> SPLIT FILE BY ID. >>>> COMPUTE C=1. >>>> CREATE CN CP=CSUM(C prob). >>>> COMPUTE ppct=cp/cn. >>>> >>>> ** Alternatively **. >>>> DO IF $CASENUM=1 OR LAG(ID) NE ID. >>>> + COMPUTE CN=prob. >>>> + COMPUTE c=1. >>>> ELSE. >>>> + COMPUTE CN=SUM(LAG(CN),prob). >>>> + COMPUTE c=lag(c)+1. >>>> END IF. >>>> COMPUTE ppct=cn/c. >>>> exe. >>>> >>>> David Wright-6 wrote >>>>> >>>>> I have a table that contains repeating terms (term) by student ID >>>>> (ID) >>>>> which also contains a binary (prob) of whether for a specific term >>>>> the >>>>> student has been on probation. I want to create two new columns >>>>> one >>>>> of >>>>> which is a cumulative count (pcnt) and the other a cumulative >>>>> percent >>>>> (ppct) of all probation activity for the student from any term >>>>> backwards >>>>> in time. I can easily create the total count & percent for all >>>>> time >>>>> of >>>>> the student via the aggregate command & append to each record but I >>>>> would like a cumulative count/percent to appear at each term. I >>>>> would >>>>> appreciate any suggestions (no python please). Listed below are >>>>> the >>>>> 3 >>>>> original columns & the two desired columns when coding is complete. >>>>> >>>>> ID, term, prob, pcnt, ppct >>>>> 12 19973 0 0 0 >>>>> 12 19981 0 0 0 >>>>> 12 19982 1 1 .33 >>>>> 12 19993 0 1 .25 >>>>> 12 20001 1 2 .40 >>>>> 12 20003 1 3 .50 >>>>> 12 20011 0 3 .43 >>>>> 12 20023 1 4 .50 >>>>> 12 20031 0 4 .44 >>>>> >>>>> (term is defined as calendar year digits 1-4 and semester code 5th >>>>> digit >>>>> in which 1=spring, 2=summer,3=fall) >>>>> >>>>> ===================== >>>>> To manage your subscription to SPSSX-L, send a message to >>>>> LISTSERV@.UGA (not to SPSSX-L), with no body text except the >>>>> command. To leave the list, send the command >>>>> SIGNOFF SPSSX-L >>>>> For a list of commands to manage subscriptions, send the command >>>>> INFO REFCARD >>>>> >>>> >>>> >>>> -- >>>> View this message in context: >>>> >>>> http://spssx-discussion.1045642.n5.nabble.com/create-cumulative-count-percent-for-a-repeating-term-table-tp5099383p5099491.html >>>> Sent from the SPSSX Discussion mailing list archive at Nabble.com. >>>> >>>> ===================== >>>> To manage your subscription to SPSSX-L, send a message to >>>> LISTSERV@.UGA (not to SPSSX-L), with no body text except >>>> the >>>> command. To leave the list, send the command >>>> SIGNOFF SPSSX-L >>>> For a list of commands to manage subscriptions, send the command >>>> INFO REFCARD >>> >>> ===================== >>> To manage your subscription to SPSSX-L, send a message to >>> LISTSERV@.UGA (not to SPSSX-L), with no body text except the >>> command. To leave the list, send the command >>> SIGNOFF SPSSX-L >>> For a list of commands to manage subscriptions, send the command >>> INFO REFCARD >>> >> >> >> -- >> View this message in context: >> http://spssx-discussion.1045642.n5.nabble.com/create-cumulative-count-percent-for-a-repeating-term-table-tp5099383p5100722.html >> Sent from the SPSSX Discussion mailing list archive at Nabble.com. >> >> ===================== >> To manage your subscription to SPSSX-L, send a message to >> LISTSERV@.UGA (not to SPSSX-L), with no body text except >> the >> command. To leave the list, send the command >> SIGNOFF SPSSX-L >> For a list of commands to manage subscriptions, send the command >> INFO REFCARD > > ===================== > To manage your subscription to SPSSX-L, send a message to > LISTSERV@.UGA (not to SPSSX-L), with no body text except the > command. To leave the list, send the command > SIGNOFF SPSSX-L > For a list of commands to manage subscriptions, send the command > INFO REFCARD > -- View this message in context: http://spssx-discussion.1045642.n5.nabble.com/create-cumulative-count-percent-for-a-repeating-term-table-tp5099383p5101466.html Sent from the SPSSX Discussion mailing list archive at Nabble.com. ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
Free forum by Nabble | Edit this page |