create cumulative count & percent for a repeating term table

classic Classic list List threaded Threaded
8 messages Options
Reply | Threaded
Open this post in threaded view
|

create cumulative count & percent for a repeating term table

wsu_wright
I have a table that contains repeating terms (term) by student ID (ID)
which also contains a binary (prob) of whether for a specific term the
student has been on probation.  I want to create two new columns one of
which is a cumulative count (pcnt) and the other a cumulative percent
(ppct) of all probation activity for the student from any term backwards
in time.  I can easily create the total count & percent for all time of
the student via the aggregate command & append to each record but I
would like a cumulative count/percent to appear at each term.  I would
appreciate any suggestions (no python please).  Listed below are the 3
original columns & the two desired columns when coding is complete.

ID, term,  prob, pcnt, ppct
12 19973 0 0 0
12 19981 0 0 0
12 19982 1 1 .33
12 19993 0 1 .25
12 20001 1 2 .40
12 20003 1 3 .50
12 20011 0 3 .43
12 20023 1 4 .50
12 20031 0 4 .44

(term is defined as calendar year digits 1-4 and semester code 5th digit
in which 1=spring, 2=summer,3=fall)

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: create cumulative count & percent for a repeating term table

David Marso
Administrator
DATA LIST FREE / ID term  prob pcntX ppctX.
begin data
12 19973 0 0 0
12 19981 0 0 0
12 19982 1 1 .33
12 19993 0 1 .25
12 20001 1 2 .40
12 20003 1 3 .50
12 20011 0 3 .43
12 20023 1 4 .50
12 20031 0 4 .44
13 19973 0 0 0
13 19981 0 0 0
13 19982 1 1 .33
13 19993 0 1 .25
13 20001 1 2 .40
13 20003 1 3 .50
13 20011 0 3 .43
13 20023 1 4 .50
13 20031 0 4 .44
end data.
SPLIT FILE BY ID.
COMPUTE C=1.
CREATE CN CP=CSUM(C prob).
COMPUTE ppct=cp/cn.

** Alternatively **.
DO IF $CASENUM=1 OR LAG(ID) NE ID.
+  COMPUTE CN=prob.
+  COMPUTE  c=1.
ELSE.
+  COMPUTE CN=SUM(LAG(CN),prob).
+  COMPUTE  c=lag(c)+1.
END IF.
COMPUTE ppct=cn/c.
exe.
David Wright-6 wrote
I have a table that contains repeating terms (term) by student ID (ID)
which also contains a binary (prob) of whether for a specific term the
student has been on probation.  I want to create two new columns one of
which is a cumulative count (pcnt) and the other a cumulative percent
(ppct) of all probation activity for the student from any term backwards
in time.  I can easily create the total count & percent for all time of
the student via the aggregate command & append to each record but I
would like a cumulative count/percent to appear at each term.  I would
appreciate any suggestions (no python please).  Listed below are the 3
original columns & the two desired columns when coding is complete.

ID, term,  prob, pcnt, ppct
12 19973 0 0 0
12 19981 0 0 0
12 19982 1 1 .33
12 19993 0 1 .25
12 20001 1 2 .40
12 20003 1 3 .50
12 20011 0 3 .43
12 20023 1 4 .50
12 20031 0 4 .44

(term is defined as calendar year digits 1-4 and semester code 5th digit
in which 1=spring, 2=summer,3=fall)

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"
Reply | Threaded
Open this post in threaded view
|

Re: create cumulative count & percent for a repeating term table

wsu_wright
In reply to this post by wsu_wright
Thanks David, that should do the trick.  Happy Holidays...


On Sat, Dec 24, 2011 at 11:48 AM, David Marso wrote:

> DATA LIST FREE / ID term  prob pcntX ppctX.
> begin data
> 12 19973 0 0 0
> 12 19981 0 0 0
> 12 19982 1 1 .33
> 12 19993 0 1 .25
> 12 20001 1 2 .40
> 12 20003 1 3 .50
> 12 20011 0 3 .43
> 12 20023 1 4 .50
> 12 20031 0 4 .44
> 13 19973 0 0 0
> 13 19981 0 0 0
> 13 19982 1 1 .33
> 13 19993 0 1 .25
> 13 20001 1 2 .40
> 13 20003 1 3 .50
> 13 20011 0 3 .43
> 13 20023 1 4 .50
> 13 20031 0 4 .44
> end data.
> SPLIT FILE BY ID.
> COMPUTE C=1.
> CREATE CN CP=CSUM(C prob).
> COMPUTE ppct=cp/cn.
>
> ** Alternatively **.
> DO IF $CASENUM=1 OR LAG(ID) NE ID.
> +  COMPUTE CN=prob.
> +  COMPUTE  c=1.
> ELSE.
> +  COMPUTE CN=SUM(LAG(CN),prob).
> +  COMPUTE  c=lag(c)+1.
> END IF.
> COMPUTE ppct=cn/c.
> exe.
>
> David Wright-6 wrote
>>
>> I have a table that contains repeating terms (term) by student ID
>> (ID)
>> which also contains a binary (prob) of whether for a specific term
>> the
>> student has been on probation.  I want to create two new columns one
>> of
>> which is a cumulative count (pcnt) and the other a cumulative percent
>> (ppct) of all probation activity for the student from any term
>> backwards
>> in time.  I can easily create the total count & percent for all time
>> of
>> the student via the aggregate command & append to each record but I
>> would like a cumulative count/percent to appear at each term.  I
>> would
>> appreciate any suggestions (no python please).  Listed below are the
>> 3
>> original columns & the two desired columns when coding is complete.
>>
>> ID, term,  prob, pcnt, ppct
>> 12 19973 0 0 0
>> 12 19981 0 0 0
>> 12 19982 1 1 .33
>> 12 19993 0 1 .25
>> 12 20001 1 2 .40
>> 12 20003 1 3 .50
>> 12 20011 0 3 .43
>> 12 20023 1 4 .50
>> 12 20031 0 4 .44
>>
>> (term is defined as calendar year digits 1-4 and semester code 5th
>> digit
>> in which 1=spring, 2=summer,3=fall)
>>
>> =====================
>> To manage your subscription to SPSSX-L, send a message to
>> LISTSERV@.UGA (not to SPSSX-L), with no body text except the
>> command. To leave the list, send the command
>> SIGNOFF SPSSX-L
>> For a list of commands to manage subscriptions, send the command
>> INFO REFCARD
>>
>
>
> --
> View this message in context:
> http://spssx-discussion.1045642.n5.nabble.com/create-cumulative-count-percent-for-a-repeating-term-table-tp5099383p5099491.html
> Sent from the SPSSX Discussion mailing list archive at Nabble.com.
>
> =====================
> To manage your subscription to SPSSX-L, send a message to
> [hidden email] (not to SPSSX-L), with no body text except
> the
> command. To leave the list, send the command
> SIGNOFF SPSSX-L
> For a list of commands to manage subscriptions, send the command
> INFO REFCARD

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: create cumulative count & percent for a repeating term table

David Marso
Administrator
You are welcome!
Even leaner and meaner (if perhaps slightly obscure ;-)
** Alternatively **.
COMPUTE CN=SUM(prob,LAG(CN)*(LAG(ID) EQ ID)).
COMPUTE C= SUM(1,   LAG(C) *(LAG(ID) EQ ID)).
COMPUTE ppct=cn/c.
exe.
David Wright-6 wrote
Thanks David, that should do the trick.  Happy Holidays...


On Sat, Dec 24, 2011 at 11:48 AM, David Marso wrote:

> DATA LIST FREE / ID term  prob pcntX ppctX.
> begin data
> 12 19973 0 0 0
> 12 19981 0 0 0
> 12 19982 1 1 .33
> 12 19993 0 1 .25
> 12 20001 1 2 .40
> 12 20003 1 3 .50
> 12 20011 0 3 .43
> 12 20023 1 4 .50
> 12 20031 0 4 .44
> 13 19973 0 0 0
> 13 19981 0 0 0
> 13 19982 1 1 .33
> 13 19993 0 1 .25
> 13 20001 1 2 .40
> 13 20003 1 3 .50
> 13 20011 0 3 .43
> 13 20023 1 4 .50
> 13 20031 0 4 .44
> end data.
> SPLIT FILE BY ID.
> COMPUTE C=1.
> CREATE CN CP=CSUM(C prob).
> COMPUTE ppct=cp/cn.
>
> ** Alternatively **.
> DO IF $CASENUM=1 OR LAG(ID) NE ID.
> +  COMPUTE CN=prob.
> +  COMPUTE  c=1.
> ELSE.
> +  COMPUTE CN=SUM(LAG(CN),prob).
> +  COMPUTE  c=lag(c)+1.
> END IF.
> COMPUTE ppct=cn/c.
> exe.
>
> David Wright-6 wrote
>>
>> I have a table that contains repeating terms (term) by student ID
>> (ID)
>> which also contains a binary (prob) of whether for a specific term
>> the
>> student has been on probation.  I want to create two new columns one
>> of
>> which is a cumulative count (pcnt) and the other a cumulative percent
>> (ppct) of all probation activity for the student from any term
>> backwards
>> in time.  I can easily create the total count & percent for all time
>> of
>> the student via the aggregate command & append to each record but I
>> would like a cumulative count/percent to appear at each term.  I
>> would
>> appreciate any suggestions (no python please).  Listed below are the
>> 3
>> original columns & the two desired columns when coding is complete.
>>
>> ID, term,  prob, pcnt, ppct
>> 12 19973 0 0 0
>> 12 19981 0 0 0
>> 12 19982 1 1 .33
>> 12 19993 0 1 .25
>> 12 20001 1 2 .40
>> 12 20003 1 3 .50
>> 12 20011 0 3 .43
>> 12 20023 1 4 .50
>> 12 20031 0 4 .44
>>
>> (term is defined as calendar year digits 1-4 and semester code 5th
>> digit
>> in which 1=spring, 2=summer,3=fall)
>>
>> =====================
>> To manage your subscription to SPSSX-L, send a message to
>> LISTSERV@.UGA (not to SPSSX-L), with no body text except the
>> command. To leave the list, send the command
>> SIGNOFF SPSSX-L
>> For a list of commands to manage subscriptions, send the command
>> INFO REFCARD
>>
>
>
> --
> View this message in context:
> http://spssx-discussion.1045642.n5.nabble.com/create-cumulative-count-percent-for-a-repeating-term-table-tp5099383p5099491.html
> Sent from the SPSSX Discussion mailing list archive at Nabble.com.
>
> =====================
> To manage your subscription to SPSSX-L, send a message to
> [hidden email] (not to SPSSX-L), with no body text except
> the
> command. To leave the list, send the command
> SIGNOFF SPSSX-L
> For a list of commands to manage subscriptions, send the command
> INFO REFCARD

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"
Reply | Threaded
Open this post in threaded view
|

Re: create cumulative count & percent for a repeating term table

wsu_wright
In reply to this post by wsu_wright
David,

What does the "1," in the second compute line do?  Is this serving the
same as your earlier "DO IF $CASENUM=1" to get around the starting case
in the file?  If so, without the DO IF, how does SPSS treat the 1 after
the first case processing?


On Sun, Dec 25, 2011 at 4:52 PM, David Marso wrote:

> You are welcome!
> Even leaner and meaner (if perhaps slightly obscure ;-)
> ** Alternatively **.
> COMPUTE CN=SUM(prob,LAG(CN)*(LAG(ID) EQ ID)).
> COMPUTE C= SUM(1,   LAG(C) *(LAG(ID) EQ ID)).
> COMPUTE ppct=cn/c.
> exe.
>
> David Wright-6 wrote
>>
>> Thanks David, that should do the trick.  Happy Holidays...
>>
>>
>> On Sat, Dec 24, 2011 at 11:48 AM, David Marso wrote:
>>
>>> DATA LIST FREE / ID term  prob pcntX ppctX.
>>> begin data
>>> 12 19973 0 0 0
>>> 12 19981 0 0 0
>>> 12 19982 1 1 .33
>>> 12 19993 0 1 .25
>>> 12 20001 1 2 .40
>>> 12 20003 1 3 .50
>>> 12 20011 0 3 .43
>>> 12 20023 1 4 .50
>>> 12 20031 0 4 .44
>>> 13 19973 0 0 0
>>> 13 19981 0 0 0
>>> 13 19982 1 1 .33
>>> 13 19993 0 1 .25
>>> 13 20001 1 2 .40
>>> 13 20003 1 3 .50
>>> 13 20011 0 3 .43
>>> 13 20023 1 4 .50
>>> 13 20031 0 4 .44
>>> end data.
>>> SPLIT FILE BY ID.
>>> COMPUTE C=1.
>>> CREATE CN CP=CSUM(C prob).
>>> COMPUTE ppct=cp/cn.
>>>
>>> ** Alternatively **.
>>> DO IF $CASENUM=1 OR LAG(ID) NE ID.
>>> +  COMPUTE CN=prob.
>>> +  COMPUTE  c=1.
>>> ELSE.
>>> +  COMPUTE CN=SUM(LAG(CN),prob).
>>> +  COMPUTE  c=lag(c)+1.
>>> END IF.
>>> COMPUTE ppct=cn/c.
>>> exe.
>>>
>>> David Wright-6 wrote
>>>>
>>>> I have a table that contains repeating terms (term) by student ID
>>>> (ID)
>>>> which also contains a binary (prob) of whether for a specific term
>>>> the
>>>> student has been on probation.  I want to create two new columns
>>>> one
>>>> of
>>>> which is a cumulative count (pcnt) and the other a cumulative
>>>> percent
>>>> (ppct) of all probation activity for the student from any term
>>>> backwards
>>>> in time.  I can easily create the total count & percent for all
>>>> time
>>>> of
>>>> the student via the aggregate command & append to each record but I
>>>> would like a cumulative count/percent to appear at each term.  I
>>>> would
>>>> appreciate any suggestions (no python please).  Listed below are
>>>> the
>>>> 3
>>>> original columns & the two desired columns when coding is complete.
>>>>
>>>> ID, term,  prob, pcnt, ppct
>>>> 12 19973 0 0 0
>>>> 12 19981 0 0 0
>>>> 12 19982 1 1 .33
>>>> 12 19993 0 1 .25
>>>> 12 20001 1 2 .40
>>>> 12 20003 1 3 .50
>>>> 12 20011 0 3 .43
>>>> 12 20023 1 4 .50
>>>> 12 20031 0 4 .44
>>>>
>>>> (term is defined as calendar year digits 1-4 and semester code 5th
>>>> digit
>>>> in which 1=spring, 2=summer,3=fall)
>>>>
>>>> =====================
>>>> To manage your subscription to SPSSX-L, send a message to
>>>> LISTSERV@.UGA (not to SPSSX-L), with no body text except the
>>>> command. To leave the list, send the command
>>>> SIGNOFF SPSSX-L
>>>> For a list of commands to manage subscriptions, send the command
>>>> INFO REFCARD
>>>>
>>>
>>>
>>> --
>>> View this message in context:
>>>
>>> http://spssx-discussion.1045642.n5.nabble.com/create-cumulative-count-percent-for-a-repeating-term-table-tp5099383p5099491.html
>>> Sent from the SPSSX Discussion mailing list archive at Nabble.com.
>>>
>>> =====================
>>> To manage your subscription to SPSSX-L, send a message to
>>> LISTSERV@.UGA (not to SPSSX-L), with no body text except
>>> the
>>> command. To leave the list, send the command
>>> SIGNOFF SPSSX-L
>>> For a list of commands to manage subscriptions, send the command
>>> INFO REFCARD
>>
>> =====================
>> To manage your subscription to SPSSX-L, send a message to
>> LISTSERV@.UGA (not to SPSSX-L), with no body text except the
>> command. To leave the list, send the command
>> SIGNOFF SPSSX-L
>> For a list of commands to manage subscriptions, send the command
>> INFO REFCARD
>>
>
>
> --
> View this message in context:
> http://spssx-discussion.1045642.n5.nabble.com/create-cumulative-count-percent-for-a-repeating-term-table-tp5099383p5100722.html
> Sent from the SPSSX Discussion mailing list archive at Nabble.com.
>
> =====================
> To manage your subscription to SPSSX-L, send a message to
> [hidden email] (not to SPSSX-L), with no body text except
> the
> command. To leave the list, send the command
> SIGNOFF SPSSX-L
> For a list of commands to manage subscriptions, send the command
> INFO REFCARD

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: create cumulative count & percent for a repeating term table

David Marso
Administrator
> COMPUTE C= SUM(1,   LAG(C) *(LAG(ID) EQ ID)).
--
What I find interesting about this is that this particular construction (according to my intuitions) *SHOULD* fail with the LAG(C) because at that point C is not defined.  SPSS must be doing something to initialize C at the onset of the COMPUTE .
To unpack this a bit.
Note
1. SUM(X,$SYSMIS)=X.
2. LAG(C) *(LAG(ID) EQ ID) = SYSMIS until ID changes and then = 0.
----
So first record gets assigned 1 [ SUM(1,SYSMIS) ], second record (assuming same value of ID) gets SUM(1, 1*1)=2 .
AT the point where the value of ID changes:
SUM(1,WhoCares*0)=1.
So, we get a very compact technique for building counters within cases
WITHOUT using $CASENUM or LAG(I) NE ID as means of reinitializing the value within each strata.
Taking this approach one step further (let's say we have multiple strata):
data list free /id id2 .
begin data
1 1 1 1 1 2 1 2 1 3 1 3
2 1 2 1 2 2 2 2 2 2 2 3
end data.
compute z=MAX(1,
             (id EQ LAG(id))*(id2 EQ LAG(id2))
           * LAG(z)+1).
list
      ID      ID2        Z

    1.00     1.00     1.00
    1.00     1.00     2.00
    1.00     2.00     1.00
    1.00     2.00     2.00
    1.00     3.00     1.00
    1.00     3.00     2.00
    2.00     1.00     1.00
    2.00     1.00     2.00
    2.00     2.00     1.00
    2.00     2.00     2.00
    2.00     2.00     3.00
    2.00     3.00     1.00


Number of cases read:  12    Number of cases listed:  12





David Wright-6 wrote
David,

What does the "1," in the second compute line do?  Is this serving the
same as your earlier "DO IF $CASENUM=1" to get around the starting case
in the file?  If so, without the DO IF, how does SPSS treat the 1 after
the first case processing?


On Sun, Dec 25, 2011 at 4:52 PM, David Marso wrote:

> You are welcome!
> Even leaner and meaner (if perhaps slightly obscure ;-)
> ** Alternatively **.
> COMPUTE CN=SUM(prob,LAG(CN)*(LAG(ID) EQ ID)).
> COMPUTE C= SUM(1,   LAG(C) *(LAG(ID) EQ ID)).
> COMPUTE ppct=cn/c.
> exe.
>
> David Wright-6 wrote
>>
>> Thanks David, that should do the trick.  Happy Holidays...
>>
>>
>> On Sat, Dec 24, 2011 at 11:48 AM, David Marso wrote:
>>
>>> DATA LIST FREE / ID term  prob pcntX ppctX.
>>> begin data
>>> 12 19973 0 0 0
>>> 12 19981 0 0 0
>>> 12 19982 1 1 .33
>>> 12 19993 0 1 .25
>>> 12 20001 1 2 .40
>>> 12 20003 1 3 .50
>>> 12 20011 0 3 .43
>>> 12 20023 1 4 .50
>>> 12 20031 0 4 .44
>>> 13 19973 0 0 0
>>> 13 19981 0 0 0
>>> 13 19982 1 1 .33
>>> 13 19993 0 1 .25
>>> 13 20001 1 2 .40
>>> 13 20003 1 3 .50
>>> 13 20011 0 3 .43
>>> 13 20023 1 4 .50
>>> 13 20031 0 4 .44
>>> end data.
>>> SPLIT FILE BY ID.
>>> COMPUTE C=1.
>>> CREATE CN CP=CSUM(C prob).
>>> COMPUTE ppct=cp/cn.
>>>
>>> ** Alternatively **.
>>> DO IF $CASENUM=1 OR LAG(ID) NE ID.
>>> +  COMPUTE CN=prob.
>>> +  COMPUTE  c=1.
>>> ELSE.
>>> +  COMPUTE CN=SUM(LAG(CN),prob).
>>> +  COMPUTE  c=lag(c)+1.
>>> END IF.
>>> COMPUTE ppct=cn/c.
>>> exe.
>>>
>>> David Wright-6 wrote
>>>>
>>>> I have a table that contains repeating terms (term) by student ID
>>>> (ID)
>>>> which also contains a binary (prob) of whether for a specific term
>>>> the
>>>> student has been on probation.  I want to create two new columns
>>>> one
>>>> of
>>>> which is a cumulative count (pcnt) and the other a cumulative
>>>> percent
>>>> (ppct) of all probation activity for the student from any term
>>>> backwards
>>>> in time.  I can easily create the total count & percent for all
>>>> time
>>>> of
>>>> the student via the aggregate command & append to each record but I
>>>> would like a cumulative count/percent to appear at each term.  I
>>>> would
>>>> appreciate any suggestions (no python please).  Listed below are
>>>> the
>>>> 3
>>>> original columns & the two desired columns when coding is complete.
>>>>
>>>> ID, term,  prob, pcnt, ppct
>>>> 12 19973 0 0 0
>>>> 12 19981 0 0 0
>>>> 12 19982 1 1 .33
>>>> 12 19993 0 1 .25
>>>> 12 20001 1 2 .40
>>>> 12 20003 1 3 .50
>>>> 12 20011 0 3 .43
>>>> 12 20023 1 4 .50
>>>> 12 20031 0 4 .44
>>>>
>>>> (term is defined as calendar year digits 1-4 and semester code 5th
>>>> digit
>>>> in which 1=spring, 2=summer,3=fall)
>>>>
>>>> =====================
>>>> To manage your subscription to SPSSX-L, send a message to
>>>> LISTSERV@.UGA (not to SPSSX-L), with no body text except the
>>>> command. To leave the list, send the command
>>>> SIGNOFF SPSSX-L
>>>> For a list of commands to manage subscriptions, send the command
>>>> INFO REFCARD
>>>>
>>>
>>>
>>> --
>>> View this message in context:
>>>
>>> http://spssx-discussion.1045642.n5.nabble.com/create-cumulative-count-percent-for-a-repeating-term-table-tp5099383p5099491.html
>>> Sent from the SPSSX Discussion mailing list archive at Nabble.com.
>>>
>>> =====================
>>> To manage your subscription to SPSSX-L, send a message to
>>> LISTSERV@.UGA (not to SPSSX-L), with no body text except
>>> the
>>> command. To leave the list, send the command
>>> SIGNOFF SPSSX-L
>>> For a list of commands to manage subscriptions, send the command
>>> INFO REFCARD
>>
>> =====================
>> To manage your subscription to SPSSX-L, send a message to
>> LISTSERV@.UGA (not to SPSSX-L), with no body text except the
>> command. To leave the list, send the command
>> SIGNOFF SPSSX-L
>> For a list of commands to manage subscriptions, send the command
>> INFO REFCARD
>>
>
>
> --
> View this message in context:
> http://spssx-discussion.1045642.n5.nabble.com/create-cumulative-count-percent-for-a-repeating-term-table-tp5099383p5100722.html
> Sent from the SPSSX Discussion mailing list archive at Nabble.com.
>
> =====================
> To manage your subscription to SPSSX-L, send a message to
> [hidden email] (not to SPSSX-L), with no body text except
> the
> command. To leave the list, send the command
> SIGNOFF SPSSX-L
> For a list of commands to manage subscriptions, send the command
> INFO REFCARD

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"
Reply | Threaded
Open this post in threaded view
|

Re: create cumulative count & percent for a repeating term table

David Marso
Administrator
My last example was *NOT* exactly illustrative of what I was bending towards.
I was playing around with different idioms ;-)
--
This is more in line:
data list free /id id2 .
begin data
1 1 1 1 1 2 1 2 1 3 1 3
2 1 2 1 2 2 2 2 2 2 2 3
end data.
compute z=SUM(1,
             (id EQ LAG(id))*(id2 EQ LAG(id2))
           * LAG(z)).
David Marso wrote
> COMPUTE C= SUM(1,   LAG(C) *(LAG(ID) EQ ID)).
--
What I find interesting about this is that this particular construction (according to my intuitions) *SHOULD* fail with the LAG(C) because at that point C is not defined.  SPSS must be doing something to initialize C at the onset of the COMPUTE .
To unpack this a bit.
Note
1. SUM(X,$SYSMIS)=X.
2. LAG(C) *(LAG(ID) EQ ID) = SYSMIS until ID changes and then = 0.
----
So first record gets assigned 1 [ SUM(1,SYSMIS) ], second record (assuming same value of ID) gets SUM(1, 1*1)=2 .
AT the point where the value of ID changes:
SUM(1,WhoCares*0)=1.
So, we get a very compact technique for building counters within cases
WITHOUT using $CASENUM or LAG(I) NE ID as means of reinitializing the value within each strata.
Taking this approach one step further (let's say we have multiple strata):
data list free /id id2 .
begin data
1 1 1 1 1 2 1 2 1 3 1 3
2 1 2 1 2 2 2 2 2 2 2 3
end data.
compute z=MAX(1,
             (id EQ LAG(id))*(id2 EQ LAG(id2))
           * LAG(z)+1).
list
      ID      ID2        Z

    1.00     1.00     1.00
    1.00     1.00     2.00
    1.00     2.00     1.00
    1.00     2.00     2.00
    1.00     3.00     1.00
    1.00     3.00     2.00
    2.00     1.00     1.00
    2.00     1.00     2.00
    2.00     2.00     1.00
    2.00     2.00     2.00
    2.00     2.00     3.00
    2.00     3.00     1.00


Number of cases read:  12    Number of cases listed:  12





David Wright-6 wrote
David,

What does the "1," in the second compute line do?  Is this serving the
same as your earlier "DO IF $CASENUM=1" to get around the starting case
in the file?  If so, without the DO IF, how does SPSS treat the 1 after
the first case processing?


On Sun, Dec 25, 2011 at 4:52 PM, David Marso wrote:

> You are welcome!
> Even leaner and meaner (if perhaps slightly obscure ;-)
> ** Alternatively **.
> COMPUTE CN=SUM(prob,LAG(CN)*(LAG(ID) EQ ID)).
> COMPUTE C= SUM(1,   LAG(C) *(LAG(ID) EQ ID)).
> COMPUTE ppct=cn/c.
> exe.
>
> David Wright-6 wrote
>>
>> Thanks David, that should do the trick.  Happy Holidays...
>>
>>
>> On Sat, Dec 24, 2011 at 11:48 AM, David Marso wrote:
>>
>>> DATA LIST FREE / ID term  prob pcntX ppctX.
>>> begin data
>>> 12 19973 0 0 0
>>> 12 19981 0 0 0
>>> 12 19982 1 1 .33
>>> 12 19993 0 1 .25
>>> 12 20001 1 2 .40
>>> 12 20003 1 3 .50
>>> 12 20011 0 3 .43
>>> 12 20023 1 4 .50
>>> 12 20031 0 4 .44
>>> 13 19973 0 0 0
>>> 13 19981 0 0 0
>>> 13 19982 1 1 .33
>>> 13 19993 0 1 .25
>>> 13 20001 1 2 .40
>>> 13 20003 1 3 .50
>>> 13 20011 0 3 .43
>>> 13 20023 1 4 .50
>>> 13 20031 0 4 .44
>>> end data.
>>> SPLIT FILE BY ID.
>>> COMPUTE C=1.
>>> CREATE CN CP=CSUM(C prob).
>>> COMPUTE ppct=cp/cn.
>>>
>>> ** Alternatively **.
>>> DO IF $CASENUM=1 OR LAG(ID) NE ID.
>>> +  COMPUTE CN=prob.
>>> +  COMPUTE  c=1.
>>> ELSE.
>>> +  COMPUTE CN=SUM(LAG(CN),prob).
>>> +  COMPUTE  c=lag(c)+1.
>>> END IF.
>>> COMPUTE ppct=cn/c.
>>> exe.
>>>
>>> David Wright-6 wrote
>>>>
>>>> I have a table that contains repeating terms (term) by student ID
>>>> (ID)
>>>> which also contains a binary (prob) of whether for a specific term
>>>> the
>>>> student has been on probation.  I want to create two new columns
>>>> one
>>>> of
>>>> which is a cumulative count (pcnt) and the other a cumulative
>>>> percent
>>>> (ppct) of all probation activity for the student from any term
>>>> backwards
>>>> in time.  I can easily create the total count & percent for all
>>>> time
>>>> of
>>>> the student via the aggregate command & append to each record but I
>>>> would like a cumulative count/percent to appear at each term.  I
>>>> would
>>>> appreciate any suggestions (no python please).  Listed below are
>>>> the
>>>> 3
>>>> original columns & the two desired columns when coding is complete.
>>>>
>>>> ID, term,  prob, pcnt, ppct
>>>> 12 19973 0 0 0
>>>> 12 19981 0 0 0
>>>> 12 19982 1 1 .33
>>>> 12 19993 0 1 .25
>>>> 12 20001 1 2 .40
>>>> 12 20003 1 3 .50
>>>> 12 20011 0 3 .43
>>>> 12 20023 1 4 .50
>>>> 12 20031 0 4 .44
>>>>
>>>> (term is defined as calendar year digits 1-4 and semester code 5th
>>>> digit
>>>> in which 1=spring, 2=summer,3=fall)
>>>>
>>>> =====================
>>>> To manage your subscription to SPSSX-L, send a message to
>>>> LISTSERV@.UGA (not to SPSSX-L), with no body text except the
>>>> command. To leave the list, send the command
>>>> SIGNOFF SPSSX-L
>>>> For a list of commands to manage subscriptions, send the command
>>>> INFO REFCARD
>>>>
>>>
>>>
>>> --
>>> View this message in context:
>>>
>>> http://spssx-discussion.1045642.n5.nabble.com/create-cumulative-count-percent-for-a-repeating-term-table-tp5099383p5099491.html
>>> Sent from the SPSSX Discussion mailing list archive at Nabble.com.
>>>
>>> =====================
>>> To manage your subscription to SPSSX-L, send a message to
>>> LISTSERV@.UGA (not to SPSSX-L), with no body text except
>>> the
>>> command. To leave the list, send the command
>>> SIGNOFF SPSSX-L
>>> For a list of commands to manage subscriptions, send the command
>>> INFO REFCARD
>>
>> =====================
>> To manage your subscription to SPSSX-L, send a message to
>> LISTSERV@.UGA (not to SPSSX-L), with no body text except the
>> command. To leave the list, send the command
>> SIGNOFF SPSSX-L
>> For a list of commands to manage subscriptions, send the command
>> INFO REFCARD
>>
>
>
> --
> View this message in context:
> http://spssx-discussion.1045642.n5.nabble.com/create-cumulative-count-percent-for-a-repeating-term-table-tp5099383p5100722.html
> Sent from the SPSSX Discussion mailing list archive at Nabble.com.
>
> =====================
> To manage your subscription to SPSSX-L, send a message to
> [hidden email] (not to SPSSX-L), with no body text except
> the
> command. To leave the list, send the command
> SIGNOFF SPSSX-L
> For a list of commands to manage subscriptions, send the command
> INFO REFCARD

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"
Reply | Threaded
Open this post in threaded view
|

Re: create cumulative count & percent for a repeating term table

Jon K Peck
In reply to this post by David Marso
The observed behavior follows from the command order rules.  COMPUTE is not executed until a data pass is required, but a variable is created as soon as it is encountered in a data definition context such as a formula target or a declaration such as NUMERIC, although it has no values.  So it actually exists before its values are calculated.

Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
new phone: 720-342-5621




From:        David Marso <[hidden email]>
To:        [hidden email]
Date:        12/26/2011 05:26 AM
Subject:        Re: [SPSSX-L] create cumulative count & percent for a repeating              term table
Sent by:        "SPSSX(r) Discussion" <[hidden email]>




> COMPUTE C= SUM(1,   LAG(C) *(LAG(ID) EQ ID)).
--
What I find interesting about this is that this particular construction
(according to my intuitions) *SHOULD* fail with the LAG(C) because at that
point C is not defined.  SPSS must be doing something to initialize C at the
onset of the COMPUTE .
To unpack this a bit.
Note
1. SUM(X,$SYSMIS)=X.
2. LAG(C) *(LAG(ID) EQ ID) = SYSMIS until ID changes and then = 0.
----
So first record gets assigned 1 [ SUM(1,SYSMIS) ], second record (assuming
same value of ID) gets SUM(1, 1*1)=2 .
AT the point where the value of ID changes:
SUM(1,WhoCares*0)=1.
So, we get a very compact technique for building counters within cases
WITHOUT using $CASENUM or LAG(I) NE ID as means of reinitializing the value
within each strata.
Taking this approach one step further (let's say we have multiple strata):
data list free /id id2 .
begin data
1 1 1 1 1 2 1 2 1 3 1 3
2 1 2 1 2 2 2 2 2 2 2 3
end data.
compute z=MAX(1,
            (id EQ LAG(id))*(id2 EQ LAG(id2))
          * LAG(z)+1).
list
     ID      ID2        Z

   1.00     1.00     1.00
   1.00     1.00     2.00
   1.00     2.00     1.00
   1.00     2.00     2.00
   1.00     3.00     1.00
   1.00     3.00     2.00
   2.00     1.00     1.00
   2.00     1.00     2.00
   2.00     2.00     1.00
   2.00     2.00     2.00
   2.00     2.00     3.00
   2.00     3.00     1.00


Number of cases read:  12    Number of cases listed:  12






David Wright-6 wrote
>
> David,
>
> What does the "1," in the second compute line do?  Is this serving the
> same as your earlier "DO IF $CASENUM=1" to get around the starting case
> in the file?  If so, without the DO IF, how does SPSS treat the 1 after
> the first case processing?
>
>
> On Sun, Dec 25, 2011 at 4:52 PM, David Marso wrote:
>
>> You are welcome!
>> Even leaner and meaner (if perhaps slightly obscure ;-)
>> ** Alternatively **.
>> COMPUTE CN=SUM(prob,LAG(CN)*(LAG(ID) EQ ID)).
>> COMPUTE C= SUM(1,   LAG(C) *(LAG(ID) EQ ID)).
>> COMPUTE ppct=cn/c.
>> exe.
>>
>> David Wright-6 wrote
>>>
>>> Thanks David, that should do the trick.  Happy Holidays...
>>>
>>>
>>> On Sat, Dec 24, 2011 at 11:48 AM, David Marso wrote:
>>>
>>>> DATA LIST FREE / ID term  prob pcntX ppctX.
>>>> begin data
>>>> 12 19973 0 0 0
>>>> 12 19981 0 0 0
>>>> 12 19982 1 1 .33
>>>> 12 19993 0 1 .25
>>>> 12 20001 1 2 .40
>>>> 12 20003 1 3 .50
>>>> 12 20011 0 3 .43
>>>> 12 20023 1 4 .50
>>>> 12 20031 0 4 .44
>>>> 13 19973 0 0 0
>>>> 13 19981 0 0 0
>>>> 13 19982 1 1 .33
>>>> 13 19993 0 1 .25
>>>> 13 20001 1 2 .40
>>>> 13 20003 1 3 .50
>>>> 13 20011 0 3 .43
>>>> 13 20023 1 4 .50
>>>> 13 20031 0 4 .44
>>>> end data.
>>>> SPLIT FILE BY ID.
>>>> COMPUTE C=1.
>>>> CREATE CN CP=CSUM(C prob).
>>>> COMPUTE ppct=cp/cn.
>>>>
>>>> ** Alternatively **.
>>>> DO IF $CASENUM=1 OR LAG(ID) NE ID.
>>>> +  COMPUTE CN=prob.
>>>> +  COMPUTE  c=1.
>>>> ELSE.
>>>> +  COMPUTE CN=SUM(LAG(CN),prob).
>>>> +  COMPUTE  c=lag(c)+1.
>>>> END IF.
>>>> COMPUTE ppct=cn/c.
>>>> exe.
>>>>
>>>> David Wright-6 wrote
>>>>>
>>>>> I have a table that contains repeating terms (term) by student ID
>>>>> (ID)
>>>>> which also contains a binary (prob) of whether for a specific term
>>>>> the
>>>>> student has been on probation.  I want to create two new columns
>>>>> one
>>>>> of
>>>>> which is a cumulative count (pcnt) and the other a cumulative
>>>>> percent
>>>>> (ppct) of all probation activity for the student from any term
>>>>> backwards
>>>>> in time.  I can easily create the total count & percent for all
>>>>> time
>>>>> of
>>>>> the student via the aggregate command & append to each record but I
>>>>> would like a cumulative count/percent to appear at each term.  I
>>>>> would
>>>>> appreciate any suggestions (no python please).  Listed below are
>>>>> the
>>>>> 3
>>>>> original columns & the two desired columns when coding is complete.
>>>>>
>>>>> ID, term,  prob, pcnt, ppct
>>>>> 12 19973 0 0 0
>>>>> 12 19981 0 0 0
>>>>> 12 19982 1 1 .33
>>>>> 12 19993 0 1 .25
>>>>> 12 20001 1 2 .40
>>>>> 12 20003 1 3 .50
>>>>> 12 20011 0 3 .43
>>>>> 12 20023 1 4 .50
>>>>> 12 20031 0 4 .44
>>>>>
>>>>> (term is defined as calendar year digits 1-4 and semester code 5th
>>>>> digit
>>>>> in which 1=spring, 2=summer,3=fall)
>>>>>
>>>>> =====================
>>>>> To manage your subscription to SPSSX-L, send a message to
>>>>> LISTSERV@.UGA (not to SPSSX-L), with no body text except the
>>>>> command. To leave the list, send the command
>>>>> SIGNOFF SPSSX-L
>>>>> For a list of commands to manage subscriptions, send the command
>>>>> INFO REFCARD
>>>>>
>>>>
>>>>
>>>> --
>>>> View this message in context:
>>>>
>>>>
http://spssx-discussion.1045642.n5.nabble.com/create-cumulative-count-percent-for-a-repeating-term-table-tp5099383p5099491.html
>>>> Sent from the SPSSX Discussion mailing list archive at Nabble.com.
>>>>
>>>> =====================
>>>> To manage your subscription to SPSSX-L, send a message to
>>>> LISTSERV@.UGA (not to SPSSX-L), with no body text except
>>>> the
>>>> command. To leave the list, send the command
>>>> SIGNOFF SPSSX-L
>>>> For a list of commands to manage subscriptions, send the command
>>>> INFO REFCARD
>>>
>>> =====================
>>> To manage your subscription to SPSSX-L, send a message to
>>> LISTSERV@.UGA (not to SPSSX-L), with no body text except the
>>> command. To leave the list, send the command
>>> SIGNOFF SPSSX-L
>>> For a list of commands to manage subscriptions, send the command
>>> INFO REFCARD
>>>
>>
>>
>> --
>> View this message in context:
>>
http://spssx-discussion.1045642.n5.nabble.com/create-cumulative-count-percent-for-a-repeating-term-table-tp5099383p5100722.html
>> Sent from the SPSSX Discussion mailing list archive at Nabble.com.
>>
>> =====================
>> To manage your subscription to SPSSX-L, send a message to
>> LISTSERV@.UGA (not to SPSSX-L), with no body text except
>> the
>> command. To leave the list, send the command
>> SIGNOFF SPSSX-L
>> For a list of commands to manage subscriptions, send the command
>> INFO REFCARD
>
> =====================
> To manage your subscription to SPSSX-L, send a message to
> LISTSERV@.UGA (not to SPSSX-L), with no body text except the
> command. To leave the list, send the command
> SIGNOFF SPSSX-L
> For a list of commands to manage subscriptions, send the command
> INFO REFCARD
>


--
View this message in context:
http://spssx-discussion.1045642.n5.nabble.com/create-cumulative-count-percent-for-a-repeating-term-table-tp5099383p5101466.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD