I have for each semester (sem) within a year set (year, comprised of
F,S,U sem) a cumulative total of counts by event. The maximum number of event IDs is constant by semester, for example in the dummy data below F always has 5 events, S has 3 events & U has 4 events per year (the actual data has much greater event counts but still constant by sem). I am needing to create a cumulative count across the entire year as displayed in the 2nd data "desired data after transformations". I've gotten about 1/2 of the way there but having a problem with the S & U semesters using the last event total of the previous semester as its start. Any assistance would be appreciated. No pyhton please. Original data: year,event,sem,semcuml 2010 1 F 125 2010 2 F 200 2010 3 F 310 2010 4 F 489 2010 5 F 601 2010 1 S 138 2010 2 S 469 2010 3 S 510 2010 1 U 59 2010 2 U 257 2010 3 U 325 2010 4 U 423 2011 1 F 132 2011 2 F 207 2011 3 F 317 2011 4 F 496 2011 5 F 608 2011 1 S 145 2011 2 S 476 2011 3 S 517 2011 1 U 66 2011 2 U 264 2011 3 U 332 2011 4 U 430 desired data after transformations: year,event,sem,semcuml,yrevt,yrcuml 2010 1 F 125 1 125 2010 2 F 200 2 200 2010 3 F 310 3 310 2010 4 F 489 4 489 2010 5 F 601 5 601 2010 1 S 138 6 739 2010 2 S 469 7 1070 2010 3 S 510 8 1111 2010 1 U 59 9 1170 2010 2 U 257 10 1368 2010 3 U 325 11 1436 2010 4 U 423 12 1534 2011 1 F 132 1 132 2011 2 F 207 2 207 2011 3 F 317 3 317 2011 4 F 496 4 496 2011 5 F 608 5 608 2011 1 S 145 6 753 2011 2 S 476 7 1084 2011 3 S 517 8 1125 2011 1 U 66 9 1191 2011 2 U 264 10 1389 2011 3 U 332 11 1457 2011 4 U 430 12 1555 ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
First, thank you for the clear example! Here what I did was to turn the cumulative counts into the original values, and then do the new cumulative sum for the different grouping.
****************************************. DATA LIST FREE / year event (2F4.0) sem (A1) semcuml (F3.0). BEGIN DATA 2010 1 F 125 2010 2 F 200 2010 3 F 310 2010 4 F 489 2010 5 F 601 2010 1 S 138 2010 2 S 469 2010 3 S 510 2010 1 U 59 2010 2 U 257 2010 3 U 325 2010 4 U 423 2011 1 F 132 2011 2 F 207 2011 3 F 317 2011 4 F 496 2011 5 F 608 2011 1 S 145 2011 2 S 476 2011 3 S 517 2011 1 U 66 2011 2 U 264 2011 3 U 332 2011 4 U 430 END DATA. *Lets make actual counts instead of of CSUM. DO IF ($casenum = 1) OR (Event < LAG(Event)). COMPUTE semVal = semcuml. ELSE. COMPUTE semVal = semcuml - LAG(semcuml). END IF. *Now we can create the CSUM within the year grouping. DO IF ($casenum = 1) OR (Year <> LAG(Year)). COMPUTE yrevt = 1. COMPUTE yrcuml = semVal. ELSE. COMPUTE yrevt = LAG(yrevt) + 1. COMPUTE yrcuml = semVal + LAG(yrcuml). END IF. EXECUTE. ****************************************. |
In reply to this post by wsu_wright
Thanks Andy, this did the trick and concisely at that!
David, On Wed, Apr 16, 2014 at 12:05 PM, Andy W wrote: > First, thank you for the clear example! Here what I did was to turn > the > cumulative counts into the original values, and then do the new > cumulative > sum for the different grouping. > > ****************************************. > DATA LIST FREE / year event (2F4.0) sem (A1) semcuml (F3.0). > BEGIN DATA > 2010 1 F 125 > 2010 2 F 200 > 2010 3 F 310 > 2010 4 F 489 > 2010 5 F 601 > 2010 1 S 138 > 2010 2 S 469 > 2010 3 S 510 > 2010 1 U 59 > 2010 2 U 257 > 2010 3 U 325 > 2010 4 U 423 > 2011 1 F 132 > 2011 2 F 207 > 2011 3 F 317 > 2011 4 F 496 > 2011 5 F 608 > 2011 1 S 145 > 2011 2 S 476 > 2011 3 S 517 > 2011 1 U 66 > 2011 2 U 264 > 2011 3 U 332 > 2011 4 U 430 > END DATA. > > *Lets make actual counts instead of of CSUM. > DO IF ($casenum = 1) OR (Event < LAG(Event)). > COMPUTE semVal = semcuml. > ELSE. > COMPUTE semVal = semcuml - LAG(semcuml). > END IF. > > *Now we can create the CSUM within the year grouping. > DO IF ($casenum = 1) OR (Year <> LAG(Year)). > COMPUTE yrevt = 1. > COMPUTE yrcuml = semVal. > ELSE. > COMPUTE yrevt = LAG(yrevt) + 1. > COMPUTE yrcuml = semVal + LAG(yrcuml). > END IF. > EXECUTE. > ****************************************. > > > > ----- > Andy W > [hidden email] > http://andrewpwheeler.wordpress.com/ > -- > View this message in context: > http://spssx-discussion.1045642.n5.nabble.com/adding-cumulative-count-series-records-tp5725487p5725490.html > Sent from the SPSSX Discussion mailing list archive at Nabble.com. > > ===================== > To manage your subscription to SPSSX-L, send a message to > [hidden email] (not to SPSSX-L), with no body text except > the > command. To leave the list, send the command > SIGNOFF SPSSX-L > For a list of commands to manage subscriptions, send the command > INFO REFCARD ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
Administrator
|
In reply to this post by Andy W
Following Andy,
First transforming to original values and then using CREATE to obtain CSUM. A bit more concise ;-) COMPUTE semVal=SUM(semcuml,(sem EQ LAG(sem))* -1 * LAG(semcuml)). SPLIT FILE BY Year. CREATE YrCuml=CSUM(semVal). SPLIT FILE OFF.
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
In reply to this post by wsu_wright
Thanks David. Just curious, while the current file is not large, if I
were working on a very large file, does the split file introduce a possible drag on processing speed similar to sorting large files? On Mon, Apr 21, 2014 at 3:58 AM, David Marso wrote: > Following Andy, > First transforming to original values and then using CREATE to obtain > CSUM. > A bit more concise ;-) > > COMPUTE semVal=SUM(semcuml,(sem EQ LAG(sem))* -1 * LAG(semcuml)). > SPLIT FILE BY Year. > CREATE YrCuml=CSUM(semVal). > SPLIT FILE OFF. > > > > Andy W wrote >> First, thank you for the clear example! Here what I did was to turn >> the >> cumulative counts into the original values, and then do the new >> cumulative >> sum for the different grouping. >> >> ****************************************. >> DATA LIST FREE / year event (2F4.0) sem (A1) semcuml (F3.0). >> BEGIN DATA >> 2010 1 F 125 >> 2010 2 F 200 >> 2010 3 F 310 >> 2010 4 F 489 >> 2010 5 F 601 >> 2010 1 S 138 >> 2010 2 S 469 >> 2010 3 S 510 >> 2010 1 U 59 >> 2010 2 U 257 >> 2010 3 U 325 >> 2010 4 U 423 >> 2011 1 F 132 >> 2011 2 F 207 >> 2011 3 F 317 >> 2011 4 F 496 >> 2011 5 F 608 >> 2011 1 S 145 >> 2011 2 S 476 >> 2011 3 S 517 >> 2011 1 U 66 >> 2011 2 U 264 >> 2011 3 U 332 >> 2011 4 U 430 >> END DATA. >> >> *Lets make actual counts instead of of CSUM. >> DO IF ($casenum = 1) OR (Event < LAG(Event)). >> COMPUTE semVal = semcuml. >> ELSE. >> COMPUTE semVal = semcuml - LAG(semcuml). >> END IF. >> >> *Now we can create the CSUM within the year grouping. >> DO IF ($casenum = 1) OR (Year <> LAG(Year)). >> COMPUTE yrevt = 1. >> COMPUTE yrcuml = semVal. >> ELSE. >> COMPUTE yrevt = LAG(yrevt) + 1. >> COMPUTE yrcuml = semVal + LAG(yrcuml). >> END IF. >> EXECUTE. >> ****************************************. > > > > > > ----- > Please reply to the list and not to my personal email. > Those desiring my consulting or training services please feel free to > email me. > --- > "Nolite dare sanctum canibus neque mittatis margaritas vestras ante > porcos ne forte conculcent eas pedibus suis." > Cum es damnatorum possederunt porcos iens ut salire off sanguinum > cliff in abyssum?" > -- > View this message in context: > http://spssx-discussion.1045642.n5.nabble.com/adding-cumulative-count-series-records-tp5725487p5725555.html > Sent from the SPSSX Discussion mailing list archive at Nabble.com. > > ===================== > To manage your subscription to SPSSX-L, send a message to > [hidden email] (not to SPSSX-L), with no body text except > the > command. To leave the list, send the command > SIGNOFF SPSSX-L > For a list of commands to manage subscriptions, send the command > INFO REFCARD ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
Administrator
|
I can't imagine so.
All that should be required is some clean up and initialization code at the beginning of each split.
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
The interesting thing about CREATE is that,
although it might seem to be a transformation, it is actually a procedure.
Transformations do not care about split files, but procedures do,
which is why CREATE is useful with groups. (We should mention this
in the CSR). SHIFT VALUES is, similarly, a procedure honoring split,
not a transformation command. The CSR does mention that.
But, as David imagined, apart from the initial order requirement for using SPLIT FILES, there is negligible overhead in having splits on, unless, I suppose, every case was a new split. Jon Peck (no "h") aka Kim Senior Software Engineer, IBM [hidden email] phone: 720-342-5621 From: David Marso <[hidden email]> To: [hidden email], Date: 04/21/2014 03:35 AM Subject: Re: [SPSSX-L] adding cumulative count series records Sent by: "SPSSX(r) Discussion" <[hidden email]> I can't imagine so. All that should be required is some clean up and initialization code at the beginning of each split. David Wright-6 wrote > Thanks David. Just curious, while the current file is not large, if I > were working on a very large file, does the split file introduce a > possible drag on processing speed similar to sorting large files? > > > On Mon, Apr 21, 2014 at 3:58 AM, David Marso wrote: > >> Following Andy, >> First transforming to original values and then using CREATE to obtain >> CSUM. >> A bit more concise ;-) >> >> COMPUTE semVal=SUM(semcuml,(sem EQ LAG(sem))* -1 * LAG(semcuml)). >> SPLIT FILE BY Year. >> CREATE YrCuml=CSUM(semVal). >> SPLIT FILE OFF. >> >> >> >> Andy W wrote >>> First, thank you for the clear example! Here what I did was to turn >>> the >>> cumulative counts into the original values, and then do the new >>> cumulative >>> sum for the different grouping. >>> >>> ****************************************. >>> DATA LIST FREE / year event (2F4.0) sem (A1) semcuml (F3.0). >>> BEGIN DATA >>> 2010 1 F 125 >>> 2010 2 F 200 >>> 2010 3 F 310 >>> 2010 4 F 489 >>> 2010 5 F 601 >>> 2010 1 S 138 >>> 2010 2 S 469 >>> 2010 3 S 510 >>> 2010 1 U 59 >>> 2010 2 U 257 >>> 2010 3 U 325 >>> 2010 4 U 423 >>> 2011 1 F 132 >>> 2011 2 F 207 >>> 2011 3 F 317 >>> 2011 4 F 496 >>> 2011 5 F 608 >>> 2011 1 S 145 >>> 2011 2 S 476 >>> 2011 3 S 517 >>> 2011 1 U 66 >>> 2011 2 U 264 >>> 2011 3 U 332 >>> 2011 4 U 430 >>> END DATA. >>> >>> *Lets make actual counts instead of of CSUM. >>> DO IF ($casenum = 1) OR (Event < LAG(Event)). >>> COMPUTE semVal = semcuml. >>> ELSE. >>> COMPUTE semVal = semcuml - LAG(semcuml). >>> END IF. >>> >>> *Now we can create the CSUM within the year grouping. >>> DO IF ($casenum = 1) OR (Year <> LAG(Year)). >>> COMPUTE yrevt = 1. >>> COMPUTE yrcuml = semVal. >>> ELSE. >>> COMPUTE yrevt = LAG(yrevt) + 1. >>> COMPUTE yrcuml = semVal + LAG(yrcuml). >>> END IF. >>> EXECUTE. >>> ****************************************. >> >> >> >> >> >> ----- >> Please reply to the list and not to my personal email. >> Those desiring my consulting or training services please feel free to >> email me. >> --- >> "Nolite dare sanctum canibus neque mittatis margaritas vestras ante >> porcos ne forte conculcent eas pedibus suis." >> Cum es damnatorum possederunt porcos iens ut salire off sanguinum >> cliff in abyssum?" >> -- >> View this message in context: >> http://spssx-discussion.1045642.n5.nabble.com/adding-cumulative-count-series-records-tp5725487p5725555.html >> Sent from the SPSSX Discussion mailing list archive at Nabble.com. >> >> ===================== >> To manage your subscription to SPSSX-L, send a message to >> > LISTSERV@.UGA > (not to SPSSX-L), with no body text except >> the >> command. To leave the list, send the command >> SIGNOFF SPSSX-L >> For a list of commands to manage subscriptions, send the command >> INFO REFCARD > > ===================== > To manage your subscription to SPSSX-L, send a message to > LISTSERV@.UGA > (not to SPSSX-L), with no body text except the > command. To leave the list, send the command > SIGNOFF SPSSX-L > For a list of commands to manage subscriptions, send the command > INFO REFCARD ----- Please reply to the list and not to my personal email. Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" -- View this message in context: http://spssx-discussion.1045642.n5.nabble.com/adding-cumulative-count-series-records-tp5725487p5725557.html Sent from the SPSSX Discussion mailing list archive at Nabble.com. ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
Administrator
|
Of course one could do the requested task with simply transformations. COMPUTE #semVal=SUM(semcuml,(sem EQ LAG(sem))* -1 * LAG(semcuml)). COMPUTE YrCuml=SUM(#semVal,(YEAR EQ LAG(Year))*LAG(YrCuml)). In fact it could be slammed into an eye bleeder one liner which will dazzle any who attempt to parse it without copious amounts of coffee ;-) COMPUTE YrCum2=SUM(SUM(semcuml,(sem EQ LAG(sem))* -1 * LAG(semcuml)),(YEAR EQ LAG(Year))*LAG(YrCuml)). Bletch ;-) Are we having fun yet?
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
Very nice code, once I understand how it works! The SUM handles the missing LAG()s and the comparisons resolve to 0 or 1...
I believe the last LAG(YrCuml) should read LAG(semCuml). And then there was the Event counter. So all thats needed are 3 transformations and an EXECUTE. COMPUTE semVal=SUM(semcuml,(sem EQ LAG(sem))* -1 * LAG(semcuml)). COMPUTE YrCum2=SUM(SUM(semcuml,(sem EQ LAG(sem)) * -1 * LAG(semcuml)), (YEAR EQ LAG(Year))*LAG(semCuml)). COMPUTE YrEvnt=SUM(1,(YEAR EQ LAG(Year))*(LAG(YrEvnt))). EXECUTE. /PR |
Free forum by Nabble | Edit this page |