Hello Everyone, I’m trying to run below lag syntax on an hourly interval dataset, with 30,000+ records, to calculate averages over the past 52 weeks of the same hour and day but it is not working. SPSS keeps saying Transformations Pending. Any idea what is wrong with the syntax? I ran it on SPSS 19 for Windows. IF $CASENUM > 8736 Test_L52=(SUM(LAG(Test,168),LAG(Test,336),LAG(Test,504),LAG(Test,672), LAG(Test,840),LAG(Test,1008),LAG(Test,1176),LAG(Test,1344),LAG(Test,1512),LAG(Test,1680), LAG(Test,1848),LAG(Test,2016),LAG(Test,2184),LAG(Test,2352),LAG(Test,2520),LAG(Test,2688), LAG(Test,2856),LAG(Test,3024),LAG(Test,3192),LAG(Test,3360),LAG(Test,3528),LAG(Test,3696), LAG(Test,3864),LAG(Test,4032),LAG(Test,4200),LAG(Test,4368),LAG(Test,4536),LAG(Test,4704), LAG(Test,4872),LAG(Test,5040),LAG(Test,5208),LAG(Test,5376),LAG(Test,5544),LAG(Test,5712), LAG(Test,5880),LAG(Test,6048),LAG(Test,6216),LAG(Test,6384),LAG(Test,6552),LAG(Test,6720), LAG(Test,6888),LAG(Test,7056),LAG(Test,7224),LAG(Test,7392),LAG(Test,7560),LAG(Test,7728), LAG(Test,7896),LAG(Test,8064),LAG(Test,8232),LAG(Test,8400),LAG(Test,8568),LAG(Test,8736)))/52. Thank in advance, Boreak This email is intended solely for the named addressee. |
HI Boreak,
Try adding an execute after your last line of syntax. ... LAG(Test,7896),LAG(Test,8064),LAG(Test,8232),LAG(Test,8400),LAG(Test,8568),LAG(Test,8736)))/52. execute. Emma ________________________________________ From: SPSSX(r) Discussion [[hidden email]] On Behalf Of Boreak Silk [[hidden email]] Sent: Thursday, June 06, 2013 6:46 PM To: [hidden email] Subject: Problem running lag syntax Hello Everyone, I’m trying to run below lag syntax on an hourly interval dataset, with 30,000+ records, to calculate averages over the past 52 weeks of the same hour and day but it is not working. SPSS keeps saying Transformations Pending. Any idea what is wrong with the syntax? I ran it on SPSS 19 for Windows. IF $CASENUM > 8736 Test_L52=(SUM(LAG(Test,168),LAG(Test,336),LAG(Test,504),LAG(Test,672), LAG(Test,840),LAG(Test,1008),LAG(Test,1176),LAG(Test,1344),LAG(Test,1512),LAG(Test,1680), LAG(Test,1848),LAG(Test,2016),LAG(Test,2184),LAG(Test,2352),LAG(Test,2520),LAG(Test,2688), LAG(Test,2856),LAG(Test,3024),LAG(Test,3192),LAG(Test,3360),LAG(Test,3528),LAG(Test,3696), LAG(Test,3864),LAG(Test,4032),LAG(Test,4200),LAG(Test,4368),LAG(Test,4536),LAG(Test,4704), LAG(Test,4872),LAG(Test,5040),LAG(Test,5208),LAG(Test,5376),LAG(Test,5544),LAG(Test,5712), LAG(Test,5880),LAG(Test,6048),LAG(Test,6216),LAG(Test,6384),LAG(Test,6552),LAG(Test,6720), LAG(Test,6888),LAG(Test,7056),LAG(Test,7224),LAG(Test,7392),LAG(Test,7560),LAG(Test,7728), LAG(Test,7896),LAG(Test,8064),LAG(Test,8232),LAG(Test,8400),LAG(Test,8568),LAG(Test,8736)))/52. Thank in advance, Boreak This email is intended solely for the named addressee. If you are not the addressee indicated please delete it immediately. ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
In reply to this post by Boreak Silk
If I understand correctly, each record in your data is a consecutive hour. For each combination of day (1 through 7) and hour (0 through 23) you'd like to calculate a mean over 52 (weeks) data point, right? So you'll have 168 (7 days * 24 hours) new variables.
In that case, you could try something like the following (you'll need to have the Python essentials installed for this): *1. Create test data. input program. loop # = 1 to 8736. compute outcome = rv.ber(.5). end case. end loop. end file. end input program. exe. *2. Create week, day, hour day_hour as variables. compute week = tru(($casenum - 1) /168) + 1. compute day = mod(tru(($casenum - 1) / 24) ,7) + 1. compute hour = mod(($casenum - 1),24). compute day_hour=100*day + hour./*temporary variable, needed later on. exe./*better omitted but allows for quick visual check. *3. Important. format day_hour(f3.0). *4. Compute new variables. begin program. import spss,spssaux cmd = spssaux.CreateXMLOutput("means outcome by day_hour.") for day in range(1,8): for hour in range(24): mean = float(spssaux.GetValuesFromXMLWorkspace(cmd,"Report",cellAttrib="number",colCategory="Mean",rowCategory=str(100 * day + hour))[0]) spss.Submit('compute mean_day_%(day)d_hour_%(hour)d = %(mean)f.'%locals()) end program. *5. Visual inspection. means outcome by day by hour. *6. Delete temp variable. delete variable day_hour. HTH, Ruben |
In reply to this post by Boreak Silk
IMO you will probably want to look at the CREATE command, in particular the prior moving average. If you create a categorical variable to ID the hour of day and day of week, you can then use split files to create the running mean within that category.
|
In reply to this post by Ruben Geert van den Berg
Although a lot of time has already been invested in the project, I think that, depending on how time is coded, the problem could be worked using aggregate. In a one year period you have 168 blocks of 52 records, each block corresponds to an hour of the week. So, if time were numbered correctly and sorted correctly, the break variable would be HourOfWeek.
Gene Maguin -----Original Message----- From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Ruben Geert van den Berg Sent: Friday, June 07, 2013 4:14 AM To: [hidden email] Subject: Re: Problem running lag syntax If I understand correctly, each record in your data is a consecutive hour. For each combination of day (1 through 7) and hour (0 through 23) you'd like to calculate a mean over 52 (weeks) data point, right? So you'll have 168 (7 days * 24 hours) new variables. In that case, you could try something like the following (you'll need to have the Python essentials installed for this): *1. Create test data. input program. loop # = 1 to 8736. compute outcome = rv.ber(.5). end case. end loop. end file. end input program. exe. *2. Create week, day, hour day_hour as variables. compute week = tru(($casenum - 1) /168) + 1. compute day = mod(tru(($casenum - 1) / 24) ,7) + 1. compute hour = mod(($casenum - 1),24). compute day_hour=100*day + hour./*temporary variable, needed later on. exe./*better omitted but allows for quick visual check. *3. Important. format day_hour(f3.0). *4. Compute new variables. begin program. import spss,spssaux cmd = spssaux.CreateXMLOutput("means outcome by day_hour.") for day in range(1,8): for hour in range(24): mean = float(spssaux.GetValuesFromXMLWorkspace(cmd,"Report",cellAttrib="number",colCategory="Mean",rowCategory=str(100 * day + hour))[0]) spss.Submit('compute mean_day_%(day)d_hour_%(hour)d = %(mean)f.'%locals()) end program. *5. Visual inspection. means outcome by day by hour. *6. Delete temp variable. delete variable day_hour. HTH, Ruben -- View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Problem-running-lag-syntax-tp5720617p5720622.html Sent from the SPSSX Discussion mailing list archive at Nabble.com. ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
Aggregate won't work for running averages. In addition to Emma's initial advice (problem solved via execute!) here is an example updating Ruben's fake data slightly. I'm confused where Ruben thinks the OP is asking for 168 seperate variables, but maybe I'm confused.
****************************************************. *Fake data. input program. loop #i = 1 to 100000. compute outcome = rv.ber(.5). end case. end loop. end file. end input program. exe. compute #daybase = DATE.MDY(01,01,2000). compute #timebase = 0. compute myDateTime = #daybase + #timebase + ($casenum*600). formats myDateTime (DATETIME17). *Categorical variable. compute #hour = XDATE.HOUR(myDateTime). compute #day = XDATE.WKDAY(myDateTime). compute cat = #day + (#hour/100). *Split file and PMA. sort cases by cat myDateTime. split file by cat. CREATE /mean_out=PMA(outcome 52). split file off. ****************************************************. |
Administrator
|
In reply to this post by Boreak Silk
Use EXECUTE to force a data pass or even better run some stats to check the result.
Looking at that code compels me to suggest something more along the lines of the following. Certainly more maintainable than dealing with those hard coded numbers! DEFINE L52 (!POS !TOKENS(1) / !POS !TOKENS(1) ) . DO IF $CASENUM GT 8736. !DO !I=168 !TO 8736 !BY 168. COMPUTE !1=SUM(!1,LAG(!2,!I)). !DOEND COMPUTE !1=!1/52. END IF. !ENDDEFINE . L52 XL_52 X . EXECUTE.
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
Administrator
|
That can be further refined by noticing that cases 168 away from each other will differ by only 2 elements.
Use this to implement a provisional means style approach rather than adding 52 things together when 50 have already been added and that lives above in the file. Leaving the solution for you (TODO) as an exercise in efficient coding ;-)! --
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
Thanks you all for your contribution, especially to David - your
approach worked well without Python. Boreak -----Original Message----- From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of David Marso Sent: Saturday, 8 June 2013 4:47 AM To: [hidden email] Subject: Re: Problem running lag syntax That can be further refined by noticing that cases 168 away from each other will differ by only 2 elements. Use this to implement a provisional means style approach rather than adding 52 things together when 50 have already been added and that lives above in the file. Leaving the solution for you (TODO) as an exercise in efficient coding ;-)! -- David Marso wrote > Use EXECUTE to force a data pass or even better run some stats to > check the result. > Looking at that code compels me to suggest something more along the > lines of the following. > Certainly more maintainable than dealing with those hard coded numbers! > > DEFINE L52 (!POS !TOKENS(1) / !POS !TOKENS(1) ) . > DO IF $CASENUM GT 8736. > !DO !I=168 !TO 8736 !BY 168. > COMPUTE !1=SUM(!1,LAG(!2,!I)). > !DOEND > COMPUTE !1=!1/52. > END IF. > > !ENDDEFINE . > > L52 XL_52 X . > EXECUTE. > Boreak Silk wrote >> Hello Everyone, >> >> >> >> I'm trying to run below lag syntax on an hourly interval dataset, >> with 30,000+ records, to calculate averages >> >> over the past 52 weeks of the same hour and day but it is not >> SPSS keeps saying >> >> Transformations Pending. Any idea what is wrong with the syntax? I >> ran it on SPSS 19 for Windows. >> >> >> >> IF $CASENUM > 8736 >> Test_L52=(SUM(LAG(Test,168),LAG(Test,336),LAG(Test,504),LAG(Test,672) >> , >> >> >> LAG(Test,840),LAG(Test,1008),LAG(Test,1176),LAG(Test,1344),LAG(Test,1 >> 512 >> ),LAG(Test,1680), >> >> LAG(Test,1848),LAG(Test,2016),LAG(Test,2184),LAG(Test,2352),LAG(Test, >> 252 >> 0),LAG(Test,2688), >> >> LAG(Test,2856),LAG(Test,3024),LAG(Test,3192),LAG(Test,3360),LAG(Test, >> 352 >> 8),LAG(Test,3696), >> >> LAG(Test,3864),LAG(Test,4032),LAG(Test,4200),LAG(Test,4368),LAG(Test, >> 453 >> 6),LAG(Test,4704), >> >> LAG(Test,4872),LAG(Test,5040),LAG(Test,5208),LAG(Test,5376),LAG(Test, >> 554 >> 4),LAG(Test,5712), >> >> >> LAG(Test,5880),LAG(Test,6048),LAG(Test,6216),LAG(Test,6384),LAG(Test, >> 655 >> 2),LAG(Test,6720), >> >> LAG(Test,6888),LAG(Test,7056),LAG(Test,7224),LAG(Test,7392),LAG(Test, >> 756 >> 0),LAG(Test,7728), >> >> >> LAG(Test,7896),LAG(Test,8064),LAG(Test,8232),LAG(Test,8400),LAG(Test, >> 856 >> 8),LAG(Test,8736)))/52. >> >> >> >> >> >> Thank in advance, >> >> >> >> Boreak >> >> >> >> This email is intended solely for the named addressee. >> If you are not the addressee indicated please delete it immediately. ----- Please reply to the list and not to my personal email. Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" -- View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Problem-running-lag-syntax -tp5720617p5720636.html Sent from the SPSSX Discussion mailing list archive at Nabble.com. ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
Free forum by Nabble | Edit this page |