Good Afternoon All, I’m hoping someone in here would have a solution for a data manipulation problem I am facing, and one which eludes me: I have cases, some of which are consecutive in time and the only indicator I have is the month – so for a given case, I have an Identifier, and the month (plus other data which are no help!) The problem is that for a given identifier, there are breaks between continuous periods, and I need to be able to measure the length of time any given contiguous period – somewhat like a length of stay in a hospital. I have done something like this in the past with regular dates, but cannot for the life of me remember how I did so, and would appreciate some advice. Data may look like this: ID Month 1a 3 1a 6 1a 7 1a 8 1a 11 1a 12 2a 9 2a 11 2a 12 2a 15 2a 16 2a 17 3a 1 3a 3 3a 4 3a 7 3a 8 3a 9 So, 1a has two spans of 3 consecutive months and 2 months, 2a has 1 span of 2 consecutive months, etc. TIA Mike |
An example using scratch variables to “copy” the sequence identifiers across variable number of consecutive cases: ** sample data. NEW FILE. DATA LIST FREE /id (a2) month (f8.0). BEGIN DATA 1a 3 1a 6 1a 7 1a 8 1a 11 1a 12 2a 9 2a 11 2a 12 2a 15 2a 16 2a 17 3a 1 3a 3 3a 4 3a 7 3a 8 3a 9 END DATA. DATASET NAME tst WINDOW= FRONT. COMPUTE month_seq = month. IF id NE LAG(id) month_seq = 0. IF id EQ LAG(id) month_seq = LAG(month)+1. COMPUTE same_seq = month = month_seq. COMPUTE new_seq = same_seq =0. FREQUENCIES new_seq. FILTER BY new_seq. ** RANK requires numeric values to work for each ID values. COMPUTE id_nbr = numeric(substr(id,1,1),f8.0). RANK month BY id_nbr /RANK INTO seq_nbr. FILTER OFF . DO IF new_seq. COMPUTE #seqhold = seq_nbr. ELSE . COMPUTE seq_nbr = #seqhold. END IF . EXECUTE . Jim Marks Director, Market Research x1616 From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Roberts, Michael Good Afternoon All, I’m hoping someone in here would have a solution for a data manipulation problem I am facing, and one which eludes me: I have cases, some of which are consecutive in time and the only indicator I have is the month – so for a given case, I have an Identifier, and the month (plus other data which are no help!) The problem is that for a given identifier, there are breaks between continuous periods, and I need to be able to measure the length of time any given contiguous period – somewhat like a length of stay in a hospital. I have done something like this in the past with regular dates, but cannot for the life of me remember how I did so, and would appreciate some advice. Data may look like this: ID Month 1a 3 1a 6 1a 7 1a 8 1a 11 1a 12 2a 9 2a 11 2a 12 2a 15 2a 16 2a 17 3a 1 3a 3 3a 4 3a 7 3a 8 3a 9 So, 1a has two spans of 3 consecutive months and 2 months, 2a has 1 span of 2 consecutive months, etc. TIA Mike |
Jim, I just ran through the example you sent and it looks like it does what I need to get done. Thank You! May your weekend be filled with many good things!!! J Best Regards Mike From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Marks, Jim An example using scratch variables to “copy” the sequence identifiers across variable number of consecutive cases: ** sample data. NEW FILE. DATA LIST FREE /id (a2) month (f8.0). BEGIN DATA 1a 3 1a 6 1a 7 1a 8 1a 11 1a 12 2a 9 2a 11 2a 12 2a 15 2a 16 2a 17 3a 1 3a 3 3a 4 3a 7 3a 8 3a 9 END DATA. DATASET NAME tst WINDOW= FRONT. COMPUTE month_seq = month. IF id NE LAG(id) month_seq = 0. IF id EQ LAG(id) month_seq = LAG(month)+1. COMPUTE same_seq = month = month_seq. COMPUTE new_seq = same_seq =0. FREQUENCIES new_seq. FILTER BY new_seq. ** RANK requires numeric values to work for each ID values. COMPUTE id_nbr = numeric(substr(id,1,1),f8.0). RANK month BY id_nbr /RANK INTO seq_nbr. FILTER OFF . DO IF new_seq. COMPUTE #seqhold = seq_nbr. ELSE . COMPUTE seq_nbr = #seqhold. END IF . EXECUTE . Jim Marks Director, Market Research x1616 From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Roberts, Michael Good Afternoon All, I’m hoping someone in here would have a solution for a data manipulation problem I am facing, and one which eludes me: I have cases, some of which are consecutive in time and the only indicator I have is the month – so for a given case, I have an Identifier, and the month (plus other data which are no help!) The problem is that for a given identifier, there are breaks between continuous periods, and I need to be able to measure the length of time any given contiguous period – somewhat like a length of stay in a hospital. I have done something like this in the past with regular dates, but cannot for the life of me remember how I did so, and would appreciate some advice. Data may look like this: ID Month 1a 3 1a 6 1a 7 1a 8 1a 11 1a 12 2a 9 2a 11 2a 12 2a 15 2a 16 2a 17 3a 1 3a 3 3a 4 3a 7 3a 8 3a 9 So, 1a has two spans of 3 consecutive months and 2 months, 2a has 1 span of 2 consecutive months, etc. TIA Mike |
Administrator
|
In reply to this post by Roberts, Michael-2
I believe this does the same as other solution.
To get the length of runs use AGGREGATE. IF ID<>LAG(ID) OR $CASENUM=1 #SEQ=1. IF MONTH-LAG(MONTH) >1 #SEQ=#SEQ+1. COMPUTE SEQ_nbr=#SEQ. LIST.
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
Administrator
|
OR A mind bending one liner ;-) Grok this and you'll get dizzy .... I wouldn't use this in production code because it is somewhat cranially oppressive. OTOH, the basic technique might be useful in some applications. ---------------------- COMPUTE SEQ_nbr3= SUM( (ID<>LAG(ID) OR $CASENUM=1) , ( ID=LAG(ID)) * SUM( (MONTH-LAG(MONTH)>1 ), LAG(SEQ_nbr3) ) ). LIST.
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
definitely "cranially oppressive". Succinct but cranially oppressive.
Art On 1/17/2011 5:27 AM, David Marso wrote: > David Marso wrote: >> I believe this does the same as other solution. >> To get the length of runs use AGGREGATE. >> IF ID<>LAG(ID) OR $CASENUM=1 #SEQ=1. >> IF MONTH-LAG(MONTH)>1 #SEQ=#SEQ+1. >> COMPUTE SEQ_nbr=#SEQ. >> LIST. >> > OR A mind bending one liner ;-) > Grok this and you'll get dizzy .... > I wouldn't use this in production code because it is somewhat cranially > oppressive. > OTOH, the basic technique might be useful in some applications. > ---------------------- > COMPUTE SEQ_nbr3= SUM( > (ID<>LAG(ID) OR $CASENUM=1) , > ( ID=LAG(ID)) * SUM( > (MONTH-LAG(MONTH)>1 ), > LAG(SEQ_nbr3) > ) > ). > LIST. > -- > View this message in context: http://spssx-discussion.1045642.n5.nabble.com/How-to-identify-blocks-of-consecutive-cases-in-SPSS-tp3342019p3344167.html > Sent from the SPSSX Discussion mailing list archive at Nabble.com. > > ===================== > To manage your subscription to SPSSX-L, send a message to > [hidden email] (not to SPSSX-L), with no body text except the > command. To leave the list, send the command > SIGNOFF SPSSX-L > For a list of commands to manage subscriptions, send the command > INFO REFCARD > ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD
Art Kendall
Social Research Consultants |
In reply to this post by David Marso
David,
Thank you very much for both solutions to my data manipulation problem! Your second solution is, as Art says, succinct, but warps the edges of my mind, but I will get to the bottom of it, yet! :) Again, thanks for the help. Mike -----Original Message----- From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of David Marso Sent: Monday, January 17, 2011 5:27 AM To: [hidden email] Subject: Re: How to identify blocks of consecutive cases in SPSS? David Marso wrote: > > I believe this does the same as other solution. > To get the length of runs use AGGREGATE. > IF ID<>LAG(ID) OR $CASENUM=1 #SEQ=1. > IF MONTH-LAG(MONTH) >1 #SEQ=#SEQ+1. > COMPUTE SEQ_nbr=#SEQ. > LIST. > OR A mind bending one liner ;-) Grok this and you'll get dizzy .... I wouldn't use this in production code because it is somewhat cranially oppressive. OTOH, the basic technique might be useful in some applications. ---------------------- COMPUTE SEQ_nbr3= SUM( (ID<>LAG(ID) OR $CASENUM=1) , ( ID=LAG(ID)) * SUM( (MONTH-LAG(MONTH)>1 ), LAG(SEQ_nbr3) ) ). LIST. -- View this message in context: http://spssx-discussion.1045642.n5.nabble.com/How-to-identify-blocks-of-consecutive-cases-in-SPSS-tp3342019p3344167.html Sent from the SPSSX Discussion mailing list archive at Nabble.com. ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
Free forum by Nabble | Edit this page |