|
I have a data set with each patient call made to our call center within a
certain time period; each call is a record. There is a unique patient ID, call date, and call time (among many other variables). What I'm trying to do for each patient is count up all calls that preceded each call within the last 90 days. This is turning out to be more complicated than I thought. PRIOR calls were all initialized to zero, then the **first** record for each ID was updated with data from another file. There are no missing values. Cases are sorted by ID CALL_DATE CALL_TIME. Some patients made more than one call on the same day. At first I thought this would work: If ID=lag(ID) and CTIME.DAYS(CALL_DATE-lag(CALL_DATE)) <= 90 prior_calls=lag(prior_calls)+1. This is what I got: ID CALL_DATE CALL_TIME PRIOR_CALLS 1 29-DEC-04 16:12 0 1 12-JAN-05 11:57 1 1 01-FEB-05 14:26 2 1 11-FEB-05 11:14 3 1 14-FEB-05 18:39 4 1 10-MAR-05 15:19 5 1 02-APR-05 12:34 6 1 02-APR-05 16:16 7 1 14-MAY-05 08:19 8 Problem here. What would be accurate would be only the number of calls made in the 90 days prior to each call, i.e. ID CALL_DATE CALL_TIME PRIOR_CALLS 1 29-DEC-04 16:12 0 1 12-JAN-05 11:57 1 1 01-FEB-05 14:26 2 1 11-FEB-05 11:14 3 1 14-FEB-05 18:39 4 1 10-MAR-05 15:19 5 1 02-APR-05 12:34 5 1 02-APR-05 16:16 6 1 14-MAY-05 08:19 4 But I can't figure out the syntax needed to do this - at least with current data structure. What approach could I use? I'm wondering if this is going to entail restructuring my data long to wide and then applying vector/loop syntax to loop through all the call dates for each record (not in my comfort zone!) Thanks! Tanya Temkin Research Associate AACC Reporting Northern California Regional Office The Permanente Medical Group (510) 625-6680 NOTICE TO RECIPIENT: If you are not the intended recipient of this e-mail, you are prohibited from sharing, copying, or otherwise using or disclosing its contents. If you have received this e-mail in error, please notify the sender immediately by reply e-mail and permanently delete this e-mail and any attachments without reading, forwarding or saving them. Thank you. ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
|
Thanks for the advice, Dennis .... I started to follow your suggestion,
and got this far - up to the summing of days from prior call date (this is different set of test data from one in prior posting): (days diff=time in days from previous call; r_sum=running sum of time from 1st call) ID call-num call_date days_diff r_sum 1000xx 1 29-Dec-200y 0 0 1000xx 2 12-Jan-200x 14 14 1000xx 3 02-Feb-200x 21 35 1000xx 4 11-Feb-200x 9 44 1000xx 5 09-Mar-200x 26 70 1000xx 6 10-Mar-200x 1 71 1000xx 7 02-Apr-200x 23 94 1000xx 8 04-Apr-200x 2 96 1000xx 9 09-Apr-200x 5 101 1000xx 10 14-May-200x 35 136 1000xx 11 20-May-200x 6 142 1000xx 12 03-Jun-200x 14 156 1000xx 13 26-Jun-200x 23 179 1000xx 14 27-Jun-200x 1 180 1000xx 15 27-Jun-200x 0 180 1000xx 16 28-Jun-200x 1 181 1000xx 17 29-Jun-200x 1 182 1000xx 18 05-Jul-200x 6 188 1000xx 19 10-Jul-200x 5 193 1000xx 20 12-Aug-200x 33 226 1001xx 0 25-Aug-200x 0 0 1001xx 1 12-Nov-200x 79 79 1002xx 1 06-Feb-200x 0 0 1002xx 2 09-Feb-200x 3 3 1002xx 3 08-Aug-200x 180 183 1002xx 4 09-Aug-200x 1 184 1002xx 5 12-Aug-200x 3 187 1002xx 6 14-Aug-200x 2 189 1002xx 7 15-Aug-200x 1 190 1002xx 8 31-Aug-200x 16 206 1002xx 9 05-Sep-200x 5 211 1002xx 10 03-Nov-200x 59 270 1002xx 11 09-Dec-200x 36 306 1002xx 12 11-Dec-200x 2 308 The wheels started to come off when I got to the first 90+ day value (7th record) ....the previous dates back to 12 Jan 05 are in the prior 90 day period. It's not just a matter of incrementing by 1 for each 90 day period following first call, but checking to see if each prior call is in the 90-day period preceding the current call. Are you sure I am not going to have to resort to a vector/loop routine to loop through each call for each ID? Or am I missing something in your instructions? Thanks again.... Tanya Temkin Research Associate AACC Reporting Northern California Regional Office The Permanente Medical Group (510) 625-6680 NOTICE TO RECIPIENT: If you are not the intended recipient of this e-mail, you are prohibited from sharing, copying, or otherwise using or disclosing its contents. If you have received this e-mail in error, please notify the sender immediately by reply e-mail and permanently delete this e-mail and any attachments without reading, forwarding or saving them. Thank you. "Dennis Deck" <[hidden email]> 07/17/2008 12:42 AM To Tanya L TemKin/CA/KAIPERM@KAIPERM cc Subject RE: computations with dates That is tricky because the number of prior records to sum across is unknown and keeps changing from ID to ID. One approach might be something like: number the records within ID track the prior comparison record number starting with 1 sum the days from the prior comparison record and keep a running sum from prior comparion if prior comparison is more than 90 days then increment and resum until days are less than 90 Dennis -----Original Message----- From: Tanya Temkin [mailto:[hidden email]] Sent: Wednesday, July 16, 2008 1:39 PM Subject: computations with dates I have a data set with each patient call made to our call center within a certain time period; each call is a record. There is a unique patient ID, call date, and call time (among many other variables). What I'm trying to do for each patient is count up all calls that preceded each call within the last 90 days. This is turning out to be more complicated than I thought. PRIOR calls were all initialized to zero, then the **first** record for each ID was updated with data from another file. There are no missing values. Cases are sorted by ID CALL_DATE CALL_TIME. Some patients made more than one call on the same day. At first I thought this would work: If ID=lag(ID) and CTIME.DAYS(CALL_DATE-lag(CALL_DATE)) <= 90 prior_calls=lag(prior_calls)+1. This is what I got: ID CALL_DATE CALL_TIME PRIOR_CALLS 1 29-DEC-04 16:12 0 1 12-JAN-05 11:57 1 1 01-FEB-05 14:26 2 1 11-FEB-05 11:14 3 1 14-FEB-05 18:39 4 1 10-MAR-05 15:19 5 1 02-APR-05 12:34 6 1 02-APR-05 16:16 7 1 14-MAY-05 08:19 8 Problem here. What would be accurate would be only the number of calls made in the 90 days prior to each call, i.e. ID CALL_DATE CALL_TIME PRIOR_CALLS 1 29-DEC-04 16:12 0 1 12-JAN-05 11:57 1 1 01-FEB-05 14:26 2 1 11-FEB-05 11:14 3 1 14-FEB-05 18:39 4 1 10-MAR-05 15:19 5 1 02-APR-05 12:34 5 1 02-APR-05 16:16 6 1 14-MAY-05 08:19 4 But I can't figure out the syntax needed to do this - at least with current data structure. What approach could I use? I'm wondering if this is going to entail restructuring my data long to wide and then applying vector/loop syntax to loop through all the call dates for each record (not in my comfort zone!) Thanks! Tanya Temkin Research Associate AACC Reporting Northern California Regional Office The Permanente Medical Group (510) 625-6680 NOTICE TO RECIPIENT: If you are not the intended recipient of this e-mail, you are prohibited from sharing, copying, or otherwise using or disclosing its contents. If you have received this e-mail in error, please notify the sender immediately by reply e-mail and permanently delete this e-mail and any attachments without reading, forwarding or saving them. Thank you. ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
|
Tanya:
A brute force solution, based on a solution to a similar problem from Art Kendall: *** SAMPLE DATA. data list free /id (f8.0) date (date10) time (time5). Begin data 1 29-DEC-04 16:12 1 12-JAN-05 11:57 1 01-FEB-05 14:26 1 11-FEB-05 11:14 1 14-FEB-05 18:39 1 10-MAR-05 15:19 1 02-APR-05 12:34 1 02-APR-05 16:16 1 14-MAY-05 08:19 2 14-FEB-05 18:39 2 10-MAR-05 15:19 2 02-APR-06 12:34 2 11-APR-06 16:16 2 14-JUN-06 08:19 end data. *** create variables to flag if the nth lag is within 90 days. *** edit for the maximum number of cases within id. numeric pr_1 to pr_10. *** calculate the flags. do repeat v = pr_1 to pr_10 /i = 1 to 10 . if datediff(date,lag(date,i),"days") LE 90 and id = lag(id,i) v = 1. end repeat. *** create a variable to hold the number of lags. count priors =pr_1 to pr_10 (1). *** cleanup-- drop the flag variables. add files file = * /drop pr_1 to pr_10. *** trigger data pass. LIST. HTH --jim -----Original Message----- From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Tanya Temkin Sent: Thursday, July 17, 2008 1:54 PM To: [hidden email] Subject: Re: computations with dates Thanks for the advice, Dennis .... I started to follow your suggestion, and got this far - up to the summing of days from prior call date (this is different set of test data from one in prior posting): (days diff=time in days from previous call; r_sum=running sum of time from 1st call) ID call-num call_date days_diff r_sum 1000xx 1 29-Dec-200y 0 0 1000xx 2 12-Jan-200x 14 14 1000xx 3 02-Feb-200x 21 35 1000xx 4 11-Feb-200x 9 44 1000xx 5 09-Mar-200x 26 70 1000xx 6 10-Mar-200x 1 71 1000xx 7 02-Apr-200x 23 94 1000xx 8 04-Apr-200x 2 96 1000xx 9 09-Apr-200x 5 101 1000xx 10 14-May-200x 35 136 1000xx 11 20-May-200x 6 142 1000xx 12 03-Jun-200x 14 156 1000xx 13 26-Jun-200x 23 179 1000xx 14 27-Jun-200x 1 180 1000xx 15 27-Jun-200x 0 180 1000xx 16 28-Jun-200x 1 181 1000xx 17 29-Jun-200x 1 182 1000xx 18 05-Jul-200x 6 188 1000xx 19 10-Jul-200x 5 193 1000xx 20 12-Aug-200x 33 226 1001xx 0 25-Aug-200x 0 0 1001xx 1 12-Nov-200x 79 79 1002xx 1 06-Feb-200x 0 0 1002xx 2 09-Feb-200x 3 3 1002xx 3 08-Aug-200x 180 183 1002xx 4 09-Aug-200x 1 184 1002xx 5 12-Aug-200x 3 187 1002xx 6 14-Aug-200x 2 189 1002xx 7 15-Aug-200x 1 190 1002xx 8 31-Aug-200x 16 206 1002xx 9 05-Sep-200x 5 211 1002xx 10 03-Nov-200x 59 270 1002xx 11 09-Dec-200x 36 306 1002xx 12 11-Dec-200x 2 308 The wheels started to come off when I got to the first 90+ day value (7th record) ....the previous dates back to 12 Jan 05 are in the prior 90 day period. It's not just a matter of incrementing by 1 for each 90 day period following first call, but checking to see if each prior call is in the 90-day period preceding the current call. Are you sure I am not going to have to resort to a vector/loop routine to loop through each call for each ID? Or am I missing something in your instructions? Thanks again.... Tanya Temkin Research Associate AACC Reporting Northern California Regional Office The Permanente Medical Group (510) 625-6680 NOTICE TO RECIPIENT: If you are not the intended recipient of this e-mail, you are prohibited from sharing, copying, or otherwise using or disclosing its contents. If you have received this e-mail in error, please notify the sender immediately by reply e-mail and permanently delete this e-mail and any attachments without reading, forwarding or saving them. Thank you. "Dennis Deck" <[hidden email]> 07/17/2008 12:42 AM To Tanya L TemKin/CA/KAIPERM@KAIPERM cc Subject RE: computations with dates That is tricky because the number of prior records to sum across is unknown and keeps changing from ID to ID. One approach might be something like: number the records within ID track the prior comparison record number starting with 1 sum the days from the prior comparison record and keep a running sum from prior comparion if prior comparison is more than 90 days then increment and resum until days are less than 90 Dennis -----Original Message----- From: Tanya Temkin [mailto:[hidden email]] Sent: Wednesday, July 16, 2008 1:39 PM Subject: computations with dates I have a data set with each patient call made to our call center within a certain time period; each call is a record. There is a unique patient ID, call date, and call time (among many other variables). What I'm trying to do for each patient is count up all calls that preceded each call within the last 90 days. This is turning out to be more complicated than I thought. PRIOR calls were all initialized to zero, then the **first** record for each ID was updated with data from another file. There are no missing values. Cases are sorted by ID CALL_DATE CALL_TIME. Some patients made more than one call on the same day. At first I thought this would work: If ID=lag(ID) and CTIME.DAYS(CALL_DATE-lag(CALL_DATE)) <= 90 prior_calls=lag(prior_calls)+1. This is what I got: ID CALL_DATE CALL_TIME PRIOR_CALLS 1 29-DEC-04 16:12 0 1 12-JAN-05 11:57 1 1 01-FEB-05 14:26 2 1 11-FEB-05 11:14 3 1 14-FEB-05 18:39 4 1 10-MAR-05 15:19 5 1 02-APR-05 12:34 6 1 02-APR-05 16:16 7 1 14-MAY-05 08:19 8 Problem here. What would be accurate would be only the number of calls made in the 90 days prior to each call, i.e. ID CALL_DATE CALL_TIME PRIOR_CALLS 1 29-DEC-04 16:12 0 1 12-JAN-05 11:57 1 1 01-FEB-05 14:26 2 1 11-FEB-05 11:14 3 1 14-FEB-05 18:39 4 1 10-MAR-05 15:19 5 1 02-APR-05 12:34 5 1 02-APR-05 16:16 6 1 14-MAY-05 08:19 4 But I can't figure out the syntax needed to do this - at least with current data structure. What approach could I use? I'm wondering if this is going to entail restructuring my data long to wide and then applying vector/loop syntax to loop through all the call dates for each record (not in my comfort zone!) Thanks! Tanya Temkin Research Associate AACC Reporting Northern California Regional Office The Permanente Medical Group (510) 625-6680 NOTICE TO RECIPIENT: If you are not the intended recipient of this e-mail, you are prohibited from sharing, copying, or otherwise using or disclosing its contents. If you have received this e-mail in error, please notify the sender immediately by reply e-mail and permanently delete this e-mail and any attachments without reading, forwarding or saving them. Thank you. ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
|
In reply to this post by Tanya Temkin
Shalom
Your task can be solved in more then one way , here is a many to many approach that is more common in SQ L programing . This approach may not work if you have a huge file ( like 1000000 subject with 500 call per subject ) . data list free /ID (f5)call_num (f4) call_date (edate14) days_diff (f5) r_sum(f6). Begin data 1000 1 29-Dec-2006 0 0 1000 2 12-Jan-2007 14 14 1000 3 02-Feb-2007 21 35 1000 4 11-Feb-2007 9 44 1000 5 09-Mar-2007 26 70 1000 6 10-Mar-2007 1 71 1000 7 02-Apr-2007 23 94 1000 8 04-Apr-2007 2 96 1000 9 09-Apr-2007 5 101 1000 10 14-May-2007 35 136 1000 11 20-May-2007 6 142 1000 12 03-Jun-2007 14 156 1000 13 26-Jun-2007 23 179 1000 14 27-Jun-2007 1 180 1000 15 27-Jun-2007 0 180 1000 16 28-Jun-2007 1 181 1000 17 29-Jun-2007 1 182 1000 18 05-Jul-2007 6 188 1000 19 10-Jul-2007 5 193 1000 20 12-Aug-2007 33 226 1001 0 25-Aug-2007 0 0 1001 1 12-Nov-2007 79 79 1002 1 06-Feb-2007 0 0 1002 2 09-Feb-2007 3 3 1002 3 08-Aug-2007 180 183 1002 4 09-Aug-2007 1 184 1002 5 12-Aug-2007 3 187 1002 6 14-Aug-2007 2 189 1002 7 15-Aug-2007 1 190 1002 8 31-Aug-2007 16 206 1002 9 05-Sep-2007 5 211 1002 10 03-Nov-2007 59 270 1002 11 09-Dec-2007 36 306 1002 12 11-Dec-2007 2 308 end data. save outfile='c:\temp\tmp.sav' . SORT CASES BY ID call_num . CASESTOVARS /ID = ID /INDEX = call_num /drop=days_diff r_sum /GROUPBY = VARIABLE . MATCH FILES /TABLE=* /FILE='C:\Temp\tmp.sav' /BY ID. execute . * chnage call_date.20 to the max call per subject in your data . VARSTOCASES /MAKE dates FROM call_date.0 to call_date.20 /INDEX = seq(21) /KEEP = ID call_num call_date days_diff r_sum /NULL = KEEP. compute days=datedif(call_date,dates,'days'). select if (days gt 0) and (days le 90) . execute . SORT CASES BY ID call_num . **** if you need the data per call . CASESTOVARS /ID = ID call_num /drop=days_diff r_sum seq /GROUPBY = VARIABLE . Hillel Vardi BGU Tanya Temkin wrote: > Thanks for the advice, Dennis .... I started to follow your suggestion, > and got this far - up to the summing of days from prior call date (this is > different set of test data from one in prior posting): > > (days diff=time in days from previous call; r_sum=running sum of time from > 1st call) > > ID call-num call_date days_diff r_sum > 1000xx 1 29-Dec-200y 0 0 > 1000xx 2 12-Jan-200x 14 14 > 1000xx 3 02-Feb-200x 21 35 > 1000xx 4 11-Feb-200x 9 44 > 1000xx 5 09-Mar-200x 26 70 > 1000xx 6 10-Mar-200x 1 71 > 1000xx 7 02-Apr-200x 23 94 > 1000xx 8 04-Apr-200x 2 96 > 1000xx 9 09-Apr-200x 5 101 > 1000xx 10 14-May-200x 35 136 > 1000xx 11 20-May-200x 6 142 > 1000xx 12 03-Jun-200x 14 156 > 1000xx 13 26-Jun-200x 23 179 > 1000xx 14 27-Jun-200x 1 180 > 1000xx 15 27-Jun-200x 0 180 > 1000xx 16 28-Jun-200x 1 181 > 1000xx 17 29-Jun-200x 1 182 > 1000xx 18 05-Jul-200x 6 188 > 1000xx 19 10-Jul-200x 5 193 > 1000xx 20 12-Aug-200x 33 226 > 1001xx 0 25-Aug-200x 0 0 > 1001xx 1 12-Nov-200x 79 79 > 1002xx 1 06-Feb-200x 0 0 > 1002xx 2 09-Feb-200x 3 3 > 1002xx 3 08-Aug-200x 180 183 > 1002xx 4 09-Aug-200x 1 184 > 1002xx 5 12-Aug-200x 3 187 > 1002xx 6 14-Aug-200x 2 189 > 1002xx 7 15-Aug-200x 1 190 > 1002xx 8 31-Aug-200x 16 206 > 1002xx 9 05-Sep-200x 5 211 > 1002xx 10 03-Nov-200x 59 270 > 1002xx 11 09-Dec-200x 36 306 > 1002xx 12 11-Dec-200x 2 308 > > The wheels started to come off when I got to the first 90+ day value (7th > record) ....the previous dates back to 12 Jan 05 are in the prior 90 day > period. It's not just a matter of incrementing by 1 for each 90 day period > following first call, but checking to see if each prior call is in the > 90-day period preceding the current call. Are you sure I am not going to > have to resort to a vector/loop routine to loop through each call for each > ID? Or am I missing something in your instructions? > > Thanks again.... > > > Tanya Temkin > Research Associate > AACC Reporting > Northern California Regional Office > The Permanente Medical Group > (510) 625-6680 > > NOTICE TO RECIPIENT: If you are not the intended recipient of this > e-mail, you are prohibited from sharing, copying, or otherwise using or > disclosing its contents. If you have received this e-mail in error, > please notify the sender immediately by reply e-mail and permanently > delete this e-mail and any attachments without reading, forwarding or > saving them. Thank you. > > > > > "Dennis Deck" <[hidden email]> > 07/17/2008 12:42 AM > > To > Tanya L TemKin/CA/KAIPERM@KAIPERM > cc > > Subject > RE: computations with dates > > > > > > > That is tricky because the number of prior records to sum across is > unknown and > keeps changing from ID to ID. > > One approach might be something like: > number the records within ID > track the prior comparison record number starting with 1 > sum the days from the prior comparison record and keep a running sum > from prior comparion > if prior comparison is more than 90 days then increment and resum > until days are less than 90 > > Dennis > > -----Original Message----- > From: Tanya Temkin [mailto:[hidden email]] > Sent: Wednesday, July 16, 2008 1:39 PM > Subject: computations with dates > > I have a data set with each patient call made to our call center within > a certain time period; each call is a record. There is a unique patient > ID, call date, and call time (among many other variables). What I'm > trying to do for each patient is count up all calls that preceded each > call within the last 90 days. This is turning out to be more complicated > than I thought. > > PRIOR calls were all initialized to zero, then the **first** record for > each ID was updated with data from another file. > There are no missing values. Cases are sorted by ID CALL_DATE > CALL_TIME. > Some patients made more than one call on the same day. > > At first I thought this would work: > If ID=lag(ID) and CTIME.DAYS(CALL_DATE-lag(CALL_DATE)) <= 90 > prior_calls=lag(prior_calls)+1. > > This is what I got: > > ID CALL_DATE CALL_TIME PRIOR_CALLS > 1 29-DEC-04 16:12 0 > 1 12-JAN-05 11:57 1 > 1 01-FEB-05 14:26 2 > 1 11-FEB-05 11:14 3 > 1 14-FEB-05 18:39 4 > 1 10-MAR-05 15:19 5 > 1 02-APR-05 12:34 6 > 1 02-APR-05 16:16 7 > 1 14-MAY-05 08:19 8 > > Problem here. What would be accurate would be only the number of calls > made in the 90 days prior to each call, i.e. > > ID CALL_DATE CALL_TIME PRIOR_CALLS > 1 29-DEC-04 16:12 0 > 1 12-JAN-05 11:57 1 > 1 01-FEB-05 14:26 2 > 1 11-FEB-05 11:14 3 > 1 14-FEB-05 18:39 4 > 1 10-MAR-05 15:19 5 > 1 02-APR-05 12:34 5 > 1 02-APR-05 16:16 6 > 1 14-MAY-05 08:19 4 > > But I can't figure out the syntax needed to do this - at least with > current data structure. What approach could I use? > > I'm wondering if this is going to entail restructuring my data long to > wide and then applying vector/loop syntax to loop through all the call > dates for each record (not in my comfort zone!) > > Thanks! > > Tanya Temkin > Research Associate > AACC Reporting > Northern California Regional Office > The Permanente Medical Group > (510) 625-6680 > > NOTICE TO RECIPIENT: If you are not the intended recipient of this > e-mail, you are prohibited from sharing, copying, or otherwise using or > disclosing its contents. If you have received this e-mail in error, > please notify the sender immediately by reply e-mail and permanently > delete this e-mail and any attachments without reading, forwarding or > saving them. Thank you. > > ===================== > To manage your subscription to SPSSX-L, send a message to > [hidden email] (not to SPSSX-L), with no body text except the > command. To leave the list, send the command > SIGNOFF SPSSX-L > For a list of commands to manage subscriptions, send the command > INFO REFCARD > > ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
|
In reply to this post by Marks, Jim
Thanks Jim! I tried this on the test data and it worked. Hillel, I'll hold
on to your solution for use in the future...good to know a number of ways to solve this problem. But please help me understand exactly what I'm doing with some of this syntax.... specifically do repeat v = pr_1 to pr_10 /i = 1 to 10 . if datediff(date,lag(date,i),"days") LE 90 and id = lag(id,i) v = 1. end repeat. I understand the pr_1 to pr_10 creates a set of new variables. But what is the function of "i"? I'm guessing it's some kind of index. But is this somehow looping through all the previous cases for each ID? I thought indexes could just be used in computations across variables, not cases. Anyway, thanks for helping me understand what you helped me do. Tanya Temkin Research Associate AACC Reporting Northern California Regional Office The Permanente Medical Group (510) 625-6680 NOTICE TO RECIPIENT: If you are not the intended recipient of this e-mail, you are prohibited from sharing, copying, or otherwise using or disclosing its contents. If you have received this e-mail in error, please notify the sender immediately by reply e-mail and permanently delete this e-mail and any attachments without reading, forwarding or saving them. Thank you. "Marks, Jim" <[hidden email]> Sent by: "SPSSX(r) Discussion" <[hidden email]> 07/17/2008 06:03 PM Please respond to "Marks, Jim" <[hidden email]> To [hidden email] cc Subject Re: computations with dates Tanya: A brute force solution, based on a solution to a similar problem from Art Kendall: *** SAMPLE DATA. data list free /id (f8.0) date (date10) time (time5). Begin data 1 29-DEC-04 16:12 1 12-JAN-05 11:57 1 01-FEB-05 14:26 1 11-FEB-05 11:14 1 14-FEB-05 18:39 1 10-MAR-05 15:19 1 02-APR-05 12:34 1 02-APR-05 16:16 1 14-MAY-05 08:19 2 14-FEB-05 18:39 2 10-MAR-05 15:19 2 02-APR-06 12:34 2 11-APR-06 16:16 2 14-JUN-06 08:19 end data. *** create variables to flag if the nth lag is within 90 days. *** edit for the maximum number of cases within id. numeric pr_1 to pr_10. *** calculate the flags. do repeat v = pr_1 to pr_10 /i = 1 to 10 . if datediff(date,lag(date,i),"days") LE 90 and id = lag(id,i) v = 1. end repeat. *** create a variable to hold the number of lags. count priors =pr_1 to pr_10 (1). *** cleanup-- drop the flag variables. add files file = * /drop pr_1 to pr_10. *** trigger data pass. LIST. HTH --jim -----Original Message----- From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Tanya Temkin Sent: Thursday, July 17, 2008 1:54 PM To: [hidden email] Subject: Re: computations with dates Thanks for the advice, Dennis .... I started to follow your suggestion, and got this far - up to the summing of days from prior call date (this is different set of test data from one in prior posting): (days diff=time in days from previous call; r_sum=running sum of time from 1st call) ID call-num call_date days_diff r_sum 1000xx 1 29-Dec-200y 0 0 1000xx 2 12-Jan-200x 14 14 1000xx 3 02-Feb-200x 21 35 1000xx 4 11-Feb-200x 9 44 1000xx 5 09-Mar-200x 26 70 1000xx 6 10-Mar-200x 1 71 1000xx 7 02-Apr-200x 23 94 1000xx 8 04-Apr-200x 2 96 1000xx 9 09-Apr-200x 5 101 1000xx 10 14-May-200x 35 136 1000xx 11 20-May-200x 6 142 1000xx 12 03-Jun-200x 14 156 1000xx 13 26-Jun-200x 23 179 1000xx 14 27-Jun-200x 1 180 1000xx 15 27-Jun-200x 0 180 1000xx 16 28-Jun-200x 1 181 1000xx 17 29-Jun-200x 1 182 1000xx 18 05-Jul-200x 6 188 1000xx 19 10-Jul-200x 5 193 1000xx 20 12-Aug-200x 33 226 1001xx 0 25-Aug-200x 0 0 1001xx 1 12-Nov-200x 79 79 1002xx 1 06-Feb-200x 0 0 1002xx 2 09-Feb-200x 3 3 1002xx 3 08-Aug-200x 180 183 1002xx 4 09-Aug-200x 1 184 1002xx 5 12-Aug-200x 3 187 1002xx 6 14-Aug-200x 2 189 1002xx 7 15-Aug-200x 1 190 1002xx 8 31-Aug-200x 16 206 1002xx 9 05-Sep-200x 5 211 1002xx 10 03-Nov-200x 59 270 1002xx 11 09-Dec-200x 36 306 1002xx 12 11-Dec-200x 2 308 The wheels started to come off when I got to the first 90+ day value (7th record) ....the previous dates back to 12 Jan 05 are in the prior 90 day period. It's not just a matter of incrementing by 1 for each 90 day period following first call, but checking to see if each prior call is in the 90-day period preceding the current call. Are you sure I am not going to have to resort to a vector/loop routine to loop through each call for each ID? Or am I missing something in your instructions? Thanks again.... Tanya Temkin Research Associate AACC Reporting Northern California Regional Office The Permanente Medical Group (510) 625-6680 NOTICE TO RECIPIENT: If you are not the intended recipient of this e-mail, you are prohibited from sharing, copying, or otherwise using or disclosing its contents. If you have received this e-mail in error, please notify the sender immediately by reply e-mail and permanently delete this e-mail and any attachments without reading, forwarding or saving them. Thank you. "Dennis Deck" <[hidden email]> 07/17/2008 12:42 AM To Tanya L TemKin/CA/KAIPERM@KAIPERM cc Subject RE: computations with dates That is tricky because the number of prior records to sum across is unknown and keeps changing from ID to ID. One approach might be something like: number the records within ID track the prior comparison record number starting with 1 sum the days from the prior comparison record and keep a running sum from prior comparion if prior comparison is more than 90 days then increment and resum until days are less than 90 Dennis -----Original Message----- From: Tanya Temkin [mailto:[hidden email]] Sent: Wednesday, July 16, 2008 1:39 PM Subject: computations with dates I have a data set with each patient call made to our call center within a certain time period; each call is a record. There is a unique patient ID, call date, and call time (among many other variables). What I'm trying to do for each patient is count up all calls that preceded each call within the last 90 days. This is turning out to be more complicated than I thought. PRIOR calls were all initialized to zero, then the **first** record for each ID was updated with data from another file. There are no missing values. Cases are sorted by ID CALL_DATE CALL_TIME. Some patients made more than one call on the same day. At first I thought this would work: If ID=lag(ID) and CTIME.DAYS(CALL_DATE-lag(CALL_DATE)) <= 90 prior_calls=lag(prior_calls)+1. This is what I got: ID CALL_DATE CALL_TIME PRIOR_CALLS 1 29-DEC-04 16:12 0 1 12-JAN-05 11:57 1 1 01-FEB-05 14:26 2 1 11-FEB-05 11:14 3 1 14-FEB-05 18:39 4 1 10-MAR-05 15:19 5 1 02-APR-05 12:34 6 1 02-APR-05 16:16 7 1 14-MAY-05 08:19 8 Problem here. What would be accurate would be only the number of calls made in the 90 days prior to each call, i.e. ID CALL_DATE CALL_TIME PRIOR_CALLS 1 29-DEC-04 16:12 0 1 12-JAN-05 11:57 1 1 01-FEB-05 14:26 2 1 11-FEB-05 11:14 3 1 14-FEB-05 18:39 4 1 10-MAR-05 15:19 5 1 02-APR-05 12:34 5 1 02-APR-05 16:16 6 1 14-MAY-05 08:19 4 But I can't figure out the syntax needed to do this - at least with current data structure. What approach could I use? I'm wondering if this is going to entail restructuring my data long to wide and then applying vector/loop syntax to loop through all the call dates for each record (not in my comfort zone!) Thanks! Tanya Temkin Research Associate AACC Reporting Northern California Regional Office The Permanente Medical Group (510) 625-6680 NOTICE TO RECIPIENT: If you are not the intended recipient of this e-mail, you are prohibited from sharing, copying, or otherwise using or disclosing its contents. If you have received this e-mail in error, please notify the sender immediately by reply e-mail and permanently delete this e-mail and any attachments without reading, forwarding or saving them. Thank you. ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
|
DO REPEAT is a function that generates code, without having to type it
out. it takes the code, replaces the indicator and runs it with each item from the list : first pair: if datediff(date,lag(date,1),"days") LE 90 and id = lag(id,1) pr_1 = 1. second pair: if datediff(date,lag(date,2),"days") LE 90 and id = lag(id,2) pr_2 = 1. etc HTH --jim From: [hidden email] [mailto:[hidden email]] Sent: Friday, July 18, 2008 12:36 PM To: [hidden email] Cc: Marks, Jim Subject: Re: computations with dates Thanks Jim! I tried this on the test data and it worked. Hillel, I'll hold on to your solution for use in the future...good to know a number of ways to solve this problem. But please help me understand exactly what I'm doing with some of this syntax.... specifically do repeat v = pr_1 to pr_10 /i = 1 to 10 . if datediff(date,lag(date,i),"days") LE 90 and id = lag(id,i) v = 1. end repeat. I understand the pr_1 to pr_10 creates a set of new variables. But what is the function of "i"? I'm guessing it's some kind of index. But is this somehow looping through all the previous cases for each ID? I thought indexes could just be used in computations across variables, not cases. Anyway, thanks for helping me understand what you helped me do. Tanya Temkin Research Associate AACC Reporting Northern California Regional Office The Permanente Medical Group (510) 625-6680 NOTICE TO RECIPIENT: If you are not the intended recipient of this e-mail, you are prohibited from sharing, copying, or otherwise using or disclosing its contents. If you have received this e-mail in error, please notify the sender immediately by reply e-mail and permanently delete this e-mail and any attachments without reading, forwarding or saving them. Thank you. "Marks, Jim" <[hidden email]> Sent by: "SPSSX(r) Discussion" <[hidden email]> 07/17/2008 06:03 PM Please respond to "Marks, Jim" <[hidden email]> To [hidden email] cc Subject Re: computations with dates Tanya: A brute force solution, based on a solution to a similar problem from Art Kendall: *** SAMPLE DATA. data list free /id (f8.0) date (date10) time (time5). Begin data 1 29-DEC-04 16:12 1 12-JAN-05 11:57 1 01-FEB-05 14:26 1 11-FEB-05 11:14 1 14-FEB-05 18:39 1 10-MAR-05 15:19 1 02-APR-05 12:34 1 02-APR-05 16:16 1 14-MAY-05 08:19 2 14-FEB-05 18:39 2 10-MAR-05 15:19 2 02-APR-06 12:34 2 11-APR-06 16:16 2 14-JUN-06 08:19 end data. *** create variables to flag if the nth lag is within 90 days. *** edit for the maximum number of cases within id. numeric pr_1 to pr_10. *** calculate the flags. do repeat v = pr_1 to pr_10 /i = 1 to 10 . if datediff(date,lag(date,i),"days") LE 90 and id = lag(id,i) v = 1. end repeat. *** create a variable to hold the number of lags. count priors =pr_1 to pr_10 (1). *** cleanup-- drop the flag variables. add files file = * /drop pr_1 to pr_10. *** trigger data pass. LIST. HTH --jim -----Original Message----- From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Tanya Temkin Sent: Thursday, July 17, 2008 1:54 PM To: [hidden email] Subject: Re: computations with dates Thanks for the advice, Dennis .... I started to follow your suggestion, and got this far - up to the summing of days from prior call date (this is different set of test data from one in prior posting): (days diff=time in days from previous call; r_sum=running sum of time from 1st call) ID call-num call_date days_diff r_sum 1000xx 1 29-Dec-200y 0 0 1000xx 2 12-Jan-200x 14 14 1000xx 3 02-Feb-200x 21 35 1000xx 4 11-Feb-200x 9 44 1000xx 5 09-Mar-200x 26 70 1000xx 6 10-Mar-200x 1 71 1000xx 7 02-Apr-200x 23 94 1000xx 8 04-Apr-200x 2 96 1000xx 9 09-Apr-200x 5 101 1000xx 10 14-May-200x 35 136 1000xx 11 20-May-200x 6 142 1000xx 12 03-Jun-200x 14 156 1000xx 13 26-Jun-200x 23 179 1000xx 14 27-Jun-200x 1 180 1000xx 15 27-Jun-200x 0 180 1000xx 16 28-Jun-200x 1 181 1000xx 17 29-Jun-200x 1 182 1000xx 18 05-Jul-200x 6 188 1000xx 19 10-Jul-200x 5 193 1000xx 20 12-Aug-200x 33 226 1001xx 0 25-Aug-200x 0 0 1001xx 1 12-Nov-200x 79 79 1002xx 1 06-Feb-200x 0 0 1002xx 2 09-Feb-200x 3 3 1002xx 3 08-Aug-200x 180 183 1002xx 4 09-Aug-200x 1 184 1002xx 5 12-Aug-200x 3 187 1002xx 6 14-Aug-200x 2 189 1002xx 7 15-Aug-200x 1 190 1002xx 8 31-Aug-200x 16 206 1002xx 9 05-Sep-200x 5 211 1002xx 10 03-Nov-200x 59 270 1002xx 11 09-Dec-200x 36 306 1002xx 12 11-Dec-200x 2 308 The wheels started to come off when I got to the first 90+ day value (7th record) ....the previous dates back to 12 Jan 05 are in the prior 90 day period. It's not just a matter of incrementing by 1 for each 90 day period following first call, but checking to see if each prior call is in the 90-day period preceding the current call. Are you sure I am not going to have to resort to a vector/loop routine to loop through each call for each ID? Or am I missing something in your instructions? Thanks again.... Tanya Temkin Research Associate AACC Reporting Northern California Regional Office The Permanente Medical Group (510) 625-6680 NOTICE TO RECIPIENT: If you are not the intended recipient of this e-mail, you are prohibited from sharing, copying, or otherwise using or disclosing its contents. If you have received this e-mail in error, please notify the sender immediately by reply e-mail and permanently delete this e-mail and any attachments without reading, forwarding or saving them. Thank you. "Dennis Deck" <[hidden email]> 07/17/2008 12:42 AM To Tanya L TemKin/CA/KAIPERM@KAIPERM cc Subject RE: computations with dates That is tricky because the number of prior records to sum across is unknown and keeps changing from ID to ID. One approach might be something like: number the records within ID track the prior comparison record number starting with 1 sum the days from the prior comparison record and keep a running sum from prior comparion if prior comparison is more than 90 days then increment and resum until days are less than 90 Dennis -----Original Message----- From: Tanya Temkin [mailto:[hidden email]] Sent: Wednesday, July 16, 2008 1:39 PM Subject: computations with dates I have a data set with each patient call made to our call center within a certain time period; each call is a record. There is a unique patient ID, call date, and call time (among many other variables). What I'm trying to do for each patient is count up all calls that preceded each call within the last 90 days. This is turning out to be more complicated than I thought. PRIOR calls were all initialized to zero, then the **first** record for each ID was updated with data from another file. There are no missing values. Cases are sorted by ID CALL_DATE CALL_TIME. Some patients made more than one call on the same day. At first I thought this would work: If ID=lag(ID) and CTIME.DAYS(CALL_DATE-lag(CALL_DATE)) <= 90 prior_calls=lag(prior_calls)+1. This is what I got: ID CALL_DATE CALL_TIME PRIOR_CALLS 1 29-DEC-04 16:12 0 1 12-JAN-05 11:57 1 1 01-FEB-05 14:26 2 1 11-FEB-05 11:14 3 1 14-FEB-05 18:39 4 1 10-MAR-05 15:19 5 1 02-APR-05 12:34 6 1 02-APR-05 16:16 7 1 14-MAY-05 08:19 8 Problem here. What would be accurate would be only the number of calls made in the 90 days prior to each call, i.e. ID CALL_DATE CALL_TIME PRIOR_CALLS 1 29-DEC-04 16:12 0 1 12-JAN-05 11:57 1 1 01-FEB-05 14:26 2 1 11-FEB-05 11:14 3 1 14-FEB-05 18:39 4 1 10-MAR-05 15:19 5 1 02-APR-05 12:34 5 1 02-APR-05 16:16 6 1 14-MAY-05 08:19 4 But I can't figure out the syntax needed to do this - at least with current data structure. What approach could I use? I'm wondering if this is going to entail restructuring my data long to wide and then applying vector/loop syntax to loop through all the call dates for each record (not in my comfort zone!) Thanks! Tanya Temkin Research Associate AACC Reporting Northern California Regional Office The Permanente Medical Group (510) 625-6680 NOTICE TO RECIPIENT: If you are not the intended recipient of this e-mail, you are prohibited from sharing, copying, or otherwise using or disclosing its contents. If you have received this e-mail in error, please notify the sender immediately by reply e-mail and permanently delete this e-mail and any attachments without reading, forwarding or saving them. Thank you. ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD ====================To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
| Free forum by Nabble | Edit this page |
