Hello: Let me start with an illustration of my dataset: ID Date 1G3k4 11/11/2009 1G3k4 12/06/2009 1G3k4 12/15/2009 1G3k4 12/19/2009 1G3k4 02/22/2010 5TRJ1 11/10/2009 RQR12 11/10/2009 . . . The variable type of "ID" is STRING and the variable type of "Date" is DATE with a format mm/dd/yyyy. I would like to remove "subsequent" cases that are within a 30-day period of the first case associated with a particular ID in a given time period. Let's take "1G3k4"...I would like the case entered on 11/11/2009 (first time this ID appears in the dataset) to be retained, but the case entered on 12/06/2009 should be deleted. Now that we've entered a new 30-day period for that same ID (on 12/15/2009), the first case entered on 12/15/2009 should be retained but the following case associated with that same ID on 12/19/2009 sholud be deleted. The next case associated with ID "1G3k4" does not appear until 02/22/2010 and there are no subsequent cases within a 30-day period associated with that ID, so that case is retained and nothing else needs to be done. Same goes with the other two example IDs provided. Hope this step-by-step illustration is not convoluted.
I've been playing about with the lag function, but I can't seem to get it to do what I want. Any thoughts would be most appreciated. Apologies if I've asked this question in the past. Thanks, Ryan
|
Ryan, I think this will work. I supplemented your example data to add a different an exact 30 condition Gene Maguin data list list / ID(a5) Date(adate10). begin data 1G3k4 11/11/2009 1G3k4 12/06/2009 1G3k4 12/15/2009 1G3k4 12/19/2009 1G3k4 02/22/2010 5TRJ1 11/10/2009 5TRJ1 11/15/2009 5TRJ1 11/30/2009 5TRJ1 12/10/2009 5TRJ1 12/13/2009 5TRJ1 01/10/2010 5TRJ1 02/13/2010 end data. format data(adate10). do if ($casenum eq 1 or id ne lag(id)). + compute #d=date. + compute pick=1. else. + do if (datesum(#d,30,'days','rollover') ge date). + compute pick=0. + else. + compute pick=1. + compute #d=date. + end if. end if. execute. From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of R B Hello: Let me start with an illustration of my dataset: The variable type of "ID" is STRING and the variable type of "Date" is DATE with a format mm/dd/yyyy. I would like to remove "subsequent" cases that are within a 30-day period of the first case associated with a particular ID in a given time period. Let's take "1G3k4"...I would like the case entered on 11/11/2009 (first time this ID appears in the dataset) to be retained, but the case entered on 12/06/2009 should be deleted. Now that we've entered a new 30-day period for that same ID (on 12/15/2009), the first case entered on 12/15/2009 should be retained but the following case associated with that same ID on 12/19/2009 sholud be deleted. The next case associated with ID "1G3k4" does not appear until 02/22/2010 and there are no subsequent cases within a 30-day period associated with that ID, so that case is retained and nothing else needs to be done. Same goes with the other two example IDs provided. Hope this step-by-step illustration is not convoluted. I've been playing about with the lag function, but I can't seem to get it to do what I want. Any thoughts would be most appreciated. Apologies if I've asked this question in the past. Thanks, Ryan |
Administrator
|
In reply to this post by Ryan
Hi Ryan,
Try this. DATA LIST LIST /ID (A5) Date (ADATE). BEGIN DATA 1G3k4 11/11/2009 1G3k4 12/06/2009 1G3k4 12/15/2009 1G3k4 12/19/2009 1G3k4 02/22/2010 5TRJ1 11/10/2009 RQR12 11/10/2009 END DATA. LIST. DO IF $CASENUM=1 OR ID NE LAG(ID). + COMPUTE #DATE=DATE. + COMPUTE RETAIN=1. ELSE. + DO IF CTIME.DAYS(DATE-#DATE) LT 30. + COMPUTE RETAIN=0. + ELSE. + COMPUTE RETAIN=1. + COMPUTE #DATE=DATE. + END IF. END IF. EXE. LIST.
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
Thank you, David and Gene. Will try out both sets of code. Much appreciated. Best, Ryan On Fri, Jun 15, 2012 at 10:15 AM, David Marso <[hidden email]> wrote: Hi Ryan, |
Administrator
|
Or how about a s(L)ick 2 liner ;-)
-- COMPUTE #DATE=MAX(ANY(1,$CASENUM,ID NE LAG(ID), CTIME.DAYS(DATE-#DATE) GT 30)*DATE,#DATE). COMPUTE RETAIN=DATE=#DATE. EXE.
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
Administrator
|
OOPS!
Better: COMPUTE #DATE=MAX(ANY(1,$CASENUM,ID NE LAG(ID), CTIME.DAYS(DATE-#DATE) GT 30)*DATE,#DATE). COMPUTE RETAIN=DATE EQ #DATE OR $CASENUM EQ 1 OR ID NE LAG(ID).
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
Free forum by Nabble | Edit this page |