How to delete user-defined duplicate cases
Posted by Ryan on Jun 15, 2012; 12:17pm
URL: http://spssx-discussion.165.s1.nabble.com/How-to-delete-user-defined-duplicate-cases-tp5713684.html
Hello:
Let me start with an illustration of my dataset:
ID Date
1G3k4 11/11/2009
1G3k4 12/06/2009
1G3k4 12/15/2009
1G3k4 12/19/2009
1G3k4 02/22/2010
5TRJ1 11/10/2009
RQR12 11/10/2009
.
.
.
The variable type of "ID" is STRING and the variable type of "Date" is DATE with a format mm/dd/yyyy.
I would like to remove "subsequent" cases that fall within a 30-day period of the first case for a particular ID. Take "1G3k4": the case entered on 11/11/2009 (the first time this ID appears in the dataset) should be retained, but the case entered on 12/06/2009 falls within 30 days of it and should be deleted. The case on 12/15/2009 is more than 30 days after 11/11/2009, so it starts a new 30-day period for that ID and should be retained, while the following case for that ID on 12/19/2009 should be deleted. The next case for "1G3k4" does not appear until 02/22/2010, and no subsequent cases for that ID fall within 30 days of it, so that case is retained and nothing else needs to be done. The same goes for the other two example IDs. I hope this step-by-step illustration is not too convoluted.
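To make the rule concrete, here is a minimal sketch of the logic in Python rather than SPSS syntax. It is only an illustration under stated assumptions: the function name `keep_first_in_window` and its `window_days` parameter are hypothetical, and a case exactly 30 days after the retained one is assumed to start a new period, which the description above does not settle. Each case is compared with the most recently retained case for that ID, not simply with the previous row.

```python
from datetime import date

def keep_first_in_window(cases, window_days=30):
    """Return the cases to retain, sorted by ID and then date.

    cases: iterable of (id, date) tuples.
    Assumption: a case exactly `window_days` after the retained
    case starts a new period (the boundary is not specified).
    """
    kept = []
    anchor = {}  # ID -> date of the most recently retained case
    for case_id, case_date in sorted(cases):
        start = anchor.get(case_id)
        # Retain the first case for an ID, or any case that falls
        # outside the window opened by the last retained case.
        if start is None or (case_date - start).days >= window_days:
            anchor[case_id] = case_date
            kept.append((case_id, case_date))
    return kept

# The illustration from the post.
sample = [
    ("1G3k4", date(2009, 11, 11)),
    ("1G3k4", date(2009, 12, 6)),
    ("1G3k4", date(2009, 12, 15)),
    ("1G3k4", date(2009, 12, 19)),
    ("1G3k4", date(2010, 2, 22)),
    ("5TRJ1", date(2009, 11, 10)),
    ("RQR12", date(2009, 11, 10)),
]

result = keep_first_in_window(sample)
for case_id, case_date in result:
    print(case_id, case_date.strftime("%m/%d/%Y"))
```

On the sample data this retains the 11/11/2009, 12/15/2009 and 02/22/2010 cases for "1G3k4" plus the single cases for the other two IDs. The anchor date only moves when a case is retained, so a comparison against the immediately preceding row alone would not reproduce the rule.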
I've been experimenting with the LAG function, but I can't seem to get it to do what I want. Any thoughts would be most appreciated. Apologies if I've asked this question in the past.
Thanks,
Ryan