How to delete user-defined duplicate cases

classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

How to delete user-defined duplicate cases

Ryan
Hello:
 
Let me start with an illustration of my dataset:

ID      Date
1G3k4  11/11/2009
1G3k4  12/06/2009
1G3k4  12/15/2009
1G3k4  12/19/2009
1G3k4  02/22/2010
5TRJ1  11/10/2009
RQR12  11/10/2009
.
.
.
 
The variable type of "ID" is STRING and the variable type of "Date" is DATE with a format mm/dd/yyyy.
 
I would like to remove "subsequent" cases that are within a 30-day period of the first case associated with a particular ID in a given time period. Let's take "1G3k4"...I would like the case entered on 11/11/2009 (first time this ID appears in the dataset) to be retained, but the case entered on 12/06/2009 should be deleted. Now that we've entered a new 30-day period for that same ID (on 12/15/2009), the first case entered on 12/15/2009 should be retained but the following case associated with that same ID on 12/19/2009 sholud be deleted. The next case associated with ID "1G3k4" does not appear until 02/22/2010 and there are no subsequent cases within a 30-day period associated with that ID, so that case is retained and nothing else needs to be done. Same goes with the other two example IDs provided. Hope this step-by-step illustration is not convoluted.
 
I've been playing about with the lag function, but I can't seem to get it to do what I want. Any thoughts would be most appreciated. Apologies if I've asked this question in the past.
 
Thanks,
 
Ryan
Reply | Threaded
Open this post in threaded view
|

Re: How to delete user-defined duplicate cases

Maguin, Eugene

Ryan,

 

I think this will work. I supplemented your example data to add a different an exact 30 condition

 

Gene Maguin

 

data list list / ID(a5) Date(adate10).

begin data

1G3k4  11/11/2009

1G3k4  12/06/2009

1G3k4  12/15/2009

1G3k4  12/19/2009

1G3k4  02/22/2010

5TRJ1  11/10/2009

5TRJ1  11/15/2009

5TRJ1  11/30/2009

5TRJ1  12/10/2009

5TRJ1  12/13/2009

5TRJ1  01/10/2010

5TRJ1  02/13/2010

end data.

format data(adate10).

 

do if ($casenum eq 1 or id ne lag(id)).

+  compute #d=date.

+  compute pick=1.

else.

+  do if (datesum(#d,30,'days','rollover') ge date).

+      compute pick=0.

+  else.

+      compute pick=1.

+      compute #d=date.

+  end if.

end if.

execute.

 

 

 

 

From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of R B
Sent: Friday, June 15, 2012 8:17 AM
To: [hidden email]
Subject: How to delete user-defined duplicate cases

 

Hello:

 

Let me start with an illustration of my dataset:

ID      Date
1G3k4  11/11/2009
1G3k4  12/06/2009
1G3k4  12/15/2009
1G3k4  12/19/2009
1G3k4  02/22/2010
5TRJ1  11/10/2009
RQR12  11/10/2009
.
.
.

 

The variable type of "ID" is STRING and the variable type of "Date" is DATE with a format mm/dd/yyyy.

 

I would like to remove "subsequent" cases that are within a 30-day period of the first case associated with a particular ID in a given time period. Let's take "1G3k4"...I would like the case entered on 11/11/2009 (first time this ID appears in the dataset) to be retained, but the case entered on 12/06/2009 should be deleted. Now that we've entered a new 30-day period for that same ID (on 12/15/2009), the first case entered on 12/15/2009 should be retained but the following case associated with that same ID on 12/19/2009 sholud be deleted. The next case associated with ID "1G3k4" does not appear until 02/22/2010 and there are no subsequent cases within a 30-day period associated with that ID, so that case is retained and nothing else needs to be done. Same goes with the other two example IDs provided. Hope this step-by-step illustration is not convoluted.

 

I've been playing about with the lag function, but I can't seem to get it to do what I want. Any thoughts would be most appreciated. Apologies if I've asked this question in the past.

 

Thanks,

 

Ryan

Reply | Threaded
Open this post in threaded view
|

Re: How to delete user-defined duplicate cases

David Marso
Administrator
In reply to this post by Ryan
Hi Ryan,
Try this.
DATA LIST LIST /ID (A5) Date (ADATE).
BEGIN DATA
1G3k4  11/11/2009
1G3k4  12/06/2009
1G3k4  12/15/2009
1G3k4  12/19/2009
1G3k4  02/22/2010
5TRJ1  11/10/2009
RQR12  11/10/2009
END DATA.
LIST.
DO IF $CASENUM=1 OR ID NE LAG(ID).
+  COMPUTE #DATE=DATE.
+  COMPUTE RETAIN=1.
ELSE.
+  DO IF CTIME.DAYS(DATE-#DATE) LT 30.
+    COMPUTE RETAIN=0.
+  ELSE.
+    COMPUTE RETAIN=1.
+    COMPUTE #DATE=DATE.
+  END IF.
END IF.
EXE.
LIST.
R B wrote
Hello:

Let me start with an illustration of my dataset:

ID      Date
1G3k4  11/11/2009
1G3k4  12/06/2009
1G3k4  12/15/2009
1G3k4  12/19/2009
1G3k4  02/22/2010
5TRJ1  11/10/2009
RQR12  11/10/2009
.
.
.

The variable type of "ID" is STRING and the variable type of "Date" is DATE
with a format mm/dd/yyyy.

I would like to remove "subsequent" cases that are within a 30-day period
of the first case associated with a particular ID in a given time period.
Let's take "1G3k4"...I would like the case entered on 11/11/2009 (first
time this ID appears in the dataset) to be retained, but the case entered
on 12/06/2009 should be deleted. Now that we've entered a new 30-day period
for that same ID (on 12/15/2009), the first case entered on 12/15/2009
should be retained but the following case associated with that same ID on
12/19/2009 sholud be deleted. The next case associated with ID "1G3k4" does
not appear until 02/22/2010 and there are no subsequent cases within a
30-day period associated with that ID, so that case is retained and nothing
else needs to be done. Same goes with the other two example IDs provided.
Hope this step-by-step illustration is not convoluted.

I've been playing about with the lag function, but I can't seem to get it
to do what I want. Any thoughts would be most appreciated. Apologies if
I've asked this question in the past.

Thanks,

Ryan
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"
Reply | Threaded
Open this post in threaded view
|

Re: How to delete user-defined duplicate cases

Ryan
Thank you, David and Gene. Will try out both sets of code. Much appreciated.
 
Best,
 
Ryan

On Fri, Jun 15, 2012 at 10:15 AM, David Marso <[hidden email]> wrote:
Hi Ryan,
Try this.
DATA LIST LIST /ID (A5) Date (ADATE).
BEGIN DATA
1G3k4  11/11/2009
1G3k4  12/06/2009
1G3k4  12/15/2009
1G3k4  12/19/2009
1G3k4  02/22/2010
5TRJ1  11/10/2009
RQR12  11/10/2009
END DATA.
LIST.
DO IF $CASENUM=1 OR ID NE LAG(ID).
+  COMPUTE #DATE=DATE.
+  COMPUTE RETAIN=1.
ELSE.
+  DO IF CTIME.DAYS(DATE-#DATE) LT 30.
+    COMPUTE RETAIN=0.
+  ELSE.
+    COMPUTE RETAIN=1.
+    COMPUTE #DATE=DATE.
+  END IF.
END IF.
EXE.
LIST.

R B wrote
>
> Hello:
>
> Let me start with an illustration of my dataset:
>
> ID      Date
> 1G3k4  11/11/2009
> 1G3k4  12/06/2009
> 1G3k4  12/15/2009
> 1G3k4  12/19/2009
> 1G3k4  02/22/2010
> 5TRJ1  11/10/2009
> RQR12  11/10/2009
> .
> .
> .
>
> The variable type of "ID" is STRING and the variable type of "Date" is
> DATE
> with a format mm/dd/yyyy.
>
> I would like to remove "subsequent" cases that are within a 30-day period
> of the first case associated with a particular ID in a given time period.
> Let's take "1G3k4"...I would like the case entered on 11/11/2009 (first
> time this ID appears in the dataset) to be retained, but the case entered
> on 12/06/2009 should be deleted. Now that we've entered a new 30-day
> period
> for that same ID (on 12/15/2009), the first case entered on 12/15/2009
> should be retained but the following case associated with that same ID on
> 12/19/2009 sholud be deleted. The next case associated with ID "1G3k4"
> does
> not appear until 02/22/2010 and there are no subsequent cases within a
> 30-day period associated with that ID, so that case is retained and
> nothing
> else needs to be done. Same goes with the other two example IDs provided.
> Hope this step-by-step illustration is not convoluted.
>
> I've been playing about with the lag function, but I can't seem to get it
> to do what I want. Any thoughts would be most appreciated. Apologies if
> I've asked this question in the past.
>
> Thanks,
>
> Ryan
>


--
View this message in context: http://spssx-discussion.1045642.n5.nabble.com/How-to-delete-user-defined-duplicate-cases-tp5713684p5713687.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Reply | Threaded
Open this post in threaded view
|

Re: How to delete user-defined duplicate cases

David Marso
Administrator
Or how about a s(L)ick 2 liner ;-)
--
COMPUTE #DATE=MAX(ANY(1,$CASENUM,ID NE LAG(ID),
                                              CTIME.DAYS(DATE-#DATE) GT 30)*DATE,#DATE).
COMPUTE RETAIN=DATE=#DATE.
EXE.
R B wrote
Thank you, David and Gene. Will try out both sets of code. Much appreciated.

Best,

Ryan

On Fri, Jun 15, 2012 at 10:15 AM, David Marso <[hidden email]> wrote:

> Hi Ryan,
> Try this.
> DATA LIST LIST /ID (A5) Date (ADATE).
> BEGIN DATA
> 1G3k4  11/11/2009
> 1G3k4  12/06/2009
> 1G3k4  12/15/2009
> 1G3k4  12/19/2009
> 1G3k4  02/22/2010
> 5TRJ1  11/10/2009
> RQR12  11/10/2009
> END DATA.
> LIST.
> DO IF $CASENUM=1 OR ID NE LAG(ID).
> +  COMPUTE #DATE=DATE.
> +  COMPUTE RETAIN=1.
> ELSE.
> +  DO IF CTIME.DAYS(DATE-#DATE) LT 30.
> +    COMPUTE RETAIN=0.
> +  ELSE.
> +    COMPUTE RETAIN=1.
> +    COMPUTE #DATE=DATE.
> +  END IF.
> END IF.
> EXE.
> LIST.
>
> R B wrote
> >
> > Hello:
> >
> > Let me start with an illustration of my dataset:
> >
> > ID      Date
> > 1G3k4  11/11/2009
> > 1G3k4  12/06/2009
> > 1G3k4  12/15/2009
> > 1G3k4  12/19/2009
> > 1G3k4  02/22/2010
> > 5TRJ1  11/10/2009
> > RQR12  11/10/2009
> > .
> > .
> > .
> >
> > The variable type of "ID" is STRING and the variable type of "Date" is
> > DATE
> > with a format mm/dd/yyyy.
> >
> > I would like to remove "subsequent" cases that are within a 30-day period
> > of the first case associated with a particular ID in a given time period.
> > Let's take "1G3k4"...I would like the case entered on 11/11/2009 (first
> > time this ID appears in the dataset) to be retained, but the case entered
> > on 12/06/2009 should be deleted. Now that we've entered a new 30-day
> > period
> > for that same ID (on 12/15/2009), the first case entered on 12/15/2009
> > should be retained but the following case associated with that same ID on
> > 12/19/2009 sholud be deleted. The next case associated with ID "1G3k4"
> > does
> > not appear until 02/22/2010 and there are no subsequent cases within a
> > 30-day period associated with that ID, so that case is retained and
> > nothing
> > else needs to be done. Same goes with the other two example IDs provided.
> > Hope this step-by-step illustration is not convoluted.
> >
> > I've been playing about with the lag function, but I can't seem to get it
> > to do what I want. Any thoughts would be most appreciated. Apologies if
> > I've asked this question in the past.
> >
> > Thanks,
> >
> > Ryan
> >
>
>
> --
> View this message in context:
> http://spssx-discussion.1045642.n5.nabble.com/How-to-delete-user-defined-duplicate-cases-tp5713684p5713687.html
> Sent from the SPSSX Discussion mailing list archive at Nabble.com.
>
> =====================
> To manage your subscription to SPSSX-L, send a message to
> [hidden email] (not to SPSSX-L), with no body text except the
> command. To leave the list, send the command
> SIGNOFF SPSSX-L
> For a list of commands to manage subscriptions, send the command
> INFO REFCARD
>
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"
Reply | Threaded
Open this post in threaded view
|

Re: How to delete user-defined duplicate cases

David Marso
Administrator
OOPS!
Better:
COMPUTE #DATE=MAX(ANY(1,$CASENUM,ID NE LAG(ID),
                                   CTIME.DAYS(DATE-#DATE) GT 30)*DATE,#DATE).
COMPUTE RETAIN=DATE EQ #DATE OR $CASENUM EQ 1 OR ID NE LAG(ID).

David Marso wrote
Or how about a s(L)ick 2 liner ;-)
--
COMPUTE #DATE=MAX(ANY(1,$CASENUM,ID NE LAG(ID),
                                              CTIME.DAYS(DATE-#DATE) GT 30)*DATE,#DATE).
COMPUTE RETAIN=DATE=#DATE.
EXE.
R B wrote
Thank you, David and Gene. Will try out both sets of code. Much appreciated.

Best,

Ryan

On Fri, Jun 15, 2012 at 10:15 AM, David Marso <[hidden email]> wrote:

> Hi Ryan,
> Try this.
> DATA LIST LIST /ID (A5) Date (ADATE).
> BEGIN DATA
> 1G3k4  11/11/2009
> 1G3k4  12/06/2009
> 1G3k4  12/15/2009
> 1G3k4  12/19/2009
> 1G3k4  02/22/2010
> 5TRJ1  11/10/2009
> RQR12  11/10/2009
> END DATA.
> LIST.
> DO IF $CASENUM=1 OR ID NE LAG(ID).
> +  COMPUTE #DATE=DATE.
> +  COMPUTE RETAIN=1.
> ELSE.
> +  DO IF CTIME.DAYS(DATE-#DATE) LT 30.
> +    COMPUTE RETAIN=0.
> +  ELSE.
> +    COMPUTE RETAIN=1.
> +    COMPUTE #DATE=DATE.
> +  END IF.
> END IF.
> EXE.
> LIST.
>
> R B wrote
> >
> > Hello:
> >
> > Let me start with an illustration of my dataset:
> >
> > ID      Date
> > 1G3k4  11/11/2009
> > 1G3k4  12/06/2009
> > 1G3k4  12/15/2009
> > 1G3k4  12/19/2009
> > 1G3k4  02/22/2010
> > 5TRJ1  11/10/2009
> > RQR12  11/10/2009
> > .
> > .
> > .
> >
> > The variable type of "ID" is STRING and the variable type of "Date" is
> > DATE
> > with a format mm/dd/yyyy.
> >
> > I would like to remove "subsequent" cases that are within a 30-day period
> > of the first case associated with a particular ID in a given time period.
> > Let's take "1G3k4"...I would like the case entered on 11/11/2009 (first
> > time this ID appears in the dataset) to be retained, but the case entered
> > on 12/06/2009 should be deleted. Now that we've entered a new 30-day
> > period
> > for that same ID (on 12/15/2009), the first case entered on 12/15/2009
> > should be retained but the following case associated with that same ID on
> > 12/19/2009 sholud be deleted. The next case associated with ID "1G3k4"
> > does
> > not appear until 02/22/2010 and there are no subsequent cases within a
> > 30-day period associated with that ID, so that case is retained and
> > nothing
> > else needs to be done. Same goes with the other two example IDs provided.
> > Hope this step-by-step illustration is not convoluted.
> >
> > I've been playing about with the lag function, but I can't seem to get it
> > to do what I want. Any thoughts would be most appreciated. Apologies if
> > I've asked this question in the past.
> >
> > Thanks,
> >
> > Ryan
> >
>
>
> --
> View this message in context:
> http://spssx-discussion.1045642.n5.nabble.com/How-to-delete-user-defined-duplicate-cases-tp5713684p5713687.html
> Sent from the SPSSX Discussion mailing list archive at Nabble.com.
>
> =====================
> To manage your subscription to SPSSX-L, send a message to
> [hidden email] (not to SPSSX-L), with no body text except the
> command. To leave the list, send the command
> SIGNOFF SPSSX-L
> For a list of commands to manage subscriptions, send the command
> INFO REFCARD
>
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"