Selecting unique records from dataset with duplicates

Selecting unique records from dataset with duplicates

Nandini Rao
I have a large data set (750,000+ records) with multiple records for the
same ID but different payment dates. I want to create a subset with one
record per ID, keeping the record with the latest payment date. I would
appreciate any suggestions on how to do this.

I am using SPSS 12.0 on a Windows XP platform.

Thanks for your help,
nan

Re: Selecting unique records from dataset with duplicates

Maguin, Eugene
Nandini,

I think the aggregate procedure would work best here. Read up on it in the
syntax reference.

Sort cases by id paymentdate.
Aggregate outfile=*/presorted/break=id/paymentdate=last(paymentdate).

Gene Maguin
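For readers working outside SPSS, the logic of the SORT CASES + AGGREGATE approach can be sketched in Python. This is a minimal illustration, not SPSS code: the function name `last_payment_per_id` and the sample records are hypothetical. As with AGGREGATE, the result holds only the ID and the aggregated date, not the other variables on each record.

```python
from datetime import date

# Hypothetical sample records: (id, payment_date) pairs.
records = [
    ("A", date(2006, 1, 15)),
    ("A", date(2006, 3, 2)),
    ("B", date(2005, 12, 1)),
    ("A", date(2006, 2, 10)),
    ("B", date(2006, 6, 30)),
]

def last_payment_per_id(rows):
    """Mimic SORT CASES BY id paymentdate followed by
    AGGREGATE /BREAK=id /paymentdate=LAST(paymentdate)."""
    result = {}
    # Sorting by (id, date) puts each ID's dates in ascending order,
    # so the final assignment per ID leaves the latest (LAST) date.
    for rec_id, paid in sorted(rows):
        result[rec_id] = paid
    return result

print(last_payment_per_id(records))
```

Because the rows are sorted first, taking the last value per ID here gives the same answer as taking the maximum date per ID.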

Re: Selecting unique records from dataset with duplicates

Mahbub Khandoker
In reply to this post by Nandini Rao
Hi Nandini,
You can use the LAG function to get the subset you want: sort each ID's records from newest to oldest, number them within the ID, and keep record number 1. The syntax is below.

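* Sample data: idp is a two-character string ID; visit is read as mm/dd/yyyy.
DATA LIST FREE / idp (A2) visit (ADATE10).
BEGIN DATA
1 1/1/1990
1 7/2/1990
1 7/3/1999
2 1/4/1999
2 1/5/1999
2 1/7/1999
2 1/5/2000
2 1/5/2001
3 1/2/2000
3 1/7/2000
3 1/7/2003
3 1/7/2004
END DATA.
FORMATS visit (ADATE11).
EXECUTE.

* Sort so that the most recent visit comes first within each ID.
SORT CASES BY idp (A) visit (D).

* Number the visits within each ID: 1 = most recent, 2 = next, and so on.
COMPUTE visitn = 1.
IF (idp = LAG(idp, 1)) visitn = LAG(visitn, 1) + 1.
EXECUTE.

* Keep only the most recent record for each ID.
FILTER OFF.
USE ALL.
SELECT IF (visitn = 1).
EXECUTE.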

Cheers!
Mahbub
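The LAG-based counter can be mirrored in Python to show why it keeps whole records rather than just the ID and date. This is a hedged sketch, not SPSS code: the function name `most_recent_per_id` is hypothetical, the records are a subset of the sample data above (read as month/day/year), and the `amount` field is an invented extra variable added only to show that other fields survive.

```python
from datetime import date

# Subset of the sample data, plus a hypothetical amount field.
records = [
    ("1", date(1990, 1, 1), 10.0),
    ("1", date(1990, 7, 2), 12.5),
    ("1", date(1999, 7, 3), 8.0),
    ("2", date(1999, 1, 4), 5.0),
    ("2", date(2001, 1, 5), 7.5),
    ("3", date(2004, 1, 7), 3.0),
]

def most_recent_per_id(rows):
    """Mirror SORT CASES BY idp (A) visit (D) plus the LAG counter:
    number records within each ID (1 = most recent) and keep number 1."""
    # ID ascending, date descending (negating the ordinal reverses date order).
    ordered = sorted(rows, key=lambda r: (r[0], -r[1].toordinal()))
    kept = []
    prev_id = None
    visitn = 0
    for row in ordered:
        # Same idea as: COMPUTE visitn = 1.
        #               IF (idp = LAG(idp)) visitn = LAG(visitn) + 1.
        visitn = visitn + 1 if row[0] == prev_id else 1
        if visitn == 1:  # SELECT IF (visitn = 1).
            kept.append(row)
        prev_id = row[0]
    return kept

for rec in most_recent_per_id(records):
    print(rec)
```

Unlike the AGGREGATE sketch, each kept row is the full original record, which matters if other variables must travel with the latest payment date.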




-----Original Message-----
From:    SPSSX(r) Discussion On Behalf Of Nandini Rao
Sent:    8-Nov-06 7:58 AM
Subject: Selecting unique records from dataset with duplicates
