SPSSX Discussion

Selecting unique records from dataset with duplicates

Nandini Rao

Selecting unique records from dataset with duplicates

I have a large data set (750,000+ records). I have multiple records with
the same ID but different payment dates. I want to create a subset of
unique records with the last payment date. I appreciate any suggestions on
how I could do this.

I am using SPSS 12.0 on a Windows XP platform.

Thanks for your help,
nan

Maguin, Eugene

Re: Selecting unique records from dataset with duplicates

Nandini,

I think the aggregate procedure would work best here. Read up on it in the
syntax reference.

SORT CASES BY id paymentdate.
AGGREGATE OUTFILE=* /PRESORTED /BREAK=id /paymentdate=LAST(paymentdate).

Gene Maguin
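
With OUTFILE=*, AGGREGATE writes only the break variable and the aggregated variables, so any other fields on the payment record are dropped. If the whole record for the last payment is needed, one possible variation is to flag the last case per ID with MATCH FILES and select on that flag. The following is a minimal sketch, not taken from the replies in this thread; the variable names (id, paymentdate, amount, lastpay) and all data values are made up for illustration.

```spss
* Hypothetical sample data: id, payment date and an extra field to carry along.
DATA LIST FREE / id (F4) paymentdate (ADATE10) amount (F8.2).
BEGIN DATA
1 01/15/2005 100.00
1 03/20/2006 250.00
2 07/04/2004 75.50
2 02/11/2006 80.00
2 12/01/2005 60.00
END DATA.

* Ascending sort puts the latest payment date last within each id.
SORT CASES BY id (A) paymentdate (A).

* With one input file and a BY variable, LAST flags the final case of each id.
MATCH FILES /FILE=* /BY id /LAST=lastpay.

* Keep only the flagged cases, leaving one complete record per id.
SELECT IF (lastpay = 1).
EXECUTE.
LIST.
```

In this made-up data the records kept should be the 03/20/2006 payment for id 1 and the 02/11/2006 payment for id 2, each with its amount intact. If only the ID and the latest date are needed, the AGGREGATE approach is shorter.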

Mahbub Khandoker

Re: Selecting unique records from dataset with duplicates

In reply to this post by Nandini Rao

Hi Nandini,
You can use the LAG function to get your desired subset of unique records. The procedure is shown below.

DATA LIST FREE / idp (A2) visit (DATE).
BEGIN DATA
1 1/1/1990
1 7/2/1990
1 7/3/1999
2 1/4/1999
2 1/5/1999
2 1/7/1999
2 1/5/2000
2 1/5/2001
3 1/2/2000
3 1/7/2000
3 1/7/2003
3 1/7/2004
END DATA.
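FORMATS visit (ADATE11).

* Sort so that the most recent visit comes first within each ID.
SORT CASES BY idp (A) visit (D).

* Number the records within each ID: 1 = most recent visit.
COMPUTE visitn = 1.
IF (idp = LAG(idp, 1)) visitn = LAG(visitn, 1) + 1.
EXECUTE.

* Keep only the most recent record for each ID.
FILTER OFF.
USE ALL.
SELECT IF (visitn = 1).
EXECUTE.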

Cheers!
Mahbub

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Nandini Rao
Sent: 8-Nov-06 7:58 AM
To: [hidden email]
Subject: Selecting unique records from dataset with duplicates
