SPSSX Discussion

Keeping unique records

Alina Sheyman-3

Keeping unique records

In my dataset I have multiple instances of a single id, with one of five
possible codes assigned to each instance of that id. I want to keep only
unique instances of each code, which would result in five records for each
id. Is there a way to do this in SPSS?

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Maguin, Eugene

Re: Keeping unique records

Alina,

>>In my dataset I have multiple instances of a single id, with one of five
possible codes assigned to each instance of that id. I want to keep only
unique instances of each code, which would result in five records for each
id.

Yes, I'm pretty sure it's not too hard, but your message, especially the part
about '... one of five possible codes assigned to each instance of that
id.', is kind of cryptic. I'll assume you have some other mechanism for doing
the code assignment. Anyway, let ID be your id variable and Code be your
code variable.

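Sort cases by ID Code.

* RecID counts the distinct codes within each ID.
* Dup is 1 when a case repeats the previous case's ID and Code.
Compute RecID=1.
Compute Dup=0.
Do if (ID eq lag(ID)).
+ Do if (Code eq lag(Code)).
+ Compute RecID=lag(RecID).
+ Compute Dup=1.
+ Else.
+ Compute RecID=lag(RecID)+1.
+ End if.
End if.

* Keep only the first case of each ID/Code combination.
Select if (Dup eq 0).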
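A hedged alternative sketch, not part of the original post: MATCH FILES with a BY list and the FIRST subcommand flags the first case of each ID/Code group directly, so no LAG bookkeeping is needed. It assumes the same ID and Code variables; OrigOrder and FirstRec are hypothetical helper names. OrigOrder is added as the last sort key so that, within each group, the case kept is the one that came first in the original file.

```spss
* Remember the original file order before sorting.
COMPUTE OrigOrder = $CASENUM.
SORT CASES BY ID Code OrigOrder.
* FirstRec is 1 for the first case of each ID/Code group and 0 otherwise.
MATCH FILES
  /FILE=*
  /BY ID Code
  /FIRST=FirstRec.
* Keep one case per ID/Code combination.
SELECT IF (FirstRec EQ 1).
EXECUTE.
```

Using /LAST=LastRec instead of /FIRST=FirstRec (and selecting on LastRec) would keep the last case of each group rather than the first.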

Gene Maguin

Melissa Ives

Re: Keeping unique records

In reply to this post by Alina Sheyman-3

Alternatively, you could aggregate on ID and Code to get unique
records of each ID/Code combination, as well as a count of the number of
records of each.
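A minimal sketch of the AGGREGATE route, assuming the ID and Code variable names from Gene's post; NRecords is a hypothetical name for the count. With OUTFILE=*, the active dataset is replaced by one case per ID/Code combination, and any variable that is neither a BREAK variable nor summarized by a function is dropped, so save a copy first if you still need the other variables.

```spss
* One output case per ID/Code combination, replacing the active dataset.
* NRecords holds the number of input records in each combination.
AGGREGATE
  /OUTFILE=*
  /BREAK=ID Code
  /NRecords=N.
```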

Melissa

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
Gene Maguin
Sent: Monday, December 10, 2007 4:35 PM
To: [hidden email]
Subject: Re: [SPSSX-L] Keeping unique records

Richard Ristow

Re: Keeping unique records

In reply to this post by Alina Sheyman-3

At 04:56 PM 12/10/2007, Alina Sheyman wrote:

>In my dataset I have multiple instances of a single id, with one of
>five possible codes assigned to each instance of that id. I want to
>keep only unique instances of each code, which would result in five
>records for each id. Is there a way to do this in SPSS?

That is (on Gene's and Melissa's understanding, which is also mine),
you may have multiple records for some combinations of id and code,
and you want to keep only one for each combination.

As they both wrote, it isn't too hard. I haven't looked hard at
Gene's code, nor tested it, but it looks perfectly workable to me.

That's the programming side. On the analysis side, always be careful
when doing this. Ask yourself,

Where did those 'duplicates' come from? Are they always identical on
all variables? If not, do they actually represent separate events (or
something similar), so that keeping only one throws away data? And if
you keep only one, WHICH one? Similarly, if they represent different
copies of what should be the same information, which copy is most
reliable: the first one by time of entry, the last one, or what?
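One way to look into those questions before deleting anything, sketched under the same ID and Code assumption: attach the size of each ID/Code group to every case, then list only the groups that occur more than once so you can compare their other variables. NRecords is a hypothetical variable name, and MODE=ADDVARIABLES assumes a release of SPSS whose AGGREGATE supports it.

```spss
* Add the number of records in each ID/Code group to every case.
AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /BREAK=ID Code
  /NRecords=N.
* Put each group's records next to each other.
SORT CASES BY ID Code.
* List only cases that share their ID/Code with another case.
* TEMPORARY limits the selection to the LIST command.
TEMPORARY.
SELECT IF (NRecords GT 1).
LIST.
```

If the listed records differ on other variables, they may not be simple copies of one another.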

Sometimes (as an example - do NOT leap on it as the solution for
you), all the records, though with different values, may be
representative of that combination of id and code; in that case,
averaging using AGGREGATE may be more appropriate than selecting one
of the records.
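As a sketch of that averaging option, assuming a numeric variable Score (a hypothetical stand-in for whatever measures vary across the records) alongside ID and Code; MeanScore and NRecords are hypothetical output names:

```spss
* One case per ID/Code combination, replacing the active dataset.
* MeanScore is the mean of Score over that combination's records.
* NRecords shows how many records each mean is based on.
AGGREGATE
  /OUTFILE=*
  /BREAK=ID Code
  /MeanScore=MEAN(Score)
  /NRecords=N.
```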

-With best wishes,
Richard
