removing duplicates in spss

jimjohn
Can someone help me with this? For three variables (out of many variables that I have in my data set), I only want to keep every unique combination of these three variables once. So I want to remove any duplicates (not duplicate cases across the whole data set, but cases where these three variables have the same values). Thx.

Re: removing duplicates in spss

Mark Palmberg
"i only want to keep every unique combination of
these three variables once."

What about using SELECT IF to pick out cases where the three variables have
the same values, and then removing the dupes? That's probably what I'd try in
the absence of any better syntax.

Mark

On Tue, Mar 18, 2008 at 9:17 AM, jimjohn <[hidden email]> wrote:

> can someone help me with this. for three variables (out of many variables
> that i have in my data set), i only want to keep every unique combination
> of
> these three variables once. so i want to remove any duplicates (not
> duplicates of my data set but duplicates where these three variables have
> the same result). thx.
> --
> View this message in context:
> http://www.nabble.com/removing-duplicates-in-spss-tp16121958p16121958.html
> Sent from the SPSSX Discussion mailing list archive at Nabble.com.
>
> =====================
> To manage your subscription to SPSSX-L, send a message to
> [hidden email] (not to SPSSX-L), with no body text except the
> command. To leave the list, send the command
> SIGNOFF SPSSX-L
> For a list of commands to manage subscriptions, send the command
> INFO REFCARD
>

Re: removing duplicates in spss

Oliver, Richard
Something like this will keep the first occurrence of each unique combination (you could use the LAST keyword to select the last occurrence):

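* Sort so that cases with the same key values are adjacent.
SORT CASES BY var1 var2 var3.
* First = 1 for the first case in each var1/var2/var3 group, 0 otherwise.
MATCH FILES
  /FILE=*
  /BY var1 var2 var3
  /FIRST=First.
* Keep only those first cases; EXECUTE runs the pending transformation.
SELECT IF First=1.
EXECUTE.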

The only drawback is that the file must be sorted by the variables of interest, so "first" or "last" is only relative to the sorted order.
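
If the original case order matters, one common workaround (a sketch, not part of the posted reply) is to record each case's position before sorting and use it as a tiebreaker, so that "first" means first in the original file. OrigOrder and First are illustrative names, and var1 var2 var3 stand for your own key variables:

```spss
* Record the original position of every case.
COMPUTE OrigOrder = $CASENUM.
* Sorting on OrigOrder last makes the first case in each group the earliest one.
SORT CASES BY var1 var2 var3 OrigOrder.
MATCH FILES
  /FILE=*
  /BY var1 var2 var3
  /FIRST=First.
SELECT IF First=1.
* Put the surviving cases back in their original sequence.
SORT CASES BY OrigOrder.
```

Because SORT CASES is a procedure, it also executes the pending SELECT IF, so no separate EXECUTE is needed here.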

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Mark Palmberg
Sent: Tuesday, March 18, 2008 9:31 AM
To: [hidden email]
Subject: Re: removing duplicates in spss

"i only want to keep every unique combination of
these three variables once."

What about doing a SELECT IF for cases where the three variables are equal
(or the same) and then removing the dupes?  That's probably what I'd try in
the absence of any better syntax.

Mark

Re: removing duplicates in spss

Richard Ristow
In reply to this post by jimjohn
At 10:17 AM 3/18/2008, jimjohn wrote:

>can someone help me with this. for three variables (out of many variables
>that i have in my data set), i only want to keep every unique
>combination of these three variables once.

See Richard Oliver's logic for *how* to do it. (There are similar
alternatives based on LAG, but with a multi-variable key, "/FIRST="
or "/LAST=" logic is easier.)
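
A minimal sketch of the LAG-based alternative, assuming numeric key variables (for string variables LAG returns a blank on the first case, which changes the logic). Dup is an illustrative name. A case with a system-missing key value makes the comparison missing, so such cases are never flagged and are all kept:

```spss
SORT CASES BY var1 var2 var3.
COMPUTE Dup = 0.
* Flag a case when all three keys equal those of the previous case.
* For the first case LAG is missing, so the test fails and Dup stays 0.
IF (var1 = LAG(var1) AND var2 = LAG(var2) AND var3 = LAG(var3)) Dup = 1.
SELECT IF Dup = 0.
EXECUTE.
```

Every additional key adds another comparison to the IF, whereas /FIRST= only needs the key added to /BY.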

Now, *whether* to do it. Unless you're absolutely sure that the other
values are all the same whenever the three key variables are the same
(and it looks like you aren't), you're throwing away information. AND
you're creating an output dataset that isn't determined by the input:
that is, *which* record you keep isn't defined, but depends on the
initial sort order of the records.

Dangerous. Make sure you have a better justification than that it's
inconvenient to have duplicate key sets.
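
One way to check whether you would be throwing away information, sketched under assumptions: othervar is a hypothetical numeric variable standing in for whatever else is in the file, MODE=ADDVARIABLES requires a release of AGGREGATE that supports it, and KeyCount, OtherMin, OtherMax and OtherVaries are illustrative names:

```spss
* Add group-level summaries to every case without collapsing the file.
* othervar is a hypothetical placeholder for a numeric variable you would lose.
AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /BREAK=var1 var2 var3
  /KeyCount=N
  /OtherMin=MIN(othervar)
  /OtherMax=MAX(othervar).
* OtherVaries is 1 when othervar takes more than one value within a group.
COMPUTE OtherVaries = (OtherMin <> OtherMax).
FREQUENCIES VARIABLES=KeyCount OtherVaries.
```

If OtherVaries is 1 for any case, the cases sharing that key disagree on othervar, and which value survives de-duplication depends on the sort order. Repeat the check for each variable you care about.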

Re: removing duplicates in spss

Peck, Jon
In reply to this post by jimjohn
Although syntax for this has been posted, building syntax for this problem is exactly what the Data > Identify Duplicate Cases dialog does, and it has lots of bells and whistles. It's a lot easier than hand-rolling this.
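
If you do hand-roll it, a less destructive option than SELECT IF is to flag the duplicates and filter them out. This is a hand-written sketch, not the syntax the dialog pastes; PrimaryFlag is an illustrative name:

```spss
SORT CASES BY var1 var2 var3.
MATCH FILES
  /FILE=*
  /BY var1 var2 var3
  /FIRST=PrimaryFlag.
* Show how many cases are first in their group (1) versus duplicates (0).
FREQUENCIES VARIABLES=PrimaryFlag.
* Exclude duplicates from later procedures without deleting them.
FILTER BY PrimaryFlag.
* Run FILTER OFF to bring the duplicates back.
```

Unlike SELECT IF, FILTER leaves the duplicate cases in the file, so nothing is lost if you later need the other values.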

HTH,
Jon Peck

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of jimjohn
Sent: Tuesday, March 18, 2008 8:18 AM
To: [hidden email]
Subject: [SPSSX-L] removing duplicates in spss

can someone help me with this. for three variables (out of many variables
that i have in my data set), i only want to keep every unique combination of
these three variables once. so i want to remove any duplicates (not
duplicates of my data set but duplicates where these three variables have
the same result). thx.