SPSSX Discussion

Duplicates WITHIN cases?

Sarah van Mastrigt

Duplicates WITHIN cases?

Hi All-

I am trying to identify (and delete) duplicate values across several
variables (within a single case), and couldn't find any suggestions in the
archives. The structure of the data is as follows:

ID  V1  V2  V3  V4  V5  V6  V7 ... V93
 1  23  23  34  23  53  43  34
 2  17   1  32  67  17  32   1

I am trying to identify duplicate values across all 93 variables.
Unfortunately, the number of possible values makes a 'count' function for
each possibility unmanageable.

Is there a way to keep the original value in the first variable in which
it appears, and recode duplicates in subsequent variables to -9 (or
alternatively to system-missing), as below?

ID  V1  V2  V3  V4  V5  V6  V7
 1  23  -9  34  -9  53  43  -9
 2  17   1  32  67  -9  -9  -9

Also, does anyone know if it is possible to create a new variable, with a
count of the duplicates, in the process (as is the case when identifying
primary cases in the duplicate CASES function)?

Any suggestions would be very much appreciated. I am using SPSS version 14.

--
Sarah van Mastrigt
Doctoral Researcher
Institute of Criminology
University of Cambridge

Maguin, Eugene

Re: Duplicates WITHIN cases?

Sarah,

This is how I'd go at it. There may be more elegant methods.

I'll assume that the 93 variables are arranged in sequence as you have shown
them. If they aren't that way, you'll have to get them that way. I'll also
assume that you are concerned only with locating duplicate nonmissing
values. I checked this over before posting it but please look at your
results very carefully as I might have missed something. If I did, please
let me know and describe the error.

Gene Maguin

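* Keep the first occurrence of each value within a case and recode later repeats to -9.
* dups counts the values recoded to -9 in each case.
* i and j remain in the file as ordinary variables and can be deleted afterwards.
Vector v=v1 to v93.
Compute dups=0.
Loop i=1 to 92.
+  do if (v(i) ne -9 and not(missing(v(i)))).
+    Loop j=i+1 to 93.
+      if (v(i) eq v(j)) dups=dups+1.
+      if (v(i) eq v(j)) v(j)=-9.
+    End loop.
+  end if.
End loop.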
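
For anyone who wants to check the logic outside SPSS, here is a minimal Python sketch of the same nested-loop idea, run on the two example cases from the original question. The function name flag_duplicates and the use of None to stand in for system-missing are assumptions made for illustration; this is not part of the syntax above.

```python
MARKER = -9  # recode value requested in the original question


def flag_duplicates(values, marker=MARKER):
    """Keep the first occurrence of each value; recode later repeats.

    Returns the recoded row and the number of values recoded.
    None stands in for a system-missing value (an assumption).
    """
    row = list(values)  # work on a copy, leave the input untouched
    dups = 0
    for i in range(len(row) - 1):
        # Skip values already recoded and missing values.
        if row[i] is None or row[i] == marker:
            continue
        for j in range(i + 1, len(row)):
            if row[j] == row[i]:
                row[j] = marker
                dups += 1
    return row, dups


# The two example cases from the question (first seven variables only).
cases = {
    1: [23, 23, 34, 23, 53, 43, 34],
    2: [17, 1, 32, 67, 17, 32, 1],
}
for case_id, values in cases.items():
    recoded, dups = flag_duplicates(values)
    print(case_id, recoded, dups)
# 1 [23, -9, 34, -9, 53, 43, -9] 3
# 2 [17, 1, 32, 67, -9, -9, -9] 3
```

Both rows match the target layout in the question, and the second returned value corresponds to the dups count.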

Sarah van Mastrigt

Re: Duplicates WITHIN cases?

Thanks for your quick reply, Gene. I would never have figured this out on
my own. Your syntax worked like a charm! Many thanks!

Sarah

Richard Ristow

Re: Duplicates WITHIN cases?

At 02:06 PM 3/1/2007, Gene Maguin wrote:

>This is how I'd go at it. There may be more elegant methods.

I'd give you a number of elegance points for this code (copied below).

The technical inefficiency, which may be what bothered you, is that its
running time grows in proportion to the square of the number of variables
(n**2). The best a sort-based method can do is n*log(n).

But yours is neat, clear, and very serviceable (and well laid out). I
was working myself up to doing a 'faster' solution, but more complex,
by sorting the variable values in MATRIX. Yours is the way to go.
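
To put numbers on that: with 93 variables the nested loop makes at most 93*92/2 = 4278 comparisons per case. The sort-based idea can be sketched as follows. This is a rough illustration in Python rather than MATRIX, not a finished solution; the function name flag_duplicates_sorted and the use of None for system-missing are assumptions made for the example.

```python
def flag_duplicates_sorted(values, marker=-9):
    """Sort-based variant: keep first occurrence, recode later repeats.

    Sorting dominates the cost, so the work grows as n*log(n)
    rather than n**2. None stands in for system-missing (an assumption).
    """
    original = list(values)  # read-only copy used for comparisons
    row = list(values)       # copy that receives the recodes
    # Positions of usable values, ordered by value and then by position,
    # so the earliest variable comes first within each run of equal values.
    order = sorted(
        (k for k, v in enumerate(original) if v is not None and v != marker),
        key=lambda k: (original[k], k),
    )
    dups = 0
    # Any position whose value equals its predecessor's is a later repeat.
    for prev, cur in zip(order, order[1:]):
        if original[cur] == original[prev]:
            row[cur] = marker
            dups += 1
    return row, dups


print(flag_duplicates_sorted([23, 23, 34, 23, 53, 43, 34]))
# ([23, -9, 34, -9, 53, 43, -9], 3)
print(flag_duplicates_sorted([17, 1, 32, 67, 17, 32, 1]))
# ([17, 1, 32, 67, -9, -9, -9], 3)
```

On the two example cases it produces the same result as the nested loop. At 93 variables the speed difference is unlikely to matter, which is the point made above.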

>Vector v=v1 to v93.
>Compute dups=0.
>Loop i=1 to 92.
>+ do if (v(i) ne -9 and not(missing(v(i)))).
>+ Loop j=i+1 to 93.
>+ if (v(i) eq v(j)) dups=dups+1.
>+ if (v(i) eq v(j)) v(j)=-9.
>+ End loop.
>+ end if.
>End loop.