Duplicates WITHIN cases?

Sarah van Mastrigt
Hi All-

I am trying to identify (and delete) duplicate values across several
variables (within a single case), and couldn't find any suggestions in the
archives. The structure of the data is as follows:

ID  V1  V2  V3  V4  V5  V6  V7...V93
1   23  23  34  23  53  43  34
2   17  1   32  67  17  32   1

I am trying to identify duplicate values across all 93 variables.
Unfortunately, the number of possible values makes a 'count' function for
each possibility unmanageable.

Is there a way to keep the original value in the first variable in which it
appears, and recode duplicates in subsequent variables to -9 (or alternatively
to system missing), as below?

ID  V1  V2  V3  V4  V5  V6  V7
1   23  -9  34  -9  53  43  -9
2   17  1   32  67  -9  -9  -9

Also, does anyone know if it is possible to create a new variable, with a
count of the duplicates, in the process (as is the case when identifying
primary cases in the duplicate CASES function)?

Any suggestions would be very much appreciated. I am using SPSS version 14.


--
Sarah van Mastrigt
Doctoral Researcher
Institute of Criminology
University of Cambridge

Re: Duplicates WITHIN cases?

Maguin, Eugene
Sarah,

This is how I'd go at it. There may be more elegant methods.

I'll assume that the 93 variables are arranged in sequence in the file, as you
have shown them. If they aren't, you'll have to reorder them first, because
"v1 to v93" on the Vector command picks up every variable positioned between
v1 and v93 in the file. I'll also assume that you are concerned only with
locating duplicate nonmissing values. I checked this over before posting it,
but please look at your results very carefully, as I might have missed
something. If I did, please let me know and describe the error.

Gene Maguin


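* Treat v1 to v93 as one vector; dups counts the values recoded to -9.
* Note that i and j stay in the file as ordinary variables.
Vector v=v1 to v93.
Compute dups=0.
Loop i=1 to 92.
* Skip values already flagged as duplicates (-9) and missing values.
+  do if (v(i) ne -9 and not(missing(v(i)))).
* Compare v(i) with each later variable and flag every match.
+     Loop j=i+1 to 93.
+        if (v(i) eq v(j)) dups=dups+1.
+        if (v(i) eq v(j)) v(j)=-9.
+     End loop.
+  end if.
End loop.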
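
For anyone who wants to check the logic outside SPSS, here is a minimal Python sketch of the same nested-loop idea, run on the two sample cases from the original question. It is not SPSS syntax and not part of the posted solution; the function name flag_duplicates is made up for illustration, and None stands in for a system-missing value.

```python
def flag_duplicates(values, flag=-9):
    """Keep the first occurrence of each value; recode later repeats to `flag`.

    Hypothetical helper mirroring the nested-loop approach above.
    Returns the recoded list and the number of values that were recoded.
    """
    out = list(values)
    dups = 0
    n = len(out)
    for i in range(n - 1):
        # Skip missing values (None here) and values already flagged.
        if out[i] is None or out[i] == flag:
            continue
        # Compare position i with every later position.
        for j in range(i + 1, n):
            if out[j] == out[i]:
                dups += 1
                out[j] = flag
    return out, dups


case1, dups1 = flag_duplicates([23, 23, 34, 23, 53, 43, 34])
case2, dups2 = flag_duplicates([17, 1, 32, 67, 17, 32, 1])
print(case1, dups1)
print(case2, dups2)
```

Both sample cases come out as in the desired layout shown in the question, with three recoded values each.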

Re: Duplicates WITHIN cases?

Sarah van Mastrigt
Thanks for your quick reply, Gene. I would never have figured this out on
my own. Your syntax worked like a charm! Many thanks!

Sarah



Re: Duplicates WITHIN cases?

Richard Ristow
At 02:06 PM 3/1/2007, Gene Maguin wrote:

>This is how I'd go at it. There may be more elegant methods.

I'd give you a number of elegance points for this code (copied below).

The technical inefficiency, which may be what bothered you, is that it
takes time in proportion to the square of the number of variables
(n**2). The best a sort-based method can do is n*log(n).

But yours is neat, clear, and very serviceable (and well laid out). I
was working myself up to doing a 'faster' solution, but more complex,
by sorting the variable values in MATRIX. Yours is the way to go.

>Vector v=v1 to v93.
>Compute dups=0.
>Loop i=1 to 92.
>+  do if (v(i) ne -9 and not(missing(v(i)))).
>+     Loop j=i+1 to 93.
>+        if (v(i) eq v(j)) dups=dups+1.
>+        if (v(i) eq v(j)) v(j)=-9.
>+     End loop.
>+  end if.
>End loop.
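
To make the n*log(n) remark concrete, here is a rough Python sketch of a sort-based variant. It is an illustration only, not the MATRIX solution mentioned above, which was never posted; the name flag_duplicates_sorted is hypothetical, and None stands in for a system-missing value. The idea is to sort the positions by value (ties broken by position), so that repeats of a value sit next to each other with the earliest occurrence first.

```python
def flag_duplicates_sorted(values, flag=-9):
    """Sort-based variant: keep the earliest occurrence, flag later repeats.

    Hypothetical helper; the sort dominates the cost, so the work grows
    as n*log(n) rather than n**2.
    """
    out = list(values)
    # Positions of usable values, ordered by value and then by position.
    order = sorted(
        (i for i, v in enumerate(values) if v is not None and v != flag),
        key=lambda i: (values[i], i),
    )
    dups = 0
    # Within a run of equal values, every position after the first is a repeat.
    for prev, cur in zip(order, order[1:]):
        if values[cur] == values[prev]:
            out[cur] = flag
            dups += 1
    return out, dups


print(flag_duplicates_sorted([23, 23, 34, 23, 53, 43, 34]))
print(flag_duplicates_sorted([17, 1, 32, 67, 17, 32, 1]))
```

With 93 variables the nested loops make at most 92*93/2 = 4,278 comparisons per case, so the simpler version is unlikely to be a bottleneck in practice.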