Hi All-
I am trying to identify (and delete) duplicate values across several variables (within a single case), and couldn't find any suggestions in the archives. The structure of the data is as follows:

ID  V1  V2  V3  V4  V5  V6  V7 ... V93
 1  23  23  34  23  53  43  34
 2  17   1  32  67  17  32   1

I am trying to identify duplicate values across all 93 variables. Unfortunately, the number of possible values makes a COUNT for each possible value unmanageable. Is there a way to keep the original value in the first variable in which it appears, and recode duplicates in subsequent variables to -9 (or, alternatively, to system-missing), as below?

ID  V1  V2  V3  V4  V5  V6  V7
 1  23  -9  34  -9  53  43  -9
 2  17   1  32  67  -9  -9  -9

Also, does anyone know whether it is possible to create a new variable holding a count of the duplicates in the process (as happens with primary cases in the Identify Duplicate Cases function)?

Any suggestions would be very much appreciated. I am using SPSS version 14.

--
Sarah van Mastrigt
Doctoral Researcher
Institute of Criminology
University of Cambridge
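The transformation asked for above can be pinned down with a short Python sketch (Python is used only to illustrate the logic; it is not SPSS syntax). It keeps a set of the values already seen in the case, so each cell is examined once; this hash-set technique is an assumption of the sketch, not a method proposed in the thread. The function name recode_duplicates is hypothetical, None stands in for a system-missing value, and -9 is the recode marker from the question.

```python
from typing import List, Optional, Tuple

RECODE = -9  # marker value used in the question for recoded duplicates


def recode_duplicates(row: List[Optional[int]]) -> Tuple[List[Optional[int]], int]:
    """Keep the first occurrence of each value; replace later repeats with RECODE.

    None stands in for a missing value. Missing values and cells that already
    hold RECODE are copied through unchanged and never counted.
    Returns the recoded row and the number of cells that were recoded.
    """
    seen = set()                       # values already met earlier in this case
    out: List[Optional[int]] = []
    dups = 0
    for value in row:
        if value is None or value == RECODE:
            out.append(value)          # leave missing / already-recoded cells alone
        elif value in seen:
            out.append(RECODE)         # later duplicate: recode it and count it
            dups += 1
        else:
            seen.add(value)            # first occurrence: keep the original value
            out.append(value)
    return out, dups


if __name__ == "__main__":
    # The two example cases from the question (V1 to V7 only).
    print(recode_duplicates([23, 23, 34, 23, 53, 43, 34]))  # -> ([23, -9, 34, -9, 53, 43, -9], 3)
    print(recode_duplicates([17, 1, 32, 67, 17, 32, 1]))    # -> ([17, 1, 32, 67, -9, -9, -9], 3)
```

The second element of the returned pair is the per-case duplicate count the question asks about.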
Sarah,
This is how I'd go at it. There may be more elegant methods.

I'll assume that the 93 variables are adjacent in the file, in the order you have shown them (VECTOR with TO relies on file order). If they aren't, you'll have to reorder them first. I'll also assume that you are concerned only with locating duplicate non-missing values.

I checked this over before posting it, but please look at your results very carefully, as I might have missed something. If I did, please let me know and describe the error.

Gene Maguin

* Treat v1 to v93 as an array v(1) to v(93).
Vector v=v1 to v93.
* dups will hold the number of cells recoded to -9 in each case.
Compute dups=0.
* For each value that is not missing and not already recoded, recode later matches to -9.
Loop i=1 to 92.
+ do if (v(i) ne -9 and not(missing(v(i)))).
+   Loop j=i+1 to 93.
+     if (v(i) eq v(j)) dups=dups+1.
+     if (v(i) eq v(j)) v(j)=-9.
+   End loop.
+ end if.
End loop.
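As a cross-check of the syntax above, the same nested-loop logic can be mirrored in Python. This is a sketch only: recode_nested is a hypothetical name, indices are 0-based rather than SPSS's 1-based, and None stands in for a missing value. Once a cell has been recoded to -9 it is skipped as a reference value and cannot match any later value, so each duplicate is counted exactly once.

```python
from typing import List, Optional, Tuple

RECODE = -9  # marker the SPSS syntax writes over later duplicates


def recode_nested(row: List[Optional[int]]) -> Tuple[List[Optional[int]], int]:
    """Python mirror of the nested LOOP in the SPSS syntax (sketch).

    None plays the role of a missing value. As in the syntax, a cell is used
    as a reference value only if it is neither missing nor RECODE.
    """
    v = list(row)                      # copy so the caller's list is not modified
    n = len(v)
    dups = 0
    for i in range(n - 1):             # Loop i=1 to 92 (0-based here)
        if v[i] is None or v[i] == RECODE:
            continue                   # do if (v(i) ne -9 and not(missing(v(i))))
        for j in range(i + 1, n):      # Loop j=i+1 to 93
            if v[j] == v[i]:
                dups += 1              # if (v(i) eq v(j)) dups=dups+1
                v[j] = RECODE          # if (v(i) eq v(j)) v(j)=-9
    return v, dups
```

Run on the question's two example cases, it should reproduce the desired rows, each with a count of 3.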
In reply to this post by Sarah van Mastrigt
Thanks for your quick reply, Gene. I would never have figured this out on my own. Your syntax worked like a charm! Many thanks!

Sarah
In reply to this post by Maguin, Eugene
At 02:06 PM 3/1/2007, Gene Maguin wrote:
>This is how I'd go at it. There may be more elegant methods.

I'd give you a number of elegance points for this code (copied below). The technical inefficiency, which may be what bothered you, is that its running time grows with the square of the number of variables (n**2); a sort-based method could bring that down to n*log(n). But yours is neat, clear, and very serviceable (and well laid out). I was working myself up to a 'faster' but more complex solution that sorts the variable values in MATRIX. Yours is the way to go.

>Vector v=v1 to v93.
>Compute dups=0.
>Loop i=1 to 92.
>+ do if (v(i) ne -9 and not(missing(v(i)))).
>+   Loop j=i+1 to 93.
>+     if (v(i) eq v(j)) dups=dups+1.
>+     if (v(i) eq v(j)) v(j)=-9.
>+   End loop.
>+ end if.
>End loop.
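The sort-based idea mentioned in this reply can be sketched as follows. This is Python, not SPSS MATRIX code, and recode_sorted is a hypothetical name. It sorts the positions of the non-missing values by value, breaking ties by column position, so within each run of equal values every position after the first is a later duplicate. Sorting costs on the order of n*log(n) comparisons, whereas the nested loop makes up to 93*92/2 = 4,278 comparisons per case (when all values are distinct).

```python
from typing import List, Optional, Tuple

RECODE = -9  # marker for recoded duplicates, as in the thread


def recode_sorted(row: List[Optional[int]]) -> Tuple[List[Optional[int]], int]:
    """Sort-based variant: O(n log n) comparisons instead of O(n**2).

    None stands in for a missing value; missing and RECODE cells are ignored.
    """
    out = list(row)
    # Positions that hold a real value (not missing, not already recoded).
    positions = [k for k, x in enumerate(row) if x is not None and x != RECODE]
    # Order by value; ties are broken by column position, so the first
    # occurrence of each value comes first within its run.
    positions.sort(key=lambda k: (row[k], k))
    dups = 0
    for prev, cur in zip(positions, positions[1:]):
        if row[cur] == row[prev]:      # same value as the previous entry in its run
            out[cur] = RECODE          # so this column is a later duplicate
            dups += 1
    return out, dups
```

With only 93 variables per case, both approaches are likely fast enough in practice; the sort mainly pays off as the number of variables grows, which fits the reply's conclusion that the simpler loop is the way to go.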