|
Hi everyone, Does anybody know a way to get around the 64 variable limit
for the Identify Duplicate Cases tool? Or another solution for large data with
tons of variables? Thanks, David |
|
You can get around this by running if
statements that include all of your criteria for example: If (a=1) & (c=2) & ((d=3) or
(d=7)) dup = 1. I don’t think there’s a 64
variable limit on an if statement. Then you just filter on the dup variable. Thanks Matt Matthew Pirritano, Ph.D. Research Analyst IV Medical Services Initiative (MSI) Orange County Health Care Agency (714) 568-5648 From: SPSSX(r)
Discussion [mailto:[hidden email]] On
Behalf Of David J. Peng Hi everyone, Does anybody know a way to get around the 64 variable limit
for the Identify Duplicate Cases tool? Or another solution for large data with
tons of variables? Thanks, David |
|
In reply to this post by David J. Peng
Actually more appropriately it would be a
bunch of if statements using the lag function: First you would sort by all of the
variables you want to id dups on. Then just say: Do if (lag(a) ne a) or (lag(b)
ne b) … or (lag(z)
ne z).
Compute count = 1. Else if (lag(a) = a) &
(lag(b) = b) … &
(lag(z) = z)
Compute count = count + 1. End if Since you’ve got more than 64 I
would advise using excel to create the syntax. I use the drag function to
duplicate lots of duplicate syntax and the concatenate to create the full
syntax. Each line in excel could be the column (lag(
[new column] ‘a’ [just copy and past from the variable view
all of your variables] ) new column ne new column ‘a’ [copy and
past your variables again] new column ) Then you just drag all of the redundant
components to get as many as you need. Concatenate the columns and then paste
the finished product into spss syntax. Hope that’s sort of clear. There’s
probably a simpler way. But that’ s how I’d do it. Relatively quick
and painless though it would take up a lot of page space. Then you just select only the 1’s. matt Matthew Pirritano, Ph.D. Research Analyst IV Medical Services Initiative (MSI) Orange County Health Care Agency (714) 568-5648 From: SPSSX(r)
Discussion [mailto:[hidden email]] On
Behalf Of David J. Peng Hi everyone, Does anybody know a way to get around the 64 variable limit
for the Identify Duplicate Cases tool? Or another solution for large data with
tons of variables? Thanks, David |
|
In reply to this post by David J. Peng
Although a few
workarounds have been posted, this seems like a pretty unusual request. What
is it you are really trying to do? And note that
the SORT command, which you would need in order to arrange the cases for
detecting duplicates in adjacent ones, has a 64 variable limit, which is why
IDC is limited to 64. With care you can get around that with repeated
sorts, but still… Regards, Jon Peck From: SPSSX(r)
Discussion [mailto:[hidden email]] On
Behalf Of David J. Peng Hi everyone, Does anybody know a way to get around the 64 variable limit
for the Identify Duplicate Cases tool? Or another solution for large data with
tons of variables? Thanks, David |
|
In reply to this post by mpirritano
At 07:08 PM 5/4/2009, Pirritano, Matthew wrote:
[You could use] of IF statements using the lag function. First you would sort by all of the variables you want to id dups on.Do if (lag(a) ne a) or (lag(b) ne b) or (lag(z) ne z). Compute count = 1. Else if (lag(a) = a) & (lag(b) = b) & (lag(z) = z) Compute count = count + 1. End if Since youve got more than 64 I would advise using excel to create the syntax. Probably even easier with DO REPEAT, like this (untested). As in the above example, it only works if LEAVE is specified for "count". And treating the first case as special is necessary. (And take care what happens if you have missing data.) Code not tested: COMPUTE #Dup = 1. DO IF $CASENUM EQ 1. . COMPUTE #Dup = 0. ELSE. . DO REPEAT TestVar = a TO z. . IF LAG(TestVar) NE TestVar . #Dup = 0. . END REPEAT. DO IF #Dup. . COMPUTE Count = Count + 1. ELSE. . COMPUTE Count = 1. END IF. ====================To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
| Free forum by Nabble | Edit this page |
