Hi, I use the following syntax to identify duplicate cases: sort cases by ID. match files file=* /by ID / first = CheckDedupe. value labels CheckDedupe 0 'Duplicated record' 1 'Unique record'. freq CheckDedupe. What I'd like to do is not only identify the duplicate case with a 1 but also its original (first) record with a 1 also. Even better would be to identify the first pair (or possibly more) of duplicate cases with a 1 then the next set of duplicate cases with 2 and so on (hope that makes sense?)
Any help much appreciated. Many thanks, Jignesh |
The Identify Duplicate Cases wizard can
generate syntax for sequentially numbering the duplicates (MatchSequence).
Jon Peck Senior Software Engineer, IBM [hidden email] new phone: 720-342-5621 From: J Sutar <[hidden email]> To: [hidden email] Date: 08/02/2011 07:54 AM Subject: [SPSSX-L] Selecting Duplicate Cases (along with their unique pair) Sent by: "SPSSX(r) Discussion" <[hidden email]> Hi, I use the following syntax to identify duplicate cases: sort cases by ID. match files file=* /by ID / first = CheckDedupe. value labels CheckDedupe 0 'Duplicated record' 1 'Unique record'. freq CheckDedupe. What I'd like to do is not only identify the duplicate case with a 1 but also its original (first) record with a 1 also. Even better would be to identify the first pair (or possibly more) of duplicate cases with a 1 then the next set of duplicate cases with 2 and so on (hope that makes sense?) Any help much appreciated. Many thanks, Jignesh |
Administrator
|
In reply to this post by Jignesh Sutar
DO IF $CASENUM=1 OR ID <> LAG(ID) .
+ COMPUTE YOKEPAIR=1. ELSE. + COMPUTE YOKEPAIR=LAG(YOKEPAIR)+1. END IF. COMPUTE YOKEPAIR=TRUNC((YOKEPAIR+1)/2).
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
Administrator
|
In reply to this post by Jignesh Sutar
So you really want to create a new ID variable that ranges from 1 to N? How about AUTORECODE?
--
Bruce Weaver bweaver@lakeheadu.ca http://sites.google.com/a/lakeheadu.ca/bweaver/ "When all else fails, RTFM." PLEASE NOTE THE FOLLOWING: 1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above. 2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/). |
Administrator
|
I read the question as desire to create something like: ID Yokepair 1 1 1 1 1 2 1 2 2 1 3 1 3 1 3 2 3 2 4 1 4 1 ....
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
Administrator
|
The inclusion "or possibly more" (see below) made me think it's just a recode of the ID. I think the OP needs to clarify.
"What I'd like to do is not only identify the duplicate case with a 1 but also its original (first) record with a 1 also. Even better would be to identify the first pair (or possibly more) of duplicate cases with a 1 then the next set of duplicate cases with 2 and so on (hope that makes sense?)"
--
Bruce Weaver bweaver@lakeheadu.ca http://sites.google.com/a/lakeheadu.ca/bweaver/ "When all else fails, RTFM." PLEASE NOTE THE FOLLOWING: 1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above. 2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/). |
In reply to this post by Jignesh Sutar
At 09:42 AM 8/2/2011, J Sutar wrote:
>I use the following syntax to identify duplicate cases: > >sort cases by ID. >match files file=* /by ID / first = CheckDedupe. >value labels CheckDedupe 0 'Duplicated record' 1 'Unique record'. >freq CheckDedupe. > >What I'd like to do is not only identify the duplicate case with a 1 >but also its original (first) record with a 1 also. (untested) AGGREGATE OUTFILE=* MODE=ADDVARIABLES /BREAK=ID /CheckDedupe 'Occurrences of this ID in file' = NU. VALUE LABELS CheckDedupe 1 'ID is unique'. FREQUENCIES CheckDedupe. The cumulative count of all CheckDedupe greater than 1 is the number of *records* belonging to duplicated IDs. (The number of duplicated IDs, themselves, will be at most half this number, since each duplicated ID has at least two records.) >Even better would be to identify the first pair (or possibly more) >of duplicate cases with a 1 then the next set of duplicate cases >with 2 and so on (hope that makes sense?) There are ways of doing this -- some have been posted -- but, why bother? You already have an identifying variable for the pair (or triplet, or whatever): variable "ID", itself. If you want to study the duplicated IDs in your file, as you very well may, proceed as above and then SELECT IF CheckDedupe GT 1. to get all records belonging to duplicated IDs. (Don't do this on the master copy of your file, of course, as it *loses* the non-duplicated IDs.) Does this get you closer? ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
In reply to this post by Jon K Peck
I understand the original intention to be that the program
should define multiple pairs from a set of duplicates for one ID. Jon's result of a sequence 1,2,3,4, ... within an ID puts you there almost immediately. Just take the pairs in order: Recode seqvar(1,2=1)(3,4=2)(5,6=3) ... /into seqpairs . -- Rich Ulrich Date: Tue, 2 Aug 2011 08:02:38 -0600 From: [hidden email] Subject: Re: Selecting Duplicate Cases (along with their unique pair) To: [hidden email] The Identify Duplicate Cases wizard can generate syntax for sequentially numbering the duplicates (MatchSequence). Jon Peck Senior Software Engineer, IBM [hidden email] new phone: 720-342-5621 From: J Sutar <[hidden email]> To: [hidden email] Date: 08/02/2011 07:54 AM Subject: [SPSSX-L] Selecting Duplicate Cases (along with their unique pair) Sent by: "SPSSX(r) Discussion" <[hidden email]> Hi, I use the following syntax to identify duplicate cases: sort cases by ID. match files file=* /by ID / first = CheckDedupe. value labels CheckDedupe 0 'Duplicated record' 1 'Unique record'. freq CheckDedupe. What I'd like to do is not only identify the duplicate case with a 1 but also its original (first) record with a 1 also. Even better would be to identify the first pair (or possibly more) of duplicate cases with a 1 then the next set of duplicate cases with 2 and so on (hope that makes sense?) Any help much appreciated. Many thanks, Jignesh |
In reply to this post by Richard Ristow
AGGREGATE NU function (and the comments Richard made) captures the answer to the question I posted. Apologies if there was ambiguity. MatchSequence from Identify Duplicate Cases wizard works just as well with the added bonus in that it provides information on how many duplicate records there are for duplicated IDs.
Many thanks for the responses. Jignesh On 2 August 2011 21:12, Richard Ristow <[hidden email]> wrote:
|
Free forum by Nabble | Edit this page |