I'm trying to figure out a way to identify both the original plus the
duplicate case. The next step would be to identify which survey was entered first, and delete the other, but I think I need to get both cases first. I've been using LAG, and successfully generated a numeric or string variable. Here's the string version. STRING MultiParent (A20). SORT CASES BY RecordID (A). COMPUTE MultiParent = STRING(RecordID,F8.0). IF LAG(RecordID) = RecordID MultiParent=CONCAT(STRING(RecordID,F8.0)," 2nd"). EXECUTE. As a temporary workaround, I'm using "SORT CASES BY MultiParent Vdatesub" which is OK because there are (so far) only a few duplicates. But is there a better way? Thanks! Mike _________________________________________________________________________ Mike Pritchard | [hidden email] | 5 Circles Research | 425-444-3410 (c) | 425-968-3883 (o) ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
Hm, and why exactly are you not using the "identify duplicate id" function?
bozena -----Original Message----- From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Mike Pritchard Sent: October 28, 2015 11:16 AM To: [hidden email] Subject: Identify duplicate case plus original I'm trying to figure out a way to identify both the original plus the duplicate case. The next step would be to identify which survey was entered first, and delete the other, but I think I need to get both cases first. I've been using LAG, and successfully generated a numeric or string variable. Here's the string version. STRING MultiParent (A20). SORT CASES BY RecordID (A). COMPUTE MultiParent = STRING(RecordID,F8.0). IF LAG(RecordID) = RecordID MultiParent=CONCAT(STRING(RecordID,F8.0)," 2nd"). EXECUTE. As a temporary workaround, I'm using "SORT CASES BY MultiParent Vdatesub" which is OK because there are (so far) only a few duplicates. But is there a better way? Thanks! Mike _________________________________________________________________________ Mike Pritchard | [hidden email] | 5 Circles Research | 425-444-3410 (c) | 425-968-3883 (o) ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
In reply to this post by Mike Pritchard
Do you have a date-of-entry variable? Are you using double entry for quality control?
What do you want to do if the two entries of what is supposed to be identical data in fact differs? Do you have access to original forms, papers, etc. that is being entered? Please describe your situation in more detail. As another reply said, did you try "identify duplicate cases"?
Art Kendall
Social Research Consultants |
In reply to this post by Mike Pritchard
At 02:16 PM 10/28/2015, Mike Pritchard wrote:
>I'm trying to figure out a way to identify both the original plus the >duplicate case. The next step would be to identify which survey was entered >first, and delete the other, but I think I need to get both cases first. Would this (untested) do what you want? It keeps only the record with the lowest value of Vdatesub among those with the same RecordID. (It assumes that Vdatesub is an SPSS date variable, or is in some other form that sorts with earliest value first.) Adding variable NInstances, to flag records that occur in duplicate, is optional. AGGREGATE OUTFILE=* /BREAK = RecordID /NInstances 'Instances of this value of RecordID' = NU /Earliest 'Earliest date for this RecordID' = MIN(Vdatesub). Then, to keep only the earliest (but if there are several records on the earliest date, it will keep them all), SELECT IF Vdatesub EQ Earliest. STRING MultiParent (A20). SORT CASES BY RecordID (A). COMPUTE MultiParent = STRING(RecordID,F8.0). IF LAG(RecordID) = RecordID MultiParent=CONCAT(STRING(RecordID,F8.0)," 2nd"). EXECUTE. As a temporary workaround, I'm using "SORT CASES BY MultiParent Vdatesub" which is OK because there are (so far) only a few duplicates. But is there a better way? ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
In reply to this post by Art Kendall
Thanks Art. I didnt want to overload the question with background, but I
can see now that would have been helpful. I didn't see the other reply about "identify duplicate cases" - I couldn't find it in spam folders. I've dabbled with it in the past, but always been a little scared because the syntax generated makes it look like cases are dropped, not just identified. But I tried it and it seems to do exactly what I needed. So, the background is probably moot, but just in case... The survey is for satisfaction among soccer club members who are parents/guardians of the players. Sampling was complicated because of a couple of factors including multiple players in a household and two sets of parent/guardian fields for most of the records. We decided to send invitations to a single person in the household. It turns out this was a mistake for a couple of reasons, including a scolding email from the president complaining about low response rate - sent to everyone. We "fixed" this with a campaign to parent/guardian 2, but now have to deal with multiple responses from different people in the household. They aren't all multiples - sometimes parent/guardian 1 didn't take the survey. We do have a recordID that uniquely identifies each player, so that's what I used as part of the SPSS identify duplicates procedure - along with date submitted to accept the first survey submitted. We still have to deal with other combinations including the multiplayer households who may be duplicates for parts of the survey. So there is still some work to do to get to a clean data set. Mike _________________________________________________________________________ Mike Pritchard | [hidden email] | 5 Circles Research | 425-444-3410 (c) | 425-968-3883 (o) -----Original Message----- From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Art Kendall Sent: Wednesday, October 28, 2015 12:06 PM To: [hidden email] Subject: Re: Identify duplicate case plus original Do you have a date-of-entry variable? Are you using double entry for quality control? What do you want to do if the two entries of what is supposed to be identical data in fact differs? Do you have access to original forms, papers, etc. that is being entered? Please describe your situation in more detail. As another reply said, did you try "identify duplicate cases"? ----- Art Kendall Social Research Consultants -- View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Identify-duplicate-case-plus-o riginal-tp5730896p5730898.html Sent from the SPSSX Discussion mailing list archive at Nabble.com. ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
In reply to this post by Zdaniuk, Bozena-3
Hi. We've probably moved past this, but the original reason for not using
"identify duplicate cases" was because of some problems I thought I had and misunderstandings from long ago, as noted in another reply. Then when I looked up "SPSS identify duplicate cases" I found an article from the University of Washington. Even if it didn't resolve the issue (and the real SPSS capability did), I learned a bit about LAG. Thanks Mike _________________________________________________________________________ Mike Pritchard | [hidden email] | 425-444-3410 (c) | 425-968-3883 (o) -----Original Message----- From: Zdaniuk, Bozena [mailto:[hidden email]] Sent: Wednesday, October 28, 2015 11:42 AM To: 'Mike Pritchard' <[hidden email]>; [hidden email] Subject: RE: Identify duplicate case plus original Hm, and why exactly are you not using the "identify duplicate id" function? bozena -----Original Message----- From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Mike Pritchard Sent: October 28, 2015 11:16 AM To: [hidden email] Subject: Identify duplicate case plus original I'm trying to figure out a way to identify both the original plus the duplicate case. The next step would be to identify which survey was entered first, and delete the other, but I think I need to get both cases first. I've been using LAG, and successfully generated a numeric or string variable. Here's the string version. STRING MultiParent (A20). SORT CASES BY RecordID (A). COMPUTE MultiParent = STRING(RecordID,F8.0). IF LAG(RecordID) = RecordID MultiParent=CONCAT(STRING(RecordID,F8.0)," 2nd"). EXECUTE. As a temporary workaround, I'm using "SORT CASES BY MultiParent Vdatesub" which is OK because there are (so far) only a few duplicates. But is there a better way? Thanks! Mike _________________________________________________________________________ Mike Pritchard | [hidden email] | 5 Circles Research | 425-444-3410 (c) | 425-968-3883 (o) ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
In reply to this post by Richard Ristow
Thanks Richard. I may need to use some additional techniques for this
particular project, so thanks for the syntax ideas. Mike _________________________________________________________________________ Mike Pritchard | [hidden email] | 425-444-3410 (c) | 425-968-3883 (o) -----Original Message----- From: Richard Ristow [mailto:[hidden email]] Sent: Wednesday, October 28, 2015 2:48 PM To: Mike Pritchard <[hidden email]>; [hidden email] Subject: Re: Identify duplicate case plus original At 02:16 PM 10/28/2015, Mike Pritchard wrote: >I'm trying to figure out a way to identify both the original plus the >duplicate case. The next step would be to identify which survey was >entered first, and delete the other, but I think I need to get both cases first. Would this (untested) do what you want? It keeps only the record with the lowest value of Vdatesub among those with the same RecordID. (It assumes that Vdatesub is an SPSS date variable, or is in some other form that sorts with earliest value first.) Adding variable NInstances, to flag records that occur in duplicate, is optional. AGGREGATE OUTFILE=* /BREAK = RecordID /NInstances 'Instances of this value of RecordID' = NU /Earliest 'Earliest date for this RecordID' = MIN(Vdatesub). Then, to keep only the earliest (but if there are several records on the earliest date, it will keep them all), SELECT IF Vdatesub EQ Earliest. STRING MultiParent (A20). SORT CASES BY RecordID (A). COMPUTE MultiParent = STRING(RecordID,F8.0). IF LAG(RecordID) = RecordID MultiParent=CONCAT(STRING(RecordID,F8.0)," 2nd"). EXECUTE. As a temporary workaround, I'm using "SORT CASES BY MultiParent Vdatesub" which is OK because there are (so far) only a few duplicates. But is there a better way? ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
Free forum by Nabble | Edit this page |