In my dataset I have multiple instances of a single id, with one of five
possible codes assigned to each instance of that id. I want to keep only unique instances of each code, which would result in five records for each id. Is there a way to do this in SPSS?

=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command INFO REFCARD
Alina,
>>In my dataset I have multiple instances of a single id, with one of five possible codes assigned to each instance of that id. I want to keep only unique instances of each code, which would result in five records for each id.

Yes, I'm pretty sure it's not too hard, but your message, especially the part about '... one of five possible codes assigned to each instance of that id', is kind of cryptic. I'll assume you have some other mechanism for doing the code assignment. Anyway, let ID be your id variable and Code be your code variable.

* Sort so that repeated ID/Code combinations sit on adjacent cases.
Sort cases by ID Code.
* RecID numbers the distinct codes within each ID. Dup flags a case that repeats the ID and Code of the case before it.
Compute RecID=1.
Compute Dup=0.
Do if (ID eq lag(ID)).
+ Do if (Code eq lag(Code)).
+ Compute RecID=lag(RecID).
+ Compute Dup=1.
+ Else.
+ Compute RecID=lag(RecID)+1.
+ End if.
End if.
* Keep only the first case of each ID/Code combination.
Select if (Dup eq 0).

Gene Maguin
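A commonly used alternative to the LAG-based flag is MATCH FILES with the /FIRST subcommand, which marks the first case of each BY group. The sketch below is a minimal, untested illustration, assuming (as in Gene's example) that the variables are named ID and Code. The DATA LIST sample is hypothetical and is included only so the syntax can be run as-is; FirstRec is likewise an illustrative name.

```spss
* Hypothetical sample data: ID 1 has code 2 twice and code 5 twice.
DATA LIST FREE / ID Code.
BEGIN DATA
1 1  1 2  1 2  1 3  1 5  1 5
2 1  2 4  2 4  2 4
END DATA.

* MATCH FILES with BY requires the file to be sorted on the BY variables.
SORT CASES BY ID Code.

* FirstRec is 1 for the first case of each ID/Code group and 0 otherwise.
MATCH FILES /FILE=* /BY ID Code /FIRST=FirstRec.

* Keep one case per ID/Code combination.
SELECT IF (FirstRec = 1).
LIST.
```

With the sample data this should leave six cases: codes 1, 2, 3 and 5 for ID 1, and codes 1 and 4 for ID 2. Sorting by an additional variable after Code (for example a time-of-entry variable) controls which record within each group counts as the first.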
Alternatively, you could aggregate on ID and Code to get unique
records of the ID/Code combination as well as a count of the number of records of each.

Melissa

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Gene Maguin
Sent: Monday, December 10, 2007 4:35 PM
To: [hidden email]
Subject: Re: [SPSSX-L] Keeping unique records
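Melissa's AGGREGATE suggestion could look roughly like the sketch below. It assumes the active dataset contains the variables ID and Code, and NRecords is a hypothetical name for the count. With /OUTFILE=*, the active dataset is replaced by one case per ID/Code combination, so any other variables are dropped unless they are named with a summary function.

```spss
* Collapse the active dataset to one case per ID/Code combination.
* N with no argument counts the cases in each break group.
* N is weighted if WEIGHT is in effect. NU gives the unweighted count.
AGGREGATE
  /OUTFILE=*
  /BREAK=ID Code
  /NRecords=N.
LIST.
```

Any case with NRecords greater than 1 marks a combination that had duplicates in the original file.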
At 04:56 PM 12/10/2007, Alina Sheyman wrote:
>In my dataset I have multiple instances of a single id, with one of
>five possible codes assigned to each instance of that id. I want to
>keep only unique instances of each code, which would result in five
>records for each id. Is there a way to do this in SPSS?

That is (on Gene's and Melissa's understanding, which is also mine), you may have multiple records for some combinations of id and code, and you want to keep only one for each combination. As they both wrote, it isn't too hard. I haven't looked hard at Gene's code, nor tested it, but it looks perfectly workable to me.

That's the programming side. On the analysis side, always be careful when doing this. Ask yourself: where did those 'duplicates' come from? Are they always identical on all variables? If not, do they actually represent separate events, or something, so that keeping only one throws away data - and if so, WHICH one do you keep? Similarly, if they represent different copies of what should be the same information, which copy is most reliable: the first one by time of entry, the last one, or what?

Sometimes (as an example - do NOT leap on it as the solution for you), all the records, though with different values, may be representative of that combination of id and code; in that case, averaging using AGGREGATE may be more appropriate than selecting one of the records.

-With best wishes,
Richard
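Richard's two checks (do the duplicates actually differ, and would averaging be reasonable) might be sketched as follows. This is an untested illustration, assuming a hypothetical numeric variable Score that could differ between duplicate records; NInGroup, ScoreMin, ScoreMax, ScoreDiffers and ScoreMean are illustrative names. MODE=ADDVARIABLES attaches the group summaries to every case without collapsing the file, so nothing is discarded at the inspection step.

```spss
* Hypothetical variable: Score, a value that may differ between duplicate records.
* Step 1: add group summaries to every case without collapsing the file.
AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /BREAK=ID Code
  /NInGroup=N
  /ScoreMin=MIN(Score)
  /ScoreMax=MAX(Score).

* ScoreDiffers is 1 when records in the same ID/Code group disagree on Score.
COMPUTE ScoreDiffers = (ScoreMax > ScoreMin).
* This table counts cases, not ID/Code groups.
FREQUENCIES VARIABLES=ScoreDiffers.

* Step 2, only if averaging is defensible: one case per ID/Code with the mean Score.
AGGREGATE
  /OUTFILE=*
  /BREAK=ID Code
  /ScoreMean=MEAN(Score)
  /NRecords=N.
LIST.
```

If the most recent copy is judged most reliable instead, a time-of-entry variable (hypothetically EntryDate) can drive the choice: SORT CASES BY ID Code EntryDate, then MATCH FILES /FILE=* /BY ID Code /LAST=LastRec, and finally SELECT IF (LastRec = 1).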
