Hi Guys:
I have a "suspicious" file. By that I mean there could be duplicate information in it. Obviously not the same ID number, or name twice. Any ideas on how to check if there are duplicate records in it? Maybe someone has had such experience in the past and could point me in the right direction.

Thanks for your time,
At 11:30 AM 1/16/2007, Eugenio Grant wrote:
>Hi Guys:
>
>I have a "suspicious" file. By that I mean there could be duplicate
>information in it. Obviously not the same ID number, or name twice. Any
>ideas on how to check if there are duplicate records in it...

The first thing is that you have to decide what makes a record a "duplicate". If you simply mean that the values of all variables are the same, then you can easily check that by choosing Data/Identify duplicate cases from the drop-down menu. Select all variables with CTRL-A, and select the option to move all duplicates to the top of the file if you want to.

My guess is that you don't mean all variables are the same, but rather that *some* are. Once you decide which variables, taken together, should uniquely define a case (so that two records matching on all of them count as duplicates), you can use the same menu system to identify your duplicates.

Bob Schacht

>Maybe someone has had such experience in the past and could point me in the
>right direction.
>
>Thanks for your time,

Robert M. Schacht, Ph.D. <[hidden email]>
Pacific Basin Rehabilitation Research & Training Center
1268 Young Street, Suite #204
Research Center, University of Hawaii
Honolulu, HI 96814
In reply to this post by Eugenio Grant
The SPSSX-L list server accused me of sending a duplicate message yesterday, which it therefore refused to post. However, as far as I have been able to tell, my message has not appeared on the list even once, so I am sending it again, adding and deleting a few things.

At 11:30 AM 1/16/2007, Eugenio Grant wrote:
>I have a "suspicious" file. By that I mean there could be duplicate
>information in it. Obviously not the same ID number, or name twice. Any
>ideas on how to check if there are duplicate records in it...

The first thing is that you have to decide what makes a record a "duplicate". If you simply mean that the values of all variables are the same, then you can easily check that by choosing Data/Identify duplicate cases from the drop-down menu. Select all variables with CTRL-A, and select the option to move all duplicates to the top of the file if you want to.

My guess is that you don't mean all variables are the same, but rather that *some* are. You will have to decide how a duplicate case would be identified. Once you decide which variables, taken together, should uniquely define a case (so that two records matching on all of them count as duplicates), you can use the same menu system (Data/Identify duplicate cases) and choose just that set of variables to identify your duplicates.

This is part of the general class of problems under the heading, "When is a duplicate really a duplicate?"

Bob Schacht

>Maybe someone has had such experience in the past and could point me in the
>right direction.
>
>Thanks for your time,

Robert M. Schacht, Ph.D. <[hidden email]>
Pacific Basin Rehabilitation Research & Training Center
1268 Young Street, Suite #204
Research Center, University of Hawaii
Honolulu, HI 96814
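For readers who want to see the "which variables define a duplicate" idea spelled out, here is a minimal sketch in plain Python. It is not SPSS syntax and not what the Data/Identify duplicate cases dialog runs internally; it only illustrates the logic. The helper `find_duplicates`, the variable names (`id`, `name`, `birthdate`, `zip`), and the sample records are all hypothetical, made up for this example.

```python
from collections import defaultdict


def find_duplicates(records, key_vars):
    """Group row indexes of records that share the same values on key_vars.

    Hypothetical helper for illustration: returns a dict mapping each
    repeated key (a tuple of values) to the list of matching row indexes.
    """
    groups = defaultdict(list)
    for index, record in enumerate(records):
        key = tuple(record[var] for var in key_vars)
        groups[key].append(index)
    # Keep only keys that occur in more than one record.
    return {key: rows for key, rows in groups.items() if len(rows) > 1}


# Made-up sample data: rows 0 and 2 have different IDs and name spellings
# but share a birthdate and zip code.
records = [
    {"id": 1, "name": "Ann Lee", "birthdate": "1970-01-02", "zip": "96814"},
    {"id": 2, "name": "Bob Ray", "birthdate": "1982-05-17", "zip": "96822"},
    {"id": 3, "name": "A. Lee", "birthdate": "1970-01-02", "zip": "96814"},
]

# Case 1: a duplicate means every variable matches.
exact = find_duplicates(records, ["id", "name", "birthdate", "zip"])
print(exact)    # {}

# Case 2: a duplicate means only *some* variables match.
partial = find_duplicates(records, ["birthdate", "zip"])
print(partial)  # {('1970-01-02', '96814'): [0, 2]}
```

With all variables as the key, nothing is flagged, because the ID and name differ. With only `birthdate` and `zip` as the key, rows 0 and 2 are flagged as a possible duplicate pair, which is the situation the original question seems to describe. Which variables belong in the key is the judgment call discussed above.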