Hello list, I'm hoping someone can help me with this duplicate-record issue.
I have a file from which I need to exclude duplicate records, but some people are known by more than one ID.
My current duplicate check is by Record_Date, then Record_ID, then Person_ID, but if a person has an alternate ID the check gives the wrong result.
As you can see above, person 341 is also person 616 and appears in the same record as person 104. I need to count only one record for this person on 25 Oct 05, and I would still count a separate record for person 104 on the same day.
Can someone help with some syntax to identify the duplicates correctly? I can't permanently recode the IDs, because I will be receiving more data with additional duplicate records and will need to run this check again.
I have SPSS V15.
Thanks,
Christine
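A minimal Python sketch of why the current check fails. The thread works in SPSS, so this only illustrates the logic; the example table referred to above did not survive, so the rows below are hypothetical, with IDs held as strings. Keyed on Record_Date, Record_ID and Person_ID, the two IDs of one person look like two different people:

```python
# Hypothetical rows: person 341 also appears as 616 in the same record
# (Record_ID "R1") on 25 Oct 05, alongside person 104.
rows = [
    {"Record_Date": "2005-10-25", "Record_ID": "R1", "Person_ID": "341"},
    {"Record_Date": "2005-10-25", "Record_ID": "R1", "Person_ID": "616"},
    {"Record_Date": "2005-10-25", "Record_ID": "R1", "Person_ID": "104"},
]


def naive_key(row):
    """The current check: Record_Date, then Record_ID, then Person_ID."""
    return (row["Record_Date"], row["Record_ID"], row["Person_ID"])


unique_keys = {naive_key(r) for r in rows}
print(len(unique_keys))  # 3: 341 and 616 are counted as two different people
```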
Hi Christine,
1) Compute the lowest of all IDs of a person:
compute Person_ID_min = min(Person_ID, Person_ID_Alternate).
2) Run the duplicate-cases check using the new variable Person_ID_min in place of Person_ID. This ID is the same for every row belonging to one person, so the person in the first three rows will have 341 in all three cases.
Best regards,
Jan
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Christine
Sent: Friday, March 06, 2009 6:12 AM
To: [hidden email]
Subject: Identify duplicates from more than one variable
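Jan's two steps can be sketched in Python as follows, assuming the file has a Person_ID_Alternate column that is empty when a person has no second ID. The rows and the helper names `canonical_id` and `deduplicate` are hypothetical. Each row gets the smaller of its IDs, and only the first row per (Record_Date, Record_ID, Person_ID_min) key is kept:

```python
def canonical_id(person_id, alternate_id):
    """Smallest of a person's known IDs, ignoring a missing alternate.

    Mirrors compute Person_ID_min = min(Person_ID, Person_ID_Alternate).
    Assumes at least one of the two IDs is present.
    """
    ids = [i for i in (person_id, alternate_id) if i]  # drop None / ""
    return min(ids)


def deduplicate(rows):
    """Keep the first row for each (Record_Date, Record_ID, Person_ID_min)."""
    seen = set()
    kept = []
    for row in rows:
        key = (row["Record_Date"], row["Record_ID"],
               canonical_id(row["Person_ID"], row.get("Person_ID_Alternate")))
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


# Hypothetical rows: 341/616 are one person, 104 is another, same record and day.
rows = [
    {"Record_Date": "2005-10-25", "Record_ID": "R1",
     "Person_ID": "341", "Person_ID_Alternate": "616"},
    {"Record_Date": "2005-10-25", "Record_ID": "R1",
     "Person_ID": "616", "Person_ID_Alternate": "341"},
    {"Record_Date": "2005-10-25", "Record_ID": "R1",
     "Person_ID": "104", "Person_ID_Alternate": None},
]
kept = deduplicate(rows)
print([r["Person_ID"] for r in kept])  # ['341', '104']
```

Because the key is computed afresh on each run rather than written over the original IDs, the same steps can be repeated on every new batch of data.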
Ok, thanks for the replies so far, but the IDs are randomised and alphanumeric, i.e. they look like X-0701-G57P. Neither their numeric nor their alphabetic order follows any sequence. Sorry to complicate matters further; I didn't include the original format for privacy reasons and didn't realise it would make a difference.
The dataset runs into the thousands of cases, so I can't do this manually. Any further assistance would be much appreciated.
Thank you,
Christine
2009/3/6 Spousta Jan <[hidden email]>
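The lowest ID in Jan's suggestion does not need to be meaningful or sequenced; it only needs to be the same choice whenever the same pair of IDs turns up. For strings, Python's `min` compares character by character, which is arbitrary but consistent, so randomised alphanumeric IDs work as well as numbers. A small sketch with made-up IDs in the X-0701-G57P format (whether MIN() in SPSS V15 accepts string variables is worth confirming in its Command Syntax Reference):

```python
def canonical_id(person_id, alternate_id):
    """Lexicographically smallest non-missing ID (assumes at least one is present)."""
    return min(i for i in (person_id, alternate_id) if i)


# Hypothetical alphanumeric IDs for one person.
a, b = "X-0701-G57P", "K-9912-A03Z"

# The order in which the IDs appear does not matter: both calls pick the same string.
print(canonical_id(a, b))     # K-9912-A03Z
print(canonical_id(b, a))     # K-9912-A03Z
# A person with no alternate keeps their own ID.
print(canonical_id(a, None))  # X-0701-G57P
```

This still relies on both of a person's IDs being present on the same row, as Jan's approach assumes.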
In reply to this post by Christine-28
I was wondering about some kind of third variable. Could you have a look at my previous post about the order of the IDs (they're randomised) and tell me whether this could still work? I might try it to start with.
Thanks,
Christine
2009/3/6 Jason Schoeneberger <[hidden email]>
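One possible reading of the "third variable" idea, sketched in Python under stated assumptions: a separate lookup table, kept outside the data file, maps each known alternate ID to the one ID chosen for that person, and a new column is derived from it on each batch, so the original IDs are never recoded. The table `aliases` and the helpers `add_canonical_id` and `count_unique` are hypothetical, not taken from the thread, and chains (an alternate ID that itself has an alternate) are not resolved:

```python
def add_canonical_id(rows, aliases):
    """Return copies of rows with a Person_ID_Canonical column added.

    aliases maps an alternate ID to the chosen ID for that person.
    IDs not in the table map to themselves; Person_ID itself is left
    untouched, so nothing is permanently recoded.
    """
    out = []
    for row in rows:
        new_row = dict(row)
        new_row["Person_ID_Canonical"] = aliases.get(row["Person_ID"], row["Person_ID"])
        out.append(new_row)
    return out


def count_unique(rows):
    """Number of distinct (Record_Date, Record_ID, Person_ID_Canonical) keys."""
    return len({(r["Record_Date"], r["Record_ID"], r["Person_ID_Canonical"])
                for r in rows})


# Hypothetical lookup table: K-9912-A03Z is an alternate ID for X-0701-G57P.
aliases = {"K-9912-A03Z": "X-0701-G57P"}

batch = [
    {"Record_Date": "2005-10-25", "Record_ID": "R1", "Person_ID": "X-0701-G57P"},
    {"Record_Date": "2005-10-25", "Record_ID": "R1", "Person_ID": "K-9912-A03Z"},
    {"Record_Date": "2005-10-25", "Record_ID": "R1", "Person_ID": "B-1234-C56D"},
]
tagged = add_canonical_id(batch, aliases)
print(count_unique(tagged))  # 2
```

In SPSS, a keyed table lookup such as MATCH FILES with a /TABLE subcommand is one candidate for attaching such a column before the duplicate check; the exact syntax should be checked in the Command Syntax Reference.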
