How to look for possible name matches in restricted subsets?


Art Kendall
Context: I suspect that field operations may not have been consistent in
ensuring that Pre and Post cases were correctly matched.

There are a few hundred HouseIDs in each city.
Each case has
Country City HouseID PrePost Name sex v1 v2 v3.
PrePost has values 1 'Pre' 2 'Post'.


As a quality check on whether a HouseID plausibly refers to roughly the
same people, I am looking for a way to get data to eyeball for Name sex v1
v2 v3.
1) Take the first (say) 5 or 10 HouseIDs with 'Pre' on PrePost in a city.
2) See if there is overlap between the set of names in that HouseID and
those in any HouseID in the 'Post' set for that city.
3) Output needed: 1 'HouseID with some overlap' 2 'No overlap with any
HouseID'.
4) Overlap means that at least 1 Name from a 'Pre' HouseID has a match.

Match means the last word in the set of words is the same AND at least 1 of
the other words matches.
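
For illustration only, the match rule above could be sketched in Python roughly as follows. This is a minimal sketch under assumptions not stated in the post: words are separated by whitespace, comparison ignores case, and a one-word name can never match because it has no "other words". The function names and the dictionary layout (Post HouseID mapped to a list of names) are hypothetical.

```python
def words(name):
    """Split a name into lowercase words (assumes whitespace-separated words)."""
    return name.lower().split()


def names_match(name_a, name_b):
    """True if the last words are equal AND at least one other word is shared."""
    a, b = words(name_a), words(name_b)
    if not a or not b:
        return False
    if a[-1] != b[-1]:
        return False
    # Compare everything except the last word of each name.
    return bool(set(a[:-1]) & set(b[:-1]))


def overlapping_post_houses(pre_names, post_houses):
    """Return the Post HouseIDs that contain at least one name matching a Pre name.

    pre_names   -- list of names recorded for one 'Pre' HouseID
    post_houses -- dict mapping each 'Post' HouseID in the same city to its names
    """
    return [
        house_id
        for house_id, post_names in post_houses.items()
        if any(names_match(p, q) for p in pre_names for q in post_names)
    ]
```

An empty result for a 'Pre' HouseID would correspond to 'No overlap with any HouseID'; a non-empty one lists the candidates to eyeball.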






-----
Art Kendall
Social Research Consultants
--
Sent from: http://spssx-discussion.1045642.n5.nabble.com/

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Re: How to look for possible name matches in restricted subsets?

Maguin, Eugene
I think I understand the problem, but some elements are confusing. It sounds like you're doing a household census. Is Name the FN+LN of one person for a one-person household and FN1+LN1, ... FN(j)+LN(j) for a j-person household? Or is Name the FN+LN of a specific person in the household?
In your definition of a match, does 'set of words' refer only to the contents of Name, i.e., the FNs and LNs, or does it refer to the contents of Name plus sex, v1, v2, v3?
Maybe an example?

Gene Maguin


-----Original Message-----
From: SPSSX(r) Discussion <[hidden email]> On Behalf Of Art Kendall
Sent: Saturday, April 24, 2021 11:28 AM
To: [hidden email]
Subject: How to look for possible name matches in restricted subsets?

Re: How to look for possible name matches in restricted subsets?

Andy W
In reply to this post by Art Kendall
How about this, Art: calculate the Levenshtein distance between the name
strings, plus a binary same/different flag for the sex variable. This can be
done for the whole set, not just a small sample.

Then you can combine those two differences somehow (e.g. normalized
Levenshtein over 0.2 + different sex), and then aggregate up to the city
level. Or do a scatterplot/boxplot with City on X and distances on Y to see
if any city has weird distances.
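
As a rough illustration of this idea, here is a minimal Python sketch. It rests on several assumptions that are not in the thread: Pre and Post records are already lined up one-to-one, the distance is normalized by the length of the longer name, and "over 0.2 + different sex" is read as "both conditions hold". The record layout (dicts with "name" and "sex") and all function names are hypothetical.

```python
from collections import defaultdict


def levenshtein(a, b):
    """Edit distance between strings a and b (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute (or keep)
        prev = curr
    return prev[-1]


def normalized_distance(a, b):
    """Levenshtein distance divided by the longer length; 0.0 if both are empty."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def flag_pair(pre, post, cutoff=0.2):
    """Assumed rule: flag when the name distance exceeds cutoff AND sex differs."""
    dist = normalized_distance(pre["name"].lower(), post["name"].lower())
    return dist > cutoff and pre["sex"] != post["sex"]


def city_summary(pairs):
    """Aggregate (city, pre, post) triples to city -> (mean distance, share flagged)."""
    by_city = defaultdict(list)
    for city, pre, post in pairs:
        dist = normalized_distance(pre["name"].lower(), post["name"].lower())
        by_city[city].append((dist, flag_pair(pre, post)))
    return {
        city: (sum(d for d, _ in rows) / len(rows),
               sum(1 for _, f in rows if f) / len(rows))
        for city, rows in by_city.items()
    }
```

The per-pair distances collected in `city_summary` are the same values one would put on the Y axis of the boxplot by City.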

To get an estimate of what is a reasonable distance, you might randomly
match pairs you know are different (see
https://andrewpwheeler.com/2015/07/01/some-ad-hoc-fuzzy-name-matching-within-police-databases/),
but I bet that even without going to all that trouble, if your assertion is
correct, some cities will show a much larger variance in the distances.
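
One hedged way to build such a baseline in Python is sketched below. It is not the method from the linked post: it assumes that records drawn from different HouseIDs are different people, and it uses `difflib.SequenceMatcher` from the standard library as a stand-in distance rather than Levenshtein. Any other distance function can be passed in via the `distance` parameter. The function names and the (house_id, name) tuple layout are hypothetical.

```python
import random
from difflib import SequenceMatcher


def ratio_distance(a, b):
    """Stand-in distance in [0, 1]: 1 minus difflib's similarity ratio (not Levenshtein)."""
    return 1.0 - SequenceMatcher(None, a.lower(), b.lower()).ratio()


def random_pair_distances(pre, post, n_pairs=1000, distance=ratio_distance, seed=0):
    """Distances for randomly drawn Pre/Post pairs taken from different HouseIDs.

    pre, post -- lists of (house_id, name) tuples
    Returns up to n_pairs distances; fewer if too few valid pairs are drawn.
    """
    rng = random.Random(seed)  # seeded so the baseline is reproducible
    distances = []
    if not pre or not post:
        return distances
    attempts = 0
    while len(distances) < n_pairs and attempts < 10 * n_pairs:
        attempts += 1
        (house_pre, name_pre) = rng.choice(pre)
        (house_post, name_post) = rng.choice(post)
        if house_pre != house_post:  # assumed to be different people
            distances.append(distance(name_pre, name_post))
    return distances
```

Distances for the supposedly matched Pre/Post pairs that look like this baseline would suggest the pairing is no better than chance.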



-----
Andy W
[hidden email]
http://andrewpwheeler.wordpress.com/
Re: How to look for possible name matches in restricted subsets?

Art Kendall
In reply to this post by Maguin, Eugene
The NGO meant well. They just picked houses here and there before
evacuation. After the return, they went back but had no way to confirm they
went to the same house. <head slap>

I just need a few things to point out that would support my argument that,
*as is*, the data can only be a way to point out problems to avoid in the
future.



-----
Art Kendall
Social Research Consultants