Fuzzy Matching Help!

DKUKEC
I was wondering if someone could recommend a resource for developing
matching syntax that estimates the likelihood that a match exists
between individual records in two different tables.  For example, it
would be a one-to-many match against offenders by first name, middle
name, last name, sex, date of birth, race, and, if I am lucky, social
security number.  Ideally, I would like to have a score associated with
each unique record that indicates the likelihood of a match, such as
100% or 75%.
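
As an illustration of the kind of score described above, here is a minimal Python sketch that compares two records field by field and returns a weighted similarity between 0 and 1. It is not taken from FEBRL, CASECTRL.py, or any tool named in this thread. The field names, the weights, and the 0.75 cutoff are all hypothetical, and plain string similarity is a crude measure for dates and ID numbers.

```python
from difflib import SequenceMatcher

# Hypothetical field weights (assumed for illustration); they sum to 1.0.
WEIGHTS = {
    "first_name": 0.15,
    "middle_name": 0.05,
    "last_name": 0.20,
    "sex": 0.05,
    "dob": 0.20,
    "race": 0.05,
    "ssn": 0.30,
}


def field_similarity(a, b):
    """Return a 0-1 string similarity, or None when either value is missing."""
    if not a or not b:
        return None
    left = str(a).strip().lower()
    right = str(b).strip().lower()
    return SequenceMatcher(None, left, right).ratio()


def match_score(rec_a, rec_b, weights=WEIGHTS):
    """Weighted average similarity over the fields present in both records."""
    total = 0.0
    used = 0.0
    for field, weight in weights.items():
        sim = field_similarity(rec_a.get(field), rec_b.get(field))
        if sim is None:
            continue  # missing field: skip it and renormalise below
        total += weight * sim
        used += weight
    return total / used if used else 0.0


def rank_candidates(record, candidates, threshold=0.75):
    """Score one record against many; keep pairs at or above the threshold."""
    scored = [(match_score(record, c), c) for c in candidates]
    kept = [pair for pair in scored if pair[0] >= threshold]
    return sorted(kept, key=lambda pair: pair[0], reverse=True)
```

Calling `rank_candidates(offender, other_table_rows)` with dictionaries keyed by the field names above would return the candidate rows sorted from most to least likely match. In practice the weights would need tuning against pairs already known to match.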

I have reviewed and even tried to use FEBRL; however, I have had no
luck getting it working - as I am not a computer programmer.  I am
also not having any luck with CASECTRL.py from the developer center.
Any suggestions would be greatly appreciated.


Cheers,
Damir

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Re: Fuzzy Matching Help!

vlad simion
Hi Damir,

Try propensity score matching based on logistic regression.
Search this list for that keyword and I think you'll find some examples.
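
For readers unfamiliar with the term, a rough sketch of the matching step follows. It assumes the propensity scores (predicted probabilities of group membership) have already been saved from a logistic regression. The function name, the greedy nearest-neighbour rule, and the 0.05 caliper are illustrative assumptions, not something specified in this thread. Note that this technique pairs similar cases from two groups; it does not by itself identify the same person in two tables.

```python
def greedy_match(treated, controls, caliper=0.05):
    """Pair each treated case with the closest unused control case.

    treated, controls: dicts mapping a case id to its propensity score.
    A pair is kept only if the scores differ by no more than the caliper.
    Returns a dict {treated_id: control_id}.
    """
    available = dict(controls)  # copy so the caller's dict is not modified
    pairs = {}
    # Process treated cases from highest to lowest score.
    ordered = sorted(treated.items(), key=lambda kv: kv[1], reverse=True)
    for t_id, t_score in ordered:
        if not available:
            break
        c_id = min(available, key=lambda cid: abs(available[cid] - t_score))
        if abs(available[c_id] - t_score) <= caliper:
            pairs[t_id] = c_id
            del available[c_id]  # match without replacement
    return pairs
```

Treated cases with no control inside the caliper are left unmatched, which is the usual trade-off between match quality and sample size.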

Best of luck,
Vlad


On Fri, Oct 16, 2009 at 3:44 PM, Damir <[hidden email]> wrote:


Re: Fuzzy Matching Help!

Muccatira, Devaiah M.
Damir,

Have you tried FRIL?

Devaiah Muccatira


Re: Fuzzy Matching Help!

Anton-24
In reply to this post by DKUKEC
Hi Damir-

Try Link Plus, which is free from the CDC. It's easy to learn and does a
fairly good job of identifying potential matches.

http://www.cdc.gov/cancer/npcr/tools/registryplus/lp.htm

Anton



Re: Fuzzy Matching Help!

Dennis Deck
In reply to this post by DKUKEC
Link Plus is a nice little product and it is available at no charge.
Another is Link King (http://www.the-link-king.com/index.html) - this is
also free and has exceptional performance.
However, note that Link King requires SAS, whereas Link Plus is a
standalone program.

A couple of starter resources on propensity score matching are:
  http://en.wikipedia.org/wiki/Propensity_score_matching
  http://www.chrp.org/love/ASACleveland2003Propensity.pdf
  http://www.urban.org/toolkit/data-methods/propensity.cfm
Some articles are:

 - Rosenbaum PR, Rubin DB. The central role of the propensity score in
observational studies for causal effects. Biometrika. 1983;70:41-55.

 - D'Agostino RB. Tutorial in Biostatistics: Propensity score methods
for bias reduction in the comparison of a treatment to a non-randomized
control group. Statistics in Medicine. 1998;17(19):2265-2281.

 - Shadish, WR, Clark MH. An introduction to propensity scores.
Metodologia de las Ciencias del Comportamiento Journal.
2002;4(2):291-300.

I recommend considering how you will apply the propensity scores - there
are various alternatives discussed in the linkage literature.  For
example, one can apply weights rather than draw a sample. One such
approach is Inverse Probability of Treatment Weights (IPTW) - see
articles by Raudenbush.
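
To make the weighting idea concrete, here is a small Python sketch of the standard IPTW formula: a treated case gets weight 1/p and a comparison case gets 1/(1 - p), where p is its propensity score. The function name and input layout are assumptions for illustration, and the scores are taken as already estimated.

```python
def iptw_weights(treated_flags, propensity_scores):
    """Return one inverse-probability-of-treatment weight per case.

    treated_flags: iterable of 1/0 (or True/False) group indicators.
    propensity_scores: iterable of estimated P(treated), each in (0, 1).
    """
    weights = []
    for treated, p in zip(treated_flags, propensity_scores):
        if not 0.0 < p < 1.0:
            raise ValueError("propensity scores must lie strictly between 0 and 1")
        # Treated cases are weighted by 1/p, comparison cases by 1/(1 - p).
        weights.append(1.0 / p if treated else 1.0 / (1.0 - p))
    return weights
```

Scores very close to 0 or 1 produce very large weights, so it is worth inspecting the weight distribution before using it in an analysis.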


Dennis Deck
RMC Research
