See SORT, ADD FILES, LAG, XSAVE and do some research on flavors of SOUNDEX.
--
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"
|
The FUZZY extension command can do matches
that are, well, fuzzy, on numeric variables, but strings have to match
exactly, since there is no obvious metric for differences. For
names, though, there are some functions available for metrics. You
can get these in the extendedTransforms programmability module. I
can provide details if you want to go that route.
Soundex is a primitive way to code names into a 4-character code that roughly approximates the sound, so you could code the names and match on the codes. nysiis is a more sophisticated name-matching function. And, if you are mainly concerned about things like spelling errors, Levenshtein distance can be used, but it is a lot more complicated to set up.

So you could do the matching as a two-step process. In step 1, use FUZZY to do exact matches including the names. Remove the matched cases, and then do an exact match on the encoded names using one of the functions above. Or just do a 1-step process using, say, nysiis.

Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
new phone: 720-342-5621

From: David Marso <[hidden email]>
To: [hidden email]
Date: 02/16/2012 10:01 PM
Subject: Re: [SPSSX-L] Fuzzy matching
Sent by: "SPSSX(r) Discussion" <[hidden email]>

See SORT, ADD FILES, LAG, XSAVE and do some research on flavors of SOUNDEX.
--
vijayanti wrote
> I have two data sets that I would like to match using fuzzy matching
> in SPSS. Is SPSS able to do this? I have read about the Python
> function "Fuzzy" but am unsure of how to make this work with string
> variables.
>
> If I can't find an exact match by last name and first name, I want to
> do a fuzzy match using date of birth, last name and first name within
> geographic region.
>
> Cases that are an exact match on date of birth and geographic region
> and are a highly probable match on last name and first name would be
> matched together.
>
> Does anyone have an example of a syntax that would accomplish this?
>
> Vijayanti

--
View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Fuzzy-matching-tp5491229p5491485.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.

=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
|
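For readers who want to see what the two kinds of metric mentioned above actually do, here is a minimal Python sketch of standard American Soundex and of Levenshtein distance. It is an illustration only, written from the textbook definitions; it is not the code in extendedTransforms.py, and the function names soundex and levenshtein here are just local names chosen for the example.

```python
# Illustrative sketch only: textbook American Soundex and Levenshtein distance.
# This is NOT the extendedTransforms implementation; names are local to this example.

_CODES = {}
for _digit, _letters in enumerate(("bfpv", "cgjkqsxz", "dt", "l", "mn", "r"), start=1):
    for _ch in _letters:
        _CODES[_ch] = str(_digit)


def soundex(name):
    """Return the 4-character Soundex code for a name ('' if it has no letters)."""
    letters = [c for c in name.lower() if "a" <= c <= "z"]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = _CODES.get(letters[0], "")  # the first letter's code also blocks repeats
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters that share a code
        code = _CODES.get(ch, "")  # vowels and y map to ''
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeat after a vowel is coded again
    return (result + "000")[:4]  # pad with zeros, truncate to 4 characters


def levenshtein(a, b):
    """Return the minimum number of single-character edits turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


if __name__ == "__main__":
    for pair in (("Smith", "Smyth"), ("Robert", "Rupert")):
        print(pair, soundex(pair[0]), soundex(pair[1]), levenshtein(*pair))
```

The practical difference: Soundex gives each name a code you can match on exactly, so it fits the sort-and-merge style of matching, while Levenshtein distance is a number for each pair of names, so every candidate pair has to be compared and a cutoff chosen. That is presumably why it is more work to set up.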
We collect student data and our main issue is that our students use a variation of their name that doesn’t match their official name of record but is something close. I didn’t realize that you could use soundex or nysiis within SPSS. A one-step process like nysiis would be nice. Where can I get more information on this?

Veena
|
These functions are implemented using Python programmability, so you would need to
- install the Python Essentials from the SPSS Community website
- download the extendedTransforms.py module from the Utilities Collection on that site and save it in the extensions subdirectory of your Statistics installation (or elsewhere that Python can find it)
- download and install the SPSSINC TRANS extension command from the Extension Commands Collection
- to use FUZZY, download and install that extension command from the Extension Commands Collection if it isn't included in your Essentials module

That's the hard part. Then this syntax would generate the nysiis value in a variable named code, assuming an input variable called name.

    spssinc trans result=code type=30 /formula "extendedTransforms.nysiis(name)".

For soundex, it would be

    spssinc trans result=code type=30 /formula "extendedTransforms.soundex(name)".

Note that the letter case of the part in quotation marks matters.

HTH,
Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
new phone: 720-342-5621

From: Veena Nambiar <[hidden email]>
To: [hidden email]
Date: 02/17/2012 10:52 AM
Subject: Re: [SPSSX-L] Fuzzy matching
Sent by: "SPSSX(r) Discussion" <[hidden email]>

We collect student data and our main issue is that our students use a variation of their name that doesn’t match their official name of record but is something close. I didn’t realize that you could use soundex or nysiis within SPSS. A one-step process like nysiis would be nice. Where can I get more information on this?

Veena

From: SPSSX(r) Discussion [[hidden email]] On Behalf Of Jon K Peck
Sent: Friday, February 17, 2012 6:15 AM
To: [hidden email]
Subject: Re: Fuzzy matching

The FUZZY extension command can do matches that are, well, fuzzy, on numeric variables, but strings have to match exactly, since there is no obvious metric for differences. For names, though, there are some functions available for metrics. You can get these in the extendedTransforms programmability module. I can provide details if you want to go that route.
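Once each file has a code variable, the two-step process described earlier in the thread might look roughly like the Python sketch below. Everything in it is hypothetical and only for illustration: the record layout, the field names (id, dob, region, last, first, code) and the function two_step_match are not part of FUZZY or SPSSINC TRANS, and the code values are Soundex codes for the last name typed in by hand. In Statistics itself the same logic would be done with the matching commands rather than in Python.

```python
# Hypothetical sketch of the two-step match: exact on names first, then on the
# encoded name, always requiring equal date of birth and region.
# Field names and data are made up for illustration.

def two_step_match(left, right):
    """Return a list of (left_id, right_id, step) tuples."""
    matches = []
    used_right = set()  # ids of right-hand records that are already matched

    def run_step(key_fields, step, candidates):
        # Index the still-unmatched right-hand records by the key for this step.
        index = {}
        for rec in right:
            if rec["id"] not in used_right:
                key = tuple(rec[f] for f in key_fields)
                index.setdefault(key, []).append(rec)
        unmatched = []
        for rec in candidates:
            bucket = index.get(tuple(rec[f] for f in key_fields), [])
            # Take the first partner in the bucket that is still free.
            partner = next((r for r in bucket if r["id"] not in used_right), None)
            if partner is None:
                unmatched.append(rec)
            else:
                used_right.add(partner["id"])
                matches.append((rec["id"], partner["id"], step))
        return unmatched

    # Step 1: exact match, names included.
    remaining = run_step(("dob", "region", "last", "first"), "exact", left)
    # Step 2: for what is left, match on the encoded name instead of the raw names.
    run_step(("dob", "region", "code"), "encoded", remaining)
    return matches


left = [
    {"id": "L1", "dob": "1990-01-02", "region": "N", "last": "Smith", "first": "Ann", "code": "S530"},
    {"id": "L2", "dob": "1985-05-05", "region": "S", "last": "Smyth", "first": "Bob", "code": "S530"},
    {"id": "L3", "dob": "1979-09-09", "region": "N", "last": "Jones", "first": "Cy", "code": "J520"},
]
right = [
    {"id": "R1", "dob": "1990-01-02", "region": "N", "last": "Smith", "first": "Ann", "code": "S530"},
    {"id": "R2", "dob": "1985-05-05", "region": "S", "last": "Smith", "first": "Bob", "code": "S530"},
    {"id": "R3", "dob": "1979-09-09", "region": "N", "last": "Brown", "first": "Cy", "code": "B650"},
]

result = two_step_match(left, right)

if __name__ == "__main__":
    for row in result:
        print(row)
```

In this toy data L1 pairs with R1 in the exact step, L2 (Smyth) pairs with R2 (Smith) only in the encoded step, and L3 stays unmatched because Jones and Brown get different codes. Note that the sketch simply takes the first free partner when several records share a key; with real data you would want to review such ties.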