I must not be searching the archives correctly.
I seem to recall that there were some extensions to calculate the distances between strings, e.g., "123 Oak street Someplace MD" is closer to "123 Oak St, someplace, MD 21111" than it is to "456 Maple Street Someplace MD". I have version 21.
Art Kendall
Social Research Consultants
These are functions in the extendedTransforms.py
module, and they can be used with the SPSSINC TRANS extension command.
levenshteindistance: calculate similarity between two strings
jaroWinkler: calculate Jaro-Winkler string similarity measure
DiceStringSimilarity: compare strings using Dice bigram metric
Dictdict: find best match of strings using Dice metric
There are also a few specialized encoding functions for names (soundex, nysiis).
There are examples for most of those with SPSSINC TRANS in comments in extendedTransforms.py.
Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
phone: 720-342-5621
From: Art Kendall <[hidden email]>
To: [hidden email]
Date: 09/10/2014 08:20 AM
Subject: [SPSSX-L] how do I get to the extensions forstring distances .
Sent by: "SPSSX(r) Discussion" <[hidden email]>
View this message in context: http://spssx-discussion.1045642.n5.nabble.com/how-do-I-get-to-the-extensions-forstring-distances-tp5727203.html
=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L. For a list of commands to manage subscriptions, send the command INFO REFCARD.
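For readers who want a feel for what two of these measures compute, here is a minimal standalone Python sketch of an edit distance and a Dice bigram similarity. It is not the code in extendedTransforms.py; the function names (levenshtein, bigrams, dice_similarity) and the choice to lowercase before comparing are assumptions made for illustration only, and the module's own functions may handle case, whitespace, and scaling differently.

```python
from collections import Counter


def levenshtein(a, b):
    """Edit distance: fewest single-character insertions,
    deletions, or substitutions needed to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # delete from a
                               current[j - 1] + 1,       # insert into a
                               previous[j - 1] + cost))  # substitute
        previous = current
    return previous[-1]


def bigrams(s):
    """Adjacent character pairs of the lowercased string (illustrative choice)."""
    s = s.lower()
    return [s[i:i + 2] for i in range(len(s) - 1)]


def dice_similarity(a, b):
    """Dice coefficient on character bigrams: 2 * shared / total, in [0, 1]."""
    count_a, count_b = Counter(bigrams(a)), Counter(bigrams(b))
    total = sum(count_a.values()) + sum(count_b.values())
    if total == 0:
        # Both strings are too short to have any bigrams.
        return 1.0 if a.lower() == b.lower() else 0.0
    shared = sum((count_a & count_b).values())  # multiset intersection
    return 2.0 * shared / total


if __name__ == "__main__":
    base = "123 Oak street Someplace MD"
    near = "123 Oak St, someplace, MD 21111"
    far = "456 Maple Street Someplace MD"
    for other in (near, far):
        print(other, levenshtein(base, other), dice_similarity(base, other))
```

Different measures can rank the same pairs differently, since an edit distance penalizes every extra character such as an appended ZIP code, so it may be worth comparing more than one measure on real address data.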
Thank you
I now have the following syntax:
spssinc trans result = jaroWinkler type = 0 /formula "extendedTransforms.jaroWinkler(V4,cleanAddress, 0)".
spssinc trans result = LevenDistance type = 0 /formula "extendedTransforms.levenshteindistance(V4,cleanAddress)".
spssinc trans result = Dictdict type = 0 /formula "extendedTransforms.Dictdict(V4,cleanAddress)".
spssinc trans result = StringSim type = 0 /formula "extendedTransforms.DiceStringSimilarity(V4,cleanAddress,casesensitive=False,splitwhite=True))".
The call to the Dictdict function results in this message:
"Warnings cannot import name Dictdict"
The call to DiceStringSimilarity results in this message:
"Warnings The formula syntax given is invalid: unexpected EOF while parsing (<string>, line 1)"
Art Kendall
Social Research Consultants
It's Dicedict, not Dictdict, but if you
are using this class, see the code example as it takes a dataset of words
that needs to be set up with the INITIAL subcommand of SPSSINC TRANS.
The DiceStringSimilarity call has an extra closing parenthesis.
Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
phone: 720-342-5621
From: Art Kendall <[hidden email]>
To: [hidden email]
Date: 09/10/2014 10:50 AM
Subject: Re: [SPSSX-L] how do I get to the extensions for string distances .
Sent by: "SPSSX(r) Discussion" <[hidden email]>
View this message in context: http://spssx-discussion.1045642.n5.nabble.com/how-do-I-get-to-the-extensions-forstring-distances-tp5727203p5727205.html
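A stray parenthesis like the one in the DiceStringSimilarity formula can be caught before the command is run. The helper below is a hypothetical standalone Python sketch, not part of SPSSINC TRANS or extendedTransforms.py; the name paren_balance is made up for illustration, and it assumes the formula contains no parentheses inside quoted string literals.

```python
def paren_balance(formula):
    """Return 0 when parentheses balance, a positive number of
    unclosed '(' characters, or -1 at the first ')' that has no
    matching '('. Quoted text is not treated specially."""
    depth = 0
    for ch in formula:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return depth  # stray closing parenthesis
    return depth


if __name__ == "__main__":
    # Formula as posted, with the extra closing parenthesis.
    posted = ("extendedTransforms.DiceStringSimilarity("
              "V4,cleanAddress,casesensitive=False,splitwhite=True))")
    # Same formula with the final parenthesis removed.
    fixed = posted[:-1]
    print(paren_balance(posted), paren_balance(fixed))
```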