How do I get to the extensions for string distances?


Art Kendall
I must not be searching the archives correctly.
I seem to recall that there were some extensions to calculate the distances between strings.
For example, "123 Oak street Someplace MD" is closer to "123 Oak St, someplace, MD 21111"
than it is to "456 Maple Street Someplace MD".

I have SPSS Statistics version 21.
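To make the example concrete, one simple approach is to normalize case and punctuation and then compare the sets of words with a Dice coefficient. The following is a minimal Python 3 sketch of that idea under those assumptions. It is not the code in extendedTransforms.py, and the helper names tokens and token_dice are hypothetical.

```python
import re

def tokens(address):
    # Lowercase, turn punctuation into spaces, and split on whitespace.
    return set(re.sub(r"[^\w\s]", " ", address.lower()).split())

def token_dice(a, b):
    # Dice coefficient on word sets: 2 * |A & B| / (|A| + |B|).
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0  # two empty strings are treated as identical
    return 2 * len(ta & tb) / (len(ta) + len(tb))

base = "123 Oak street Someplace MD"
print(token_dice(base, "123 Oak St, someplace, MD 21111"))  # 8/11
print(token_dice(base, "456 Maple Street Someplace MD"))    # 6/10
```

For these inputs the first pair scores 8/11 (about 0.73) and the second 6/10 (0.6), so this word-level Dice ranks "123 Oak St, someplace, MD 21111" as the closer match. Abbreviations such as "St" versus "Street" still count as different words here; character-level measures can give partial credit for such pairs.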


Art Kendall
Social Research Consultants

Re: How do I get to the extensions for string distances?

Jon K Peck
These are functions in the extendedTransforms.py module, and they can be used with the SPSSINC TRANS extension command.

levenshteindistance     calculate the Levenshtein (edit) distance between two strings
jaroWinkler             calculate the Jaro-Winkler string similarity measure
DiceStringSimilarity    compare strings using the Dice bigram metric
Dicedict                find the best match of strings using the Dice metric
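For readers who want to see what the first two measures compute, here is a minimal pure-Python 3 sketch of Levenshtein edit distance and Jaro-Winkler similarity. It is an independent textbook implementation, not the code in extendedTransforms.py, and it may not match that module's argument conventions or scaling; the names levenshtein, jaro, and jaro_winkler are hypothetical.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, and
    substitutions needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the rolling row short
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]

def jaro(s1, s2):
    """Jaro similarity in [0, 1]; 1 means identical."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters match only if equal and within this distance of each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, c in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == c:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = transpositions = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                transpositions += 1
            k += 1
    t = transpositions / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3

def jaro_winkler(s1, s2, p=0.1, max_prefix=4):
    """Jaro score boosted by the length of the common prefix (up to 4).
    Some variants only boost above a threshold such as 0.7; this one always does."""
    j = jaro(s1, s2)
    prefix = 0
    for c1, c2 in zip(s1[:max_prefix], s2[:max_prefix]):
        if c1 != c2:
            break
        prefix += 1
    return j + prefix * p * (1 - j)

print(levenshtein("kitten", "sitting"))
print(jaro_winkler("MARTHA", "MARHTA"))
```

Here levenshtein("kitten", "sitting") returns 3 and jaro_winkler("MARTHA", "MARHTA") is about 0.961. The two measures run in opposite directions: a smaller Levenshtein distance means more similar strings, while a larger Jaro-Winkler score does.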

There are also a few specialized phonetic encoding functions for names (soundex, nysiis).
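Soundex and NYSIIS are phonetic codes: names that sound alike receive the same code. As an illustration only, here is a minimal Python 3 sketch of the common American Soundex rules (NYSIIS is not shown). It is not the soundex function in extendedTransforms.py, whose exact variant and output format may differ.

```python
# Digit groups used by American Soundex; vowels, Y, H, and W have no digit.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit

def soundex(name):
    """Return a four-character code: first letter plus three digits."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    result = letters[0]
    last_code = CODES.get(letters[0], "")  # first letter's code counts for adjacency
    for c in letters[1:]:
        if c in "HW":
            continue  # H and W do not separate letters with the same code
        code = CODES.get(c, "")
        if code and code != last_code:
            result += code
            if len(result) == 4:
                break
        last_code = code  # a vowel (empty code) resets the run
    return result.ljust(4, "0")

print(soundex("Robert"), soundex("Rupert"))
```

In this sketch soundex("Robert") and soundex("Rupert") both give R163, so the two spellings would be treated as a match.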

Comments in extendedTransforms.py include SPSSINC TRANS examples for most of these.

Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
phone: 720-342-5621




From:        Art Kendall <[hidden email]>
To:        [hidden email]
Date:        09/10/2014 08:20 AM
Subject:        [SPSSX-L] how do I get to the extensions forstring distances  .
Sent by:        "SPSSX(r) Discussion" <[hidden email]>
--
View this message in context:
http://spssx-discussion.1045642.n5.nabble.com/how-do-I-get-to-the-extensions-forstring-distances-tp5727203.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD



Re: How do I get to the extensions for string distances?

Art Kendall
Thank you.

I now have the following syntax:
spssinc trans result = jaroWinkler type = 0
 /formula "extendedTransforms.jaroWinkler(V4,cleanAddress, 0)".
spssinc trans result = LevenDistance type = 0
 /formula "extendedTransforms.levenshteindistance(V4,cleanAddress)".
spssinc trans result = Dictdict type = 0
 /formula "extendedTransforms.Dictdict(V4,cleanAddress)".
spssinc trans result = StringSim type = 0
 /formula "extendedTransforms.DiceStringSimilarity(V4,cleanAddress,casesensitive=False,splitwhite=True))".

The call to the Dictdict function results in this message:
"Warnings
cannot import name Dictdict"

The call to DiceStringSimilarity results in this message:

"Warnings
The formula syntax given is invalid:
unexpected EOF while parsing (<string>, line 1)"





Art Kendall
Social Research Consultants

Re: How do I get to the extensions for string distances?

Jon K Peck
It's Dicedict, not Dictdict. If you use this class, see the example in the comments of extendedTransforms.py: it takes a dataset of words that must be set up with the INITIAL subcommand of SPSSINC TRANS.
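What a best-match lookup like Dicedict does, finding the closest entry for a string in a list of words, can be pictured with the minimal Python 3 sketch below. It scores each candidate with a Dice coefficient on character bigrams and keeps the highest. It does not reproduce the Dicedict class or its INITIAL-subcommand setup: bigrams, dice, and best_match are hypothetical names, and the word list is passed in directly rather than read from a dataset.

```python
def bigrams(text):
    # Set of adjacent character pairs in the lowercased string.
    t = text.lower()
    return {t[i:i + 2] for i in range(len(t) - 1)}

def dice(a, b):
    # Dice coefficient on bigram sets: 2 * |A & B| / (|A| + |B|).
    ba, bb = bigrams(a), bigrams(b)
    if not ba and not bb:
        # Strings shorter than two characters have no bigrams.
        return 1.0 if a.lower() == b.lower() else 0.0
    return 2 * len(ba & bb) / (len(ba) + len(bb))

def best_match(query, candidates):
    # (candidate, score) with the highest Dice score; ties keep the first.
    # Raises ValueError if candidates is empty.
    return max(((c, dice(query, c)) for c in candidates), key=lambda pair: pair[1])

print(best_match("Oak St", ["Oak Street", "Maple Avenue", "Elm Road"]))
```

With these inputs best_match returns ("Oak Street", 10/14), about 0.714, because every bigram of "oak st" also occurs in "oak street".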

The DiceStringSimilarity call has an extra closing parenthesis at the end of the formula.
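Because the formula is passed to SPSSINC TRANS as a quoted string, a stray parenthesis is easy to miss. A small pre-check like the hypothetical Python 3 helper below (not part of SPSSINC TRANS or extendedTransforms.py) can flag unbalanced parentheses before the command is run. It ignores parentheses inside quoted literals but does not handle escaped quotes.

```python
def paren_problem(formula):
    """Return None if parentheses balance, else a short description.
    Parentheses inside '...' or "..." literals are ignored."""
    depth = 0
    quote = None  # the quote character of the literal we are inside, if any
    for pos, ch in enumerate(formula):
        if quote:
            if ch == quote:
                quote = None
            continue
        if ch in "'\"":
            quote = ch
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return "extra ')' at position %d" % pos
    if quote:
        return "unterminated string literal"
    if depth > 0:
        return "%d unclosed '('" % depth
    return None

formula = "extendedTransforms.DiceStringSimilarity(V4,cleanAddress,casesensitive=False,splitwhite=True))"
print(paren_problem(formula))
```

Run on the DiceStringSimilarity formula from the earlier message, it reports an extra ')' at the last character; with that character removed it returns None.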


Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
phone: 720-342-5621




From:        Art Kendall <[hidden email]>
To:        [hidden email]
Date:        09/10/2014 10:50 AM
Subject:        Re: [SPSSX-L] how do I get to the extensions for string distances  .
Sent by:        "SPSSX(r) Discussion" <[hidden email]>