Hi All,
I'm currently trying to update our algorithm for coding open-ended questions in the survey work we carry out online.

At the moment I'm using the Soundex algorithm, followed by a few manipulations to get the results into the desired format. This works well enough, but I have recently been told about another method that is potentially better: the Levenshtein distance algorithm.

http://en.wikipedia.org/wiki/Levenshtein_distance

Has anyone used this before, or written some code to use it in SPSS? Python or syntax is fine; I want to get a feel for how useful it will be compared to my current method.

I'm generally only concerned with 1-4 word responses naming different brands, so full text analysis would be overkill here.

Thanks in advance,
Mike
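For anyone who wants to try Levenshtein distance alongside Soundex, here is a minimal pure-Python sketch of the standard dynamic-programming computation (the one described on the Wikipedia page Mike links). The function name `levenshtein` is illustrative; it is not taken from any SPSS module or from the scripts mentioned in this thread. It compares strings exactly as given, so differences in case or punctuation each count as an edit.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the Levenshtein distance between a and b: the minimum number
    of single-character insertions, deletions and substitutions needed to
    turn one string into the other."""
    # The distance is symmetric, so put the shorter string in the inner loop
    # to keep each row short.
    if len(a) < len(b):
        a, b = b, a
    # Row for the empty prefix of a: turning "" into b[:j] takes j insertions.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        # Turning a[:i] into "" takes i deletions.
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca -> cb (free if equal)
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))     # 3
print(levenshtein("Heiniken", "Heineken"))  # 1
```

Only the previous row of the table is kept, so memory grows with the shorter string rather than with the product of both lengths; for 1-4 word brand responses the cost is negligible either way.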
Hi Michael,
Take a look at these examples from Ray's site:

http://www.spsstools.net/Scripts/Utils/LevenshteinDistance.txt
http://www.spsstools.net/Syntax/Strings/SoundexPhoneticComparison.txt

hth,
Vlad

On 8/31/07, Michael Pearmain wrote:
> Has anyone used this before? Or written some code to use this in SPSS?

--
Vlad Simion
Data Analyst
Tel: +40 720130611
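Since Mike's current method is Soundex, a small sketch of it makes the comparison with edit distance concrete. Below is a minimal Python version of American Soundex (keep the first letter, code the remaining consonants as digits, collapse adjacent repeats, pad to four characters). It is an illustrative implementation, not the code in Ray's SoundexPhoneticComparison syntax; treating non-ASCII letters like vowels is an assumption of this sketch. The example shows the trade-off: Soundex gives only a match/no-match on a coarse phonetic key, so two different brands such as "Nike" and "Nokia" can share a code, whereas an edit distance gives a graded score.

```python
# Letter -> digit groups of American Soundex; vowels, y, h and w get no digit.
SOUNDEX_GROUPS = {
    "bfpv": "1", "cgjkqsxz": "2", "dt": "3",
    "l": "4", "mn": "5", "r": "6",
}
CODES = {ch: digit for letters, digit in SOUNDEX_GROUPS.items() for ch in letters}


def soundex(word: str) -> str:
    """Return the four-character American Soundex key of word (e.g. 'R163')."""
    letters = [ch for ch in word.lower() if ch.isalpha()]
    if not letters:
        return ""
    key = letters[0].upper()          # the first letter is kept as a letter
    last = CODES.get(letters[0], "")  # ...but its digit still blocks an immediate repeat
    for ch in letters[1:]:
        if ch in "hw":
            continue                  # h and w do not separate equal digits
        code = CODES.get(ch, "")      # vowels, y and (assumption) non-ASCII letters -> ""
        if code and code != last:
            key += code
        last = code                   # a vowel resets last, so a repeat after it is coded again
        if len(key) == 4:
            break
    return key.ljust(4, "0")          # pad short keys with zeros


print(soundex("Robert"), soundex("Rupert"))  # R163 R163
print(soundex("Nike"), soundex("Nokia"))     # N200 N200
```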
Hi Michael,
The trans module of Python/SPSS offers NYSIIS, Soundex, and Levenshtein distance. Check out chapter 18, p. 357, of the following:
http://www.spss.com/spss/SPSSdatamgmt_4e.pdf

Cheers!!
Albert-Jan

--- vlad simion wrote:
> Take a look at these examples from Ray's site:
> http://www.spsstools.net/Scripts/Utils/LevenshteinDistance.txt
> http://www.spsstools.net/Syntax/Strings/SoundexPhoneticComparison.txt
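Whichever implementation is used (Ray's script or the module Albert-Jan mentions), the distance still has to be turned into a coding decision for each response. Below is a minimal sketch of one common approach, under stated assumptions: responses and brand names are normalised (lower-cased, spaces and punctuation dropped), each response is assigned to the brand with the smallest length-normalised edit distance, and anything above a cut-off is left uncoded for manual review. The brand list, the function names `normalise` and `code_response`, and the 0.25 cut-off are all hypothetical and would need tuning against real survey data. A compact Levenshtein function is included so the block runs on its own.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits turning a into b."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(previous[j] + 1,                # deletion
                               current[j - 1] + 1,             # insertion
                               previous[j - 1] + (ca != cb)))  # substitution
        previous = current
    return previous[-1]


def normalise(text: str) -> str:
    """Lower-case and keep only letters and digits ('Coca-Cola' -> 'cocacola')."""
    return "".join(ch for ch in text.lower() if ch.isalnum())


def code_response(response, brands, max_ratio=0.25):
    """Return the closest brand, or None if even the best match is too far away.

    Hypothetical rule: distance / length of the longer string must be <= max_ratio.
    """
    clean = normalise(response)
    if not clean:
        return None  # blank or punctuation-only answer: leave uncoded
    best, best_ratio = None, float("inf")
    for brand in brands:
        target = normalise(brand)
        ratio = levenshtein(clean, target) / max(len(clean), len(target))
        if ratio < best_ratio:  # ties keep the earlier brand in the list
            best, best_ratio = brand, ratio
    return best if best_ratio <= max_ratio else None


BRANDS = ["Coca-Cola", "Pepsi", "Heineken", "Nike", "Nokia"]  # hypothetical code list
for answer in ["coca cola", "Heiniken", "nokya", "Adidas"]:
    print(answer, "->", code_response(answer, BRANDS))
# coca cola -> Coca-Cola
# Heiniken -> Heineken
# nokya -> Nokia
# Adidas -> None
```

Dividing by the longer string's length makes one cut-off usable for short and long brand names alike; with a raw distance, a fixed threshold of 2 edits would be lenient for "Nike" but strict for "Coca-Cola".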
