Levenshtein distance for string variables

Levenshtein distance for string variables

Mike P-5
Hi All,

I am currently trying to update the algorithm we use to code open-ended
questions in the survey work we carry out online.

I'm currently using the Soundex algorithm, followed by a few
manipulations to get the output into the format I need.  This process
works well enough, but I have recently been told about another method
that is potentially better: the Levenshtein distance algorithm.
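
For readers unfamiliar with the current method, here is a minimal sketch
of American Soundex in plain Python. It is an illustration only, not the
code used in the process described above; the function name `soundex`
and the example words are assumptions.

```python
# Letter-to-digit table for American Soundex.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(word):
    """Return the 4-character Soundex code of word ('' if it has no letters)."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # a vowel resets prev, so a repeated code after it is kept
    return (result + "000")[:4]  # pad with zeros, keep four characters


print(soundex("Robert"), soundex("Rupert"))  # R163 R163
print(soundex("Nike"), soundex("Mike"))      # N200 M200
```

Because Soundex always keeps the first letter, "Nike" and "Mike" get
different codes even though they differ by a single character, which is
the kind of case where an edit distance behaves differently.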

http://en.wikipedia.org/wiki/Levenshtein_distance

Has anyone used this before, or written some code to use it in SPSS?
Python or syntax is fine. I want to get a feel for how useful this will
be compared to my current method.

I'm generally only concerned with responses of 1-4 words naming
different brands, so full text analysis would be overkill here.
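
To give a feel for how this might work on short brand responses, here is
a minimal sketch in plain Python, independent of SPSS. The brand list,
the helper name `closest_brand` and the cut-off of 2 edits are all
assumptions chosen for illustration, not a recommended setting.

```python
def levenshtein(a, b):
    """Number of single-character insertions, deletions or substitutions
    needed to turn string a into string b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def closest_brand(response, brands, max_distance=2):
    """Return (brand, distance) for the nearest brand within max_distance
    edits, or (None, None) if no brand is close enough.
    Hypothetical helper for illustration only."""
    cleaned = response.strip().lower()
    best, best_distance = None, max_distance + 1
    for brand in brands:
        distance = levenshtein(cleaned, brand.lower())
        if distance < best_distance:
            best, best_distance = brand, distance
    if best is None:
        return None, None
    return best, best_distance


brands = ["Adidas", "Nike", "Puma"]  # made-up code frame
print(closest_brand("addidas", brands))  # ('Adidas', 1)
print(closest_brand("toyota", brands))   # (None, None)
```

The cut-off matters for short names: with very short brands, even one or
two edits can turn one brand into another, so the threshold would need
checking against real responses.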

Thanks in advance

Mike

Re: Levenshtein distance for string variables

vlad simion
Hi Michael,

Take a look at these examples from Ray's site:

http://www.spsstools.net/Scripts/Utils/LevenshteinDistance.txt

http://www.spsstools.net/Syntax/Strings/SoundexPhoneticComparison.txt

hth,

Vlad

--
Vlad Simion
Data Analyst
Tel:      +40 720130611
Re: Levenshtein distance for string variables

Albert-Jan Roskam
Hi Michael,

The trans module of Python/SPSS offers NYSIIS,
Soundex, and Levenshtein distance.
Check out chapter 18, p. 357 of the following:
http://www.spss.com/spss/SPSSdatamgmt_4e.pdf

Cheers!!
Albert-Jan

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Did you know that 87.166253% of all statistics claim a precision of results that is not justified by the method employed? [HELMUT RICHTER]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


