|
Hi,
Is there a syntax that can be used to match two or more strings and identify the commonality? For example:

Var1   Var2   Comm
12345  2457   245
4      134    4
2567   37     7
234    56

Given Var1 and Var2, can Comm be computed in SPSS? If it is doable, can a Var3 be added to the match?

Thank you very much!
Jon

=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command.
To leave the list, send the command SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command INFO REFCARD
|
Hi Jon,
There are quite a few string similarity measures. The first thing that came to my mind was n-gram indexing. There's a Python module for that: http://pypi.python.org/pypi/ngram/2.0.0b2

Below is SPSS syntax that creates a variable with all common units (in this case, digits).

data list / var1 1-5 (a) var2 7-11 (a) comm 12-18 (a).
begin data
12345 2457 245
4     134  4
2567  37   7
234   56
1234  1234 1234
end data.
string comm2 (a8).
loop #i = 1 to 5.
loop #j = 1 to 5.
if (substr(var1,#i,1) = substr(var2,#j,1) ) comm2 = concat(rtrim(comm2), substr(var1,#i,1)).
end loop.
end loop.
exe.

Cheers!!
Albert-Jan
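
For readers who want to trace the nested loop outside SPSS, its logic can be sketched in Python roughly as follows. This is only an illustration: the function name common_chars_nested is made up here, and the sketch works on unpadded Python strings, whereas the SPSS variables are fixed-width and blank-padded.

```python
def common_chars_nested(var1: str, var2: str) -> str:
    """Append var1's character once for every matching (var1, var2) position pair."""
    result = []
    for a in var1:        # outer loop: each character of var1
        for b in var2:    # inner loop: each character of var2
            if a == b:
                result.append(a)
    return "".join(result)


# The sample rows from the thread (unpadded).
rows = [("12345", "2457"), ("4", "134"), ("2567", "37"), ("234", "56"), ("1234", "1234")]
for v1, v2 in rows:
    print(v1, v2, repr(common_chars_nested(v1, v2)))
```

Because every matching pair appends a character, the output keeps the character order of var1.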
|
This will work as long as there are no duplicated characters within a string (assuming that duplicates should not appear in the result).
A nice way to do this with Python would be to use its set methods. The result would just be the intersection of the sets formed by the characters of the two input strings. Duplicates would automatically be removed.

Here's a fragment. If var1 = "12345" and var2 = "2457", the characters in common would just be

"".join(set(var1).intersection(set(var2)))

set(var1) creates a set with members "1", "2", etc. The intersection method does the calculation, and "".join(...) recombines the members of the resulting set into a string.

HTH,
Jon Peck
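
A small sketch building on the set idea, in case it helps with the Var3 part of the original question. Python sets are unordered, so joining a set directly can return the common characters in any order; the version below walks the first string to keep its order. The function name common_chars is hypothetical, and the inputs are assumed to be plain, unpadded strings (values read from SPSS may need trailing blanks stripped first).

```python
def common_chars(first: str, *others: str) -> str:
    """Characters of `first` found in every other string, in first's order, without repeats."""
    # set.intersection accepts any iterables, so the strings can be passed directly.
    shared = set(first).intersection(*others)
    seen = set()
    out = []
    for ch in first:
        if ch in shared and ch not in seen:
            seen.add(ch)
            out.append(ch)
    return "".join(out)


print(common_chars("12345", "2457"))          # two variables
print(common_chars("12345", "2457", "4599"))  # a third variable added to the match
```

Passing more strings simply narrows the intersection, so the same function covers two, three, or more variables.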
