Hi,

I am seeking help on how to check whether one set is a subset of another using SPSS syntax. For instance, I have two variables, one with the value "ABCD" and the other with "ABD". How can I determine that "ABD" is a subset of "ABCD"?

Thanks,
There are probably quick algorithms if your whole universe of items consists of only four letters (A, B, C, D), and the order of them is alphabetical or does not matter.

If other details don't tell me otherwise, I might start like this:
a) put the two sets in separate files (with ID info),
b) use VarsToCases to write two LONG-format files,
c) match using TABLE, noting IN= to get True for matches,
d) recreate the original file with extra info about matches,
e) apply the necessary logic to detect proper subsets.

--
Rich Ulrich

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
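For readers who want to see the logic of steps a) to e) outside SPSS, here is a minimal Python sketch. It is only an illustration of the long-format match idea, not Rich's actual syntax: the function names `to_long` and `subset_flags` and the dictionary layout keyed by ID are assumptions made for this example.

```python
from collections import defaultdict


def to_long(records):
    """Steps a-b: explode {id: 'ABD'} into long-format (id, item) rows."""
    return [(rec_id, item)
            for rec_id, value in records.items()
            for item in value.strip()]


def subset_flags(candidates, references):
    """Steps c-e: look up each candidate row in the reference table,
    then collapse the per-item flags back to one result per ID."""
    table = set(to_long(references))  # plays the role of the TABLE file
    matched = defaultdict(list)
    for rec_id, item in to_long(candidates):
        # Equivalent of the IN= flag: True when the row exists in the table.
        matched[rec_id].append((rec_id, item) in table)

    result = {}
    for rec_id, value in candidates.items():
        is_subset = all(matched[rec_id])
        n_cand = len(set(value.strip()))
        n_ref = len(set(references.get(rec_id, "").strip()))
        # A proper subset must also be strictly smaller than the reference set.
        result[rec_id] = {"subset": is_subset,
                          "proper": is_subset and n_cand < n_ref}
    return result


flags = subset_flags({1: "ABD", 2: "ZWX", 3: "ABCD"},
                     {1: "ABCD", 2: "ACDE", 3: "ABCD"})
print(flags)
```

In this sketch ID 1 is a proper subset, ID 2 is not a subset, and ID 3 is a subset but not a proper one, because both sets have the same members.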
In reply to this post by albert_sun
Hi Albert,

If the number of characters that have to match is not important, nor is their order, I think the CHAR.INDEX function will do it for you. So if you want to find out whether any one of the characters in VAR2 appears in VAR1, you could use:

IF (CHAR.INDEX(VAR1, RTRIM(VAR2), 1) GT 0) IS_A_SUBSET = 1.

The third argument (1) makes CHAR.INDEX search for each single character of VAR2 rather than for the whole string. Any value of IS_A_SUBSET equal to 1 indicates a match.

If the pattern and length of the subset are important, it would be harder, but, no doubt, doable with sufficient ingenuity (and, also no doubt, some Python).

Hope this helps.

Regards,
Adrian

--
Adrian Barnett
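The distinction between "any one of the characters appears" and "every character appears" matters for the original question, so here is a small Python sketch contrasting the two tests. The helper names `any_char_matches` and `all_chars_match` are made up for this illustration and are not SPSS functions.

```python
def any_char_matches(var1, var2):
    """True if at least one character of var2 occurs in var1
    (the 'any one of the characters' test described above)."""
    return any(ch in var1 for ch in var2.rstrip())


def all_chars_match(var1, var2):
    """True only if every character of var2 occurs in var1
    (what a subset test needs)."""
    return all(ch in var1 for ch in var2.rstrip())


print(any_char_matches("ABCD", "AXY"))  # one shared character is enough
print(all_chars_match("ABCD", "AXY"))   # X and Y are missing from ABCD
print(all_chars_match("ABCD", "ABD"))   # every character is present
```

A value such as "AXY" passes the first test but fails the second, so the "any" test on its own would flag it as a match even though it is not a subset of "ABCD".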
This code checks whether the characters
in the first variable are a subset of those in the second regardless of
order or duplication. It requires the Python Essentials. Trailing
blanks are ignored.

data list list/var1 var2(2a4).
begin data
abcd abcd
abc abcd
abcd abc
dcb bcd
end data.
dataset name data.

begin program.
def subset(var1, var2):
    return set(list(var1.rstrip())).issubset(set(list(var2.rstrip())))
end program.

spssinc trans result=issubset
/formula "subset(var1, var2)".

Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
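To see what the subset function returns for the four sample cases without running SPSS, the same logic can be tried in plain Python. This is a sketch for checking the behavior only: the `rows` list simply mirrors the sample data above, padded to width 4 to imitate A4 string variables.

```python
def subset(var1, var2):
    """True if the characters of var1 are a subset of those of var2,
    ignoring order, duplication, and trailing blanks."""
    return set(var1.rstrip()).issubset(set(var2.rstrip()))


# The four cases from the sample data, padded like A4 variables.
rows = [("abcd", "abcd"), ("abc ", "abcd"), ("abcd", "abc "), ("dcb ", "bcd ")]
results = [subset(var1, var2) for var1, var2 in rows]
print(results)  # [True, True, False, True]
```

The third case is False because "d" does not occur in "abc"; the last case is True because order is ignored.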
In reply to this post by albert_sun
Presuming 'ABD' is a subset of 'ABCD', looping over the shorter string and checking whether each character is within the longer string seems like an OK approach to me.
***************************************.
DATA LIST FREE / S1 S2 (2A5).
BEGIN DATA
ABCD ABD
ACDE ZWX
AMRD RA
AMRD DA
AMRD AD
END DATA.

* Check stays 1 only if every character of S2 is found in S1.
LOOP #i = 1 TO LENGTH(RTRIM(S2)).
COMPUTE Check = (CHAR.INDEX(RTRIM(S1),CHAR.SUBSTR(RTRIM(S2),#i,1)) > 0).
END LOOP IF Check = 0.
EXECUTE.
***************************************.

Another useful tool (although I'm guessing the above is sufficient) is a string edit distance, like Levenshtein. Say that for some reason 'AB' is a subset of 'ABCD', but you do not want to count 'BA' as one because the order differs. A way to do it would be to calculate the Levenshtein distance between the two strings. For an ordered subset the distance is simply the difference in the string lengths, as one only needs to delete certain characters to turn the longer string into the shorter one. So the distance between 'AB' and 'ABCD' would be 2, but for 'BA' it would be 3. Here I use the extendedTransforms module with SPSSINC TRANS to calculate the distance as an example.

***************************************.
BEGIN PROGRAM Python.
import extendedTransforms as et
x = 'ABCD'
y1 = 'BAD'
y2 = 'ABD'
print(et.levenshteindistance(x,y1))
print(et.levenshteindistance(y2,x))
END PROGRAM.

SPSSINC TRANS RESULT=Dist TYPE=0
/FORMULA "extendedTransforms.levenshteindistance(S1,S2)".
* The distance equals the length difference only for ordered subsets.
COMPUTE #D = LENGTH(RTRIM(S1)) - LENGTH(RTRIM(S2)).
COMPUTE Check2 = (Dist <= #D).
EXECUTE.
***************************************.

I imagine there are other edit distances one could use to evaluate subsets of differing definitions.
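For anyone without the extendedTransforms module at hand, the edit-distance idea can be tried in plain Python. The sketch below uses a textbook dynamic-programming Levenshtein implementation, which is not necessarily how extendedTransforms.levenshteindistance is written; the names `levenshtein` and `is_ordered_subset` are assumptions for this example.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance
    (insertions, deletions, substitutions each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def is_ordered_subset(longer, shorter):
    """True if 'shorter' can be obtained from 'longer' by deletions only,
    i.e. the distance equals the difference in lengths."""
    longer, shorter = longer.rstrip(), shorter.rstrip()
    return levenshtein(longer, shorter) <= len(longer) - len(shorter)


print(levenshtein("AB", "ABCD"))          # 2
print(levenshtein("BA", "ABCD"))          # 3
print(is_ordered_subset("ABCD", "ABD"))   # True
print(is_ordered_subset("ABCD", "BAD"))   # False
```

Because the distance can never be smaller than the difference in lengths, the `<=` comparison here is effectively an equality test, which mirrors the Check2 computation above.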