SPSSX Discussion

how to check if a set is a subset of another?

albert_sun

how to check if a set is a subset of another?

hi,

I am looking for help on how to check whether one set is a subset of another using SPSS syntax. For instance, I have two variables, one with the value "ABCD" and the other with "ABD". How can I determine that "ABD" is a subset of "ABCD"?

Thanks,

Rich Ulrich

Re: how to check if a set is a subset of another?

There are probably quick algorithms if your whole universe of items
consists of only four letters (A,B,C,D), and the order of them
is alphabetical or does not matter.

If other details don't tell me otherwise, I might start like this:
a) put the two sets in separate files (with ID info),
b) use VARSTOCASES to write two LONG-format files,
c) match them with MATCH FILES using TABLE=, with IN= flagging the matches as True,
d) recreate the original file with the extra info about matches,
e) apply the necessary logic to detect proper subsets.
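
Steps (a) to (e) can be sketched outside SPSS in plain Python, as a rough illustration only. The record layout (an ID plus two strings) and the helper names `to_long` and `proper_subset_flags` are hypothetical; the code mimics the idea of VARSTOCASES and a table match rather than reproducing those commands.

```python
def to_long(records, field):
    """Steps a-b: one (id, item) row per character, like a LONG-format file."""
    return [(rec["id"], ch) for rec in records for ch in rec[field].rstrip()]


def proper_subset_flags(records):
    """Steps c-e: match the long files, then flag proper subsets per id."""
    long1 = to_long(records, "var1")
    long2 = to_long(records, "var2")
    lookup = set(long1)  # plays the role of the TABLE file
    # Step c: IN=-style flag, True when the (id, item) pair exists in long1.
    matched = [(rid, item, (rid, item) in lookup) for rid, item in long2]
    flags = {}
    for rec in records:
        rid = rec["id"]
        hits = [m for r, _, m in matched if r == rid]
        items1 = {item for r, item in long1 if r == rid}
        items2 = {item for r, item in long2 if r == rid}
        # Step e: every item of var2 matched, and var1 has something extra.
        flags[rid] = all(hits) and len(items2) < len(items1)
    return flags


records = [
    {"id": 1, "var1": "ABCD", "var2": "ABD"},
    {"id": 2, "var1": "ABCD", "var2": "ABE"},
    {"id": 3, "var1": "ABCD", "var2": "DCBA"},
]
print(proper_subset_flags(records))  # {1: True, 2: False, 3: False}
```

Only the first record is flagged: "ABE" contains an item missing from "ABCD", and "DCBA" has the same items, so it is a subset but not a proper one.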

--
Rich Ulrich

> Date: Sun, 16 Nov 2014 16:42:44 -0700

> From: [hidden email]
> Subject: how to check if a set is a subset of another?
> To: [hidden email]
>
> hi,
>
> I am seeking help on how to check if a set is a subset of another using SPSS
> syntax. For instance, I have two variables, one with value of "ABCD", and
> the other one with "ABD", and how to determine "ABD" is a subset of "ABCD"?
>
> Thanks,

===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD

Adrian Barnett

Re: how to check if a set is a subset of another?

In reply to this post by albert_sun

Hi Albert

If the number of characters that have to match is not important, nor is their order, I think the CHAR.INDEX function with its third (divisor) argument will do it for you.

So if you want to find out whether any one of the characters in VAR2 appears in VAR1, you could use:

IF (CHAR.INDEX(VAR1, RTRIM(VAR2), 1) GT 0) IS_A_SUBSET = 1.

Any value of IS_A_SUBSET equal to 1 indicates a match.

If the pattern and length of subset is important, it would be harder, but, no doubt, doable with sufficient ingenuity (and, also no doubt, some Python)
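
To make the distinction concrete, here is a small Python sketch contrasting "at least one character matches" with "every character matches". The function names `any_char_shared` and `all_chars_shared` are made up for illustration, and trailing blanks are stripped on the assumption that the values come from fixed-width string variables.

```python
def any_char_shared(var1, var2):
    """True if at least one character of var2 occurs in var1."""
    return any(ch in var1.rstrip() for ch in var2.rstrip())


def all_chars_shared(var1, var2):
    """True if every character of var2 occurs in var1 (a subset test)."""
    return all(ch in var1.rstrip() for ch in var2.rstrip())


for var1, var2 in [("ABCD", "ABD"), ("ABCD", "AXY"), ("ABCD", "XYZ")]:
    print(var1, var2, any_char_shared(var1, var2), all_chars_shared(var1, var2))
```

The pair "ABCD"/"AXY" passes the first test but fails the second, which is the case a true subset check has to catch.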

Hope this helps

Regards,

Adrian
--
Adrian Barnett | "It's always the trombone player"
| (Faye Dunaway in 'The Arrangement')
Email: [hidden email]

Jon K Peck

Re: how to check if a set is a subset of another?

This code checks whether the characters in the first variable are a subset of those in the second regardless of order or duplication. It requires the Python Essentials. Trailing blanks are ignored.

data list list/var1 var2(2a4).
begin data
abcd abcd
abc abcd
abcd abc
dcb bcd
end data.
dataset name data.

begin program.
def subset(var1, var2):
    return set(list(var1.rstrip())).issubset(set(list(var2.rstrip())))
end program.

spssinc trans result=issubset
/formula "subset(var1, var2)".
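
For anyone who wants to try the logic outside SPSS first, the same test can be run as standalone Python on the four sample rows. The second function, `multiset_subset`, is a hypothetical variant (not part of the code above) for the case where duplication should matter.

```python
from collections import Counter


def subset(var1, var2):
    """Set-based test: is every character of var1 present in var2?"""
    return set(var1.rstrip()) <= set(var2.rstrip())


def multiset_subset(var1, var2):
    """Hypothetical variant: duplicates count, so 'aab' is not within 'ab'."""
    # Counter subtraction keeps only positive counts, so an empty result
    # means var2 has at least as many of each character as var1.
    return not (Counter(var1.rstrip()) - Counter(var2.rstrip()))


rows = [("abcd", "abcd"), ("abc", "abcd"), ("abcd", "abc"), ("dcb", "bcd")]
for var1, var2 in rows:
    print(var1, var2, subset(var1, var2))

print(subset("aab", "ab"), multiset_subset("aab", "ab"))
```

On the sample rows only the third pair ("abcd" against "abc") comes out False.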

Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
phone: 720-342-5621

From: Adrian Barnett <[hidden email]>
To: [hidden email]
Date: 11/16/2014 07:22 PM
Subject: Re: [SPSSX-L] how to check if a set is a subset of another?
Sent by: "SPSSX(r) Discussion" <[hidden email]>

Andy W

Re: how to check if a set is a subset of another?

In reply to this post by albert_sun

Presuming 'ABD' is a subset of 'ABCD', looping over the shorter string and evaluating whether each character is within the longer string seems like an OK approach to me.

***************************************.
DATA LIST FREE / S1 S2 (2A5).
BEGIN DATA
ABCD ABD
ACDE ZWX
AMRD RA
AMRD DA
AMRD AD
END DATA.

LOOP #i = 1 TO LENGTH(RTRIM(S2)).
COMPUTE Check = (CHAR.INDEX(RTRIM(S1),CHAR.SUBSTR(RTRIM(S2),#i,1)) > 0).
END LOOP IF Check = 0.
EXECUTE.
***************************************.

Another useful tool (although I'm guessing the above is sufficient) is a string edit distance, such as Levenshtein. Say that for some reason 'AB' counts as a subset of 'ABCD', but you do not want to count 'BA'. One way to do this is to calculate the Levenshtein distance between the two strings. When the characters of the shorter string appear in the longer one in the same order, the distance is simply the difference in the string lengths, as one only needs to delete certain characters to turn the longer string into the shorter one. So the distance between 'AB' and 'ABCD' would be 2, but for 'BA' it would be 3.

Here I use the extendedTransforms with SPSSINC TRANS to calculate the distance as an example.

***************************************.
BEGIN PROGRAM Python.
import extendedTransforms as et
x = 'ABCD'
y1 = 'BAD'
y2 = 'ABD'
print(et.levenshteindistance(x,y1))
print(et.levenshteindistance(y2,x))
END PROGRAM.

SPSSINC TRANS RESULT=Dist TYPE=0
/FORMULA "extendedTransforms.levenshteindistance(S1,S2)".
COMPUTE #D = LENGTH(RTRIM(S1)) - LENGTH(RTRIM(S2)).
COMPUTE Check2 = (Dist <= #D).
EXECUTE.
***************************************.
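
If extendedTransforms is not at hand, the idea can be checked with a plain Python sketch. This is a textbook dynamic-programming Levenshtein distance written for illustration, not the extendedTransforms implementation, and the helper name `ordered_subset` is made up here.

```python
def levenshtein(a, b):
    """Classic edit distance allowing insertion, deletion and substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def ordered_subset(longer, shorter):
    """True when `shorter` can be made from `longer` by deletions alone."""
    return levenshtein(longer, shorter) == len(longer) - len(shorter)


for s in ("AB", "BA", "ABD", "BAD"):
    print(s, levenshtein("ABCD", s), ordered_subset("ABCD", s))
```

Because the distance can never be smaller than the difference in lengths, testing for equality here gives the same result as the `Dist <= #D` comparison above.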

I imagine there are other edit distances one could use to evaluate subsets of differing definitions.

Andy W
apwheele@gmail.com
http://andrewpwheeler.wordpress.com/