Hi All,
I have data that looks like this:

ID  Var1    Var2    Var3
1   string  string
2
3
4
Accidental send...continuing: My goal is to identify all possible combinations of these string variables and eventually count the cases for each.
ID  Var1     Var2     Var3
1   string1  string2  string3
2   string2  string5  string1
Administrator
|
Maybe provide a specific concrete example of what you are looking for.
Sample input data? Sample of desired output data? All possible combinations or all existing combinations?
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
Administrator
|
Here is a way to generate all existing PAIRS!
DATA LIST LIST / Var01 TO Var03 (3a8).
BEGIN DATA
a b c
a d e
b f g
a c f
d e f
j k l
END DATA.
LIST.
COMPUTE ID=$CASENUM.
STRING element1 element2 (A8).
VECTOR vars=Var01 TO Var03.
LOOP #=1 TO 2.
+ LOOP ##=#+1 TO 3.
+ COMPUTE element1=vars(#).
+ COMPUTE element2=Vars(##).
+ XSAVE OUTFILE "C:\TEMP\Pairs.sav" / KEEP ID element1 element2.
+ END LOOP.
END LOOP.
EXECUTE.
GET FILE "C:\TEMP\Pairs.sav" .
DATASET DECLARE aggpairs .
AGGREGATE OUTFILE aggpairs / BREAK element1 element2 / NPair=N.
DATASET ACTIVATE aggpairs.
LIST.

element1 element2 NPair
a        b        1
a        c        2
a        d        1
a        e        1
a        f        1
b        c        1
b        f        1
b        g        1
c        f        1
d        e        2
d        f        1
e        f        1
f        g        1
j        k        1
j        l        1
k        l        1

Number of cases read: 16 Number of cases listed: 16
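For anyone who wants to check the pair logic outside SPSS, here is a rough sketch in plain Python. It is only an illustration, not part of the syntax above: the `rows` list and the `count_pairs` name are assumptions that mirror the six sample cases, and pairs are taken in variable order (first variable before second before third), as the nested LOOP does.

```python
from collections import Counter
from itertools import combinations

# Hypothetical copy of the six sample cases (Var01, Var02, Var03).
rows = [
    ("a", "b", "c"),
    ("a", "d", "e"),
    ("b", "f", "g"),
    ("a", "c", "f"),
    ("d", "e", "f"),
    ("j", "k", "l"),
]


def count_pairs(cases):
    """Count every within-case pair, keeping the variable order."""
    counts = Counter()
    for case in cases:
        # combinations() yields (case[i], case[j]) for i < j,
        # which matches the two nested loops over the vector.
        counts.update(combinations(case, 2))
    return counts


pair_counts = count_pairs(rows)
for (element1, element2), n_pair in sorted(pair_counts.items()):
    print(element1, element2, n_pair)
```

Like the syntax above, this treats (a, c) and (c, a) as different pairs; that does not show up here only because the sample values happen to be in alphabetical order within each case.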
In reply to this post by David Marso
Sample input data:

ID  category1    category2     category3
1   clothing     mens clothes  shirts
2   accessories  womens        handbags
3   clothing     mens clothes  shirts

Output. I realize I can create a variable to concatenate category1:category3; however, I would prefer to have a list of distinct combinations with counts of IDs in that combination.

Distinct Combination           Count
clothing mens clothes shirts   2
accessories womens handbags    1
Concatenate the variables then aggregate
on the concatenated variables, getting the count for each aggregate category.
Rick Oliver
Senior Information Developer
IBM Business Analytics (SPSS)
In reply to this post by Peter Spangler
data list list (",") /id (f1) category1 category2 category3 (3a20).
begin data
1,clothing,mens clothes,shirts
2,accessories,womens,handbags
3,clothing,mens clothes,shirts
end data.

* newvar must be wide enough for the longest combination; widen a30 if your values are longer.
string newvar (a30).
compute newvar=concat(rtrim(category1), " ", rtrim(category2), " ", category3).
DATASET DECLARE aggfile.
AGGREGATE
  /OUTFILE='aggfile'
  /BREAK=newvar
  /Count=N.
In reply to this post by Rick Oliver-3
That will only work if order matters. Otherwise
the fields need to be sorted within case first.
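To see why the order caveat matters, here is a small sketch in plain Python (outside SPSS, purely illustrative). The `cases` list is hypothetical: its third case holds the same three values as the first, in a different order. Counting the rows as they are treats those two cases as different; sorting the values within each case first makes them count as one combination.

```python
from collections import Counter

# Hypothetical data: case 3 repeats the values of case 1 in another order.
cases = [
    ("clothing", "mens clothes", "shirts"),
    ("accessories", "womens", "handbags"),
    ("mens clothes", "shirts", "clothing"),
]

# Order matters: each row is its own key.
ordered_counts = Counter(cases)

# Order does not matter: sort within the case before counting.
unordered_counts = Counter(tuple(sorted(case)) for case in cases)

print(len(ordered_counts), "distinct ordered rows")
print(len(unordered_counts), "distinct unordered combinations")
```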
Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
phone: 720-342-5621
Administrator
|
And given that the OP used the word "combinations", one would guess that order does not matter. Before Jon posts his Python solution... ;-)
new file.
dataset close all.

data list list (",") /id (f1) category1 category2 category3 (3a20).
begin data
1,clothing,mens clothes,shirts
2,accessories,womens,handbags
3,mens clothes,shirts,clothing
end data.

* OP said "combinations", which suggests order does not matter.
* So as Jon P noted, one must sort within cases first.
* No doubt, there is a Python solution to that problem,
* but here's a native SPSS method that's fairly transparent.

* Restructure from WIDE to LONG file format.
VARSTOCASES
  /MAKE category FROM category1 category2 category3
  /INDEX=CatNum(3)
  /KEEP=id .

SORT CASES by ID category.

* Compute a new index variable for the new sort order.
IF $casenum EQ 1 or ID NE LAG(ID) NewIndex = 1.
IF MISSING(NewIndex) NewIndex = LAG(NewIndex) + 1.
FORMATS NewIndex (f2.0).
EXECUTE.

* Return to original file structure.
CASESTOVARS
  /ID=id
  /INDEX=NewIndex
  /GROUPBY=VARIABLE
  /SEPARATOR=""
  /DROP = CatNum
  .
DATASET NAME original.

* Now use Rick's method.
string newvar (a30).
compute newvar=concat(rtrim(category1), ", ", rtrim(category2), ", ", category3).
DATASET DECLARE aggfile.
AGGREGATE
  /OUTFILE='aggfile'
  /BREAK=newvar
  /Count=N.
DATASET ACTIVATE aggfile.
LIST.

OUTPUT:

newvar                          Count
accessories, handbags, womens   1
clothing, mens clothes, shirts  2

HTH.
--
Bruce Weaver bweaver@lakeheadu.ca http://sites.google.com/a/lakeheadu.ca/bweaver/ "When all else fails, RTFM." PLEASE NOTE THE FOLLOWING: 1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above. 2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/). |
Sweet. I like it.
Administrator
|
In reply to this post by Bruce Weaver
I would AGGREGATE first (break on the category variables and get N), then VARSTOCASES, SORT, CASESTOVARS, and AGGREGATE again, this time summing N. Since the OP said he wanted individual variables rather than a concatenated one, skip Rick's step.
Reason? I suspect AGGREGATE will be a lot faster than VARSTOCASES on the whole file. Squish it, butcher it, squish it again ;-)
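The aggregate, sort, aggregate-again order can be sketched in plain Python. This is a hedged illustration of the idea only, not SPSS syntax and not a timing claim: the `cases` data and the `count_unordered` name are made up for the example. The point is that the within-case sort runs once per distinct row instead of once per case, and the second pass has to sum the first-pass counts rather than count rows.

```python
from collections import Counter

# Hypothetical data with repeated rows and one reordered row.
cases = [
    ("clothing", "mens clothes", "shirts"),
    ("clothing", "mens clothes", "shirts"),
    ("accessories", "womens", "handbags"),
    ("mens clothes", "shirts", "clothing"),
    ("clothing", "mens clothes", "shirts"),
]


def count_unordered(rows):
    # First squish: one entry per distinct row, with its count N.
    first_pass = Counter(rows)

    # Butcher and squish again: sort within each distinct row,
    # then add up the counts of rows that now look the same.
    second_pass = Counter()
    for combo, n in first_pass.items():
        second_pass[tuple(sorted(combo))] += n
    return first_pass, second_pass


first_pass, combo_counts = count_unordered(cases)
print(len(cases), "cases,", len(first_pass), "distinct rows to sort")
for combo, n in sorted(combo_counts.items()):
    print(combo, n)
```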
In reply to this post by Bruce Weaver
Assuming the OP wants combinations that occur, and that (s)he may mean permutations:

data list list (",") /id (f1) category1 category2 category3 (3a20).
begin data
1,clothing,mens clothes,shirts
2,accessories,womens,handbags
3,mens clothes,shirts,clothing
end data.
autorecode variables = category1 category2 category3
  /into nvar1 to nvar3
  /group
  /print.
mult response groups=cats 'categories' (nvar1 to nvar3 (1,6))
  /frequencies = cats
  /tables = cats by cats by cats.
Art Kendall
Social Research Consultants |
In reply to this post by Bruce Weaver
Well, since you mentioned it, just the
sorting part :-)
* Set TYPE to whatever string size is appropriate.
spssinc trans result = Var01 to Var03 type=8
  /formula "sorted([Var01, Var02, Var03])".
In reply to this post by David Marso
Ah. Yes. The OP's desired result looked
like one variable that combined the three original variables. If you want
to preserve the original three variables, then you just use all three as
break variables. No need to concatenate.
In reply to this post by Jon K Peck
I have tried variations on this, with and without square brackets, with the same variable names in the result and in the list input to the sort, etc., but I get various error messages.
data list list (",") /id (f1) category1 category2 category3(3a20).
begin data
1,clothing,mens clothes,shirts
2,accessories,womens,handbags
3,mens clothes,shirts,clothing
end data.
spssinc trans result = Var01 to Var03 type=20
  /formula "sorted(Category1 TO Category3)".
execute.
Art Kendall
Social Research Consultants |
You can't use TO directly in the formula, because the formula is really Python notation. You can use it in a VARIABLES subcommand on SPSSINC TRANS and then refer to those contents as <> in the formula. Also, the sorted function requires a list, which is why my example is in square brackets.
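The point about square brackets can be tried in plain Python, away from SPSS. This is a minimal sketch with made-up values standing in for one case; the variable names are only placeholders. `sorted` takes a single iterable, so the three values have to be wrapped in a list, and passing them as three separate arguments raises a TypeError.

```python
# Hypothetical values for a single case.
category1, category2, category3 = "mens clothes", "shirts", "clothing"

# Works: one list argument, returns a new sorted list.
in_order = sorted([category1, category2, category3])
print(in_order)

# Fails: three separate arguments instead of one list.
try:
    sorted(category1, category2, category3)
    call_failed = False
except TypeError as err:
    call_failed = True
    print("TypeError:", err)
```

The returned list has three elements, which is why the result side of the command names three target variables.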
Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
phone: 720-342-5621

From: Art Kendall <[hidden email]>
To: [hidden email],
Date: 09/19/2013 04:20 PM
Subject: Re: [SPSSX-L] Variable combinations
Sent by: "SPSSX(r) Discussion" <[hidden email]>

I have tried variations on this with and without square brackets, same variable names in the result and in the list input to the sort, etc., but get various error messages.

data list list (",") /id (f1) category1 category2 category3(3a20).
begin data
1,clothing,mens clothes,shirts
2,accessories,womens,handbags
3,mens clothes,shirts,clothing
end data.
spssinc trans result = Var01 to Var03 type=20
  /formula "sorted(Category1 TO Category3)".
execute.

Art Kendall
Social Research Consultants

On 9/19/2013 5:29 PM, Jon K Peck [via SPSSX Discussion] wrote:

Well, since you mentioned it, just the sorting part :-)

spssinc trans result = Var01 to Var03 type=8 (or whatever string size is appropriate)
  /formula "sorted([Var01, Var02, Var03])"

Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
phone: 720-342-5621

From: Bruce Weaver <[hidden email]>
To: [hidden email],
Date: 09/19/2013 02:41 PM
Subject: Re: [SPSSX-L] Variable combinations
Sent by: "SPSSX(r) Discussion" <[hidden email]>

And given that the OP used the word "combinations", one would guess that order does not matter. Before Jon posts his Python solution... ;-)

new file.
dataset close all.
data list list (",") /id (f1) category1 category2 category3 (3a20).
begin data
1,clothing,mens clothes,shirts
2,accessories,womens,handbags
3,mens clothes,shirts,clothing
end data.

* OP said "combinations", which suggests order does not matter.
* So as Jon P noted, one must sort within cases first.
* No doubt, there is a Python solution to that problem,
* but here's a native SPSS method that's fairly transparent.

* Restructure from WIDE to LONG file format.
VARSTOCASES
  /MAKE category FROM category1 category2 category3
  /INDEX=CatNum(3)
  /KEEP=id .
SORT CASES by ID category.

* Compute a new index variable for the new sort order.
IF $casenum EQ 1 or ID NE LAG(ID) NewIndex = 1.
IF MISSING(NewIndex) NewIndex = LAG(NewIndex) + 1.
FORMATS NewIndex (f2.0).
EXECUTE.

* Return to original file structure.
CASESTOVARS
  /ID=id
  /INDEX=NewIndex
  /GROUPBY=VARIABLE
  /SEPARATOR=""
  /DROP = CatNum .
DATASET NAME original.

* Now use Rick's method.
string newvar (a30).
compute newvar=concat(rtrim(category1), ", ", rtrim(category2), ", ", category3).
DATASET DECLARE aggfile.
AGGREGATE
  /OUTFILE='aggfile'
  /BREAK=newvar
  /Count=N.
DATASET ACTIVATE aggfile.
LIST.

OUTPUT:

newvar                          Count
accessories, handbags, womens       1
clothing, mens clothes, shirts      2

HTH.

Jon K Peck wrote
> That will only work if order matters. Otherwise the fields need to be
> sorted within case first.
>
> Jon Peck (no "h") aka Kim
> Senior Software Engineer, IBM
> [hidden email]
> phone: 720-342-5621
>
> From: Rick Oliver/Chicago/IBM@IBMUS
> To: [hidden email],
> Date: 09/19/2013 11:42 AM
> Subject: Re: [SPSSX-L] Variable combinations
> Sent by: "SPSSX(r) Discussion" <[hidden email]>
>
> Concatenate the variables then aggregate on the concatenated variables,
> getting the count for each aggregate category.
>
> Rick Oliver
> Senior Information Developer
> IBM Business Analytics (SPSS)
> E-mail: [hidden email]
>
> From: Peter Spangler <pspangler@>
> To: [hidden email],
> Date: 09/19/2013 12:30 PM
> Subject: Re: Variable combinations
> Sent by: "SPSSX(r) Discussion" <[hidden email]>
>
> Sample Input data:
>
> ID  category1    category2     category3
> 1   clothing     mens clothes  shirts
> 2   accessories  womens        handbags
> 3   clothing     mens clothes  shirts
>
> Output. I realize I can create a variable to concatenate
> category1:category3, however, I would prefer to have a list of distinct
> combinations with counts of IDs in that combination.
>
> Distinct Combination             Count
> clothing mens clothes shirts     2
> accessories womens handbags      1
>
> On Thu, Sep 19, 2013 at 9:39 AM, David Marso <david.marso@> wrote:
> Maybe provide a specific *concrete* example of what you are looking for.
> Sample *input* data?
> Sample of desired *output* data?
> All *possible* combinations or all *existing* combinations?
> -----
>
> Peter Spangler wrote
>> Accidental send...continuing: My goal is to identify all possible
>> combinations of these string variables and eventually count the cases for
>> each.
>>
>> ID Var1 Var2 Var3
>> 1 string1 string2 string3
>> 2 string2 string5 string1
>>
>> On Thu, Sep 19, 2013 at 9:01 AM, Peter Spangler <pspangler@> wrote:
>>
>>> Hi All,
>>>
>>> I have data that looks like this
>>>
>>> ID Var1 Var2 Var3
>>> 1 string string
>>> 2
>>> 3
>>> 4
>
> -----
> Please reply to the list and not to my personal email.
> Those desiring my consulting or training services please feel free to
> email me.
> ---
> "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos
> ne forte conculcent eas pedibus suis."
> Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in
> abyssum?"
> --
> View this message in context:
> http://spssx-discussion.1045642.n5.nabble.com/Variable-combinations-tp5722111p5722114.html
>
> Sent from the SPSSX Discussion mailing list archive at Nabble.com.
>
> =====================
> To manage your subscription to SPSSX-L, send a message to
> [hidden email] (not to SPSSX-L), with no body text except the
> command. To leave the list, send the command
> SIGNOFF SPSSX-L
> For a list of commands to manage subscriptions, send the command
> INFO REFCARD

-----
--
Bruce Weaver
[hidden email]
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

NOTE: My Hotmail account is not monitored regularly.
To send me an e-mail, please use the address shown above.
|
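For readers who want to see the "sort within case, then count" idea from the messages above in isolation, here is a minimal sketch in plain Python. It is not the SPSSINC TRANS call discussed in the thread and does not touch the SPSS API; the names `rows`, `count_combinations`, and `combo_counts` are made up for illustration, and the data are the three sample cases posted above.

```python
from collections import Counter

# Sample cases from the thread: (id, category1, category2, category3).
rows = [
    (1, "clothing", "mens clothes", "shirts"),
    (2, "accessories", "womens", "handbags"),
    (3, "mens clothes", "shirts", "clothing"),
]


def count_combinations(records):
    """Count cases per unordered combination of the string fields."""
    counts = Counter()
    for _case_id, *values in records:
        # Sorting the values within each case makes field order irrelevant,
        # so cases 1 and 3 end up with the same key.
        counts[tuple(sorted(values))] += 1
    return counts


combo_counts = count_combinations(rows)
for combo, n in sorted(combo_counts.items()):
    print(", ".join(combo), n)
```

With these three cases the sketch should give the same two combinations and counts as the native SPSS output listed above.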
In reply to this post by Rick Oliver-3
At 01:42 PM 9/19/2013, Rick Oliver wrote:
>Concatenate the variables then aggregate on the concatenated
>variables, getting the count for each aggregate category.

The thread has gone on beyond this, but I'd like to make one comment. If you take this approach (others have noted why it may not be the right approach), there's no need to catenate the variables. Instead of

string newvar (a30).
compute newvar=concat(rtrim(category1), " ", rtrim(category2), " ", category3).
DATASET DECLARE aggfile.
AGGREGATE
  /OUTFILE=aggfile
  /BREAK=newvar
  /Count=N.

you'd write

DATASET DECLARE aggfile.
AGGREGATE
  /OUTFILE=aggfile
  /BREAK=category1 category2 category3
  /Count=N.

I've seen a lot of code that's over-complicated because of people forgetting that BY and BREAK clauses can take *sets* of variables.
|
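The same point, grouping on a set of variables rather than on a concatenated string, can be sketched outside SPSS. The snippet below is plain Python and only an analogy to AGGREGATE with several BREAK variables; `rows`, `by_fields`, and `by_concat` are hypothetical names, and the data are the sample input posted earlier in the thread.

```python
from collections import Counter

# Sample input from the thread: (category1, category2, category3) per case.
rows = [
    ("clothing", "mens clothes", "shirts"),
    ("accessories", "womens", "handbags"),
    ("clothing", "mens clothes", "shirts"),
]

# Group on the set of variables directly: the tuple of values is the key.
by_fields = Counter(rows)

# Group on a concatenated string instead (the longer route).
by_concat = Counter(" ".join(r) for r in rows)

for key, n in sorted(by_fields.items()):
    print(key, n)
```

Both routes give the same counts here. Keeping the fields separate also avoids a pitfall of space-joined keys: two different cases such as ("mens", "clothes shirts") and ("mens clothes", "shirts") would produce the same concatenated string.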
This is all such a great discussion/solution! Thank you all for your contributions. I'm running various examples now! On Fri, Sep 20, 2013 at 10:28 AM, Richard Ristow <[hidden email]> wrote:
|
You've got it.
Rick Oliver
Senior Information Developer
IBM Business Analytics (SPSS)
E-mail: [hidden email]

From: Peter Spangler <[hidden email]>
To: [hidden email],
Date: 09/20/2013 12:36 PM
Subject: Re: Variable combinations
Sent by: "SPSSX(r) Discussion" <[hidden email]>

This is all such a great discussion/solution! Thank you all for your contributions. I'm running various examples now!
|