Variable combinations

Variable combinations

Peter Spangler
Hi All,

I have data that looks like this

ID   Var1  Var2  Var3
1    string  string
2
3
4

Re: Variable combinations

Peter Spangler
Accidental send...continuing: My goal is to identify all possible combinations of these string variables and eventually count the cases for each.

ID   Var1     Var2     Var3
1    string1  string2  string3
2    string2  string5  string1



On Thu, Sep 19, 2013 at 9:01 AM, Peter Spangler <[hidden email]> wrote:
Hi All,

I have data that looks like this

ID   Var1  Var2  Var3
1    string  string
2
3
4


Re: Variable combinations

David Marso
Administrator
Maybe provide a specific concrete example of what you are looking for.
Sample input data?
Sample of desired output data?
All possible combinations or all existing combinations?
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"

Re: Variable combinations

David Marso
Administrator
Here is a way to generate all existing PAIRS!
* Sample data: three short string variables per case.
DATA LIST LIST / Var01 TO Var03 (3a8).
BEGIN DATA
a b c
a d e
b f g
a c f
d e f
j k l
END DATA.
LIST.
COMPUTE ID=$CASENUM.
STRING element1 element2 (A8).
VECTOR vars=Var01 TO Var03.
* Write one record per within-case pair (1-2, 1-3, 2-3) to an external file.
LOOP #=1 TO 2.
+  LOOP ##=#+1 TO 3.
+     COMPUTE element1=vars(#).
+     COMPUTE element2=vars(##).
+     XSAVE OUTFILE "C:\TEMP\Pairs.sav" / KEEP ID element1 element2.
+  END LOOP.
END LOOP.
EXECUTE.
* Read the pairs back and count how often each pair occurs.
GET FILE  "C:\TEMP\Pairs.sav" .
DATASET DECLARE aggpairs .
AGGREGATE OUTFILE aggpairs
 / BREAK element1 element2
 / NPair=N.
DATASET ACTIVATE aggpairs.
LIST.





element1 element2   NPair

a        b              1
a        c              2
a        d              1
a        e              1
a        f              1
b        c              1
b        f              1
b        g              1
c        f              1
d        e              2
d        f              1
e        f              1
f        g              1
j        k              1
j        l              1
k        l              1


Number of cases read:  16    Number of cases listed:  16
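
For readers who want to check the pair-counting logic outside SPSS, it can be sketched in Python. This is only an illustration and not part of the original post: the function name `count_pairs` is made up here, and the rows are the six sample cases from the DATA LIST block above.

```python
from collections import Counter
from itertools import combinations

# The six sample cases from the DATA LIST block above.
rows = [
    ("a", "b", "c"),
    ("a", "d", "e"),
    ("b", "f", "g"),
    ("a", "c", "f"),
    ("d", "e", "f"),
    ("j", "k", "l"),
]


def count_pairs(cases):
    """Count every within-case pair, keeping left-to-right variable order
    (the same pairs the nested LOOP writes out with XSAVE)."""
    counts = Counter()
    for case in cases:
        # combinations() yields (Var01, Var02), (Var01, Var03), (Var02, Var03).
        counts.update(combinations(case, 2))
    return counts


pair_counts = count_pairs(rows)
for (element1, element2), n in sorted(pair_counts.items()):
    print(element1, element2, n)
```

As in the SPSS version, a pair is keyed by the order in which its elements appear in the case, so (a, b) and (b, a) would be counted separately unless each case is sorted first.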

Re: Variable combinations

Peter Spangler
In reply to this post by David Marso
Sample input data:

ID   category1     category2      category3
1    clothing      mens clothes   shirts
2    accessories   womens         handbags
3    clothing      mens clothes   shirts

Desired output: I realize I can create a variable that concatenates category1 through category3; however, I would prefer a list of distinct combinations with a count of the IDs in each combination.

Distinct combination             Count
clothing mens clothes shirts     2
accessories womens handbags      1


--
View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Variable-combinations-tp5722111p5722114.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD


Re: Variable combinations

Rick Oliver-3
Concatenate the variables then aggregate on the concatenated variables, getting the count for each aggregate category.

Rick Oliver
Senior Information Developer
IBM Business Analytics (SPSS)
E-mail: [hidden email]





Re: Variable combinations

Rick Oliver-3
In reply to this post by Peter Spangler
* Sample data from the original post.
data list list (",") /id (f1) category1 category2 category3 (3a20).
begin data
1,clothing,mens clothes,shirts
2,accessories,womens,handbags
3,clothing,mens clothes,shirts
end data.
* Three A20 fields plus two one-character separators need up to 62 characters.
string newvar (a62).
compute newvar=concat(rtrim(category1), " ", rtrim(category2), " ", category3).
* Count the cases in each distinct concatenated value.
DATASET DECLARE aggfile.
AGGREGATE
  /OUTFILE='aggfile'
  /BREAK=newvar
  /Count=N.
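
The same count can be sketched in Python without building a concatenated string. This is an illustrative sketch rather than anything posted in the thread; the variable names are made up, and the rows are the three sample cases above. It keys on the tuple of category values, so order still matters here.

```python
from collections import Counter

# The three sample cases from the DATA LIST block above: (id, cat1, cat2, cat3).
cases = [
    (1, "clothing", "mens clothes", "shirts"),
    (2, "accessories", "womens", "handbags"),
    (3, "clothing", "mens clothes", "shirts"),
]

# Key on the tuple of category values instead of a concatenated string,
# which avoids any width limit or separator ambiguity.
combo_counts = Counter(tuple(cats) for _id, *cats in cases)

for combo, n in combo_counts.most_common():
    print(" ".join(combo), n)
```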



Re: Variable combinations

Jon K Peck
In reply to this post by Rick Oliver-3
That will only work if order matters.  Otherwise the fields need to be sorted within case first.


Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
phone: 720-342-5621





Re: Variable combinations

Bruce Weaver
Administrator
And given that the OP used the word "combinations", one would guess that order does not matter.  Before Jon posts his Python solution...   ;-)

new file.
dataset close all.

data list list (",") /id (f1) category1 category2 category3 (3a20).
begin data
1,clothing,mens clothes,shirts
2,accessories,womens,handbags
3,mens clothes,shirts,clothing
end data.

* OP said "combinations", which suggests order does not matter.
* So as Jon P noted, one must sort within cases first.
* No doubt, there is a Python solution to that problem,
* but here's a native SPSS method that's fairly transparent.

* Restructure from WIDE to LONG file format.
VARSTOCASES
  /MAKE category FROM category1 category2 category3
  /INDEX=CatNum(3)
  /KEEP=id .

SORT CASES by ID category.

* Compute a new index variable for the new sort order.
IF $casenum EQ 1 or ID NE LAG(ID) NewIndex = 1.
IF MISSING(NewIndex) NewIndex = LAG(NewIndex) + 1.
FORMATS NewIndex (f2.0).
EXECUTE.

* Return to original file structure.
CASESTOVARS
  /ID=id
  /INDEX=NewIndex
  /GROUPBY=VARIABLE
  /SEPARATOR=""
  /DROP = CatNum
.
DATASET NAME original.

* Now use Rick's method.
string newvar (a30).
compute newvar=concat(rtrim(category1), ", ", rtrim(category2), ", ", category3).
DATASET DECLARE aggfile.
AGGREGATE
  /OUTFILE='aggfile'
  /BREAK=newvar
  /Count=N.
DATASET ACTIVATE aggfile.
LIST.

OUTPUT:

newvar                           Count
accessories, handbags, womens        1
clothing, mens clothes, shirts       2
 
HTH.
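
For comparison, the sort-within-case idea can be sketched in a few lines of Python. This is an illustration only, not the Python solution the thread alludes to: the helper name `unordered_key` is hypothetical, and the rows are the three sample cases from the DATA LIST block above.

```python
from collections import Counter

# Sample cases from the post: case 3 holds the same values as case 1,
# but in a different order.
cases = [
    (1, "clothing", "mens clothes", "shirts"),
    (2, "accessories", "womens", "handbags"),
    (3, "mens clothes", "shirts", "clothing"),
]


def unordered_key(categories):
    """Sort the values within one case so that order no longer matters."""
    return tuple(sorted(categories))


# Ordered keys treat cases 1 and 3 as different combinations.
ordered_counts = Counter(tuple(cats) for _id, *cats in cases)

# Sorted keys merge them, which is what the restructure/sort/restructure does.
unordered_counts = Counter(unordered_key(cats) for _id, *cats in cases)

for combo, n in sorted(unordered_counts.items()):
    print(", ".join(combo), n)
```

With the sample data this prints the same two rows as the OUTPUT listing above, while `ordered_counts` still holds three separate combinations.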


--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING: 
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).

Re: Variable combinations

Rick Oliver-3
Sweet. I like it.


Re: Variable combinations

David Marso
Administrator
In reply to this post by Bruce Weaver
I would AGGREGATE first (break on the category variables and get N).
VARSTOCASES.
SORT.
CASESTOVARS.
AGGREGATE again.
Since the OP said he wanted individual variables rather than a concatenated one, skip Rick's step.
Reason?
I suspect AGGREGATE will be a lot faster than VARSTOCASES on the whole file.
Squish it, butcher it, squish it again ;-)
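
A rough Python sketch of that ordering, under my reading of the outline: count the ordered rows first, reorder only the distinct rows, then combine. The variable names are made up, case 4 is a hypothetical extra row added so the first pass has something to collapse, and `sorted()` stands in for the VARSTOCASES / SORT / CASESTOVARS round trip. Note that the second pass has to sum the first-pass counts rather than count rows.

```python
from collections import Counter

cases = [
    (1, "clothing", "mens clothes", "shirts"),
    (2, "accessories", "womens", "handbags"),
    (3, "mens clothes", "shirts", "clothing"),
    (4, "clothing", "mens clothes", "shirts"),  # hypothetical duplicate of case 1
]

# First AGGREGATE: one row per distinct ordered combination, with its N.
first_pass = Counter(tuple(cats) for _id, *cats in cases)

# Restructure step, applied to the (smaller) aggregated file only:
# sort the values within each distinct row.
# Second AGGREGATE: sum the first-pass N for rows that now match.
second_pass = Counter()
for combo, n in first_pass.items():
    second_pass[tuple(sorted(combo))] += n

for combo, n in sorted(second_pass.items()):
    print(combo, n)
```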


Re: Variable combinations

Art Kendall
In reply to this post by Bruce Weaver
Assuming the OP wants the combinations that actually occur, and that (s)he may mean permutations:
data list list (",") /id (f1) category1 category2 category3 (3a20).
begin data
1,clothing,mens clothes,shirts
2,accessories,womens,handbags
3,mens clothes,shirts,clothing
end data.
autorecode variables = category1 category2 category3
/into nvar1 to nvar3/ group /print.
mult response groups=cats 'categories' (nvar1 to nvar3 (1,6))
 /frequencies = cats
 /tables = cats by cats by cats.
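
As a rough illustration of what the shared recoding and the frequency table do, here is a Python sketch. It is an assumption-laden stand-in, not SPSS output: the names `codes` and `freq` are made up, the codes are assigned in ascending alphabetical order across all three variables, and the three-way table is not reproduced.

```python
from collections import Counter

cases = [
    (1, "clothing", "mens clothes", "shirts"),
    (2, "accessories", "womens", "handbags"),
    (3, "mens clothes", "shirts", "clothing"),
]

# One common coding scheme for all three variables (the role of /group):
# distinct values in ascending order, numbered from 1.
values = sorted({v for _id, *cats in cases for v in cats})
codes = {value: code for code, value in enumerate(values, start=1)}

# Frequency of each code across the three variables taken together,
# which is what a multiple-response frequency table tallies.
freq = Counter(codes[v] for _id, *cats in cases for v in cats)

for value in values:
    print(codes[value], value, freq[codes[value]])
```

With this sample there are six distinct values, which is consistent with the (1,6) range given in the mult response command.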

Art Kendall
Social Research Consultants

Re: Variable combinations

Jon K Peck
In reply to this post by Bruce Weaver
Well, since you mentioned it, just the sorting part :-)

* Set TYPE to whatever string width is appropriate.
spssinc trans result = Var01 to Var03 type=8
/formula "sorted([Var01, Var02, Var03])".


Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
phone: 720-342-5621




Bruce Weaver wrote (09/19/2013 02:41 PM):




And given that the OP used the word "combinations", one would guess that
order does not matter.  Before Jon posts his Python solution...   ;-)

new file.
dataset close all.

data list list (",") /id (f1) category1 category2 category3 (3a20).
begin data
1,clothing,mens clothes,shirts
2,accessories,womens,handbags
3,mens clothes,shirts,clothing
end data.

* OP said "combinations", which suggests order does not matter.
* So as Jon P noted, one must sort within cases first.
* No doubt, there is a Python solution to that problem,
* but here's a native SPSS method that's fairly transparent.

* Restructure from WIDE to LONG file format.
VARSTOCASES
 /MAKE category FROM category1 category2 category3
 /INDEX=CatNum(3)
 /KEEP=id .

SORT CASES by ID category.

* Compute a new index variable for the new sort order.
IF $casenum EQ 1 or ID NE LAG(ID) NewIndex = 1.
IF MISSING(NewIndex) NewIndex = LAG(NewIndex) + 1.
FORMATS NewIndex (f2.0).
EXECUTE.

* Return to original file structure.
CASESTOVARS
 /ID=id
 /INDEX=NewIndex
 /GROUPBY=VARIABLE
 /SEPARATOR=""
 /DROP = CatNum
.
DATASET NAME original.

* Now use Rick's method.
string newvar (a30).
compute newvar=concat(rtrim(category1), ", ", rtrim(category2), ", ",
category3).
DATASET DECLARE aggfile.
AGGREGATE
 /OUTFILE='aggfile'
 /BREAK=newvar
 /Count=N.
DATASET ACTIVATE aggfile.
LIST.

OUTPUT:

newvar                           Count
accessories, handbags, womens        1
clothing, mens clothes, shirts       2

HTH.
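
For readers who want to see the same logic outside SPSS, here is a minimal Python sketch of the approach above: sort the category values within each case, then count the cases per sorted combination. It is only an illustration, not part of Bruce's solution; the names `rows`, `count_combinations` and `combo_counts` are made up for this example, and the data are the three sample cases from the DATA LIST block.

```python
from collections import Counter

# Sample cases copied from the DATA LIST block: (id, category1..category3).
rows = [
    (1, "clothing", "mens clothes", "shirts"),
    (2, "accessories", "womens", "handbags"),
    (3, "mens clothes", "shirts", "clothing"),
]


def count_combinations(rows):
    """Count cases per unordered combination of the category values."""
    counts = Counter()
    for _id, *categories in rows:
        # Sorting within the case makes the key independent of column order.
        key = tuple(sorted(categories))
        counts[key] += 1
    return counts


combo_counts = count_combinations(rows)
for combo, n in sorted(combo_counts.items()):
    print(", ".join(combo), n)
```

Cases 1 and 3 end up under the same key because they hold the same three values in a different order, which is the effect the VARSTOCASES / SORT CASES / CASESTOVARS sequence achieves in SPSS.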



Jon K Peck wrote
> That will only work if order matters.  Otherwise the fields need to be
> sorted within case first.
>
> Rick Oliver wrote:
> Concatenate the variables then aggregate on the concatenated variables,
> getting the count for each aggregate category.
>
> Peter Spangler wrote:
> Sample Input data:
>
> ID   category1     category2      category3
> 1    clothing      mens clothes   shirts
> 2    accessories   womens         handbags
> 3    clothing      mens clothes   shirts
>
> Output. I realize I can create a variable to concatenate
> category1:category3, however, I would prefer to have a list of distinct
> combinations with counts of IDs in that combination.
>
> Distinct Combination             Count
> clothing mens clothes shirts     2
> accessories womens handbags      1





-----
--
Bruce Weaver
[hidden email]
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

NOTE: My Hotmail account is not monitored regularly.
To send me an e-mail, please use the address shown above.


Re: Variable combinations

Rick Oliver-3
In reply to this post by David Marso
Ah. Yes. The OP's desired result looked like one variable that combined the three original variables. If you want to preserve the original three variables, then you just use all three as break variables. No need to concatenate.

Rick Oliver
Senior Information Developer
IBM Business Analytics (SPSS)
E-mail: [hidden email]




David Marso wrote (09/19/2013 04:08 PM):




I would AGGREGATE first (break on the category variables and get N).
VARSTOCASES.
SORT.
CASESTOVARS.
AGGREGATE again.
Since the OP said he wanted individual variables rather than a concatenated one, skip
Rick's step.
Reason?
I suspect AGGREGATE will be a lot faster than VARSTOCASES on the whole file.
Squish it, butcher it, squish it again ;-)
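
The "aggregate, restructure, aggregate again" order can be sketched in Python as follows. This is a hedged illustration of the idea only, not David's code: `two_stage_counts` is a hypothetical helper, and the fourth row is an extra case added here so that the first stage has something to collapse.

```python
from collections import Counter

# Category values per case; the last row is an extra, made-up case.
rows = [
    ("clothing", "mens clothes", "shirts"),
    ("accessories", "womens", "handbags"),
    ("mens clothes", "shirts", "clothing"),
    ("clothing", "mens clothes", "shirts"),
]


def two_stage_counts(rows):
    """Count ordered patterns first, then merge patterns that differ only in order."""
    # Stage 1: one entry per distinct ordered pattern (the first AGGREGATE).
    ordered = Counter(rows)
    # Stage 2: sort within each distinct pattern and sum the counts
    # (the restructure plus the second AGGREGATE).
    combined = Counter()
    for pattern, n in ordered.items():
        combined[tuple(sorted(pattern))] += n
    return ordered, combined


ordered, combined = two_stage_counts(rows)
print(len(ordered), "distinct ordered patterns")
print(len(combined), "distinct combinations")
```

The sorting work in stage 2 is done once per distinct pattern rather than once per case, which is why collapsing first can pay off when the file is large and the number of distinct patterns is small.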



Re: Variable combinations

Art Kendall
In reply to this post by Jon K Peck
I have tried variations on this, with and without square brackets, with the same variable names in the result and in the list input to the sort, etc., but I get various error messages.

data list list (",") /id (f1) category1 category2 category3(3a20).
begin data
1,clothing,mens clothes,shirts
2,accessories,womens,handbags
3,mens clothes,shirts,clothing
end data.

spssinc trans result = Var01 to Var03 type=20  
/formula "sorted(Category1 TO Category3)".
execute.

Art Kendall
Social Research Consultants

Re: Variable combinations

Jon K Peck
You can't use TO directly in the formula, because the formula is really Python notation.  You can use TO in a VARIABLES subcommand on SPSSINC TRANS and then refer to those variables as <> in the formula.  Also, the sorted function requires a list, which is why my example is in square brackets.
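
The point about sorted needing a list can be checked in plain Python, outside SPSS. The sketch below is only an illustration: `sort_within_case` is a hypothetical helper, and the values stand in for one case's three category variables.

```python
def sort_within_case(*values):
    """Return the values of one case in ascending order."""
    # sorted() accepts a single iterable, so the separate values are
    # wrapped in a list first, as in sorted([Var01, Var02, Var03]).
    return sorted(list(values))


case = ("mens clothes", "shirts", "clothing")
print(sort_within_case(*case))

# Passing the values as separate arguments, without the brackets, fails.
raised = False
try:
    sorted("mens clothes", "shirts", "clothing")
except TypeError as err:
    raised = True
    print("TypeError:", err)
```

Python's sorted takes exactly one positional argument, so a formula that hands it several variables without the square brackets raises an error before any sorting happens.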


Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
phone: 720-342-5621





Re: Variable combinations

Richard Ristow
In reply to this post by Rick Oliver-3
At 01:42 PM 9/19/2013, Rick Oliver wrote:

>Concatenate the variables then aggregate on the concatenated
>variables, getting the count for each aggregate category.

The thread has gone on beyond this, but I'd like to make one comment.

If you take this approach (others have noted why it may not be the
right approach), there's no need to catenate the variables. Instead of

string newvar (a30).
compute newvar=concat(rtrim(category1), " ", rtrim(category2), " ",
category3).
DATASET DECLARE aggfile.
AGGREGATE
   /OUTFILE=aggfile
   /BREAK=newvar
   /Count=N.

you'd write

DATASET DECLARE aggfile.
AGGREGATE
   /OUTFILE=aggfile
   /BREAK=category1 category2 category3
   /Count=N.

I've seen a lot of code that's over-complicated because of people
forgetting that BY and BREAK clauses can take *sets* of variables.
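
As a rough Python analogue of breaking on a set of variables, the sketch below groups by the tuple of values and, for comparison, by a space-joined string. It is an illustration under assumptions: the names `by_tuple` and `by_string` are invented here, and the fourth row is a made-up case added only to show what can go wrong with concatenation.

```python
from collections import Counter

rows = [
    ("clothing", "mens clothes", "shirts"),
    ("accessories", "womens", "handbags"),
    ("clothing", "mens clothes", "shirts"),
    # Hypothetical case: same words, split differently across the variables.
    ("clothing mens", "clothes", "shirts"),
]

# Grouping on the set of variables, like BREAK=category1 category2 category3.
by_tuple = Counter(rows)

# Grouping on a concatenated key with a single space as separator.
by_string = Counter(" ".join(r) for r in rows)

print(len(by_tuple), "groups when keyed on the three values")
print(len(by_string), "groups when keyed on the joined string")
```

Keying on the values themselves keeps the hypothetical fourth case separate, while the space-joined key merges it with the first and third cases. That is one more reason to break on the variables directly when the separator can also occur inside the values.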


Re: Variable combinations

Peter Spangler
This is all such a great discussion/solution! Thank you all for your contributions. I'm running various examples now!



Re: Variable combinations

Rick Oliver-3
You've got it.

Rick Oliver
Senior Information Developer
IBM Business Analytics (SPSS)
E-mail: [hidden email]



