Looping for generating selection combinations

Manmit Shrimali-2
Hi Team:

I would appreciate your input on the following problem.
I have 40 product variables as follows.

Begin data
PV1 Pv2 Pv3 Pv4 Pv5 Pv6
1      1  1 1  1
1  1      1    1

Those who select a product have a 1 under the respective product.
They can select multiple products. I want to generate all possible
combinations of selections (pairs, triplets, and so on) with the top
three most frequently selected products. One way is to concatenate
across each row. However, when I combine everything I will have to
write too many IF statements. Is there any way we can do this by looping?

I will highly appreciate your input. Marta, it will be great if you can
also provide your input.

Thanks,

Manmit

Re: Looping for generating selection combinations

Richard Ristow
At 05:46 PM 11/9/2006, Manmit Shrimali wrote:

>I have 40 product variables as follows.
>
>Begin data
>PV1 Pv2 Pv3 Pv4 Pv5 Pv6
>1      1  1 1  1
>1  1      1    1
>
>Those who select the product will have 1 under the respective product.
>They can select multiple products.

Here's the part I don't understand:

>I want to generate all possible combinations (can be double or
>triplets and so on)of selection with top three most frequently
>selected products.

All possible combinations of products selected? Or, more likely, all
combinations of selected products that occur?

>One way is to concat across rows. However, when I combine altogether I
>will have write too many if states. Is there any way we can do this by
>looping?

Suppose you had only 3 or 4 product variables. How would your syntax
look, and what would it give you? That will help us understand what you
really want.

-Good luck,
  Richard

Re: Looping for generating selection combinations

Manmit Shrimali-2
Thanks, Richard. I should have been more detailed. A respondent can select
multiple products. What I want to know is the most frequently selected
product combination; it can be a combination of two or three products.
For example:

1. Among all products, Product 1 was selected the most (which is simple,
just the frequency of each product).

2. Then, the most frequently selected products in combination. For
example, products 1, 2 and 3 together have the highest number of "1"s,
i.e. the highest frequency.

Eventually, what we want to know is which products or product
combinations respondents used just before using our product. We want to
identify which product combinations our business is coming from, i.e.
which products people are switching from to ours. For this reason we
ask which products they used just before using our product (select all
that apply).

I hope I have explained myself. Please let me know if further
clarification is required. Once again, thank you for your valuable
input.

Re: Looping for generating selection combinations

Manmit Shrimali-2
If I had only seven products, the output would look like this:

Frequencies (in brackets):
Product ab (50)
Product dc (10)
Product ef (2)
Product fi (200)
Product abi (30).



Syntax could be:

Compute combine=concat(ltrim(productlist)***


From the above output, we can determine that those who currently use
our product have most often switched from products fi, followed by abi
(among those who selected three products). We would also learn that
people usually use at least two products together before coming to our
product.


Re: Looping for generating selection combinations

Richard Ristow
At 07:47 PM 11/9/2006, Manmit Shrimali wrote:

Well, for better or for worse, I still don't understand.

Now, the equivalent of this

>Compute combine=concat(ltrim(productlist)***

isn't hard. If,
. Your product variables are Prdct01 TO Prdct40
. They are contiguous in the file
. They are numeric, 1 if the product was used, 0 if it wasn't
. FINALLY, you want a single 40-character string giving which products
were used,

you can do this (not tested). It will give a '.' in the proper position
for each product not used, and a '1' for each product that was used.
It will put in '?' for missing product variables, and 'X' for any
values other than 0 or 1.

STRING ALL_USED(A40).
COMPUTE ALL_USED = ' '.
* DO REPEAT repeats each command inside it, so #FLAG is declared once, here.
STRING #FLAG (A1).

DO REPEAT PRODUCT = Prdct01 TO Prdct40.
.  DO IF    MISSING(PRODUCT).
.     COMPUTE #FLAG = '?'.
.  ELSE IF  PRODUCT EQ 0.
.     COMPUTE #FLAG = '.'.
.  ELSE IF  PRODUCT EQ 1.
.     COMPUTE #FLAG = '1'.
.  ELSE.
.     COMPUTE #FLAG = 'X'.
.  END IF.
.  COMPUTE ALL_USED
      =CONCAT(RTRIM(ALL_USED),#FLAG).
END REPEAT.

Here's the problem: you have 2**40 possible combinations. I don't see
how you can possibly count which combinations occur; there are too
many. Even if people use no more than three products, there are over
10,000. What are you going to do?
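
The two figures above can be checked with a few lines of arithmetic. This is only a sketch in Python (not SPSS syntax); the variable names are made up for the illustration, and "no more than three products" is taken to include selecting none at all.

```python
from math import comb

N_PRODUCTS = 40  # number of product variables stated in the question

# Each product is either selected or not, so there are 2**40 possible patterns.
all_patterns = 2 ** N_PRODUCTS

# Patterns with at most three products selected (0, 1, 2 or 3 of the 40).
up_to_three = sum(comb(N_PRODUCTS, k) for k in range(4))

print(all_patterns)  # 1099511627776
print(up_to_three)   # 10701 = 1 + 40 + 780 + 9880
```

So even the restricted case has far more possible patterns than most samples have respondents.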

I'm not searching back, but this question, or ones like it, seem to
keep coming up, and I just can't see them making any sense. To help me,
if you had 5 products and the following 20 customers, what combinations
would you count?

LIST.
|-----------------------------|---------------------------|
|Output Created               |09-NOV-2006 20:52:13       |
|-----------------------------|---------------------------|
  ID Pv1 Pv2 Pv3 Pv4 Pv5

001   1   1   0   1   1
002   0   1   0   1   1
003   1   1   0   1   0
004   0   0   1   0   1
005   1   0   1   1   0
006   0   1   0   1   0
007   1   1   1   1   0
008   0   0   0   0   0
009   1   0   1   0   0
010   0   1   0   0   1
011   1   0   0   0   0
012   1   1   0   0   1
013   1   0   0   0   0
014   1   1   0   0   1
015   0   1   1   1   0
016   0   0   0   1   0
017   1   0   0   1   0
018   0   0   1   0   1
019   0   0   1   0   0
020   0   1   0   0   0

Number of cases read:  20    Number of cases listed:  20

Re: Looping for generating selection combinations

Manmit Shrimali-2
>Here's the problem: you have 2**40 possible combinations. I don't see
>how you can possibly count which combinations occur; there are too
>many. Even if people use no more than three products, there are over
>10,000. What are you going to do?
>
>I'm not searching back, but this question, or ones like it, seem to
>keep coming up, and I just can't see them making any sense. To help me,
>if you had 5 products and the following 20 customers, what combinations
>would you count?

LIST.
|-----------------------------|---------------------------|
|Output Created               |09-NOV-2006 20:52:13       |
|-----------------------------|---------------------------|
  ID Pv1 Pv2 Pv3 Pv4 Pv5    Combo             Sum of products selected

001   1   1   0   1   1     pv1,pv2,pv4,pv5   4
002   0   1   0   1   1     pv2,pv4,pv5       3
003   1   1   0   1   0     pv1,pv2,pv4       3
004   0   0   1   0   1     pv3,pv5           2
005   1   0   1   1   0     pv1,pv3,pv4       3
006   0   1   0   1   0     pv2,pv4           2
007   1   1   1   1   0     pv1,pv2,pv3,pv4   4
008   0   0   0   0   0     .                 0
009   1   0   1   0   0     pv1,pv3           2
010   0   1   0   0   1     pv2,pv5           2
011   1   0   0   0   0     pv1               1
012   1   1   0   0   1     pv1,pv2,pv5       3
013   1   0   0   0   0     pv1               1
014   1   1   0   0   1     pv1,pv2,pv5       3
015   0   1   1   1   0     pv2,pv3,pv4       3
016   0   0   0   1   0     pv4               1
017   1   0   0   1   0     pv1,pv4           2
018   0   0   1   0   1     pv3,pv5           2
019   0   0   1   0   0     pv3               1
020   0   1   0   0   0     pv2               1

Now, I know that the average number of selections is 2.15, so I need to
generate the probabilities of pairs, which is:

the probability of pv1 and pv2 being selected together, the probability
of pv1 and pv3, pv1 and pv5, pv2 and pv1, etc., followed by the
probability of pv1, pv2 and pv3 being selected together, the probability
of pv1, pv2 and pv4 together, etc.

Eventually, based on the above exercise, I will be able to answer the
following:
* Average number of products selected
* Most frequently selected product, pair, and triplet (which
will be derived from the variable Combo)
* Top five product combinations that have the highest probability of
being selected (this is the most important and tricky part)

And so, in a way you are right: there can be many combinations, and we
have to calculate the probability of each combination. You are also right
that I asked a similar question earlier but was not able to get a
thorough solution.

Richard, I thank you for your input. I am sorry for not being able to
explain this in a simple way. Please let me know if you have any queries.

Thank you very much, once again.

Re: Looping for generating selection combinations

Gary Rosin
> I have 40 product variables as follows.
>
> Begin data
> PV1 Pv2 Pv3 Pv4 Pv5 Pv6
> 1      1  1 1  1
> 1  1      1    1
>
> Those who select the product will have 1 under the ... product.
> They can select multiple products. I want to generate all possible
> combinations ... of selection with top three most frequently
> selected products.


I don't think I'd loop--or at least not at first.

  1. I'd put in 0's when the product was not selected.

  2. Then the means or sums of the products will give you
     the most frequently selected products.

  3. I'd create a product selection variable by concatenating
     the product selection 1's and 0's.  That would give you
     a unique binary number for every product combination.  A
     Frequency could then give you the most frequent combo's.

  4. I'd create a selection count variable by summing across
     products.  That would give you how many products each
     "case" purchased.

  5. At that point, you could data-mine any number of ways--
     by best-selling product, most frequent combo, number of
     products, etc.  You get a lot of power by creating only
     two additional variables.
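
The steps above can be sketched outside SPSS to show what the two derived variables look like. This is a Python illustration only, using the 20 example cases with 5 products listed earlier in the thread; the names `CASES`, `product_totals`, `pattern_counts` and `n_selected` are hypothetical.

```python
from collections import Counter

# The 20 example cases (Pv1..Pv5), already coded 1 = selected, 0 = not (step 1).
CASES = [
    "11011", "01011", "11010", "00101", "10110",
    "01010", "11110", "00000", "10100", "01001",
    "10000", "11001", "10000", "11001", "01110",
    "00010", "10010", "00101", "00100", "01000",
]
rows = [[int(flag) for flag in case] for case in CASES]

# Step 2: column sums give how often each product was selected.
product_totals = [sum(column) for column in zip(*rows)]

# Step 3: the concatenated 0/1 string is the selection variable;
# counting its values gives the frequency of each full combination.
pattern_counts = Counter(CASES)

# Step 4: the row sum is the number of products each case selected.
n_selected = [sum(row) for row in rows]
mean_selected = sum(n_selected) / len(rows)

print(product_totals)                 # selections per product, Pv1..Pv5
print(pattern_counts.most_common(3))  # most frequent full combinations
print(mean_selected)                  # average number of products selected
```

Note that the selection variable counts whole patterns only; it does not by itself say how often a given pair appears inside larger combinations.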

Does that make sense?

Gary

   ---

Gary S. Rosin
Professor of Law
South Texas College of Law
1303 San Jacinto
Houston, TX  77002

<[hidden email]>
713-646-1854

Re: Looping for generating selection combinations

Manmit Shrimali-2
Hey Gary:

Thanks for your valuable input. The challenge is still not solved. Based
on your solution, I will not be able to generate the probabilities of
selecting each message in pairs and triplets. That is the big thing.

Thank you very much for your input. Please share any further
suggestions you have.

Re: Looping for generating selection combinations

Richard Ristow
OK, one more note fairly quickly, and then I may be off for a while.

At 09:24 PM 11/9/2006, Manmit Shrimali wrote:

>If you had 5 products and the following 20 customers, what
>combinations would you count?
>
>   ID Pv1 Pv2 Pv3 Pv4 Pv5   Combo              Sum of products selected
>
>001   1   1   0   1   1     Pv1,pv2,pv4,pv5     4
>002   0   1   0   1   1     pv2,pv4,pv5         3
>003   1   1   0   1   0     pv1,pv2,pv4         3
>004   0   0   1   0   1     pv3,pv5             2
>005   1   0   1   1   0     pv1,pv3,pv4         3
>006   0   1   0   1   0     pv2,pv4             2
>007   1   1   1   1   0     pv1,pv2,pv3,pv4     4
>008   0   0   0   0   0     .                   0
>009   1   0   1   0   0     pv1,pv3             2
>010   0   1   0   0   1     pv2,pv5             2
>011   1   0   0   0   0     pv1                 1
>012   1   1   0   0   1     pv1,pv2,pv5         3
>013   1   0   0   0   0     pv1                 1
>014   1   1   0   0   1     pv1,pv2,pv5         3
>015   0   1   1   1   0     pv2,pv3,pv4         3
>016   0   0   0   1   0     pv4                 1
>017   1   0   0   1   0     pv1,pv4             2
>018   0   0   1   0   1     pv3,pv5             2
>019   0   0   1   0   0     pv3                 1
>020   0   1   0   0   0     pv2                 1
>
>Now, I know that average number of selection is 2.15 so I need to
>generate probabilities of pair of two which is:
>
>probability of PV1 and pv2 being together, probability of pv1 and pv3,
>pv1 and pv5 pv2 and pv1 etc. followed by probability of pv1, pv2, pv3
>being together, probablity of pv1, pv2, pv4 together etc.

This may be a little tricky. In the above data, Pv1 and Pv2 are
together in five cases, in four different combinations:

001   1   1   0   1   1     Pv1,pv2,pv4,pv5     4
003   1   1   0   1   0     pv1,pv2,pv4         3
007   1   1   1   1   0     pv1,pv2,pv3,pv4     4
012   1   1   0   0   1     pv1,pv2,pv5         3
014   1   1   0   0   1     pv1,pv2,pv5         3

I don't know how you're going to count those pairs, then. It may take
more programming than we've done so far.
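
One possible way to count such pairs, under the assumption that "together" means both products were selected regardless of what else was selected, is to let every case contribute once to each pair (or triplet) contained in its selection. The following is a Python sketch rather than SPSS syntax; `subset_counts` and the other names are hypothetical.

```python
from collections import Counter
from itertools import combinations

NAMES = ["Pv1", "Pv2", "Pv3", "Pv4", "Pv5"]
# The 20 example cases, one string of 0/1 flags per case (Pv1..Pv5).
CASES = [
    "11011", "01011", "11010", "00101", "10110",
    "01010", "11110", "00000", "10100", "01001",
    "10000", "11001", "10000", "11001", "01110",
    "00010", "10010", "00101", "00100", "01000",
]

def subset_counts(cases, size):
    """Count, for each product subset of `size`, the cases that include it."""
    counts = Counter()
    for flags in cases:
        chosen = [name for name, flag in zip(NAMES, flags) if flag == "1"]
        # A case with k selected products contains C(k, size) such subsets.
        counts.update(combinations(chosen, size))
    return counts

pair_counts = subset_counts(CASES, 2)
triple_counts = subset_counts(CASES, 3)

# Share of all cases in which each of the most common pairs occurs.
for pair, n in pair_counts.most_common(3):
    print(pair, n, n / len(CASES))
```

With 40 products this gives 780 possible pairs and 9,880 possible triplets, which is still a manageable table, unlike the 2**40 full patterns.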

>Eventually based on the above exercise, I will be able to answer
>following:
>* Average number of products selected

Easy to get the number of products selected in each case; then, easy to
take average, or other statistics.

>* Most frequently selected product

Again, easy; FREQUENCIES, or other procedures, on the individual
variables.

>[* Most frequently selected] pair of two, pair of three (which will be
>derived from variable combo)

This may be the hardest, because each pair can occur as part of several
combinations. Or, when you say "PV1 and pv2 being together", does that
mean those are the only two products selected? In that case it's easier
to count, but with all those combinations to choose from, the count may
be very low.
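
The two readings of "together" give quite different counts on the 20 example cases. A small Python sketch (not SPSS; the helper `selected` and the other names are hypothetical) compares them for Pv1 and Pv2:

```python
# The 20 example cases, one string of 0/1 flags per case (Pv1..Pv5).
CASES = [
    "11011", "01011", "11010", "00101", "10110",
    "01010", "11110", "00000", "10100", "01001",
    "10000", "11001", "10000", "11001", "01110",
    "00010", "10010", "00101", "00100", "01000",
]

def selected(flags):
    """Return the set of product numbers (1-based) selected in one case."""
    return frozenset(i + 1 for i, flag in enumerate(flags) if flag == "1")

target = frozenset({1, 2})  # Pv1 and Pv2

# Reading 1: both products were selected, whatever else was selected too.
includes_pair = sum(1 for case in CASES if target <= selected(case))

# Reading 2: these two products and nothing else were selected.
exactly_pair = sum(1 for case in CASES if selected(case) == target)

print(includes_pair, exactly_pair)  # 5 0
```

Here no case selects exactly Pv1 and Pv2 alone, even though five cases include both.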

>* Top five product combination that has highest probability of being
>selected. (this is the most important and tricky part)

No, this isn't tricky. It's easy to count the total occurrences of each
combination (probably with AGGREGATE), and to sort and list the five
most frequently occurring. The answer may not be meaningful, though,
because there are so many possible combinations.
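
The count-then-sort idea can be illustrated on the 20 example cases. This Python sketch stands in for the AGGREGATE and sort steps and is not SPSS syntax; `combo_label` and the other names are hypothetical.

```python
from collections import Counter

NAMES = ["Pv1", "Pv2", "Pv3", "Pv4", "Pv5"]
# The 20 example cases, one string of 0/1 flags per case (Pv1..Pv5).
CASES = [
    "11011", "01011", "11010", "00101", "10110",
    "01010", "11110", "00000", "10100", "01001",
    "10000", "11001", "10000", "11001", "01110",
    "00010", "10010", "00101", "00100", "01000",
]

def combo_label(flags):
    """Turn a 0/1 string into a readable label such as 'Pv1,Pv2,Pv5'."""
    chosen = [name for name, flag in zip(NAMES, flags) if flag == "1"]
    return ",".join(chosen) if chosen else "(none)"

# One count per distinct full combination (the analogue of BREAK= in AGGREGATE).
combo_counts = Counter(combo_label(case) for case in CASES)

# Sort by count, descending, and keep the five most frequent.
top_five = combo_counts.most_common(5)
for label, n in top_five:
    print(f"{label:<16} {n:>2}  {n / len(CASES):.2f}")
```

On these data only three combinations occur more than once, each in 2 of 20 cases, which shows how thin the counts become when every full pattern is its own category.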

>You are right that I asked similar type of question earlier but was
>not able to get thorough solution.

Good, and I hope we get farther this time. I don't think it's easy,
though. Expressing clearly what you want, and the computations to do
it, are both subtle.

-Good luck,
  Richard

Re: Looping for generating selection combinations

Mike P-5
I may be well off the mark here, but is what you're wanting to do
similar to TURF analysis?

Finding the reach, etc., of the combinations?
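
For readers unfamiliar with the term: in TURF (Total Unduplicated Reach and Frequency) analysis, the reach of a set of products is the number of respondents who selected at least one product in the set. A minimal Python sketch on the 20 example cases from earlier in the thread follows; it is an illustration only, not SPSS syntax, and `reach` and `pair_reach` are hypothetical names.

```python
from itertools import combinations

NAMES = ["Pv1", "Pv2", "Pv3", "Pv4", "Pv5"]
# The 20 example cases, one string of 0/1 flags per case (Pv1..Pv5).
CASES = [
    "11011", "01011", "11010", "00101", "10110",
    "01010", "11110", "00000", "10100", "01001",
    "10000", "11001", "10000", "11001", "01110",
    "00010", "10010", "00101", "00100", "01000",
]

def reach(cases, products):
    """Number of cases that selected at least one product in `products`.

    `products` holds 0-based positions into each case's flag string.
    """
    return sum(1 for flags in cases if any(flags[i] == "1" for i in products))

# Reach of every pair of products; each respondent is counted at most once.
pair_reach = {pair: reach(CASES, pair) for pair in combinations(range(5), 2)}

best = max(pair_reach.values())
for pair, n in pair_reach.items():
    if n == best:
        print([NAMES[i] for i in pair], n, n / len(CASES))
```

Reach answers a different question from co-selection counts: it rewards pairs that cover different respondents rather than pairs that are chosen together.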
