Hi Team:

I would appreciate your input on the following problem. I have 40 product variables, as follows.

Begin data
PV1 Pv2 Pv3 Pv4 Pv5 Pv6
1 1 1 1 1
1 1 1 1

Those who select a product will have a 1 under the respective product. They can select multiple products. I want to generate all possible combinations of selections (pairs, triplets, and so on) with the top three most frequently selected products.

One way is to concatenate across each row. However, when I combine them all, I will have to write too many IF statements. Is there any way we can do this by looping?

I will highly appreciate your input. Marta, it would be great if you could also provide your input.

Thanks,
Manmit
At 05:46 PM 11/9/2006, Manmit Shrimali wrote:

>I have 40 product variables as follows.
>
>Begin data
>PV1 Pv2 Pv3 Pv4 Pv5 Pv6
>1 1 1 1 1
>1 1 1 1
>
>Those who select the product will have 1 under the respective product.
>They can select multiple products.

Here's the part I don't understand:

>I want to generate all possible combinations (can be double or
>triplets and so on)of selection with top three most frequently
>selected products.

All possible combinations of products selected? Or, more likely, all combinations of selected products that occur?

>One way is to concat across rows. However, when I combine altogether I
>will have write too many if states. Is there any way we can do this by
>looping?

Suppose you had only 3 or 4 product variables. How would your syntax look, and what would it give you? That will help us understand what you really want.

-Good luck,
 Richard
In reply to this post by Manmit Shrimali-2
Thanks Richard. I should have been more detailed. Respondents can select multiple products. What I want to know is the most frequently selected product combination; it can be a combination of two products or three products. For example:

1. Among all products, Product 1 was selected the most (which is simple: just the frequency of each product).
2. Then, the most frequently selected products in combination (pairs, triplets). For example, products 1, 2 and 3 together have the highest number of 1s, i.e. the highest frequency.

Eventually, what we want to know is which products or product combinations respondents used just before using our product. We want to identify which product combinations our business is coming from, i.e. which products people are switching from to ours. For this reason we ask: just before using our product, which products have you used (select all that apply)?

I hope I have been able to explain myself. Please let me know if further clarification is required. Once again, thank you for your valuable input.
In reply to this post by Manmit Shrimali-2
If I had only seven products, the output would look like the list below (frequencies in brackets):

Product ab (50)
Product dc (10)
Product ef (2)
Product fi (200)
Product abi (30)

The syntax could be:

Compute combine=concat(ltrim(productlist)***

From the above output we can determine that those who currently use our product have mostly switched from products fi, followed by abi (when three products were selected). We would also learn that people usually use at least two products together before coming to our product.
At 07:47 PM 11/9/2006, Manmit Shrimali wrote:

Well, for better or for worse, I still don't understand. Now, the equivalent of this

>Compute combine=concat(ltrim(productlist)***

isn't hard. If,

. Your product variables are Prdct01 TO Prdct40
. They are contiguous in the file
. They are numeric, 1 if the product was used, 0 if it wasn't
. FINALLY, you want a single 40-character string giving which products were used,

you can do this. It will give a '.' in the proper position for each product not used, and '1' for each product that was used. It will put in '?' for missing product variables, and 'X' for any values other than 0 or 1. Not tested.

STRING  ALL_USED (A40).
* Declare the scratch flag once, before the loop.
STRING  #FLAG (A1).
COMPUTE ALL_USED = ' '.
DO REPEAT PRODUCT = Prdct01 TO Prdct40.
.  DO IF   MISSING(PRODUCT).
.     COMPUTE #FLAG = '?'.
.  ELSE IF PRODUCT EQ 0.
.     COMPUTE #FLAG = '.'.
.  ELSE IF PRODUCT EQ 1.
.     COMPUTE #FLAG = '1'.
.  ELSE.
.     COMPUTE #FLAG = 'X'.
.  END IF.
.  COMPUTE ALL_USED = CONCAT(RTRIM(ALL_USED),#FLAG).
END REPEAT.

Here's the problem: you have 2**40 possible combinations. I don't see how you can possibly analyze or count what combinations occur; there are too many. Even if people use no more than three products, there are over 10,000. What are you going to do?

I'm not searching back, but this question, or ones like it, seem to keep coming up, and I just can't see them making any sense. To help me, if you had 5 products and the following 20 customers, what combinations would you count?

LIST.
|-----------------------------|---------------------------|
|Output Created               |09-NOV-2006 20:52:13       |
|-----------------------------|---------------------------|
 ID  Pv1 Pv2 Pv3 Pv4 Pv5

 001   1   1   0   1   1
 002   0   1   0   1   1
 003   1   1   0   1   0
 004   0   0   1   0   1
 005   1   0   1   1   0
 006   0   1   0   1   0
 007   1   1   1   1   0
 008   0   0   0   0   0
 009   1   0   1   0   0
 010   0   1   0   0   1
 011   1   0   0   0   0
 012   1   1   0   0   1
 013   1   0   0   0   0
 014   1   1   0   0   1
 015   0   1   1   1   0
 016   0   0   0   1   0
 017   1   0   0   1   0
 018   0   0   1   0   1
 019   0   0   1   0   0
 020   0   1   0   0   0

Number of cases read:  20    Number of cases listed:  20
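
As a rough cross-check of the message above, here is a minimal sketch in Python rather than SPSS syntax. It mirrors the flag string and works out the two counts mentioned. The function name all_used and the use of None for a missing value are assumptions made for this illustration; the flag characters and the 40-product count come from the post.

```python
from math import comb


def all_used(values):
    """Build one flag per product: '1' used, '.' not used,
    '?' missing (None here, an assumption), 'X' anything else."""
    flags = []
    for v in values:
        if v is None:
            flags.append("?")
        elif v == 0:
            flags.append(".")
        elif v == 1:
            flags.append("1")
        else:
            flags.append("X")
    return "".join(flags)


N_PRODUCTS = 40

# Every product is either used or not, so there are 2**40 patterns.
all_patterns = 2 ** N_PRODUCTS

# Patterns in which exactly one, two, or three products are used.
up_to_three = sum(comb(N_PRODUCTS, k) for k in range(1, 4))

print(all_used([1, 0, None, 7, 1]))  # 1.?X1
print(all_patterns)                  # 1099511627776
print(up_to_three)                   # 10700
```

The second figure (40 + 780 + 9880) is what "over 10,000" refers to when nobody uses more than three products.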
In reply to this post by Manmit Shrimali-2
>Here's the problem: you have 2**40 possible combinations. I don't see
>how you can possibly analyze or count what combinations occur; there
>are too many. Even if people use no more than three products, there
>are over 10,000. What are you going to do?
>
>I'm not searching back, but this question, or ones like it, seem to
>keep coming up, and I just can't see them making any sense. To help
>me, if you had 5 products and the following 20 customers, what
>combinations would you count?

 ID  Pv1 Pv2 Pv3 Pv4 Pv5  Combo            Sum of products selected

 001   1   1   0   1   1  pv1,pv2,pv4,pv5  4
 002   0   1   0   1   1  pv2,pv4,pv5      3
 003   1   1   0   1   0  pv1,pv2,pv4      3
 004   0   0   1   0   1  pv3,pv5          2
 005   1   0   1   1   0  pv1,pv3,pv4      3
 006   0   1   0   1   0  pv2,pv4          2
 007   1   1   1   1   0  pv1,pv2,pv3,pv4  4
 008   0   0   0   0   0  .                0
 009   1   0   1   0   0  pv1,pv3          2
 010   0   1   0   0   1  pv2,pv5          2
 011   1   0   0   0   0  pv1              1
 012   1   1   0   0   1  pv1,pv2,pv5      3
 013   1   0   0   0   0  pv1              1
 014   1   1   0   0   1  pv1,pv2,pv5      3
 015   0   1   1   1   0  pv2,pv3,pv4      3
 016   0   0   0   1   0  pv4              1
 017   1   0   0   1   0  pv1,pv4          2
 018   0   0   1   0   1  pv3,pv5          2
 019   0   0   1   0   0  pv3              1
 020   0   1   0   0   0  pv2              1

Now, I know that the average number of selections is 2.15, so I need to generate the probabilities of pairs, that is: the probability of pv1 and pv2 being together, the probability of pv1 and pv3, pv1 and pv5, pv2 and pv1, etc., followed by the probability of pv1, pv2, pv3 being together, the probability of pv1, pv2, pv4 being together, etc.

Eventually, based on the above exercise, I will be able to answer the following:

* Average number of products selected
* Most frequently selected product, pair, and triplet (which will be derived from the variable Combo)
* Top five product combinations that have the highest probability of being selected (this is the most important and tricky part)

And so, in a way you are right: there can be many combinations, and we have to calculate the probability of each combination.

You are also right that I asked a similar type of question earlier but was not able to get a thorough solution. Richard, I thank you for your input, and I am sorry for not being able to explain in a simple way. Please let me know if you have any queries. Thank you very much, once again.
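
The Combo and sum columns in the table above, and the 2.15 average, can be reproduced with a short Python sketch. This is only an illustration under stated assumptions: the names ROWS, NAMES and combo_label are made up here, and the data is the 20-customer example from the thread, typed in as 0/1 strings.

```python
# The 20 example respondents; each string holds Pv1..Pv5 as 0/1.
ROWS = [
    "11011", "01011", "11010", "00101", "10110",
    "01010", "11110", "00000", "10100", "01001",
    "10000", "11001", "10000", "11001", "01110",
    "00010", "10010", "00101", "00100", "01000",
]
NAMES = ["pv1", "pv2", "pv3", "pv4", "pv5"]


def combo_label(row):
    """Comma-separated names of the selected products, '.' if none."""
    picked = [name for name, flag in zip(NAMES, row) if flag == "1"]
    return ",".join(picked) if picked else "."


combos = [combo_label(row) for row in ROWS]
n_selected = [row.count("1") for row in ROWS]
average = sum(n_selected) / len(ROWS)

for case_id, (combo, n) in enumerate(zip(combos, n_selected), start=1):
    print(f"{case_id:03d}  {combo:<16} {n}")
print("average number selected:", average)
```

With the example data there are 43 selections across 20 respondents, which is where 2.15 comes from.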
In reply to this post by Manmit Shrimali-2

> I have 40 product variables as follows.
>
> Begin data
> PV1 Pv2 Pv3 Pv4 Pv5 Pv6
> 1 1 1 1 1
> 1 1 1 1
>
> Those who select the product will have 1 under the ... product.
> They can select multiple products. I want to generate all possible
> combinations ... of selection with top three most frequently
> selected products.

I don't think I'd loop--or at least not at first.

1. I'd put in 0's when the product was not selected.

2. Then the means or sums of the products will give you the most frequently selected products.

3. I'd create a product selection variable by concatenating the product selection 1's and 0's. That would give you a unique binary number for every product combination. A Frequency could then give you the most frequent combos.

4. I'd create a selection count variable by summing across products. That would give you how many products each "case" purchased.

5. At that point, you could data-mine any number of ways--by best-selling product, most frequent combo, number of products, etc.

You get a lot of power by creating only two additional variables.

Does that make sense?

Gary

---
Gary S. Rosin
Professor of Law
South Texas College of Law
1303 San Jacinto
Houston, TX 77002
<[hidden email]>
713-646-1854
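
The five steps above can be sketched in Python as follows. This is a hedged illustration, not Gary's own code: the variable names are invented, the 5-product, 20-customer example from elsewhere in the thread stands in for the 40 real variables, and step 1 (0's for products not selected) is assumed to be done already.

```python
from collections import Counter

# Example data: one list of 0/1 values (Pv1..Pv5) per respondent.
ROWS = [
    "11011", "01011", "11010", "00101", "10110",
    "01010", "11110", "00000", "10100", "01001",
    "10000", "11001", "10000", "11001", "01110",
    "00010", "10010", "00101", "00100", "01000",
]
DATA = [[int(ch) for ch in row] for row in ROWS]

# Step 2: sums per product show the most frequently selected products.
product_sums = [sum(case[i] for case in DATA) for i in range(5)]

# Step 3: concatenate the 1's and 0's into one pattern per case,
# then tabulate the patterns.
patterns = ["".join(str(v) for v in case) for case in DATA]
pattern_freq = Counter(patterns)

# Step 4: selection count per case.
selection_count = [sum(case) for case in DATA]

# Step 5: one of many possible summaries, cases by number of products.
cases_by_size = Counter(selection_count)

print(product_sums)
print(pattern_freq.most_common(3))
print(sorted(cases_by_size.items()))
```

Only two derived variables are involved (the pattern and the count), which is the point made in the post.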
In reply to this post by Manmit Shrimali-2
Hey Gary:

Thanks for your valuable input. The challenge is still not solved. Based on your solution, I will not be able to generate the probabilities of selecting each message in pairs and triplets. That is the big thing.

Thank you very much for your input. Please share any further input you have.
In reply to this post by Manmit Shrimali-2
OK, one more note fairly quickly, and then I may be off for a while.

At 09:24 PM 11/9/2006, Manmit Shrimali wrote:

>If you had 5 products and the following 20 customers, what
>combinations would you count?
>
> ID  Pv1 Pv2 Pv3 Pv4 Pv5  Combo            Sum of product selected
>
> 001   1   1   0   1   1  Pv1,pv2,pv4,pv5  4
> 002   0   1   0   1   1  pv2,pv4,pv5      3
> 003   1   1   0   1   0  pv1,pv2,pv4      3
> 004   0   0   1   0   1  pv3,pv5          2
> 005   1   0   1   1   0  pv1,pv3,pv4      3
> 006   0   1   0   1   0  pv2,pv4          2
> 007   1   1   1   1   0  pv1,pv2,pv3,pv4  4
> 008   0   0   0   0   0  .                0
> 009   1   0   1   0   0  pv1,pv3          2
> 010   0   1   0   0   1  pv2,pv5          2
> 011   1   0   0   0   0  pv1              1
> 012   1   1   0   0   1  pv1,pv2,pv5      3
> 013   1   0   0   0   0  pv1              1
> 014   1   1   0   0   1  pv1,pv2,pv5      3
> 015   0   1   1   1   0  pv2,pv3,pv4      3
> 016   0   0   0   1   0  pv4              1
> 017   1   0   0   1   0  pv1,pv4          2
> 018   0   0   1   0   1  pv3,pv5          2
> 019   0   0   1   0   0  pv3              1
> 020   0   1   0   0   0  pv2              1
>
>Now, I know that average number of selection is 2.15 so I need to
>generate probabilities of pair of two which is:
>
>probability of PV1 and pv2 being together, probability of pv1 and pv3,
>pv1 and pv5 pv2 and pv1 etc. followed by probability of pv1, pv2, pv3
>being together, probability of pv1, pv2, pv4 together etc.

This may be a little tricky. In the above data, Pv1 and Pv2 are together in five cases, in four different combinations:

 001   1   1   0   1   1  Pv1,pv2,pv4,pv5  4
 003   1   1   0   1   0  pv1,pv2,pv4      3
 007   1   1   1   1   0  pv1,pv2,pv3,pv4  4
 012   1   1   0   0   1  pv1,pv2,pv5      3
 014   1   1   0   0   1  pv1,pv2,pv5      3

I don't know how you're going to count those pairs, then. It may take more programming than we've done so far.

>Eventually based on the above exercise, I will be able to answer
>following:
>* Average number of products selected

Easy to get the number of products selected in each case; then, easy to take average, or other statistics.

>* Most frequently selected product

Again, easy; FREQUENCIES, or other procedures, on the individual variables.

>[* Most frequently selected] pair of two, pair of three (which will be
>derived from variable combo)

This may be the hardest, because each pair can occur as part of several combinations.

Or, when you say "PV1 and pv2 being together", does that mean those are the only two products selected? In that case it's easier to count, but with all those combinations to choose from, the count may be very low.

>* Top five product combination that has highest probability of being
>selected. (this is the most important and tricky part)

No, this isn't tricky. It's easy to count the total occurrences of each combination (probably with AGGREGATE), and to sort and list the five most frequently occurring. The answer may not be meaningful, though, because there are so many possible combinations.

>You are right that I asked similar type of question earlier but was
>not able to get thorough solution.

Good, and I hope we get farther this time. I don't think it's easy, though. Expressing clearly what you want, and the computations to do it, are both subtle.

-Good luck,
 Richard
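
The two readings of "together" discussed above give different counts, and a small Python sketch makes the difference concrete. It is an illustration only: the helper names are invented, it uses the 20-customer example rather than real data, and whether the "contains the pair" reading is the one wanted is exactly the question left open in the message.

```python
from collections import Counter
from itertools import combinations

ROWS = [
    "11011", "01011", "11010", "00101", "10110",
    "01010", "11110", "00000", "10100", "01001",
    "10000", "11001", "10000", "11001", "01110",
    "00010", "10010", "00101", "00100", "01000",
]
NAMES = ["pv1", "pv2", "pv3", "pv4", "pv5"]


def selected(row):
    """Names of the products selected in one case, in variable order."""
    return tuple(name for name, flag in zip(NAMES, row) if flag == "1")


def subset_counts(size):
    """Reading 1: count cases whose selection CONTAINS each subset."""
    counts = Counter()
    for row in ROWS:
        counts.update(combinations(selected(row), size))
    return counts


pair_counts = subset_counts(2)
triple_counts = subset_counts(3)

# Reading 2: count cases whose selection IS exactly that combination.
exact_counts = Counter(selected(row) for row in ROWS)

n = len(ROWS)
print("pv1 & pv2 together:", pair_counts[("pv1", "pv2")] / n)    # 0.25
print("only pv1 and pv2:  ", exact_counts[("pv1", "pv2")] / n)   # 0.0
print("most frequent pair:", pair_counts.most_common(1))
print("five most frequent exact combinations:")
for combo, count in exact_counts.most_common(5):
    print("  ", ",".join(combo) or ".", count)
```

In this example the pair pv1, pv2 appears inside five cases but is never the whole selection, which illustrates why the exact-match count can be very low.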
In reply to this post by Manmit Shrimali-2
I may be well off the mark here, but is this similar to the TURF analysis that you're wanting to do? Finding the reach, etc. of the combinations?
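
For readers unfamiliar with the term, reach in TURF is usually the number of respondents who selected at least one product in a given set. A minimal Python sketch on the 20-customer example follows; the function name reach and the choice of pairs are assumptions for illustration, and the thread does not settle whether reach is what the original poster needs.

```python
from itertools import combinations

ROWS = [
    "11011", "01011", "11010", "00101", "10110",
    "01010", "11110", "00000", "10100", "01001",
    "10000", "11001", "10000", "11001", "01110",
    "00010", "10010", "00101", "00100", "01000",
]


def reach(product_indexes):
    """Respondents who selected at least one product in the set."""
    return sum(any(row[i] == "1" for i in product_indexes) for row in ROWS)


# Reach of every pair of the five products (0-based indexes).
pair_reach = {pair: reach(pair) for pair in combinations(range(5), 2)}
best = max(pair_reach.values())

for pair, value in sorted(pair_reach.items()):
    print([f"pv{i + 1}" for i in pair], value)
print("largest pair reach:", best)
```

Note that this differs from the co-occurrence counts discussed earlier in the thread: reach counts respondents with any product in the set, not respondents with all of them.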