Hello Team:
I am writing after spending a significant amount of time on this, in vain. We asked respondents to list the cities they travelled to over a span of five years. Every year they travel to several cities. The data looks as follows:

ID CITYA_YEAR1 CITYB_YEAR1 CITYC_YEAR1 CITYA_YEAR2 CITYB_YEAR2 CITYC_YEAR2 CITYA_YEAR3 and so on.

If a respondent visited the city, the response is captured as 1; otherwise it is missing. There are 10 cities to pick from for each year.

Analysis: I want to prepare a tree that explains the flow of their journeys, i.e. how many visited city A in their first year; out of those, which cities they visited in the second year; then, out of those, which in the third year; and so on. There can be several combinations of cities, and I want to create the flow for each combination and then get counts and frequencies.

I tried hard to come up with a shorter solution but could not think of any. The terribly long route is to take the frequency of cities visited in the first year. Let's say 10 people selected cities A and B and 20 selected city C. Apply a filter for those who visited cities A and B, then take the frequency of the cities they visited in the second year, and so on.

Any advice will be a great help.

Thanks,
Manmit
At 12:51 AM 10/21/2006, Manmit Shrimali wrote:
>I am writing after spending significant amount of time, but in vain. We
>have asked respondents to provide a list of cities they travel to in a
>span of five years. Every year they travel to several cities. Data looks
>as follows: ID CITYA_YEAR1 CITYB_YEAR1 CITYC_YEAR1 CITYA_YEAR2
>CITYB_YEAR2 CITYC_YEAR2 CITYA_YEAR3 and so on. If a user has visited the
>city then his response will be captured as 1; otherwise it will be
>missing. There are 10 cities to pick from for each year.

This one will be easier with some data to work from. And I'm not sure how your variables are coded. Is it,

CITYA_YEAR1 New York   Yes/No
CITYB_YEAR1 Boston     Yes/No
CITYC_YEAR1 Washington Yes/No

Or is it more like

CITYA_YEAR1 1
CITYB_YEAR1 2
CITYC_YEAR1 0

With

VALUE LABELS CITYA_YEAR1 CITYB_YEAR1 CITYC_YEAR1
   1 'New York' 2 'Boston' 3 'Washington' 0 'None'.

>Analysis: I want to prepare a tree that explains the flow of their
>journey, i.e. how many visited city a in their first year - out of
>those, which cities they visited in second year, then out of those which
>in 3rd year and so on. There can be several combinations of cities and I
>want to create the flow of each combination and then get count and
>frequency.

This would help if we had a little more data. Do you want to count everybody who visited New York in their first year, Boston in their second, and Washington in their third? If so, and if your data is organized in the second way above, it's reasonably easy using AGGREGATE.

Give us two or three data records, and what your counts would be if those records were the only data you have.

Good luck,
Richard Ristow
In reply to this post by Manmit Shrimali-2
Hi Richard:
Thanks for your response. My data is coded as follows: City_year1 New York Yes/No, and so on. I did think of the AGGREGATE function, but the problem is that I will have multiple files and, again, multiple combinations. I have 35 cities. Now, if 10 people visited city A in the first year, I want to know, out of these 10, which cities they visited in year three; let's say 5 visited and 5 did not visit at all. Now, out of the 5 who visited in year two, which cities did they visit in year 3, and so on till year 5. So you see, I can have multiple combinations, and I want to get the analysis at the unique level. For example, let's say in the 1st year I get the following groups:

10 people visited cities A, B and C
20 visited only city C
5 visited city D
40 visited cities F, G and H

Now, for each group, I want to see which cities they visited in the following years, up to year 5.

Your input is highly appreciated.
In reply to this post by Manmit Shrimali-2
Shalom
What you are trying to do needs a many-to-many match, which is not available in SPSS; you can do it in MS Access. Here is a small example with 6 cities and 3 years to demonstrate the complexity.

Cities: a b c d e f
Years:  1 2 3

If the data is

person year city_a city_b city_c city_d city_e city_f
1      1    y      y      n      n      n      n
1      2    n      n      y      y      n      n
1      3    n      n      n      n      y      y

then the combinations are

years              1 2 3
city combination   a c e
city combination   a c f
city combination   a d e
city combination   a d f
city combination   b c e
etc.

That is, for each person you have to match each city in year 1 with each city in year 2, with each city in year 3, etc. I think that even if you succeed in restructuring the data, the number of combinations (permutations) will be so big that it will not tell you anything.

Hillel Vardi
Ben Gurion U
Israel
A many-to-many match cannot be done directly, that is true. However, it can be done. I gave an example of how to do it in the last couple of months, maybe the last month. I would also bet that Ray has an example in his book, available from the SPSS web site.

Gene Maguin
In reply to this post by Manmit Shrimali-2
At 06:20 PM 10/22/2006, Manmit Shrimali wrote:
>I will have multiple files

THAT shouldn't be a problem. You need to combine the multiple files into one, by whatever is the appropriate technique - ADD FILES, MATCH FILES, or whatever.

>My data is coded as follows: City_year1 New York Yes/No and so on.

Like this (SPSS draft output)?

*  "My data is coded: City_year1 New York Yes/No and so on." .
*  Like this?  YR1_PVD  'Visited Providence in year 1'       .
*              YR1_CRNS 'Visited Cranston in year 1'         .
*              YR1_WRWK 'Visited Warwick in year 1'          .
*              YR2_PVD  'Visited Providence in year 2'       .
*              YR2_CRNS 'Visited Cranston in year 2'         .
*              YR2_WRWK 'Visited Warwick in year 2'          .
*              YR3_PVD  'Visited Providence in year 3'       .
*              YR3_CRNS 'Visited Cranston in year 3'         .
*              YR3_WRWK 'Visited Warwick in year 3'          .

GET FILE=TESTDATA.
TEMPORARY.
STRING SPACE(A28).
LIST.

List
|-----------------------------|---------------------------|
|Output Created               |24-OCT-2006 16:43:46       |
|-----------------------------|---------------------------|
C:\Documents and Settings\Richard\My Documents
  \Temporary\SPSS
  \2006-10-23 Shrimali - Looping complexity for complex situation.SAV

CASE YR1_ YR1_ YR1_ YR2_ YR2_ YR2_ YR3_ YR3_ YR3_
_NUM PVD  CRNS WRWK PVD  CRNS WRWK PVD  CRNS WRWK SPACE

001   1    0    0    1    1    0    0    0    1
002   1    1    1    0    1    0    0    0    1
003   0    1    1    0    1    1    0    0    1
004   0    1    0    1    1    1    1    0    0
005   0    0    0    0    1    0    1    0    0
006   0    1    0    0    1    0    0    1    0
007   0    1    0    0    0    1    0    0    0
008   0    1    1    0    0    0    1    0    1
009   1    1    0    0    0    0    1    0    1
010   0    0    1    1    0    0    1    0    1
011   0    0    1    0    0    0    1    0    0
012   0    1    1    1    0    1    0    0    0
013   0    0    0    0    0    0    0    0    1
014   1    0    0    0    0    1    0    0    1
015   0    0    0    0    1    0    0    0    0
016   0    0    1    0    0    0    0    0    0
017   1    1    0    0    0    1    0    0    0
018   0    1    1    0    0    1    0    1    0
019   0    0    0    0    0    1    0    0    1
020   0    1    1    1    0    0    1    1    1

Number of cases read:  20    Number of cases listed:  20

>I can have multiple combination and I want to get the analysis at
>unique level. For e.g. let say in 1st year I get following group:
>10 people visited city a b and c [...]
>40 visited city fgh
>
>For each group I want to see which cities they visited in following
>years till 5 year.

That *sounds* like you want to count all different patterns of visiting over the five years: the set of cities visited the first year BY the set visited the second year BY the set visited the third year.... That's easy with AGGREGATE (see below), but useless. With 3 cities and 3 years (in my test data) there are 2**(3*3)=512 possible combinations; as you see, none occurs even twice in this test data. 35 cities and 5 years gives 2**(35*5), or about 5*10**52, combinations.

In the test data I've posted - three cities, three years - what are the categories you'd like to count?

(By the way, Hillel Vardi described this as needing a many-to-many match. It probably doesn't. If your data is organized like the test data above, the many-to-many match is already done in the data records.)

Illustration - count all combinations (SPSS draft output):

*  "If 10 people visited city a in first year, I want to know,.
*  which cities they visited in year three, let say 5 visited .
*  and 5 did not visit at all. Out of 5 who visited in year   .
*  two, which cities they visited in year 3 and so on"        .

*  Possibility one: count all combinations: ............... .
*  (This is easy with AGGREGATE, but impossible to analyze   .
*  because there are so many combinations.)                  .

AGGREGATE OUTFILE=*
   /BREAK = YR1_PVD TO YR3_WRWK
   /INSTANCES 'Occurrences of this combination' = N.
FORMATS INSTANCES (F4).
TEMPORARY.
STRING SPACE(A24).
LIST.

List
|-----------------------------|---------------------------|
|Output Created               |24-OCT-2006 16:43:48       |
|-----------------------------|---------------------------|
YR1_ YR1_ YR1_ YR2_ YR2_ YR2_ YR3_ YR3_ YR3_
PVD  CRNS WRWK PVD  CRNS WRWK PVD  CRNS WRWK INSTANCES SPACE

 0    0    0    0    0    0    0    0    1       1
 0    0    0    0    0    1    0    0    1       1
 0    0    0    0    1    0    0    0    0       1
 0    0    0    0    1    0    1    0    0       1
 0    0    1    0    0    0    0    0    0       1
 0    0    1    0    0    0    1    0    0       1
 0    0    1    1    0    0    1    0    1       1
 0    1    0    0    0    1    0    0    0       1
 0    1    0    0    1    0    0    1    0       1
 0    1    0    1    1    1    1    0    0       1
 0    1    1    0    0    0    1    0    1       1
 0    1    1    0    0    1    0    1    0       1
 0    1    1    0    1    1    0    0    1       1
 0    1    1    1    0    0    1    1    1       1
 0    1    1    1    0    1    0    0    0       1
 1    0    0    0    0    1    0    0    1       1
 1    0    0    1    1    0    0    0    1       1
 1    1    0    0    0    0    1    0    1       1
 1    1    0    0    0    1    0    0    0       1
 1    1    1    0    1    0    0    0    1       1

Number of cases read:  20    Number of cases listed:  20
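One way to make such counts more manageable, offered only as a sketch and not as anything posted in this thread: collapse each year's cities into a single "set" code, then count pairs of consecutive years instead of whole five-year histories. The names YR1_SET, YR2_SET and N_PEOPLE are hypothetical. The sketch assumes the 0/1 coding of the test data above (data coded 1/missing would first need something like RECODE ... (SYSMIS=0)), and the digit encoding is only practical for a handful of cities, not 35.

```spss
*  Sketch only: YR1_SET, YR2_SET and N_PEOPLE are hypothetical names .
*  Assumes every city variable is coded 0/1 with no missing values   .
GET FILE=TESTDATA.

*  One code per year: hundreds digit = Providence, tens = Cranston,  .
*  units = Warwick, so 110 means Providence and Cranston             .
COMPUTE YR1_SET = 100*YR1_PVD + 10*YR1_CRNS + YR1_WRWK.
COMPUTE YR2_SET = 100*YR2_PVD + 10*YR2_CRNS + YR2_WRWK.
FORMATS YR1_SET YR2_SET (N3).

*  Count people for each year-1 set followed by each year-2 set      .
AGGREGATE OUTFILE=*
   /BREAK    = YR1_SET YR2_SET
   /N_PEOPLE 'People with this year-1 and year-2 pattern' = N.
LIST.
```

With three cities this yields at most 8 x 8 = 64 year-1 by year-2 cells instead of 512 full histories. The same pattern can be repeated for years 2 and 3, and so on, at the cost of no longer following one person through all years at once.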
In reply to this post by Manmit Shrimali-2
I want to count all categories, i.e. all cities and all years. I did try AGGREGATE, but I get a lot of data as my n is 150, so imagine the combinations.
At 09:56 PM 10/24/2006, Manmit Shrimali wrote:
>I want to count all categories i.e. all cities and all years. I did
>try with aggregate but I get lot of data as my n is 150 so imagine the
>combinations.

OK. Yes, I can imagine. Or calculate - more than 1E52 possibilities. And you have 150 records all told?

The problem isn't how to calculate, using AGGREGATE or otherwise. The question is what you *want* to count, and how to simplify what you're counting enough that it's meaningful with 150 cases.

Yes, counting all possible combinations of cities and visits is ridiculous; I said that, though I posted the AGGREGATE code that would do it.

Can you help us understand what it is you are looking for? I haven't understood, from your descriptions. As an example, can you give the categories you want to count, and the counts in each category, as they'd be from the test data I posted?

-Good luck,
 Richard
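For illustration only, here is one candidate answer to "what to count", sketched against the 20-case test data posted earlier in the thread. It is a guess at the intent, not a confirmed reading: pick one starting city, then count which cities those people visited the following year, and repeat one level deeper. It assumes the 0/1 coding of the test data and reuses the TESTDATA file handle from the earlier post.

```spss
*  Sketch only. Assumes the 0/1 coding of the posted test data       .
GET FILE=TESTDATA.

*  Level 1: people who visited Providence in year 1                  .
*  TEMPORARY limits the selection to the next procedure only         .
TEMPORARY.
SELECT IF (YR1_PVD = 1).
*  SUM is the number of these people visiting each city in year 2    .
DESCRIPTIVES VARIABLES = YR2_PVD YR2_CRNS YR2_WRWK
   /STATISTICS = SUM MEAN.

*  Level 2: Providence in year 1, then Cranston in year 2            .
TEMPORARY.
SELECT IF (YR1_PVD = 1 AND YR2_CRNS = 1).
DESCRIPTIVES VARIABLES = YR3_PVD YR3_CRNS YR3_WRWK
   /STATISTICS = SUM MEAN.
```

Reading the counts off the test data by hand: five people visited Providence in year 1 (cases 001, 002, 009, 014 and 017). In year 2, one of them visited Providence, two Cranston and two Warwick. The two who went on to Cranston (cases 001 and 002) both visited only Warwick in year 3. This is the "long route" from the original question. It stays readable because each step conditions on one city rather than on a whole set of cities.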