Looping complexity for complex situation

Manmit Shrimali-2
Hello Team:

I am writing after spending a significant amount of time on this, in vain.
We have asked respondents to provide a list of the cities they traveled to
over a span of five years. Every year they travel to several cities. The
data look as follows: ID CITYA_YEAR1 CITYB_YEAR1 CITYC_YEAR1 CITYA_YEAR2
CITYB_YEAR2 CITYC_YEAR2 CITYA_YEAR3 and so on. If a respondent visited the
city, the response is coded 1; otherwise it is missing. There are 10
cities to pick from each year.
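A minimal sketch of reading that layout (in Python, for illustration only, not SPSS): each wide record becomes one set of visited cities per year. The CITY<letter>_YEAR<n> naming follows the post; representing a missing value as None and the name visits_by_year are assumptions.

```python
import re

# Assumed column pattern CITY<letter>_YEAR<n>, as described in the post.
# A value of 1 means "visited"; None stands in for a missing value.
COLUMN = re.compile(r"CITY([A-Z])_YEAR(\d+)$")

def visits_by_year(record):
    """Turn one wide record into {year: set of city letters visited}."""
    years = {}
    for name, value in record.items():
        match = COLUMN.match(name)
        if match is None:
            continue  # skip ID and any other non-city variables
        city, year = match.group(1), int(match.group(2))
        years.setdefault(year, set())  # keep years with no visits, too
        if value == 1:
            years[year].add(city)
    return years

# Hypothetical record
record = {"ID": 7, "CITYA_YEAR1": 1, "CITYB_YEAR1": None, "CITYC_YEAR1": 1,
          "CITYA_YEAR2": None, "CITYB_YEAR2": 1, "CITYC_YEAR2": None}
print(visits_by_year(record))
```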

Analysis: I want to prepare a tree that explains the flow of their journey,
i.e., how many visited city a in their first year; out of those, which
cities they visited in the second year; then, out of those, which in the
third year, and so on. There can be many combinations of cities, and I want
to create the flow for each combination and then get counts and frequencies.

I tried hard to come up with a shorter solution but could not think of one.
The terribly long route is to take the frequencies of cities visited in the
first year. Let's say 10 people selected cities a and b and 20 selected
city c. Apply a filter for those who visited cities a and b, then take the
frequencies of the cities they visited in the second year, and so on.
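The "filter, then take frequencies" route can be sketched like this (Python, hypothetical data). It assumes each respondent has already been reduced to a dict of year -> set of cities; next_year_frequencies is an illustrative name.

```python
from collections import Counter

def next_year_frequencies(people, year, cities):
    """Filter to people whose set of cities in `year` is exactly `cities`,
    then count how often each city is visited in the following year."""
    group = [p for p in people if set(p.get(year, ())) == set(cities)]
    freq = Counter()
    for person in group:
        freq.update(person.get(year + 1, ()))
    return len(group), freq

# Hypothetical respondents: {year: set of cities visited}
people = [
    {1: {"a", "b"}, 2: {"c"}},
    {1: {"a", "b"}, 2: {"c", "d"}},
    {1: {"c"}, 2: {"a"}},
    {1: {"a", "b"}, 2: set()},
]
size, freq = next_year_frequencies(people, 1, {"a", "b"})
print(size, dict(freq))  # 3 {'c': 2, 'd': 1}
```

Repeating this for every year-1 group and every following year is the long route the post describes; the per-city frequencies also count a person once for each city visited.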

Any advice will be a great help.

Thanks,
Manmit

Re: Looping complexity for complex situation

Richard Ristow
At 12:51 AM 10/21/2006, Manmit Shrimali wrote:

>I am writing after spending a significant amount of time on this, in vain.
>We have asked respondents to provide a list of the cities they traveled to
>over a span of five years. Every year they travel to several cities. The
>data look as follows: ID CITYA_YEAR1 CITYB_YEAR1 CITYC_YEAR1 CITYA_YEAR2
>CITYB_YEAR2 CITYC_YEAR2 CITYA_YEAR3 and so on. If a respondent visited the
>city, the response is coded 1; otherwise it is missing.
>There are 10 cities to pick from each year.

This one will be easier with some data to work from. And I'm not sure
how your variables are coded. Is it,

CITYA_YEAR1   New York   Yes/No
CITYB_YEAR1   Boston     Yes/No
CITYC_YEAR1   Washington Yes/No

Or is it more like

CITYA_YEAR1   1
CITYB_YEAR1   2
CITYC_YEAR1   0

With

VALUE LABELS CITYA_YEAR1 CITYB_YEAR1 CITYC_YEAR1
     1 'New York'
     2 'Boston'
     3 'Washington'
     0 'None'.
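If the data were in the second layout (slot variables holding city codes), a minimal Python sketch of turning one year's slots into the first layout, one visited flag per city, could look like this. The codes follow the VALUE LABELS above; treating any code outside the label list (here 0, 'None') as "no visit" is an assumption.

```python
# Codes from the VALUE LABELS above; 0 ('None') marks an empty slot.
LABELS = {1: "New York", 2: "Boston", 3: "Washington"}

def slots_to_flags(slot_values):
    """Convert one year's coded slot variables (second layout) into one
    visited yes/no flag per city (first layout)."""
    visited = {code for code in slot_values if code in LABELS}
    return {LABELS[code]: code in visited for code in sorted(LABELS)}

# CITYA_YEAR1=1, CITYB_YEAR1=2, CITYC_YEAR1=0, as in the example above
print(slots_to_flags([1, 2, 0]))
# {'New York': True, 'Boston': True, 'Washington': False}
```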

>Analysis: I want to prepare a tree that explains the flow of their
>journey, i.e., how many visited city a in their first year; out of
>those, which cities they visited in the second year; then, out of those,
>which in the third year, and so on. There can be many combinations of
>cities, and I want to create the flow for each combination and then get
>counts and frequencies.

This would help if we had a little more data. Do you want to count
everybody who visited New York in their first year, Boston in their
second, and Washington in their third? If so, and if your data is
organized in the second way above, it's reasonably easy using
AGGREGATE.
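Outside SPSS, the count described here (New York in year 1, Boston in year 2, Washington in year 3) reduces to a membership test per year. A minimal Python sketch with made-up respondents; count_path is a hypothetical helper, not the AGGREGATE approach mentioned above.

```python
def count_path(people, path):
    """Count people who visited path[0] in year 1, path[1] in year 2, and so on."""
    return sum(
        all(city in person.get(year, set())
            for year, city in enumerate(path, start=1))
        for person in people
    )

# Hypothetical respondents: {year: set of cities visited}
people = [
    {1: {"New York"}, 2: {"Boston"}, 3: {"Washington", "Boston"}},
    {1: {"New York", "Boston"}, 2: {"Boston"}, 3: {"Washington"}},
    {1: {"Boston"}, 2: {"New York"}, 3: {"Washington"}},
]
print(count_path(people, ["New York", "Boston", "Washington"]))  # 2
```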

Give us two or three data records, and what your counts would be if
those records were the only data you have.

Good luck,
Richard Ristow

Re: Looping complexity for complex situation

Manmit Shrimali-2
In reply to this post by Manmit Shrimali-2
Hi Richard:

Thanks for your response. My data is coded as follows:
City_year1 New York Yes/No and so on.

I did think of the AGGREGATE function, but the problem is that I will have
multiple files and, again, multiple combinations. I have 35 cities. Now, if
10 people visited city a in the first year, I want to know, out of these 10,
which cities they visited in year two; let's say 5 visited and 5 did not
visit at all. Now, out of the 5 who visited in year two, which cities they
visited in year 3, and so on through year 5.

So you see, I can have multiple combinations, and I want to get the
analysis at the level of each unique combination. For example, let's say
in the first year I get the following groups:
10 people visited cities a, b and c
20 visited only c
5 visited city d
40 visited cities f, g and h

Now for each group I want to see which cities they visited in the following
years, through year 5.
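The tree asked for here can be built by counting every prefix of each respondent's journey: first the exact set of cities in year 1, then that set paired with the exact set in year 2, and so on. A minimal Python sketch; the data layout (dict of year -> set of cities) and the names journey_tree and show are assumptions.

```python
from collections import Counter

def journey_tree(people, n_years):
    """Count every prefix of each respondent's journey.
    (S1,) counts people whose year-1 set was exactly S1,
    (S1, S2) counts those of them whose year-2 set was exactly S2, etc."""
    counts = Counter()
    for person in people:
        path = tuple(frozenset(person.get(y, ())) for y in range(1, n_years + 1))
        for depth in range(1, n_years + 1):
            counts[path[:depth]] += 1
    return counts

def show(counts):
    """Print the counts as an indented tree, parents before children."""
    for path in sorted(counts, key=lambda p: [sorted(s) for s in p]):
        label = ",".join(sorted(path[-1])) or "(none)"
        print("    " * (len(path) - 1) + f"{label}: {counts[path]}")

# Hypothetical respondents: {year: set of cities visited}
people = [
    {1: {"a", "b", "c"}, 2: {"d"}, 3: {"e"}},
    {1: {"a", "b", "c"}, 2: {"d"}, 3: set()},
    {1: {"a", "b", "c"}, 2: set(), 3: {"e"}},
    {1: {"c"}, 2: {"a"}, 3: {"a"}},
]
show(journey_tree(people, 3))
```

Each node counts the people who followed exactly that path so far. Because every respondent contributes one path, there are never more distinct nodes at a given depth than respondents, so with few cases most deep branches hold a single person.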

Your input is highly appreciated.



-----Original Message-----
From: Richard Ristow [mailto:[hidden email]]
Sent: Sunday, October 22, 2006 7:50 AM
To: Manmit Shrimali; [hidden email]
Subject: Re: Looping complexity for complex situation

Re: Looping complexity for complex situation

hillel vardi
In reply to this post by Manmit Shrimali-2
Shalom


What you are trying to do needs a many-to-many match, which is not
available in SPSS; you can do it in MS Access.

Here is a small example with 6 cities and 3 years to demonstrate the
complexity.


city  a b c d e f
year  1 2 3

If the data is

person  year  city_a  city_b  city_c  city_d  city_e  city_f
  1      1      y       y       n       n       n       n
  1      2      n       n       y       y       n       n
  1      3      n       n       n       n       y       y

then the city combinations for years 1, 2 and 3 are

year 1  year 2  year 3
  a       c       e
  a       c       f
  a       d       e
  a       d       f
  b       c       e
  etc.

That is, for each person you have to match each city in year 1 with each
city in year 2 and with each city in year 3, etc.
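This expansion is a Cartesian product of the cities marked "y" in each year. A minimal Python sketch for person 1 of the example, using itertools.product; it reproduces the listed order a c e, a c f, a d e, ...

```python
from itertools import product

# Person 1 from the example: year -> cities marked "y"
person = {1: ["a", "b"], 2: ["c", "d"], 3: ["e", "f"]}

# One combination per choice of one city in each year
# (year 1 x year 2 x year 3).
combinations = list(product(*(person[year] for year in sorted(person))))
for combo in combinations:
    print(" ".join(combo))
print(len(combinations))  # 8
```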


I think that even if you succeed in restructuring the data, the number of
combinations (permutations) will be so large that it will not tell you
anything.


Hillel Vardi

Ben Gurion U

Israel


Re: Looping complexity for complex situation

Maguin, Eugene
A many-to-many match cannot be done directly, that is true. However, it can
be done. I gave an example of how to do it in the last couple of months,
maybe in the last month. I would also bet that Ray has an example in his
book, available from the SPSS web site.

Gene Maguin

Re: Looping complexity for complex situation

Richard Ristow
In reply to this post by Manmit Shrimali-2
At 06:20 PM 10/22/2006, Manmit Shrimali wrote:

>I will have multiple files

THAT shouldn't be a problem. You need to combine the multiple files
into one, by whatever is the appropriate technique - ADD FILES, MATCH
FILES, or whatever.
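Which command applies depends on how the files split the data. As a rough Python analogy (not SPSS's implementation): ADD FILES stacks cases from files holding different respondents, while MATCH FILES /BY joins variables from files holding different variables (for example, different years) for the same respondents. The file contents and the key name ID are hypothetical.

```python
def add_files(*files):
    """Stack the cases of several files, like ADD FILES."""
    return [dict(case) for f in files for case in f]

def match_files(*files, key="ID"):
    """Combine the variables of several files case by case on a key,
    like MATCH FILES /BY ID; output is sorted by the key."""
    merged = {}
    for f in files:
        for case in f:
            merged.setdefault(case[key], {}).update(case)
    return [merged[k] for k in sorted(merged)]

# Hypothetical files: two years for the same people, plus a later batch
year1 = [{"ID": 2, "YR1_PVD": 0}, {"ID": 1, "YR1_PVD": 1}]
year2 = [{"ID": 1, "YR2_PVD": 0}, {"ID": 2, "YR2_PVD": 1}]
batch2 = [{"ID": 3, "YR1_PVD": 1}]

print(match_files(year1, year2))
print(len(add_files(year1, batch2)))  # 3
```

The sketch sorts on the key itself; in SPSS, files matched with /BY must already be sorted on the key.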

>My data is coded as follows: City_year1 New York Yes/No and so on.

Like this (SPSS draft output)?

*  "My data is coded: City_year1 New York Yes/No and so on."  .
*  Like this?  YR1_PVD  'Visited Providence in year 1'        .
*              YR1_CRNS 'Visited Cranston   in year 1'        .
*              YR1_WRWK 'Visited Warwick    in year 1'        .
*              YR2_PVD  'Visited Providence in year 2'        .
*              YR2_CRNS 'Visited Cranston   in year 2'        .
*              YR2_WRWK 'Visited Warwick    in year 2'        .
*              YR3_PVD  'Visited Providence in year 3'        .
*              YR3_CRNS 'Visited Cranston   in year 3'        .
*              YR3_WRWK 'Visited Warwick    in year 3'        .
GET FILE=TESTDATA.
TEMPORARY.
STRING SPACE(A28).
LIST.

List
|-----------------------------|---------------------------|
|Output Created               |24-OCT-2006 16:43:46       |
|-----------------------------|---------------------------|
C:\Documents and Settings\Richard\My Documents
   \Temporary\SPSS
   \2006-10-23 Shrimali - Looping complexity for complex situation.SAV

CASE YR1_ YR1_ YR1_ YR2_ YR2_ YR2_ YR3_ YR3_ YR3_
_NUM PVD  CRNS WRWK PVD  CRNS WRWK PVD  CRNS WRWK SPACE

  001   1    0    0    1    1    0    0    0    1
  002   1    1    1    0    1    0    0    0    1
  003   0    1    1    0    1    1    0    0    1
  004   0    1    0    1    1    1    1    0    0
  005   0    0    0    0    1    0    1    0    0
  006   0    1    0    0    1    0    0    1    0
  007   0    1    0    0    0    1    0    0    0
  008   0    1    1    0    0    0    1    0    1
  009   1    1    0    0    0    0    1    0    1
  010   0    0    1    1    0    0    1    0    1
  011   0    0    1    0    0    0    1    0    0
  012   0    1    1    1    0    1    0    0    0
  013   0    0    0    0    0    0    0    0    1
  014   1    0    0    0    0    1    0    0    1
  015   0    0    0    0    1    0    0    0    0
  016   0    0    1    0    0    0    0    0    0
  017   1    1    0    0    0    1    0    0    0
  018   0    1    1    0    0    1    0    1    0
  019   0    0    0    0    0    1    0    0    1
  020   0    1    1    1    0    0    1    1    1

Number of cases read:  20    Number of cases listed:  20


>I can have multiple combinations, and I want to get the analysis at the
>level of each unique combination. For example, let's say in the first
>year I get the following groups:
>10 people visited cities a, b and c
  [...]
>40 visited cities f, g and h
>
>For each group I want to see which cities they visited in the following
>years, through year 5.

That *sounds* like you want to count all different patterns of visiting
over the five years: the set of cities visited the first year BY the
set visited the second year BY the set visited the third year....

That's easy with AGGREGATE (see below), but useless. With 3 cities and
3 years (in my test data) there are 2**(3*3)=512 possible combinations;
as you can see, none occurs even twice in this test data. 35 cities and 5
years give 2**(35*5), about 5*10**52, combinations.
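The arithmetic can be checked quickly: each city/year cell is yes or no, so the number of possible patterns is 2 raised to (cities x years). A small Python sketch; n_patterns is an illustrative name, and the 5*10**52 figure is 2**175 rounded.

```python
def n_patterns(cities, years):
    """Each city/year cell is yes or no, so there are 2**(cities*years) patterns."""
    return 2 ** (cities * years)

print(n_patterns(3, 3))            # 512
print(f"{n_patterns(35, 5):.2e}")  # 4.79e+52
```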

In the test data I've posted (three cities, three years), what are the
categories you'd like to count?

(By the way, Hillel Vardi described this needing a many-to-many match.
It probably doesn't. If your data is organized like the test data
above, the many-to-many match is already done in the data records.)

Illustration - count all combinations (SPSS draft output):

*  "If 10 people visited city a in first year, I want to know,.
*  which cities they visited in year three, let say 5 visited .
*  and 5 did not visit at all. Out of 5 who visited in year   .
*  two, which cities they visited in year 3 and so on"        .

*  Possibility one: count all combinations:  ...............  .
*  (This is easy with AGGREGATE, but impossible to analyze    .
*  because there are so many combinations.)                   .

AGGREGATE OUTFILE=*
   /BREAK = YR1_PVD  TO        YR3_WRWK
   /INSTANCES 'Occurrences of this combination' = N.
FORMATS INSTANCES (F4).
TEMPORARY.
STRING SPACE(A24).
LIST.

List
|-----------------------------|---------------------------|
|Output Created               |24-OCT-2006 16:43:48       |
|-----------------------------|---------------------------|
YR1_ YR1_ YR1_ YR2_ YR2_ YR2_ YR3_ YR3_ YR3_
PVD  CRNS WRWK PVD  CRNS WRWK PVD  CRNS WRWK INSTANCES SPACE

   0    0    0    0    0    0    0    0    1        1
   0    0    0    0    0    1    0    0    1        1
   0    0    0    0    1    0    0    0    0        1
   0    0    0    0    1    0    1    0    0        1
   0    0    1    0    0    0    0    0    0        1
   0    0    1    0    0    0    1    0    0        1
   0    0    1    1    0    0    1    0    1        1
   0    1    0    0    0    1    0    0    0        1
   0    1    0    0    1    0    0    1    0        1
   0    1    0    1    1    1    1    0    0        1
   0    1    1    0    0    0    1    0    1        1
   0    1    1    0    0    1    0    1    0        1
   0    1    1    0    1    1    0    0    1        1
   0    1    1    1    0    0    1    1    1        1
   0    1    1    1    0    1    0    0    0        1
   1    0    0    0    0    1    0    0    1        1
   1    0    0    1    1    0    0    0    1        1
   1    1    0    0    0    0    1    0    1        1
   1    1    0    0    0    1    0    0    0        1
   1    1    1    0    1    0    0    0    1        1

Number of cases read:  20    Number of cases listed:  20
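For comparison, the same "count all combinations" step as the AGGREGATE above, sketched in Python on the 20 test cases listed earlier (each case written as a string of its nine 0/1 values in column order YR1_PVD ... YR3_WRWK). It confirms that all 20 patterns are distinct.

```python
from collections import Counter

# Cases 001-020 from the listing, in order
ROWS = """100110001 111010001 011011001 010111100 000010100
010010010 010001000 011000101 110000101 001100101
001000100 011101000 000000001 100001001 000010000
001000000 110001000 011001010 000001001 011100111""".split()

# Analogous to AGGREGATE /BREAK=YR1_PVD TO YR3_WRWK /INSTANCES=N
patterns = Counter(ROWS)
for pattern in sorted(patterns):
    print(" ".join(pattern), patterns[pattern])
print("distinct:", len(patterns), "largest count:", max(patterns.values()))
```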

Re: Looping complexity for complex situation

Manmit Shrimali-2
In reply to this post by Manmit Shrimali-2
I want to count all categories, i.e., all cities and all years. I did try
AGGREGATE, but I get a lot of data because my n is 150, so imagine the
number of combinations.

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
Richard Ristow
Sent: Wednesday, October 25, 2006 2:28 AM
To: [hidden email]
Subject: Re: Looping complexity for complex situation

Re: Looping complexity for complex situation

Richard Ristow
At 09:56 PM 10/24/2006, Manmit Shrimali wrote:

>I want to count all categories i.e. all cities and all years. I did
>try with aggregate but I get lot of data as my n is 150 so imagine the
>combinations.

OK. Yes, I can imagine. Or calculate - more than 1E52 possibilities.
And you have 150 records all told?

The problem isn't how to calculate, using AGGREGATE or otherwise. The
question is what you *want* to count, and how to simplify what you're
counting enough that it's meaningful with 150 cases.
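One hypothetical way to simplify, not something proposed in the thread: count only year-to-year transitions, i.e., how many respondents visited city A in year t and city B in year t+1. That table has at most (years - 1) x cities x cities cells (4 x 35 x 35 = 4900 for the real data) instead of 2**(cities*years) patterns. A minimal Python sketch on cases 001-003 of the test data; the 0/1 string layout and the name transitions are assumptions.

```python
from collections import Counter
from itertools import product

CITIES = ["PVD", "CRNS", "WRWK"]  # column order within each year, as in the test data

def transitions(cases, n_years):
    """Count (t, city visited in year t, city visited in year t+1) triples.
    Each case is a 0/1 string laid out year by year, city by city."""
    k = len(CITIES)
    counts = Counter()
    for flags in cases:
        years = [{c for i, c in enumerate(CITIES) if flags[y * k + i] == "1"}
                 for y in range(n_years)]
        for t in range(n_years - 1):
            for a, b in product(years[t], years[t + 1]):
                counts[(t + 1, a, b)] += 1
    return counts

counts = transitions(["100110001", "111010001", "011011001"], n_years=3)
for key in sorted(counts):
    print(key, counts[key])
```

A respondent who visited several cities in a year contributes to several cells, so cell totals can exceed the number of respondents.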

Yes, counting all possible combinations of cities and visits is
ridiculous; I said that, though I posted the AGGREGATE code that would
do it.

Can you help us understand what it is you are looking for? I haven't
understood, from your descriptions. As an example, can you give the
categories you want to count, and the counts in each category, as
they'd be from the test data I posted?

-Good luck,
  Richard