|
Hi
I have data collected from a rolling survey. The data is collected each quarter, however to obtain sufficient sample size data from the previous 4 quarters are combined. Lets ignore the starting quarters. So each data record has: Person_no mode travel_time q1 q2 q3 q4 q5 q6 q7 q8. - 4 of the variables q1 to q8 will have valid (weight) data and the other 4 quarter variables will have missing values. This is a travel survey where mode is a category variable 1=Car 2=Train and etc. Travel_time is continuous with values from 1 to say 100. Person_no is just an ID so we can ignore this. If I wish to obtain a "weighted" count of the people in each quarter I can use the following command. MEANS q1 q2 q3 q4 q5 q6 q7 q8 BY one / CELLS = SUM. - where is a dummy variable which always takes the value 1. If want the total trips by each mode I can use. MEANS q1 q2 q3 q4 q5 q6 q7 q8 BY mode / CELLS = SUM. What I want to do is obtain a table with the mean value of travel_time for each quarter. I am struggling to find a way to do this easily, given that I essentially have weights on my q1 .. q8 variables. I know that I can reformat the data by using a LOOP and write out 4 records from each original record and then process the data as per normal. I was wondering if anyone had a suggestion that avoided this approach which I have overlooked. Suggestions would be very welcome. Cheers Frank Milthorpe News: The majority of staff from the Transport and Population Data Centre have moved to the Ministry of Transport. Contact details will be progressively changed over the next few months. Frank Milthorpe Senior Manager, Transport Model Development Transport Data Centre NSW Ministry of Transport Direct: (02) 9206 8609 TDC Phone: (02) 9206 8611 Fax: (02) 9206 8691 Street: Level 7, Tower 2 (TNT Tower East), 1 Lawson Square, Redfern NSW 2016 Postal: GPO Box 1620, Sydney NSW 2001 Email: [hidden email] Web: www.transport.nsw.gov.au/tdc ------------------------------------------------------------------------------------------------------------------------------------------------ This message is intended for the addressee named and may contain confidential/privileged information. If you are not the intended recipient, please delete it and notify the sender. Views expressed in this message are those of the individual sender, and are not necessarily the views of the Department. You should scan any attached files for viruses. ------------------------------------------------------------------------------------------------------------------------------------------------ |
|
Hello Frank,
I hope I understand your question. Here is a tested syntax to solve it: /* first some sample data where the qX variables take values 1 if the records refer to that quarter, otherwise is equal to 0 */ DATA LIST LIST/Person_no mode travel_time q1 q2 q3 q4 q5 q6 q7 q8. BEGIN DATA 1000 1 51 1 0 0 0 0 0 0 0 1000 2 10 1 0 0 0 0 0 0 0 1000 3 74 1 0 0 0 0 0 0 0 1000 1 68 0 1 0 0 0 0 0 0 1000 2 5 0 1 0 0 0 0 0 0 1000 3 70 0 1 0 0 0 0 0 0 1000 1 58 0 0 1 0 0 0 0 0 1000 2 22 0 0 1 0 0 0 0 0 1000 3 68 0 0 1 0 0 0 0 0 1000 1 41 0 0 0 1 0 0 0 0 1000 2 0 0 0 0 1 0 0 0 0 1000 3 30 0 0 0 1 0 0 0 0 1001 1 35 1 0 0 0 0 0 0 0 1001 2 25 1 0 0 0 0 0 0 0 1001 3 71 1 0 0 0 0 0 0 0 1001 1 43 0 1 0 0 0 0 0 0 1001 2 18 0 1 0 0 0 0 0 0 1001 3 14 0 1 0 0 0 0 0 0 1001 1 29 0 0 1 0 0 0 0 0 1001 2 37 0 0 1 0 0 0 0 0 1001 3 78 0 0 1 0 0 0 0 0 1001 1 100 0 0 0 1 0 0 0 0 1001 2 12 0 0 0 1 0 0 0 0 1001 3 0 0 0 0 1 0 0 0 0 END DATA. VAL LABEL MODE 1 'Car' 2 'Train' 3 'Bus'. /* then let's create a variable indicating the quarter and drop the q1 to q8 */ IF q1=1 quarter=1. IF q2=1 quarter=2. IF q3=1 quarter=3. IF q4=1 quarter=4. IF q5=1 quarter=5. IF q6=1 quarter=6. IF q7=1 quarter=7. IF q8=1 quarter=8. add files /file=* /drop=q1 to q8. exe. /* finally let's get the table you need */ means tables=travel_time by quarter /cells mean sum. /* this one you might not need it but it will give you mean and total travel_time by mode */ means tables=travel_time by mode /cells mean sum. HTH, Luca Mr. Luca MEYER Market research, data analysis & more www.lucameyer.com - Tel: +39.339.495.00.21 -----Messaggio originale----- Da: SPSSX(r) Discussion [mailto:[hidden email]] Per conto di Frank Milthorpe Inviato: mercoledì 13 giugno 2007 9.23 A: [hidden email] Oggetto: Trying to obtain means with multiple weight variables in a single step Hi I have data collected from a rolling survey. The data is collected each quarter, however to obtain sufficient sample size data from the previous 4 quarters are combined. Lets ignore the starting quarters. So each data record has: Person_no mode travel_time q1 q2 q3 q4 q5 q6 q7 q8. - 4 of the variables q1 to q8 will have valid (weight) data and the other 4 quarter variables will have missing values. This is a travel survey where mode is a category variable 1=Car 2=Train and etc. Travel_time is continuous with values from 1 to say 100. Person_no is just an ID so we can ignore this. If I wish to obtain a "weighted" count of the people in each quarter I can use the following command. MEANS q1 q2 q3 q4 q5 q6 q7 q8 BY one / CELLS = SUM. - where is a dummy variable which always takes the value 1. If want the total trips by each mode I can use. MEANS q1 q2 q3 q4 q5 q6 q7 q8 BY mode / CELLS = SUM. What I want to do is obtain a table with the mean value of travel_time for each quarter. I am struggling to find a way to do this easily, given that I essentially have weights on my q1 .. q8 variables. I know that I can reformat the data by using a LOOP and write out 4 records from each original record and then process the data as per normal. I was wondering if anyone had a suggestion that avoided this approach which I have overlooked. Suggestions would be very welcome. Cheers Frank Milthorpe News: The majority of staff from the Transport and Population Data Centre have moved to the Ministry of Transport. Contact details will be progressively changed over the next few months. Frank Milthorpe Senior Manager, Transport Model Development Transport Data Centre NSW Ministry of Transport Direct: (02) 9206 8609 TDC Phone: (02) 9206 8611 Fax: (02) 9206 8691 Street: Level 7, Tower 2 (TNT Tower East), 1 Lawson Square, Redfern NSW 2016 Postal: GPO Box 1620, Sydney NSW 2001 Email: [hidden email] Web: www.transport.nsw.gov.au/tdc ---------------------------------------------------------------------------- -------------------------------------------------------------------- This message is intended for the addressee named and may contain confidential/privileged information. If you are not the intended recipient, please delete it and notify the sender. Views expressed in this message are those of the individual sender, and are not necessarily the views of the Department. You should scan any attached files for viruses. ---------------------------------------------------------------------------- -------------------------------------------------------------------- No virus found in this incoming message. Checked by AVG Free Edition. Version: 7.5.472 / Virus Database: 269.8.15/847 - Release Date: 12/06/2007 21.42 No virus found in this outgoing message. Checked by AVG Free Edition. Version: 7.5.472 / Virus Database: 269.8.15/847 - Release Date: 12/06/2007 21.42 |
|
Luca and the rest of the list
This is a good start, however there is one aspect which I clearly didn't explain in sufficient detail in my original posting. The weights for each quarter vary. This is because the surveyed sample for the rolling 12 months will vary and the survey sample needs to be adjusted to match the population. Hence instead of the data looking like: 1000 1 51 1 0 0 0 0 0 0 0 It is actually 1000 1 51 10 15 20 16 . . . . In the above example there are four quarters with weights (that vary) between each quarter and four quarters which are not valid (shown as missing in the above). Each person can have a different weight which is based on location, age, gender and household type. Needless to say I need to use the weight already computed and stored on the file. Currently I duplicate 4 records from each original record. I have done this using a LOOP and XSAVE, however it would be nice to know if it possible to undertake my analysis without having to do the data transformations. Suggestions / comments would be very welcome. Cheers Frank >>> "Luca Meyer" <[hidden email]> 14/06/2007 12:35 am >>> Hello Frank, I hope I understand your question. Here is a tested syntax to solve it: /* first some sample data where the qX variables take values 1 if the records refer to that quarter, otherwise is equal to 0 */ DATA LIST LIST/Person_no mode travel_time q1 q2 q3 q4 q5 q6 q7 q8. BEGIN DATA 1000 1 51 1 0 0 0 0 0 0 0 1000 2 10 1 0 0 0 0 0 0 0 1000 3 74 1 0 0 0 0 0 0 0 1000 1 68 0 1 0 0 0 0 0 0 1000 2 5 0 1 0 0 0 0 0 0 1000 3 70 0 1 0 0 0 0 0 0 1000 1 58 0 0 1 0 0 0 0 0 1000 2 22 0 0 1 0 0 0 0 0 1000 3 68 0 0 1 0 0 0 0 0 1000 1 41 0 0 0 1 0 0 0 0 1000 2 0 0 0 0 1 0 0 0 0 1000 3 30 0 0 0 1 0 0 0 0 1001 1 35 1 0 0 0 0 0 0 0 1001 2 25 1 0 0 0 0 0 0 0 1001 3 71 1 0 0 0 0 0 0 0 1001 1 43 0 1 0 0 0 0 0 0 1001 2 18 0 1 0 0 0 0 0 0 1001 3 14 0 1 0 0 0 0 0 0 1001 1 29 0 0 1 0 0 0 0 0 1001 2 37 0 0 1 0 0 0 0 0 1001 3 78 0 0 1 0 0 0 0 0 1001 1 100 0 0 0 1 0 0 0 0 1001 2 12 0 0 0 1 0 0 0 0 1001 3 0 0 0 0 1 0 0 0 0 END DATA. VAL LABEL MODE 1 'Car' 2 'Train' 3 'Bus'. /* then let's create a variable indicating the quarter and drop the q1 to q8 */ IF q1=1 quarter=1. IF q2=1 quarter=2. IF q3=1 quarter=3. IF q4=1 quarter=4. IF q5=1 quarter=5. IF q6=1 quarter=6. IF q7=1 quarter=7. IF q8=1 quarter=8. add files /file=* /drop=q1 to q8. exe. /* finally let's get the table you need */ means tables=travel_time by quarter /cells mean sum. /* this one you might not need it but it will give you mean and total travel_time by mode */ means tables=travel_time by mode /cells mean sum. HTH, Luca Mr. Luca MEYER Market research, data analysis & more www.lucameyer.com - Tel: +39.339.495.00.21 -----Messaggio originale----- Da: SPSSX(r) Discussion [mailto:[hidden email]] Per conto di Frank Milthorpe Inviato: mercoledì 13 giugno 2007 9.23 A: [hidden email] Oggetto: Trying to obtain means with multiple weight variables in a single step Hi I have data collected from a rolling survey. The data is collected each quarter, however to obtain sufficient sample size data from the previous 4 quarters are combined. Lets ignore the starting quarters. So each data record has: Person_no mode travel_time q1 q2 q3 q4 q5 q6 q7 q8. - 4 of the variables q1 to q8 will have valid (weight) data and the other 4 quarter variables will have missing values. This is a travel survey where mode is a category variable 1=Car 2=Train and etc. Travel_time is continuous with values from 1 to say 100. Person_no is just an ID so we can ignore this. If I wish to obtain a "weighted" count of the people in each quarter I can use the following command. MEANS q1 q2 q3 q4 q5 q6 q7 q8 BY one / CELLS = SUM. - where is a dummy variable which always takes the value 1. If want the total trips by each mode I can use. MEANS q1 q2 q3 q4 q5 q6 q7 q8 BY mode / CELLS = SUM. What I want to do is obtain a table with the mean value of travel_time for each quarter. I am struggling to find a way to do this easily, given that I essentially have weights on my q1 .. q8 variables. I know that I can reformat the data by using a LOOP and write out 4 records from each original record and then process the data as per normal. I was wondering if anyone had a suggestion that avoided this approach which I have overlooked. Suggestions would be very welcome. Cheers Frank Milthorpe News: The majority of staff from the Transport and Population Data Centre have moved to the Ministry of Transport. Contact details will be progressively changed over the next few months. Frank Milthorpe Senior Manager, Transport Model Development Transport Data Centre NSW Ministry of Transport Direct: (02) 9206 8609 TDC Phone: (02) 9206 8611 Fax: (02) 9206 8691 Street: Level 7, Tower 2 (TNT Tower East), 1 Lawson Square, Redfern NSW 2016 Postal: GPO Box 1620, Sydney NSW 2001 Email: [hidden email] Web: www.transport.nsw.gov.au/tdc ---------------------------------------------------------------------------- -------------------------------------------------------------------- This message is intended for the addressee named and may contain confidential/privileged information. If you are not the intended recipient, please delete it and notify the sender. Views expressed in this message are those of the individual sender, and are not necessarily the views of the Department. You should scan any attached files for viruses. ---------------------------------------------------------------------------- -------------------------------------------------------------------- No virus found in this incoming message. Checked by AVG Free Edition. Version: 7.5.472 / Virus Database: 269.8.15/847 - Release Date: 12/06/2007 21.42 No virus found in this outgoing message. Checked by AVG Free Edition. Version: 7.5.472 / Virus Database: 269.8.15/847 - Release Date: 12/06/2007 21.42 ------------------------------------------------------------------------------------------------------------------------------------------------ This message is intended for the addressee named and may contain confidential/privileged information. If you are not the intended recipient, please delete it and notify the sender. Views expressed in this message are those of the individual sender, and are not necessarily the views of the Department. You should scan any attached files for viruses. ------------------------------------------------------------------------------------------------------------------------------------------------ |
|
In reply to this post by Frank Milthorpe
At 03:22 AM 6/13/2007, Frank Milthorpe wrote:
>I have data collected from a rolling survey. The data is collected >each quarter, however to obtain sufficient sample size data from the >previous 4 quarters are combined. Lets ignore the starting quarters. >So each data record has: > Person_no mode travel_time q1 q2 q3 q4 q5 q6 q7 q8. > - 4 of the variables q1 to q8 will have valid (weight) data and > the other 4 quarter variables will have missing values. Ah! OK; you have 'wide' data, i.e. data from all (eight, I guess?) quarters in your study, in one record. But that doesn't make sense of 'mode' and 'travel_time'; those likely don't have only one value for person and quarter. Does each record represent a *trip*, then, with one person possibly having many records that look like this, each with a mode and a travel time? (You can see that it helps a great deal of you tell us what a record *does* represent. I'm doing a lot of inference here.) I suppose that the latest of the four quarters for which there's a weight, is the quarter in which the travel took place? (If I'm right, and each record *does* represent a trip, I'm astounded that you don't have a variable for date of trip. Or maybe you do, but didn't mention it. It would have helped.) >I want to obtain a table with the [weighted?] mean value of >travel_time for each quarter. I am struggling to find a way to do this >easily, given that I essentially have weights on my q1 .. q8 variables. > >I can reformat the data by using a LOOP and write out 4 records from >each original record and then process the data as per normal. I was >wondering if anyone had a suggestion that avoided this approach which >I haveoverlooked. Suggestions would be very welcome. An awful lot of problems can be solved that way, by reformatting data from 'wide' to 'long' form. If the 'long' form you're using is Person_no mode travel_time 1 q1 Person_no mode travel_time 2 q2 Person_no mode travel_time 3 q3 it should be easy with VARSTOCASES, which is easier to code than LOOP/XSAVE logic. In your follow-up, you wrote, >Currently I duplicate 4 records from each original record with a LOOP >and XSAVE. It would be nice to undertake my analysis without having to >do the data transformations. Flat out: Stay with what you've got, except see if VARSTOCASES will work (as it probably will). Code to work with the 'wide' form will be much more complicated; and since you've got a good solution, I can't recommend that either you or we try it. We may be hitting programming aesthetics here. My own taste is for 'long' data. I think it easier to understand; and commonly, the code to use it is simpler, because loops in 'wide' data can often be replaced by using SPSS's long implicit loop through the list of cases. Maybe your aesthetics are the other way, preferring 'wide' forms. De gustibus non disputandum est. But this time, stay with the solution you've got. Of course, keep your original file as well, so THAT stays in form you like. I'd do that myself (never throw away anything you have), even if I used the 'long' form for all analyses. -Good wishes and good luck, Richard |
|
Richard
Each record is a summary of the daily travel of an individual. A person is only surveyed on a single day. So each person has only a single value of travel time. However, the responses from people from 4 quarters are pooled to calculate the quarter based weights. Results are obtained/reported for each quarter. The data for weighting purposes for Quarter 1 is the data collected in q-2,q-1,q0,q1. The data for weighting purposes for Quarter 2 is the data collected in q-1,q0,q1,q2 The data for weighting purposes for Quarter 3 is the data collected in q0,q1,q2,q3 The data for weighting purposes for Quarter 4 is the data collected in q1,q2,q3,q4 The data for weighting purposes for Quarter 5 is the data collected in q2,q3,q4,q5 and etc ... I have a single record for each person. This includes the travel time and 8 weight variables: w_q1, w_q2, w_q3, w_q4 and etc. Four of these weight variables will have real data and the rest will be missing values as shown above. At this stage just ignore the first rows above with the negative zero quarters, for this discussion we can consider someone surveyed in quarter 4. They have weights for quarters 1, 2, 3 & 4. They have no weights for quarters 5, 6, 7 & 8. From my original posting I described the following format. Person_no mode travel_time q1 q2 q3 q4 q5 q6 q7 q8. - 4 of the variables q1 to q8 will have valid (weight) data and the other 4 quarter variables will have missing values. q1 is what I am now calling w_q1 weight for quarter1, w_w2 weight for quarter 2. The trick here is that the data for an individual contains multiple weights, but a single values for the attributes, travel_time, mode and etc. In reality I start with multiple records for a person with a record for each journey they undertook. A journey will have a mode, time, distance and etc. Each of these records always has the same multiple weights values for the individual. However the weights vary between the individuals. I hope this helps to explain the data setup. Sorry that I didn't make the multiple weights for a quarter more explicit in the original posting. There are actually many more variables on the actual data tables which I have not informed you about to try to simplify my posting to the key issue. Thanks for helpful comments. Cheers Frank >>> Richard Ristow <[hidden email]> 15/06/2007 3:19 am >>> At 03:22 AM 6/13/2007, Frank Milthorpe wrote: >I have data collected from a rolling survey. The data is collected >each quarter, however to obtain sufficient sample size data from the >previous 4 quarters are combined. Lets ignore the starting quarters. >So each data record has: > Person_no mode travel_time q1 q2 q3 q4 q5 q6 q7 q8. > - 4 of the variables q1 to q8 will have valid (weight) data and > the other 4 quarter variables will have missing values. Ah! OK; you have 'wide' data, i.e. data from all (eight, I guess?) quarters in your study, in one record. But that doesn't make sense of 'mode' and 'travel_time'; those likely don't have only one value for person and quarter. Does each record represent a *trip*, then, with one person possibly having many records that look like this, each with a mode and a travel time? (You can see that it helps a great deal of you tell us what a record *does* represent. I'm doing a lot of inference here.) I suppose that the latest of the four quarters for which there's a weight, is the quarter in which the travel took place? (If I'm right, and each record *does* represent a trip, I'm astounded that you don't have a variable for date of trip. Or maybe you do, but didn't mention it. It would have helped.) >I want to obtain a table with the [weighted?] mean value of >travel_time for each quarter. I am struggling to find a way to do this >easily, given that I essentially have weights on my q1 .. q8 variables. > >I can reformat the data by using a LOOP and write out 4 records from >each original record and then process the data as per normal. I was >wondering if anyone had a suggestion that avoided this approach which >I haveoverlooked. Suggestions would be very welcome. An awful lot of problems can be solved that way, by reformatting data from 'wide' to 'long' form. If the 'long' form you're using is Person_no mode travel_time 1 q1 Person_no mode travel_time 2 q2 Person_no mode travel_time 3 q3 it should be easy with VARSTOCASES, which is easier to code than LOOP/XSAVE logic. In your follow-up, you wrote, >Currently I duplicate 4 records from each original record with a LOOP >and XSAVE. It would be nice to undertake my analysis without having to >do the data transformations. Flat out: Stay with what you've got, except see if VARSTOCASES will work (as it probably will). Code to work with the 'wide' form will be much more complicated; and since you've got a good solution, I can't recommend that either you or we try it. We may be hitting programming aesthetics here. My own taste is for 'long' data. I think it easier to understand; and commonly, the code to use it is simpler, because loops in 'wide' data can often be replaced by using SPSS's long implicit loop through the list of cases. Maybe your aesthetics are the other way, preferring 'wide' forms. De gustibus non disputandum est. But this time, stay with the solution you've got. Of course, keep your original file as well, so THAT stays in form you like. I'd do that myself (never throw away anything you have), even if I used the 'long' form for all analyses. -Good wishes and good luck, Richard ------------------------------------------------------------------------------------------------------------------------------------------------ This message is intended for the addressee named and may contain confidential/privileged information. If you are not the intended recipient, please delete it and notify the sender. Views expressed in this message are those of the individual sender, and are not necessarily the views of the Department. You should scan any attached files for viruses. ------------------------------------------------------------------------------------------------------------------------------------------------ |
|
At 06:41 PM 6/14/2007, Frank Milthorpe wrote:
>Each record is a summary of the daily travel of an individual. A >person is only surveyed on a single day. So each person has only a >single value of travel time. Ah, thank you. That's very helpful. [Not quoting all you wrote] I'm not going to do much with it just now, though. I think that unrolling to long form, with LOOP/XSAVE or CASESTOVARS, is still your best solution. There's a great deal to be said for using a technique that is available and known to work. -Best regards and good luck, Richard |
| Free forum by Nabble | Edit this page |
