Trying to obtain means with multiple weight variables in a single step

classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

Trying to obtain means with multiple weight variables in a single step

Frank Milthorpe
Hi

I have data collected from a rolling survey. The data is collected each
quarter, however to obtain sufficient sample size data from the previous
4 quarters are combined. Lets ignore the starting quarters. So each data
record has:
  Person_no mode travel_time  q1 q2 q3 q4 q5 q6 q7 q8.
    - 4 of the variables q1 to q8 will have valid (weight) data and the
other 4 quarter variables will have missing values.


This is a travel survey where mode is a category variable 1=Car 2=Train
and etc. Travel_time is continuous with values from 1 to say 100.
Person_no is just an ID so we can ignore this.


If I wish to obtain a "weighted" count of the people in each quarter I
can use the following command.
  MEANS  q1 q2 q3 q4 q5 q6 q7 q8 BY one / CELLS = SUM.
  -  where is a dummy variable which always takes the value 1.


If  want the total trips by each mode I can use.
  MEANS  q1 q2 q3 q4 q5 q6 q7 q8 BY mode / CELLS = SUM.



What I want to do is obtain a table with the mean value of travel_time
for each quarter. I am struggling to find a way to do this easily, given
that I essentially have weights on my q1 .. q8 variables.  I know that I
can reformat the data by using a LOOP and write out 4 records from each
original record and then process the data as per normal. I was wondering
if anyone had a suggestion that avoided this approach which I have
overlooked. Suggestions would be very welcome.

Cheers

Frank Milthorpe




News:
The majority of staff from the Transport and Population Data Centre
have moved to the Ministry of Transport. Contact details will be
progressively changed over the next few months.
Frank Milthorpe
Senior Manager, Transport Model Development
Transport Data Centre
NSW Ministry of  Transport
Direct:    (02) 9206 8609     TDC Phone: (02) 9206 8611
Fax:       (02) 9206 8691

Street:   Level 7, Tower 2 (TNT Tower East), 1 Lawson Square, Redfern
NSW 2016
Postal:   GPO Box 1620, Sydney NSW 2001
Email:     [hidden email]
Web:      www.transport.nsw.gov.au/tdc





------------------------------------------------------------------------------------------------------------------------------------------------
This message is intended for the addressee named and may contain
confidential/privileged information. If you are not the intended
recipient, please delete it and notify the sender.
Views expressed in this message are those of the individual sender, and
are not necessarily the views of the Department.
You should scan any attached files for viruses.
------------------------------------------------------------------------------------------------------------------------------------------------
Reply | Threaded
Open this post in threaded view
|

R: Trying to obtain means with multiple weight variables in a single step

Luca Meyer
Hello Frank,

I hope I understand your question. Here is a tested syntax to solve it:

/* first some sample data where the qX variables take values 1 if the
records refer to that quarter, otherwise is equal to 0 */
DATA LIST LIST/Person_no mode travel_time q1 q2 q3 q4 q5 q6 q7 q8.
BEGIN DATA
 1000 1 51 1 0 0 0 0 0 0 0
 1000 2 10 1 0 0 0 0 0 0 0
 1000 3 74 1 0 0 0 0 0 0 0
 1000 1 68 0 1 0 0 0 0 0 0
 1000 2 5 0 1 0 0 0 0 0 0
 1000 3 70 0 1 0 0 0 0 0 0
 1000 1 58 0 0 1 0 0 0 0 0
 1000 2 22 0 0 1 0 0 0 0 0
 1000 3 68 0 0 1 0 0 0 0 0
 1000 1 41 0 0 0 1 0 0 0 0
 1000 2 0 0 0 0 1 0 0 0 0
 1000 3 30 0 0 0 1 0 0 0 0
 1001 1 35 1 0 0 0 0 0 0 0
 1001 2 25 1 0 0 0 0 0 0 0
 1001 3 71 1 0 0 0 0 0 0 0
 1001 1 43 0 1 0 0 0 0 0 0
 1001 2 18 0 1 0 0 0 0 0 0
 1001 3 14 0 1 0 0 0 0 0 0
 1001 1 29 0 0 1 0 0 0 0 0
 1001 2 37 0 0 1 0 0 0 0 0
 1001 3 78 0 0 1 0 0 0 0 0
 1001 1 100 0 0 0 1 0 0 0 0
 1001 2 12 0 0 0 1 0 0 0 0
 1001 3 0 0 0 0 1 0 0 0 0
END DATA.
VAL LABEL MODE 1 'Car' 2 'Train' 3 'Bus'.

/* then let's create a variable indicating the quarter and drop the q1 to q8
*/
IF q1=1 quarter=1.
IF q2=1 quarter=2.
IF q3=1 quarter=3.
IF q4=1 quarter=4.
IF q5=1 quarter=5.
IF q6=1 quarter=6.
IF q7=1 quarter=7.
IF q8=1 quarter=8.
add files /file=* /drop=q1 to q8.
exe.

/* finally let's get the table you need */
means tables=travel_time by quarter /cells mean sum.

/* this one you might not need it but it will give you mean and total
travel_time by mode */
means tables=travel_time by mode /cells mean sum.

HTH,
Luca

Mr. Luca MEYER
Market research, data analysis & more
www.lucameyer.com - Tel: +39.339.495.00.21


-----Messaggio originale-----
Da: SPSSX(r) Discussion [mailto:[hidden email]] Per conto di Frank
Milthorpe
Inviato: mercoledì 13 giugno 2007 9.23
A: [hidden email]
Oggetto: Trying to obtain means with multiple weight variables in a single
step

Hi

I have data collected from a rolling survey. The data is collected each
quarter, however to obtain sufficient sample size data from the previous
4 quarters are combined. Lets ignore the starting quarters. So each data
record has:
  Person_no mode travel_time  q1 q2 q3 q4 q5 q6 q7 q8.
    - 4 of the variables q1 to q8 will have valid (weight) data and the
other 4 quarter variables will have missing values.


This is a travel survey where mode is a category variable 1=Car 2=Train and
etc. Travel_time is continuous with values from 1 to say 100.
Person_no is just an ID so we can ignore this.


If I wish to obtain a "weighted" count of the people in each quarter I can
use the following command.
  MEANS  q1 q2 q3 q4 q5 q6 q7 q8 BY one / CELLS = SUM.
  -  where is a dummy variable which always takes the value 1.


If  want the total trips by each mode I can use.
  MEANS  q1 q2 q3 q4 q5 q6 q7 q8 BY mode / CELLS = SUM.



What I want to do is obtain a table with the mean value of travel_time for
each quarter. I am struggling to find a way to do this easily, given that I
essentially have weights on my q1 .. q8 variables.  I know that I can
reformat the data by using a LOOP and write out 4 records from each original
record and then process the data as per normal. I was wondering if anyone
had a suggestion that avoided this approach which I have overlooked.
Suggestions would be very welcome.

Cheers

Frank Milthorpe




News:
The majority of staff from the Transport and Population Data Centre have
moved to the Ministry of Transport. Contact details will be progressively
changed over the next few months.
Frank Milthorpe
Senior Manager, Transport Model Development Transport Data Centre NSW
Ministry of  Transport
Direct:    (02) 9206 8609     TDC Phone: (02) 9206 8611
Fax:       (02) 9206 8691

Street:   Level 7, Tower 2 (TNT Tower East), 1 Lawson Square, Redfern
NSW 2016
Postal:   GPO Box 1620, Sydney NSW 2001
Email:     [hidden email]
Web:      www.transport.nsw.gov.au/tdc





----------------------------------------------------------------------------
--------------------------------------------------------------------
This message is intended for the addressee named and may contain
confidential/privileged information. If you are not the intended recipient,
please delete it and notify the sender.
Views expressed in this message are those of the individual sender, and are
not necessarily the views of the Department.
You should scan any attached files for viruses.
----------------------------------------------------------------------------
--------------------------------------------------------------------

No virus found in this incoming message.
Checked by AVG Free Edition.
Version: 7.5.472 / Virus Database: 269.8.15/847 - Release Date: 12/06/2007
21.42


No virus found in this outgoing message.
Checked by AVG Free Edition.
Version: 7.5.472 / Virus Database: 269.8.15/847 - Release Date: 12/06/2007
21.42
Reply | Threaded
Open this post in threaded view
|

Re: R: Trying to obtain means with multiple weight variables in a single step

Frank Milthorpe
Luca and the rest of the list
 
This is a good start, however there is one aspect which I clearly didn't explain in sufficient detail in my original posting. The weights for each quarter vary. This is because the surveyed sample for the rolling 12 months will vary and the survey sample needs to be adjusted to match the population.
 
Hence instead of the data looking like:
  1000 1 51   1 0 0 0 0 0 0 0
 
It is actually
  1000 1 51   10 15 20 16 . . . .
 
In the above example there are four quarters with weights (that vary) between each quarter and four quarters which are not valid (shown as missing in the above). Each person can have a different weight which is based on location, age, gender and household type. Needless to say I need to use the weight already computed and stored on the file.
 
Currently I duplicate 4 records from each original record. I have done this using a LOOP and XSAVE, however it would be nice to know if it possible to undertake my analysis without having to do the data transformations.
 
Suggestions / comments would be very welcome.
 
Cheers
 
Frank
 
 

>>> "Luca Meyer" <[hidden email]> 14/06/2007 12:35 am >>>


Hello Frank,

I hope I understand your question. Here is a tested syntax to solve it:

/* first some sample data where the qX variables take values 1 if the
records refer to that quarter, otherwise is equal to 0 */
DATA LIST LIST/Person_no mode travel_time q1 q2 q3 q4 q5 q6 q7 q8.
BEGIN DATA
1000 1 51 1 0 0 0 0 0 0 0
1000 2 10 1 0 0 0 0 0 0 0
1000 3 74 1 0 0 0 0 0 0 0
1000 1 68 0 1 0 0 0 0 0 0
1000 2 5 0 1 0 0 0 0 0 0
1000 3 70 0 1 0 0 0 0 0 0
1000 1 58 0 0 1 0 0 0 0 0
1000 2 22 0 0 1 0 0 0 0 0
1000 3 68 0 0 1 0 0 0 0 0
1000 1 41 0 0 0 1 0 0 0 0
1000 2 0 0 0 0 1 0 0 0 0
1000 3 30 0 0 0 1 0 0 0 0
1001 1 35 1 0 0 0 0 0 0 0
1001 2 25 1 0 0 0 0 0 0 0
1001 3 71 1 0 0 0 0 0 0 0
1001 1 43 0 1 0 0 0 0 0 0
1001 2 18 0 1 0 0 0 0 0 0
1001 3 14 0 1 0 0 0 0 0 0
1001 1 29 0 0 1 0 0 0 0 0
1001 2 37 0 0 1 0 0 0 0 0
1001 3 78 0 0 1 0 0 0 0 0
1001 1 100 0 0 0 1 0 0 0 0
1001 2 12 0 0 0 1 0 0 0 0
1001 3 0 0 0 0 1 0 0 0 0
END DATA.
VAL LABEL MODE 1 'Car' 2 'Train' 3 'Bus'.

/* then let's create a variable indicating the quarter and drop the q1 to q8
*/
IF q1=1 quarter=1.
IF q2=1 quarter=2.
IF q3=1 quarter=3.
IF q4=1 quarter=4.
IF q5=1 quarter=5.
IF q6=1 quarter=6.
IF q7=1 quarter=7.
IF q8=1 quarter=8.
add files /file=* /drop=q1 to q8.
exe.

/* finally let's get the table you need */
means tables=travel_time by quarter /cells mean sum.

/* this one you might not need it but it will give you mean and total
travel_time by mode */
means tables=travel_time by mode /cells mean sum.

HTH,
Luca

Mr. Luca MEYER
Market research, data analysis & more
www.lucameyer.com - Tel: +39.339.495.00.21


-----Messaggio originale-----
Da: SPSSX(r) Discussion [mailto:[hidden email]] Per conto di Frank
Milthorpe
Inviato: mercoledì 13 giugno 2007 9.23
A: [hidden email]
Oggetto: Trying to obtain means with multiple weight variables in a single
step

Hi

I have data collected from a rolling survey. The data is collected each
quarter, however to obtain sufficient sample size data from the previous
4 quarters are combined. Lets ignore the starting quarters. So each data
record has:
  Person_no mode travel_time  q1 q2 q3 q4 q5 q6 q7 q8.
    - 4 of the variables q1 to q8 will have valid (weight) data and the
other 4 quarter variables will have missing values.


This is a travel survey where mode is a category variable 1=Car 2=Train and
etc. Travel_time is continuous with values from 1 to say 100.
Person_no is just an ID so we can ignore this.


If I wish to obtain a "weighted" count of the people in each quarter I can
use the following command.
  MEANS  q1 q2 q3 q4 q5 q6 q7 q8 BY one / CELLS = SUM.
  -  where is a dummy variable which always takes the value 1.


If  want the total trips by each mode I can use.
  MEANS  q1 q2 q3 q4 q5 q6 q7 q8 BY mode / CELLS = SUM.



What I want to do is obtain a table with the mean value of travel_time for
each quarter. I am struggling to find a way to do this easily, given that I
essentially have weights on my q1 .. q8 variables.  I know that I can
reformat the data by using a LOOP and write out 4 records from each original
record and then process the data as per normal. I was wondering if anyone
had a suggestion that avoided this approach which I have overlooked.
Suggestions would be very welcome.

Cheers

Frank Milthorpe




News:
The majority of staff from the Transport and Population Data Centre have
moved to the Ministry of Transport. Contact details will be progressively
changed over the next few months.
Frank Milthorpe
Senior Manager, Transport Model Development Transport Data Centre NSW
Ministry of  Transport
Direct:    (02) 9206 8609     TDC Phone: (02) 9206 8611
Fax:       (02) 9206 8691

Street:   Level 7, Tower 2 (TNT Tower East), 1 Lawson Square, Redfern
NSW 2016
Postal:   GPO Box 1620, Sydney NSW 2001
Email:     [hidden email]
Web:      www.transport.nsw.gov.au/tdc





----------------------------------------------------------------------------
--------------------------------------------------------------------
This message is intended for the addressee named and may contain
confidential/privileged information. If you are not the intended recipient,
please delete it and notify the sender.
Views expressed in this message are those of the individual sender, and are
not necessarily the views of the Department.
You should scan any attached files for viruses.
----------------------------------------------------------------------------
--------------------------------------------------------------------

No virus found in this incoming message.
Checked by AVG Free Edition.
Version: 7.5.472 / Virus Database: 269.8.15/847 - Release Date: 12/06/2007
21.42


No virus found in this outgoing message.
Checked by AVG Free Edition.
Version: 7.5.472 / Virus Database: 269.8.15/847 - Release Date: 12/06/2007
21.42






------------------------------------------------------------------------------------------------------------------------------------------------
This message is intended for the addressee named and may contain confidential/privileged information. If you are not the intended recipient, please delete it and notify the sender.
Views expressed in this message are those of the individual sender, and are not necessarily the views of the Department.
You should scan any attached files for viruses.
------------------------------------------------------------------------------------------------------------------------------------------------
Reply | Threaded
Open this post in threaded view
|

Re: Trying to obtain means with multiple weight variables in a single step

Richard Ristow
In reply to this post by Frank Milthorpe
At 03:22 AM 6/13/2007, Frank Milthorpe wrote:

>I have data collected from a rolling survey. The data is collected
>each quarter, however to obtain sufficient sample size data from the
>previous 4 quarters are combined. Lets ignore the starting quarters.
>So each data record has:
>   Person_no mode travel_time  q1 q2 q3 q4 q5 q6 q7 q8.
>     - 4 of the variables q1 to q8 will have valid (weight) data and
> the other 4 quarter variables will have missing values.

Ah! OK; you have 'wide' data, i.e. data from all (eight, I guess?)
quarters in your study, in one record.

But that doesn't make sense of 'mode' and 'travel_time'; those likely
don't have only one value for person and quarter.  Does each record
represent a *trip*, then, with one person possibly having many records
that look like this, each with a mode and a travel time?

(You can see that it helps a great deal of you tell us what a record
*does* represent. I'm doing a lot of inference here.)

I suppose that the latest of the four quarters for which there's a
weight, is the quarter in which the travel took place? (If I'm right,
and each record *does* represent a trip, I'm astounded that you don't
have a variable for date of trip. Or maybe you do, but didn't mention
it. It would have helped.)

>I want to obtain a table with the [weighted?] mean value of
>travel_time for each quarter. I am struggling to find a way to do this
>easily, given that I essentially have weights on my q1 .. q8 variables.
>
>I can reformat the data by using a LOOP and write out 4 records from
>each original record and then process the data as per normal. I was
>wondering if anyone had a suggestion that avoided this approach which
>I haveoverlooked. Suggestions would be very welcome.

An awful lot of problems can be solved that way, by reformatting data
from 'wide' to 'long' form. If the 'long' form you're using is

   Person_no mode travel_time   1 q1
   Person_no mode travel_time   2 q2
   Person_no mode travel_time   3 q3

it should be easy with VARSTOCASES, which is easier to code than
LOOP/XSAVE logic.

In your follow-up, you wrote,
>Currently I duplicate 4 records from each original record with a LOOP
>and XSAVE. It would be nice to undertake my analysis without having to
>do the data transformations.

Flat out: Stay with what you've got, except see if VARSTOCASES will
work (as it probably will). Code to work with the 'wide' form will be
much more complicated; and since you've got a good solution, I can't
recommend that either you or we try it.

We may be hitting programming aesthetics here. My own taste is for
'long' data. I think it easier to understand; and commonly, the code to
use it is simpler, because loops in 'wide' data can often be replaced
by using SPSS's long implicit loop through the list of cases. Maybe
your aesthetics are the other way, preferring 'wide' forms. De gustibus
non disputandum est.

But this time, stay with the solution you've got. Of course, keep your
original file as well, so THAT stays in form you like. I'd do that
myself (never throw away anything you have), even if I used the 'long'
form for all analyses.

-Good wishes and good luck,
  Richard
Reply | Threaded
Open this post in threaded view
|

Re: Trying to obtain means with multiple weight variables in a single step

Frank Milthorpe
Richard

Each record is a summary of the daily travel of an individual. A person
is only surveyed on a single day. So each person has only a single value
of travel time. However, the responses from people from 4 quarters are
pooled to calculate the quarter based weights. Results are
obtained/reported for each quarter.

The data for weighting purposes for Quarter 1 is the data collected in
q-2,q-1,q0,q1.
The data for weighting purposes for Quarter 2 is the data collected in
      q-1,q0,q1,q2

The data for weighting purposes for Quarter 3 is the data collected in
          q0,q1,q2,q3
The data for weighting purposes for Quarter 4 is the data collected in
              q1,q2,q3,q4

The data for weighting purposes for Quarter 5 is the data collected in
                  q2,q3,q4,q5
  and etc ...


I have a single record for each person. This includes the travel time
and 8 weight variables: w_q1, w_q2, w_q3, w_q4 and etc. Four of these
weight variables will have real data and the rest will be missing values
as shown above. At this stage just ignore the first rows above with the
negative zero quarters, for this discussion we can consider someone
surveyed in quarter 4. They have weights for quarters 1, 2, 3 & 4. They
have no weights for quarters 5, 6, 7 & 8.

From my original posting I described the following format.

  Person_no mode travel_time  q1 q2 q3 q4 q5 q6 q7 q8.
    - 4 of the variables q1 to q8 will have valid (weight) data and
the
other 4 quarter variables will have missing values.

q1 is what I am now calling w_q1 weight for quarter1, w_w2 weight for
quarter 2.  The trick here is that the data for an individual contains
multiple weights, but a single values for the attributes, travel_time,
mode and etc. In reality I start with multiple records for a person with
a record for each journey they undertook. A journey will have a mode,
time, distance and etc. Each of these records always has the same
multiple weights values for the individual. However the weights vary
between the individuals.

I hope this helps to explain the data setup. Sorry that I didn't make
the multiple weights for a quarter more explicit in the original
posting. There are actually many more variables on the actual data
tables which I have not informed you about to try to simplify my posting
to the key issue.

Thanks for helpful comments.

Cheers

Frank


>>> Richard Ristow <[hidden email]> 15/06/2007 3:19 am >>>
At 03:22 AM 6/13/2007, Frank Milthorpe wrote:

>I have data collected from a rolling survey. The data is collected
>each quarter, however to obtain sufficient sample size data from the
>previous 4 quarters are combined. Lets ignore the starting quarters.
>So each data record has:
>   Person_no mode travel_time  q1 q2 q3 q4 q5 q6 q7 q8.
>     - 4 of the variables q1 to q8 will have valid (weight) data and
> the other 4 quarter variables will have missing values.

Ah! OK; you have 'wide' data, i.e. data from all (eight, I guess?)
quarters in your study, in one record.

But that doesn't make sense of 'mode' and 'travel_time'; those likely
don't have only one value for person and quarter.  Does each record
represent a *trip*, then, with one person possibly having many records

that look like this, each with a mode and a travel time?

(You can see that it helps a great deal of you tell us what a record
*does* represent. I'm doing a lot of inference here.)

I suppose that the latest of the four quarters for which there's a
weight, is the quarter in which the travel took place? (If I'm right,
and each record *does* represent a trip, I'm astounded that you don't
have a variable for date of trip. Or maybe you do, but didn't mention
it. It would have helped.)

>I want to obtain a table with the [weighted?] mean value of
>travel_time for each quarter. I am struggling to find a way to do this

>easily, given that I essentially have weights on my q1 .. q8
variables.
>
>I can reformat the data by using a LOOP and write out 4 records from
>each original record and then process the data as per normal. I was
>wondering if anyone had a suggestion that avoided this approach which

>I haveoverlooked. Suggestions would be very welcome.

An awful lot of problems can be solved that way, by reformatting data
from 'wide' to 'long' form. If the 'long' form you're using is

   Person_no mode travel_time   1 q1
   Person_no mode travel_time   2 q2
   Person_no mode travel_time   3 q3

it should be easy with VARSTOCASES, which is easier to code than
LOOP/XSAVE logic.

In your follow-up, you wrote,
>Currently I duplicate 4 records from each original record with a LOOP

>and XSAVE. It would be nice to undertake my analysis without having to

>do the data transformations.

Flat out: Stay with what you've got, except see if VARSTOCASES will
work (as it probably will). Code to work with the 'wide' form will be
much more complicated; and since you've got a good solution, I can't
recommend that either you or we try it.

We may be hitting programming aesthetics here. My own taste is for
'long' data. I think it easier to understand; and commonly, the code to

use it is simpler, because loops in 'wide' data can often be replaced
by using SPSS's long implicit loop through the list of cases. Maybe
your aesthetics are the other way, preferring 'wide' forms. De gustibus

non disputandum est.

But this time, stay with the solution you've got. Of course, keep your

original file as well, so THAT stays in form you like. I'd do that
myself (never throw away anything you have), even if I used the 'long'

form for all analyses.

-Good wishes and good luck,
  Richard







------------------------------------------------------------------------------------------------------------------------------------------------
This message is intended for the addressee named and may contain
confidential/privileged information. If you are not the intended
recipient, please delete it and notify the sender.
Views expressed in this message are those of the individual sender, and
are not necessarily the views of the Department.
You should scan any attached files for viruses.
------------------------------------------------------------------------------------------------------------------------------------------------
Reply | Threaded
Open this post in threaded view
|

Re: Trying to obtain means with multiple weight variables

Richard Ristow
At 06:41 PM 6/14/2007, Frank Milthorpe wrote:

>Each record is a summary of the daily travel of an individual. A
>person is only surveyed on a single day. So each person has only a
>single value of travel time.

Ah, thank you. That's very helpful.
[Not quoting all you wrote]

I'm not going to do much with it just now, though. I think that
unrolling to long form, with LOOP/XSAVE or CASESTOVARS, is still your
best solution.

There's a great deal to be said for using a technique that is available
and known to work.

-Best regards and good luck,
  Richard