weighting for anova, etc.

Ian Martin-2
I have some data collected on various age classes in several small
communities. Age classes are 0-14, 15-39, 40+ years.  Subjects were
randomly selected within each age class, but the sampling effort
varies between age classes.  That is, one age class sample at
Community A might represent 42% of the total individuals available
in that community, but other age classes, or the same age class at
another community, might be more than 42% or less than 42%.

To make inferences about something like, say, blood pressure in the
population(s), or to compare between communities, it seems that we
should attempt to weight the observations according to whether
subjects were oversampled or undersampled in a particular age class.

How does one use WEIGHT cases to do something like this?

regards,
Ian

Ian D. Martin, Ph.D.

Tsuji Laboratory
University of Waterloo
Dept. of Environment & Resource Studies

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Re: weighting for anova, etc.

Steve Simon, P.Mean Consulting
Ian Martin wrote:

> I have some data collected on various age classes in several small
> communities. Age classes are 0-14, 15-39, 40+ years.  Subjects were
> randomly selected within each age class, but the sampling effort
> varies between age classes.  That is, one age class sample at
> Community A might represent 42% of the total individuals available
> in that community, but other age classes, or the same age class at
> another community, might be more than 42% or less than 42%.
>
> To make inferences about something like, say, blood pressure in the
> population(s), or to compare between communities, it seems that we
> should attempt to weight the observations according to whether
> subjects were oversampled or undersampled in a particular age class.
>
> How does one use WEIGHT cases to do something like this?

The key calculation is the sampling probability. Let nij
represent the number of patients sampled in community i and age stratum
j. Let Nij represent the total number of patients in the population in
community i and age stratum j. The probability of sampling, pij, is
nij/Nij. The inverse of this probability, 1/pij = Nij/nij, is a useful
quantity. It tells you how many people in the population are represented
by a single individual in the sample. So if the sample size is 100
and there are 2 million people in the population, each person in the
sample represents 20,000 people in the population.

If you weight the data by the inverse of pij, this will give greater
weight to those strata where you undersample, because each person in the
sample represents a larger number of individuals in the population than
you had hoped for. Similarly, this will give less weight to those strata
where you oversample.
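
As a rough sketch of the inverse-probability weighting described above,
here is a small Python example. The stratum sizes and blood pressure
values are made up for illustration, and the function names are my own
(hypothetical); nothing here comes from Ian's data.

```python
def sampling_weight(n_sampled, n_population):
    """Return 1/p, where p = n_sampled / n_population is the sampling probability."""
    if n_sampled <= 0 or n_population < n_sampled:
        raise ValueError("need 0 < n_sampled <= n_population")
    return n_population / n_sampled


def weighted_mean(values, weights):
    """Weighted mean: sum(w * x) / sum(w)."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


# Hypothetical community: population size N and sampled systolic
# blood pressures for each age stratum (illustrative numbers only).
strata = {
    "0-14": {"N": 100, "bp": [100, 104]},
    "15-39": {"N": 60, "bp": [120, 124, 116]},
    "40+": {"N": 40, "bp": [130, 134, 138, 142]},
}

values, weights = [], []
for stratum in strata.values():
    w = sampling_weight(len(stratum["bp"]), stratum["N"])
    for bp in stratum["bp"]:
        values.append(bp)
        weights.append(w)  # every subject in a stratum shares that stratum's weight

unweighted = sum(values) / len(values)
weighted = weighted_mean(values, weights)
print(f"unweighted mean: {unweighted:.1f}")
print(f"weighted mean:   {weighted:.1f}")
```

In this made-up example the youngest stratum is the most heavily
undersampled, so its two subjects carry the largest weights and pull
the weighted mean below the unweighted one.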

Suppose you don't know the total number of patients in the population,
but you do know the relative size of each age group within each
community. Say that in community 1, the age group 0-14 years
constituted 40% of your sample, but you knew that in the population
for community 1, age group 0-14 years corresponded to 50% of the
community. Now let pij be the proportion of sample patients in
community i who fall in age stratum j, relative to the total number
sampled in community i across all strata (note that this is not the
same pij as the sampling probability above). Let Pij be the
proportion of the population in community i who belong to stratum j. If
you weight the data by Pij/pij, you will give greater weight to those
patients who are undersampled (Pij > pij) and lesser weight to those
patients who are oversampled (Pij < pij).  You will give weight 1 to
those patients who are sampled in proportion (Pij = pij). In the above
example, assign a weight of 0.5/0.4 = 1.25 to the age group 0-14 years.
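
A minimal Python sketch of the Pij/pij calculation for one community
follows. Only the 40% sample share and 50% population share for the
0-14 group come from the example above; the other counts and shares,
and the function name, are assumptions made up for illustration.

```python
def proportion_weights(sample_counts, population_shares):
    """Return the weight P/p for each stratum within one community.

    sample_counts: number of subjects sampled in each stratum.
    population_shares: share of the community's population in each stratum.
    """
    total_sampled = sum(sample_counts.values())
    weights = {}
    for stratum, count in sample_counts.items():
        sample_share = count / total_sampled  # p for this stratum
        weights[stratum] = population_shares[stratum] / sample_share
    return weights


# Community 1: 0-14 is 40% of the sample but 50% of the population.
# The 15-39 and 40+ figures are hypothetical.
sample_counts = {"0-14": 40, "15-39": 35, "40+": 25}
population_shares = {"0-14": 0.50, "15-39": 0.30, "40+": 0.20}

weights = proportion_weights(sample_counts, population_shares)
for stratum, w in weights.items():
    print(f"{stratum}: weight {w:.3f}")

# Weighted case count for the community.
weighted_n = sum(sample_counts[s] * weights[s] for s in weights)
print(f"weighted n: {weighted_n:.1f}")
```

One point in favour of these weights: within a community the weighted
case count equals the actual sample size, so the weighting
redistributes influence among strata without inflating n.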

I wrote this answer up as a webpage:

http://www.pmean.com/10/CalculatingWeights.html

Steve Simon, Standard Disclaimer
Sign up for The Monthly Mean, the newsletter that
dares to call itself "average" at www.pmean.com/news

Re: weighting for anova, etc.

Ian Martin-2
Thanks Steve.  You've provided a nice generalization that validates
what I was thinking.  This gives me some confidence to proceed along
the lines I had thought.  I appreciate the time you've taken to lay
this out.

best regards,
Ian

On 22 Mar, 2010, at 5:41 PM, Steve Simon, P.Mean Consulting wrote:

> [...]
