Posted by
Hector Maletta on
Feb 28, 2012; 4:57am
URL: http://spssx-discussion.165.s1.nabble.com/Weighting-in-GENLINMIXED-tp5489150p5520864.html
Thanks to Matthew Poes and Arthur Kendall for responding to my question. I
appreciate your contributions, but I am still looking for an answer.
To repeat my situation: I am asked to assist in applying a multilevel model
(through GENLINMIXED) with a survey based on a three-stage clustered sample
(first selecting towns, then neighborhoods, then households). Since the
multilevel model would look for effects within and between towns,
neighborhoods within towns, and households within towns and neighborhoods,
the question is what to do in this case about sample weighting.
Since the sampling ratio is not uniform, sample cases in different clusters
and strata represent different numbers of cases in the population. Using no
weights would be evidently wrong.
One may use inflationary frequency weights, whereby the weighted sample size
equals population size, OR one may use merely proportional weights where
total weighted sample size equals the actual number of cases in the sample.
In this latter option, some cases have weights >1, while others have weights
<1 (the average case has weight=1).
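
To make the two options concrete, here is a minimal Python sketch (not SPSS syntax). The sampling fractions are invented for illustration, and the names `stage_fractions`, `inflationary` and `proportional` are hypothetical:

```python
import math

# Hypothetical sampling fractions (sample/population) at each stage for four
# sampled households; the numbers are invented for illustration only.
stage_fractions = [
    (0.5, 0.25, 0.1),   # (towns, neighborhoods, households)
    (0.5, 0.5, 0.2),
    (1.0, 0.5, 0.5),
    (1.0, 1.0, 0.5),
]

# Inflationary weight: product of the reciprocals of the stage fractions,
# so the weights add up to the (estimated) population size.
inflationary = [1.0 / (t * n * h) for t, n, h in stage_fractions]

# Proportional weight: rescale so the weights add up to the sample size,
# which makes the average weight equal to 1.
scale = len(inflationary) / sum(inflationary)
proportional = [w * scale for w in inflationary]

print("inflationary:", [round(w, 3) for w in inflationary])
print("proportional:", [round(w, 3) for w in proportional])
print("sum of proportional weights:", round(sum(proportional), 6))
```

With these invented fractions, some proportional weights come out above 1 and others well below 1, which is the pattern described above.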
SPSS computes statistical tests relative to the weighted sample. Thus the
first approach (inflationary weights) would fool SPSS into believing that
the sample size is enormous, and consequently statistical error would be
grossly understated.
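
A rough Python illustration of why this happens, assuming a simple frequency-weighted mean in which the sum of the weights is taken as the sample size. This is a simplification and not the GENLINMIXED algorithm; the data and the function name `freq_weighted_se` are hypothetical:

```python
import math

def freq_weighted_se(values, weights):
    """Standard error of a weighted mean when the sum of the weights
    is treated as the number of cases (frequency-weight logic)."""
    n = sum(weights)  # weighted "sample size"
    mean = sum(w * x for x, w in zip(values, weights)) / n
    var = sum(w * (x - mean) ** 2 for x, w in zip(values, weights)) / (n - 1)
    return mean, math.sqrt(var / n)

values = [10.0, 12.0, 9.0, 15.0]            # invented data
proportional = [1.5, 0.5, 1.2, 0.8]         # sums to the sample size, 4
inflationary = [w * 1000 for w in proportional]  # same proportions, inflated

mean_prop, se_prop = freq_weighted_se(values, proportional)
mean_infl, se_infl = freq_weighted_se(values, inflationary)

print("means:", mean_prop, mean_infl)
print("standard errors:", se_prop, se_infl)
```

The weighted mean is the same under both sets of weights, but the standard error shrinks roughly with the square root of the inflation factor.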
On the other hand, the second approach (proportional non-inflationary
weights) would produce many cases with weights well below 1. This would not
cause trouble if fractional weights were totalled first and then rounded,
but that appears not to be the case: the Generalized Linear Mixed Model
Algorithms say (in the section about notation, concerning frequency weights
"f"): "Non-integer elements are treated by rounding the value to the nearest
integer. For values less than 0.5 or missing, the corresponding records are
not used." If that is so, then all cases with proportional weights lower
than 0.5 (i.e. all cases representing less than half the reciprocal of the
average sampling ratio) would be ignored, which is also unacceptable.
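
A small Python sketch of the rounding rule as quoted, applied to invented proportional weights. The handling of a weight of exactly 0.5 (rounded up here) is my assumption, since the quoted passage does not say, and `effective_frequency` is a hypothetical helper name:

```python
import math

def effective_frequency(weight):
    """Apply the quoted rule: round to the nearest integer; weights
    below 0.5 or missing (None) mean the record is not used."""
    if weight is None or weight < 0.5:
        return 0
    # Assumption: ties such as 0.5 or 1.5 are rounded up.
    return math.floor(weight + 0.5)

weights = [0.2, 0.45, 0.6, 1.4, 2.6, None]   # invented proportional weights
effective = [effective_frequency(w) for w in weights]
dropped = sum(1 for f in effective if f == 0)

print("effective frequencies:", effective)
print("records not used:", dropped)
```

Under this reading, every case with a weight below 0.5 drops out of the analysis entirely, rather than contributing a fraction of a case.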
Anybody?
Hector
-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Arthur Kendall
Sent: Thursday, February 16, 2012 11:01
To: [hidden email]
Subject: Re: Weighting in GENLINMIXED
I have not yet had an opportunity to work out the relation of multilevel
models to complex samples. I have gathered a few data sets but have not
had a chance to run them through both procedures. I suspect that the two
are at least close to, if not exactly, the same thing.
If you do not get satisfactory responses on the list, I suggest you try
running both procedures and see how Complex Samples weights things.
I am out of town for the Conference on Statistical Practice, so I do not have
access to the books and data sets that I was going to try.
HTH
Art
On 2/16/2012 5:27 AM, Hector Maletta wrote:
> I am asked to assist in an application of a multilevel model through
GENLINMIXED, for a survey based on a three-stage clustered sample (first
selecting towns, then neighborhoods, then households). Since the multilevel
model would look for effects within and between towns, neighborhoods in
towns, and households in towns and neighborhoods, the question is what to do
in this case about sample weighting.
> For other analyses (including tabulations of results) each household is
weighted according to the product of the reciprocals of sampling ratios at
the three stages (Pt/St)(Ptn/Stn)(Ptnh/Stnh) where P=population and
S=sample, t=towns, n=neighborhoods, and h=households. These weights, in this
format, correct for proportionality of sampling, and also inflate the
weighted sample to population size. A deflation of the weighted sample to
its original sample size could be achieved through multiplying those weights
by the ratio R=(total household sample size/total households in the
population).
> The question is whether any weight (inflationary or, more probably, not
inflationary) should be used when analyzing a multilevel model where the
levels are the sampling stages. I have heard opinions for and against, and
would like to hear more on the subject.
>
> Hector
>
> =====================
> To manage your subscription to SPSSX-L, send a message to
>
[hidden email] (not to SPSSX-L), with no body text except
> the command. To leave the list, send the command SIGNOFF SPSSX-L For a
> list of commands to manage subscriptions, send the command INFO
> REFCARD
>