SPSSX Discussion

Weighting in GENLINMIXED

Hector Maletta

Feb 16, 2012; 10:27am

Weighting in GENLINMIXED

I am asked to assist in an application of a multilevel model through GENLINMIXED, for a survey based on a three-stage clustered sample (first selecting towns, then neighborhoods, then households). Since the multilevel model would look for effects within and between towns, neighborhoods in towns, and households in towns and neighborhoods, the question is what to do in this case about sample weighting.
For other analyses (including tabulations of results) each household is weighted by the product of the reciprocals of the sampling ratios at the three stages, (Pt/St)(Ptn/Stn)(Ptnh/Stnh), where P=population and S=sample, t=towns, n=neighborhoods, and h=households. These weights, in this form, correct for disproportionate sampling and also inflate the weighted sample to population size. The weighted sample could be deflated to its original size by multiplying those weights by the ratio R=(total household sample size/total households in the population).
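As a rough illustration of the weighting scheme just described, here is a minimal Python sketch. The function names and the sampling counts are hypothetical and not taken from the survey, and the sketch assumes that the inflationary weights sum to the population total, as stated above.

```python
def inflation_weight(P_t, S_t, P_tn, S_tn, P_tnh, S_tnh):
    """Product of the reciprocal sampling ratios at the three stages."""
    return (P_t / S_t) * (P_tn / S_tn) * (P_tnh / S_tnh)


def deflate(weights, population_total):
    """Rescale weights by R = sample size / population size."""
    r = len(weights) / population_total
    return [w * r for w in weights]


# Hypothetical counts for two sampled households (illustrative only).
weights = [
    inflation_weight(100, 10, 8, 2, 50, 5),  # 10 * 4 * 10
    inflation_weight(100, 10, 4, 2, 25, 5),  # 10 * 2 * 5
]

# Assumption: the inflationary weights sum to the population total.
population_total = sum(weights)
proportional = deflate(weights, population_total)

print(weights)            # inflationary weights
print(proportional)       # deflated (proportional) weights
print(sum(proportional))  # approximately the sample size
```

After deflation the weights average 1, so the weighted sample size matches the number of sampled households.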
The question is whether any weight (inflationary or most probably not inflationary) should be used when analyzing a multilevel model where the levels are the sampling stages. I have heard opinions for and against, and would like to hear more on the subject.

Hector

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Art Kendall

Feb 16, 2012; 2:01pm

Re: Weighting in GENLINMIXED

Art Kendall
Social Research Consultants

Poes, Matthew Joseph

Feb 16, 2012; 2:15pm

Re: Weighting in GENLINMIXED

In reply to this post by Hector Maletta

A weighted sample needs to be analyzed as a weighted sample. Otherwise your standard errors will be wrong (inflated), and your significance tests may not be accurately interpretable. It's especially important in MLM/HLM procedures because of how variance is partialed out. However, having said that, I don't know how to do that in SPSS without the advanced sampling package. I personally use SAS for most of my multi-level modeling, and again, I'm not sure how to handle complex samples with SAS and MLM, so I've had to use HLM for such a procedure. In other words, I believe you may not be able to handle this correctly with SPSS. Even if they did have the necessary feature with the advanced sampling add-on (someone from SPSS might need to answer that), you would have to find out what approach it uses in multi-level modeling. In general, with any form of hierarchical model, a different form of multi-stage weighting is suggested from the more traditional approaches.

-Matt

ANDRES ALBERTO BURGA LEON

Feb 20, 2012; 4:05pm

Problem with REPLACE in syntax window

In reply to this post by Hector Maletta

Hello everybody:

Suppose I have an open syntax editor window containing something like:

COMPUTE dif_abs_p_g3_C_09_area=ABS(p_g3_C_09_area.1 - p_g3_C_09_area.2).
COMPUTE dif_abs_p_g3_C_09_gestion=ABS(p_g3_C_09_gestion.1 - p_g3_C_09_gestion.2).
COMPUTE dif_abs_p_g3_C_09_caracteristica2=ABS(p_g3_C_09_caracteristica2.1 - p_g3_C_09_caracteristica2.2).
COMPUTE dif_abs_p_g3_C_09_area=ABS(p_g3_C_09_area.1 - p_g3_C_09_area.2).
COMPUTE dif_abs_p_g3_C_09_gestion=ABS(p_g3_C_09_gestion.1 - p_g3_C_09_gestion.2).
COMPUTE dif_abs_p_g3_C_09_caracteristica2=ABS(p_g3_C_09_caracteristica2.1 - p_g3_C_09_caracteristica2.2).

If I use the mouse to highlight the last three lines, open Find and Replace (Ctrl + H), enter _C in Find and _M in Replace,
set Items to search to Selected, and click Replace All, I get a "search string not found" error message.

But if I do the same in SPSS 17, I don't get the error message, and I get what I want:

COMPUTE dif_abs_p_g3_C_09_area=ABS(p_g3_C_09_area.1 - p_g3_C_09_area.2).
COMPUTE dif_abs_p_g3_C_09_gestion=ABS(p_g3_C_09_gestion.1 - p_g3_C_09_gestion.2).
COMPUTE dif_abs_p_g3_C_09_caracteristica2=ABS(p_g3_C_09_caracteristica2.1 - p_g3_C_09_caracteristica2.2).
COMPUTE dif_abs_p_g3_M_09_area=ABS(p_g3_M_09_area.1 - p_g3_M_09_area.2).
COMPUTE dif_abs_p_g3_M_09_gestion=ABS(p_g3_M_09_gestion.1 - p_g3_M_09_gestion.2).
COMPUTE dif_abs_p_g3_M_09_caracteristica2=ABS(p_g3_M_09_caracteristica2.1 - p_g3_M_09_caracteristica2.2).

Is this some kind of bug in v20?
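For reference, the intended result (a replacement restricted to the selected lines) can be sketched outside the syntax editor. This is only a minimal Python illustration of the desired behavior, not a description of how the SPSS editor works; the function name replace_in_selection and the zero-based line indices are assumptions made for the example.

```python
def replace_in_selection(lines, start, end, find, replace):
    """Return a copy of lines with find -> replace applied only to lines[start:end]."""
    return [
        line.replace(find, replace) if start <= i < end else line
        for i, line in enumerate(lines)
    ]


block = [
    "COMPUTE dif_abs_p_g3_C_09_area=ABS(p_g3_C_09_area.1 - p_g3_C_09_area.2).",
    "COMPUTE dif_abs_p_g3_C_09_gestion=ABS(p_g3_C_09_gestion.1 - p_g3_C_09_gestion.2).",
    "COMPUTE dif_abs_p_g3_C_09_caracteristica2=ABS(p_g3_C_09_caracteristica2.1 - p_g3_C_09_caracteristica2.2).",
]
syntax = block + block  # six lines, as in the syntax window

# "Select" the last three lines (indices 3, 4, 5) and replace _C with _M.
result = replace_in_selection(syntax, 3, 6, "_C", "_M")
print("\n".join(result))
```

The replacement is case-sensitive, so the lowercase "_c" in "_caracteristica2" is left alone, which matches the output shown above.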

Kindly

Andrés

Hector Maletta

Feb 28, 2012; 4:57am

Re: Weighting in GENLINMIXED

In reply to this post by Art Kendall

Thanks to Matthew Poes and Arthur Kendall for responding to my question. I
appreciate your contributions, but I am still looking for an answer.

To repeat my situation: I am asked to assist in applying a multilevel model
(through GENLINMIXED) with a survey based on a three-stage clustered sample
(first selecting towns, then neighborhoods, then households). Since the
multilevel model would look for effects within and between towns,
neighborhoods within towns, and households within towns and neighborhoods,
the question is what to do in this case about sample weighting.

Since the sampling ratio is not uniform, sample cases in different clusters
and strata represent different numbers of cases in the population. Using no
weights would be evidently wrong.

One may use inflationary frequency weights, whereby the weighted sample size
equals population size, OR one may use merely proportional weights where
total weighted sample size equals the actual number of cases in the sample.
In this latter option, some cases have weights >1, while others have weights
<1 (the average case has weight=1).

SPSS computes statistical tests relative to the weighted sample. Thus the
first approach (inflationary weights) would fool SPSS into believing that
the sample size is enormous, and consequently statistical error would be
grossly understated.
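A minimal Python sketch of this effect follows. It assumes the software treats each weight as a replication count (a frequency weight), so that the weighted sample size is the sum of the weights; the data values, the weights, and the function name are hypothetical.

```python
import math


def frequency_weighted_se(values, weights):
    """Standard error of the mean when weights are treated as case counts."""
    total = sum(weights)  # the "sample size" the software believes it has
    mean = sum(w * x for x, w in zip(values, weights)) / total
    variance = sum(w * (x - mean) ** 2 for x, w in zip(values, weights)) / (total - 1)
    return math.sqrt(variance / total)


# Hypothetical data: four sampled households.
values = [1.0, 2.0, 3.0, 4.0]
inflationary = [400.0, 100.0, 250.0, 250.0]  # sums to a population of 1000

# Proportional weights: rescaled so that they sum to the actual sample size.
scale = len(values) / sum(inflationary)
proportional = [w * scale for w in inflationary]

se_inflationary = frequency_weighted_se(values, inflationary)
se_proportional = frequency_weighted_se(values, proportional)
print(se_inflationary, se_proportional)
```

Both weightings give the same weighted mean, but the inflationary weights yield a much smaller standard error because the apparent sample size is 1000 rather than 4.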

On the other hand, the second approach (proportional non-inflationary
weights) would produce many cases with weights well below 1. This should not
cause trouble if fractional weights were totalled first and then rounded,
but that appears not to be the case: the Generalized Linear Mixed Model
Algorithms say (in the section about notation, concerning frequency weights
"f"): "Non-integer elements are treated by rounding the value to the nearest
integer. For values less than 0.5 or missing, the corresponding records are
not used." If that is so, then all cases with proportional weights lower
than 0.5 (i.e. all cases representing less than half the reciprocal of the
average sampling ratio) would be ignored, which is also unacceptable.
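To make the consequence concrete, here is a minimal Python sketch of the rule as quoted. It is an illustration of the quoted text, not the actual GENLINMIXED implementation; the treatment of exact ties (x.5 rounds up) is an assumption, since the quoted text does not say, and the weights are hypothetical.

```python
import math


def effective_frequency(weight):
    """Apply the quoted rule: drop missing weights and weights below 0.5, else round."""
    if weight is None or weight < 0.5:
        return 0  # record is not used
    # Assumption: exact ties (x.5) round up; the quoted text does not specify.
    return math.floor(weight + 0.5)


# Hypothetical proportional weights (average near 1), plus one missing value.
proportional_weights = [1.6, 0.4, 1.0, 1.0, None]

used = [effective_frequency(w) for w in proportional_weights]
dropped = sum(1 for f in used if f == 0)
print(used, dropped)
```

Under this reading, the household with weight 0.4 drops out of the analysis entirely, and the one with weight 1.6 counts as two cases.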

Anybody?

Hector

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
Arthur Kendall
Sent: Thursday, February 16, 2012 11:01
To: [hidden email]
Subject: Re: Weighting in GENLINMIXED

I have not yet had an opportunity to work out the relation of multilevel
models to complex samples. I have gathered a few data sets but have not
had a chance to run them through both procedures. I suspect that the two
are close to, if not exactly, the same thing.

If you do not get satisfactory responses on the list, I suggest you try
running both procedures and see how the Complex Samples procedure weights things.

I am out of town for the Conference on Statistical Practice so do not have
access to my books and data sets that I was going to try.

HTH

Art

Ryan

Mar 02, 2012; 5:44pm

Re: Weighting in GENLINMIXED

Hector,

From what you've reported, it sounds like GENLINMIXED will not account for the probability weighting scheme that you desire. Perhaps there is a workaround, but I have not explored this area enough to provide any recommendations.

Having said that, you may not be aware that you've stepped into a highly debated topic: when and how to use weighting in the context of hierarchical models. And, even more broadly, whether one should even employ a hierarchical model in the context of multi-stage sampling.

Here's a very informative SUGI paper (by David Cassell) that touches on the latter topic (a mixed model procedure versus a survey sampling procedure that allows for clustering). It is a MUST read:

http://www2.sas.com/proceedings/sugi31/193-31.pdf

David Cassell also posted a message on SAS-L (among many similar messages) that I've found to be informative when thinking about using (or perhaps not using) hierarchical models with data obtained from complex survey designs:

http://listserv.uga.edu/cgi-bin/wa?A2=ind0602D&L=sas-l&P=R44987

I also suggest you consider reviewing Chapter 14 in the second edition of "Multilevel Analysis: An introduction to basic and advanced multilevel modeling":

http://www.stats.ox.ac.uk/~snijders/SnBos_contents.html

HTH,

Ryan

Hector Maletta

Mar 02, 2012; 11:28pm

Re: Weighting in GENLINMIXED

Dear Ryan,

Thanks for your very informative response. I am in fact (to a certain extent) aware of some of the discussions on this topic, though not of the Cassell paper and post that you kindly linked. I had already read the new edition of Snijders and Bosker's book on multilevel analysis, where (as you note) a whole new chapter on sample weighting has been added. The chapter is very informative, though somewhat inconclusive about the various issues involved. I will read the Cassell materials and eventually send a comment around.

Hector
