I am asked to assist in an application of a multilevel model through GENLINMIXED, for a survey based on a three-stage clustered sample (first selecting towns, then neighborhoods, then households). Since the multilevel model would look for effects within and between towns, neighborhoods in towns, and households in towns and neighborhoods, the question is what to do in this case about sample weighting.
For other analyses (including tabulations of results) each household is weighted by the product of the reciprocals of the sampling ratios at the three stages, (Pt/St)(Ptn/Stn)(Ptnh/Stnh), where P=population and S=sample, t=towns, n=neighborhoods, and h=households. These weights, in this format, correct for the unequal sampling ratios, and also inflate the weighted sample to population size. The weighted sample could be deflated to its original size by multiplying those weights by the ratio R=(total household sample size/total households in the population). The question is whether any weight (inflationary or, most probably, not inflationary) should be used when analyzing a multilevel model where the levels are the sampling stages. I have heard opinions for and against, and would like to hear more on the subject.
Hector
===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
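The weighting arithmetic described in this post can be sketched as follows. This is only an illustration under stated assumptions: the counts are invented, the function names `inflation_weight` and `deflate` are hypothetical, and the deflation ratio uses the sum of the weights in place of the population total, which coincides with R when the weighted sample sums to the population size.

```python
def inflation_weight(P_t, S_t, P_tn, S_tn, P_tnh, S_tnh):
    """Product of the reciprocal sampling ratios at the three stages."""
    return (P_t / S_t) * (P_tn / S_tn) * (P_tnh / S_tnh)


def deflate(weights):
    """Rescale weights so they sum to the number of sampled households.

    Assumption: sum(weights) stands in for the total number of households
    in the population, so the ratio plays the role of R in the post.
    """
    ratio = len(weights) / sum(weights)
    return [w * ratio for w in weights]


# Invented counts for four sampled households, one tuple per household:
# (P_t, S_t, P_tn, S_tn, P_tnh, S_tnh)
sample = [
    (50, 5, 20, 4, 200, 10),
    (50, 5, 30, 6, 100, 20),
    (50, 5, 10, 5, 60, 12),
    (50, 5, 30, 6, 100, 20),
]

inflationary = [inflation_weight(*row) for row in sample]
proportional = deflate(inflationary)

print("inflationary weights:", inflationary)
print("proportional weights:", proportional)
print("sum of proportional weights:", sum(proportional))
```

With these invented counts the inflationary weights sum to 1600, while the proportional weights sum to the sample size of 4 and include values both above and below 1.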
I have not yet had an opportunity to work out the relation of multilevel models to complex samples. I have gathered a few data sets but have not had a chance to run them through both procedures. I suspect that the two are close to, if not identically, the same thing. If you do not get satisfactory responses on the list, I suggest you try running both procedures and see how Complex Samples weights things. I am out of town for the Conference on Statistical Practice, so I do not have access to the books and data sets that I was going to try.
HTH
Art
Art Kendall
Social Research Consultants |
In reply to this post by Hector Maletta
A weighted sample needs to be analyzed as a weighted sample. Otherwise your standard errors will be wrong (inflated), and your significance tests may not be accurately interpretable. It's especially important in MLM/HLM procedures because of how variance is partialed out. However, having said that, I don't know how to do that in SPSS without the advanced sampling package. I personally use SAS for most of my multi-level modeling, and again, I'm not sure how to handle complex samples with SAS and MLM, so I've had to use HLM for such a procedure. In other words, I believe you may not be able to handle this correctly with SPSS. Even if they did have the necessary feature with the advanced sampling add-on (someone from SPSS might need to answer that), you would have to find out what approach it uses in multi-level modeling. In general, with any form of hierarchical model, a different form of multi-stage weighting is suggested from the more traditional approaches.
-Matt |
In reply to this post by Hector Maletta
Hello to everybody:
If I have an open syntax editor window, with something like:
COMPUTE dif_abs_p_g3_C_09_area=ABS(p_g3_C_09_area.1 - p_g3_C_09_area.2).
COMPUTE dif_abs_p_g3_C_09_gestion=ABS(p_g3_C_09_gestion.1 - p_g3_C_09_gestion.2).
COMPUTE dif_abs_p_g3_C_09_caracteristica2=ABS(p_g3_C_09_caracteristica2.1 - p_g3_C_09_caracteristica2.2).
COMPUTE dif_abs_p_g3_C_09_area=ABS(p_g3_C_09_area.1 - p_g3_C_09_area.2).
COMPUTE dif_abs_p_g3_C_09_gestion=ABS(p_g3_C_09_gestion.1 - p_g3_C_09_gestion.2).
COMPUTE dif_abs_p_g3_C_09_caracteristica2=ABS(p_g3_C_09_caracteristica2.1 - p_g3_C_09_caracteristica2.2).
and I use the mouse to highlight the last three lines, open Find and Replace (Ctrl + H), put _C in Find and _M in Replace, set the option Items to search to Selected, and click Replace All, I get a "search string not found" error message. But if I do the same in SPSS 17, I don't get the error message and I get what I want:
COMPUTE dif_abs_p_g3_C_09_area=ABS(p_g3_C_09_area.1 - p_g3_C_09_area.2).
COMPUTE dif_abs_p_g3_C_09_gestion=ABS(p_g3_C_09_gestion.1 - p_g3_C_09_gestion.2).
COMPUTE dif_abs_p_g3_C_09_caracteristica2=ABS(p_g3_C_09_caracteristica2.1 - p_g3_C_09_caracteristica2.2).
COMPUTE dif_abs_p_g3_M_09_area=ABS(p_g3_M_09_area.1 - p_g3_M_09_area.2).
COMPUTE dif_abs_p_g3_M_09_gestion=ABS(p_g3_M_09_gestion.1 - p_g3_M_09_gestion.2).
COMPUTE dif_abs_p_g3_M_09_caracteristica2=ABS(p_g3_M_09_caracteristica2.1 - p_g3_M_09_caracteristica2.2).
Is this some kind of bug in v20?
Kindly
Andrés |
In reply to this post by Art Kendall
Thanks to Matthew Poes and Arthur Kendall for responding to my question. I appreciate your contributions, but I am still looking for an answer.
To repeat my situation: I am asked to assist in applying a multilevel model (through GENLINMIXED) with a survey based on a three-stage clustered sample (first selecting towns, then neighborhoods, then households). Since the multilevel model would look for effects within and between towns, neighborhoods within towns, and households within towns and neighborhoods, the question is what to do in this case about sample weighting.
Since the sampling ratio is not uniform, sample cases in different clusters and strata represent different numbers of cases in the population. Using no weights would evidently be wrong. One may use inflationary frequency weights, whereby the weighted sample size equals the population size, OR one may use merely proportional weights, where the total weighted sample size equals the actual number of cases in the sample. In this latter option, some cases have weights >1, while others have weights <1 (the average case has weight=1).
SPSS computes statistical tests relative to the weighted sample. Thus the first approach (inflationary weights) would fool SPSS into believing that the sample size is enormous, and consequently statistical error would be grossly understated. On the other hand, the second approach (proportional non-inflationary weights) would produce many cases with weights well below 1. This should not cause trouble if fractional weights were totalled first and then rounded, but this appears not to be the case: the Generalized Linear Mixed Model Algorithms say (in the section about notation, concerning frequency weights "f"): "Non-integer elements are treated by rounding the value to the nearest integer. For values less than 0.5 or missing, the corresponding records are not used." If that is so, then all cases with proportional weights lower than 0.5 (i.e. all cases representing less than half the reciprocal of the average sampling ratio) would be ignored, which is also unacceptable.
Anybody?
Hector |
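The effect of the rounding rule quoted from the Generalized Linear Mixed Model Algorithms can be illustrated with a small sketch. It only mimics the quoted sentence and is not SPSS's implementation: the function name `rounded_frequency` is hypothetical, the weights are invented, and the example avoids exact .5 ties because the quoted text does not say how ties are rounded (the half-up rule in the code is an assumption).

```python
import math
from typing import List, Optional


def rounded_frequency(weight: Optional[float]) -> Optional[int]:
    """Mimic the quoted rule: round to the nearest integer, and drop
    records whose weight is missing or below 0.5 (returned as None)."""
    if weight is None or math.isnan(weight) or weight < 0.5:
        return None
    # Assumption: halves round up; the quoted text does not specify this.
    return math.floor(weight + 0.5)


# Invented proportional weights for five sampled cases (they sum to 5).
proportional: List[float] = [2.4, 0.8, 0.3, 0.4, 1.1]

frequencies = [rounded_frequency(w) for w in proportional]
used_records = sum(f is not None for f in frequencies)
weighted_n = sum(f for f in frequencies if f is not None)

print("rounded frequencies:", frequencies)
print("records used:", used_records, "of", len(proportional))
print("weighted N after rounding:", weighted_n)
```

In this invented example two of the five records fall below 0.5 and are dropped, and the weighted N after rounding is 4 rather than 5, which is the distortion described in the post.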
Hector,
From what you've reported, it sounds like GENLINMIXED will not account for the probability weighting scheme that you desire. Perhaps there is a workaround, but I have not explored this area enough to provide any recommendations.
Having said that, you may not be aware that you've stepped into a highly debated topic--when and how to use weighting in the context of hierarchical models. And even more broadly, whether one should even employ a hierarchical model in the context of multi-stage sampling.
Here's a very informative SUGI paper (by David Cassell) that touches on the latter topic (a mixed model procedure versus a survey sampling procedure that allows for clustering). It is a MUST read:
David Cassell also posted a message on SAS-L (among many similar messages) that I've found to be informative when thinking about using (or perhaps not using) hierarchical models with data obtained from complex survey designs:
I also suggest you consider reviewing Chapter 14 in the second edition of "Multilevel Analysis: An introduction to basic and advanced multilevel modeling":
HTH,
Ryan
|
Dear Ryan,
Thanks for your very informative response. I am in fact (to a certain extent) aware of some of the discussions on this topic, though not of the Cassell paper and post that you kindly included links to. I had already read the new edition of Snijders and Bosker's book on multilevel modeling, where (as you note) a whole new chapter has been added concerning sample weighting. The chapter is very informative, though somewhat inconclusive about the various issues involved. I will read the Cassell materials and eventually send a comment around.
Hector
|