Hello,
=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
I am planning to fit an HLM model predicting the odds of HIV risk behavior among people who are aware of their HIV status, using the demographic surveys. I am considering recent surveys from 30 countries.

I would fit either a two-level model (individuals nested in countries) or a three-level model (individuals nested in regions, nested in countries).

Here are my sample sizes:

Level 3: Max of 30 countries
Level 2 (regions): varies from 3 to 15 regions per country
Level 1: Individuals (men/women): 2 to 1,000 who are aware of their positive status.

Now, my question relates to the sample size. I've read some of the literature in the area and am aware of the 30-30 rule (30 groups with 30 observations per group).

Since my focus is on those who are presumably aware of their seropositivity, I will focus on those countries that have sufficiently large samples of that focal population. The total number varies from 2 to 1,000. But there are gender differences.

So, should I take the gender differences into account (i.e., the number of men/women) when deciding whether to include a country or not? CountryA may have 30 observations total, but considerably more women (25) than men (5). In that case, should I include the country or drop it? In other words, is the number of people in the sub-groups of interest important for sample determination?

Thanks -cY
My experience is with power analysis and experimental design,
rather than with survey design, but you do suggest problems
that I am familiar with.
You say that there are "gender differences" - I would expect
both main effects and interactions, and you have the further
problem in having Ns that may be severely unbalanced by
gender, not always in the same direction.
This suggests to me that I would consider a males-only analysis
and a females-only analysis for main conclusions; and after that,
look very carefully at what can be inferred across regions or
countries for the possible gender differences and interactions,
using all data, if possible.
Before looking at outcomes, I would try to see how much I
might be able to collapse across demographically near and
similar regions (say) in order to boost the smallest Ns.
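Before deciding what to collapse, it may help to tabulate the Ns by country and gender. The following is a minimal Python sketch of that tabulation, not a recommended inclusion rule: the records, the names `cell_counts` and `flag_small_cells`, and the threshold of 30 per gender are all assumptions made for illustration (the threshold is simply borrowed from the 30-30 rule mentioned in the question).

```python
from collections import Counter

# Hypothetical records: one (country, region, gender) tuple per HIV-aware respondent.
records = (
    [("CountryA", "R1", "F")] * 25
    + [("CountryA", "R1", "M")] * 5
    + [("CountryB", "R1", "F")] * 40
    + [("CountryB", "R2", "M")] * 35
)


def cell_counts(rows):
    """Count respondents in each (country, gender) cell."""
    return Counter((country, gender) for country, _region, gender in rows)


def flag_small_cells(rows, min_n=30, genders=("F", "M")):
    """Return {country: {gender: n}} for countries where any gender cell has n < min_n.

    min_n=30 is an illustrative assumption, not an established cutoff.
    """
    counts = cell_counts(rows)
    countries = sorted({country for country, _region, _gender in rows})
    flagged = {}
    for country in countries:
        per_gender = {g: counts.get((country, g), 0) for g in genders}
        if min(per_gender.values()) < min_n:
            flagged[country] = per_gender
    return flagged


flagged = flag_small_cells(records)
print(flagged)  # {'CountryA': {'F': 25, 'M': 5}}
```

Countries flagged this way are candidates for collapsing with similar neighbours, or for closer inspection, rather than automatic exclusion.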
--
Rich Ulrich
From: SPSSX(r) Discussion <[hidden email]> on behalf of Chao yawo <[hidden email]>
Sent: Wednesday, May 8, 2019 9:14 PM
To: [hidden email]
Subject: HLM and Sample Size (sub-groups)
Another consideration is whether the gender imbalance across countries or regions is itself endogenous; if it is, issues beyond sample size come into play.

(In reply to Rich Ulrich's message of Thu, May 9, 2019, 12:06 AM.)
I had not heard of that 30-30 rule of thumb, but discovered that Joop Hox mentions it in the second edition (2010) of his book, Multilevel Analysis. Here's what he says on p. 235.

--- start of excerpt ---

It is clear that with increasing sample sizes at all levels, estimates and their standard errors become more accurate. Kreft (1996) suggests a rule of thumb, which she calls the ‘30/30 rule’. To be on the safe side, researchers should strive for a sample of at least 30 groups with at least 30 individuals per group. From the various simulations, this seems sound advice if the interest is mostly in the fixed parameters. For certain applications, one may modify this rule of thumb. Specifically, if there is strong interest in cross-level interactions, the number of groups should be larger, which leads to a 50/20 rule: about 50 groups with about 20 individuals per group. If there is strong interest in the random part, the variance and covariance components and their standard errors, the number of groups should be considerably larger, which leads to a 100/10 rule: about 100 groups with at least about 10 individuals per group.

These rules of thumb take into account that there are costs attached to data collection, so if the number of groups is increased, the number of individuals per group decreases. In some cases, this may not be a realistic reflection of costs. For instance, in school research an extra cost will be incurred when an extra class is included. Testing only part of the class instead of all pupils will usually not make much difference in the data collection cost. Given a limited budget, an optimal design should reflect the various costs of data collection. Snijders and Bosker (1993), Cohen (1998), Raudenbush and Liu (2000), and Moerbeek, van Breukelen, and Berger (2000) all discuss the problem of choosing sample sizes at two levels while considering costs. Moerbeek, van Breukelen, and Berger (2001) discuss the problem of optimal design for multilevel logistic models.

Essentially, optimal design is a question of balancing statistical power against data collection costs. Data collection costs depend on the details of the data collection method (see Groves, 1989). The problem of estimating power in multilevel designs is treated later in this chapter.

--- end of excerpt ---

The two main points I would draw from that are these:

1. You need to think about whether you are most interested in fixed effects, cross-level interactions or random effects.
2. You might want to look up Moerbeek, van Breukelen, and Berger (2001), given that you would appear to be estimating a multilevel logit model. (I infer that from your desire to model predicted odds.)

Moerbeek, M., van Breukelen, G. J. P., & Berger, M. (2001). Optimal experimental design for multilevel logistic models. The Statistician, 50, 17–30.
https://rss.onlinelibrary.wiley.com/doi/pdf/10.1111/1467-9884.00257

Pages 46-48 of the book by Snijders & Bosker (2012, 2nd Ed) also have some material you may find useful. You should be able to see it via Google Books using this (long) link:

https://books.google.ca/books?id=N1BQvcomDdQC&pg=PA46&lpg=PA46&dq=%22These+two+different+interpretations+of+equation%22&source=bl&ots=Ah4sRXxAjK&sig=ACfU3U2sBwiF1oyLjVLt5L6_GWHSwoHbKA&hl=en&sa=X&ved=2ahUKEwilpIul247iAhUnheAKHciRAbUQ6AEwAHoECAAQAQ#v=onepage&q=%22These%20two%20different%20interpretations%20of%20equation%22&f=false

HTH.
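For a quick check of a planned design against the three rules of thumb quoted from Hox (2010), something like the following Python sketch could be used. It is only an illustration: the group sizes are made up, the names `RULES` and `check_rules` are hypothetical, and treating the "about 50 groups with about 20 individuals" wording as strict minimums is a simplifying assumption.

```python
# Rules of thumb as quoted from Hox (2010):
# (minimum number of groups, minimum individuals per group).
# Treating the "about" figures as hard minimums is a simplification.
RULES = {
    "fixed effects (30/30)": (30, 30),
    "cross-level interactions (50/20)": (50, 20),
    "random part (100/10)": (100, 10),
}


def check_rules(group_sizes):
    """Return {rule name: True/False} for a list of per-group sample sizes."""
    n_groups = len(group_sizes)
    smallest = min(group_sizes)
    return {
        name: n_groups >= min_groups and smallest >= min_per_group
        for name, (min_groups, min_per_group) in RULES.items()
    }


# Hypothetical design: 30 countries with between 2 and 1,000 respondents each.
sizes = [2, 30, 1000] + [100] * 27
all_countries = check_rules(sizes)

# Dropping countries with fewer than 30 respondents fixes the group-size
# condition but leaves only 29 groups, so the 30/30 rule is still not met.
kept = [n for n in sizes if n >= 30]
after_dropping = check_rules(kept)

print(all_countries)
print(after_dropping)
```

The point of the second call is the trade-off: excluding small countries raises the minimum group size but lowers the number of groups, which the quoted rules treat as the scarcer resource.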
--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/
"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING:
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).
The 3rd edition of Hox's book (with Moerbeek & Schoot) says a bit more about the effects of group size. The following excerpt is from pp. 215-216. Look for the sentence starting with, "Theall et al. (2011) studied the effects of small group sizes."

--- start of excerpt ---

It is clear that with increasing sample sizes at all levels, estimates and their standard errors become more accurate. Kreft (1996) suggests a rule of thumb, which she calls the ‘30/30 rule.’ To be on the safe side, researchers should strive for a sample of at least 30 groups with at least 30 individuals per group. From various simulations, this seems sound advice if the interest is mostly in the fixed parameters. However, it seems that this rule will likely not yield high power levels for fixed effects at both levels (Bell et al. 2014). For certain applications, one may modify this rule of thumb. Specifically, if there is strong interest in cross-level interactions, the number of groups should be larger, which leads to a 50/20 rule: about 50 groups with about 20 individuals per group. If there is strong interest in the random part, the variance and covariance components and their standard errors, the number of groups should be considerably larger, which leads to a 100/10 rule: about 100 groups with at least about 10 individuals per group.

Theall et al. (2011) studied the effects of small group sizes (i.e. less than five). They found that when the number of groups was as large as 459, the fixed and random effects were not affected by group size. When the number of groups decreased, inflated standard errors of fixed and random effects were found. Group-level variance estimates were more inflated than fixed effects. Raudenbush (2008) also treats the case of many small groups. Many small groups always arise when the object of study is twins, married couples, families, or short time series (Raudenbush, 2008, p. 215). The general advice is to keep the model simple, with few random components at the second level. The exception are short time series, where often relatively much and reliable variation is found at the subject level (Raudenbush, 2008, p. 218).

When the number of groups is smaller than 20, fixed parameter estimates and their standard errors become inaccurate. When the interest is in variance components, as in structural equation modeling (SEM), the minimum number of groups is 50 (Meuleman & Billiet, 2009). Hox, van de Schoot and Mattijsse (2012) show that with Bayesian estimation, SEM with as few as 20 groups is feasible. We refer to McNeish and Stapleton (2016) for a general review of the problems associated with having a small number of groups.

These rules of thumb take into account that there are costs attached to data collection, so if the number of groups is increased, the number of individuals per group decreases. In some cases, this may not be a realistic reflection of costs. For instance, in school research an extra cost will be incurred when an extra class is included. Testing only part of the class instead of all pupils will usually not make much difference in the data collection cost. Given a limited budget, an optimal design should reflect the various costs of data collection. Snijders and Bosker (1993), Cohen (1998), Raudenbush and Liu (2000) and Moerbeek, van Breukelen and Berger (2000) all discuss the problem of choosing sample sizes at two levels while considering costs. Moerbeek, van Breukelen and Berger (2001) discuss the problem of optimal design for multilevel logistic models.

Essentially, optimal design is a question of balancing statistical power against data collection costs. Data collection costs depend on the details of the data collection method. The problem of estimating power in multilevel designs is treated later in this chapter.

--- end of excerpt ---

Hox, J. J., Moerbeek, M., & Van de Schoot, R. (2017). Multilevel analysis: Techniques and applications. Routledge.
--
Bruce Weaver
There are strong analogies between complex samples and HLM at the level of logic and procedure. Analyses using HLM and complex samples often give strongly similar results. I do not know when, or whether, they lead to different substantive conclusions. I would like to see some list members who use one of these approaches also use the other.

That being said, from a complex samples approach, finite population corrections can make a big difference in sample size. It is not possible to have a random sample of 30 from a population of 10. For example, there are only 5 permanent members of the UN Security Council. There are not 30 US states that have sea coasts. There are only 10 Canadian provinces, etc. In DNA, on pair 23, only so many combinations of X, Y, x, and y are possible in a trisomy.
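To make the finite population point concrete, here is a small Python sketch of the usual finite population correction factor for simple random sampling without replacement, sqrt((N - n) / (N - 1)), which multiplies the standard error of a mean or proportion. The function name `fpc` is hypothetical, and applying this single-stage factor to a multilevel survey design is a simplification for illustration only.

```python
import math


def fpc(population_size, sample_size):
    """Finite population correction factor, sqrt((N - n) / (N - 1)).

    Assumes simple random sampling without replacement from a population
    of N units, which is a simplification of a real complex survey design.
    """
    if not 0 < sample_size <= population_size:
        raise ValueError("sample size must be between 1 and the population size")
    if population_size == 1:
        return 0.0
    return math.sqrt((population_size - sample_size) / (population_size - 1))


# Sampling 5 of the 10 Canadian provinces shrinks the standard error noticeably.
half_sample = fpc(10, 5)

# Including all 10 provinces is a census, so the sampling error vanishes.
census = fpc(10, 10)

print(round(half_sample, 3), census)
```

Calling `fpc(10, 30)` raises a `ValueError`, which mirrors the remark that a random sample of 30 cannot be drawn from a population of 10.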
Art Kendall
Social Research Consultants