HLM and Sample Size (sub-groups)

Chao yawo-2
Hello,

I am planning to fit an HLM model predicting the odds of HIV risk behavior among people who are aware of their HIV status, using the demographic surveys. I am considering recent surveys from 30 countries.

I would fit either a two-level model (individuals nested in countries) or a three-level model (individuals nested in regions, which are nested in countries).

Here are my sample sizes:

Level 3: Maximum of 30 countries
Level 2 (regions): 3 to 15 regions per country
Level 1 (individuals, men/women): 2 to 1,000 who are aware of their positive status

Now, my question relates to sample size. I've read some of the literature in this area and am aware of the 30-30 rule (30 groups with 30 observations per group).

Since my focus is on those who are presumably aware of their seropositivity, I will focus on countries that have sufficiently large samples of that focal population. The total number varies from 2 to 1,000, but there are gender differences.

So, should I take the gender differences (i.e., the numbers of men and women) into account when deciding whether to include a country? Country A may have 30 observations in total, but considerably more women (25) than men (5). In that case, should I include the country or drop it? In other words, is the number of observations in the sub-groups of interest important for sample size determination?

Thanks -cY
===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD

Re: HLM and Sample Size (sub-groups)

Rich Ulrich
My experience is with power analysis and experimental design,
rather than with survey design, but you do suggest problems
that I am familiar with.

You say that there are "gender differences" - I would expect
both main effects and interactions, and you have the further
problem in having Ns that may be severely unbalanced by
gender, not always in the same direction.

This suggests to me that I would consider a males-only analysis
and a females-only analysis for main conclusions; and after that,
look very carefully at what can be inferred across regions or
countries for the possible gender differences and interactions,
using all data, if possible.

Before looking at outcomes, I would try to see how much I
might be able to collapse across demographically near and
similar regions (say) in order to boost the smallest Ns. 
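
As a rough illustration of that bookkeeping, here is a minimal Python sketch that counts Ns per country, region and gender, and then recounts after collapsing two regions. Everything in it is made up for illustration: the records, the region names, the `merge_map`, and the cutoff of 10 are assumptions, not values from the surveys discussed here.

```python
from collections import Counter

# Hypothetical records: (country, region, gender). Not real survey data.
records = (
    [("A", "north", "F")] * 20 + [("A", "north", "M")] * 3
    + [("A", "south", "F")] * 5 + [("A", "south", "M")] * 2
    + [("B", "east", "F")] * 40 + [("B", "east", "M")] * 35
)


def cell_counts(rows, merge_map=None):
    """Count respondents per (country, region, gender), optionally merging regions."""
    merge_map = merge_map or {}
    counts = Counter()
    for country, region, gender in rows:
        # Regions listed in merge_map are relabelled; all others keep their name.
        merged = merge_map.get((country, region), region)
        counts[(country, merged, gender)] += 1
    return counts


def small_cells(counts, min_n):
    """Return the cells whose N falls below min_n."""
    return {cell: n for cell, n in counts.items() if n < min_n}


before = cell_counts(records)

# Assumed mapping: treat country A's north and south as one combined region.
merge_map = {("A", "north"): "north+south", ("A", "south"): "north+south"}
after = cell_counts(records, merge_map)

print("Small cells before collapsing:", small_cells(before, min_n=10))
print("Small cells after collapsing: ", small_cells(after, min_n=10))
```

Collapsing raises the smallest Ns but does not remove the imbalance: in this toy example the combined region still has 25 women and only 5 men, which is the situation described in the original question.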

--
Rich Ulrich


From: SPSSX(r) Discussion <[hidden email]> on behalf of Chao yawo <[hidden email]>
Sent: Wednesday, May 8, 2019 9:14 PM
To: [hidden email]
Subject: HLM and Sample Size (sub-groups)
 
> --- snip

Re: HLM and Sample Size (sub-groups)

Jon Peck
Another consideration is whether the gender imbalance across countries or regions is itself endogenous; if it is, there are issues to address beyond sample size.

On Thu, May 9, 2019 at 12:06 AM Rich Ulrich <[hidden email]> wrote:
> --- snip


--
Jon K Peck
[hidden email]


Re: HLM and Sample Size (sub-groups)

Bruce Weaver
Administrator
In reply to this post by Chao yawo-2
I had not heard of that 30-30 rule of thumb, but discovered that Joop Hox
mentions it in the second edition (2010) of his book, Multilevel Analysis.
Here's what he says on p. 235.

--- start of excerpt ---
It is clear that with increasing sample sizes at all levels, estimates and
their standard errors become more accurate. Kreft (1996) suggests a rule of
thumb, which she calls the ‘30/30 rule’. To be on the safe side, researchers
should strive for a sample of at least 30 groups with at least 30
individuals per group. From the various simulations, this seems sound advice
if the interest is mostly in the fixed parameters. For certain applications,
one may modify this rule of thumb. Specifically, if there is strong interest
in cross-level interactions, the number of groups should be larger, which
leads to a 50/20 rule: about 50 groups with about 20 individuals per group.
If there is strong interest
in the random part, the variance and covariance components and their
standard errors, the number of groups should be considerably larger, which
leads to a 100/10 rule: about 100 groups with at least about 10 individuals
per group.

These rules of thumb take into account that there are costs attached to data
collection, so if the number of groups is increased, the number of
individuals per group decreases. In some cases, this may not be a realistic
reflection of costs. For instance, in school research an extra cost will be
incurred when an extra class is included. Testing only part of the class
instead of all pupils will usually not make much difference in the data
collection cost. Given a limited budget, an optimal design should reflect
the various costs of data collection. Snijders and Bosker (1993), Cohen
(1998), Raudenbush and Liu (2000), and Moerbeek, van Breukelen, and Berger
(2000) all discuss the problem of choosing sample sizes at two levels while
considering costs. Moerbeek, van Breukelen, and Berger (2001) discuss the
problem of optimal design for multilevel logistic models. Essentially,
optimal design is a question of balancing statistical power against data
collection costs. Data collection costs depend on the details of the data
collection method (see Groves, 1989). The problem of estimating power in
multilevel designs is treated later in this chapter.
--- end of excerpt ---
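
To make the three rules of thumb concrete, here is a minimal Python sketch that checks a design against them. It treats the "about" and "at least" figures in the excerpt as hard cutoffs, which is a simplification, and the group sizes are hypothetical, not taken from the surveys in question.

```python
# Rules of thumb quoted in the excerpt:
# focus of interest -> (minimum number of groups, minimum individuals per group).
RULES = {
    "fixed effects": (30, 30),
    "cross-level interactions": (50, 20),
    "random part": (100, 10),
}


def check_design(n_groups, group_sizes):
    """Report, for each rule of thumb, whether the design meets it."""
    smallest = min(group_sizes)
    return {
        focus: n_groups >= min_groups and smallest >= min_per_group
        for focus, (min_groups, min_per_group) in RULES.items()
    }


# Hypothetical two-level design: 30 countries, the smallest with 2 respondents.
sizes = [2, 30, 250, 1000] + [100] * 26
print(check_design(len(sizes), sizes))

# Dropping countries with fewer than 30 respondents fixes the group-size side
# of the 30/30 rule but leaves only 29 groups, so the rule is still not met.
kept = [s for s in sizes if s >= 30]
print(len(kept), check_design(len(kept), kept))
```

The second call shows the trade-off in the original question: every country dropped for being too small also reduces the number of groups.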

The two main points I would draw from that are these:

1. You need to think about whether you are most interested in fixed effects,
cross-level interactions or random effects.  

2. You might want to look up Moerbeek, van Breukelen, and Berger (2001),
given that you would appear to be estimating a multilevel logit model.  (I
infer that from your desire to model predicted odds.)  

Moerbeek, M., van Breukelen, G. J. P., & Berger, M. (2001). Optimal
experimental design for multilevel logistic models. The Statistician, 50,
17–30.
https://rss.onlinelibrary.wiley.com/doi/pdf/10.1111/1467-9884.00257

Pages 46-48 of the book by Snijders & Bosker (2012, 2nd Ed) also have some
material you may find useful.  You should be able to see it via Google Books
using this (long) link:

https://books.google.ca/books?id=N1BQvcomDdQC&pg=PA46&lpg=PA46&dq=%22These+two+different+interpretations+of+equation%22&source=bl&ots=Ah4sRXxAjK&sig=ACfU3U2sBwiF1oyLjVLt5L6_GWHSwoHbKA&hl=en&sa=X&ved=2ahUKEwilpIul247iAhUnheAKHciRAbUQ6AEwAHoECAAQAQ#v=onepage&q=%22These%20two%20different%20interpretations%20of%20equation%22&f=false


HTH.



Chao yawo-2 wrote

> --- snip





-----
--
Bruce Weaver
[hidden email]
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

NOTE: My Hotmail account is not monitored regularly.
To send me an e-mail, please use the address shown above.

--
Sent from: http://spssx-discussion.1045642.n5.nabble.com/

NOTE: The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).
Reply | Threaded
Open this post in threaded view
|

Re: HLM and Sample Size (sub-groups)

Bruce Weaver
Administrator
The 3rd edition of Hox's book (with Moerbeek & van de Schoot) says a bit more about
the effects of group size.  The following excerpt is from pp. 215-216.  Look
for the sentence starting with, "Theall et al. (2011) studied the effects of
small group sizes."

--- start of excerpt ---
It is clear that with increasing sample sizes at all levels, estimates and
their standard errors become more accurate. Kreft (1996) suggests a rule of
thumb, which she calls the ‘30/30 rule.’ To be on the safe side, researchers
should strive for a sample of at least 30 groups with at least 30
individuals per group. From various simulations, this seems sound advice if
the interest is mostly in the fixed parameters. However, it seems that this
rule will likely not yield high power levels for fixed effects at both
levels (Bell et al. 2014). For certain applications, one may modify this
rule of thumb. Specifically, if there is strong interest in cross-level
interactions, the number of groups should be larger, which leads to a 50/20
rule: about 50 groups with about 20 individuals per group. If there is
strong interest in the random part, the variance and covariance components
and their standard errors, the number of groups should be considerably
larger, which leads to a 100/10 rule: about 100 groups with at least about
10 individuals per group. Theall et al. (2011) studied the effects of small
group sizes (i.e. less than five). They found that when the number of groups
was as large as 459, the fixed and random effects were not affected by group
size. When the number of groups decreased, inflated standard errors of fixed
and random effects were found. Group-level variance estimates were more
inflated than fixed effects. Raudenbush (2008) also treats the case of many
small groups. Many small groups always arise when the object of study is
twins, married couples, families, or short time series (Raudenbush, 2008, p.
215). The general advice is to keep the model simple, with few random
components at the second level. The exception are short time series, where
often relatively much and reliable variation is found at the subject level
(Raudenbush, 2008, p. 218).

When the number of groups is smaller than 20, fixed parameter estimates and
their standard errors become inaccurate. When the interest is in variance
components, as in structural equation modeling (SEM), the minimum number of
groups is 50 (Meuleman & Billiet, 2009). Hox, van de Schoot and Mattijsse
(2012) show that with Bayesian estimation, SEM with as few as 20 groups is
feasible. We refer to McNeish and Stapleton (2016) for a general review of
the problems associated with having a small number of groups.

These rules of thumb take into account that there are costs attached to data
collection, so if the number of groups is increased, the number of
individuals per group decreases. In some cases, this may not be a realistic
reflection of costs. For instance, in school research an extra cost will be
incurred when an extra class is included. Testing only part of the class
instead of all pupils will usually not make much difference in the data
collection cost. Given a limited budget, an optimal design should reflect
the various costs of data collection. Snijders and Bosker (1993), Cohen
(1998), Raudenbush and Liu (2000) and Moerbeek, van Breukelen and Berger
(2000) all discuss the problem of choosing sample sizes at two levels while
considering costs. Moerbeek, van Breukelen and Berger (2001) discuss the
problem of optimal design for multilevel logistic models. Essentially,
optimal design is a question of balancing statistical power against data
collection costs. Data collection costs depend on the details of the data
collection method. The problem of estimating power in multilevel designs is
treated later in this chapter.
--- end of excerpt ---

Hox, J. J., Moerbeek, M., & Van de Schoot, R. (2017). Multilevel analysis:
Techniques and applications. Routledge.


Bruce Weaver wrote
> I had not heard of that 30-30 rule of thumb, but discovered that Joop Hox
> mentions it in the second edition (2010) of his book, Multilevel Analysis.
> Here's what he says on p. 235.
>
> --- snip






Re: HLM and Sample Size (sub-groups)

Art Kendall
There are strong analogies between complex samples and HLM at the level of
logic and procedure.

HLM and complex-samples analyses often give very similar results.

I do not know when, or whether, they lead to different substantive conclusions.

I would like to see some list members who use one of these approaches also
use the other.

That being said, from a complex-samples perspective, finite population
corrections can make a big difference in sample size.

It is not possible to have a random sample of 30 from a population of 10.
For example, there are only 5 permanent members of the UN Security Council.
There are not 30 US states that have sea coasts. There are only 10 Canadian
provinces, etc. In DNA, on pair 23, it is only possible to have so many
combinations of X, Y, x, and y in a trisomy.
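
For readers who have not met the finite population correction, here is a minimal Python sketch of the usual textbook form for simple random sampling without replacement. The function names are made up for illustration, and it is an assumption that this simple form is the correction meant above; complex-samples software applies it within strata and stages.

```python
import math


def fpc_factor(n, N):
    """Multiplier applied to the standard error of a mean when n units
    are drawn without replacement from a population of N units."""
    if not 0 < n <= N:
        raise ValueError("need 0 < n <= N when sampling without replacement")
    if N == 1:
        return 0.0  # the single unit is the whole population
    return math.sqrt((N - n) / (N - 1))


def adjusted_sample_size(n0, N):
    """Shrink a sample size n0 computed for an infinite population
    to the size needed for the same precision in a population of N."""
    return n0 / (1 + (n0 - 1) / N)


# A nominal requirement of 30 units cannot be drawn from a population of 10,
# and after the correction it is not needed either.
print(round(adjusted_sample_size(30, 10), 2))   # fewer than 10 units
print(fpc_factor(10, 10))                        # a full census has no sampling error
```

With only 10 units in the population, a nominal requirement of 30 shrinks to fewer than 8, and observing all 10 leaves no sampling error at that level at all.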



-----
Art Kendall
Social Research Consultants