SPSSX Discussion

Mixed model for data that are not normally distributed.

Wienta Diarsvitri

Mixed model for data that are not normally distributed.

Hello all,

I have finished my research project. It was a cluster randomized trial. I want to compare the change from pre-test to post-test scores between the intervention group and the control group. I planned to use the SPSS mixed model procedure, but unfortunately my data are not normally distributed. If I use the Friedman test, I cannot control for the pre-test and other covariates.

Do you have any suggestion?

Thank you very much for your help.

Sincerely,

Wienta

Maguin, Eugene

Re: Mixed model for data that are not normally distributed.

Wienta,

Please describe in what way the data are not normally distributed and to
what extent. I am assuming that you do not have dichotomous DVs, but I just
want to ask. Are the DVs Likert scale items? If so, do you want to
analyze them as ordinal variables? Are the DV distributions J-shaped, i.e.,
high and rapidly declining percentages for low values but with a (very) long
tail?

Within SPSS, I think you have no alternative but SPSS MIXED because you have
time nested within persons and persons nested within clusters (assuming I
have understood your design correctly). It may be possible to transform your
data to get better distributions, but you'll need to consider how transformed
results will play in your publishing venue. SPSS has increased its
capacity for categorical data through the GENLIN procedure, but I'm not sure
it can handle a problem such as yours. I may be wrong; perhaps others will
correct me if so.

Gene Maguin

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

statisticsdoc

Re: Mixed model for data that are not normally distributed.

Wienta,
If you need to model binary or ordinal outcomes in multilevel modeling, you may want to look at HLM software.
Best,
Steve Brand
www.StatisticsDoc.com

-----Original Message-----
From: Gene Maguin <[hidden email]>
Date: Mon, 11 Jan 2010 09:38:15
To: <[hidden email]>
Subject: Re: Mixed model for data that are not normally distributed.

Maguin, Eugene

Re: Mixed model for data that are not normally distributed.

In reply to this post by Maguin, Eugene

Wienta,

OK. Negative skew. A power transformation (squaring, cubing, etc.) stretches
the upper end of the distribution and reduces negative skew. I'd suggest that
you try out different transformations as well as the untransformed data. It
may be that the coefficient estimates and standard errors are little affected
by the skew.
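To see the effect Gene describes, here is a minimal Python sketch. Python is used only for illustration, and the scores are hypothetical, not Wienta's data. It computes the moment-based skewness of a negatively skewed set of total scores before and after squaring and cubing. SPSS reports a bias-adjusted skewness, which for a fixed sample size is a constant positive multiple of this one, so the direction of change is the same. Powers preserve the order of the scores only when all scores are non-negative. This assumption holds for summed Likert totals when the items are coded 1 to 5.

```python
import statistics


def skewness(values):
    """Moment-based skewness: mean cubed deviation divided by sd**3."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # population SD, matching the moment formula
    return sum((v - mean) ** 3 for v in values) / n / sd ** 3


# Hypothetical, negatively skewed total scores (most students near the top).
scores = [2, 7, 8, 9, 9, 10, 10, 10]

for label, power in [("raw", 1), ("squared", 2), ("cubed", 3)]:
    transformed = [s ** power for s in scores]
    print(f"{label:8s} skewness = {skewness(transformed):+.3f}")
```

With these hypothetical scores, the skewness moves toward zero as the power increases. In practice, Gene's suggestion is to compare the fitted coefficients and standard errors across transformations rather than to aim for a particular skewness value.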

You have a three-level model. There was a reply today to an earlier message
by a different poster that bears on your planned analysis; you may have
seen it. Sixteen level 3 units (schools) would, I think, be regarded as too
few to give good estimates of coefficients and standard errors.

Gene Maguin

>>My research was carried out among grade 11 students in 16 senior high
schools in 2 provinces in Indonesia. Eight schools are in the intervention
group and the other 8 schools are in the control group. A total of 1079
students participated in my research. I gave the pre-test in January 2009 and
the post-test in March 2009.

Yes, my dependent variables are originally on a Likert scale (knowledge,
attitude and behaviour intent tests). I took the total score of each test and
tested its normality. The data are negatively skewed.

Wienta

Ryan

Re: Mixed model for data that are not normally distributed.

Gene,

You and another poster yesterday raised an interesting point about the required sample size at the higher levels (i.e. communities) with the higher level subjects variable treated as a random effects variable. I have seen references that advise at least 20 or even 30 subjects at each of the higher levels. In my own work, I've gone as low as ~10 higher level units with the understanding that I am below the typically recommended sample size. Determining whether a sample size is appropriate is, I believe, in part study specific. It is certainly possible to obtain reliable estimates at the smaller sample sizes, but one should proceed with caution. I welcome further discussion on this topic.

For the specific analysis in question, I'd consider running a linear mixed model [if assumptions are met], and I'd probably leave out Province at least initially and include a REPEATED or RANDOM statement to account for covariation across both time points, and include a random intercept for School. This, of course, is not the "full" model but I won't go beyond this recommendation until I know more about the data.
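Both the REPEATED and RANDOM approaches Ryan mentions assume the data are in "long" format, with one row per student per time point. In SPSS, this restructuring is usually done with VARSTOCASES. The Python sketch below shows the same reshaping on a few hypothetical records. The field names (student, school, group, pre, post, time, score) are assumptions for illustration.

```python
# Hypothetical wide-format records: one row per student (field names are illustrative).
wide = [
    {"student": 1, "school": "A", "group": "intervention", "pre": 30, "post": 36},
    {"student": 2, "school": "A", "group": "intervention", "pre": 28, "post": 33},
    {"student": 3, "school": "B", "group": "control", "pre": 31, "post": 32},
]


def to_long(rows):
    """Return one row per student per time point (time 0 = pre, 1 = post)."""
    long_rows = []
    for row in rows:
        for time, key in ((0, "pre"), (1, "post")):
            long_rows.append({
                "student": row["student"],
                "school": row["school"],
                "group": row["group"],
                "time": time,
                "score": row[key],
            })
    return long_rows


for r in to_long(wide):
    print(r)
```

In that layout:

* Time, group and their interaction would be fixed effects. The time-by-group interaction carries the intervention effect on change.
* School would get a random intercept.
* The REPEATED statement, with student as the subject, would model the covariation between each student's two scores.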

Ryan

statisticsdoc

Re: Mixed model for data that are not normally distributed.

Wienta,
Sample size in HLM is a complex issue. With 16 schools, your estimates of between-school effects would be biased (the standard errors would be too small), but your interest probably lies in the lower-level effects. Would I be correct in assuming that you are testing the hypothesis that level 1 change in the dependent variable differs between the experimental and comparison schools? For some guidelines and possible solutions, you might want to start with Maas and Hox (2005) in the journal Methodology.
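One rough descriptive check of the hypothesis Steve describes is to compare the average pre-to-post change in intervention schools with that in control schools, treating each school as a single unit. The Python sketch below does this on hypothetical records. It makes no adjustment for the pre-test or other covariates and gives no standard errors. It is therefore only a sanity check on the direction and size of the effect, not a substitute for the mixed model.

```python
from collections import defaultdict
from statistics import fmean

# Hypothetical (school, group, pre, post) records; values are illustrative only.
records = [
    ("A", "intervention", 30, 36), ("A", "intervention", 28, 33),
    ("B", "intervention", 25, 29),
    ("C", "control", 31, 32), ("C", "control", 27, 27),
    ("D", "control", 29, 31),
]


def school_mean_changes(rows):
    """Mean (post - pre) change per school, keyed by (group, school)."""
    changes = defaultdict(list)
    for school, group, pre, post in rows:
        changes[(group, school)].append(post - pre)
    return {key: fmean(vals) for key, vals in changes.items()}


def difference_in_change(rows):
    """Average of school-mean changes per arm: intervention minus control."""
    by_arm = defaultdict(list)
    for (group, _school), mean_change in school_mean_changes(rows).items():
        by_arm[group].append(mean_change)
    return fmean(by_arm["intervention"]) - fmean(by_arm["control"])


print(difference_in_change(records))
```

With these made-up numbers, the intervention schools gain 4.75 points on average and the control schools gain 1.25, a difference of 3.5 points.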
HTH
Steve Brand
www.StatisticsDoc.com

-----Original Message-----
From: rblack <[hidden email]>
Date: Tue, 12 Jan 2010 05:49:42
To: <[hidden email]>
Subject: Re: Mixed model for data that are not normally distributed.

Bruce Weaver

Re: Mixed model for data that are not normally distributed.

In reply to this post by Ryan

--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING:
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).

statisticsdoc

Re: Mixed model for data that are not normally distributed.

Bruce,
Maas and Hox (2005) suggest that 50 level 2 units are needed for accurate estimation of level 2 effects; this recommendation is based on simulation studies. However, they also suggest that good estimates of lower level parameters can be obtained with fewer level 2 units.
Best
Steve
www.StatisticsDoc.com

-----Original Message-----
From: Bruce Weaver <[hidden email]>
Date: Tue, 12 Jan 2010 10:43:18
To: <[hidden email]>
Subject: Re: Mixed model for data that are not normally distributed.

I posted somewhere recently that Snijders & Bosker (1999) use 10 groups as a
dividing line in their rule of thumb for choosing between fixed vs. random
effects. Here is the excerpt:

"In order to choose between regarding the group-dependent intercepts U_0j as
fixed statistical parameters and regarding them as random variables, a rule
of thumb that often works well in educational and social research is the
following. This rule mainly depends on N, the number of groups in the data.
If N is small, say N < 10, then use the analysis of covariance approach: the
problem with viewing the groups as a sample from a population is in this
case, that the data will contain only scanty information about this
population. If N is not small, say N >= 10, while n_j is small or
intermediate, say n_j < 100, then use the random coefficient approach: 10 or
more groups is usually too large a number to be regarded as unique entities.
If the group sizes n_j are large, say n_j >= 100, then it does not matter
much which view we take. However, this rule of thumb should be take [sic]
with a large grain of salt and serves only to give a first hunch, not to
determine the choice between fixed and random effects." (Snijders & Bosker,
1999, p. 44)

Gene commented that 16 schools would be considered too few higher level
units, and I think another poster suggested that anything fewer than 20 or 30
higher level units is too few. So is the advice from Snijders & Bosker out
of step with current thinking in the multilevel community?
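The rule of thumb in the excerpt can be written as a small function. This Python sketch is a paraphrase, not code from Snijders & Bosker. It uses an average group size for n_j, which is an assumption because school sizes vary. The quote does not say which clause wins when N < 10 and n_j >= 100, so the sketch lets the N < 10 clause take precedence (also an assumption).

```python
def snijders_bosker_hunch(n_groups, group_size):
    """Paraphrase of the Snijders & Bosker (1999, p. 44) rule of thumb.

    n_groups   -- N, the number of groups (e.g. schools)
    group_size -- n_j, taken here as an average group size (an assumption)

    Returns "fixed" (analysis of covariance), "random" (random coefficient
    approach) or "either" (the choice matters little).
    """
    if n_groups < 10:
        return "fixed"   # N < 10: scanty information about the population of groups
    if group_size < 100:
        return "random"  # N >= 10 and n_j < 100
    return "either"      # N >= 10 and n_j >= 100


print(snijders_bosker_hunch(16, 1079 / 16))
```

For Wienta's design, 1079 students in 16 schools gives an average of about 67 students per school, so the rule points to the random coefficient approach. As the excerpt itself warns, this is only a first hunch.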

-----
--
Bruce Weaver
[hidden email]
http://sites.google.com/a/lakeheadu.ca/bweaver/
"When all else fails, RTFM."

NOTE: My Hotmail account is not monitored regularly.
To send me an e-mail, please use the address shown above.

Bruce Weaver

Re: Mixed model for data that are not normally distributed.

Thanks Steve. I'll have to take a look at that sometime.
Cheers,
Bruce
