SPSSX Discussion

ANOVA with more than 50 groups

Maia Chankseliani

ANOVA with more than 50 groups

Dear all,
First, let me say hello to the listserv members, as I have recently joined the group.

I am trying to establish if average test scores for students from 70 districts differ significantly. When I run ANOVA on SPSS, I receive a warning that Post hoc tests cannot be performed as I have more than 50 groups. Is there any other option you can suggest?

Thanks!
Maia

Bruce Weaver

Re: ANOVA with more than 50 groups

Administrator

Maia Chankseliani wrote

Dear all,
First, let me say hello to the listserv members, as I have recently joined
the group.

I am trying to establish if average test scores for students from 70
districts differ significantly. When I run ANOVA on SPSS, I receive a
warning that Post hoc tests cannot be performed as I have more than 50
groups. Is there any other option you can suggest?

Thanks!
Maia

Just out of curiosity, what kind of multiple comparison technique were you planning on using? With that many groups, the per contrast alpha for any procedure will be vanishingly small, I should think.

Have you taken a look at using a multilevel model rather than ANOVA? One problem with ANOVA is that the group variable leaves nothing to be explained by higher level explanatory variables (e.g., district level variables). If you're not familiar with multilevel models, I recommend Jos Twisk's "Applied Multilevel Analysis"--I found it very accessible and helpful. You can preview it on Google Books.

--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING:
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).

Maia Chankseliani

Re: ANOVA with more than 50 groups

Thanks, Bruce & Nath.
I am planning to use hierarchical modelling. Before doing that, however, I am interested in observing the relationships between all pairs of outcome-predictor variables. That is why I am using ANOVA, t-tests, correlations and chi-square.

Also, I do not really want to group my districts, as they are different geographic units. Grouping them results in 10 categories (regions), for which I can easily use ANOVA.

Thus, is there no chance of establishing the significance of differences in the mean test scores for 70 districts?

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Bruce Weaver

Re: ANOVA with more than 50 groups

Administrator

Maia Chankseliani wrote

Thanks, Bruce & Nath.
I am planning to use hierarchical modelling. Before doing that, however,
I am interested in observing the relationships between all pairs of
outcome-predictor variables. That is why I am using ANOVA,
t-tests, correlations and chi-square.

Also, I do not really want to group my districts, as they are different
geographic units. Grouping them results in 10 categories (regions), for which I
can easily use ANOVA.

Thus, is there no chance of establishing the significance of differences in
the mean test scores for 70 districts?

C(70,2) = 2415. Carrying out that many contrasts seems ill-advised to me.

Granaas, Michael

Re: ANOVA with more than 50 groups

Maia Chankseliani wrote:
>
> Thus, is there no chance of establishing the significance of differences in the mean test scores for 70 districts?
>

To which Bruce Weaver replied:

>C(70,2) = 2415. Carrying out that many contrasts seems ill-advised to me.

I add:

Bruce is correct, of course. Imagine using pairwise t-tests at alpha = .05: if there are no real differences, you would expect about .05 * 2415 ~ 121 false positives.

A Bonferroni correction for that many t-tests gives an adjusted alpha of about 0.0000207, which would protect against false alarms, but would you actually be able to detect any real differences?

Either way I don't see anything meaningful resulting from such an analysis.
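
For anyone who wants to check the arithmetic, here is a minimal Python sketch of the three numbers quoted above. It assumes 70 groups and a per-test alpha of .05, as in the thread; the variable names are only illustrative.

```python
import math

n_groups = 70   # districts in the original question
alpha = 0.05    # conventional per-test significance level

# Number of distinct pairwise comparisons, C(70, 2).
n_pairs = math.comb(n_groups, 2)

# Expected number of false positives if every null hypothesis is true.
expected_false_pos = alpha * n_pairs

# Per-comparison alpha under a Bonferroni correction.
bonferroni_alpha = alpha / n_pairs

print(n_pairs)                        # 2415
print(round(expected_false_pos, 2))   # 120.75
print(f"{bonferroni_alpha:.7f}")      # 0.0000207
```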

Michael

****************************************************
Michael Granaas [hidden email]
Assoc. Prof. Phone: 605 677 5295
Dept. of Psychology FAX: 605 677 3195
University of South Dakota
414 E. Clark St.
Vermillion, SD 57069
*****************************************************

Steve Simon, P.Mean Consulting

Re: ANOVA with more than 50 groups

In reply to this post by Maia Chankseliani

On 10/4/2010 3:27 AM, Maia Chankseliani wrote:

> I am trying to establish if average test scores for students from 70
> districts differ significantly. When I run ANOVA on SPSS, I receive a
> warning that Post hoc tests cannot be performed as I have more than 50
> groups. Is there any other option you can suggest?

In addition to the many good comments made already, let me suggest that
perhaps comparing each district to the overall mean of all districts
might be an interesting alternative. This is the approach taken by ANOM
(Analysis of Means). I don't think you can do this directly in SPSS, but
perhaps someone will prove me wrong.
--
Steve Simon, Standard Disclaimer
Sign up for The Monthly Mean, the newsletter that
dares to call itself "average" at www.pmean.com/news


Bruce Weaver

Re: ANOVA with more than 50 groups

Administrator

Steve Simon, P.Mean Consulting wrote

On 10/4/2010 3:27 AM, Maia Chankseliani wrote:

> I am trying to establish if average test scores for students from 70
> districts differ significantly. When I run ANOVA on SPSS, I receive a
> warning that Post hoc tests cannot be performed as I have more than 50
> groups. Is there any other option you can suggest?

In addition to the many good comments made already, let me suggest that
perhaps comparing each district to the overall mean of all districts
might be an interesting alternative. This is the approach taken by ANOM
(Analysis of Means). I don't think you can do this directly in SPSS, but
perhaps someone will prove me wrong.

Hi Steve. You can get something pretty close to that by using a "deviation" contrast. Here's the Help file description of it:

DEVIATION. Deviations from the grand mean. This is the default for factors. Each level of the factor except one is compared to the grand mean. One category (by default, the last) must be omitted so that the effects will be independent of one another. To omit a category other than the last, specify the number of the omitted category (which is not necessarily the same as its value) in parentheses after the keyword DEVIATION. For example,

UNIANOVA Y BY B /CONTRAST(B)=DEVIATION(1).

To obtain what you're describing, you could run it a second time, changing which category is omitted.

This still seems like an awful lot of contrasts to me, though. And I wonder if the OP will run into the same limit of 50 that they described earlier.
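
To make the idea concrete, here is a rough Python sketch of what a deviation contrast estimates. It is not SPSS output and not how UNIANOVA is implemented. It assumes a one-way fixed-effects layout, takes the grand mean to be the unweighted mean of the group means, and uses a pooled within-group variance. The function name and the toy data are hypothetical.

```python
import math

def deviation_contrasts(groups):
    """Compare each group mean with the unweighted mean of all group means.

    groups: dict mapping a label to a list of scores (toy data here).
    Returns {label: (estimate, standard_error, t_statistic)}.
    """
    k = len(groups)
    means = {g: sum(v) / len(v) for g, v in groups.items()}
    grand_mean = sum(means.values()) / k

    # Pooled within-group variance (the one-way ANOVA mean square error).
    ss_within = sum((x - means[g]) ** 2 for g, v in groups.items() for x in v)
    df_within = sum(len(v) for v in groups.values()) - k
    mse = ss_within / df_within

    results = {}
    for g in groups:
        # Contrast weights: 1 - 1/k for the focal group, -1/k for the others.
        weights = {h: (1.0 if h == g else 0.0) - 1.0 / k for h in groups}
        estimate = means[g] - grand_mean
        se = math.sqrt(mse * sum(w ** 2 / len(groups[h]) for h, w in weights.items()))
        results[g] = (estimate, se, estimate / se)
    return results

# Hypothetical toy data: three "districts" with three scores each.
toy = {"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]}
for label, (est, se, t) in deviation_contrasts(toy).items():
    print(label, round(est, 3), round(se, 3), round(t, 3))
```

Unlike a single UNIANOVA run, which omits one category, this sketch reports a contrast for every group in one pass. The resulting t statistics would still need a multiplicity adjustment before being interpreted.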

Cheers,
Bruce

Maia Chankseliani

Re: ANOVA with more than 50 groups

Thanks for the emails.
I could not figure out how to do the ANOM or the deviation-contrast procedure in SPSS.
What I tried was ANALYZE - COMPARE MEANS - MEANS. This procedure displayed the mean test scores for all 70 districts. I also ran the ANOVA, which showed that the districts do differ significantly. What I would love to be able to do is a post hoc test, to find out which districts differ significantly from one another and/or from the nationwide mean score. At this stage, however, I only needed to know that the districts do differ. Later, when I include district together with other variables in the hierarchical model, I hope to learn more.

Thanks again.
Best,
Maia


Bruce Weaver

Re: ANOVA with more than 50 groups

Administrator

Maia Chankseliani wrote

Thanks for the emails.
I could not figure out how to do the ANOM or the deviation-contrast procedure
in SPSS.
What I tried was ANALYZE - COMPARE MEANS - MEANS. This procedure displayed
the mean test scores for all 70 districts. I also ran the ANOVA, which
showed that the districts do differ significantly. What I would love to be able
to do is a post hoc test, to find out which districts differ significantly
from one another and/or from the nationwide mean score. At this stage, however, I
only needed to know that the districts do differ. Later, when I include district
together with other variables in the hierarchical model, I hope to learn
more.

Thanks again.
Best,
Maia

You are using the MEANS procedure to perform the ANOVA. It does not include contrasts or other multiple comparison procedures. To perform the DEVIATION contrasts I described, use UNIANOVA. In the menus:

Analyze - General Linear Model - Univariate

The independent variable goes in the Fixed Factor box. Then click on the Contrasts button to open a dialog that lets you select Deviation contrasts. Run it twice: the first time with the last category as the reference category, and the second time with the first category as the reference category.

But...as I mentioned in my earlier post, this still seems like an awful lot of contrasts to me (70 of them). If you do a Bonferroni adjustment, the per contrast alpha is .05/70 = .0007. (I.e., you would only declare a contrast statistically significant if p is less than or equal to .0007.)

I don't remember you saying anything about where your research falls on the exploratory to confirmatory spectrum. If it is purely exploratory, you can be a bit less careful about adjusting the per contrast alpha. E.g., if you set your per contrast alpha at .001, the family-wise alpha would be in the vicinity of .07, which is a bit higher than the usual .05, but not ridiculously so. I doubt there would be any serious objections.
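
The figures above can be checked with a short Python sketch. It treats the 70 contrasts as if they were independent, which deviation contrasts are not, so the family-wise values are approximations. The simpler Bonferroni bound of 70 * .001 = .07 gives essentially the same answer. The function name is only illustrative.

```python
def familywise_alpha(per_contrast_alpha, n_contrasts):
    """Chance of at least one false positive, assuming independent contrasts."""
    return 1 - (1 - per_contrast_alpha) ** n_contrasts

n_contrasts = 70

# Bonferroni per-contrast alpha for a family-wise alpha of .05.
print(round(0.05 / n_contrasts, 4))                                  # 0.0007

# Family-wise alpha if each contrast is tested at .001 instead.
print(round(familywise_alpha(0.001, n_contrasts), 3))                # 0.068

# Family-wise alpha actually achieved by the Bonferroni setting.
print(round(familywise_alpha(0.05 / n_contrasts, n_contrasts), 3))   # 0.049
```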

And finally, it would not surprise me if you run into the same limit you mentioned before--i.e., that contrasts can be done for only 50 levels or fewer.

Gosse, Michelle

Re: ANOVA with more than 50 groups [Sec: UNCLASSIFIED]

In reply to this post by Maia Chankseliani

Hi Maia,

Is there any reason to expect that average test scores across 70 districts won’t differ? With that number, and assuming varying degrees of social disadvantage, language skills, etc, I would find it unusual if there weren’t any differences.

Would you be more interested in trying to find a model of why the test scores differ, rather than looking to see if they do?

Cheers

Michelle


Ryan

Re: ANOVA with more than 50 groups

In reply to this post by Bruce Weaver

Maia,

Why is it so important to test for mean differences between 70 districts? Do you view these districts as a [random] sample from which you would like to make inferences about the larger population of districts? What is/are your dependent variable(s)? What are your predictors? At what level are your variables measured (e.g. student, school, district)? How were these data collected? Do you have repeated measures? What is the approximate distribution of your dependent variable? What are your research questions? What are the sample sizes at each level (e.g., number of students, number of schools, number of districts, etc.)?

These are some of the questions I would need answered before providing a recommendation. Based on what I have read, I am not convinced that a general linear model (e.g. ANOVA) is the optimal approach.

Ryan


Dale Glaser

Error bar for profile plot from mixed ANOVA?

In reply to this post by Bruce Weaver

I have what I thought would be a simple graphic, but am having trouble finding a solution. I have a 2 (group) x 7 (time) mixed ANOVA, and I get the graph I want via the profile plot, but I want to add an error bar for each of the group means. Apparently that is not possible within the profile plot graphic. I tried other options (line plot, Chart Builder, etc.) but couldn't find a suitable solution. Any suggestions would be most appreciated.

Thank you.

Dale Glaser, Ph.D.
Principal--Glaser Consulting
Lecturer/Adjunct Faculty--SDSU/USD/Alliant
Past-President, San Diego Chapter of
American Statistical Association
3115 4th Avenue
San Diego, CA 92103
phone: 619-220-0602
fax: 619-220-0412
email: [hidden email]
website: www.glaserconsult.com

Bruce Weaver

Re: Error bar for profile plot from mixed ANOVA?

Administrator

Dale Glaser wrote

I have what I thought would be a simple graphic, but am having trouble finding a solution. I have a 2 (group) x 7 (time) mixed ANOVA, and I get the graph I want via the profile plot, but I want to add an error bar for each of the group means. Apparently that is not possible within the profile plot graphic. I tried other options (line plot, Chart Builder, etc.) but couldn't find a suitable solution. Any suggestions would be most appreciated.

Thank you.

Hi Dale. I think that what I have done in the past is use the information in the table of estimated marginal means (i.e., the means and confidence intervals) to create a clustered hi-lo plot. You may need to add lines via the chart editor. Or, if you use the new-fangled GGRAPH methods, you can probably do it via syntax. ViAnn or someone else better versed in GGRAPH than me may jump in with advice on that.