Equal variances not assumed in t tests

Equal variances not assumed in t tests

Allan Lundy, PhD

Dear Listers,
Here's an issue that has had me wondering for some time.  In t test output, you are given a choice of using the t and significance either for the case when variances are equal or for the case when they are unequal, and often these differ greatly.  The decision as to which to use is based (I have always assumed) on the F test of equal variances in the columns to the left.  But even more than most uses of p < .05 as the critical value, this seems to me rather dubious.

I just had an excellent example of the issue: comparing two groups of N = 195 and 27, Levene's test was barely significant at p = .045.  The two t values showed p's of .015 and .001 for equal variances assumed and not assumed, respectively.  This is a pretty large difference (and I have seen many that were significant on one line and not on the other), based on a barely significant difference in variances.

Has anyone written about this issue?  It seems to me that an ideal solution would be to adjust the t test p value depending on the degree of inequality of variances, but that is probably asking too much.  Still, considering the millions of t tests that must be run every day, it seems like it would be a pretty important issue.
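
A minimal sketch in Python/SciPy of the two lines in question, using simulated data with the same group sizes as this example (the numbers themselves are made up, so the exact statistics are only illustrative):

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Simulated stand-in for the two groups (N = 195 and N = 27); the smaller
# group is given the smaller spread, as in the example above.
big = rng.normal(loc=10.0, scale=4.0, size=195)
small = rng.normal(loc=12.0, scale=2.0, size=27)

# Levene's test for equality of variances (SPSS centres on the group means).
lev_stat, lev_p = stats.levene(big, small, center='mean')

# The two lines of the independent-samples t-test table:
t_pool, p_pool = stats.ttest_ind(big, small, equal_var=True)     # equal variances assumed
t_welch, p_welch = stats.ttest_ind(big, small, equal_var=False)  # not assumed (Welch)

print(f"Levene p = {lev_p:.3f}")
print(f"pooled t = {t_pool:.2f}, p = {p_pool:.4f}")
print(f"Welch  t = {t_welch:.2f}, p = {p_welch:.4f}")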

Regards to all,
Allan

Allan Lundy, PhD
Research Consulting
[hidden email]

Business & Cell (any time): 215-820-8100
NEW Address:
587 Shotgun Spring Rd, New Market, VA 22844
Visit my Web site at www.dissertationconsulting.net

Re: Equal variances not assumed in t tests

Rich Ulrich
When I looked into this some years ago, frankly, I was a bit surprised
by what I found.

It has been mentioned a number of times.  The uniform opinion of
experts writing on the issue seems to be that this obvious option
is an unacceptable general strategy.  What is acceptable is
somewhat diverse -- reflecting, I think, the usual circumstances that
are apt to arise in different areas of data collection.

The important thing to note is that while the two-group F-test is highly
robust, in the sense that a 5% test generally rejects about 5% of the time,
the one-tailed t-test with unequal Ns is not especially robust against
heterogeneity of variance.  And this is true for both the Student test and
those with the Satterthwaite (SW) correction.  This 1-tail non-robustness is what comes
into play in deciding which test to use, since they have symmetrically-
opposite biases.  [Do you want to maximize "power" for your test?  If
the small group has a big variance, bury its influence by pooling; if
the small group has a tiny variance, capitalize on its influence by using
the separate-variance test.  It is easy to design examples where a
"5% difference" is a 1% result on one test and 10%  on the other.]

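To make the bracketed point concrete, here is a rough simulation sketch (mine, not from the post): under a true null, with Ns like those in the original question and the larger variance in the small group, the pooled test rejects far more often than its nominal 5%, while the separate-variance (Welch) test stays close to it.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_big, n_small, reps = 195, 27, 10000
rej_pool = rej_welch = 0

for _ in range(reps):
    # True null: equal means, but the small group has the larger variance.
    big = rng.normal(0.0, 1.0, n_big)
    small = rng.normal(0.0, 3.0, n_small)
    _, p_pool = stats.ttest_ind(big, small, equal_var=True)
    _, p_welch = stats.ttest_ind(big, small, equal_var=False)
    rej_pool += p_pool < 0.05
    rej_welch += p_welch < 0.05

# Pooling "buries" the small group's big variance, so the pooled test is
# liberal here; reversing the variances would make it conservative instead.
print(f"pooled rejection rate: {rej_pool / reps:.3f}")
print(f"Welch  rejection rate: {rej_welch / reps:.3f}")
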
If you have good reason to expect that the variances should be equal, then
using the pooled test is justified.  Is it worth noting, as a *result* of the
data collection or experiment, that *these* variances seem to differ?
Further -- If you expect the variances to differ ... perhaps you should ask
if you are really interested in the *mean*, and not in some difference at
one extreme or the other.

A couple of authors recommended using the unequal-variance t-test all
the time, because it is somewhat more robust than Student's against the
"usual" variations in variance.  That advice, I think, is not especially good
for the social sciences, where we have (a) data that deserves to be
transformed, or (b) scores rated as a dichotomy or on a scale with only a
few points.  I have kept this advice in mind, though, for the occasions
when I use up my alternatives.

The first thing, of course, is to make sure that the data are "good" - no
invalid scores, no unexplained outliers.  What do I do then?

1) Transform instead.
In my social science experience, I have noticed that many scores with
unequal variance are scores that are arguably in need of transformation:
that is, it is very *natural* to take the square root of some counts, or the
log of blood levels.  Where the variances increase with the means and the
distributions belong to the same family, a rank-order transformation  (i.e.,
using the Mann-Whitney-Wilcoxon (MWW) "non-parametric test") would also be valid -- but where
a transformation is natural and evident, I would do the test after the transform.
This avoids the question by removing the unequal variances.
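
A sketch of what "test after the transform" can look like in practice, using made-up right-skewed data of the sort described above (the log-normal groups and sample sizes are my own assumptions, not anything from the post):

import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Illustrative skewed "blood level" style scores where the spread grows
# with the mean (log-normal groups; invented numbers).
ctrl = rng.lognormal(mean=1.0, sigma=0.5, size=60)
treat = rng.lognormal(mean=1.4, sigma=0.5, size=60)

# Test after the natural transformation (log), rather than switching tests.
t, p = stats.ttest_ind(np.log(ctrl), np.log(treat), equal_var=True)
print(f"t test on log scale: t = {t:.2f}, p = {p:.4f}")

# The rank-based alternative mentioned above (MWW / Mann-Whitney U).
u, p_mww = stats.mannwhitneyu(ctrl, treat, alternative='two-sided')
print(f"Mann-Whitney U: U = {u:.1f}, p = {p_mww:.4f}")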

2) When to avoid assuming unequal variances.
Another sort of score that gives unequal variances on occasion is low response
rates for scales with just a few points.  It turns out, from a simulation that I did,
that the pooled variance t-test better estimates the rejection area than does
the MWW.  I don't remember reading any source that addressed this point.
The pooled test even does reasonably well on testing dichotomies, though most
people would avoid reporting those results.
- The large fraction of ties makes the MWW inaccurate for scales with few points.
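
This is not the original simulation, only a rough re-creation of the kind of check described above: a skewed few-point scale, a true null, and a comparison of how close each test's rejection rate comes to the nominal 5%. The scale values, category probabilities, and group sizes are invented for illustration.

import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
reps, n1, n2 = 10000, 40, 15
# A 3-point scale with most responses piled on the lowest category (many ties).
values, probs = np.array([0, 1, 2]), np.array([0.70, 0.20, 0.10])
rej_t = rej_mww = 0

for _ in range(reps):
    a = rng.choice(values, size=n1, p=probs)  # true null: both groups share
    b = rng.choice(values, size=n2, p=probs)  # the same distribution
    _, p_t = stats.ttest_ind(a, b, equal_var=True)
    _, p_m = stats.mannwhitneyu(a, b, alternative='two-sided')
    rej_t += p_t < 0.05
    rej_mww += p_m < 0.05

# Compare each rate with the nominal 0.05 to see which test is better calibrated here.
print(f"pooled t rejection rate: {rej_t / reps:.3f}")
print(f"MWW rejection rate:      {rej_mww / reps:.3f}")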

For your data:  You have smaller variance in the small group.  The Levene test
gives you a warning.  Does a transformation seem justified?  If so, do it.
If not - Do you have an explanation for the variance being small?  Typically,
I think I might have trouble justifying to a random critic why I seem to have
cherry-picked that result, if I can't give a reason.  And critics don't like the
Levene test as a reason.

--
Rich Ulrich

Re: Equal variances not assumed in t tests

Kornbrot, Diana
In reply to this post by Allan Lundy, PhD
The F-test for equality of variances is only a guide, and has much lower power than the t-tests themselves.
In my view, if the variances are different one should go with UNEQUAL variances, which typically gives a higher p-value.

The equal-variances option uses a weighted variance estimate and then assumes total N/2 to calculate the SE. This is optimistic if one group has a much smaller N - your case.

Going with unequal variances is conservative - less likely to reach significance, but it is honest.
Also look at effect sizes. If the effect size is small, then the study is inconclusive; a non-significant result does not mean much, as power is low. If the effect size is large but not significant, then the study is underpowered and needs to be repeated with a larger N in order to reach any conclusion.
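
To make the pooled-versus-separate distinction concrete, here is a small helper using the standard textbook formulas (not SPSS source code; the function name is mine). It shows what each line of the output does to the standard error, plus a simple standardized effect size:

import numpy as np

def two_sample_summaries(x, y):
    """Pooled and separate-variance SEs, Welch-Satterthwaite df, and Cohen's d."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n1, n2 = len(x), len(y)
    v1, v2 = x.var(ddof=1), y.var(ddof=1)

    # "Equal variances assumed": one weighted (pooled) variance for both groups,
    # so the large group's variance dominates.
    sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    se_pooled = np.sqrt(sp2 * (1 / n1 + 1 / n2))

    # "Equal variances not assumed" (Welch): each group keeps its own variance,
    # with the Welch-Satterthwaite approximation for the degrees of freedom.
    se_welch = np.sqrt(v1 / n1 + v2 / n2)
    df_welch = (v1 / n1 + v2 / n2) ** 2 / (
        (v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))

    # A simple standardized effect size (Cohen's d with the pooled SD).
    d = (x.mean() - y.mean()) / np.sqrt(sp2)
    return se_pooled, se_welch, df_welch, d

Running this on the two samples behind the output in question would show the pooled SE leaning on the large group's variance, while the Welch SE lets the small group's own variance count in full.
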
best

Diana

_______________
Professor Diana Kornbrot
University of Hertfordshire
College Lane, Hatfield, Hertfordshire AL10 9AB, UK
+44 (0) 208 444 2081
+44 (0) 7403 18 16 12
+44 (0) 170 728 4626 
skype: kornbrotme
_______________________________




