Shapiro-Wilks Statistic


Re: Shapiro-Wilks Statistic

Kornbrot, Diana
Good question
First determine your question(s)
If you wish to compare means using ANOVA or a t-test, then non-normality matters.
First, EXPLORE the data in all groups, with a normal distribution superimposed.
If the data have a long tail, log-transform all the data and try again.
Or try a Box-Cox transformation.

Once you have a good transformation, convert all between-subjects predictors to a single factor B with K1*K2*K3… levels.
Then do a one-way ANOVA that allows heterogeneous variances (Welch).
Heterogeneity of variance is usually a MUCH bigger problem than non-normality.
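A minimal Python sketch of the recipe above, as a stand-in for SPSS's ONEWAY with the Welch option: collapse all between-subjects factors into one combined factor B, then compute Welch's one-way ANOVA, which does not assume equal variances. The helper names `combine_factors` and `welch_anova` are hypothetical, and p-values (from an F distribution with df1 and df2) are left out to keep the sketch to the standard library.

```python
from collections import defaultdict
from statistics import mean, variance

def combine_factors(rows):
    """Collapse all between-subjects factors into one factor B.

    rows: iterable of (levels, y), where levels is a tuple such as
    (drug, sex). Each distinct combination becomes one cell of B.
    """
    cells = defaultdict(list)
    for levels, y in rows:
        cells[tuple(levels)].append(y)
    return dict(cells)

def welch_anova(groups):
    """Welch's one-way ANOVA without assuming equal variances.

    groups: list of samples, each with n >= 2 and nonzero variance.
    Returns (F, df1, df2); compare F with an F(df1, df2) distribution.
    """
    k = len(groups)
    n = [len(g) for g in groups]
    m = [mean(g) for g in groups]
    w = [ni / variance(g) for ni, g in zip(n, groups)]  # weights n_i / s_i^2
    total_w = sum(w)
    grand = sum(wi * mi for wi, mi in zip(w, m)) / total_w  # weighted grand mean
    between = sum(wi * (mi - grand) ** 2 for wi, mi in zip(w, m)) / (k - 1)
    lam = sum((1 - wi / total_w) ** 2 / (ni - 1) for wi, ni in zip(w, n))
    correction = 1 + 2 * (k - 2) * lam / (k * k - 1)
    return between / correction, k - 1, (k * k - 1) / (3 * lam)
```

Typical use would be `welch_anova(list(combine_factors(rows).values()))`, applying any transformation (for example `math.log`) to y before building the cells. With two groups, Welch's F equals the square of the "equal variances not assumed" t statistic, which makes a handy sanity check.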

If you have repeated measures, I would recommend a generalized linear model. It's a nuisance, because you will have to put the data in long form, but a generaliZed linear model allows trying a variety of distributions and does NOT insist on homogeneity of variance.

Often one is interested not just in the mean but in the whole distribution; EXPLORE is really useful for this.
Classic example: a door designer who wants no bashed heads cares about the 95th percentile of the tallest group's height distribution, not their mean height. Joking aside, in many medical applications one wants to know how many people in each group are within ‘healthy’ limits, and the mean may be deceptive.
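One hedged way to look past the mean in code: report an upper percentile and the proportion of a group that falls inside a range. The function name and the "healthy" limits are hypothetical parameters, and `statistics.quantiles` uses its default "exclusive" interpolation method, so its percentile may differ slightly from other software's definition.

```python
from statistics import mean, quantiles

def describe_beyond_mean(values, low, high):
    """Summaries that look past the mean.

    Returns the mean, the 95th percentile, and the proportion of
    values inside the closed interval [low, high].
    """
    p95 = quantiles(values, n=20)[-1]  # last of 19 cut points = 95th percentile
    inside = sum(low <= v <= high for v in values) / len(values)
    return mean(values), p95, inside
```

Running it per group shows whether groups with similar means differ in their upper tail or in how many members sit within the limits.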
best
Diana


On 2 Oct 2020, at 13:27, 3J LEMA <[hidden email]> wrote:

Hi, Bruce.

Mine was just an exercise. 

If we don't bother to test normality, why did book authors bother to discuss normality and parametric tests? -:) Moreover, why did SPSS bother to include procedures for testing normality? -:)

-:)

3J

On Thu, 1 Oct 2020 at 20:26, Bruce Weaver <[hidden email]> wrote:
I'll repeat the question I asked earlier:

> Why do you [Harley & 3J LEMA] want to test for normality?

If it is to justify use of a parametric test, my advice would be, "Don't
bother!"   ;-) 



3J LEMA wrote
> Any tips on when to use the Shapiro-Wilk statistic over the
> Kolmogorov-Smirnov Statistic in testing normality?
>
> Thank you.





-----
--
Bruce Weaver
[hidden email]
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

NOTE: My Hotmail account is not monitored regularly.
To send me an e-mail, please use the address shown above.

--
Sent from: http://spssx-discussion.1045642.n5.nabble.com/

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

____________
University of Hertfordshire
College Lane, Hatfield, Hertfordshire AL10 9AB, UK
+44 (0) 208 444 2081
+44 (0) 7403 18 16 12
[hidden email]
http://dianakornbrot.wordpress.com/
skype:  kornbrotme
Save our in-boxes! http://emailcharter.org
 __________________








Re: Shapiro-Wilks Statistic

Bruce Weaver
Administrator
I believe there is much confusion about what it is that has to be
approximately normally distributed when one uses a parametric test or
procedure.  First, I say *approximately*, because as George Box noted in his
1976 article, there is no such thing as a normal distribution in the real
world.

"…the statistician knows…that in nature there never was a normal
distribution, there never was a straight line, yet with normal and linear
assumptions, known to be false, he can often derive results which match, to
a useful approximation, those found in the real world." (JASA, 1976, Vol.
71, 791-799)
http://mkweb.bcgsc.ca/pointsofsignificance/img/Boxonmaths.pdf

Second, when discussing normality and OLS models, good textbooks say that it
is the error distribution, not the DV distribution, that is assumed to be
normal.  

Unfortunately, most of the books I've seen fail to clarify that normality of
the errors is a *sufficient* condition but not a necessary condition.  The
necessary condition is that the sampling distributions of the parameter
estimates are approximately normal.  And the shape of those sampling
distributions is determined by both the shape of the error distribution and
the sample size.  Given a large enough n, the sampling distributions of the
parameter estimates can be approximately normal even if the error
distribution is not.  
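A small simulation can make the sample-size point concrete. The sketch below assumes exponential errors (an arbitrary choice of a strongly skewed distribution, skewness 2) and measures the skewness of the sampling distribution of the mean. For this particular case theory gives 2/sqrt(n), so it should shrink toward 0, the normal value, as n grows, even though the error distribution itself never changes. Function names are illustrative.

```python
import random
from statistics import fmean

def skewness(xs):
    """Moment skewness m3 / m2**1.5 (0 for a symmetric distribution)."""
    m = fmean(xs)
    m2 = fmean([(x - m) ** 2 for x in xs])
    m3 = fmean([(x - m) ** 3 for x in xs])
    return m3 / m2 ** 1.5

def simulate_means(n, reps=10_000, seed=1):
    """Sampling distribution of the mean of n draws from an exponential
    'error' distribution, which is strongly right-skewed."""
    rng = random.Random(seed)
    return [fmean([rng.expovariate(1.0) for _ in range(n)]) for _ in range(reps)]
```

For example, `skewness(simulate_means(n))` for n = 2, 10 and 50 should land near 1.41, 0.63 and 0.28, give or take simulation noise.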

These points are addressed in some slides I cobbled together to summarize
Jeff Wooldridge's nice discussion of the assumptions for OLS regression.
See especially slides 7 and 9-11.  

https://sites.google.com/a/lakeheadu.ca/bweaver/Home/statistics/files/OLS_regression_assumptions_Wooldridge.pdf

Finally, when I teach about t-tests, I emphasize the common format that all
of them share:

   t = (statistic - parameter|H0) / SE_statistic

Single-sample t-test:  
  statistic = Xbar
  parameter = mu|H0
  SE = SD/SQRT(n)

Unpaired t-test (equal variances assumed):
  statistic = Xbar1-Xbar2
  parameter = mu1-mu2 | H0 {this is often 0, but not necessarily so!}
  SE = SQRT((Pooled Var / n1) + (Pooled Var / n2))
 
Etc.
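The common format translates almost line for line into code. In this minimal Python sketch (standard library only, p-values omitted), each t-test is the same `t_stat` call with a different statistic, null value, and SE; the function names are illustrative, not from any package.

```python
from math import sqrt
from statistics import mean, stdev, variance

def t_stat(statistic, parameter_h0, se):
    """Shared form: t = (statistic - parameter|H0) / SE_statistic."""
    return (statistic - parameter_h0) / se

def one_sample_t(x, mu0):
    se = stdev(x) / sqrt(len(x))                       # SD / SQRT(n)
    return t_stat(mean(x), mu0, se)

def pooled_t(x1, x2, diff_h0=0.0):
    """Unpaired t-test, equal variances assumed; H0 difference need not be 0."""
    n1, n2 = len(x1), len(x2)
    pooled_var = ((n1 - 1) * variance(x1) + (n2 - 1) * variance(x2)) / (n1 + n2 - 2)
    se = sqrt(pooled_var / n1 + pooled_var / n2)
    return t_stat(mean(x1) - mean(x2), diff_h0, se)
```

For example, `one_sample_t([1, 2, 3, 4, 5], 2)` gives sqrt(2), about 1.414, and `pooled_t([1, 2, 3], [4, 5, 6])` gives about -3.674.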

I also stress that the test will be reasonably valid if the *sampling
distribution of the statistic* in the numerator is approximately normal. But
given that t-tests are special cases of OLS regression, this is just the
same thing that Wooldridge and Vittinghoff say about normality of sampling
distributions of parameter estimates for OLS models.  

Cheers,
Bruce

PS- I did not talk about testing for normality prior to using a parametric
test/procedure.  My thoughts on that are summarized in a short conference
presentation I gave in 2011.  You can see it here:

https://www.researchgate.net/publication/299497976_Silly_or_Pointless_Things_People_Do_When_Analyzing_Data_1_Testing_for_Normality_as_a_Precursor_to_a_t-test

Apologies to all who are too young to recognize classic Hanna-Barbera
cartoon characters.  (And to those who do remember the cartoon characters,
apologies for misspelling Baba Looey.  I should have looked it up!)  

;-)  


--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING: 
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).

Re: Shapiro-Wilks Statistic

bdates
Bruce,

It reminds me of George Box's famous comment, "All models are wrong; but some are useful." So the Gaussian model isn't always precise, but it is immensely useful. Thanks for going to all the trouble on this.

Brian

From: SPSSX(r) Discussion <[hidden email]> on behalf of Bruce Weaver <[hidden email]>
Sent: Friday, October 2, 2020 10:27 AM
To: [hidden email] <[hidden email]>
Subject: Re: Shapiro-Wilks Statistic
 

Re: Shapiro-Wilks Statistic

Bruce Weaver
Administrator
Indeed, Brian.  I use that quote frequently in my lectures too.  

As an example, I use it after giving the following rant describing what I
*wish* textbooks said regarding the key assumptions for the unpaired t-test.

<rant>
If one were able to sample from two perfectly normal populations that had
perfectly equal variances, and if every observation was perfectly
independent of all other observations, then the t-test (as we typically
compute it) would be an exact test.  However, when working with real-world
data, we can never sample from two populations that are perfectly normal and
have exactly equal variances.  Therefore, the t-test is really an
approximate test.  And therefore, there is no point in worrying about (or
testing for) exact normality and exact homogeneity of variance—we know we
don’t have them.  Rather, the question should be whether the approximation
is good enough to be "useful".  And that brings us back to the CLT, and the
trade-off between population shape and sample size in determining the shape
of the sampling distribution(s).  
</rant>

I then say that we must not despair just because the t-test (when done using
real-world data) is not an exact test.  I remind students that the Chi-square
tests they learned in their first course are also approximate tests.  And I
then quote Box, and suggest that even if they are only approximate, t-tests
(and Chi-square tests) can still be useful.

Regarding variance homogeneity, I like the rule of thumb Dave Howell (and
others) have suggested:  If the sample sizes are equal, t-tests (and ANOVA)
are very robust to heterogeneity of variance, provided the ratio of the largest
to the smallest variance is no greater than 4 or 5.  But Howell also said that
"heterogeneity of variance and unequal sample sizes do not mix."  So if n1
and n2 differ by more than a trivial amount, I'd likely use the "equal
variances not assumed" (Welch) t-test.  (And for one-way ANOVA, I'd use the
WELCH or BROWNFORSYTHE option for the ONEWAY command.)
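A minimal sketch of that rule of thumb in Python. `pooled_looks_ok` is a hypothetical helper: it reads "more than a trivial amount" strictly as any difference in n and uses 4 as the variance-ratio cutoff, both assumptions you may want to relax. `welch_t` computes the "equal variances not assumed" t with Welch-Satterthwaite degrees of freedom (no p-value, to stay within the standard library).

```python
from math import sqrt
from statistics import mean, variance

def variance_ratio(*groups):
    """Largest sample variance divided by the smallest."""
    vs = [variance(g) for g in groups]
    return max(vs) / min(vs)

def pooled_looks_ok(x1, x2, max_ratio=4.0):
    """Hypothetical encoding of the rule of thumb: equal n and a
    largest/smallest variance ratio no greater than max_ratio."""
    return len(x1) == len(x2) and variance_ratio(x1, x2) <= max_ratio

def welch_t(x1, x2):
    """'Equal variances not assumed' t and its Welch-Satterthwaite df."""
    v1, v2 = variance(x1) / len(x1), variance(x2) / len(x2)
    t = (mean(x1) - mean(x2)) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (len(x1) - 1) + v2 ** 2 / (len(x2) - 1))
    return t, df
```

When n1 = n2 and the sample variances are equal, Welch's t matches the pooled t and its df equals n1 + n2 - 2, so the two versions only diverge when they should.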

Cheers,
Bruce




Re: Shapiro-Wilks Statistic

Anthony Babinec
Another point:
Generate a random normal variable using SPSS Statistics.
Then discretize it into an ordinal variable by binning its values.
A discretized variable is not normally distributed. This is
true whether the category counts form a U-shape or an
inverted U-shape (the shape you get by discretizing the
usual bell-shaped curve). Why test for normality when the
variable in question is known not to be normal?
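The same experiment, sketched in Python rather than SPSS: draw standard normal values, bin them at (arbitrarily chosen) cutpoints, and count the categories. The counts form the inverted U, yet the variable can take only five values, so it cannot be normally distributed whatever its shape. `discretize` and the cutpoints are illustrative assumptions.

```python
import random
from bisect import bisect_right
from collections import Counter

def discretize(values, cutpoints):
    """Bin continuous values into ordinal categories 1..len(cutpoints)+1."""
    return [bisect_right(cutpoints, v) + 1 for v in values]

rng = random.Random(2020)
z = [rng.gauss(0.0, 1.0) for _ in range(10_000)]
ordinal = discretize(z, [-1.5, -0.5, 0.5, 1.5])
counts = Counter(ordinal)  # inverted-U counts, but only five possible values
```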

Anthony Babinec


Re: Shapiro-Wilks Statistic

Rich Ulrich
In reply to this post by 3J LEMA
I will address "Why do textbooks discuss normality ...
nonparametric tests ... Why does SPSS include procedures?"

The last is easy:  SPSS includes what the users demand.

Why the demand?  Psychology, for some reason, seems to
attract people who do not do well in math ... and statistical
reasoning.  Numerical research in /various/ areas only got started
in a big way after WW II.  Fisher's Statistical Methods for Research
Workers appeared in 1925, the 1930s saw much tedious test development,
and the war years gave some real-life experience with useful analyses
using probability and statistics.

Especially in the 1950s, researchers in psychology and education
discovered "nonparametric testing" and were thrilled at the
chance (as they saw it) to ignore all that complicated stuff about
"assumptions."  They thought they had found a magic bullet
that would slay all demons.  They and their students continued to
have an excessive affection for teaching and using non-parametric methods
for years, despite healthy criticism: mean scores tell us more than
mean ranks do if we know the scaling.

WJ Conover in the 1980s unified parametric and the most popular
non-parametric procedures by considering the "rank-transformation"
as a transformation that can be analyzed by ordinary ANOVA. He showed
that sometimes the ANOVA gives /exactly/ the same test. Sometimes it
gives a /better/ test, as when the "ranked" cases fail to meet the
assumption of "no ties" and the user is referred to some approximate
test with a crude estimate of the variance. That's one example, IIRC.
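A minimal sketch of the rank transformation for two groups: replace every observation by its rank in the combined sample (tied values get the average of their ranks), then run the ordinary pooled-variance t-test on those ranks. The function names are hypothetical, and the sketch shows only the mechanics, not Conover's results about when the ranked analysis matches or improves on the classical nonparametric test.

```python
from math import sqrt
from statistics import mean, variance

def midranks(values):
    """Ranks 1..N of the values, averaging ranks over ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def rank_transform_t(x1, x2):
    """Rank the combined sample, then do the pooled t-test on the ranks."""
    r = midranks(list(x1) + list(x2))
    r1, r2 = r[:len(x1)], r[len(x1):]
    n1, n2 = len(r1), len(r2)
    sp2 = ((n1 - 1) * variance(r1) + (n2 - 1) * variance(r2)) / (n1 + n2 - 2)
    return (mean(r1) - mean(r2)) / sqrt(sp2 / n1 + sp2 / n2)
```

Note that ties are handled by midranks, so no separate "no ties" approximation is needed.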

TRANSFORMATIONS other than by RANK.
Conover's article is part of what impressed me with "equal interval"
as the key to deciding on transformation.  Yes, if the "mean" is
meaningful, that is a very strong clue that you have equal intervals. 
But you can consider that across the whole range (observed or
potential).  In my own experience, "unequal variances" very often
indicated (NOT the need for Satterthwaite correction, but... ) the
need for transformation because intervals are unequal:  the unit
of time between 1 and 2 seconds often is important, whereas the time
between 61 and 62 seconds is rarely a "unit" equal to that.
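To illustrate the unequal-intervals point with made-up numbers: if the slower group's response times are the faster group's times multiplied by 10, the raw variances differ by a factor of 100, but on a log scale (where a doubling counts the same wherever it happens) the variances are identical. The data are hypothetical and chosen to make the effect exact; real data will only approximate it.

```python
from math import log
from statistics import variance

def variance_ratio(a, b):
    """Larger sample variance divided by the smaller."""
    va, vb = variance(a), variance(b)
    return max(va, vb) / min(va, vb)

# Hypothetical response times in seconds: "slow" is "fast" scaled by 10,
# so its SD is 10 times larger on the raw scale.
fast = [1.0, 2.0, 4.0]
slow = [10.0, 20.0, 40.0]

raw_ratio = variance_ratio(fast, slow)
log_ratio = variance_ratio([log(t) for t in fast], [log(t) for t in slow])
```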

Occasionally, the rank-transformation improves the "interval-ness"
more than any simple power transformation, but, "given my druthers,"
I would rather use another transformation. 

But non-par tests are nice to have available. The rank-test or rank-
correlation can provide useful, obvious, backup information for the
skeptical reader.

Rich Ulrich



