
Re: 2 Variables, 7 cases, 10 observations -- Simple?

Posted by Bruce Weaver on Mar 12, 2018; 2:34pm
URL: http://spssx-discussion.165.s1.nabble.com/2-Variables-7-cases-10-observations-Simple-tp5735614p5735683.html

The widespread belief that the data need to be normally distributed is one of
the biggest statistical misconceptions people have, I reckon.  And I think
it comes down to not having a clear grasp of the distinctions between 3
different distributions:

1) The population distribution
2) The distribution of scores within a given sample from that population
3) The sampling distribution for some statistic

Here is a figure I saw recently.  I think it illustrates these distinctions
very nicely.

http://slideplayer.com/slide/8557877/26/images/8/Population+Distributions+vs.+Sampling+Distributions.jpg

It uses a binary variable to illustrate, but I think one can easily imagine
a corresponding figure where the Population and Sample distributions are
continuous and where the sampling distribution is the sampling distribution
of the mean rather than the sampling distribution of a proportion.  
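
To make the three distributions concrete, here is a minimal simulation sketch in Python (standard library only). The population proportion of 0.3, the sample size of 50, and the 5000 repeated samples are arbitrary values assumed for illustration; none of them comes from the figure.

```python
import random
import statistics

random.seed(1)  # arbitrary seed so the sketch is repeatable

P_TRUE = 0.3      # assumed population proportion of 1s (hypothetical value)
N = 50            # assumed size of each sample
N_SAMPLES = 5000  # assumed number of repeated samples


def draw_sample(n, p):
    """Draw n independent 0/1 scores from a binary population."""
    return [1 if random.random() < p else 0 for _ in range(n)]


# 1) Population distribution: P(X = 1) = P_TRUE and P(X = 0) = 1 - P_TRUE.

# 2) Distribution of scores within one sample: still only 0s and 1s.
one_sample = draw_sample(N, P_TRUE)
sample_values = sorted(set(one_sample))

# 3) Sampling distribution of the proportion: one proportion per sample.
proportions = [sum(draw_sample(N, P_TRUE)) / N for _ in range(N_SAMPLES)]
mean_of_props = statistics.mean(proportions)
sd_of_props = statistics.pstdev(proportions)
theoretical_se = (P_TRUE * (1 - P_TRUE) / N) ** 0.5

print("distinct scores in one sample:", sample_values)
print("distinct sample proportions  :", len(set(proportions)))
print("mean of sample proportions   :", round(mean_of_props, 4))
print("SD of sample proportions     :", round(sd_of_props, 4))
print("sqrt(p(1-p)/n)               :", round(theoretical_se, 4))
```

The scores in any one sample can only be 0 or 1, whereas the sample proportions take many different values clustered around the population proportion, with a spread close to sqrt(p(1-p)/n).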

If we use the single-sample z-test to illustrate, it is not the sample
distribution that needs to be normal, nor is it the population distribution.
Rather, it is the /sampling distribution of the mean/ that needs to be
(approximately) normal for the z-test to be valid.  (The scores need to be
independent too, of course.)  
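
The point about the z-test can be checked with a rough simulation. The sketch below assumes an exponential population (chosen only because it is strongly skewed) and three arbitrary sample sizes; it compares the skewness of the raw scores with the skewness of the sample means. Everything named here is hypothetical and standard-library Python.

```python
import random

random.seed(2)  # arbitrary seed so the sketch is repeatable

REPS = 5000  # assumed number of repeated samples per sample size


def skewness(values):
    """Moment-based skewness: third central moment / variance ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def sample_mean(n):
    """Mean of n independent draws from an exponential(1) population."""
    return sum(random.expovariate(1.0) for _ in range(n)) / n


# Skewness of raw scores drawn from the (non-normal) population.
raw_skew = skewness([random.expovariate(1.0) for _ in range(REPS)])

# Skewness of the sampling distribution of the mean at several n.
skew_by_n = {
    n: skewness([sample_mean(n) for _ in range(REPS)])
    for n in (2, 10, 50)
}

print("skewness of raw scores:", round(raw_skew, 2))
for n, s in skew_by_n.items():
    print(f"skewness of sample means, n = {n}: {round(s, 2)}")
```

The raw scores stay heavily skewed no matter what, but the skewness of the sample means shrinks toward zero as n grows, which is the sense in which the sampling distribution of the mean becomes approximately normal.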

If we shift the context to linear regression, I contend that it is the
/sampling distributions of the regression parameters/ that need to be
approximately normal in order for the t-tests on them to be valid, and for
the F-test for the overall model to be valid.  Again, the observations need
to be independent, and the errors need to be uncorrelated with the
explanatory variables.  But the (approximate) normality requirement applies
to the /sampling distributions of the parameters/.  

I know that the normality assumption for OLS regression is often said to
apply to the errors.  I have said that myself many times.  But of late, I
have come to the view that normality of the errors is a /sufficient/, but
not a /necessary/ condition.  The necessary normality condition is
approximate normality of the sampling distributions of the parameters.  And
as n increases, that condition will generally be met (by the central limit
theorem, provided the errors have finite variance), even if the errors are
not normally distributed.  
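
As a rough check on this claim, the following sketch simulates simple OLS regression with deliberately skewed errors (centred exponential) and n = 100. The true coefficients, the uniform x values, the number of replications, and the hard-coded critical value (approximately the 0.975 quantile of t with 98 df) are all assumptions made for illustration, not anything taken from Wooldridge or the slides.

```python
import random

random.seed(3)  # arbitrary seed so the sketch is repeatable

N = 100            # assumed sample size
REPS = 2000        # assumed number of simulated data sets
B0, B1 = 1.0, 0.5  # assumed true intercept and slope
T_CRIT = 1.984     # approx. 0.975 quantile of t(98), hard-coded assumption


def skewness(values):
    """Moment-based skewness: third central moment / variance ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Fixed explanatory variable, reused in every simulated data set.
x = [random.uniform(0, 10) for _ in range(N)]
x_bar = sum(x) / N
sxx = sum((xi - x_bar) ** 2 for xi in x)

slopes, all_errors, covered = [], [], 0
for _ in range(REPS):
    # Skewed, mean-zero errors: exponential(1) shifted down by 1.
    errors = [random.expovariate(1.0) - 1.0 for _ in range(N)]
    y = [B0 + B1 * xi + ei for xi, ei in zip(x, errors)]
    y_bar = sum(y) / N

    # OLS estimates for simple regression.
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar

    # Standard error of the slope from the residual variance.
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    se_b1 = (sse / (N - 2) / sxx) ** 0.5

    # Does the 95% t-based interval for the slope cover the true slope?
    if abs(b1 - B1) <= T_CRIT * se_b1:
        covered += 1
    slopes.append(b1)
    all_errors.extend(errors)

coverage = covered / REPS
error_skew = skewness(all_errors)
slope_skew = skewness(slopes)

print("skewness of the errors         :", round(error_skew, 2))
print("skewness of the slope estimates:", round(slope_skew, 2))
print("coverage of 95% t interval     :", round(coverage, 3))
```

In this setup the errors are far from normal, yet the slope estimates are close to symmetric and the t-based interval covers the true slope at close to the nominal rate. It is one illustrative scenario only; very small n or much heavier-tailed errors could behave differently.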

Some of my thinking on this is due to reading what Jeffrey Wooldridge says
about the assumptions for OLS regression in his popular econometrics
textbook.  I've attached a small set of slides in which I've summarized his
main points.  

OLS_regression_assumptions_Wooldridge.pdf
<http://spssx-discussion.1045642.n5.nabble.com/file/t7186/OLS_regression_assumptions_Wooldridge.pdf>  

Members who do not read the list via Nabble can find a link to download the
file by viewing the thread here:  

http://spssx-discussion.1045642.n5.nabble.com/2-Variables-7-cases-10-observations-Simple-td5735614.html

Finally, I always try to say approximate normality rather than normality,
because I believe George Box was right when he commented that in the real
world, normal distributions (and straight lines) don't really exist.
Nevertheless, they can serve as useful approximations (or models) of
real-world phenomena.  See section 2.5 of this famous article:

http://mkweb.bcgsc.ca/pointsofsignificance/img/Boxonmaths.pdf

Cheers,
Bruce



Art Kendall wrote

> "the data also are not normally distributed,"
>
> There is no assumption that the *data* are normally distributed.
>
> For uses of the general linear model (regression, anova, correlations,
> etc., etc.) it is desirable that the *residuals* (aka errors in fit) are
> not very discrepant from normally distributed.
>
> Check to see whether CATREG handles repeated measures.
> Since CATREG has actual tests of whether there is a better fit with
> ordinal vs continuous assumptions it may be a way to look at your data.
>
>
>
> -----
> Art Kendall
> Social Research Consultants
> --
> Sent from: http://spssx-discussion.1045642.n5.nabble.com/
>
> =====================
> To manage your subscription to SPSSX-L, send a message to
> LISTSERV@.UGA (not to SPSSX-L), with no body text except the
> command. To leave the list, send the command
> SIGNOFF SPSSX-L
> For a list of commands to manage subscriptions, send the command
> INFO REFCARD





--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING: 
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).