Hi all,
This question has appeared quite a few times on the web, but I've not found an answer that clarifies my confusion. So here is what I understand: general linear models assume a normal distribution of residuals because the test statistic is calculated using residuals of the factors in the model and sampling error. But what about a normal distribution of each sample in the model (which is equivalent to a normal distribution of the residuals of THAT sample)? And how (if at all) does "normality" differ from "normal distribution"?

Now, about means: given we are comparing means in parametric tests, it seems to me that a non-normal distribution of a sample provides a misleading/inappropriate representation of the trait we wish to characterise for the population being sampled (e.g. IQ, arm length, heart rate), i.e. the mean equals the mode and median only when the distribution is normal. In cases where distributions are not normal, it is inappropriate to report a measure of dispersion (standard error, standard deviation), as these assume data are normally distributed, YET everyone seems to report samples as mean + sd or se. WHERE AM I GOING WRONG HERE?

What should we be testing when conducting a test with respect to normal distributions? The samples, the residuals, or both?

If you can help me make sense of this it would be a useful Christmas present!

Thanks, Dean
There is a reason that "normality" is required as something similar to a "normal distribution of the residuals", and not as the distribution of the original sample. If that assumption is not met, then the F-test derived as a test statistic will not be distributed "as F". In other words, the tests you get may be wrong about their reported p-values.

- The F-test is well known as being rather robust, but even one or two *outliers* can screw up results.
- Proper results can sometimes be obtained by using the F and finding its p-value by "Monte Carlo" procedures, including the bootstrap.

Your additional point about means is overstated, but is otherwise a good one. The mean never matches the mode or median when the criterion is a zero-one dichotomy, giving the mean as a "rate" -- but that F-test is still going to be a good test so long as the d.f. is not tiny.

I do tell people that when there are outliers (or such), so that you do not like the mean as representing the outcome, or the contrasts in outcomes, then you need to fix something. Sometimes that will be a matter of "bad data", and sometimes a transformation is needed -- either "parametric", like taking the log, or "infinite-parametric", which is one way to describe using rank-orders.

In my own experience of standard deviations, as a statistical reviewer, I might readily object to someone who used *only* the SD to describe data where there were important outliers or other oddities. I think it is pretty conventional to mention extreme scores where circumstances warrant it. I don't remember being upset by any particular example of mis-use of SDs, but that would probably only arise somewhere that I was already dismissing the analysis for using inappropriate raw values.

-- Rich Ulrich
Hi,
Thank you Rich for your reply. I've followed up with some googling and naturally I've confused myself rather than improved my understanding. I have read several descriptions of the Assumption of Normality: 1) it applies to the sample data for each group (categorical predictor variable: t-test, ANOVA); 2) means calculated from multiple samples from a population are distributed normally; 3) the distribution of residuals for the model (as opposed to individual samples) is normal; 4) multivariate normality, i.e. a combination of variables is distributed normally. One source I found helpful but difficult to digest is at http://www.psychology.uiowa.edu/faculty/mordkoff/GradStats/part%20I/I.07%20normal.pdf

Here is what I THINK I understand about normality with respect to general linear models: A t-test is a special case of a general linear model (two groups, categorical predictor). In this case, the assumption is that the data in EACH SAMPLE have a normal distribution (equivalent to a normal distribution of the residuals for that sample). Is this because it isn't possible to calculate residuals for the MODEL, unlike in ANOVA and regression? Or because of the formula for the t-statistic, which is a function of differences in means and standard deviations rather than sums of squares? In more complex tests (ANOVA, regression) the distribution of the residuals is assumed to be normal because of the distribution of the test statistic used in calculating the p-value (though I don't understand why, if indeed this is correct!).

Regarding reporting sample statistics: I'm a biologist and we regularly encounter samples that are skewed. An extreme example is reproductive success. In many polygamous species the number of offspring sired by males is skewed, with most males siring few or no offspring and most offspring sired by a small proportion of males. Another example, where calculating and reporting a mean for a naturally skewed distribution would raise few eyebrows, is parasite load. In cases of a naturally skewed distribution (i.e. the statistical population has a skewed distribution), a mean (with standard error or standard deviation) is routinely reported. In this case, while the estimated mean may be an accurate, unbiased estimator of the population mean, the standard deviation and standard error are misleading, but convention apparently dictates we report a measure of dispersion with reported means. This is where I think there is a problem, but I don't have a solution.

cheers, Dean
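One common way around the reporting problem Dean describes is to give the median and quartiles alongside (or instead of) the mean and SD for a skewed variable. A minimal SPSS sketch, assuming a variable named load (a hypothetical name for something like parasite load):

FREQUENCIES VARIABLES=load
  /PERCENTILES=25 50 75
  /STATISTICS=MEAN MEDIAN STDDEV SKEWNESS
  /FORMAT=NOTABLE.
* For a strongly right-skewed variable, expect the mean to sit well above
* the median; reporting median (IQR) avoids implying a symmetric spread.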
In reply to this post by DP_Sydney
I'll address a point that Rich didn't talk about specifically. You said,
"What should we be testing when conducting a test with respect to normal distributions? The samples, the residuals, or both?" To keep things simple, let's use one-way ANOVA as the context. The fitted values for that model are the group means. Therefore, residuals equal raw score minus group mean; and the distribution of residuals within each group have exactly the same shape as the distribution of scores, within each group (the distribution is just shifted to the left or right). So normality of residuals = normality of the scores within groups. Re testing for normality, I would not advise it as a means of justifying use of a parametric test. When sample sizes are low, tests of normality have low power, and will likely fail to detect important departures from normality. When sample sizes are large, the tests have too much power, and will throw up the red flag for small, unimportant departures from normality. Let me also remind you that the i.i.d. N(0,sigma) assumption for OLS models has to do with the errors, not the residuals. The wikipedia page on the distinction between the two is quite good, I think. http://en.wikipedia.org/wiki/Errors_and_residuals_in_statistics And finally, as Herman Rubin often says in the sci.stat.* newsgroups, the independence assumption is FAR more important than the normality assumption. HTH.
--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/
Hi Bruce,
Thanks for your input. I might (?!) be getting closer to a correct understanding, but I have a few queries about what you said.

Which residuals are assumed to be normally distributed? You wrote: "To keep things simple, let's use one-way ANOVA as the context. The fitted values for that model are the group means. Therefore, residuals equal raw score minus group mean; and the distribution of residuals within each group has exactly the same shape as the distribution of scores within each group (the distribution is just shifted to the left or right). So normality of residuals = normality of the scores within groups." Does this imply that the residuals for each group (sample) individually are assumed to be distributed normally, and normality for EACH sample should be checked? Or does the assumption apply to the residuals pooled across groups (i.e. one 'sample' of residuals)? I think the latter, given the formula for SS in ANOVA, i.e. the within-group (aka residual) SS is calculated by pooling the residuals across groups, where each residual is the deviation of a datum from its group mean. (Is it for this reason that departures from normality can 'bias' the SS and thus the p-value?)

Fitted values in ANOVA. I don't understand what you mean by "the fitted values for that model are the group means". In linear regression the fitted value is determined by the linear equation, so in ANOVA (another 'linear model') wouldn't the fitted value be calculated from the linear equation fitted to all the data?

Errors vs residuals. You wrote: "Let me also remind you that the i.i.d. N(0, sigma) assumption for OLS models has to do with the errors, not the residuals." I've read the Wikipedia page and some other information, and it seems to me that the observed residuals are estimates of the statistical errors (aka theoretical residuals), similar to the sample mean being an estimate of the population mean, with the exception that observed residuals are not independent but statistical errors are.

Cheers, Dean
"Fitted Values in ANOVA. I don't understand what you mean by the "fitted values for that model are the group means". In linear regression the fitted value is determined by the linear equation, so in ANOVA (another 'linear model') wouldn't the fitted value be calculated from the linear equation fitted to all the data? "
---
Consider: create some junk data.

DATA LIST FREE /gp dv.
BEGIN DATA
1 23 1 45 1 65 1 11 1 23
2 11 2 13 2 15 2 17 2 19 2 20
3 26 3 29 3 17 3 22 3 19 3 18
4 29 4 35 4 22 4 39 4 50 4 23 4 45
END DATA.

* One-way ANOVA via GLM; EMMEANS prints the group means.
GLM dv BY gp
  /INTERCEPT=EXCLUDE
  /PRINT=DESCRIPTIVE PARAMETER
  /EMMEANS=TABLES(gp).

* The same model as a regression on group dummies, with no intercept.
VECTOR GPDUM(4).
RECODE GPDUM1 TO GPDUM4 (ELSE=0).
COMPUTE GPDUM(gp)=1.
REGRESSION
  /ORIGIN
  /DEPENDENT dv
  /METHOD=ENTER GPDUM1 TO GPDUM4.

* Note that the four regression coefficients equal the four group means:
* the fitted value for each case is its group mean.
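An equivalent way to see the same thing, without building dummy variables, is to compute each group's mean directly and subtract it. A minimal sketch using the same junk data; AGGREGATE with MODE=ADDVARIABLES appends the group mean to every case, and the names fit and resid are invented:

AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /BREAK=gp
  /fit = MEAN(dv).
COMPUTE resid = dv - fit.
EXECUTE.
* fit matches the EMMEANS table and the regression coefficients above;
* resid is the residual whose normality the model assumes.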
-- David Marso
In reply to this post by DP_Sydney
Dean,
To take several points in order...

"Difficult to digest" is an apt way to describe that source. He is not very wrong, but there is a world outside of ANOVA that he seems largely unaware of. And I think you might want something that touches more on the theory of statistical estimation.

On normality: a number of statistical distributions are related. A z-distribution ("normal") can be squared to get a chi-squared. The sum of k independent squared z's is a chi-squared with k d.f. The SD, as computed around an observed mean, is the square root of the average squared deviation. The sum of squares has k terms which are not entirely "independent", because they use the mean that they contributed to. Theory shows that an unbiased estimate of the variance is obtained by dividing the sum of squares by (k-1) instead of k; taking the square root gives the usual SD. A z, in practice, is often estimated by dividing raw deviations by this sample SD, and the result has a t-distribution, which has slightly fatter tails, depending on the degrees of freedom. The t is computed with a normal in the numerator and the square root of a (chi-squared/d.f.) in the denominator. A t with k d.f. can be squared to get an F with (1, k) d.f. An F is also called an F-ratio statistic; Europeans may still use the name "Fisher statistic", after its inventor. It can be thought of as the ratio of two terms that are independently (chi-squared/d.f.).

** Thus, the t-test is fully a member of the general linear model, and it is one of the simplest of instances. Having "normality" in *each* of the two samples is not sufficient -- the standard deviations of the two must also be the same. The mixture of a wide set of residuals with a narrow one -- when combining both sets of residuals -- will not be a set of "normal residuals." However, pragmatically, and following some theory, we know that testing the difference between means when the SDs are different can be well-approximated with another t-test, using the Welch-Satterthwaite correction to the degrees of freedom.

On your question of reporting sample statistics when there is great skew -- I agree that it seems to be poor practice to give only the mean and SD when discussing such things as "parasite load", in many of the possible contexts, when one infested critter may have most of the total for the sample. Among the numbers that I read these days, this is like the concentration of wealth. People not only refer to per capita income, or wealth, but they also mention that "1% of the US population has 45% of the wealth, while 50% of the population has 1% of the wealth." Or, "1% of the population earns 20% of the income." The economists who compare these things may use the Gini coefficient to point out that the US concentration of wealth (and income) is now greater than at any time in the US since before the Depression, and greater than in any other advanced industrial nation. So economists do make use of other descriptions beyond the simple average. "Median income", "top 10%" and "quintiles" are also used a lot. But per capita income is still useful, at times.

I suppose, if you aren't a reviewer, you could write letters of protest to the journals that have the most misleading examples. Or raise the issue with presenters, at symposiums or other talks.

-- Rich Ulrich
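Rich's Welch-Satterthwaite point is easy to see in SPSS, which reports the corrected test automatically. A minimal sketch, assuming an outcome dv and group variable gp (placeholder names):

* T-TEST prints both the pooled-variance t ("equal variances assumed")
* and the Welch-Satterthwaite t ("equal variances not assumed").
T-TEST GROUPS=gp(1,2) /VARIABLES=dv.
* For more than two groups, ONEWAY offers the Welch F alongside
* Levene's homogeneity test.
ONEWAY dv BY gp /STATISTICS=HOMOGENEITY WELCH.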
In reply to this post by DP_Sydney
See below.
Dean asked whether the normality assumption applies to the residuals for each group individually, or to the residuals pooled across groups. The entire set of errors for the model is assumed to be "independent and identically distributed as Normal with mean = 0 and variance = some sigma-squared value". The "identically distributed" part of that statement is what Rich was talking about in his post. In an ANOVA model, it would be called homogeneity of variance. In a classic regression model (i.e., with continuous predictors only), it would be called homoscedasticity. David provided a nice example of this in his post.

Dean also suggested that the observed residuals are estimates of the statistical errors, except that observed residuals are not independent. Correct. Residuals are observable estimates of the unobservable errors; and the residuals are not truly independent, because once n-1 of them are known, the nth one is determined (because the sum of all residuals = 0). HTH.
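A minimal SPSS sketch tying these two points together, again assuming placeholder names dv and gp: Levene's homogeneity test via UNIANOVA, plus a check that the saved residuals have mean (and sum) zero:

UNIANOVA dv BY gp
  /PRINT=HOMOGENEITY
  /SAVE=RESID
  /DESIGN=gp.
* The saved residual is RES_1 by default; its sum and mean will be 0,
* which is the constraint that keeps residuals from being fully independent.
DESCRIPTIVES VARIABLES=RES_1
  /STATISTICS=MEAN SUM STDDEV.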
--
Bruce Weaver
In reply to this post by Rich Ulrich
Hi Rich and Bruce,
So far, have I got this right?

1) The residuals in an ANOVA model are calculated independently for each group and then pooled (as sums of squares) to estimate the statistical error in the model (i.e. to derive the SS-within). Correct?

2) The residuals for each group individually are assumed to have the same variance, otherwise the 'pooled' sample will not be distributed normally (and also to avoid one sample having greater influence on the SS-within than the others). Correct?

3) The 'pooled' residuals (i.e. the entire set in the model) are assumed to be distributed normally. Bruce wrote: "The entire set of errors for the model is assumed to be 'independent and identically distributed as Normal with mean = 0 and variance = some sigma-squared value'." Taking ANOVA as an example, this equates to the residuals behind the SS-within being distributed normally. I THINK this is required so the MS-within (~mean residual/error?) is estimated accurately (and then the F-ratio and p-value)? I'm still unclear why this is assumed.

4) THIS is where I get stuck: do the residuals for each group have to be distributed normally? Bruce wrote: "The 'identically distributed' part of that statement is what Rich was talking about in his post. In an ANOVA model, it would be called homogeneity of variance", and "To keep things simple, let's use one-way ANOVA as the context. The fitted values for that model are the group means. Therefore, residuals equal raw score minus group mean; and the distribution of residuals within each group has exactly the same shape as the distribution of scores within each group (the distribution is just shifted to the left or right). So normality of residuals = normality of the scores within groups." Otherwise the 'pooled' residuals will not be distributed normally? Rich wrote: "Having 'normality' in *each* of the two samples is not sufficient -- the standard deviations of the two must also be the same. The mixture of a wide set of residuals with a narrow one -- when combining both sets of residuals -- will not be a set of 'normal residuals.'"

Cheers, Dean
On the page cited below, see the answer that begins, 'Standard Classical one-way ANOVA can be viewed as an extension to the classical "2-sample T-test" to an "n-sample T-test".' I think it addresses many of your questions.
http://stats.stackexchange.com/questions/6350/anova-assumption-normality-normal-distribution-of-residuals

In the second paragraph, I would make a couple of small changes, which are incorporated in this quote: "I think where you are getting confused is that (under the assumptions of the model) the errors and the raw data are BOTH normally distributed. However the raw data consist of samples from normally distributed populations with different means (unless all the effects are exactly the same) but the same variance. The errors on the other hand have the same normal distribution. This comes from the third assumption of homoscedasticity." HTH.
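A minimal simulation sketch of that last point, with all names invented: the raw scores form a mixture of normals with different means (so they need not look normal), while the errors share a single normal distribution.

INPUT PROGRAM.
LOOP #i = 1 TO 300.
  COMPUTE gp = TRUNC(RV.UNIFORM(1, 4)).
  COMPUTE err = RV.NORMAL(0, 1).
  COMPUTE dv = 10 * gp + err.
  END CASE.
END LOOP.
END FILE.
END INPUT PROGRAM.
EXECUTE.
* dv (three well-separated group means) should look trimodal and fail
* normality checks; err should look normal.
EXAMINE VARIABLES=dv err
  /PLOT=HISTOGRAM NPPLOT.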
--
Bruce Weaver