My question is: What is the best generalized linear model to use with my dataset? I'm working on a regression analysis with four predictors and a positively skewed outcome variable (Skewness = 1.447, SE = .241; Kurtosis = 2.870, SE = .478). This is not a count variable but a measure of parenting style. When I went to Generalized Linear Models I got confused by the choices. I wish there were some documentation about these models with examples. Since that's not the case, I thought this is the best place to ask which model is appropriate for a non-count, positively skewed outcome. TIA.

Stephen Salbod, Pace University, NYC
I assume the DV is the sum or mean of items, so it is effectively continuous within the scale range. Perhaps you've done this already, but I think ordinary normal-distribution regression (i.e., plain Regression) would be most appropriate. Look at the residuals and the usual diagnostic plots for problems. As others have pointed out, least squares regression assumes a normal distribution for the residuals. Maximum likelihood, which GenLin uses, assumes multivariate normality for the variables, IVs and DVs, a much higher standard.

Gene Maguin
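Gene's suggested first step (fit OLS, then inspect the residuals) can be sketched as follows. This is a hypothetical illustration on simulated data with deliberately skewed errors; the variable names, coefficients, and sample size are invented, not taken from Stephen's dataset, and the skewness helper is a plain moment-based estimate rather than SPSS's adjusted formula.

```python
import numpy as np

def skewness(v):
    # Simple moment-based skewness (not SPSS's small-sample-adjusted version)
    d = v - v.mean()
    return np.mean(d**3) / np.mean(d**2) ** 1.5

rng = np.random.default_rng(42)
n = 120
X = rng.normal(size=(n, 4))                     # four predictors, as in the post
errors = rng.gamma(2.0, 1.0, size=n) - 2.0      # positively skewed, mean-zero errors
y = 1.0 + X @ np.array([0.5, 0.3, -0.2, 0.4]) + errors

# Ordinary least squares via the normal equations (lstsq), intercept included
A = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
resid = y - A @ beta

print("coefficients:", np.round(beta, 2))
print(f"residual skewness: {skewness(resid):.3f}")
```

With an intercept in the model the residuals average exactly zero, so the thing to examine is their shape (here the skewness estimate, or a histogram/Q-Q plot in practice), not their center.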
In reply to this post by Salbod
As Gene said in his reply, the assumptions for OLS linear regression have to do with the errors, not the outcome variable. So I agree that the first step is to run the OLS regression model and look at residual plots. The errors are assumed to be i.i.d. N(0, sigma^2), i.e., independently and identically distributed as Normal with a mean of 0 and some variance. But as George Box reminded us in one of his famous articles, nothing in nature is truly normally distributed. So you know off the top that the errors are not truly normal, i.e., the tests associated with your model are approximate tests, not exact tests. And as always with approximate tests, the important question is whether the approximation is good enough to be useful. (You can probably think of another famous George Box quote about models being useful.)

So how important is normality of the errors? Not very. Independence is arguably the most important assumption, with homoscedasticity (i.e., identically distributed) after that. I am reminded of this excerpt from Herman Rubin's post to sci.stat.edu back in 1999:

--- start of excerpt ---
It cannot be overemphasized that normality is NOT necessary for the validity of regression. What has just been said is what is most important, that the disturbances (actual deviations from the "true" regression expression) must be uncorrelated with the "explanatory" variables. The precise probabilities of various tests do depend on normality, some more than others. But regression has rather good robustness, by which I mean that the properties of the procedure do not depend much on those assumptions which one does not wish (or need) to make.
--- end of excerpt ---

The full thread where Herman posted that can be seen here: https://groups.google.com/forum/#!msg/sci.stat.edu/KeXbqt-5Zuk/z5gjEvMVWnQJ

HTH.
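Herman Rubin's robustness point can be illustrated with a small Monte Carlo sketch: when the errors are skewed but independent of the predictor, the OLS slope estimate is still centered on the true value. All numbers here (true slope, sample sizes, error distribution) are invented purely for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
true_slope = 2.0
estimates = []
for _ in range(2000):
    x = rng.normal(size=200)
    e = rng.exponential(1.0, size=200) - 1.0  # strongly skewed, mean-zero errors
    y = true_slope * x + e
    A = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    estimates.append(beta[1])

# Despite the non-normal errors, the estimates center on the true slope
print(f"mean OLS slope over 2000 runs: {np.mean(estimates):.3f}")
```

What non-normality does affect is the exactness of the t- and F-tests in small samples, which is Bruce's point about approximate versus exact tests.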
--
Bruce Weaver bweaver@lakeheadu.ca http://sites.google.com/a/lakeheadu.ca/bweaver/ "When all else fails, RTFM."
In reply to this post by Salbod
I question the need for the complication of using GLM.

Skewness of 1.447? I'm sorry, but that seems too extreme to me, considering the scales and measures that I have dealt with. So I wonder whether it really is a proper measure, with equal intervals, of something as fuzzy as "parenting style" for a single society. Does one point less (or more) really mean the same thing at the top of the scale as it does at the bottom? If it does not start out with equal intervals, it is not very likely to end up with equal intervals for the error.

(Tukey's textbook provides a rule of thumb, which I don't recall for sure. I think it was: if the largest value of the outcome is 10 times the smallest, you probably want to transform the scores; if it is 20 times, you almost surely do.)

If points on the scale don't make good, equal-interval sense at the start, then you won't get an equation that you can discuss intelligently, either. Given large positive skew, the first conventional guess is, "Take the log."

-- Rich Ulrich
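The rule of thumb Rich tentatively attributes to Tukey (his recollection, not a verified citation) amounts to a one-line check of the max/min ratio of the outcome. The values below are invented for illustration.

```python
import numpy as np

y = np.array([1.5, 2.0, 3.5, 8.0, 12.0, 33.0])  # hypothetical outcome scores
ratio = y.max() / y.min()
print(f"max/min ratio: {ratio:.1f}")

if ratio > 20:
    print("almost surely transform (e.g., take the log)")
elif ratio > 10:
    print("probably transform")
else:
    print("transformation likely unnecessary by this rule")
```

Note the rule only makes sense for strictly positive outcomes; a scale that includes zero or negative scores would need a shift before a log transform could even be considered.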
In reply to this post by Maguin, Eugene
I ran the analysis using OLS regression. A plot of the residuals was positively skewed. I'm going to look at transforming the outcome measure.
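The transformation step Stephen mentions can be sketched by comparing skewness before and after taking the log. The scores below are simulated from a lognormal distribution, not the actual parenting-style data, and the skewness helper is a plain moment-based estimate.

```python
import numpy as np

def skewness(v):
    # Simple moment-based skewness estimate
    d = v - v.mean()
    return np.mean(d**3) / np.mean(d**2) ** 1.5

rng = np.random.default_rng(1)
scores = rng.lognormal(mean=2.0, sigma=0.5, size=240)  # positively skewed by construction

print(f"raw skewness: {skewness(scores):.3f}")
print(f"log skewness: {skewness(np.log(scores)):.3f}")
```

For lognormal-like data the log transform pulls the skewness toward zero; in practice the check that matters is the skewness of the residuals from the refitted model, not of the raw outcome.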