My question is: What is the best generalized linear model to use with my dataset? I'm working on a regression analysis with four predictors and a positively skewed outcome variable (Skewness = 1.447, SE = .241; Kurtosis = 2.870, SE = .478). This is not a count variable but a measure of parenting style. When I went to Generalized Linear Models I got confused by the choices. I wish there were some documentation about these models with examples. Since that's not the case, I thought this is the best place to ask which model is appropriate for a non-count, positively skewed outcome. TIA.

Stephen Salbod, Pace University, NYC
I assume the DV is the sum or mean of items, so it is effectively continuous within the scale range. Perhaps you've done this already, but I think ordinary normal-distribution regression (i.e., plain Regression) would be most appropriate. Look at the residuals and the usual diagnostic plots for problems. As others have pointed out, least squares regression assumes a normal distribution for the residuals. Maximum likelihood, which GenLin uses, assumes multivariate normality for the variables, IVs and DVs, a much higher standard.

Gene Maguin
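Gene's suggested first step (fit OLS, then inspect the residuals) can be sketched as follows. This is a hypothetical illustration on simulated data with deliberately skewed errors; the variable names, coefficients, and sample size are invented, not taken from Stephen's dataset, and the skewness helper is a plain moment-based estimate rather than SPSS's adjusted formula.

```python
import numpy as np

def skewness(v):
    # Simple moment-based skewness (not SPSS's small-sample-adjusted version)
    d = v - v.mean()
    return np.mean(d**3) / np.mean(d**2) ** 1.5

rng = np.random.default_rng(42)
n = 120
X = rng.normal(size=(n, 4))                     # four predictors, as in the post
errors = rng.gamma(2.0, 1.0, size=n) - 2.0      # positively skewed, mean-zero errors
y = 1.0 + X @ np.array([0.5, 0.3, -0.2, 0.4]) + errors

# Ordinary least squares via the normal equations (lstsq), intercept included
A = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
resid = y - A @ beta

print("coefficients:", np.round(beta, 2))
print(f"residual skewness: {skewness(resid):.3f}")
```

With an intercept in the model the residuals average exactly zero, so the thing to examine is their shape (here the skewness estimate, or a histogram/Q-Q plot in practice), not their center.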
In reply to this post by Salbod
As Gene said in his reply, the assumptions for OLS linear regression have to do with the errors, not the outcome variable. So I agree that the first step is to run the OLS regression model and look at residual plots. The errors are assumed to be i.i.d. N(0, sigma^2), i.e., independently and identically distributed as Normal with a mean of 0 and some variance. But as George Box reminded us in one of his famous articles, nothing in nature is truly normally distributed. So you know off the top that the errors are not truly normal, i.e., the tests associated with your model are approximate tests, not exact tests. And as always with approximate tests, the important question is whether the approximation is good enough to be useful. (You can probably think of another famous George Box quote about models being useful.)

So how important is normality of the errors? Not very. Independence is arguably the most important assumption, with homoscedasticity (i.e., identically distributed) after that. I am reminded of this excerpt from Herman Rubin's post to sci.stat.edu back in 1999:

--- start of excerpt ---
It cannot be overemphasized that normality is NOT necessary for the validity of regression. What has just been said is what is most important, that the disturbances (actual deviations from the "true" regression expression) must be uncorrelated with the "explanatory" variables. The precise probabilities of various tests do depend on normality, some more than others. But regression has rather good robustness, by which I mean that the properties of the procedure do not depend much on those assumptions which one does not wish (or need) to make.
--- end of excerpt ---

The full thread where Herman posted that can be seen here: https://groups.google.com/forum/#!msg/sci.stat.edu/KeXbqt-5Zuk/z5gjEvMVWnQJ

HTH.
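Herman Rubin's robustness point can be illustrated with a small Monte Carlo sketch: when the errors are skewed but independent of the predictor, the OLS slope estimate is still centered on the true value. All numbers here (true slope, sample sizes, error distribution) are invented purely for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
true_slope = 2.0
estimates = []
for _ in range(2000):
    x = rng.normal(size=200)
    e = rng.exponential(1.0, size=200) - 1.0  # strongly skewed, mean-zero errors
    y = true_slope * x + e
    A = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    estimates.append(beta[1])

# Despite the non-normal errors, the estimates center on the true slope
print(f"mean OLS slope over 2000 runs: {np.mean(estimates):.3f}")
```

What non-normality does affect is the exactness of the t- and F-tests in small samples, which is Bruce's point about approximate versus exact tests.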
--
Bruce Weaver bweaver@lakeheadu.ca http://sites.google.com/a/lakeheadu.ca/bweaver/ "When all else fails, RTFM."
In reply to this post by Salbod
I question the need for the complication of using GLM.

Skewness of 1.447? I'm sorry, but that seems too extreme to me, considering the scales and measures that I have dealt with. So I wonder whether it really is a proper measure, with equal intervals, of something as fuzzy as "parenting style" for a single society. Does one point less (or more) really mean the same thing at the top of the scale as it does at the bottom? If it does not start out with equal intervals, it is not very likely to end up with equal intervals for the error.

(Tukey's textbook provides a rule of thumb, which I don't recall for sure. I think it was: if the largest value of the outcome is 10 times the smallest, you probably want to transform the scores; if it is 20 times, you almost surely do.)

If points on the scale don't make good, equal-interval sense at the start, then you won't get an equation that you can discuss intelligently, either. Given large positive skew, the first conventional guess is, "Take the log."

-- Rich Ulrich
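The rule of thumb Rich tentatively attributes to Tukey (his recollection, not a verified citation) amounts to a one-line check of the max/min ratio of the outcome. The values below are invented for illustration.

```python
import numpy as np

y = np.array([1.5, 2.0, 3.5, 8.0, 12.0, 33.0])  # hypothetical outcome scores
ratio = y.max() / y.min()
print(f"max/min ratio: {ratio:.1f}")

if ratio > 20:
    print("almost surely transform (e.g., take the log)")
elif ratio > 10:
    print("probably transform")
else:
    print("transformation likely unnecessary by this rule")
```

Note the rule only makes sense for strictly positive outcomes; a scale that includes zero or negative scores would need a shift before a log transform could even be considered.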
In reply to this post by Maguin, Eugene
I ran the analysis using OLS regression. A plot of the residuals was positively skewed. I'm going to look at transforming the outcome measure.
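The transformation step Stephen mentions can be sketched by comparing skewness before and after taking the log. The scores below are simulated from a lognormal distribution, not the actual parenting-style data, and the skewness helper is a plain moment-based estimate.

```python
import numpy as np

def skewness(v):
    # Simple moment-based skewness estimate
    d = v - v.mean()
    return np.mean(d**3) / np.mean(d**2) ** 1.5

rng = np.random.default_rng(1)
scores = rng.lognormal(mean=2.0, sigma=0.5, size=240)  # positively skewed by construction

print(f"raw skewness: {skewness(scores):.3f}")
print(f"log skewness: {skewness(np.log(scores)):.3f}")
```

For lognormal-like data the log transform pulls the skewness toward zero; in practice the check that matters is the skewness of the residuals from the refitted model, not of the raw outcome.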