I am confused about the linearity assumption in Multiple Linear Regression, and I don't know which of these two interpretations is correct. 1) Does linearity mean that EACH independent variable in the model should have a linear relationship with the dependent variable? And, therefore, that the linear relationship between each independent variable and the dependent variable should be tested statistically? 2) Or does linearity mean that the PREDICTED DEPENDENT and the OBSERVED DEPENDENT should have a linear relationship? And, therefore, that to test linearity
one need only use the R-square and check whether the regression ANOVA is significant? Any comment is welcome.
Unfortunately, there is not a lot of consistency in how authors talk about "linearity" in linear regression. Some describe a regression model as linear only if the functional relationship between X and Y is linear. For these folks, a model including both X and X-squared as predictors (to account for a U-shaped functional relationship) would likely be described as a polynomial regression.
But others emphasize that OLS linear regression models are "linear in the coefficients". So even if the functional relationship is not linear--e.g., X and X-squared as predictors--it is still properly described as a linear regression model (provided it is linear in the coefficients). The following web page provides an interesting example: https://onlinecourses.science.psu.edu/stat501/node/235

The page title is "Polynomial Regression", but notice what the author says in the second paragraph: "As for a bit of semantics, it was noted at the beginning of the previous course how nonlinear regression (which we discuss later) refers to the nonlinear behavior of the coefficients, which are linear in polynomial regression. Thus, polynomial regression is still considered linear regression!"

Having said all that, I think what you're concerned about is the possibility of non-linear functional relationships. One good way to check for that is by looking at residual plots. When you run your model, save the residuals. (Several types of residuals are available, and you may want to look at more than just the raw residuals; see the Help for details.) Then make scatter-plots with Y = residual and X = fitted value of Y, or perhaps X = an explanatory variable of particular interest. Do a Google search on <residual plots> or <analysis of residuals> etc. to find more info. See also this page by the same author cited above: https://onlinecourses.science.psu.edu/stat501/node/157

HTH.
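To make the residual check concrete, here is a minimal sketch in Python/numpy (not SPSS, and with simulated data chosen for illustration): a straight-line fit to data with a hidden quadratic component leaves a U-shaped pattern in the residuals, and adding X-squared as a predictor--still a model that is linear in the coefficients--removes it.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 200)
# Simulated data: a genuinely curved relationship plus noise (SD = 0.5).
y = 1.0 + 2.0 * x + 1.5 * x**2 + rng.normal(scale=0.5, size=x.size)

# Fit the mis-specified straight-line model y = b0 + b1*x by least squares.
X_lin = np.column_stack([np.ones_like(x), x])
beta_lin, *_ = np.linalg.lstsq(X_lin, y, rcond=None)
resid_lin = y - X_lin @ beta_lin

# In a residual-vs-fitted plot these residuals would show a clear U shape:
# systematically negative in the middle of the x range, positive at the ends.
centre = resid_lin[(x > -1) & (x < 1)]
print(round(centre.mean(), 2))  # clearly negative

# Adding x**2 as a predictor (still a *linear* model in the coefficients)
# removes the pattern and shrinks the residual SD toward the true noise SD.
X_quad = np.column_stack([np.ones_like(x), x, x**2])
beta_quad, *_ = np.linalg.lstsq(X_quad, y, rcond=None)
resid_quad = y - X_quad @ beta_quad
print(round(resid_quad.std(), 2))  # close to 0.5
```

In SPSS the same check is done graphically: save the residuals and fitted values from REGRESSION and scatter-plot one against the other.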
--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/
"When all else fails, RTFM."
I quite agree with Bruce on this.
That a linear model is linear is due to the fact that it is linear in each and every coefficient/model parameter. In other words, if you take the first partial derivative of the DV with respect to each coefficient/model parameter, you obtain a quantity that is constant with respect to that coefficient/model parameter. This holds even when the parameter belongs to a quadratic/cubic term that models a nonlinear relationship between the DV and that predictor. Simply put, a linear model can model a nonlinear relationship between the DV and a predictor while the linearity assumption still holds.

Hongwei Yang, University of Kentucky
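A small numeric illustration of "linear in the coefficients" (Python/numpy, with made-up data): for yhat = b0 + b1*x + b2*x**2, the partial derivative of yhat with respect to each coefficient (namely 1, x, and x**2) does not involve the coefficients themselves, so OLS reduces to solving the linear normal equations in one step, even though the x-y relationship is curved.

```python
import numpy as np

# Hypothetical data, roughly following y = 1 + 2*x**2 with small noise.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 9.2, 19.1, 32.8])

# Design matrix: one column per coefficient; column j is d(yhat)/d(b_j),
# which is constant in the coefficients -- the defining property of a
# linear model.
X = np.column_stack([np.ones_like(x), x, x**2])

# Because the model is linear in b, the normal equations X'X b = X'y give
# the exact least-squares solution directly -- no iteration, unlike true
# nonlinear regression.
b = np.linalg.solve(X.T @ X, X.T @ y)
print(b.round(2))  # close to [1, 0, 2]
```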
In reply to this post by E. Bernardo
I guess that I think of "linearity" in *3* different contexts in Multiple Linear Regression. There is the definitional part in the name, MLR; there is the underlying assumption, which is mis-used in your point (2); and there is the "generally good idea" which is mentioned in your point (1).

As to the name: Bruce does a good job in summarizing. MLR includes polynomial regression.

As to point (2): The first thing to say is that, yes, the overall test is a test for whether there is some linear relationship. However, the discussion of the "linearity assumption" is usually concerned with the absence of *nonlinearity* and the choice of the correct model, after assuming or knowing that there is some relationship to start with.

More on point (2): I would probably derive or discuss this assumption of linearity under the topic of "homogeneity of variance of the regression residuals". Looking at residuals is useful, as Bruce also prescribes. If the observed values do not have a symmetric distribution around the regression line, with similar variances along the whole line, it is at least true that the F-tests are not necessarily good, since the "effective degrees of freedom" is not as large as the test d.f. (You might see this, say, with two log-normal variables when you didn't take their logs: a linear fit with unequal residuals.) If the general fit is not linear, that implies that the residuals do not everywhere have a zero mean. Again, the test is bad; this time, there is sometimes a "fix" available through a simple transformation. Contrary to the "high R-squared" aspect of point (2), nonlinearity of fit is often easiest to spot when there is a very good fit with a very high R-square.

As to point (1): If you are not using polynomial regression with multiple transformations of some independent variables X, then, yes, it is generally nicer (because it will give results that are more robust) if you use a scaling of the X's such that each is linear with the dependent Y. But that can be less important than keeping the model sensible, so someone might insist on keeping terms as "dollars" (for example) instead of transforming.

--
Rich Ulrich
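The log-normal example above can be sketched numerically (Python/numpy, simulated data chosen to match the description, not from any real study): regress one log-normal variable on another without taking logs and the residual spread grows with the fitted value; after a log transform the relationship is linear with roughly constant spread.

```python
import numpy as np

rng = np.random.default_rng(1)
# Two log-normal variables: linear on the log scale, curved and
# heteroscedastic on the raw scale.
log_x = rng.normal(size=1000)
log_y = 0.8 * log_x + rng.normal(scale=0.3, size=1000)
x, y = np.exp(log_x), np.exp(log_y)

def resid_spread_ratio(u, v):
    """Fit v = b0 + b1*u by OLS, then compare the residual SD in the top
    half vs the bottom half of the fitted values (a ratio near 1 suggests
    homogeneous variance)."""
    U = np.column_stack([np.ones_like(u), u])
    b, *_ = np.linalg.lstsq(U, v, rcond=None)
    fitted, resid = U @ b, v - U @ b
    hi = resid[fitted > np.median(fitted)]
    lo = resid[fitted <= np.median(fitted)]
    return hi.std() / lo.std()

print(round(resid_spread_ratio(x, y), 2))          # well above 1: unequal residuals
print(round(resid_spread_ratio(log_x, log_y), 2))  # near 1 after the log "fix"
```

This is the kind of pattern the F-test caveat refers to: on the raw scale the residual variance is far from homogeneous, and the simple log transformation repairs it.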