output from linear regression


output from linear regression

Leon

Hi,

 

I have some trouble understanding the output from a multivariate linear regression.

As predictors there are some 25 variables, and on the other side is the dependent variable (from medical research), which represents 'quality of life' (0 to 100 points; more points means better quality of life after the operation). I chose the backward procedure, so after n steps only some medical predictors with significant influence remained.

Now the problem: some predictors that should obviously have a negative influence on my dependent variable ('quality of life'), for example 'surgery complication' (0: no, 1: yes) or 'tumor length', instead show a significant positive one, e.g. Beta = .253, p < .001 for 'tumor length'. It cannot be right that people with large tumors, or with more surgical complications, have significantly better 'quality of life' after the operation (one would expect Beta = -.253 instead). Other predictors, in contrast, gained negative or positive significant influences that can be well explained logically.

Could it be that the backward procedure involves some internal steps that make the signs of the Beta values in the output (blank for plus, '-' for minus) irrelevant?

How else could this be explained?

 

Thanks,

Leon

 

 

 


Re: output from linear regression

Muir Houston-3

The first problem is your reliance on a stepwise entry method. Stepwise procedures are frowned upon for a number of reasons, chiefly that they are divorced from theory and existing research.

 

Copied from  http://www.stata.com/support/faqs/stat/stepwise.html

Here are some of the problems with stepwise variable selection.

  1. It yields R-squared values that are badly biased to be high.
  2. The F and chi-squared tests quoted next to each variable on the printout do not have the claimed distribution.
  3. The method yields confidence intervals for effects and predicted values that are falsely narrow; see Altman and Andersen (1989).
  4. It yields p-values that do not have the proper meaning, and the proper correction for them is a difficult problem.
  5. It gives biased regression coefficients that need shrinkage (the coefficients for remaining variables are too large; see Tibshirani [1996]).
  6. It has severe problems in the presence of collinearity.
  7. It is based on methods (e.g., F tests for nested models) that were intended to be used to test prespecified hypotheses.
  8. Increasing the sample size does not help very much; see Derksen and Keselman (1992).
  9. It allows us to not think about the problem.
  10. It uses a lot of paper.

“All possible subsets” regression solves none of these problems.
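Point 1 is easy to see with a small simulation (sketched in Python rather than SPSS syntax, purely for illustration): even when the outcome is pure noise, keeping whichever predictors happen to correlate best with it in this particular sample produces a flattering R-squared compared with predictors fixed in advance.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 25
X = rng.normal(size=(n, p))   # 25 predictors of pure noise
y = rng.normal(size=n)        # outcome unrelated to any of them

def r_squared(X, y):
    """R-squared of an OLS fit with an intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    tss = (y - y.mean()) @ (y - y.mean())
    return 1.0 - (resid @ resid) / tss

# crude stand-in for stepwise selection: keep the 5 predictors
# most correlated with y in this particular sample
corrs = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(p)])
keep = np.argsort(corrs)[-5:]
r2_selected = r_squared(X[:, keep], y)

# honest comparison: 5 predictors fixed without peeking at y
r2_fixed = r_squared(X[:, :5], y)
print(r2_selected, r2_fixed)
```

The "selected" R-squared is capitalizing on chance; a cross-validated or pre-specified model would not show it.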

Hope this helps

Muir

 

 

Muir Houston, HNC, BA (Hons), M.Phil., PhD, FHEA

Social Justice, Place and Lifelong Education Research

School of Education

University of Glasgow

0044+141-330-4699

 

R3L+ Project - Adult education in the light of the European Quality Strategy

http://www.learning-regions.net/

 

GINCO Project - Grundtvig International Network of Course Organisers

http://www.ginconet.eu/

 

From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Leon Galushko
Sent: 21 September 2011 13:31
To: [hidden email]
Subject: output from linear regression

 


Factor analysis for Dichotomous items

E. Bernardo
In reply to this post by Leon
I have 12 dichotomous items to be subjected to factor analysis. The goal is to reduce the items to a small, manageable number of factors and then compute scores for each factor. What extraction method is appropriate? I appreciate any help you can extend.
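For context, a quick numpy sketch of the sort of structure I mean (the one-factor setup is invented for illustration; for binary items the usual advice is to factor tetrachoric rather than Pearson/phi correlations, which are attenuated):

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 500, 12

# simulate 12 binary items all driven by a single latent factor
latent = rng.normal(size=n)
load = 0.7
underlying = load * latent[:, None] + np.sqrt(1 - load**2) * rng.normal(size=(n, k))
items = (underlying > 0).astype(float)   # dichotomize at the threshold 0

# Pearson correlations of binary items are phi coefficients; these
# understate the associations relative to tetrachoric correlations
R = np.corrcoef(items, rowvar=False)

eigvals = np.linalg.eigvalsh(R)[::-1]    # descending eigenvalues
print(eigvals[:3])   # one dominant eigenvalue suggests one factor
```

Even on the attenuated phi matrix, a single strong factor shows up in the eigenvalues here; the attenuation mainly biases loadings and can produce spurious "difficulty" factors.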

Thanks.

Eins

Re: output from linear regression

Maguin, Eugene
In reply to this post by Leon
Leon,
 
Back up a bit. How do the correlations look? Are the signs what you expect them to be? Are the magnitudes what you might expect given other people's work?
 
Backward deletion and forward entry are not well thought of because entry and removal are driven by a combination of the true population correlations, sampling error, and the variables remaining to be entered. In a word: a-theoretical. But you chose it, so use the results. Is the sign of the target variables always in the unexpected direction, or does it change direction as variables are taken out? What do you observe? By the way, how about collinearity?
 
It may be that you have suppression issues. Here are some articles to look at from Cam McIntosh on the semnet list.
 
Paulhus, D.L, Robins, R.W., Trzesniewski, K.H., & Tracy, J.L. (2004). Two replicable suppressor situations in personality research. Multivariate Behavioral Research, 39, 303-328.

Nickerson, C. (2008). Mutual suppression: comment on Paulhus et al. (2004). Multivariate Behavioral Research, 43, 556-563.

Shieh, G. (2006). Suppression situations in multiple linear regression. Educational and Psychological Measurement, 66(3), 435-447.

Maassen, G.H., & Bakker, A.B. (2001). Suppressor variables in path models: definitions and interpretations. Sociological Methods & Research, 30(2), 241-270.

MacKinnon, D. P., Krull, J. L., & Lockwood, C. M. (2000). Equivalence of the mediation, confounding and suppression effect. Prevention Science, 1, 173-181.
 
Smith, R.L., Ager, J.W. (Jr)., & Williams, D.L. (1992). Suppressor variables in multiple regression/correlation. Educational and Psychological Measurement, 52(1), 17-29.
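The sign flip being described is easy to reproduce with made-up numbers (variable names and effect sizes invented for illustration): a "complications" variable correlates negatively with QOL on its own, yet once a strongly related severity measure that carries the real negative effect is in the model, the complications coefficient turns positive.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2000

# hypothetical variables: x1 = 'complications', x2 = a severity measure
# strongly correlated with x1 that carries the real negative effect
x1 = rng.normal(size=n)
x2 = x1 + 0.5 * rng.normal(size=n)
qol = -2.0 * x2 + 0.5 * x1 + rng.normal(size=n)

# zero-order correlation of complications with QOL: negative, as expected
r_x1 = np.corrcoef(x1, qol)[0, 1]

# multiple regression: the coefficient on x1 comes out positive
X = np.column_stack([np.ones(n), x1, x2])
beta, *_ = np.linalg.lstsq(X, qol, rcond=None)
print(r_x1, beta[1], beta[2])
```

This is exactly why checking the zero-order correlations first is worthwhile: if they have the expected signs, the surprise lies in what the other predictors are partialling out.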

Gene Maguin



Re: output from linear regression

Rich Ulrich
In reply to this post by Leon
I've seen excellent advice in the two Replies so far.
Please, do read some of the literature on "suppressor variables".

In addition -- For the data on hand, consider which variables might
be acting to "suppress" the contribution of "Complications" (in this
instance, to actually reverse it).  This can also be considered under
"confounding".  If you find the source of confounding, you should
next try to re-score your predictor variables so that you do directly
measure the predictive influence of the logically-combined variables.

For instance -- If you were also using something like "Length
of Hospitalization" as a predictor, it could be that the people
who have the worst QOL  on followup are the ones who had a
long hospital stay and did *not*  have Complications that readily
explained it.  Therefore, Complications enters with a minus sign.
By logical analysis, you might be able to break that Length of stay
into parts:  (a) Expected (minimum), (b) Extra days, due to surgical
complications, and (c) Extra, due to non-surgical complications. 

Also:  The effect of (c) might be non-linear, such that having one,
two or three days could be increasingly bad, but having seven days
is not much worse than having three.  - This sort of measurement
non-linearity is another source of apparent confounding, which should
be covered in some of the literature.
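That measurement point can be sketched numerically (the variable names and the log-type recoding are my own illustration, not a recommendation for these particular data): if extra hospital days hurt QOL with diminishing impact, a plateau-aware recoding of days fits better than entering days linearly.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1000
days = rng.integers(0, 8, size=n).astype(float)   # extra hospital days, 0..7

# true effect plateaus: early days hurt QOL a lot, later days add little
qol = -5.0 * np.log1p(days) + rng.normal(size=n)

def r_squared(x, y):
    """R-squared of a simple OLS fit with an intercept."""
    X = np.column_stack([np.ones(len(y)), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))

r2_linear = r_squared(days, qol)              # days entered linearly
r2_recoded = r_squared(np.log1p(days), qol)   # plateau-aware recoding
print(r2_linear, r2_recoded)
```

The linear version leaves systematic lack of fit behind, and that residual structure is one route by which other predictors can pick up apparently confounded signs.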

--
Rich Ulrich



Re: output from linear regression

Bruce Weaver
Administrator
In reply to this post by Leon
In addition to what others have said about stepwise selection methods, take a look at the Vanderbilt Biostats Dept Manuscript Checklist, available here:

   http://biostat.mc.vanderbilt.edu/wiki/Main/ManuscriptChecklist

Notice especially the "Multivariable Modeling Problems" section.  Mike Babyak's article on overfitting is also very good (and readable).  Watch for what he says about phantom degrees of freedom.

   http://os1.amc.nl/mediawiki/images/Babyak_-_overfitting.pdf

Regarding signs on coefficients not being as expected, I was reminded of Chapter 13 (Woes of Regression Coefficients) in the classic regression book "Data analysis and regression: A second course in statistics", by Mosteller & Tukey (1977).  (It is known informally as "the green book".)  Here is an excerpt (starting on p. 301):

--- start of excerpt ---
...in real-world problems, we do ransack the variables, keeping some and throwing others away. In the end we will have a strong urge to interpret the chosen coefficients in a physical manner, such as "If we
change x1 by a given amount, we will change y by a certain amount, and therefore public policy should be...". Our [earlier] example shows that even in a deterministic system [their example had no error],
different subsets of the variables used for regression can give substantially different coefficients for the same variable. Indeed, even the sign can be reversed from one set to another. We need then to speak of the coefficient of x1

* when x2 and x3 are also offered,
* when x2 is also offered,
* when x3 is also offered,
* when nothing else is offered,

and appreciate that these four ordinarily give substantially different results.

--- end of excerpt ---
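The excerpt's four cases can be reproduced numerically. In a toy deterministic system of my own construction (in the book's spirit; no error term at all), the coefficient on x1 is exactly -1 whenever x2 is offered, and roughly +1 whenever it is not:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 500
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(size=n)     # x2 overlaps heavily with x1
x3 = rng.normal(size=n)          # an unrelated extra predictor
y = 2.0 * x2 - x1                # deterministic: y is an exact function

def coef_on_x1(*others):
    """Coefficient on x1 when the listed variables are also offered."""
    X = np.column_stack([np.ones(n), x1] + list(others))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]

# the four cases from the excerpt
print(coef_on_x1(x2, x3))  # when x2 and x3 are also offered
print(coef_on_x1(x2))      # when x2 is also offered
print(coef_on_x1(x3))      # when x3 is also offered
print(coef_on_x1())        # when nothing else is offered
```

Same variable, same data, opposite signs, and no sampling error anywhere: the sign belongs to the model, not to x1.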

HTH.


--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING: 
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).

Re: output from linear regression

Swank, Paul R
I would like to emphasize something that Bruce included below. When you have multiple independent variables, it is a multivariable analysis, not multivariate. Multivariate means multiple dependent variables.

Dr. Paul R. Swank,
Children's Learning Institute
Professor, Department of Pediatrics, Medical School
Adjunct Professor, School of Public Health
University of Texas Health Science Center-Houston



Re: output from linear regression

Art Kendall
In reply to this post by Leon
Others have mentioned why stepwise methods are very suspect, so I won't go into that except to say that I concur: they are not advisable.

However, look up the help on FACTOR to see whether some of your variables are redundant measures of something they have in common.

If you cannot come up with factors, and if you have hundreds of cases, you could stick with reporting the zero-order (simple bivariate) correlations and strongly warn that the predictors are not independent and may relate in complex ways.


Art Kendall
Social Research Consultants




=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD