output from linear regression


output from linear regression

Leon

Hi,

 

I have some trouble understanding the output from a multivariate linear regression.

As predictors there are some 25 variables, and on the other side is the dependent variable (from medical research), which represents 'quality of life' (0 to 100 points; more points means better quality of life after the operation). I chose the backward procedure, so after n steps only some medical predictors with significant influence remained.

Now the problem: some predictors that should obviously have a negative influence on my dependent variable ('quality of life'), for example 'surgery complication' (0: no, 1: yes) or 'tumor length', instead show a significant positive one, e.g. Beta = .253, p < .001 for 'tumor length'. It cannot be right that people with large tumors, or with more surgical complications, have significantly better 'quality of life' after the operation (one would expect Beta = -.253 instead). Other predictors, in contrast, gained negative or positive significant influences that can be well explained logically.

Could it be that the backward procedure involves some internal steps that make the signs of the Beta values in the output (blank for plus, '-' for minus) irrelevant?

How else could this be explained?

 

Thanks,

Leon

 

 

 


Re: output from linear regression

Muir Houston-3

The first problem is your reliance on a stepwise entry method. Stepwise procedures are frowned upon for a number of reasons, chiefly that they are divorced from theory and existing research.

 

Copied from  http://www.stata.com/support/faqs/stat/stepwise.html

Here are some of the problems with stepwise variable selection.

  1. It yields R-squared values that are badly biased to be high.
  2. The F and chi-squared tests quoted next to each variable on the printout do not have the claimed distribution.
  3. The method yields confidence intervals for effects and predicted values that are falsely narrow; see Altman and Andersen (1989).
  4. It yields p-values that do not have the proper meaning, and the proper correction for them is a difficult problem.
  5. It gives biased regression coefficients that need shrinkage (the coefficients for remaining variables are too large; see Tibshirani [1996]).
  6. It has severe problems in the presence of collinearity.
  7. It is based on methods (e.g., F tests for nested models) that were intended to be used to test prespecified hypotheses.
  8. Increasing the sample size does not help very much; see Derksen and Keselman (1992).
  9. It allows us to not think about the problem.
  10. It uses a lot of paper.

“All possible subsets” regression solves none of these problems.
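Point 1 is easy to see with a small simulation (sketched in Python rather than SPSS syntax, purely for illustration): even when the outcome is pure noise, keeping whichever predictors happen to correlate best with it in this particular sample produces a flattering R-squared compared with predictors fixed in advance.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 25
X = rng.normal(size=(n, p))   # 25 predictors of pure noise
y = rng.normal(size=n)        # outcome unrelated to any of them

def r_squared(X, y):
    """R-squared of an OLS fit with an intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    tss = (y - y.mean()) @ (y - y.mean())
    return 1.0 - (resid @ resid) / tss

# crude stand-in for stepwise selection: keep the 5 predictors
# most correlated with y in this particular sample
corrs = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(p)])
keep = np.argsort(corrs)[-5:]
r2_selected = r_squared(X[:, keep], y)

# honest comparison: 5 predictors fixed without peeking at y
r2_fixed = r_squared(X[:, :5], y)
print(r2_selected, r2_fixed)
```

The "selected" R-squared is capitalizing on chance; a cross-validated or pre-specified model would not show it.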

Hope this helps

Muir

 

 

Muir Houston, HNC, BA (Hons), M.Phil., PhD, FHEA

Social Justice, Place and Lifelong Education Research

School of Education

University of Glasgow

0044+141-330-4699

 

R3L+ Project - Adult education in the light of the European Quality Strategy

http://www.learning-regions.net/

 

GINCO Project - Grundtvig International Network of Course Organisers

http://www.ginconet.eu/

 

From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Leon Galushko
Sent: 21 September 2011 13:31
To: [hidden email]
Subject: output from linear regression

 


Factor analysis for Dichotomous items

E. Bernardo
In reply to this post by Leon
I have 12 dichotomous items to be subjected to factor analysis. The goal is to reduce the items to a small, manageable number of factors and then compute scores for each factor. What extraction method is appropriate? I appreciate any help you can extend.
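For context, a quick numpy sketch of the sort of structure I mean (the one-factor setup is invented for illustration; for binary items the usual advice is to factor tetrachoric rather than Pearson/phi correlations, which are attenuated):

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 500, 12

# simulate 12 binary items all driven by a single latent factor
latent = rng.normal(size=n)
load = 0.7
underlying = load * latent[:, None] + np.sqrt(1 - load**2) * rng.normal(size=(n, k))
items = (underlying > 0).astype(float)   # dichotomize at the threshold 0

# Pearson correlations of binary items are phi coefficients; these
# understate the associations relative to tetrachoric correlations
R = np.corrcoef(items, rowvar=False)

eigvals = np.linalg.eigvalsh(R)[::-1]    # descending eigenvalues
print(eigvals[:3])   # one dominant eigenvalue suggests one factor
```

Even on the attenuated phi matrix, a single strong factor shows up in the eigenvalues here; the attenuation mainly biases loadings and can produce spurious "difficulty" factors.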

Thanks.

Eins

Re: output from linear regression

Maguin, Eugene
In reply to this post by Leon
Leon,
 
Back up a bit. How do the correlations look? Are the signs what you expect them to be? Are the magnitudes what you might expect given other people's work?
 
Backward deletion and forward entry are not well thought of because entry and removal are driven by a combination of the true population correlations, sampling error, and the variables remaining to be entered. In a word: a-theoretical. But you chose it, so use the results. Is the sign of the target variables always in the unexpected direction, or does it change direction as variables are taken out? What do you observe? By the way, how about collinearity?
 
It may be that you have suppression issues. Here are some articles to look at from Cam McIntosh on the semnet list.
 
Paulhus, D.L, Robins, R.W., Trzesniewski, K.H., & Tracy, J.L. (2004). Two replicable suppressor situations in personality research. Multivariate Behavioral Research, 39, 303-328.

Nickerson, C. (2008). Mutual suppression: comment on Paulhus et al. (2004). Multivariate Behavioral Research, 43, 556-563.

Shieh, G. (2006). Suppression situations in multiple linear regression. Educational and Psychological Measurement, 66(3), 435-447.

Maassen, G.H., & Bakker, A.B. (2001). Suppressor variables in path models: definitions and interpretations. Sociological Methods & Research, 30(2), 241-270.

MacKinnon, D. P., Krull, J. L., & Lockwood, C. M. (2000). Equivalence of the mediation, confounding and suppression effect. Prevention Science, 1, 173-181.
 
Smith, R.L., Ager, J.W. (Jr)., & Williams, D.L. (1992). Suppressor variables in multiple regression/correlation. Educational and Psychological Measurement, 52(1), 17-29.
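The sign flip being described is easy to reproduce with made-up numbers (variable names and effect sizes invented for illustration): a "complications" variable correlates negatively with QOL on its own, yet once a strongly related severity measure that carries the real negative effect is in the model, the complications coefficient turns positive.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2000

# hypothetical variables: x1 = 'complications', x2 = a severity measure
# strongly correlated with x1 that carries the real negative effect
x1 = rng.normal(size=n)
x2 = x1 + 0.5 * rng.normal(size=n)
qol = -2.0 * x2 + 0.5 * x1 + rng.normal(size=n)

# zero-order correlation of complications with QOL: negative, as expected
r_x1 = np.corrcoef(x1, qol)[0, 1]

# multiple regression: the coefficient on x1 comes out positive
X = np.column_stack([np.ones(n), x1, x2])
beta, *_ = np.linalg.lstsq(X, qol, rcond=None)
print(r_x1, beta[1], beta[2])
```

This is exactly why checking the zero-order correlations first is worthwhile: if they have the expected signs, the surprise lies in what the other predictors are partialling out.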

Gene Maguin



Re: output from linear regression

Rich Ulrich
In reply to this post by Leon
I've seen excellent advice in the two Replies so far.
Please, do read some of the literature on "suppressor variables".

In addition -- For the data on hand, consider which variables might
be acting to "suppress" the contribution of "Complications" (in this
instance, to actually reverse it).  This can also be considered under
"confounding".  If you find the source of confounding, you should
next try to re-score your predictor variables so that you do directly
measure the predictive influence of the logically-combined variables.

For instance -- If you were also using something like "Length
of Hospitalization" as a predictor, it could be that the people
who have the worst QOL  on followup are the ones who had a
long hospital stay and did *not*  have Complications that readily
explained it.  Therefore, Complications enters with a minus sign.
By logical analysis, you might be able to break that Length of stay
into parts:  (a) Expected (minimum), (b) Extra days, due to surgical
complications, and (c) Extra, due to non-surgical complications. 

Also:  The effect of (c) might be non-linear, such that having one,
two or three days could be increasingly bad, but having seven days
is not much worse than having three.  - This sort of measurement
non-linearity is another source of apparent confounding, which should
be covered in some of the literature.
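That measurement point can be sketched numerically (the variable names and the log-type recoding are my own illustration, not a recommendation for these particular data): if extra hospital days hurt QOL with diminishing impact, a plateau-aware recoding of days fits better than entering days linearly.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1000
days = rng.integers(0, 8, size=n).astype(float)   # extra hospital days, 0..7

# true effect plateaus: early days hurt QOL a lot, later days add little
qol = -5.0 * np.log1p(days) + rng.normal(size=n)

def r_squared(x, y):
    """R-squared of a simple OLS fit with an intercept."""
    X = np.column_stack([np.ones(len(y)), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))

r2_linear = r_squared(days, qol)              # days entered linearly
r2_recoded = r_squared(np.log1p(days), qol)   # plateau-aware recoding
print(r2_linear, r2_recoded)
```

The linear version leaves systematic lack of fit behind, and that residual structure is one route by which other predictors can pick up apparently confounded signs.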

--
Rich Ulrich



Re: output from linear regression

Bruce Weaver
Administrator
In reply to this post by Leon
In addition to what others have said about stepwise selection methods, take a look at the Vanderbilt Biostats Dept Manuscript Checklist, available here:

   http://biostat.mc.vanderbilt.edu/wiki/Main/ManuscriptChecklist

Notice especially the "Multivariable Modeling Problems" section.  Mike Babyak's article on overfitting is also very good (and readable).  Watch for what he says about phantom degrees of freedom.

   http://os1.amc.nl/mediawiki/images/Babyak_-_overfitting.pdf

Regarding signs on coefficients not being as expected, I was reminded of Chapter 13 (Woes of Regression Coefficients) in the classic regression book "Data analysis and regression: A second course in statistics", by Mosteller & Tukey (1977).  (It is known informally as "the green book".)  Here is an excerpt (starting on p. 301):

--- start of excerpt ---
...in real-world problems, we do ransack the variables, keeping some and throwing others away. In the end we will have a strong urge to interpret the chosen coefficients in a physical manner, such as "If we
change x1 by a given amount, we will change y by a certain amount, and therefore public policy should be...". Our [earlier] example shows that even in a deterministic system [their example had no error],
different subsets of the variables used for regression can give substantially different coefficients for the same variable. Indeed, even the sign can be reversed from one set to another. We need then to speak of the coefficient of x1

* when x2 and x3 are also offered,
* when x2 is also offered,
* when x3 is also offered,
* when nothing else is offered,

and appreciate that these four ordinarily give substantially different results.

--- end of excerpt ---
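The excerpt's four cases can be reproduced numerically. In a toy deterministic system of my own construction (in the book's spirit; no error term at all), the coefficient on x1 is exactly -1 whenever x2 is offered, and roughly +1 whenever it is not:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 500
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(size=n)     # x2 overlaps heavily with x1
x3 = rng.normal(size=n)          # an unrelated extra predictor
y = 2.0 * x2 - x1                # deterministic: y is an exact function

def coef_on_x1(*others):
    """Coefficient on x1 when the listed variables are also offered."""
    X = np.column_stack([np.ones(n), x1] + list(others))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]

# the four cases from the excerpt
print(coef_on_x1(x2, x3))  # when x2 and x3 are also offered
print(coef_on_x1(x2))      # when x2 is also offered
print(coef_on_x1(x3))      # when x3 is also offered
print(coef_on_x1())        # when nothing else is offered
```

Same variable, same data, opposite signs, and no sampling error anywhere: the sign belongs to the model, not to x1.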

HTH.


--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING: 
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).

Re: output from linear regression

Swank, Paul R
I would like to emphasize something that Bruce included below. When you have multiple independent variables, it is a multivariable analysis, not multivariate. Multivariate means multiple dependent variables.

Dr. Paul R. Swank,
Children's Learning Institute
Professor, Department of Pediatrics, Medical School
Adjunct Professor, School of Public Health
University of Texas Health Science Center-Houston



Re: output from linear regression

Art Kendall
In reply to this post by Leon
Others have mentioned why stepwise methods are very suspect, so I won't go into that except to say that I concur: they are not advisable.

However, look up the help on FACTOR to see whether some of your variables are redundant measures of something they have in common.

If you cannot come up with factors, and if you have hundreds of cases, you could stick with reporting the zero-order (simple bivariate) correlations and strongly warn that the predictors are not independent and may relate in complex ways.


Art Kendall
Social Research Consultants




=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD