SPSSX Discussion

Logistic Regression vs Linear Regression

jimjohn

Logistic Regression vs Linear Regression

I just want to confirm: suppose I have many different independent variables and I want to find the best regression model from them. Once I find the variables that make the best multiple linear regression model, would those same variables make the best logistic regression model? Or could there be a case where a different group of variables makes the best logistic regression model? Thanks.

Sara House

Re: Logistic Regression vs Linear Regression

jimjohn says:

I just want to confirm: If I have many different independent variables, and i
want to find the best regression model from these variables. Once I find the
variables that make the best multiple linear regression model, would those
same variables make the best logistic regression model? Or, could there be a
case where a different group of variables make the best logistic regression
model. Thanks.

It depends on what your outcome variables are. If your continuous outcome variable (for the linear reg.) and categorical outcome variable (for the log. reg.) are highly related, then you could use the same predictors, though it seems that your outcomes should be tapping different concepts so you may want to use two different sets of variables. It is true that either regression will work with the same types of predictors (so both regressions can accept dummy variables, interaction terms and so on), but what variables work best in a model should be derived from theory and empirical evidence. What concepts are your outcomes measuring and what, according to theory and past research, predicts these outcomes?

Sara

Sara M. House, M.A.
Adjunct Faculty
Loyola University Chicago, Psychology Department
Email: [hidden email]
Teaching: Research Methods, Psychology & Law

AND

Data Analyst
Chicago Public Schools, Department of Program Evaluation
Email: [hidden email]

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

jimjohn

Re: Logistic Regression vs Linear Regression

Thanks Sara. To answer your question, the variable I am trying to predict is the funding ratio. In mortgage commitments, the customer is locked in to a rate up to 4 months in advance. Once the commitment period is up, they have the option to fund the mortgage or to cancel. The funding ratio is the percentage of customers who fund their mortgage. This is the first time this research is being conducted, but I have been given many different variables that should theoretically have some effect on the funding ratio (e.g. expected future interest rates, percentage of customers switching over to variable mortgages, difference between two types of rates, etc.).

I have already tried this with multiple linear regression and tried to come up with good models. However, since the ratio is only between 0 and 1, as suggested here, I'm going to try transforming my dependent variable to ln(p / (1 - p)) and then run linear regression on that. Then I would compare the two models and see which one provides a better fit. For the first branch, the same variables that were predictors in the ordinary linear regression model are the predictors in the transformed regression model. I have to do this analysis for many different branches, regions, etc., so I'm wondering whether I need to find the best model again in each case for the transformed variable, or whether I can just keep the same variables and use the new regression equation. Thanks!
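For readers following along, the transformation mentioned in this post can be sketched in a few lines of Python. This is only an illustration, not code from the thread: the function names, the clamping constant `eps`, and the sample funding ratios are all made up for the example.

```python
import math

def logit(p, eps=1e-6):
    """Map a proportion in (0, 1) to the real line: ln(p / (1 - p))."""
    # The logit is undefined at exactly 0 and 1, so clamp slightly inside.
    # eps is an arbitrary illustrative choice, not a recommendation.
    p = min(max(p, eps), 1 - eps)
    return math.log(p / (1 - p))

def inv_logit(z):
    """Map a value on the logit scale back to a proportion in (0, 1)."""
    # Two branches avoid overflow in exp() for large |z|.
    if z >= 0:
        return 1 / (1 + math.exp(-z))
    e = math.exp(z)
    return e / (1 + e)

# Hypothetical funding ratios for a few periods (made-up numbers).
funding_ratios = [0.62, 0.75, 0.81, 0.90]

# The transformed values are what the linear regression would be fit to.
transformed = [logit(p) for p in funding_ratios]

# Predictions on the logit scale are mapped back with inv_logit.
recovered = [inv_logit(z) for z in transformed]
```

Any fitted value on the transformed scale, however large or small, maps back to a proportion strictly between 0 and 1, which is the motivation given in the post.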

Richard Ristow

Re: Logistic Regression vs Linear Regression

At 02:23 PM 7/2/2008, jimjohn wrote:

>If I have many different independent variables, and i want to find
>the best regression model from these variables. Once I find the
>variables that make the best multiple linear regression model, would
>those same variables make the best logistic regression model?

Absolutely no reason why the models would have the same 'best set'.
Why would you think so? Is the dependent variable for the logistic
regression any close relation to the DV for the linear regression?

jimjohn

Re: Logistic Regression vs Linear Regression

Thanks Richard. I was just thinking that since my new dependent variable is just a transformation of the old one, that the same variables that affect the old one would affect the transformed one. But I guess I should run the regressions again, just in case different variables affect my transformed dependent variable better?

Richard Ristow

Re: Logistic Regression vs Linear Regression

At 12:57 PM 7/3/2008, jimjohn wrote:

>I was just thinking that since my new [dichotomous?] dependent
>variable is just a transformation of the old one, that the same
>variables that affect the old one would affect the transformed one.

That opens another area for discussion: when it is, and when it is
not, advisable to dichotomize (or categorize) a continuous variable.

It's been discussed on this and other lists. I'd like to invite list
members to respond with general advice, or particular questions. I'm
not going to; I'm far from the best person to start this discussion.

>I guess I should run the regressions again, just in case different
>variables affect my transformed dependent variable better?

I would. Among other things, non-linearities in the effects could
mean the dichotomized variable is affected differently -- and if you
don't think there may be non-linearities, the logistic regression
doesn't make much sense.

By the way, you've written as if you're working with a large set of
independent variables, with a good deal of multi-collinearity. Good
luck. You'll get sound advice on this list (and elsewhere) against
selecting a subset of independent variables based on experience with
the data. Collinearity makes the process not only dubious, but
unstable. I think you've already been advised about factor analysis
and other prior dimension-reduction techniques.

Good luck to you!
Richard

Bob Schacht-3

Re: Logistic Regression vs Linear Regression

At 08:35 AM 7/3/2008, Richard Ristow wrote:

>At 12:57 PM 7/3/2008, jimjohn wrote:
>
>>I was just thinking that since my new [dichotomous?] dependent
>>variable is just a transformation of the old one, that the same
>>variables that affect the old one would affect the transformed one.
>
>That opens another area for discussion: when it is, and when it is
>not, advisable to dichotomize (or categorize) a continuous variable.
>
>It's been discussed on this and other lists. I'd like to invite list
>members to respond with general advice, or particular questions. I'm
>not going to; I'm far from the best person to start this discussion.

Richard,
Your question is multiplied because the original question is not simply
about *a variable* but about *a relationship between variables*.
In that context, it matters a great deal whether one has in mind to alter
both variables, or only one to match the other.

If the variables are of a different type to begin with, it might be a
better idea to switch to a mode of analysis, such as analysis of variance,
which is constructed for mixed variables in the first place.

>>I guess I should run the regressions again, just in case different
>>variables affect my transformed dependent variable better?
>
>I would. Among other things, non-linearities in the effects could
>mean the dichotomized variable is affected differently -- and if you
>don't think there may be non-linearities, the logistic regression
>doesn't make much sense.

My first caveat is that reducing a variable's measurement level (e.g. from
ratio to interval or categorical) always involves throwing information
away, and that sounds bad.

On the other hand, if the fundamental reason for the variable reduction is
that one had been attempting to do a kind of analysis that assumes variable
characteristics that were not valid, then what one is doing, in effect, is
switching from a powerful analysis based on inappropriate assumptions, to a
less powerful analysis based on appropriate assumptions. This seems to be a
good reason for doing so.

Those are my initial thoughts to your excellent questions.

Bob Schacht

Robert M. Schacht, Ph.D. <[hidden email]>
Pacific Basin Rehabilitation Research & Training Center
1268 Young Street, Suite #204
Research Center, University of Hawaii
Honolulu, HI 96814

jimjohn

Re: Logistic Regression vs Linear Regression

Thanks guys for all the help! Although my dependent variable is a ratio between 0 and 1, some of my independent variables are also ratios/percentages, while others are continuous variables that can take on any value. I planned to do a logit transformation on my dependent variable and then run linear regression on that, because I thought that otherwise some combinations of my independent variables could result in a predicted value for my DV that is outside the interval (0, 1). Also, I heard that without the logit transformation, changes in my independent variables could result in changes to my predicted DV that are larger or smaller than they should be.
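The out-of-range concern can be illustrated with a small sketch. Everything here is an assumption made for the example: the data are invented, there is a single predictor, and `fit_line`, `predict_raw` and `predict_logit` are hypothetical helper names rather than anything from SPSS.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares with one predictor; returns (intercept, slope)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

# Made-up data: x might be a rate difference, y a funding ratio.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.95, 0.90, 0.75, 0.50, 0.30]

# Model 1: regress the raw ratio on x.
a_raw, b_raw = fit_line(xs, ys)

# Model 2: regress the logit of the ratio on x.
a_log, b_log = fit_line(xs, [math.log(y / (1 - y)) for y in ys])

def predict_raw(x):
    """Prediction from the untransformed model; not bounded."""
    return a_raw + b_raw * x

def predict_logit(x):
    """Prediction from the logit model, mapped back to a proportion."""
    z = a_log + b_log * x
    return 1 / (1 + math.exp(-z))

# With these numbers the raw model predicts above 1 at x = 0 and
# below 0 at x = 8, while the back-transformed model stays in (0, 1).
for x in (0.0, 8.0):
    print(x, predict_raw(x), predict_logit(x))
```

The same contrast holds whatever the scale of the predictor is, since the bound on the prediction comes from the back-transformation of the DV and not from the range of x.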

Just wondering: there are some cases where the only strong predictors are ratios between 0 and 1, and I guess in those cases I do not need to apply a logit transformation (since the range of my IVs matches the range of my DV)? Do you guys agree with this?

Also, I am seeing lots of multicollinearity, but since I get high adjusted R^2 values with only 2-3 uncorrelated variables, I am probably going to leave out a lot of the other highly correlated variables. Any suggestions or thoughts?
Thanks!
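One common way to put a number on the multicollinearity mentioned above is the variance inflation factor (VIF). Nobody in the thread proposed it, so treat the following as a swapped-in illustration. It covers only the two-predictor case, where the VIF reduces to 1 / (1 - r^2), and the predictor values and function names are made up.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def vif_two_predictors(xs, ys):
    """VIF when a model has exactly two predictors: 1 / (1 - r^2)."""
    r = pearson_r(xs, ys)
    return 1 / (1 - r ** 2)

# Hypothetical predictors: x2 nearly duplicates x1, x3 is unrelated to x1.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.1, 3.9, 6.2, 8.1, 9.8]
x3 = [2.0, 1.0, 3.0, 1.0, 2.0]

vif_12 = vif_two_predictors(x1, x2)  # very large: x1 and x2 carry the same information
vif_13 = vif_two_predictors(x1, x3)  # close to 1: no inflation
```

With more than two predictors the VIF for each variable comes from regressing it on all the others, which is best left to the statistics package.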
