Multiple Regression with a dichotomous DV


rob-3
Just need some clarification here.  Is it appropriate to use a dichotomous
dependent variable (e.g., recidivist vs. non-recidivist) in a multiple
regression model (OLS)?  Why or why not?  Here is another issue to add to the
mix.  With the understanding that a dichotomous variable has very little
variance, the actual data for this dichotomous dependent variable are as
follows: 449 cases of non-recidivists (98%) and 11 cases of recidivists
(2%), for a total sample size of 460 inmates.  Thanks.
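
To put a number on the "very little variance" remark: a 0/1 variable with a proportion p of ones has variance p(1 - p), which is largest (0.25) when p = 0.5. The lines below are only a minimal sketch using the counts given in the post; the variable names are illustrative.

```python
# Counts taken from the post: 11 recidivists, 449 non-recidivists.
n_events = 11
n_nonevents = 449
n_total = n_events + n_nonevents

p = n_events / n_total              # observed base rate of recidivism
variance = p * (1 - p)              # variance of a 0/1 variable with mean p
max_variance = 0.5 * (1 - 0.5)      # largest possible value, reached at p = 0.5

print(f"base rate        = {p:.4f}")
print(f"variance         = {variance:.4f}")
print(f"share of maximum = {variance / max_variance:.1%}")
```

With these counts the variance comes out at about 0.023, which is under a tenth of the 0.25 maximum.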

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Re: Multiple Regression with a dichotomous DV

Matthias Spörrle-2
Hi Rob,

No, this is not appropriate. For an explanation, see:
Cortina, J. M. (2002). Big things have small beginnings: An assortment of “minor” methodological misunderstandings. Journal of Management, 28, 339-362. (specifically: pp. 347ff)

HTH
M



Re: Multiple Regression with a dichotomous DV

Bruce Weaver
Administrator
I agree that usually, some other form of analysis would be preferable--e.g., logistic regression.  But I'm not sure I'd issue an across the board condemnation for all situations.  I believe one of my colleagues (an epidemiologist) sometimes uses linear regression with a dichotomous outcome when he wants to model the risk difference (rather than the odds ratio, which is what logistic regression would give him).  I think I have an article on this stashed away somewhere--will look for it, and re-post if I find it.

Another issue is the proportion of observations falling into the two categories of the outcome variable.  If the proportions are not too close to 0 and 1, the model might not be too bad.

And finally, I believe that linear regression with a dichotomous outcome is analogous (if not equivalent) to a two-group discriminant function analysis.  I'm not an expert on multivariate stuff, but maybe one of the multivariate experts in the group can comment on that.
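
As a rough illustration of the risk-difference point above: with a single 0/1 predictor, the OLS slope of a 0/1 outcome is exactly the difference between the two group proportions, while a logistic regression on the same data would report the odds ratio. The sketch below assumes NumPy and uses a made-up 2x2 table; none of the numbers come from the thread.

```python
import numpy as np

# Hypothetical 2x2 data (counts invented for illustration):
#               y=1   y=0
# exposed        30    70
# unexposed      10    90
x = np.array([1] * 100 + [0] * 100, dtype=float)
y = np.array([1] * 30 + [0] * 70 + [1] * 10 + [0] * 90, dtype=float)

# OLS fit of y on an intercept and x (the linear probability model).
X = np.column_stack([np.ones_like(x), x])
intercept, slope = np.linalg.lstsq(X, y, rcond=None)[0]

# Risks computed directly from the table.
risk_exposed = y[x == 1].mean()
risk_unexposed = y[x == 0].mean()
risk_difference = risk_exposed - risk_unexposed

# Odds ratio: what a logistic regression on the same table would
# report as exp(coefficient of x).
odds_ratio = (risk_exposed / (1 - risk_exposed)) / (
    risk_unexposed / (1 - risk_unexposed)
)

print(f"OLS slope       = {slope:.3f}")
print(f"risk difference = {risk_difference:.3f}")
print(f"odds ratio      = {odds_ratio:.3f}")
```

The slope and the risk difference agree, and the intercept equals the risk in the unexposed group. With continuous or multiple predictors the slope is no longer a simple difference of two proportions, so this is only the simplest case.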

Cheers,
Bruce


--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING: 
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).

Re: Multiple Regression with a dichotomous DV

Mike
A few points:

(1)  For an overview of these issues, see Cohen, Cohen, West & Aiken's
(2003) Applied Multiple Regression/Correlation Analysis for the Behavioral
Sciences, pages 481-490 or so.  This book is available on books.google.com,
though pages 485-486 are "hidden".  Here's a link to the relevant section:
http://tinyurl.com/cohencohenwestaiken

(2)  If Y is a dichotomy (0,1) and X is a continuous variable, you can do a
simple regression of Y on X, and the correlation you get is the point-biserial
version of the Pearson r.  The Ordinary Least Squares (OLS) fit is valid, but
not all of the usual statistics are (see Cohen et al.), because the residuals
are no longer normally distributed.

(3)  If Y is a dichotomy (0,1) and you have several X variables that
are dichotomous and/or continuous, you can perform an OLS regression
as in the simple regression case.  The multiple R is now the point-biserial
version of R (one way of thinking of R is that it is the Pearson r between
the actual values of Y and the predicted values of Y, or Y-hat).  Again,
some of the statistics will be off.  Cohen et al. describe the linear
probability model that represents this type of analysis (there's also a Sage
"green book" on the topic).

(4)  The situation in (3) is equivalent to a linear discriminant analysis, but
the point of the discriminant analysis is to find the weights/regression
coefficients (i.e., identifying the predictors) that maximize the difference
between the two groups represented by the dichotomized dependent variable
(Cohen et al. cover this).  Multiple discriminant analysis extends this to a
multilevel categorical dependent variable.

(5)  The use of OLS regression or logistic regression depends upon what
assumptions one is willing to make about the nature of the data one has
and what types of questions one is asking.  Cohen et al. is a good starting
place, but one will probably have to look at a few more sources to make
sure one knows what one is doing.
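
The point-biserial, multiple-R, and discriminant-analysis points above can be checked numerically. The following is a minimal sketch on simulated data, assuming NumPy; the data-generating model, sample size, and variable names are all made up for illustration. It checks three things: the Pearson r between a 0/1 Y and X matches the textbook point-biserial formula, the multiple R equals the Pearson r between Y and Y-hat, and the OLS slopes are proportional to the two-group Fisher discriminant weights.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data (hypothetical): two continuous predictors, 0/1 outcome.
n = 200
X = rng.normal(size=(n, 2))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n) > 0).astype(float)

# Simple regression case: Pearson r with a 0/1 Y is the point-biserial r.
x1 = X[:, 0]
pearson_r = np.corrcoef(x1, y)[0, 1]
p = y.mean()
mean_diff = x1[y == 1].mean() - x1[y == 0].mean()
point_biserial = mean_diff / x1.std() * np.sqrt(p * (1 - p))  # uses population SD

# Multiple regression case: R is the Pearson r between Y and Y-hat.
design = np.column_stack([np.ones(n), X])
beta = np.linalg.lstsq(design, y, rcond=None)[0]
y_hat = design @ beta
multiple_R = np.corrcoef(y, y_hat)[0, 1]
r_squared = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

# Two-group linear discriminant weights: S_pooled^{-1} (mean_1 - mean_0).
X1, X0 = X[y == 1], X[y == 0]
pooled = ((len(X1) - 1) * np.cov(X1, rowvar=False)
          + (len(X0) - 1) * np.cov(X0, rowvar=False)) / (n - 2)
lda_weights = np.linalg.solve(pooled, X1.mean(axis=0) - X0.mean(axis=0))

# If the two analyses are equivalent up to scaling, this ratio is constant.
ratio = beta[1:] / lda_weights

print(f"Pearson r = {pearson_r:.4f}, point-biserial = {point_biserial:.4f}")
print(f"multiple R^2 = {multiple_R ** 2:.4f}, 1 - SSE/SST = {r_squared:.4f}")
print("OLS slope / LDA weight:", ratio)
```

The proportionality only says the two methods pick the same direction in predictor space; it says nothing about whether the usual OLS standard errors and tests can be trusted with a 0/1 outcome.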

I'm a little rusty on this stuff and I don't have a copy of Cohen et al. at
hand, so it's probably a good idea to check a copy of it.  These issues are
also treated in other books.  Again, I don't have a copy of Tabachnick &
Fidell's Using Multivariate Statistics at hand, but I have a recollection that
this topic is covered there.

-Mike Palij
New York University
[hidden email]



Re: Multiple Regression with a dichotomous DV

Ornelas, Fermin-2
The fundamental issue is that multiple regression applies to a continuous dependent variable. Moreover, the normality assumption for the errors no longer holds when the dependent variable is binary. Furthermore, the error variance is not constant either. Finally, since the function being estimated is a probability, there is a possibility that the estimated values could be negative.
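
Two of these objections are easy to see in a toy example. The sketch below, assuming NumPy and a small made-up data set in which the event occurs only at the highest values of x, fits the linear probability model by OLS and counts the fitted "probabilities" that fall below zero. It then prints the Bernoulli error variance p(1 - p) for a few values of p to show that it cannot be constant when p varies with the predictors.

```python
import numpy as np

# Hypothetical data: 20 cases; the event (y = 1) occurs only at the
# three highest x values.
x = np.arange(20, dtype=float)
y = (x >= 17).astype(float)

# Linear probability model: OLS of y on an intercept and x.
design = np.column_stack([np.ones_like(x), x])
intercept, slope = np.linalg.lstsq(design, y, rcond=None)[0]
fitted = intercept + slope * x

n_negative = int(np.sum(fitted < 0))
print(f"intercept = {intercept:.3f}, slope = {slope:.3f}")
print(f"fitted 'probabilities' below zero: {n_negative} of {len(x)}")

# For a 0/1 outcome with success probability p, the error variance is
# p * (1 - p), so it changes whenever p changes with the predictors.
for p in (0.02, 0.10, 0.50):
    print(f"p = {p:.2f} -> error variance = {p * (1 - p):.4f}")
```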

Fermin Ornelas



Re: Multiple Regression with a dichotomous DV

Ryan
In reply to this post by Bruce Weaver
It is also possible to obtain relative risks by employing a
log-binomial model, although one can still run into trouble if the
probability estimates go above 1. Although it is true that the
parameter estimates in a binary/binomial logistic regression are in
log-odds units, it is possible in some statistical software packages
(e.g., NLMIXED procedure in SAS) to compute relative risks, and risk
differences for that matter, with their respective X% confidence
limits even if the model is parameterized as a binary/binomial
logistic regression. In general, one does not need to worry about
going above a probability of 1 in such a scenario. This has been
discussed a few times on SAS-L.
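
The idea of getting risks out of a logistic model can be sketched without any particular package. The code below assumes NumPy and a made-up 2x2 table; it fits a binary logistic regression by Newton-Raphson and then converts the fitted log-odds into predicted probabilities, a relative risk, and a risk difference. It does not attempt the confidence limits mentioned above, and it is not a reproduction of what NLMIXED does.

```python
import numpy as np

# Hypothetical 2x2 data (invented for illustration): 30/100 events among
# the exposed, 10/100 among the unexposed.
x = np.array([1] * 100 + [0] * 100, dtype=float)
y = np.array([1] * 30 + [0] * 70 + [1] * 10 + [0] * 90, dtype=float)
X = np.column_stack([np.ones_like(x), x])

# Fit a binary logistic regression by Newton-Raphson (maximum likelihood).
beta = np.zeros(X.shape[1])
for _ in range(25):
    prob = 1.0 / (1.0 + np.exp(-X @ beta))               # fitted probabilities
    gradient = X.T @ (y - prob)                           # score vector
    hessian = X.T @ (X * (prob * (1 - prob))[:, None])    # information matrix
    beta = beta + np.linalg.solve(hessian, gradient)

# The coefficients are in log-odds units.
odds_ratio = np.exp(beta[1])

# Predicted probabilities put the results back on the risk scale.
p_unexposed = 1.0 / (1.0 + np.exp(-beta[0]))
p_exposed = 1.0 / (1.0 + np.exp(-(beta[0] + beta[1])))
relative_risk = p_exposed / p_unexposed
risk_difference = p_exposed - p_unexposed

print(f"odds ratio      = {odds_ratio:.3f}")
print(f"relative risk   = {relative_risk:.3f}")
print(f"risk difference = {risk_difference:.3f}")
```

Because the probabilities come from the inverse logit, they stay strictly between 0 and 1 whatever the coefficients are, which is why going above 1 is not a concern with this parameterization.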

Ryan


Re: Multiple Regression with a dichotomous DV

Mike
In reply to this post by Ornelas, Fermin-2
The objections Fermin raises are pointed out on pages 483-484 of
Cohen et al.'s presentation of OLS regression on a dichotomous Y.
This is in the context of the Linear Probability Model (LPM), which
they present as OLS regression with a dichotomous dependent
variable.  (Note: Agresti, in his Intro to Categorical Data Analysis,
presents the LPM with Y as the probability of being in one of the
levels of Y [i.e., Pi], and he uses Maximum Likelihood (ML) estimation
instead of OLS; if the model is correctly specified, he points out
that the estimates produced by the two methods will be similar; see
pages 74-78.)

The LPM essentially estimates the probability that a case is in one
of the categories of Y (i.e., Pi).  For the reasons Fermin gives
(non-normal errors, non-constant error variance, and fitted values that
can fall outside the 0-1 range), Pi is not an optimal representation for Y.
The logit transformation, that is, ln[Pi/(1-Pi)], remedies these problems
and represents the dependent variable used in binary logistic regression,
which uses ML to estimate the coefficients and associated statistics.
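
For readers who want to see the transformation itself, here is a minimal sketch assuming NumPy; the function names are illustrative and not taken from any package. It computes ln[Pi/(1-Pi)] for a few probabilities and shows that the inverse maps any real-valued linear predictor back into the unit interval.

```python
import numpy as np

def logit(p):
    """Log-odds ln[p / (1 - p)] for probabilities strictly between 0 and 1."""
    p = np.asarray(p, dtype=float)
    return np.log(p / (1 - p))

def inverse_logit(eta):
    """Map any real-valued linear predictor back to a probability in (0, 1)."""
    eta = np.asarray(eta, dtype=float)
    return 1.0 / (1.0 + np.exp(-eta))

probabilities = np.array([0.02, 0.10, 0.50, 0.90, 0.98])
log_odds = logit(probabilities)

# A linear predictor can take any real value; the inverse logit keeps the
# implied probability inside the unit interval.
linear_predictor = np.array([-10.0, -2.0, 0.0, 2.0, 10.0])
implied = inverse_logit(linear_predictor)

print(np.round(log_odds, 3))
print(np.round(implied, 5))
```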

Again, which analysis one uses depends upon which questions one
is trying to answer and what assumptions one is willing to make.
One would have to provide the rationale for using one form of
analysis over another, but it should be clear that, in general
practice, binary logistic regression has much to recommend it in
this situation, though one would have to justify that choice as well,
say, in contrast to using probit regression (see Agresti, pp. 79-80, and
Table 4.1 on page 75 for a comparison of the estimates provided
by the different regression equations).

-Mike Palij
New York University
[hidden email]


----- Original Message -----
From: "Ornelas, Fermin" <[hidden email]>
To: "'Mike Palij'" <[hidden email]>; <[hidden email]>
Sent: Wednesday, December 01, 2010 4:59 PM
Subject: RE: Multiple Regression with a dichotomous DV


> The fundamental issue is that multiple regression applies to a continuous dependent variable. Moreover, the normality assumption of the errors for a binary variable no longer holds. Furthermore, the error variance is not constant either. Now, since the function being estimated as binary is an estimated probability there is a possibility that the estimated values could be negative.
>
> Fermin Ornelas,
>
>
> -----Original Message-----
> From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Mike Palij
> Sent: Wednesday, December 01, 2010 2:42 PM
> To: [hidden email]
> Subject: Re: Multiple Regression with a dichotmous DV
>
> A few points:
>
> (1)  For an overview of these issues, see Cohen, Cohen, West & Aiken's
> (2003) Applied Multiple Regression/Correlation Analysis for the Behavioral
> Sciences, pages 481-490 or so.  This book is available on books.google.com
> though page 485-486 are "hidden".  Here's a link to the relevant section:
> http://tinyurl.com/cohencohenwestaiken
>
> (2) If Y is a dichotomy (0,1) and X is a continous variable you can do a
> simple regression of Y on X and the correlation you get is the point-biseral
> version of the Pearson r.  The Ordinary Least Squares (OLS) is valid but
> not all of the usual statistics are (see Cohen et al) because the residuals are
> no longer normally distributed.
>
> (2) If Y is a dichotomy (0,1) and you have several X variables that
> are dichotomous and/or continous, you can perform an OLS regression
> as in the simple regression cases.  The multiple R is now the point-biserial
> version of R (one way of thinking of R is that it is the Pearson r between
> the actual values of Y and the predicted values of Y or Y-hat).  Again,
> some of the statistics will be off.  Cohen et al describe the linear probability
> model that represents this type of analysis (there's also a Sage "green book"
> on the topic).
>
> (3)  The situation in (2) is equivalent to a linear discriminant analysis but
> the point of the discriminant analysis is to find the weights/regression coefficients
> (i.e., identifying the predictors) that maximize the difference between the
> two groups represented by the dichotomized dependent variable (Cohen et al
> cover this).  Multiple discriminat analysis extends this to multilevel categorical
> dependent variable.
>
> (4)  The use of OLS regression or logistic regression depends upon what
> assumptions one is willing to make about the nature of the data one has
> and what types of questions one is asking.  Cohen et al is a good starting
> place but one will probably have to look a few more sources to make
> sure one knows what one is doing.
>
> I'm a little rusty on this stuff and I don't have a copy of Cohen et al at hand,
> so it's probably is a good idea to check a copy of it.  These issues are also
> treated in other books.  Again, I don't have a copy of Tabachnick & Fidell's
> Using Multivariate Statistics at hand but I have a recollection that this topic
> is covered there.
>
> -Mike Palij
> New York University
> [hidden email]
>
>
> ----- Original Message -----
> From: "Bruce Weaver" <[hidden email]>
> To: <[hidden email]>
> Sent: Wednesday, December 01, 2010 3:36 PM
> Subject: Re: Multiple Regression with a dichotmous DV
>
>
>>I agree that usually, some other form of analysis would be preferable--e.g.,
>> logistic regression.  But I'm not sure I'd issue an across the board
>> condemnation for all situations.  I believe one of my colleagues (an
>> epidemiologist) sometimes uses linear regression with a dichotomous outcome
>> when he wants to model the risk difference (rather than the odds ratio,
>> which is what logistic regression would give him).  I think I have an
>> article on this stashed away somewhere--will look for it, and re-post if I
>> find it.
>>
>> Another issue is the proportion of observations falling into the two
>> categories of the outcome variable.  If the proportions are not too close to
>> 0 and 1, the model might not be too bad.
>>
>> And finally, I believe that linear regression with a dichotomous outcome is
>> analogous (if not equivalent) to a two-group discriminant function analysis.
>> I'm not an expert on multivariate stuff, but maybe one of the multivariate
>> experts in the group can comment on that.
>>
>> Cheers,
>> Bruce
>>
>>
>>
>> Matthias Spörrle-2 wrote:
>>>
>>> Hi Rob,
>>>
>>> No, this is not appropriate, see for explanation:
>>> Cortina, J. M. (2002). Big things have small beginnings: An asssortment of
>>> “minor” methodological misunderstandings. *Journal of Management, 28*,
>>> 339-362. (specifically: pp.347ff)
>>>
>>> HTH
>>> M
>>>
>>>
>>> On Wed, Dec 1, 2010 at 5:22 PM, Rob <[hidden email]> wrote:
>>>> Just need some clarification here.  Is it appropriate to use a
>>>> dichotomous
>>>> dependent variable (ex. recidivist vs. non recidivist) in a multiple
>>>> regression model (OLS)? why/why not.  Here is another issue to this mix.
>>>> With the understanding that a dichomtous variable has very little
>>>> variance, the actual data of this dichotomous dependent variable is as
>>>> follows:  449 cases for non recidivist (98%) and 11 cases for recidivist
>>>> (2%).....total sample size 460 cases of inmates). Thanks.
>>>>
>>>> =====================
>>>> To manage your subscription to SPSSX-L, send a message to
>>>> [hidden email] (not to SPSSX-L), with no body text except the
>>>> command. To leave the list, send the command
>>>> SIGNOFF SPSSX-L
>>>> For a list of commands to manage subscriptions, send the command
>>>> INFO REFCARD
>>>>
>>>
>>>
>>
>>
>> -----
>> --
>> Bruce Weaver
>> [hidden email]
>> http://sites.google.com/a/lakeheadu.ca/bweaver/
>>
>> "When all else fails, RTFM."
>>
>> NOTE: My Hotmail account is not monitored regularly.
>> To send me an e-mail, please use the address shown above.
>>
>> --
>> View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Multiple-Regression-with-a-dichotmous-DV-tp3287954p3288364.html
>> Sent from the SPSSX Discussion mailing list archive at Nabble.com.
>>
>> =====================
>> To manage your subscription to SPSSX-L, send a message to
>> [hidden email] (not to SPSSX-L), with no body text except the
>> command. To leave the list, send the command
>> SIGNOFF SPSSX-L
>> For a list of commands to manage subscriptions, send the command
>> INFO REFCARD
>
> =====================
> To manage your subscription to SPSSX-L, send a message to
> [hidden email] (not to SPSSX-L), with no body text except the
> command. To leave the list, send the command
> SIGNOFF SPSSX-L
> For a list of commands to manage subscriptions, send the command
> INFO REFCARD
>
>
> NOTICE: This e-mail (and any attachments) may contain PRIVILEGED OR CONFIDENTIAL information and is intended only for the use of the specific individual(s) to whom it is addressed.  It may contain information that is privileged and confidential under state and federal law.  This information may be used or disclosed only in accordance with law, and you may be subject to penalties under law for improper use or further disclosure of the information in this e-mail and its attachments. If you have received this e-mail in error, please immediately notify the person named above by reply e-mail, and then delete the original e-mail.  Thank you.
>


Re: Multiple Regression with a dichotomous DV

Ornelas, Fermin-2
I think the person posting the question did not have a clear understanding of the research problem. From the outset it seemed to me that the ideal modeling technique would have been a logistic regression model, where one attempts to predict the probability of an event, or a hazard model. However, I noticed that the cases having the event were only a small number in the data set.
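
To put a number on how rare the event is, here is a minimal sketch using only the counts given in the original question (449 non-recidivists, 11 recidivists). The variable names are illustrative assumptions, not anything from the poster's data file.

```python
import math

# Counts reported in the original question.
n_non_recidivist = 449
n_recidivist = 11

total = n_non_recidivist + n_recidivist

# Proportion of cases with the event (the base rate).
base_rate = n_recidivist / total

# Odds and log-odds of the event.
odds = n_recidivist / n_non_recidivist
log_odds = math.log(odds)

print(f"total cases: {total}")
print(f"base rate:   {base_rate:.4f}")
print(f"odds:        {odds:.4f}")
print(f"log-odds:    {log_odds:.3f}")
```

The log-odds printed here is what a logistic regression with no predictors would estimate as its intercept, so it gives a sense of how far from 0.5 the outcome sits before any predictor is added.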

Fermin Ornelas


-----Original Message-----
From: Mike Palij [mailto:[hidden email]]
Sent: Wednesday, December 01, 2010 4:22 PM
To: Ornelas, Fermin; [hidden email]
Cc: Mike Palij
Subject: Re: Multiple Regression with a dichotomous DV

The objections made below are pointed out on pages 483-484 of
Cohen et al.'s presentation of OLS regression on a dichotomous Y.
This is in the context of the Linear Probability Model (LPM) which
they present as the OLS regression with a dichotomous dependent
variable (Note: Agresti in his Intro to Categorical Data Analysis
presents LPM with Y as the probability of being in one of the
levels of Y [i.e., Pi] and he uses Maximum Likelihood or ML estimation
instead of OLS but if the model is correctly specified, he points out
that the estimates produced by the methods will be similar; see
pages 74-78).
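
As a rough illustration of the LPM fitted by OLS, the sketch below regresses a made-up 0/1 outcome on a single predictor using the closed-form least-squares formulas. The data and the helper name `ols_simple` are assumptions for illustration only; they are not from Cohen et al., Agresti, or the original poster's data.

```python
# Toy linear probability model: OLS of a 0/1 outcome on one predictor.
# The data below are invented for illustration.

def ols_simple(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]

intercept, slope = ols_simple(x, y)

# Under the LPM, each fitted value is read as an estimated probability.
fitted = [intercept + slope * xi for xi in x]

# Fitted "probabilities" that fall outside the [0, 1] interval.
out_of_range = [p for p in fitted if p < 0 or p > 1]

# For a 0/1 outcome the error variance at probability p is p * (1 - p),
# so it changes with the fitted value instead of being constant.
implied_var = [p * (1 - p) for p in fitted if 0 < p < 1]

print("fitted values:", [round(p, 3) for p in fitted])
print("number outside [0, 1]:", len(out_of_range))
print("implied variances:", [round(v, 3) for v in implied_var])
```

With these toy data some fitted values are negative, and the implied error variance differs from case to case, which are two of the objections raised in the thread.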

The LPM essentially estimates the probability that a case is in one
of the categories of Y, (i.e. Pi). For reasons stated below,
Pi is not an optimal representation for Y.  The logit transformation,
that is, ln[Pi/(1-Pi)], remedies the problems and represents the
dependent variable used in binary logistic regression which uses
ML to estimate the coefficients and associated statistics.
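
The logit transformation mentioned above can be sketched as follows. This is a minimal illustration under the definition ln[Pi/(1-Pi)]; the function names `logit` and `inv_logit` are assumptions chosen for this example.

```python
import math

def logit(p):
    """Log-odds ln[p / (1 - p)], defined for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return math.log(p / (1.0 - p))

def inv_logit(z):
    """Map any real number back to a probability (numerically stable form)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

# Probabilities near 0 or 1 map to large negative or positive log-odds.
for p in (0.02, 0.5, 0.98):
    print(p, round(logit(p), 3))

# Any value on the log-odds scale maps back into the unit interval,
# which is why a linear model on the logit scale cannot produce
# negative predicted probabilities.
for z in (-5.0, 0.0, 5.0):
    print(z, round(inv_logit(z), 4))
```

Because the linear predictor is unbounded while its inverse-logit is bounded, modeling on the logit scale avoids the out-of-range predictions that the LPM can produce.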

Again, which analysis one uses depends upon which questions one
is trying to answer and what assumptions one is willing to make.
One would have to provide the rationale for using one form of
analysis over another, but it should be clear that, in general
practice, binary logistic regression has much to recommend it in
this situation, though one would have to justify it as well, say,
in contrast to using probit regression (see Agresti, pp. 79-80, and
Table 4.1 on page 75 for a comparison of the estimates provided
by the different regression equations).

-Mike Palij
New York University
[hidden email]


----- Original Message -----
From: "Ornelas, Fermin" <[hidden email]>
To: "'Mike Palij'" <[hidden email]>; <[hidden email]>
Sent: Wednesday, December 01, 2010 4:59 PM
Subject: RE: Multiple Regression with a dichotomous DV


> The fundamental issue is that multiple regression applies to a continuous dependent variable. Moreover, the assumption of normally distributed errors no longer holds for a binary variable. Furthermore, the error variance is not constant either. Finally, since the fitted values for a binary outcome are estimated probabilities, there is a possibility that they fall outside the 0 to 1 range (for example, they could be negative).
>
> Fermin Ornelas,
>
>
> -----Original Message-----
> From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Mike Palij
> Sent: Wednesday, December 01, 2010 2:42 PM
> To: [hidden email]
> Subject: Re: Multiple Regression with a dichotmous DV
>
> A few points:
>
> (1)  For an overview of these issues, see Cohen, Cohen, West & Aiken's
> (2003) Applied Multiple Regression/Correlation Analysis for the Behavioral
> Sciences, pages 481-490 or so.  This book is available on books.google.com
> though page 485-486 are "hidden".  Here's a link to the relevant section:
> http://tinyurl.com/cohencohenwestaiken
>
> (2) If Y is a dichotomy (0,1) and X is a continuous variable you can do a
> simple regression of Y on X and the correlation you get is the point-biserial
> version of the Pearson r.  The Ordinary Least Squares (OLS) solution is valid but
> not all of the usual statistics are (see Cohen et al.) because the residuals are
> no longer normally distributed.
>
> (3) If Y is a dichotomy (0,1) and you have several X variables that
> are dichotomous and/or continuous, you can perform an OLS regression
> as in the simple regression case.  The multiple R is now the point-biserial
> version of R (one way of thinking of R is that it is the Pearson r between
> the actual values of Y and the predicted values of Y or Y-hat).  Again,
> some of the statistics will be off.  Cohen et al. describe the linear probability
> model that represents this type of analysis (there's also a Sage "green book"
> on the topic).
>
> (4)  The situation in (3) is equivalent to a linear discriminant analysis but
> the point of the discriminant analysis is to find the weights/regression coefficients
> (i.e., identifying the predictors) that maximize the difference between the
> two groups represented by the dichotomized dependent variable (Cohen et al.
> cover this).  Multiple discriminant analysis extends this to a multilevel categorical
> dependent variable.
>
> (5)  The use of OLS regression or logistic regression depends upon what
> assumptions one is willing to make about the nature of the data one has
> and what types of questions one is asking.  Cohen et al. is a good starting
> place but one will probably have to look at a few more sources to make
> sure one knows what one is doing.
>
> I'm a little rusty on this stuff and I don't have a copy of Cohen et al. at hand,
> so it is probably a good idea to check a copy of it.  These issues are also
> treated in other books.  Again, I don't have a copy of Tabachnick & Fidell's
> Using Multivariate Statistics at hand but I have a recollection that this topic
> is covered there.
>
> -Mike Palij
> New York University
> [hidden email]
>
>
> ----- Original Message -----
> From: "Bruce Weaver" <[hidden email]>
> To: <[hidden email]>
> Sent: Wednesday, December 01, 2010 3:36 PM
> Subject: Re: Multiple Regression with a dichotmous DV
>
>
>> I agree that usually, some other form of analysis would be preferable--e.g.,
>> logistic regression.  But I'm not sure I'd issue an across-the-board
>> condemnation for all situations.  I believe one of my colleagues (an
>> epidemiologist) sometimes uses linear regression with a dichotomous outcome
>> when he wants to model the risk difference (rather than the odds ratio,
>> which is what logistic regression would give him).  I think I have an
>> article on this stashed away somewhere--will look for it, and re-post if I
>> find it.
>>
>> Another issue is the proportion of observations falling into the two
>> categories of the outcome variable.  If the proportions are not too close to
>> 0 and 1, the model might not be too bad.
>>
>> And finally, I believe that linear regression with a dichotomous outcome is
>> analogous (if not equivalent) to a two-group discriminant function analysis.
>> I'm not an expert on multivariate stuff, but maybe one of the multivariate
>> experts in the group can comment on that.
>>
>> Cheers,
>> Bruce
>>
