running ANOVA on binary data?


running ANOVA on binary data?

Robinson Aschoff
Hello,

I would like to ask if you are aware of any problems (violated
assumptions) if you run an ANOVA on binary data (e.g. 0 / 1 coded for
answer "no" "yes" or "patient not infected" "patient infected"). How
severe are those violations? Would you consider running an ANOVA in this
case "common practice", or is it inadvisable? Does anybody happen to know
where this aspect is discussed in the literature?

Thanks a lot. I really appreciate your help.

Sincerely,
Robinson Aschoff

I hope this hasn't been asked a lot before. I didn't find it in the
archive, though.
----------------------------------------------------------------
Felix-Robinson Aschoff
Information Management Research Group
Department of Informatics
University of Zurich
Binzmuehlestrasse 14
CH-8050 Zurich, Switzerland

E-Mail: [hidden email]
Phone: +41 (0)44 635 6690
Fax: +41 (0)44 635 6809
Room: 2.D.11
http://www.ifi.unizh.ch/im

Re: running ANOVA on binary data?

Michael Kruger
If you are talking about running an ANOVA on a response or dependent
variable that is binary, you really need to take a statistics course.
ANOVA assumes a normally distributed, interval-level variable. A binary
variable is neither of these things!



--
Michael Kruger
"A True Prince"
Statistical Analyst
C.S. Mott Center
Dept. of OB/GYN
Wayne State University School of Medicine
(313)-577-1794

Re: running ANOVA on binary data?

Dominic Lusinchi
In reply to this post by Robinson Aschoff
Without knowing more about your data... the most appropriate technique to
use would be logistic regression where the dependent variable can be binary,
and the independents can be a mix (categorical, continuous).

Dominic Lusinchi
Statistician
Far West Research
Statistical Consulting
San Francisco, California
415-664-3032
www.farwestresearch.com

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
Robinson Aschoff
Sent: Monday, August 21, 2006 10:54 AM
To: [hidden email]
Subject: running ANOVA on binary data?


Re: running ANOVA on binary data?

Ken Chui
In reply to this post by Robinson Aschoff
Strictly speaking, no.

In econometric analyses, linear regression is sometimes run with a binary
variable as the outcome (I think it's called the linear probability model),
but those analyses tend to have very large samples, so the standard errors
aren't outrageous.

The problem for medical studies is limited sample size, and if you feed a
binary variable into a linear model, you may end up with predicted values
less than zero or greater than one, which would be difficult to explain.

A better approach would be binary logistic regression. Changing the outcome
to log(odds) takes care of that 0/1 domain problem.
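
The domain problem can be illustrated with a minimal sketch. It assumes made-up toy data (the `x` and `y` values below are hypothetical, not from any study) and uses hand-written fitting routines rather than any particular package: a straight-line fit to a 0/1 outcome, and a logistic fit by Newton-Raphson.

```python
import math

# Hypothetical toy data: x is a dose-like predictor, y is a 0/1 outcome.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [0, 0, 0, 1, 0, 1, 1, 1, 1, 1]


def fit_linear(x, y):
    """Ordinary least squares for y = a + b*x (a linear probability model)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b


def fit_logistic(x, y, iters=25):
    """Maximum-likelihood fit of log(odds) = a + b*x by Newton-Raphson."""
    a = b = 0.0
    for _ in range(iters):
        p = [1 / (1 + math.exp(-(a + b * xi))) for xi in x]
        # Gradient of the log-likelihood with respect to (a, b).
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
        # Information matrix entries, weighted by p * (1 - p).
        w = [pi * (1 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system H * delta = g.
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


a_lin, b_lin = fit_linear(x, y)
a_log, b_log = fit_logistic(x, y)

grid = list(range(0, 11))
linear_pred = [a_lin + b_lin * g for g in grid]
logistic_pred = [1 / (1 + math.exp(-(a_log + b_log * g))) for g in grid]

for g, lp, gp in zip(grid, linear_pred, logistic_pred):
    print(f"x={g:2d}  linear={lp:6.3f}  logistic={gp:5.3f}")
```

With these toy values the straight-line predictions fall below 0 at the low end of the grid and above 1 at the high end, while the logistic predictions stay strictly between 0 and 1 by construction.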

-Ken


Re: running ANOVA on binary data?

Art Kendall
In reply to this post by Robinson Aschoff
It is not common practice.  There are circumstances where the residuals
could be normal. What question(s) are you using the data to answer?

There are many different circumstances that could apply.
What proportion of the patients are infected?
What are your explanatory/independent variables? How many independent
variables are there? How are they measured?

Art

Social Research Consultants


Re: running ANOVA on binary data?

Kevin Bladon
In reply to this post by Michael Kruger
Robinson,

I agree, using ANOVA for a dichotomous response variable violates the
assumptions of homogeneity of variances and normally distributed
errors.  You should definitely look into using logistic regression for
this type of analysis.
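
A short sketch of why homogeneity of variances is at risk: a 0/1 variable with probability p of a 1 has variance p(1 - p), so groups with different proportions necessarily have different variances. The proportions below are hypothetical, chosen only for illustration.

```python
def bernoulli_variance(p):
    """Variance of a 0/1 variable whose probability of a 1 is p."""
    return p * (1 - p)


# Hypothetical infection rates in three groups.
for p in (0.05, 0.25, 0.50):
    print(f"p = {p:.2f}  variance = {bernoulli_variance(p):.4f}")

# Ratio of the largest to the smallest of the variances above.
ratio = bernoulli_variance(0.50) / bernoulli_variance(0.05)
print(f"variance ratio (p=0.50 vs p=0.05): {ratio:.2f}")
```

The variance is largest at p = 0.5 and shrinks toward zero as p approaches 0 or 1, so the violation is mild when all group proportions are mid-range and more pronounced when some groups have rare outcomes.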

Regards,
Kevin Bladon, Ph.D., A.Ag.
Resource Analyst
Silvacom Ltd.
3825 - 93 Street
Edmonton, AB
T6E 5K5
Phone: 780.462.3238
Fax: 780.462.4726
E-mail: [hidden email]
www.silvacom.com

"Make every obstacle an opportunity."
                              - Lance Armstrong
                                Cancer survivor and 7 time Tour de
France champion (1999-2005)


On Aug 21, 2006 12:06, Michael Kruger wrote:

>If you are talking about running an ANOVA on a response or dependent
>variable that is binary, you really need to take a statistics course.
>ANOVA assumes a normally distributed, interval-level variable. A binary
>variable is neither of these things!

Re: running ANOVA on binary data?

Hector Maletta
In reply to this post by Ken Chui
Ken,
Yes and no. The problem posted by Robinson was about ANOVA, not about
regression. ANOVA on binary variables is an unorthodox but feasible
procedure, because ANOVA analyzes the MEAN of a variable in different cells
defined by combinations of categories, and the mean of a binary variable is
simply the proportion with the value 1 (assuming the other value is 0). The
unorthodoxy comes from the fact that the residuals are not normal.

Now, the explanation by Ken regarding regression in econometrics is not
entirely convincing. A regression using a binary variable (say, coded 1 and
0) as the dependent variable is likely to predict values outside the 0-1
range. There are (in econometrics) certain models like "linear probability"
forcing linear regression to produce predictions within a restricted range
of values (a good text is G. S. Maddala, Limited Dependent and Qualitative
Variables in Econometrics, Cambridge U. Press, 1983), but they are not
widely used, and logistic regression is by far the preferred option in such
cases, in order to predict the probability that the value 1 occurs. Only
when theory dictates that the underlying function is linear and not logistic
should the linear probability model be used, and that is seldom, if ever,
the case. By the way, one of the original models for continuous latent
variables assumed linear tracelines for dichotomous items (see Paul
Lazarsfeld and Neil Henry, Latent Structure Analysis, NY, Houghton Mifflin,
1968), but only as a rough first approximation, soon discarded in favor of
more appropriate functions.

Medical studies may have small or large samples. Many medical studies have
tens of thousands of cases, but even so certain methods are not advisable
because of the nature of the data and methods involved. By the same token,
you may have a perfectly legitimate method with adequate data that
nevertheless gives large errors because the sample is too small. These are
two completely different problems, and sample size is just not pertinent here.
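
To make the earlier point about cell means concrete, here is a minimal sketch of a one-way ANOVA computed by hand on a 0/1 outcome. The group labels and data are hypothetical; the point is only that each cell mean is the proportion of 1s, and that the F ratio is therefore comparing proportions.

```python
from statistics import fmean

# Hypothetical 0/1 outcomes (1 = infected) in two treatment groups.
groups = {
    "A": [0, 0, 0, 1, 0, 0, 1, 0, 0, 0],
    "B": [1, 0, 1, 1, 0, 1, 0, 1, 1, 0],
}


def one_way_f(groups):
    """F ratio of a one-way ANOVA: between-group over within-group mean square."""
    values = [v for g in groups.values() for v in g]
    grand_mean = fmean(values)
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups.values())
    ss_within = sum((v - fmean(g)) ** 2 for g in groups.values() for v in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Each cell mean equals the proportion of 1s in that cell.
cell_means = {name: fmean(g) for name, g in groups.items()}
proportions = {name: sum(g) / len(g) for name, g in groups.items()}
print(cell_means, proportions)

f_ratio = one_way_f(groups)
print(f"F = {f_ratio:.3f}")
```

The arithmetic goes through without complaint; what is in question is whether the F distribution is a good reference for the resulting ratio, given that the residuals can only take two values per cell.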
Hector


-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Ken
Chui
Sent: Monday, August 21, 2006 10:40 PM
To: [hidden email]
Subject: Re: running ANOVA on binary data?


Re: running ANOVA on binary data?

Patricia Rego
In reply to this post by Robinson Aschoff
And your response, Kevin, is in a much more appropriate tone and more
helpful than the previous response to Robinson's query.
Patricia


-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
Kevin Bladon
Sent: Wednesday, 23 August 2006 1:01 AM
To: [hidden email]
Subject: Re: running ANOVA on binary data?


Re: running ANOVA on binary data?

Baker, Harley
Robinson,

In general, the difficulty is that ANOVA is not uniformly justified in such
situations. This is because dichotomous data have a greater probability of
seriously violating the ANOVA assumptions than do strictly numeric data.

While the comments from Kevin and Michael about the inadvisability of ANOVA
with dichotomous variables are correct, the real issue may not be whether
the assumptions are violated, but rather the degree to which they are
violated. In general, the major assumptions of an ANOVA model can be tested
to determine how badly they are violated in any particular data set. If the
violations are relatively minor, the use of ANOVA may be warranted. I would
argue that in such a case, the use of ANOVA is justified statistically
because the data fit the model. If the use of ANOVA can then be justified
from an empirical viewpoint (e.g., it makes sense in the context of the
particular research and intended use of the outcomes), I would say to use
ANOVA.

Harley


Dr. Harley Baker
Associate Professor and Chair, Psychology Program
Chief Assessment Officer for Academic Affairs
California State University Channel Islands
One University Drive
Camarillo, CA 93012

805.437.8997 (p)
805.437.8951 (f)

[hidden email]



> From: Patricia Rego <[hidden email]>
> Reply-To: Patricia Rego <[hidden email]>
> Newsgroups: bit.listserv.spssx-l
> Date: Wed, 23 Aug 2006 11:12:14 +1000
> To: <[hidden email]>
> Conversation: running ANOVA on binary data?
> Subject: Re: running ANOVA on binary data?