Hello,
I would like to ask whether you are aware of any problems (violated assumptions) when running an ANOVA on binary data (e.g. coded 0 / 1 for the answers "no" / "yes", or "patient not infected" / "patient infected"). How severe are those violations? Would you consider running an ANOVA in this case "common practice", or is it not recommended? Does anybody happen to know where this issue is discussed in the literature?

Thanks a lot. I really appreciate your help.

Sincerely,
Robinson Aschoff

I hope this hasn't been asked a lot before. I didn't find it in the archive, though.
----------------------------------------------------------------
Felix-Robinson Aschoff
Information Management Research Group
Department of Informatics
University of Zurich
Binzmuehlestrasse 14
CH-8050 Zurich, Switzerland
E-Mail: [hidden email]
Phone: +41 (0)44 635 6690
Fax: +41 (0)44 635 6809
Room: 2.D.11
http://www.ifi.unizh.ch/im |
If you are talking about running an ANOVA on a response or dependent variable that is binary, you really need to take a statistics course. ANOVA assumes a normally distributed, interval-level variable. A binary variable is neither of these things!

--
Michael Kruger
"A True Prince"
Statistical Analyst
C.S. Mott Center
Dept. of OB/GYN
Wayne State University School of Medicine
(313)-577-1794 |
In reply to this post by Robinson Aschoff
Without knowing more about your data... the most appropriate technique to
use would be logistic regression where the dependent variable can be binary, and the independents can be a mix (categorical, continuous).

Dominic Lusinchi
Statistician
Far West Research
Statistical Consulting
San Francisco, California
415-664-3032
www.farwestresearch.com |
In reply to this post by Robinson Aschoff
Strictly speaking, no.
Although econometric analyses do use linear regression with a binary outcome variable (I think it's called the linear probability model), those kinds of analyses tend to have very large sample sizes, so the standard errors aren't outrageous. The problem for medical studies is limited sample size, and if you feed a binary variable into a linear model, you may end up with predicted values less than zero or greater than one, which would be difficult to interpret. A better approach would be binary logistic regression. Modeling the outcome as log(odds) takes care of that 0/1 domain problem.

-Ken |
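To make Ken's point about out-of-range predictions concrete, here is a minimal sketch in plain Python. The data (`dose`, `infected`) and the helper names are hypothetical and invented purely for illustration. The sketch fits an ordinary least-squares line to a 0/1 outcome and shows that extrapolated predictions can fall below 0 or above 1, whereas the inverse-logit transform used by logistic regression always returns a value strictly between 0 and 1. It does not fit a logistic regression; it only illustrates the range problem.

```python
import math

def ols_fit(x, y):
    """Ordinary least squares with one predictor: returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def inverse_logit(eta):
    """Map a linear predictor (log-odds) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical data: a continuous predictor and a 0/1 outcome.
dose = [1, 2, 3, 4, 5, 6, 7, 8]
infected = [0, 0, 0, 0, 1, 1, 1, 1]

intercept, slope = ols_fit(dose, infected)

# Linear-model "probabilities" at a low, a middle, and a high predictor value.
linear_predictions = [intercept + slope * d for d in (0, 4.5, 10)]
print(linear_predictions)  # first value is below 0, last value is above 1

# The inverse logit keeps moderate log-odds values strictly inside (0, 1).
print([inverse_logit(eta) for eta in (-5, 0, 5)])
```

In practice the logistic model would be fitted with a statistics package; the point here is only that the linear fit has no built-in bound on its predictions.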
In reply to this post by Robinson Aschoff
It is not common practice. There are circumstances where the residuals
could be normal. What question(s) are you using the data to answer? There are many different circumstances that could apply. What proportion of the patients are infected? What are your explanatory/independent variables? How many independent variables are there? How are they measured?

Art Kendall
Social Research Consultants |
In reply to this post by Michael Kruger
Robinson,
I agree, using ANOVA for a dichotomous response variable violates the assumptions of homogeneity of variances and normally distributed errors. You should definitely look into using logistic regression for this type of analysis.

Regards,
Kevin Bladon, Ph.D., A.Ag.
Resource Analyst
Silvacom Ltd.
3825 - 93 Street
Edmonton, AB T6E 5K5
Phone: 780.462.3238
Fax: 780.462.4726
E-mail: [hidden email]
www.silvacom.com

"Make every obstacle an opportunity."
- Lance Armstrong, cancer survivor and 7 time Tour de France champion (1999-2005) |
In reply to this post by Ken Chui
Ken,
Yes and no. The problem posted by Robinson was about ANOVA, not about regression. ANOVA on binary variables is an unorthodox but feasible procedure, because ANOVA analyzes the MEAN of a variable in different cells defined by combinations of categories, and the mean of a binary variable is simply the proportion of cases with the value 1 (assuming the other value is 0). The unorthodoxy comes from the fact that the residuals are not normal.

Now, Ken's explanation regarding regression in econometrics is not entirely convincing. A regression using a binary variable (say, coded 1 and 0) as the dependent variable is likely to predict values outside the 0-1 range. There are (in econometrics) certain models like "linear probability" forcing linear regression to produce predictions within a restricted range of values (a good text is G. S. Maddala, Limited Dependent and Qualitative Variables in Econometrics, Cambridge U. Press, 1983), but they are not widely used, and logistic regression is by far the preferred option in such cases for predicting the probability that the value 1 occurs. The linear probability model should be used only when theory dictates that the underlying function is linear and not logistic, and that is seldom, if ever, the case. By the way, one of the original models for continuous latent variables assumed linear tracelines for dichotomous items (see Paul Lazarsfeld and Neil Henry, Latent Structure Analysis, NY, Houghton Mifflin, 1968), but only as a rough first approximation that was soon discarded in favor of more appropriate functions.

Medical studies may have small or large samples. Many medical studies have tens of thousands of cases, but even so certain methods are not advisable because of the nature of the data and methods involved. By the same token, you may have a perfectly legitimate method and adequate data that nevertheless give large errors because the sample is too small. These are two completely different problems, and sample size is just not pertinent here.

Hector |
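Hector's observation that the mean of a 0/1 variable is just the proportion of 1s can be checked directly. The following is a rough sketch in plain Python, using two made-up groups (`treated`, `control`) that are not from the original thread. It computes each group mean and then the usual one-way ANOVA F ratio by hand, which shows that the arithmetic of ANOVA goes through on binary data; it says nothing about whether the resulting p-value is trustworthy, which is the point under debate in this thread.

```python
def proportion(values):
    """Mean of a 0/1 list, which equals the share of 1s."""
    return sum(values) / len(values)

def one_way_anova_f(groups):
    """One-way ANOVA F ratio: between-group mean square / within-group mean square."""
    all_values = [v for g in groups for v in g]
    n_total = len(all_values)
    grand_mean = sum(all_values) / n_total
    ss_between = sum(len(g) * (proportion(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - proportion(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical 0/1 outcomes (1 = infected) for two groups of ten patients.
treated = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]   # 3 of 10 infected
control = [0, 0, 0, 1, 1, 1, 1, 1, 1, 1]   # 7 of 10 infected

print(proportion(treated), proportion(control))  # group means = group proportions
f_ratio = one_way_anova_f([treated, control])
print(f_ratio)  # F with 1 and 18 degrees of freedom
```

Here the cell means are 0.3 and 0.7, exactly the infection proportions, so the ANOVA is in effect comparing proportions across groups.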
In reply to this post by Robinson Aschoff
And your response, Kevin, is in a much more appropriate tone and more
helpful than the previous response to Robinson's query.

Patricia |
Robinson,
In general, the difficulty is that ANOVA is not uniformly justified in such situations, because dichotomous data are more likely than strictly numeric data to violate the ANOVA assumptions seriously. While the comments from Kevin and Michael about the inadvisability of ANOVA with dichotomous variables are correct, the real issue may not be whether the assumptions are violated but the degree to which they are violated. The degree to which the major assumptions of an ANOVA model are violated can be tested in any particular data set. If the violations are relatively minor, the use of ANOVA may be warranted; I would argue that in such a case it is justified statistically because the data fit the model. If ANOVA can then also be justified from an empirical viewpoint (e.g., it makes sense in the context of the particular research and the intended use of the outcomes), I would say use ANOVA.

Harley

Dr. Harley Baker
Associate Professor and Chair, Psychology Program
Chief Assessment Officer for Academic Affairs
California State University Channel Islands
One University Drive
Camarillo, CA 93012
805.437.8997 (p)
805.437.8951 (f)
[hidden email] |
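As one way of looking at the "degree of violation" Harley describes, the sketch below compares sample variances across groups of 0/1 data. It is a crude descriptive check under assumed data, not a formal test such as Levene's, and the group lists and function names are hypothetical. For a binary variable with proportion p the variance is p(1 - p), so groups with similar proportions have similar variances, while groups whose proportions differ a lot (for example 0.5 versus 0.1) do not.

```python
def sample_variance(values):
    """Unbiased sample variance (divides by n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)

def variance_ratio(groups):
    """Largest group variance divided by the smallest.

    Raises ZeroDivisionError if some group is all 0s or all 1s,
    because that group's variance is exactly zero.
    """
    variances = [sample_variance(g) for g in groups]
    return max(variances) / min(variances)

# Hypothetical groups of ten 0/1 observations each.
similar = [[0] * 5 + [1] * 5, [0] * 4 + [1] * 6]   # proportions 0.5 and 0.6
extreme = [[0] * 5 + [1] * 5, [0] * 9 + [1] * 1]   # proportions 0.5 and 0.1

print(variance_ratio(similar))  # close to 1: variances nearly equal
print(variance_ratio(extreme))  # well above 1: variances clearly unequal
```

How large a ratio is tolerable is a judgment call that depends on the design (for instance, whether group sizes are equal), so this sketch deliberately sets no threshold.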