Sorry, just a quick PS that I should have included just now.
It's 10 events per factor, counted on whichever outcome is less frequent.
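
As a rough illustration of that rule of thumb, here is a minimal Python sketch. The function name `max_predictors` and the use of floor division are my own assumptions for illustration, not something taken from Peduzzi et al.; the counts (22 Yes, 10 No) are the outcome totals from Allan's table quoted further down.

```python
def max_predictors(n_yes, n_no, events_per_variable=10):
    """Rough ceiling on the number of predictors under an events-per-variable rule.

    The limiting count is the LESS frequent outcome, not the total sample size.
    (Hypothetical helper; floor division is an illustrative simplification.)
    """
    limiting = min(n_yes, n_no)
    return limiting // events_per_variable


# Outcome totals from the quoted table: 22 "Yes" and 10 "No".
print(max_predictors(22, 10))      # 1  (10 per factor)
print(max_predictors(22, 10, 15))  # 0  (15 per factor)
```

On this reading, 32 cases with only 10 in the rarer outcome support about one predictor at best, which matters if continuous predictors are to be added.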
bw,
Martin Holt
----- Forwarded Message ----
From: M HOLT <[hidden email]>
To: "Allan Lundy, PhD" <[hidden email]>; [hidden email]
Sent: Sunday, 13 June, 2010 11:29:05
Subject: Re: Logistic Regression fails with empty cell

Hi Allan,
Remember that it's the **expected** counts that matter, rather than the actual counts.
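
To make the distinction concrete, here is a small sketch in plain Python that computes the expected counts (row total × column total / N) and the Pearson chi-square for the 2x2 table in Allan's original post, quoted below. The helper names `expected_counts` and `pearson_chi_square` are hypothetical, chosen only for this example.

```python
def expected_counts(table):
    """Expected cell counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def pearson_chi_square(table):
    """Pearson chi-square statistic (no continuity correction)."""
    expected = expected_counts(table)
    return sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )


# Rows are Group A and B; columns are Yes and No.
observed = [[16, 0], [6, 10]]
print(expected_counts(observed))                # [[11.0, 5.0], [11.0, 5.0]]
print(round(pearson_chi_square(observed), 1))   # 14.5
```

So although one observed cell is 0, the smallest expected count is 5, and the statistic matches the 14.5 reported in the original post.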
The following link is excellent, taking you into and through and out the other side on 2x2 tables. It expands on the methods section published in the paper: Campbell Ian, 2007, Chi-squared and Fisher-Irwin tests of two-by-two tables with small sample recommendations, Statistics in Medicine, 26, 3661 - 3675.
In a logistic regression it is common to accept "more than 10" events per factor in the analysis, yet some, including me, prefer "more than 15". Peduzzi et al. ran simulation studies and settled on 10:
Michael A. Babyak. What You See May Not Be What You Get: A Brief, Nontechnical Introduction to Overfitting in Regression-Type Models. Psychosom Med 2004 66: 411-421.

and

Peduzzi P, Concato J, Kemper E, Holford TR, Feinstein AR. A simulation study of the number of events per variable in logistic regression analysis. J Clin Epidemiol. 1996 Dec;49(12):1373-9.

I'd concentrate on Ian Campbell's papers and you'll find an answer... but you might not like it :(
Best Wishes,
Martin Holt
From: "Allan Lundy, PhD" <[hidden email]>
To: [hidden email]
Sent: Saturday, 12 June, 2010 22:40:38
Subject: Logistic Regression fails with empty cell

Dear Listers,

First, thanks to Martin Holt, Ryan Black, and Bruce Weaver for helpful comments on another recent logistic regression question. This one is much more basic, but very surprising (to me, anyway).

I have 32 cases, divided into 16 and 16, with a dichotomous outcome. The data look like this (Group is A or B; outcome is Yes or No):

|   | Yes | No |
|---|-----|----|
| A | 16  | 0  |
| B | 6   | 10 |

As you might expect, chi-square is highly significant: 14.5, p < .001. However, using this data in a binomial logistic regression with additional continuous predictor variables yielded weirdly high p values for Group: like p = .996. I eliminated the continuous predictors, so there was just the dichotomous predictor and dichotomous outcome.

Results: The classification table showed overall correct classification as 81.3%. But Variable in the equation (Step 1) was, for Group: B = 21.7, S.E. = 10048.2, Sig. = .998. Obviously the huge SE was what was making it non-significant.

Finally decided the problem had to be the empty cell. I switched one of the outcome values and re-ran:

|   | Yes | No |
|---|-----|----|
| A | 15  | 1  |
| B | 6   | 10 |

Results, of course, less significant chi-square, but the following for Group: B = 3.2, S.E. = 1.2, Sig. = .005.

SPSS Help says, under Data Considerations: "However, your solution may be more stable if your predictors have a multivariate normal distribution. Additionally, as with other forms of regression, multicollinearity among the predictors can lead to biased estimates and inflated standard errors."

Inflated? I guess so, by about 10,000 times! It would have been nice if this section simply said, "Does not work with an empty cell."

Anybody know a way around this problem that won't lose power? Remember, I want to include continuous predictors also. I have not tried it with plain MR, but I don't see why that would be different.

Thanks!
Allan Lundy, PhD
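
The numbers in this post can be checked by hand. With a single dichotomous predictor, the maximum-likelihood coefficient is the log odds ratio of the 2x2 table, and its Wald standard error is the square root of the sum of the reciprocal cell counts. The sketch below uses only the Python standard library and a hypothetical helper name (`wald_from_2x2`). It is not SPSS's iterative fitting routine, and the sign of B depends on how Group is coded.

```python
import math


def wald_from_2x2(a, b, c, d):
    """Log odds ratio, Wald SE and two-sided p for a 2x2 table.

    a, b = Yes/No counts in group A; c, d = Yes/No counts in group B.
    Returns None when any cell is empty: the odds ratio is then 0 or
    infinite, so no finite estimate exists.
    """
    if min(a, b, c, d) == 0:
        return None
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = log_or / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return log_or, se, p


# Original table: the empty cell means no finite estimate.
print(wald_from_2x2(16, 0, 6, 10))  # None

# Table after switching one outcome value.
b_coef, se, p = wald_from_2x2(15, 1, 6, 10)
print(round(b_coef, 1), round(se, 1), round(p, 3))  # 3.2 1.2 0.005
```

The second call reproduces B = 3.2, S.E. = 1.2, Sig. = .005. In the first table the 1/0 term makes the standard error unbounded, which is consistent with the very large B and S.E. reported once the iterations stop.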
