
Re: Binary logistic regression - poor models

Posted by Rich Ulrich on Apr 26, 2016; 6:02am
URL: http://spssx-discussion.165.s1.nabble.com/Binary-logistic-regression-poor-models-tp5732023p5732028.html

Agnes,

Bruce jumped to logistic regression because that is the natural tool for
epidemiology with a (somewhat) rare outcome. The reason, in particular,
is that the Odds Ratio (OR) is a good measure of effect size for such data,
and the R-squared is a very poor one. The size of the R-squared depends
heavily on how rare the outcome is. What does "variance explained" even mean
in such cases? The meaning of the Odds Ratio is straightforward, by comparison.
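The point about rarity can be made concrete with a minimal sketch (Python, standard library only). It rests on assumed conditions: a single yes/no predictor, half the patients exposed, and an odds ratio held fixed at 3 while the baseline risk shrinks. The function name and all the numbers are hypothetical. McFadden's pseudo-R² is used because it can be computed from proportions alone, not because SPSS reports it; SPSS shows Cox & Snell and Nagelkerke.

```python
import math


def mcfadden_r2_binary_exposure(p_unexposed: float, odds_ratio: float,
                                frac_exposed: float = 0.5) -> tuple[float, float]:
    """Return (prevalence, McFadden R^2) for one hypothetical binary predictor.

    With a single binary predictor the fitted logistic model reproduces the
    two group risks exactly, so expected per-subject log-likelihoods can be
    computed from proportions instead of simulated data.
    """
    # Risk in the exposed group implied by the baseline odds and the fixed OR.
    odds_exposed = odds_ratio * p_unexposed / (1 - p_unexposed)
    p_exposed = odds_exposed / (1 + odds_exposed)
    prevalence = frac_exposed * p_exposed + (1 - frac_exposed) * p_unexposed

    def bernoulli_ll(p: float) -> float:
        # Expected log-likelihood per subject when the fitted risk equals the true risk p.
        return p * math.log(p) + (1 - p) * math.log(1 - p)

    ll_null = bernoulli_ll(prevalence)  # intercept-only model
    ll_model = (frac_exposed * bernoulli_ll(p_exposed)
                + (1 - frac_exposed) * bernoulli_ll(p_unexposed))
    return prevalence, 1 - ll_model / ll_null


# Same odds ratio every time; only the baseline risk (hence the prevalence) changes.
for base_risk in (0.30, 0.10, 0.02):
    prev, r2 = mcfadden_r2_binary_exposure(base_risk, odds_ratio=3.0)
    print(f"baseline risk {base_risk:.2f}  prevalence {prev:.3f}  "
          f"OR 3.0  McFadden R2 {r2:.3f}")
```

In this example the odds ratio is identical in every row, yet the pseudo-R² falls as the outcome becomes rarer. That is the sense in which the R-squared depends on the outcome rate rather than on the strength of the effect.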

If you indeed have the N = 6000 sample, so that Events = 600 or so, then an
R-squared of 0.1 would be very highly significant. If your N is much smaller,
you really /need/ to categorize your hypotheses into a few that you hope to
confirm and a larger number that are regarded as exploratory. That is probably
a good thing to do for various other reasons, even with a large N. The "multiplicity
of tests" creates a logical problem for which there is no solution other than using
prior information, one way or another.
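The arithmetic behind this paragraph can be checked roughly with a hedged sketch. It assumes 58 candidate parameters (Agnes's variable count), an event rate of 10.1%, and that "R-squared" means SPSS's Nagelkerke R². Under those assumptions it computes three things: events per variable (EPV); the N needed for 10 EPV, the rule of thumb Bruce cites below; and the likelihood-ratio chi-square and p-value that a Nagelkerke R² of 0.1 would imply. All function names are hypothetical. To avoid a SciPy dependency, the p-value uses the closed-form chi-square tail, which is exact only for even degrees of freedom.

```python
import math


def events_per_variable(n: int, event_rate: float, n_params: int) -> float:
    """Expected number of events divided by the number of candidate parameters."""
    return n * event_rate / n_params


def n_needed(n_params: int, event_rate: float, epv: float = 10.0) -> int:
    """Smallest N giving at least `epv` expected events per parameter."""
    return math.ceil(epv * n_params / event_rate)


def lr_chi2_from_nagelkerke(r2_n: float, n: int, event_rate: float) -> float:
    """Model likelihood-ratio chi-square implied by a Nagelkerke R^2.

    Cox & Snell R^2 = 1 - exp(-LR / n). Nagelkerke divides it by its maximum,
    1 - exp(2 * LL0 / n), where LL0 is the intercept-only log-likelihood.
    """
    ll0_per_subject = (event_rate * math.log(event_rate)
                       + (1 - event_rate) * math.log(1 - event_rate))
    max_cox_snell = 1 - math.exp(2 * ll0_per_subject)
    cox_snell = r2_n * max_cox_snell
    return -n * math.log(1 - cox_snell)


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail chi-square probability: exp(-x/2) * sum_{i<df/2} (x/2)^i / i!.

    This closed form is valid only for even df.
    """
    if df % 2:
        raise ValueError("this closed form needs an even df")
    half = x / 2
    term = math.exp(-half)
    total = term
    for i in range(1, df // 2):
        term *= half / i  # builds (x/2)^i / i! incrementally
        total += term
    return total


n, rate, k = 6000, 0.101, 58  # assumed: Agnes's event rate and variable count
print(f"EPV at N={n}: {events_per_variable(n, rate, k):.1f}")
print(f"N needed for 10 EPV: {n_needed(k, rate)}")
lr = lr_chi2_from_nagelkerke(0.10, n, rate)
print(f"LR chi-square {lr:.0f} on {k} df, p = {chi2_sf_even_df(lr, k):.1e}")
```

Under these assumptions, 6000 patients give roughly 600 events, or about 10 per variable. An R² of 0.1 then corresponds to a likelihood-ratio statistic of a few hundred, far beyond what chance produces on 58 df.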

--
Rich Ulrich

> Date: Mon, 25 Apr 2016 15:19:30 -0700

> From: [hidden email]
> Subject: Re: Binary logistic regression - poor models
> To: [hidden email]
>
> Hello Agnes. What is your sample size? I ask, because a rule of thumb for
> logistic regression is that you should have 10-15 "events-per-variable"
> (EPV), where an event is the outcome category with the lower frequency. (In
> your data, event = heart failure.) See Mike Babyak's nice article on
> over-fitting for more info.
>
> http://people.duke.edu/~mababyak/papers/babyakregression.pdf
>
> HTH.
>
>
> dfva wrote
> > Hi all!
> >
> > I am fairly new to binary logistic regression in SPSS. I would like to carry
> > out an analysis on the following topic:
> > - I want to predict a medical event (heart failure).
> > I have the following predictors: age, gender, the existence of 12
> > different diseases for each patient (e.g. diabetes mellitus) and cumulative
> > doses for 42 drugs, so I can describe each patient with 58 variables. The
> > disease variables have 'yes' and 'no' values; the cumulative drug doses are
> > continuous variables.
> >
> > I know that age has an effect on the dependent variable, and probably some
> > other variables also affect it, but which ones is unknown.
> >
> > I would like to determine the effect of each variable.
> >
> > I tried putting all variables in the covariates box, but it gave me a very
> > poor model.
> > Then I tried it in 3 steps: in the first step age and gender, in the
> > second step the diseases, and in the third step the drugs. It resulted in
> > a very poor model as well.
> > Then I thought I would test only one drug, so I put age and gender in the
> > first step and only one drug in the second step. The model was again very
> > poor.
> > In all cases the R^2 values are under 0.1.
> >
> > In my sample the occurrence is only 10.1 percent.
> > If I do an ROC analysis, it shows a very bad curve: almost a straight diagonal line.
> >
> > Could somebody help me solve this problem? Is binary logistic regression
> > a good method for this problem, even if the model is very poor? Can I
> > interpret the p values for poor models or not?
> >
> > Thank you!
> > Agnes
>
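An ROC curve lying almost on the diagonal, as Agnes describes, corresponds to an area under the curve (AUC) near 0.5, meaning the model ranks events no better than chance. A minimal sketch of how that area can be computed uses the Mann-Whitney equivalence: AUC is the probability that a randomly chosen event receives a higher predicted score than a randomly chosen non-event, with ties counted as one half. The function name is hypothetical, and this is not SPSS's ROC procedure. The pairwise loop is written for clarity; a rank-based version would be faster on large files.

```python
def auc_mann_whitney(scores: list[float], outcomes: list[int]) -> float:
    """AUC as P(score of a random event > score of a random non-event).

    `outcomes` holds 1 for an event (e.g. heart failure) and 0 otherwise;
    tied scores contribute one half.
    """
    events = [s for s, y in zip(scores, outcomes) if y == 1]
    non_events = [s for s, y in zip(scores, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    wins = 0.0
    for e in events:          # compare every event ...
        for ne in non_events:  # ... with every non-event
            if e > ne:
                wins += 1.0
            elif e == ne:
                wins += 0.5
    return wins / (len(events) * len(non_events))


print(auc_mann_whitney([0.9, 0.7, 0.4, 0.2], [1, 1, 0, 0]))  # 1.0: perfect separation
print(auc_mann_whitney([0.3, 0.3, 0.3, 0.3], [1, 0, 1, 0]))  # 0.5: no discrimination
```

An AUC close to 0.5 reflects poor discrimination. That question is separate from whether individual odds ratios are statistically significant.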