
Re: Binary logistic regression - poor models

Posted by Rich Ulrich on Apr 28, 2016; 5:51am
URL: http://spssx-discussion.165.s1.nabble.com/Binary-logistic-regression-poor-models-tp5732023p5732055.html

Okay, here is some help.  But why hasn't a serious, experienced statistician
been associated with the data before now?

First, you did not heed my advice from before: R-squared is /not/ a useful
measure of effect size for epidemiology studies with rare events or rare
predictors.  But you use that to justify calling your results "poor."  Not good.
If the R-squared were to be large, tens of thousands of cases would be an
excessive design.  When your N is that large, you look at something else.

Before you say that the modeling is poor:  What are the Odds Ratios?
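
To make the point about R-squared concrete, here is a minimal sketch with a made-up 2x2 table (the counts are hypothetical, not taken from this thread). It assumes a drug given to 2% of 40,000 patients and roughly 10% events overall, and it uses the squared phi coefficient as a simple stand-in for R-squared; the pseudo-R-squared values that SPSS prints for logistic regression are computed differently, but they behave similarly in this situation.

```python
import math

# Hypothetical 2x2 table (NOT from the thread): 40,000 patients,
# 800 of them on a rarely used drug, about 10% events overall.
a, b = 145, 655        # exposed:   events, non-events
c, d = 3920, 35280     # unexposed: events, non-events

# Odds ratio for the exposure.
odds_ratio = (a * d) / (b * c)

# Woolf (log-scale) 95% confidence interval for the odds ratio.
se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
ci_low = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
ci_high = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)

# Squared phi coefficient: the squared Pearson correlation between the
# two binary variables, used here only as a stand-in for R-squared.
phi = (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
r_squared = phi ** 2

print(f"OR = {odds_ratio:.2f}, 95% CI {ci_low:.2f} to {ci_high:.2f}")
print(f"phi-squared = {r_squared:.4f}")
```

With these invented counts the odds ratio is about 2 and its interval excludes 1, while the R-squared stand-in stays far below 0.01. A small R-squared by itself therefore says little about whether a rare predictor matters.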

Age and sex and diabetes are obviously, seriously related to outcome.
Accounting for them can make testing more precise.  However, if they
are also associated with the other predictors, they will "confound" the
testing.  Depending on the direction of the correlations, they (often) wipe
out an apparent univariate effect, or they can strengthen it.
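
A small sketch of how that "wiping out" can look, using invented counts (not data from this thread). It assumes two age strata in which the drug has no effect at all, but older patients both receive the drug more often and have more events. The Mantel-Haenszel pooled odds ratio is used here as a simple stand-in for the adjustment that logistic regression with age as a covariate would make.

```python
# Hypothetical counts per age stratum (NOT from the thread):
# (exposed events, exposed non-events, unexposed events, unexposed non-events)
strata = {
    "younger": (50, 950, 450, 8550),    # 5% events with or without the drug
    "older": (800, 3200, 400, 1600),    # 20% events with or without the drug
}

def odds_ratio(a, b, c, d):
    """Odds ratio of a single 2x2 table."""
    return (a * d) / (b * c)

def mantel_haenszel_or(tables):
    """Mantel-Haenszel pooled odds ratio across several 2x2 tables."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den

# Collapse the strata into one table to get the crude (unadjusted) OR.
crude_table = tuple(sum(cell) for cell in zip(*strata.values()))
crude_or = odds_ratio(*crude_table)
adjusted_or = mantel_haenszel_or(strata.values())

print(f"crude OR = {crude_or:.2f}")
print(f"age-adjusted (Mantel-Haenszel) OR = {adjusted_or:.2f}")
```

In this constructed case the crude odds ratio is well above 2 while the age-adjusted one is 1, so the whole apparent drug effect comes from age. Reversing the direction of the association between age and drug use would instead hide or weaken a real effect.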

How related are these strong, a-priori factors to other predictors?

Multiple a-priori predictors, ones that are not central to the hypotheses, are
sometimes combined to make a single "propensity score" as a covariate.
I think I would be tempted, for these data, to start with a set of simple
analyses that used the propensity score as a covariate and computed the OR
(or something) for each other predictor.


To have 10% mortality, you have years of follow-up at somewhat-elevated
ages.  Some version of life-table could conceivably be appropriate, depending
on how the data were collected.  Does everyone enter the data set at the same
time, or at the same age?  What does the duration of follow-up depend on?
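
For readers who have not used a life table, here is a minimal sketch of the actuarial method on invented yearly counts (hypothetical, not from this thread). It assumes 1,000 patients entering together, with deaths and losses to follow-up recorded per year, and it treats patients censored during a year as being at risk for half of that year, which is the usual actuarial convention.

```python
# Hypothetical yearly follow-up counts (NOT from the thread):
# (deaths during the year, patients censored during the year)
yearly_counts = [(30, 50), (28, 120), (25, 200), (17, 250)]
n_at_start = 1000

def actuarial_life_table(n_start, rows):
    """Return (year, at_risk, effective_n, q, cumulative_survival) per year."""
    survival = 1.0
    at_risk = n_start
    table = []
    for year, (deaths, censored) in enumerate(rows, start=1):
        # Censored patients count as at risk for half the interval.
        effective_n = at_risk - censored / 2
        q = deaths / effective_n          # conditional probability of death
        survival *= 1 - q                 # cumulative survival to end of year
        table.append((year, at_risk, effective_n, q, survival))
        at_risk -= deaths + censored      # patients entering the next year
    return table

table = actuarial_life_table(n_at_start, yearly_counts)
for year, at_risk, effective_n, q, survival in table:
    print(f"year {year}: at risk {at_risk}, q = {q:.4f}, S = {survival:.4f}")

crude_mortality = sum(deaths for deaths, _ in yearly_counts) / n_at_start
life_table_mortality = 1 - table[-1][4]
print(f"crude mortality = {crude_mortality:.3f}")
print(f"life-table mortality = {life_table_mortality:.3f}")
```

In this made-up example the crude proportion of deaths is 10%, but the life-table estimate of four-year mortality is noticeably higher, because many patients were not followed for the full four years. That is why the answers to the questions above about entry and duration of follow-up matter.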

Hope this helps.

--
Rich Ulrich




> Date: Tue, 26 Apr 2016 14:15:02 -0700

> From: [hidden email]
> Subject: Re: Binary logistic regression - poor models
> To: [hidden email]
>
> Dear all!
>
> Thank you for drawing attention to the unusual distribution. Among the 42
> drugs were a number of medications which were administered to only a few
> patients. I have counted the number of patients for each drug. Excluding
> those drugs which were given to fewer than 100 patients, the
> remaining drugs and their numbers of patients are as follows:
> dr1 851
> dr2 9234
> dr3 128
> dr4 827
> dr5 5439
> dr6 16846
> dr7 4502
> dr8 338
> dr9 803
> dr10 246
> dr11 11522
> dr12 3622
> dr13 296
> dr14 7972
> dr15 814
> dr16 4787
> dr17 212
> dr18 2688
> dr19 4607
> dr20 571
> dr21 816
> dr22 2243
> dr23 3012
> dr24 570
>
> It has a high variance. The total population is reduced by
> only 8 patients.
>
> My goal is to analyze the impact of drugs. I don't know whether logistic
> regression is the right method or not. I thought that if I calculate the OR for
> each drug, I can establish a ranking between them and characterize their
> effects with the ORs. But the resulting model has very low sensitivity,
> perhaps because of the few cases of heart failure and the many variables.
> The calculated models have an R-square of about 0.022. Could it be that the
> sample is too complex for logistic regression?
>
> Prior knowledge:
> We have relatively sparse prior knowledge about the effects of the drugs to
> be analyzed. In the literature we have found detailed information about only
> 3 drugs. So far we have analyzed only one of them. In this case, according
> to the literature, we can establish a threshold cumulative dose. Below this
> cumulative dose the drug is associated with heart failure at very low incidence,
> and above this dose the incidence of heart failure increases exponentially.
> For this analysis I ran chi-square tests and evaluated the change in the
> p value. But in this case we had prior knowledge, and I made several
> runs of the chi-square test in the range that included the threshold value.
> In other cases it may be that the effect of the drug is independent of
> the dose. We don't know.
> The previously mentioned iterative calculation of the p values took a lot
> of time, but naturally I can do it for all drugs if you suggest it.
> The problem is that the interval of the cumulative dose for each drug has a very
> wide range, and the distribution of the patients over the different doses is
> very variable.
>
> Furthermore I thought that when I analyse the effect of drugs in this way, I
> cannot take the effect of other drugs and variables (age, gender, ...)
> into account. We know that age has a high impact on the outcome.
> I think creating strata for each age group in the case of one drug is not
> complicated, but I cannot create strata for all drugs and age and gender
> together (and for all diseases), because then every stratum will contain only
> a few patients.
>
> So I want to consider (if possible) the effect of other variables as
> well, and so I arrived at logistic regression. But it gives me very poor
> results, or at least I think they are very poor. It may also be that
> logistic regression is not the right solution. Therefore, I ask for help.
>
> Agnes
>
>