Re: Binary logistic regression - poor models

Posted by dfva on Apr 26, 2016; 9:15pm
URL: http://spssx-discussion.165.s1.nabble.com/Binary-logistic-regression-poor-models-tp5732023p5732044.html

Dear all!

Thank you for drawing attention to the unusual distribution. Among the 42 drugs were a number of medications that were administered to only a few patients. I have counted the number of patients for each drug. After excluding the drugs that were given to fewer than 100 patients, the remaining drugs and their patient counts are as follows:
dr1 851
dr2 9234
dr3 128
dr4 827
dr5 5439
dr6 16846
dr7 4502
dr8 338
dr9 803
dr10 246
dr11 11522
dr12 3622
dr13 296
dr14 7972
dr15 814
dr16 4787
dr17 212
dr18 2688
dr19 4607
dr20 571
dr21 816
dr22 2243
dr23 3012
dr24 570

The counts vary widely. Excluding the rarely used drugs reduces the total population by only 8 patients.

My goal is to analyze the impact of the drugs. I don't know whether logistic regression is the right method. My idea was that if I calculate the OR for each drug, I can rank the drugs and characterize their effects by their ORs. But the resulting model has very low sensitivity, perhaps because of the few cases of heart failure and the large number of variables. The fitted models have an R-square of about 0.022. Could it be that the sample is too complex for logistic regression?
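To illustrate the low-sensitivity point: when the outcome is rare, predicted probabilities seldom exceed the usual classification cutoff of 0.5, so sensitivity can be near zero even if the model ranks patients sensibly. The following is only a minimal sketch in Python (not SPSS syntax); the function name and the ten probabilities and outcomes are made up for illustration and are not taken from my data.

```python
def sensitivity_specificity(probs, outcomes, cutoff):
    """Classify as 'event' when predicted probability >= cutoff,
    then return (sensitivity, specificity)."""
    tp = sum(1 for p, y in zip(probs, outcomes) if y == 1 and p >= cutoff)
    fn = sum(1 for p, y in zip(probs, outcomes) if y == 1 and p < cutoff)
    tn = sum(1 for p, y in zip(probs, outcomes) if y == 0 and p < cutoff)
    fp = sum(1 for p, y in zip(probs, outcomes) if y == 0 and p >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical predicted probabilities and observed outcomes (1 = heart failure).
probs = [0.01, 0.02, 0.02, 0.03, 0.03, 0.04, 0.05, 0.06, 0.08, 0.15]
outcomes = [0, 0, 0, 0, 0, 0, 0, 1, 0, 1]

# With cutoff 0.5 no patient is classified as an event, so sensitivity is 0.
print(sensitivity_specificity(probs, outcomes, 0.5))
# A lower cutoff trades some specificity for sensitivity.
print(sensitivity_specificity(probs, outcomes, 0.055))
```

In this toy example the model orders the patients reasonably, and only the choice of cutoff makes the sensitivity look poor. Whether that also explains my result, I cannot say.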

Prior knowledge:
We have relatively sparse prior knowledge about the effects of the drugs to be analyzed. In the literature we found detailed information about only 3 drugs, and so far we have analyzed only one of them. For this drug, according to the literature, a threshold cumulative dose can be established: below it, the drug is associated with heart failure at a very low incidence, and above it the incidence of heart failure increases exponentially. For this analysis I used chi-square tests and evaluated how the p value changed. But in this case we had prior knowledge, and I ran the chi-square test several times over a range of cutoffs that included the threshold value.
In other cases the effect of the drug may be independent of the dose. We don't know.
The iterative calculation of p values mentioned above took a lot of time, but naturally I can do it for all drugs if you recommend it. The problem is that the cumulative dose of each drug spans a very wide range, and the distribution of patients across doses is very uneven.
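The repeated chi-square runs described above could be scripted roughly as follows. This is a sketch in Python under stated assumptions, not the procedure I actually ran in SPSS: the function names and the `(cumulative_dose, heart_failure)` data layout are hypothetical. It uses the Pearson chi-square without continuity correction on the 2x2 table formed at each candidate cutoff.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (1 df, no continuity correction) for the table
    [[a, b], [c, d]]. Returns (chi2, p), or None if a margin is empty."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return None
    chi2 = n * (a * d - b * c) ** 2 / denom
    # Survival function of the chi-square distribution with 1 df.
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


def scan_thresholds(patients, thresholds):
    """patients: iterable of (cumulative_dose, heart_failure) pairs,
    heart_failure coded 0/1. Returns a list of (threshold, chi2, p)."""
    patients = list(patients)
    results = []
    for t in thresholds:
        a = sum(1 for dose, hf in patients if dose >= t and hf == 1)
        b = sum(1 for dose, hf in patients if dose >= t and hf == 0)
        c = sum(1 for dose, hf in patients if dose < t and hf == 1)
        d = sum(1 for dose, hf in patients if dose < t and hf == 0)
        res = chi_square_2x2(a, b, c, d)
        if res is not None:  # skip cutoffs that leave one group empty
            results.append((t, res[0], res[1]))
    return results
```

One caveat with this approach: testing many cutoffs on the same data and picking the smallest p value inflates the chance of a false positive, so the p values from such a scan should be read as exploratory.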

Furthermore, I thought that when I analyse the effect of drugs in this way, I cannot take into account the effect of other drugs and variables (age, gender, ...). We know that age has a high impact on the outcome.
I think creating strata for each age group for a single drug is not complicated, but I cannot create strata for all drugs, age, and gender together (and for all diseases), because then every stratum would contain only a few patients.
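For the simple case of one drug stratified by age group, one standard way to pool the per-stratum 2x2 tables is the Mantel-Haenszel odds ratio. The sketch below is in Python and only illustrative: the function name is my own and the counts are invented, not taken from my data.

```python
def mantel_haenszel_or(strata):
    """Pooled odds ratio over strata.

    Each stratum is a tuple (a, b, c, d):
      a = exposed with heart failure,   b = exposed without,
      c = unexposed with heart failure, d = unexposed without.
    """
    num = 0.0
    den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # an empty stratum contributes nothing
        num += a * d / n
        den += b * c / n
    return num / den


# Hypothetical counts for two age groups and one drug.
age_strata = [
    (10, 90, 5, 95),   # younger patients
    (20, 80, 10, 90),  # older patients
]
print(mantel_haenszel_or(age_strata))
```

This works while each stratum still holds a reasonable number of patients. Once drugs, age, gender and diseases are all crossed, the strata become too sparse, which is the reason I turned to a model that adjusts for several variables at once.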

So I want to consider (if possible) the effect of the other variables as well, and that is how I arrived at logistic regression. But it gives me very poor results, or at least I think they are very poor. It may also be that logistic regression is not the right solution. Therefore, I am asking for help.

Agnes