Binary logistic regression - poor models

dfva
Hi all!

I am fairly new to binary logistic regression in SPSS. I would like to carry out the following analysis:
- I want to predict a medical event (heart failure).
I have the following predictors: age, gender, the presence of 12 different diseases for each patient (e.g. diabetes mellitus), and cumulative doses for 42 drugs, so I can describe each patient with 58 variables. The disease variables take the values 'yes' and 'no'; the cumulative drug doses are continuous variables.

I know that age has an effect on the dependent variable, and some other variables probably affect it as well, but which ones is unknown.

I would like to determine the effect of each variable.

I tried putting all variables in the Covariates box, but that gave me a very poor model.
Then I tried it in 3 steps: age and gender in the first step, the diseases in the second, and the drugs in the third. That also resulted in a very poor model.
Then I thought I would test only one drug, so I put age and gender in the first step and a single drug in the second. The model was again very poor.
In all cases the R^2 values are under 0.1.

In my sample the event occurs in only 10.1 percent of patients.
If I run an ROC analysis, the curve is very bad - almost a straight line.

Could somebody help me solve this problem? Is binary logistic regression a good method for this problem, even if the model is very poor? Can I interpret the p values of a poor model or not?

Thank you!
Agnes

Re: Binary logistic regression - poor models

Bruce Weaver
Administrator
Hello Agnes.  What is your sample size?  I ask, because a rule of thumb for logistic regression is that you should have 10-15 "events-per-variable" (EPV), where an event is the outcome category with the lower frequency.  (In your data, event = heart failure.)  See Mike Babyak's nice article on over-fitting for more info.

   http://people.duke.edu/~mababyak/papers/babyakregression.pdf

HTH.

--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING: 
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).

Re: Binary logistic regression - poor models

dfva
Hi HTH!

I have 25,065 patients in the sample, about 2,300 of whom have heart failure.
I thought that would be a big enough sample.

Should I check, for each variable, that every stratum contains at least 15 patients with heart failure?
For example:
- check that among the patients with diabetes mellitus there are at least 15 patients with heart failure, and likewise for the other diseases?
- and check that among the patients who were given each drug there are also at least 15 patients with heart failure, and likewise for all drugs?


Thank you:
Agnes

Re: Binary logistic regression - poor models

David Marso
Administrator
In reply to this post by dfva
I don't think it wise to toss everything into one big stew!
People probably form clusters WRT disease constellations and not all drugs are applicable for all diseases.
Peel back the onion Grasshopper.  Throw a little bit of theory into your GIGO?

Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"

Re: Binary logistic regression - poor models

Rich Ulrich
In reply to this post by Bruce Weaver
Agnes,

Bruce jumped to logistic regression because that is the natural tool for
epidemiology with a (somewhat) rare outcome -- The reason, in particular,
is that the Odds Ratio (OR) is a good measure of effect size for such data,
and the R-squared is a very poor one.  The size of the R-squared depends
rather vitally on how rare the outcome is.  What does "variance" mean in
such cases?  - the meaning of the Odds Ratio is straightforward, by comparison.
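To make the point about rare outcomes concrete, here is a small sketch. The counts are invented for illustration, and the squared phi coefficient is used as a simple stand-in for R-squared (an assumption: it is the R-squared from regressing a binary outcome on a binary predictor, not the pseudo R-squared that SPSS reports). Both tables have the same odds ratio, but the variance-explained figure shrinks sharply when the outcome is rare.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table.
    a = exposed with event,   b = exposed without event
    c = unexposed with event, d = unexposed without event
    """
    return (a * d) / (b * c)


def phi_squared(a, b, c, d):
    """Squared phi coefficient: squared correlation of two binary variables."""
    numerator = (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator


# Hypothetical counts, invented for illustration only.
common_outcome = (500, 500, 250, 750)   # event rate 37.5%
rare_outcome = (30, 990, 10, 990)       # event rate about 2%

or_common = odds_ratio(*common_outcome)
or_rare = odds_ratio(*rare_outcome)
phi2_common = phi_squared(*common_outcome)
phi2_rare = phi_squared(*rare_outcome)

print(f"common outcome: OR = {or_common:.2f}, phi^2 = {phi2_common:.4f}")
print(f"rare outcome:   OR = {or_rare:.2f}, phi^2 = {phi2_rare:.4f}")
```

The effect size expressed as an odds ratio is identical in the two tables; only the rarity of the outcome differs.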

If indeed you have the N= 6000 sample so that Events= 600 or so, then the
R-squared of 0.1  would be very highly significant.  If your N is much smaller,
you really /need/  to categorize your hypotheses into a few that you hope to
confirm, and a larger number that are regarded as exploratory.  That is probably
a good thing to do for various other reasons, even with large N.  The "multiplicity
of tests" creates a logical problem for which there is no other solution than using
prior information, one way or another.

--
Rich Ulrich

> Date: Mon, 25 Apr 2016 15:19:30 -0700

===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD

Re: Binary logistic regression - poor models

Bruce Weaver
Administrator
In reply to this post by dfva
I don't have time to respond to this right now, but am replying so that Agnes' previous message gets distributed to the mailing list.  (Nabble shows that none of her messages have actually been posted to the mailing list.)  Perhaps Rich or someone else will have time to jump in before I do.

Re: Binary logistic regression - poor models

Rich Ulrich
Agnes,
Basically, "no".  The rule of thumb is applied to the overall count, so
you should be okay there.
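
As a quick check of the rule of thumb against the overall counts, here is a minimal sketch using the figures reported earlier in the thread (25,065 patients, about 2,300 with heart failure, 58 variables). The function name is made up for this example, and the 10-15 threshold is only the rule of thumb quoted above, not a hard limit.

```python
def events_per_variable(n_events, n_nonevents, n_predictors):
    """EPV uses the less frequent outcome category as the 'event' count."""
    limiting_count = min(n_events, n_nonevents)
    return limiting_count / n_predictors


n_patients = 25065    # total patients, as reported in the thread
n_events = 2300       # approximate number of patients with heart failure
n_predictors = 58     # variables per patient, as stated in the first post

epv = events_per_variable(n_events, n_patients - n_events, n_predictors)
print(f"EPV = {epv:.1f}")
print("Meets the upper end (15) of the rule of thumb:", epv >= 15)
```

With these numbers the EPV comes out at roughly 40, comfortably above the 10-15 guideline.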

On the other hand, if you have any unusual distributions (say, some drug
that shows up only 10 times in all), then you should always remain aware
that Maximum Likelihood solutions sometimes do not like 0's.  (So, if something
goes obviously wrong, such variables might be dropped to see if that fixes the
problem -- I might try that even before looking at the cell counts, if I had not
looked at all the two-way tabulations at the start, just to make sure that my
dataset had no grossly apparent errors.)
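
The two-way tabulation check described above could be sketched as follows. The data are a toy example invented for illustration, and the helper names are hypothetical; the idea is simply to count the four cells of each predictor-by-outcome table and flag any table with an empty cell before fitting the model.

```python
from collections import Counter


def two_way_counts(exposed, outcome):
    """Count the four cells of a binary predictor x binary outcome table."""
    counts = Counter(zip(exposed, outcome))
    return {(e, o): counts.get((e, o), 0) for e in (0, 1) for o in (0, 1)}


def has_empty_cell(table):
    """True if any of the four cells has a count of zero."""
    return any(n == 0 for n in table.values())


# Hypothetical toy data: a drug given to 3 of 12 patients,
# none of whom had the event.
drug = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
failure = [0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0]

table = two_way_counts(drug, failure)
for (e, o), n in sorted(table.items()):
    print(f"drug={e} failure={o}: {n}")
print("Empty cell present:", has_empty_cell(table))
```

A predictor flagged this way is a candidate for dropping or recoding if the maximum likelihood fit misbehaves.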

--
Rich Ulrich


> Date: Tue, 26 Apr 2016 07:26:38 -0700


Re: Binary logistic regression - poor models

Maguin, Eugene
In reply to this post by dfva
Even though your question is about logistic regression, I think you'd be having the same sort of questions if you were doing an ordinary multiple regression analysis. My opinion is that you and whoever else is working on this do not have an analysis plan.
Gene Maguin

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of dfva
Sent: Monday, April 25, 2016 6:11 PM
To: [hidden email]
Subject: Binary logistic regression - poor models

--
View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Binary-logistic-regression-poor-models-tp5732023.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.


Re: Binary logistic regression - poor models

dfva
In reply to this post by Rich Ulrich
Dear all!

Thank you for drawing attention to the unusual distribution. Among the 42 drugs there were a number of medications that were given to only a few patients. I have counted the number of patients for each drug. Excluding the drugs that were given to fewer than 100 patients, the remaining drugs and their patient counts are as follows:
dr1 851
dr2 9234
dr3 128
dr4 827
dr5 5439
dr6 16846
dr7 4502
dr8 338
dr9 803
dr10 246
dr11 11522
dr12 3622
dr13 296
dr14 7972
dr15 814
dr16 4787
dr17 212
dr18 2688
dr19 4607
dr20 571
dr21 816
dr22 2243
dr23 3012
dr24 570

The counts vary widely. Dropping the rare drugs reduces the total population by only 8 patients.

My goal is to analyze the impact of the drugs. I don't know whether logistic regression is the right method or not. I thought that if I calculated the OR for each drug, I could establish a ranking among them and characterize their effects with the ORs. But the resulting model has very low sensitivity, perhaps because of the few cases of heart failure and the large number of variables. The fitted models have an R-squared of about 0.022. Could it be that the sample is too complex for logistic regression?

Prior knowledge:
We have relatively sparse prior knowledge about the effects of the drugs to be analyzed. In the literature we have found detailed information about only 3 drugs. So far we have analyzed only one of them. In this case, according to the literature, we can establish a threshold cumulative dose. Below this cumulative dose the drug is associated with a very low incidence of heart failure, and above it the incidence of heart failure increases exponentially. For this analysis I ran chi-square tests and evaluated how the p value changed. But in this case we had prior knowledge, and I ran the chi-square test several times over a range of cut-offs that included the threshold value.
In other cases, it may be that the effect of the drug is independent of the dose. We don't know.
The iterative calculation of the p values mentioned above took a lot of time, but naturally I can do it for all drugs if you suggest it. The problem is that the cumulative dose of each drug spans a very wide range, and the distribution of patients across doses varies greatly.
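
For reference, one run of the cut-off test described above might look like the sketch below. The doses, outcomes, and cut-off are invented toy values, the helper names are hypothetical, and the test is a plain Pearson chi-square without continuity correction. The toy counts are far too small for a chi-square test to be trustworthy in practice; they only show the mechanics.

```python
import math


def table_at_cutoff(doses, events, cutoff):
    """2x2 counts (a, b, c, d) after splitting cumulative dose at `cutoff`.
    a, b = at or above the cut-off, with / without the event
    c, d = below the cut-off, with / without the event
    """
    a = b = c = d = 0
    for dose, event in zip(doses, events):
        if dose >= cutoff:
            if event:
                a += 1
            else:
                b += 1
        else:
            if event:
                c += 1
            else:
                d += 1
    return a, b, c, d


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic and p value (1 degree of freedom)."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 df, P(X > chi2) equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical toy data, invented for illustration only.
doses = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120]
events = [0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1]

cells = table_at_cutoff(doses, events, cutoff=75)
chi2, p_value = chi_square_2x2(*cells)
print("cells (a, b, c, d):", cells)
print(f"chi-square = {chi2:.3f}, p = {p_value:.4f}")
```

One caution when repeating this over many cut-offs: the smallest p value from such a scan is optimistic, because several tests were run to find it. That is the multiplicity problem raised earlier in the thread.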

Furthermore, I thought that when I analyze the effect of the drugs in this way, I cannot take the effect of other drugs and variables (age, gender, ...) into account. We know that age has a strong impact on the outcome.
I think creating strata for each age group for a single drug is not complicated, but I cannot create strata for all drugs, age, and gender together (and for all diseases), because then every stratum will contain only a few patients.

So I want to take the effect of the other variables into account as well (if that is possible), and that is how I arrived at logistic regression. But it gives me very poor results, or at least I think they are very poor. It may also be that logistic regression is not the right solution. Therefore, I ask for help.

Agnes    
Reply | Threaded
Open this post in threaded view
|

Re: Binary logistic regression - poor models

Rich Ulrich
Okay, here is some help.  But why isn't a serious, experienced statistician
associated with the data before now?

First, you did not heed my advice from before: R-squared is /not/ a useful
measure of effect size for epidemiology studies with rare events or rare
predictors.  But you use that to justify calling your results "poor."  Not good.
If the R-squared were to be large, tens of thousands of cases would be an
excessive design.  When your N is that large, you look at something else.

Before you say that the modeling is poor:  What are the Odds Ratios?

Age and sex and diabetes are obviously, seriously related to outcome.
Accounting for them can make testing more precise.  However, if they
are also associated with the other predictors, they will "confound" the
testing - Depending on the direction of correlations, they (often) wipe
out an apparent univariate effect or they can strengthen it.
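
A toy illustration of how a confounder can wipe out an apparent univariate effect is sketched below. All counts are invented, with age split into two hypothetical strata. The Mantel-Haenszel estimator is used here as one standard way to pool stratum-specific odds ratios; it is an illustration of the idea, not the analysis recommended in this thread.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for one 2x2 table (a, b = exposed; c, d = unexposed)."""
    return (a * d) / (b * c)


def mantel_haenszel_or(strata):
    """Mantel-Haenszel pooled odds ratio over a list of (a, b, c, d) tables."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


# Hypothetical counts: (exposed with event, exposed without event,
#                       unexposed with event, unexposed without event)
young = (5, 95, 45, 855)    # few young patients get the drug, low event rate
old = (180, 720, 20, 80)    # most old patients get the drug, high event rate

# Collapsing over age gives the crude (univariate) table.
crude = tuple(y + o for y, o in zip(young, old))

crude_or = odds_ratio(*crude)
adjusted_or = mantel_haenszel_or([young, old])

print(f"OR within young stratum: {odds_ratio(*young):.2f}")
print(f"OR within old stratum:   {odds_ratio(*old):.2f}")
print(f"Crude OR:                {crude_or:.2f}")
print(f"Mantel-Haenszel OR:      {adjusted_or:.2f}")
```

In this made-up example the drug looks harmful in the crude table only because older patients both receive it more often and have more events; within each age stratum the odds ratio is 1.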

How related are these strong, a priori factors to other predictors?

Multiple a priori predictors, ones that are not central to the hypotheses, are
sometimes combined to make a single "propensity score" as a covariate.
I think I would be tempted, for these data, to start with a set of simple
analyses that used the propensity score as a covariate and computed the OR
(or something) for each other predictor.


To have 10% mortality, you have years of follow-up at somewhat-elevated
ages.  Some version of life-table could conceivably be appropriate, depending
on how the data were collected.  Does everyone enter the data set at the same
time, or at the same age?  What does the duration of follow-up depend on?

Hope this helps.

--
Rich Ulrich




> Date: Tue, 26 Apr 2016 14:15:02 -0700


Re: Binary logistic regression - poor models

dfva
Hi Rich Ulrich!

Sorry, I couldn't reply in the last week. Thank you for your help.

I understand that R-squared is not a suitable measure in this complex case.

The database covers 10 years of patient follow-up. The patients' ages vary widely, from 30 up to 90+. I have calculated the OR for age, and it is 1.044. The OR for gender was also calculated. But in these calculations I did not take other factors into account. We see that age and gender are surely the most important and basic predictors.
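
To get a feel for what an OR of 1.044 means, assuming it is the odds ratio per year of age and that the effect is linear on the log-odds scale (both are assumptions, and the figure is unadjusted, as noted above), it can be rescaled to larger age differences:

```python
import math

or_per_year = 1.044   # unadjusted OR for age reported above, assumed to be per year


def rescale_odds_ratio(or_per_unit, units):
    """OR for a difference of `units`, assuming a log-linear effect."""
    return math.exp(units * math.log(or_per_unit))


or_per_decade = rescale_odds_ratio(or_per_year, 10)
or_30_vs_90 = rescale_odds_ratio(or_per_year, 60)

print(f"OR per 10 years of age: {or_per_decade:.2f}")
print(f"OR for age 90 vs age 30: {or_30_vs_90:.1f}")
```

Under those assumptions a per-year OR that looks close to 1 corresponds to odds that are roughly one and a half times higher per decade, which adds up to a large difference across the 30 to 90+ age range in this data set.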

I have calculated the covariate matrix for the comorbidities as well, but this matrix does not show very high values. The highest value is 0.344.

Over the last few days I have run a lot of basic computations. Now I see that I should probably divide the whole problem into smaller subproblems. We selected one drug and then selected the patients who received it. For the selected drug we see a dose dependence. The age distributions in the different dose ranges are the same (they do not differ significantly). I formed age groups (one for every 5 years) and ran a Kruskal-Wallis test; I hope this was the right choice. I did the same for gender, and it is also OK.

In this subproblem there are only a few frequent drug combinations. And here we faced what you asked: "How related are these strong, a priori factors to other predictors?"  If the most frequent drug combinations are considered (all combinations contain the selected drug), the distribution of the selected drug's dose differs between the drug combinations. The distribution of age is OK. I can calculate the probability of heart failure for each combination, but I think I should adjust the values for the dose range. That is as far as I have got. I am now reading about confounding variables and about standardization in SPSS. Is this the right way? I hope it can be done in this software.

Sorry for my basic mistakes; I am learning this discipline now. (I work in IT and data mining, but I have never done such an in-depth medical analysis before. Unfortunately, my colleagues are not familiar with this area...)
I need to think about what you said about combining several predictors into a "propensity score" and using it as a covariate... I am trying to interpret it...

Thank you for your help!
Agnes

Re: Binary logistic regression - poor models

Norberto Hernandez
dfva: If you have longitudinal data of the patients, did you consider the use of survival analysis (Cox Regression)?

2016-05-02 17:40 GMT-05:00 dfva <[hidden email]>:

Re: Binary logistic regression - poor models

dfva
Dear Norberto,

Thank you! Not yet, but I will. So far we have only done basic analyses. When I have done it, I will come back.

Thank you!
Agnes