Model building (was: "Re: Bootstrapping ROC AUC")


Marta Garcia-Granero
TO THE LIST, TO THE LIST (please)

Any discussion should be open to everyone on this list: some people might benefit from the information, and others might contribute better ideas (and more knowledge).

May I use this chance to ask your advice on how to deal with categorical variables in a regression model? For example, we want to develop a simple additive model for clinical practice. In this study, a categorical variable is coded with four levels, and univariate analysis shows that its overall effect on the outcome is significant, except for the third level versus the reference level (p=0.93). We entered only two dummy variables, coded 0 and 1, to represent levels 2 and 4 of the categorical variable in the multivariate analyses. Is it appropriate?

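NO. Omitting one of the dummies changes the interpretation of the others: level 3 is then coded exactly like the reference level, so the remaining coefficients compare levels 2 and 4 against the reference level and level 3 pooled together. Besides, your approach is wrong: the idea that only variables significant in univariate analysis should be considered for multivariate analysis is absolutely flawed.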
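To see why dropping one dummy changes the meaning of the others, here is a minimal Python sketch (not SPSS syntax). The function name `indicator_code` and the level labels 1 to 4, with level 1 as the reference, are assumptions made for illustration only. With full indicator coding every non-reference level gets its own pattern; once the level-3 dummy is dropped, level 3 is coded exactly like the reference level.

```python
def indicator_code(value, kept_levels):
    """Return 0/1 dummies for `value`, one per level in `kept_levels`.

    Any level not listed in `kept_levels` is coded as all zeros,
    i.e. it is silently merged into the reference category.
    """
    return tuple(int(value == level) for level in kept_levels)


levels = [1, 2, 3, 4]  # hypothetical four-level variable, level 1 = reference

# Full indicator coding: one dummy for each non-reference level.
full = {v: indicator_code(v, kept_levels=[2, 3, 4]) for v in levels}

# Coding described in the question: the dummy for level 3 is omitted.
reduced = {v: indicator_code(v, kept_levels=[2, 4]) for v in levels}

for v in levels:
    print(f"level {v}: full={full[v]} reduced={reduced[v]}")
```

In the reduced coding, levels 1 and 3 share the same all-zero pattern, so any model fitted on those two dummies cannot tell them apart.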

If univariate analysis shows that all the levels (>2) differ significantly from the reference level, is it appropriate to enter the categorical variable as a whole into the multivariate analyses, e.g. by directly entering “/CONTRAST (LVFunctionN)=Indicator(1)” in SPSS?
Since LOGISTIC REGRESSION offers to automatically dummy code categorical variables (as opposed to LINEAR REGRESSION, which doesn't), use that option; don't dummy code them manually.


We just want to provide an unbiased estimate of the ROC statistic by bootstrap resampling (100 samples, each the size of the entire dataset, drawn at random with replacement). Would it be better than the asymptotic methods for the SE(AUC) that you mentioned in part B) of your last email? Or is it not necessary to use bootstrap resampling in model validation?

Bootstrapping will not validate your model (nor the ROC procedure). It is just a way of computing a statistic (with its 95% CI) that logistic regression (at least in SPSS) doesn't give you: the AUC. The only way to validate a model is to test it against a dataset different from the one used for model derivation. Since you seem to have quite a big sample (over 7000), you should consider splitting it in two, using one part to derive your model and the other to validate it. Read this paper: Angie Wade, "Derivation versus validation", Arch Dis Child 2000;83:459–460.
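As a rough illustration of the two ideas above, here is a minimal Python sketch, not the BOOTROC macro and not SPSS syntax. The function names (`auc`, `bootstrap_auc_ci`, `split_half`), the percentile method for the interval, the default of 1000 resamples and the toy data are all assumptions of this sketch; the macro may compute its interval differently. The AUC is computed with a simple pairwise comparison, which is fine for a small example but slow for thousands of cases.

```python
import random


def auc(scores, labels):
    """AUC as the probability that a case with the outcome (label 1)
    scores higher than a case without it (label 0); ties count one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(scores, labels, n_boot=1000, alpha=0.05, seed=12345):
    """Return (AUC, lower, upper) using a percentile bootstrap interval."""
    rng = random.Random(seed)
    n = len(scores)
    stats = []
    while len(stats) < n_boot:
        # Resample cases with replacement, same size as the original data.
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            stats.append(auc([scores[i] for i in idx], ys))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return auc(scores, labels), lower, upper


def split_half(n, seed=12345):
    """Randomly split case indices 0..n-1 into derivation and validation halves."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    return idx[: n // 2], idx[n // 2:]


# Toy data (made up for illustration): 200 cases; cases with the outcome
# tend to get higher predicted scores.
rng = random.Random(1)
labels = [i % 2 for i in range(200)]
scores = [rng.gauss(1.0 if y == 1 else 0.0, 1.0) for y in labels]

estimate, lower, upper = bootstrap_auc_ci(scores, labels, n_boot=200)
print(f"AUC = {estimate:.3f}, 95% percentile CI = ({lower:.3f}, {upper:.3f})")

derivation, validation = split_half(len(scores))
print(len(derivation), "cases to derive the model,", len(validation), "to validate it")
```

The bootstrap part only attaches an interval to the AUC of one fitted model. The validation step is the split: the model would be fitted on the derivation half alone and its AUC then computed on the validation half.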

 

I have not managed to run the macro in SPSS; it seems quite difficult. I will read the SPSS manual.


Just select everything from DEFINE to !ENDDEFINE and run it once. Then go to the line that starts with BOOTROC (blah, blah, blah), set the arguments correctly, and run that single line.

In a previous message, I wrote:


I see that you are using a stepwise method to get the model. Please search the archived messages: this topic has been discussed several times, and the general idea is that this is a VERY BAD way of building a model (at least, you should not use FSTEP(WALD) but FSTEP(LR), which is not so awful). See Scott Millis' answer to a message named "Multiple Regression with Continuous and Categorical Variables" (July 24th) for a good collection of reasons for not using stepwise methods. ......(but please, build the model in a more sensible way than the one you are using; see my July 24th reply to the same message, "Multiple Regression with Continuous and Categorical Variables", for some guidelines on model building strategies, extracted from Hosmer & Lemeshow's book on logistic regression).

I insist that you do that before attempting (perpetrating?) any serious model building. Get a good multivariate analysis book and learn the concepts of confounding, interaction (effect modification), suppressor variables, adjusting, descriptive vs predictive models... There's a lot to be learned before laying your hands on the mouse and starting to click in SPSS.


Regards,
Marta García-Granero