Re: Bootstrapping ROC AUC

Marta Garcia-Granero
[hidden email] wrote:
> Many thanks, Marta!

You are welcome, but I'd rather keep the thread open to everyone, not private.

> Since I was using Stata before, I am quite new to SPSS.
> I hope you would not mind helping me use your code to fit my purpose.
>
> I want to draw 100 samples

I think that 100 bootstrap samples is too few.

> from my data named bootsstripdata.sav (contains 21 variables and N=7475).
> Run the following LOGISTIC REGRESSION on the 100 samples, then calculate the
> average area under the ROC curve and its standard deviation.

Then you are NOT bootstrapping the AUC, but the logistic regression model itself. What do you really want to do?

A) Get 100 different fitted regression models (different coefficients, pseudo R-square measures, goodness of fit... plus 100 different AUCs).

I see that you are using a stepwise method to build the model. Please search the archived messages: this topic has been discussed several times, and the general view is that this is a VERY BAD way of building a model (at the very least, use FSTEP(LR), which is not so awful, instead of FSTEP(WALD)). See Scott Millis' answer to the message named "Multiple Regression with Continuous and Categorical Variables" (July 24th) for a good collection of reasons not to use stepwise methods. Besides, in your case you might end up with different variables, with different coefficients, being included in each sample. What is the use of an average AUC across different models?

B) Get one single model and bootstrap its AUC.

The second approach needs very little modification of my macro; the first would make it useless. A very different approach would then be needed: build a file with the 100 bootstrapped samples, run the logistic regression with the file split by sample, use OMS to capture the relevant output and save it to a new dataset, then open that dataset and work with it.

If you want to use the second approach, then this is what you have to do:

1) Run LOGISTIC REGRESSION and save the predicted probabilities. (But please build the model in a more sensible way than the one you are using; see my July 24th reply to the same message, "Multiple Regression with Continuous and Categorical Variables", for some guidelines on model-building strategies, taken from Hosmer & Lemeshow's book on logistic regression.)
A variable called PRE_1 will be added to your dataset.

2) Run the MACRO:

BOOTROC PRE_1 StatusatdisN(1) k=100. /* (Assuming that StatusatdisN =1 is the event).
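
For readers who want to see what bootstrapping the AUC of a fixed set of predicted probabilities involves, here is a minimal Python sketch. It is not the BOOTROC macro, whose source is not shown in this thread. The function names `auc` and `bootstrap_auc`, the seed, and the toy data are illustrative assumptions. The sketch resamples cases with replacement, recomputes the AUC on each resample, and reports the mean and standard deviation of those estimates.

```python
import random
import statistics


def auc(scores, labels):
    """AUC as the probability that an event outscores a non-event (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc(scores, labels, k=1000, seed=12345):
    """Return (mean, standard deviation) of the AUC over k bootstrap resamples."""
    rng = random.Random(seed)
    n = len(scores)
    estimates = []
    while len(estimates) < k:
        # Draw n case indices with replacement; score and label stay paired.
        idx = [rng.randrange(n) for _ in range(n)]
        y = [labels[i] for i in idx]
        if 0 < sum(y) < n:  # skip resamples that contain only one class
            s = [scores[i] for i in idx]
            estimates.append(auc(s, y))
    return statistics.mean(estimates), statistics.stdev(estimates)


if __name__ == "__main__":
    # Hypothetical toy data: predicted probabilities and a 0/1 event indicator.
    pre_1 = [0.05, 0.10, 0.20, 0.30, 0.35, 0.40, 0.60, 0.70, 0.80, 0.90]
    status = [0, 0, 0, 1, 0, 0, 1, 1, 0, 1]
    mean_auc, se_auc = bootstrap_auc(pre_1, status, k=1000)
    print(f"bootstrap mean AUC = {mean_auc:.3f}, bootstrap SE = {se_auc:.3f}")
```

The standard deviation of the bootstrap estimates plays the role of the bootstrap standard error. The pairwise loop in `auc` is quadratic in the number of cases, so a rank-based formula would be preferable for N in the thousands.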

You will get bootstrap estimates for the AUC of the single model you have built. On second thoughts, why do you want to use the bootstrap to get the AUC and its standard error? Given the sample size you mention (over 7000), asymptotic methods for SE(AUC) will be fine. You can use the SPSS procedure ROC:

ROC PRE_1 BY StatusatdisN(1)
/PLOT=CURVE(REFERENCE)
/PRINT=SE
/CRITERIA=TESTPOS(LARGE) CI(95).
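
To give a feel for what an asymptotic standard error of the AUC looks like, here is a minimal Python sketch of the Hanley and McNeil (1982) approximation, which is one widely used formula. Whether the ROC procedure computes its SE in exactly this way is not stated in this thread, so treat any agreement as an assumption. The function names and the example counts are hypothetical.

```python
import math
from statistics import NormalDist


def hanley_mcneil_se(auc, n_events, n_nonevents):
    """Approximate SE of an AUC using the Hanley-McNeil (1982) formula."""
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)
    variance = (
        auc * (1.0 - auc)
        + (n_events - 1) * (q1 - auc ** 2)
        + (n_nonevents - 1) * (q2 - auc ** 2)
    ) / (n_events * n_nonevents)
    return math.sqrt(variance)


def wald_ci(auc, se, level=0.95):
    """Normal-approximation confidence interval, truncated to [0, 1]."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return max(0.0, auc - z * se), min(1.0, auc + z * se)


if __name__ == "__main__":
    # Hypothetical numbers: an AUC of 0.75 with 500 events among 7475 cases.
    se = hanley_mcneil_se(0.75, 500, 6975)
    low, high = wald_ci(0.75, se)
    print(f"SE = {se:.4f}, 95% CI = ({low:.4f}, {high:.4f})")
```

With several thousand cases the SE from a formula like this is small, which is why the bootstrap adds little here.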

HTH,
Marta García-Granero

* Syntax from the original question (the forward stepwise model discussed above).
LOGISTIC REGRESSION VARIABLES StatusatdisN
  /METHOD = FSTEP(WALD) nyha2 nyha3 nyha4 LVFunctionN Angina
    Renaldisease AVRhaemodynamicpathologyN AorticvalveimplanttypeN conCABG
    PSurgicalInterventionsN PulmonarydiseaseN DiabetesN Hypertension PVD
    Operativepriority af ageg85 sex
  /CONTRAST (sex)=Indicator(1)
  /CONTRAST (ageg85)=Indicator(1)
  /CONTRAST (af)=Indicator(1)
  /CONTRAST (nyha2)=Indicator(1)
  /CONTRAST (nyha3)=Indicator(1)
  /CONTRAST (nyha4)=Indicator(1)
  /CONTRAST (LVFunctionN)=Indicator(1)
  /CONTRAST (Angina)=Indicator(1)
  /CONTRAST (Renaldisease)=Indicator(1)
  /CONTRAST (AVRhaemodynamicpathologyN)=Indicator(1)
  /CONTRAST (AorticvalveimplanttypeN)=Indicator(1)
  /CONTRAST (conCABG)=Indicator(1)
  /CONTRAST (PSurgicalInterventionsN)=Indicator(1)
  /CONTRAST (PulmonarydiseaseN)=Indicator(1)
  /CONTRAST (DiabetesN)=Indicator(1)
  /CONTRAST (Hypertension)=Indicator(1)
  /CONTRAST (PVD)=Indicator(1)
  /CONTRAST (Operativepriority)=Indicator(1)
  /SAVE = PRED PGROUP ZRESID
  /PRINT = GOODFIT CI(95)
  /CRITERIA = PIN(.05) POUT(.10) ITERATE(20) CUT(.5).


Haiyan Gao wrote:

> I am trying to use bootstrap resampling to calculate the average
> area under the ROC curve, and its standard deviation, for
> predictions from a logistic regression.
>
> Has anyone written, or does anyone know of, an algorithm to resample
> by bootstrap or to calculate the average and standard deviation of
> the area under the ROC curve?
    

  


--
For miscellaneous statistical stuff, visit:
http://gjyp.nl/marta/
=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command.
To leave the list, send the command SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command INFO REFCARD

jimmyxiao3
Hi Marta,

I'm developing a clinical scoring system and need to validate it internally. I'm curious how to bootstrap a ROC curve in SPSS. You posted the code BOOTROC PRE_1 StatusatdisN(1) k=100. /* (Assuming that StatusatdisN =1 is the event). But it needs the macro to be run first, and I'm not good at SPSS. Could you post the complete code showing how to produce a ROC curve by resampling my scoring system? I'd appreciate that!

Looking forward to your reply,
Best regards,

Jimmy