Dear all,
I am trying to use bootstrap resampling techniques to calculate the average area under the ROC curve (AUC), and its standard deviation, for the predictions of a logistic regression model.

Has anyone written, or does anyone know of, an algorithm that resamples by bootstrap, or that calculates the average and standard deviation of the AUC?

Any hints would be very much appreciated.

Best wishes,
Haiyan
------------------------------------------------------------------------
This email is confidential and is intended solely for the person or entity to whom it is addressed. If this is not you, please forward the message to [hidden email]. We have scanned this email before sending it, but cannot guarantee that malicious software is absent and we shall carry no liability in this regard. We advise that information intended to be kept confidential should not be sent by email. We also advise that health concerns should be discussed with a medical professional in person or by telephone. NHS Direct can also provide advice. We shall not be liable for any failure to follow this advice. University College London Hospitals NHS Foundation Trust (UCLH).
=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
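A minimal Python sketch of the resampling asked about above, given only as an illustration (it is not the SPSS solution posted in the reply, and all function names are hypothetical). The cases in each outcome group are resampled separately with replacement, the AUC is recomputed from midranks through the Mann-Whitney identity AUC = U/(n1·n2), with ties counted as one half, and the mean and standard deviation of the k bootstrap AUCs are then taken. It assumes the scores are, for instance, the predicted probabilities saved from the logistic regression; it resamples those scores only and does not refit the model in each resample.

```python
import random
import statistics


def midranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of 0-based positions i..j, made 1-based
        for m in range(i, j + 1):
            ranks[order[m]] = avg_rank
        i = j + 1
    return ranks


def auc_mann_whitney(pos, neg):
    """AUC = P(pos > neg) + 0.5 * P(tie), via the Mann-Whitney rank-sum identity.

    Assumes larger scores indicate the condition; negate the scores first
    for a marker where small values do (testpos=SMALL in the macro).
    """
    n1, n2 = len(pos), len(neg)
    ranks = midranks(list(pos) + list(neg))
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2  # Mann-Whitney U for the positives
    return u / (n1 * n2)


def bootstrap_auc(pos, neg, k=2000, seed=None):
    """Resample each group separately, with replacement, and return k AUCs."""
    rng = random.Random(seed)
    aucs = []
    for _ in range(k):
        boot_pos = rng.choices(pos, k=len(pos))
        boot_neg = rng.choices(neg, k=len(neg))
        aucs.append(auc_mann_whitney(boot_pos, boot_neg))
    return aucs


def summarize(aucs):
    """Bootstrap mean AUC and its standard deviation (divisor k - 1)."""
    return statistics.mean(aucs), statistics.stdev(aucs)
```

For example, `summarize(bootstrap_auc(pos_scores, neg_scores, k=20000, seed=12345))` returns the bootstrap mean AUC and its standard deviation, where `pos_scores` and `neg_scores` stand for the scores of cases with and without the condition. Resampling within each group keeps n1 and n2 fixed, so every bootstrap sample contains both outcomes.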
Haiyan Gao wrote:
> I am trying to use bootstrap resampling techniques to calculate the
> average areas under the ROC curve and the standard deviation of ROC
> curve predicted by logistic regression.
>
> Has anyone written or does anyone know any algorithm to resample by
> bootstrap or calculate the average and standard deviation of ROC curve?

First of all, please don't reply to an existing message when you want to start a new thread; use a new subject line that reflects the nature of the question.

Second, here it is.

HTH,
Marta García-Granero

DEFINE BOOTROC(!POSITIONAL=!TOKENS(1)/
 !POSITIONAL !CHAREND('(')/
 !POSITIONAL !CHAREND(')')/
 testpos= !DEFAULT(LARGE) !TOKENS(1)/
 k= !DEFAULT (20000) !TOKENS(1)/
 seed=!DEFAULT(RANDOM) !TOKENS(1)).
DATASET NAME OriginalData.
DATASET COPY WorkingData WINDOW=HIDDEN.
DATASET ACTIVATE WorkingData.
* Preparing data for matrix (recoding state variable to 0&1 and sorting by it) *.
!IF (!UPCASE(!testpos) !EQ 'LARGE') !THEN.
. COMPUTE !2=(!2 NE !3).
!ELSE.
. COMPUTE !2=(!2 EQ !3).
!IFEND.
SORT CASES BY !2 (A).
PRESERVE.
* Initialize seed and fix mxloops to number of bootstrap samples *.
SET RNG=MT.
SET MTINDEX=!seed.
SET MXLOOPS=!k.
DO IF $casenum EQ 1.
. PRINT.
. !IF (!UPCASE(!seed) !EQ 'RANDOM') !THEN.
. PRINT /'RANDOM seed was used'.
. !ELSE.
. PRINT /'Seed value: ' !QUOTE(!seed).
. !IFEND.
END IF.
MATRIX.
PRINT /TITLE='*** BOOTSTRAPPING 95%CI FOR AUC - ROC ***'.
* Read sorted data *.
GET data /VAR=!2 !1 /NAMES=vnames /MISSING=OMIT.
COMPUTE vname=vnames(2).
* General data and sample AUC *.
COMPUTE totaln=NROW(data).
COMPUTE n2=CSUM(data(:,1)).
COMPUTE n1=totaln-n2.
COMPUTE group1=data(1:n1,2).
COMPUTE group2=data((n1+1):totaln,2).
COMPUTE Ranks=RNKORDER({group1;group2}).
COMPUTE R2=CSUM(Ranks&*data(:,1)).
COMPUTE U2=n1*n2+n2*(n2+1)/2-R2.
COMPUTE AUC=U2/(n1*n2).
PRINT AUC /FORMAT='F8.3' /TITLE='SAMPLE AUC'.
*** BOOTSTRAPPING ***.
COMPUTE k=!k. /* Number of bootsamples *.
COMPUTE bootAUC =MAKE(k,1,0).
COMPUTE bootgrp1=MAKE(n1,1,0).
COMPUTE bootgrp2=MAKE(n2,1,0).
LOOP i=1 TO k. /* Extracting k bootstrap samples from both groups *.
- LOOP j= 1 TO n1.
- COMPUTE flipcoin=1+TRUNC(n1*UNIFORM(1,1)).
- COMPUTE bootgrp1(j)=group1(flipcoin).
- END LOOP.
- LOOP j= 1 TO n2.
- COMPUTE flipcoin=1+TRUNC(n2*UNIFORM(1,1)).
- COMPUTE bootgrp2(j)=group2(flipcoin).
- END LOOP.
- COMPUTE Ranks=RNKORDER({bootgrp1;bootgrp2}).
- COMPUTE R2=CSUM(Ranks&*data(:,1)).
- COMPUTE U2=n1*n2+n2*(n2+1)/2-R2.
- COMPUTE bootAUC(i)=U2/(n1*n2).
END LOOP.
* Grand mean of bootstrapped AUC *.
COMPUTE mean=CSUM(bootAUC)/k.
* Bootstrap estimator of the standard error of the AUC *.
COMPUTE BootSEM=SQRT((CSSQ(bootAUC)-k&*(mean&**2))/(k-1)).
PRINT {mean,BootSEM}
 /FORMAT='F8.3'
 /CLABEL='Mean-AUC','SE(AUC)*'
 /RNAME=vname
 /TITLE='Bootstrapped Statistics for AUC'.
PRINT/TITLE='(*) Std. Deviation of bootstrapped AUC'.
* NP confidence interval *.
* Ordered array: sorting algorithm by R Ristow & J Peck *.
COMPUTE sortedbm=bootAUC.
COMPUTE sortedbm(GRADE(bootAUC))=bootAUC.
COMPUTE lower1=sortedbm(k*0.025).
COMPUTE upper1=sortedbm(1+k*0.975).
* Parametric confidence intervals (BV1&BV2)*.
COMPUTE z = 1.959964.
COMPUTE lower2=mean-z*BootSEM.
COMPUTE upper2=mean+z*BootSEM.
COMPUTE lower3=AUC-z*BootSEM.
COMPUTE upper3=AUC+z*BootSEM.
PRINT {lower1,upper1;lower2,upper2;lower3,upper3}
 /FORMAT='F8.3'
 /CLABEL='Lower CL','Upper CL'
 /RLABEL='BP','BV1','BV2'
 /TITLE='95%CI: Non parametric (percentiles 2.5 & 97.5) & Parametric (Z based) BV1&BV2'.
SAVE bootAUC /OUTFILE='C:\Temp\BootStrappedAUC.sav' /NAMES=vname.
PRINT k /FORMAT='F8' /RLABEL='K=' /TITLE='K bootsampled AUC saved to C:\Temp\BootStrappedAUC.sav'.
END MATRIX.
RESTORE.
GET FILE ='C:\Temp\BootStrappedAUC.sav' .
DATASET NAME BootstrappedAUC.
FREQUENCIES VARIABLES=ALL
 /FORMAT=NOTABLE
 /HISTOGRAM NORMAL
 /STATISTICS=SKEWNESS KURTOSIS.
DATASET ACTIVATE OriginalData.
DATASET CLOSE WorkingData.
DATASET CLOSE BootstrappedAUC.
!ENDDEFINE.

* Sample dataset *.
SET LOCALE=ENGLISH.
DATA LIST FREE/Outcome (F8) Hemoglob Bilirrub (2 F8.1).
BEGIN DATA
1 18.7 2.2 1 17.0 1.6 1 15.6 2.0 1 14.3 3.8 1 13.3 1.8 1 10.9 3.5 1 8.7 5.5
1 17.8 2.7 1 16.6 3.6 1 15.6 1.6 1 14.3 4.2 1 12.5 4.5 1 10.9 4.1 1 7.4 3.0
1 17.8 2.5 1 16.3 4.1 1 15.4 4.1 1 14.3 3.3 1 12.3 5.0 1 10.9 1.5 1 5.7 4.6
1 17.6 4.1 1 16.1 2.0 1 15.4 2.2 1 14.1 3.7 1 12.2 3.5 1 10.8 3.3 1 9.7 4.9
1 17.6 3.2 1 16.0 2.6 1 15.3 2.0 1 14.0 5.8 1 12.2 2.4 1 10.6 3.4 1 11.6 3.7
1 17.6 1.0 1 16.0 0.8 1 15.1 3.2 1 13.9 2.9 1 12.0 2.8 1 10.5 6.3 1 13.4 2.3
1 17.5 1.6 1 15.8 3.7 1 14.8 1.8 1 13.8 3.7 1 12.0 3.5 1 10.2 3.3 1 14.6 5.0
1 17.4 1.8 1 15.8 3.0 1 14.7 3.7 1 13.6 2.3 1 11.8 2.3 1 9.9 4.0 1 15.6 1.4
1 17.4 2.4 1 15.8 1.7 1 14.7 3.0 1 13.5 2.1 1 11.8 4.5 1 9.8 4.2 1 17.0 0.4
2 15.8 1.8 2 5.7 6.2 2 7.6 4.7 2 9.2 5.6 2 5.1 5.8 2 6.7 5.9 2 12.3 5.6
2 5.5 4.8 2 7.4 6.8 2 8.8 5.6 2 3.4 3.9 2 9.5 3.6 2 5.3 4.8 2 7.1 5.6
2 9.4 3.8 2 5.3 2.8
END DATA.
VALUE LABELS Outcome 1 'Absent' 2 'Present'.
ROC Hemoglob BY Outcome (2)
 /PLOT=CURVE(REFERENCE)
 /PRINT=SE
 /CRITERIA=TESTPOS(SMALL) CI(95).
* Minimum arguments MACRO call *.
BOOTROC Hemoglob Outcome(2) testpos=SMALL.
ROC Bilirrub BY Outcome (2)
 /PLOT=CURVE(REFERENCE)
 /PRINT=SE
 /CRITERIA=TESTPOS(LARGE) CI(95).
* Minimum arguments MACRO call *.
BOOTROC Bilirrub Outcome(2) testpos=LARGE.
* Other optional arguments:
* k = number of bootstrap samples (more samples give more stable results but take longer to run)
* seed = RANDOM or any number (set it to a fixed number if you want to replicate your results exactly) *.
BOOTROC Hemoglob Outcome(2) testpos=SMALL k=30000 seed=12345.

--
For miscellaneous statistical stuff, visit:
http://gjyp.nl/marta/
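A hedged Python sketch of the interval arithmetic at the end of the macro, for readers who want to check it outside SPSS. `bootstrap_intervals` is a hypothetical helper: BP takes the sorted bootstrap AUCs at the 1-based positions k*0.025 and 1 + k*0.975, BV1 is the bootstrap mean plus or minus z·SE, and BV2 is the sample AUC plus or minus z·SE, where SE is the standard deviation of the bootstrap AUCs (divisor k - 1) and z = 1.959964. Like the MATRIX code, it assumes k*0.025 is a whole number, as it is for k = 20000 or k = 30000.

```python
import statistics


def bootstrap_intervals(boot_aucs, sample_auc, z=1.959964):
    """Return the three 95% intervals (BP, BV1, BV2) from k bootstrap AUCs.

    Hypothetical helper mirroring the macro's arithmetic; it assumes k * 0.025
    is a whole number (true for k = 20000 or k = 30000).
    """
    k = len(boot_aucs)
    mean = statistics.mean(boot_aucs)
    # Standard deviation of the bootstrap AUCs (divisor k - 1), used as SE(AUC).
    se = statistics.stdev(boot_aucs)
    s = sorted(boot_aucs)
    # 1-based positions k*0.025 and 1 + k*0.975, shifted to 0-based indices.
    lo_idx = round(k * 0.025) - 1
    hi_idx = round(1 + k * 0.975) - 1
    return {
        "BP": (s[lo_idx], s[hi_idx]),                       # percentile interval
        "BV1": (mean - z * se, mean + z * se),              # centred on bootstrap mean
        "BV2": (sample_auc - z * se, sample_auc + z * se),  # centred on sample AUC
    }
```

Given the k bootstrap AUCs saved by the macro (or produced by any other resampling routine) and the sample AUC, this should give the same three rows labelled BP, BV1 and BV2, up to rounding. With k = 20000 the BP limits are the 500th and 19501st sorted values, leaving 499 bootstrap AUCs outside each limit.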
