SPSSX Discussion

Re: Question: print or list in if condition


Haiyan.Gao

Re: Question: print or list in if condition

Dear all,

I am trying to use bootstrap resampling to calculate the average area
under the ROC curve, and its standard deviation, for predictions from a
logistic regression.

Has anyone written, or does anyone know of, an algorithm to resample by
bootstrap and calculate the average and standard deviation of the area
under the ROC curve?

Any hints would be very much appreciated.

Best wishes,
Haiyan

------------------------------------------------------------------------
This email is confidential and is intended solely for the person or
entity to whom it is addressed. If this is not you, please forward the
message to [hidden email]. We have scanned this email
before sending it, but cannot guarantee that malicious software is
absent and we shall carry no liability in this regard.

We advise that information intended to be kept confidential should not
be sent by email. We also advise that health concerns should be
discussed with a medical professional in person or by telephone.
NHS Direct can also provide advice. We shall not be liable for any
failure to follow this advice. University College London Hospitals NHS
Foundation Trust (UCLH).

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Marta Garcia-Granero

Bootstrapping ROC AUC (was "Re: Question: print or list in if condition")

Haiyan Gao wrote:
> I am trying to use bootstrap resampling to calculate the average area
> under the ROC curve, and its standard deviation, for predictions from a
> logistic regression.
>
> Has anyone written, or does anyone know of, an algorithm to resample by
> bootstrap and calculate the average and standard deviation of the area
> under the ROC curve?

First of all, please don't reply to an existing message when you want to
start a new thread; start a new message with a title that reflects the
nature of the question.

Second, here is a macro that does it.

HTH,
Marta García-Granero

DEFINE BOOTROC(!POSITIONAL=!TOKENS(1)/
!POSITIONAL !CHAREND('(')/
!POSITIONAL !CHAREND(')')/
testpos= !DEFAULT(LARGE) !TOKENS(1)/
k= !DEFAULT (20000) !TOKENS(1)/
seed=!DEFAULT(RANDOM) !TOKENS(1)).
DATASET NAME OriginalData.
DATASET COPY WorkingData WINDOW=HIDDEN.
DATASET ACTIVATE WorkingData.
* Preparing data for matrix (recoding state variable to 0/1 and sorting by it) *.
!IF (!UPCASE(!testpos) !EQ 'LARGE') !THEN.
. COMPUTE !2=(!2 NE !3).
!ELSE.
. COMPUTE !2=(!2 EQ !3).
!IFEND.
SORT CASES BY !2 (A).
PRESERVE.
* Initialize seed and fix mxloops to number of bootstrap samples *.
SET RNG=MT.
SET MTINDEX=!seed.
SET MXLOOPS=!k.
DO IF $casenum EQ 1.
. PRINT.
. !IF (!UPCASE(!seed) !EQ 'RANDOM') !THEN.
. PRINT /'RANDOM seed was used'.
. !ELSE.
. PRINT /'Seed value: ' !QUOTE(!seed).
. !IFEND.
END IF.
MATRIX.
PRINT /TITLE='*** BOOTSTRAPPING 95%CI FOR AUC - ROC ***'.
* Read sorted data *.
GET data /VAR=!2 !1 /NAMES=vnames /MISSING=OMIT.
COMPUTE vname=vnames(2).
* General data and sample AUC *.
COMPUTE totaln=NROW(data).
COMPUTE n2=CSUM(data(:,1)).
COMPUTE n1=totaln-n2.
COMPUTE group1=data(1:n1,2).
COMPUTE group2=data((n1+1):totaln,2).
COMPUTE Ranks=RNKORDER({group1;group2}).
COMPUTE R2=CSUM(Ranks&*data(:,1)).
COMPUTE U2=n1*n2+n2*(n2+1)/2-R2.
COMPUTE AUC=U2/(n1*n2).
PRINT AUC
/FORMAT='F8.3'
/TITLE='SAMPLE AUC'.
*** BOOTSTRAPPING ***.
COMPUTE k=!k. /* Number of bootsamples *.
COMPUTE bootAUC =MAKE(k,1,0).
COMPUTE bootgrp1=MAKE(n1,1,0).
COMPUTE bootgrp2=MAKE(n2,1,0).
LOOP i=1 TO k. /* Extracting k bootstrap samples from both groups *.
- LOOP j= 1 TO n1.
- COMPUTE flipcoin=1+TRUNC(n1*UNIFORM(1,1)).
- COMPUTE bootgrp1(j)=group1(flipcoin).
- END LOOP.
- LOOP j= 1 TO n2.
- COMPUTE flipcoin=1+TRUNC(n2*UNIFORM(1,1)).
- COMPUTE bootgrp2(j)=group2(flipcoin).
- END LOOP.
- COMPUTE Ranks=RNKORDER({bootgrp1;bootgrp2}).
- COMPUTE R2=CSUM(Ranks&*data(:,1)).
- COMPUTE U2=n1*n2+n2*(n2+1)/2-R2.
- COMPUTE bootAUC(i)=U2/(n1*n2).
END LOOP.
* Grand mean of bootstrapped AUC *.
COMPUTE mean=CSUM(bootAUC)/k.
* Bootstrap estimator of the standard error of the AUC *.
COMPUTE BootSEM=SQRT((CSSQ(bootAUC)-k&*(mean&**2))/(k-1)).
PRINT {mean,BootSEM}
/FORMAT='F8.3'
/CLABEL='Mean-AUC','SE(AUC)*'
/RNAME=vname
/TITLE='Bootstrapped Statistics for AUC'.
PRINT/TITLE='(*) Std. Deviation of bootstrapped AUC'.
* NP confidence interval *.
* Ordered array: sorting algorithm by R Ristow & J Peck *.
COMPUTE sortedbm=bootAUC.
COMPUTE sortedbm(GRADE(bootAUC))=bootAUC.
COMPUTE lower1=sortedbm(k*0.025).
COMPUTE upper1=sortedbm(1+k*0.975).
* Parametric confidence intervals (BV1&BV2)*.
COMPUTE z = 1.959964.
COMPUTE lower2=mean-z*BootSEM.
COMPUTE upper2=mean+z*BootSEM.
COMPUTE lower3=AUC-z*BootSEM.
COMPUTE upper3=AUC+z*BootSEM.
PRINT {lower1,upper1;lower2,upper2;lower3,upper3}
/FORMAT='F8.3'
/CLABEL='Lower CL','Upper CL'
/RLABEL='BP','BV1','BV2'
/TITLE='95%CI: Non parametric (percentiles 2.5 & 97.5) & Parametric (Z based) BV1&BV2'.
SAVE bootAUC /OUTFILE='C:\Temp\BootStrappedAUC.sav' /NAMES=vname.
PRINT k
/FORMAT='F8'
/RLABEL='K='
/TITLE='K bootsampled AUC saved to C:\Temp\BootStrappedAUC.sav'.
END MATRIX.
RESTORE.
GET FILE ='C:\Temp\BootStrappedAUC.sav' .
DATASET NAME BootstrappedAUC.
FREQUENCIES
VARIABLES=ALL /FORMAT=NOTABLE
/HISTOGRAM NORMAL
/STATISTICS=SKEWNESS KURTOSIS.
DATASET ACTIVATE OriginalData.
DATASET CLOSE WorkingData.
DATASET CLOSE BootstrappedAUC.
!ENDDEFINE.

* Sample dataset *.
SET LOCALE=ENGLISH.
DATA LIST FREE/Outcome (F8) Hemoglob Bilirrub (2 F8.1).
BEGIN DATA
1 18.7 2.2 1 17.0 1.6 1 15.6 2.0 1 14.3 3.8 1 13.3 1.8 1 10.9 3.5
1 8.7 5.5 1 17.8 2.7 1 16.6 3.6 1 15.6 1.6 1 14.3 4.2 1 12.5 4.5
1 10.9 4.1 1 7.4 3.0 1 17.8 2.5 1 16.3 4.1 1 15.4 4.1 1 14.3 3.3
1 12.3 5.0 1 10.9 1.5 1 5.7 4.6 1 17.6 4.1 1 16.1 2.0 1 15.4 2.2
1 14.1 3.7 1 12.2 3.5 1 10.8 3.3 1 9.7 4.9 1 17.6 3.2 1 16.0 2.6
1 15.3 2.0 1 14.0 5.8 1 12.2 2.4 1 10.6 3.4 1 11.6 3.7 1 17.6 1.0
1 16.0 0.8 1 15.1 3.2 1 13.9 2.9 1 12.0 2.8 1 10.5 6.3 1 13.4 2.3
1 17.5 1.6 1 15.8 3.7 1 14.8 1.8 1 13.8 3.7 1 12.0 3.5 1 10.2 3.3
1 14.6 5.0 1 17.4 1.8 1 15.8 3.0 1 14.7 3.7 1 13.6 2.3 1 11.8 2.3
1 9.9 4.0 1 15.6 1.4 1 17.4 2.4 1 15.8 1.7 1 14.7 3.0 1 13.5 2.1
1 11.8 4.5 1 9.8 4.2 1 17.0 0.4 2 15.8 1.8 2 5.7 6.2 2 7.6 4.7
2 9.2 5.6 2 5.1 5.8 2 6.7 5.9 2 12.3 5.6 2 5.5 4.8 2 7.4 6.8
2 8.8 5.6 2 3.4 3.9 2 9.5 3.6 2 5.3 4.8 2 7.1 5.6 2 9.4 3.8
2 5.3 2.8
END DATA.

VALUE LABELS Outcome 1 ' Absent' 2 'Present'.

ROC Hemoglob BY Outcome (2)
/PLOT=CURVE(REFERENCE)
/PRINT=SE
/CRITERIA=TESTPOS(SMALL) CI(95).

* Minimum arguments MACRO call *.
BOOTROC Hemoglob Outcome(2) testpos=SMALL.

ROC Bilirrub BY Outcome (2)
/PLOT=CURVE(REFERENCE)
/PRINT=SE
/CRITERIA=TESTPOS(LARGE) CI(95).

* Minimum arguments MACRO call *.
BOOTROC Bilirrub Outcome(2) testpos=LARGE.

* Other optional arguments:
* k = number of bootstrap samples, default 20000 (more samples give more
stable results but a longer running time)
* seed = RANDOM (the default) or any number (set it to a fixed number if
you want to replicate your results exactly) *.

BOOTROC Hemoglob Outcome(2) testpos=SMALL k=30000 seed=12345.
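
For anyone who wants to cross-check the macro's logic outside SPSS, here is a minimal Python sketch of the same idea. It is not part of the macro and is only an illustration under stated assumptions: the function names (average_ranks, auc_from_ranks, bootstrap_auc) and the toy data are hypothetical, ties are given the mean rank (which is how I understand RNKORDER to behave in MATRIX), and the percentile limits use a simple index rule modelled on the macro's. Each group is resampled with replacement separately, the AUC is computed from the rank sum of the second group with the same U2 formula as in the macro, and the mean, standard deviation and 2.5/97.5 percentiles of the bootstrapped AUCs are returned.

```python
import random
import statistics


def average_ranks(values):
    """1-based ascending ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # mean of ranks start+1 .. end+1
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def auc_from_ranks(group1, group2):
    """Estimate P(group1 > group2), ties counted as 1/2, from group2's rank sum."""
    n1, n2 = len(group1), len(group2)
    ranks = average_ranks(list(group1) + list(group2))
    r2 = sum(ranks[n1:])                     # rank sum of group2
    u2 = n1 * n2 + n2 * (n2 + 1) / 2 - r2    # same formula as in the macro
    return u2 / (n1 * n2)


def bootstrap_auc(group1, group2, k=2000, seed=None):
    """Resample each group separately k times; return mean, SD and 95% percentile limits."""
    rng = random.Random(seed)
    boot = []
    for _ in range(k):
        b1 = rng.choices(group1, k=len(group1))  # sampling with replacement
        b2 = rng.choices(group2, k=len(group2))
        boot.append(auc_from_ranks(b1, b2))
    boot.sort()
    mean = statistics.fmean(boot)
    sd = statistics.stdev(boot)  # bootstrap estimate of SE(AUC)
    # Assumed percentile rule: 0-based equivalents of positions k*0.025 and 1+k*0.975.
    lower = boot[max(0, k * 25 // 1000 - 1)]
    upper = boot[min(k - 1, k * 975 // 1000)]
    return mean, sd, (lower, upper)


if __name__ == "__main__":
    # Hypothetical toy data, NOT the sample dataset above.
    positives = [5.6, 4.8, 6.8, 3.9, 3.6, 5.8]
    negatives = [2.2, 1.6, 2.0, 3.8, 1.8, 3.5, 5.5, 2.7]
    print("Sample AUC:", auc_from_ranks(positives, negatives))
    mean, sd, (lo, hi) = bootstrap_auc(positives, negatives, k=2000, seed=12345)
    print("Mean AUC:", mean, "SE:", sd, "95% percentile CI:", (lo, hi))
```

Passing the group with the expected larger values first corresponds to testpos=LARGE with the positive cases as group1; swapping the two arguments corresponds to testpos=SMALL. The random number generator is different from the Mersenne Twister setup used by SPSS, so results will agree with the macro only approximately, even with the same seed value.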

--
For miscellaneous statistical stuff, visit:
http://gjyp.nl/marta/
