Hi,
Is there a way to calculate CIs for non-parametric correlations?
Thanks
Assistant Professor
Department of Physical Therapy Education
College of Health Professions
SUNY Upstate Medical University
Room 2232 Silverman Hall
750 Adams Street
Syracuse, NY 13210-1834
315 464 6577
FAX 315 464 6887
[hidden email]
On 19/09/2011 16:36, Moshe Marko wrote:
Hi:

You can use:

a) Bootstrap (see below for very simple code that computes a percentile-based CI).

b) If the sample size is big enough (over 30 should be enough), you can use parametric methods: Fisher's Rz transform, SE(Rz)=SQRT(1/(n-3)), Rz +/- 1.96*SE(Rz), then back-transform.

c) I don't have the reference right now, but I read a paper that suggested the method described in b), but replacing SE(Rz) by SQRT((1+R²)/(n+3)). They claimed that the CI coverage was at least 95% even if the R values were very close to 1.

HTH,
Marta García-Granero

***********************************************
* BOOTSTRAPPING SPEARMAN'S CORR. COEFF. (Rs)  *
***********************************************
* (C) Marta García-Granero 07/2008            *
* send questions to: [hidden email]           *
* Feel free to use or modify this code, but   *
* acknowledge the author                      *
***********************************************
* Warning: if SPSS 12/older is used, eliminate "SET RNG=MT.", and replace
* "SET MTINDEX=!seed." by "SET SEED=!seed." *.

DEFINE RSCIBOOT(vars=!TOKENS(2)/
                k=!DEFAULT (20000) !TOKENS(1)/
                seed=!DEFAULT(RANDOM) !TOKENS(1)).
PRESERVE.
* Bootstrapping conditions *.
SET MXLOOPS=!k.
SET RNG=MT.
SET MTINDEX=!seed.
DO IF $casenum EQ 1.
. PRINT.
. !IF (!UPCASE(!seed) !EQ 'RANDOM') !THEN.
. PRINT /'RANDOM seed was used'.
. !ELSE.
. PRINT /'Seed value: ' !QUOTE(!seed).
. !IFEND.
END IF.
MATRIX.
PRINT /TITLE="BOOTSTRAP 95% CI ESTIMATION FOR SPEARMAN'S Rs".
COMPUTE k=!k.
* Read data *.
GET data /VAR=!vars /MISSING=OMIT /NAME=vnames.
* Statistics for full sample *.
COMPUTE n=NROW(data).
COMPUTE x=RNKORDER(data(:,1)).
COMPUTE y=RNKORDER(data(:,2)).
COMPUTE meanX=CSUM(x)/n.
COMPUTE meanY=CSUM(y)/n.
COMPUTE VarX=(CSSQ(x)-n&*(meanx&**2))/(n-1).
COMPUTE VarY=(CSSQ(y)-n&*(meany&**2))/(n-1).
COMPUTE CovXY=((T(x)*y)-n*meanX*meanY)/(n-1).
COMPUTE rs=CovXY/SQRT(VarX*VarY).
COMPUTE tvalue=rs*SQRT((n-2)/(1-rs**2)).
COMPUTE tsig=2*(1-TCDF(ABS(tvalue),n-2)).
PRINT vnames /FORMAT='A8' /TITLE='Variables analyzed'.
PRINT /TITLE='******* SAMPLE STATISTICS ********'.
PRINT rs /FORMAT='F4.2' /RLABEL='Rs = '
 /TITLE="Spearman's Rs correlation coefficient".
PRINT n /FORMAT='F4.0' /RLABEL='n ='
 /TITLE='Sample size (n>10 for reliable hypothesis testing)'.
PRINT {tvalue,tsig} /FORMAT='F4.3' /CLABEL='T','2*Sig'
 /TITLE='Hypothesis test for Rs (df=n-2)'.
* Bootstrap *.
COMPUTE bootrs= MAKE(k,1,0).
COMPUTE bootsamp=MAKE(n,2,0).
* Random resampling from sample (k times) *.
LOOP i=1 TO k.
. COMPUTE flipcoin=1+TRUNC(n*UNIFORM(n,1)).
. COMPUTE bootsamp=data(flipcoin,:).
. COMPUTE x=RNKORDER(bootsamp(:,1)).
. COMPUTE y=RNKORDER(bootsamp(:,2)).
. COMPUTE meanX=CSUM(x)/n.
. COMPUTE meanY=CSUM(y)/n.
. COMPUTE VarX=(CSSQ(x)-n&*(meanx&**2))/(n-1).
. COMPUTE VarY=(CSSQ(y)-n&*(meany&**2))/(n-1).
. COMPUTE CovXY=((T(x)*y)-n*meanX*meanY)/(n-1).
. COMPUTE bootrs(i)=CovXY/SQRT(VarX*VarY).
END LOOP.
* Grand mean of bootstrapped Spearman Rs *.
COMPUTE mean=CSUM(bootrs)/k.
* Bootstrap estimator of the standard error of the Rs *.
COMPUTE BootSEM=SQRT((CSSQ(bootrs)-k&*(mean&**2))/(k-1)).
* Sorting algorithm by R Ristow & J Peck *.
COMPUTE sortedrs=bootrs.
COMPUTE sortedrs(GRADE(bootrs))=bootrs.
COMPUTE lower=sortedrs(k*0.025).
COMPUTE upper=sortedrs(1+k*0.975).
* Report *.
PRINT /TITLE='******* BOOTSTRAP RESULTS ********'.
PRINT k /FORMAT='F8.0' /TITLE='Bootstrapping conditions: k (Nr. reps.)'.
PRINT {mean,BootSEM} /FORMAT='F8.3' /CLABEL='Mean-Rs','SE(Rs)*'
 /TITLE='Bootstrapped Statistics for Rs'.
PRINT /TITLE='(*) Std. Deviation of bootstrapped Rs'.
PRINT {lower,upper} /FORMAT='F8.2'
 /TITLE="95%CI for Rs (percentiles 2.5&97.5):"
 /CLABEL='Lower','Upper'.
END MATRIX.
RESTORE.
!ENDDEFINE.

* Sample dataset (random sample of 50 cases from Rosner's dataset FEV&Smoke) *.
DATA LIST FREE/fev(F8.3) hgt (F8.1).
BEGIN DATA
1.987 58.5 1.735 54.0 2.604 61.5 2.980 60.0 3.000 65.5
1.776 51.0 2.531 58.0 3.842 69.0 1.751 58.0 1.698 54.5
1.697 59.0 2.288 61.5 3.114 64.5 2.135 58.5 1.759 53.0
2.048 64.5 1.658 53.0 1.789 52.0 3.004 64.0 2.503 63.0
2.316 59.5 1.704 51.0 1.606 57.5 1.165 47.0 2.164 60.0
2.639 63.0 1.728 56.5 2.303 57.0 2.382 62.0 1.535 55.0
1.514 52.0 2.524 64.0 3.490 67.0 2.292 63.0 2.889 64.0
2.957 64.5 2.250 58.0 2.633 62.0 2.417 62.5 4.273 72.5
2.751 63.0 3.774 67.0 3.169 64.0 2.704 61.0 3.255 66.0
2.901 59.5 3.680 67.0 3.022 61.5 3.780 70.0 2.822 69.5
END DATA.
VARIABLE LABEL hgt 'Height (inches)' /fev 'FEV (l)'.

* MACRO call *.
RSCIBOOT VARS=hgt fev.
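Options (b) and (c) in the reply above come down to a few lines of arithmetic, so a rough sketch may help anyone who wants to check the numbers by hand. It is written in Python rather than SPSS syntax, purely for illustration. The function names (`fisher_ci`, `se_variant_c`) and the example values of Rs and n are made up for this sketch and are not part of the macro. The standard error in (c) is coded exactly as it was quoted from memory in the post, so treat it as unverified.

```python
import math


def fisher_ci(r, n, se=None, z_crit=1.96):
    """Approximate CI for a correlation via Fisher's z transform (option b).

    r      : sample correlation (e.g. Spearman's Rs)
    n      : sample size
    se     : standard error on the z scale; defaults to SQRT(1/(n-3))
    z_crit : normal quantile, 1.96 for a 95% interval
    """
    if se is None:
        se = math.sqrt(1.0 / (n - 3))
    z = math.atanh(r)  # Fisher transform: 0.5*ln((1+r)/(1-r))
    # Build the interval on the z scale, then back-transform with tanh.
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


def se_variant_c(r, n):
    """SE from option (c), as stated in the post (quoted from memory there)."""
    return math.sqrt((1.0 + r ** 2) / (n + 3))


# Hypothetical values, chosen only to show the calls.
rs, n = 0.80, 50
lo_b, hi_b = fisher_ci(rs, n)
lo_c, hi_c = fisher_ci(rs, n, se=se_variant_c(rs, n))
print("option b: %.3f to %.3f" % (lo_b, hi_b))
print("option c: %.3f to %.3f" % (lo_c, hi_c))
```

With these inputs the interval from (c) comes out wider than the one from (b), which fits the claim of more conservative coverage. As far as I recall, a similar adjustment attributed to Bonett and Wright (2000) uses SQRT((1+Rs²/2)/(n-3)); that is my recollection and not something stated in this thread, so check the paper before relying on either form.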
In reply to this post by Moshe Marko
You do not say whether you are using Spearman's coefficient or Kendall's tau. The two are distinctly different, and Kendall's requires a larger N for a stable coefficient because there are far fewer discrete values possible for it. Bootstrapping should still work, with the appropriate modifications. The Wiki article includes the formula for the variance.

--
Rich Ulrich

Date: Mon, 19 Sep 2011 10:36:40 -0400
From: [hidden email]
Subject: calculating 95% CI for non-parametric correlation
To: [hidden email]

Hi,
Is there a way to calculate CIs for non-parametric correlations?
Thanks
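On the point that bootstrapping should still work for Kendall's tau with the appropriate modifications: the only part that really changes is the statistic computed on each resample. Here is a minimal sketch of that idea, again in Python for illustration and not a port of the SPSS macro. The helper names (`kendall_tau_b`, `percentile_bootstrap_ci`) are hypothetical. I assume the tie-corrected tau-b, because resampling with replacement always produces tied cases, and I use only the first ten cases of the sample dataset posted above to keep the run short.

```python
import math
import random


def kendall_tau_b(x, y):
    """Kendall's tau-b (tie-corrected) by direct O(n^2) pair counting."""
    conc = disc = tied_x_only = tied_y_only = 0
    n = len(x)
    for i in range(n - 1):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            if dx == 0 and dy == 0:
                continue  # tied on both variables: drops out of both terms
            if dx == 0:
                tied_x_only += 1
            elif dy == 0:
                tied_y_only += 1
            elif dx * dy > 0:
                conc += 1
            else:
                disc += 1
    # (pairs not tied on x) * (pairs not tied on y)
    denom = math.sqrt((conc + disc + tied_y_only) * (conc + disc + tied_x_only))
    return (conc - disc) / denom if denom > 0 else float("nan")


def percentile_bootstrap_ci(x, y, stat, k=2000, alpha=0.05, seed=None):
    """Percentile bootstrap CI: resample cases (pairs) with replacement."""
    rng = random.Random(seed)
    n = len(x)
    reps = []
    for _ in range(k):
        idx = [rng.randrange(n) for _ in range(n)]
        value = stat([x[i] for i in idx], [y[i] for i in idx])
        if not math.isnan(value):  # skip degenerate resamples
            reps.append(value)
    reps.sort()
    cut = int(len(reps) * alpha / 2)  # replicates trimmed from each tail
    return reps[cut], reps[len(reps) - 1 - cut]


# First ten (hgt, fev) cases from the sample dataset in this thread.
hgt = [58.5, 54.0, 61.5, 60.0, 65.5, 51.0, 58.0, 69.0, 58.0, 54.5]
fev = [1.987, 1.735, 2.604, 2.980, 3.000, 1.776, 2.531, 3.842, 1.751, 1.698]

tau = kendall_tau_b(hgt, fev)
lower, upper = percentile_bootstrap_ci(hgt, fev, kendall_tau_b, k=2000, seed=12345)
print("tau-b = %.3f, 95%% percentile CI: %.3f to %.3f" % (tau, lower, upper))
```

Passing the statistic in as a function means the same resampling loop could be reused for Spearman's Rs by swapping in a different `stat`. With only ten cases the interval will be wide, which is in line with the remark about Kendall's tau needing a larger N.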