SPSSX Discussion - Re: Calculating CI in SPSS when percentage is close to 0

Posted by Bruce Weaver on Aug 08, 2013; 9:20pm
URL: http://spssx-discussion.165.s1.nabble.com/Calculating-CI-in-SPSS-when-percentage-is-close-to-0-tp5721509p5721529.html

I assume you are concerned that the usual Wald method performs poorly for extreme proportions: when x = 0 its standard error is 0, so the interval collapses to the single point 0. As you probably know, several alternatives perform better. Personally, I like the Wilson method. Below is some syntax I wrote to compute various CIs, followed by some syntax Marta GG posted to this list a few years ago. HTH.
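
Before the full files, here is a minimal sketch (not taken from either file) of the Wilson calculation for one proportion near 0. The counts x = 1 and n = 1000 are hypothetical stand-ins for a proportion of 0.1%, and the names WilsonLo and WilsonHi are made up for this illustration.

```spss
* Hypothetical single case: x = 1 success in n = 1000 trials (p = 0.1%).
DATA LIST FREE / x n confid.
BEGIN DATA
1 1000 .95
END DATA.

COMPUTE p = x/n.
* Two-sided critical value of the standard normal for the given level.
COMPUTE z = IDF.NORMAL(1 - (1 - confid)/2, 0, 1).

* Wilson score limits, same formula as Method 3 in the syntax below.
COMPUTE #x1 = 2*n*p + z**2.
COMPUTE #x2 = z*SQRT(z**2 + 4*n*p*(1 - p)).
COMPUTE #x3 = 2*(n + z**2).
COMPUTE WilsonLo = (#x1 - #x2)/#x3.
COMPUTE WilsonHi = (#x1 + #x2)/#x3.

FORMATS p WilsonLo WilsonHi (F8.5).
LIST.
```

Replace the data line with your own counts and confidence level. The full files below compare several methods side by side.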

* ================================================================== .
* File: CI_for_proportion.SPS .
* Date: 19-Nov-2012 .
* Author: Bruce Weaver, bweaver@lakeheadu.ca .
* ================================================================== .

* Get confidence interval for a binomial proportion using:
- Wald method
- Adjusted Wald method (Agresti & Coull, 1998)
- Wilson score method (identical to Ghosh's 1979 method)
- Jeffreys method
.
* The data used here are from Table I in Newcombe (1998), Statistics
in Medicine, Vol 17, 857-872.

NEW FILE.
DATASET CLOSE ALL.

DATA LIST LIST /x(f8.0) n(f8.0) confid(f5.3) .
BEGIN DATA.
81 263 .95
15 148 .95
0 20 .95
1 29 .95
81 263 .90
15 148 .90
0 20 .90
1 29 .90
81 263 .99
15 148 .99
0 20 .99
1 29 .99
16 48 .95
16 48 .99
END DATA.

compute alpha = 1 - confid.
compute p = x/n.
compute q = 1-p.
compute z = probit(1-alpha/2).

* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .

* Wald method (i.e., the usual normal approximation).

compute #se = SQRT(p*q/n).
compute Lower1 = p - z*#se.
if Lower1 LT 0 Lower1 = 0.
compute Upper1 = p + z*#se.
if Upper1 GT 1 Upper1 = 1.

* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .

* Adjusted Wald method due to Agresti & Coull (1998).

compute #p = (x + z**2/2) / (n + z**2).
compute #q = 1 - #p.
compute #se = SQRT(#p*#q/(n+z**2)).
compute Lower2 = #p - z*#se.
if Lower2 LT 0 Lower2 = 0.
compute Upper2 = #p + z*#se.
if Upper2 GT 1 Upper2 = 1.

* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .

* Wilson score method (Method 3 in Newcombe, 1998) .
* Code adapted from Robert Newcombe's code posted here:
http://archive.uwcm.ac.uk/uwcm/ms/Robert2.html .

* The method of Ghosh (1979), as described in Glass & Hopkins
* (1996, p 326) is identical to Wilson's method.
* Glass & Hopkins describe it as the "method of choice for all values
of p and n" .

COMPUTE #x1 = 2*n*p+z**2 .
COMPUTE #x2 = z*(z**2+4*n*p*(1-p))**0.5 .
COMPUTE #x3 = 2*(n+z**2) .
COMPUTE Lower3 = (#x1 - #x2) / #x3 .
COMPUTE Upper3 = (#x1 + #x2) / #x3 .

* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .

* Jeffreys method shown on the IBM-SPSS website at
* http://www-01.ibm.com/support/docview.wss?uid=swg21474963 .

compute Lower4 = idf.beta(alpha/2,x+.5,n-x+.5).
compute Upper4 = idf.beta(1-alpha/2,x+.5,n-x+.5).

* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .

* Format variables and list the results of all methods .

formats p q Lower1 to Upper4 (f5.4).
sort cases by p confid.

list var x n confid p Lower1 to Upper4 .

* Method 1: Wald method (i.e., the usual normal approximation) .
* Method 2: Adjusted Wald method (using z**2/2 and z**2 rather than 2 and 4).
* Method 3: Wilson score method (from Newcombe paper), identical to Ghosh (1979).
* Method 4: Jeffreys method (http://www-01.ibm.com/support/docview.wss?uid=swg21474963).

* Data from Newcombe (1998), Table I.

variable labels
x "Successes"
n "Trials"
p "p(Success)"
confid "Confidence Level"
Lower1 "Wald: Lower"
Upper1 "Wald: Upper"
Lower2 "Adj Wald: Lower"
Upper2 "Adj Wald: Upper"
Lower3 "Wilson score/Ghosh: Lower"
Upper3 "Wilson score/Ghosh: Upper"
Lower4 "Jeffreys: Lower"
Upper4 "Jeffreys: Upper"
.

SUMMARIZE
/TABLES=x n p confid Lower1 Upper1 Lower2 Upper2 Lower3 Upper3 Lower4 Upper4
/FORMAT=VALIDLIST NOCASENUM TOTAL
/TITLE='Confidence Intervals for Binomial Proportions'
/MISSING=VARIABLE
/CELLS=NONE.

* ================================================================== .

****************************************************************
** CONFIDENCE INTERVAL FOR A PROPORTION USING WILSON'S METHOD **
** DATA ARE EXTRACTED FROM CROSSTAB TABLES USING OMS, AND **
** MULTIPLE DATASETS ARE USED (REQUIRES SPSS 14 OR NEWER) **
****************************************************************

* Code posted to SPSSX-L mailing list by Marta Garcia-Granera, 20-Aug-2009.

* Example dataset (replace by your own) *.
GET FILE='GSS 93 for Missing Values.sav'.
DATASET NAME OriginalData.

* Don't change anything here *.
PRESERVE.
SET OLANG=ENGLISH.
OMS SELECT TABLES
/IF SUBTYPES=['Crosstabulation']
/DESTINATION FORMAT=SAV NUMBERED='id' OUTFILE='C:\Temp\table.sav'.

* Crosstabulations (replace by your own variables) *.
* Grouping variables in rows, the proportion variable alone in the column *.
CROSSTABS
/TABLES=degree wrkstat polviews BY sex
/FORMAT= AVALUE TABLES
/CELLS= COUNT ROW
/COUNT ROUND CELL .

* Don't change anything here *.
OMSEND.
GET FILE='C:\Temp\table.sav'
/DROP= Command_ TO Var1.
DATASET NAME ProcessedData.

* Eliminate superfluous rows of data (step language-dependent,
needs SET OLANG=ENGLISH) *.
SELECT IF (Var3 EQ 'Count') AND (Var2 NE '').
COMPUTE id=$casenum.
EXECUTE. /* Needed for next command *.
DELETE VARIABLES Var3.

* This SPSS code is adapted from a macro by Dr. Robert G. Newcombe,
* University of Wales College of Medicine, Cardiff, UK.
* It calculates a confidence interval for a proportion x/n,
* using an appropriate method
* (E.B.Wilson. J Am Stat Assoc 1927, 22, 209-212).

* This part of the code is dataset-independent
(even the names of the variables are automatically read),
and can be left unmodified, unless a 99% CI is needed or
the CI for the second column (instead of the first) is wanted .

MATRIX.
PRINT /TITLE='NEWCOMBE METHOD: CI FOR A PROPORTION'.
GET data /FILE = * /NAMES = namevec.
GET rnames /VAR = var2.
COMPUTE vnames = namevec(3:5).
PRINT data(:,3:5)
/FORMAT='F8.0'
/CNAMES=vnames
/RNAMES=rnames
/TITLE='Input data (first column is used to compute proportions & CI limits)'.
COMPUTE id = data(:,1). /* Matching variable *.
COMPUTE num = data(:,3). /* Replace by data(:,4) if interested in 2nd value *.
COMPUTE den = data(:,5).
COMPUTE p = num/den .
COMPUTE z = MAKE(NROW(data),1,1.959964). /* Use MAKE(NROW(data),1,2.575829) for 99%CI *.
COMPUTE x1 = 2*num+z&**2 .
COMPUTE x2 = z&*SQRT(z&**2+4*num&*(1-p)).
COMPUTE x3 = 2*(den+z&**2) .
COMPUTE x4 = (x1-x2)/x3 .
COMPUTE x5 = (x1+x2)/x3 .
PRINT {100*p,100*x4,100*x5}
/FORMAT='F8.2'
/TITLE='Point estimate & 95%CI for a proportion'
/RNAMES=rnames
/CLABELS='Point','Lower','Upper'.
* Export data *.
COMPUTE outdata = {id,100*p,100*x4,100*x5}.
COMPUTE outname = {'id','p','lower','upper'}.
SAVE outdata /OUTFILE = 'C:\Temp\ProportionCI.sav' /NAMES = outname.
END MATRIX.
MATCH FILES /FILE=*
/FILE='C:\Temp\ProportionCI.sav'
/BY id.
SUMMARIZE
/TABLES=Var2 p lower upper
/FORMAT=LIST NOCASENUM NOTOTAL
/TITLE='Point estimates & 95%CI for one proportion'
/FOOTNOTE 'Wilson method'
/CELLS=NONE.

* HTH,
* Marta GG .

ipnyc wrote

Hello,

Per my title, is there a way to calculate a 95% CI for a proportion that is close to 0 (0.1% to be more specific)? There are plenty of online calculators but I was hoping to avoid calculating manually. I am using IBM SPSS 19.
Thank you!

--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING:
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).