Re: Calculating CI in SPSS when percentage is close to 0

Posted by Bruce Weaver on Aug 08, 2013; 9:20pm
URL: http://spssx-discussion.165.s1.nabble.com/Calculating-CI-in-SPSS-when-percentage-is-close-to-0-tp5721509p5721529.html

I assume your concern is that the usual Wald (normal approximation) interval performs poorly when the proportion is close to 0 or 1.  As you probably know, several alternatives perform better.  Personally, I like the Wilson score method.  Below is some syntax I wrote to compute CIs by four methods (Wald, adjusted Wald, Wilson score, Jeffreys), and below that, some syntax Marta GG posted to this list a few years ago.  HTH.

*  ================================================================== .
*  File:   CI_for_proportion.SPS .
*  Date:   19-Nov-2012 .
*  Author:  Bruce Weaver, bweaver@lakeheadu.ca .
*  ================================================================== .

* Get confidence interval for a binomial proportion using:
   - Wald method
   - Adjusted Wald method (Agresti & Coull, 1998)
   - Wilson score method (identical to Ghosh's 1979 method)
   - Jeffreys method
.
* The data used here are from Table I in Newcombe (1998), Statistics
   in Medicine, Vol 17, 857-872.

NEW FILE.
DATASET CLOSE ALL.

DATA LIST LIST /x(f8.0) n(f8.0) confid(f5.3) .
BEGIN DATA.
81 263 .95
15 148 .95
0   20 .95
1   29 .95
81 263 .90
15 148 .90
0   20 .90
1   29 .90
81 263 .99
15 148 .99
0   20 .99
1   29 .99
16  48 .95
16  48 .99
END DATA.

compute alpha = 1 - confid.
compute p = x/n.
compute q = 1-p.
compute z = probit(1-alpha/2).

*  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .

* Wald method (i.e., the usual normal approximation).

compute #se = SQRT(p*q/n).
compute Lower1 = p - z*#se.
if Lower1 LT 0 Lower1 = 0.
compute Upper1 = p + z*#se.
if Upper1 GT 1 Upper1 = 1.

*  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .

* Adjusted Wald method due to Agresti & Coull (1998).

compute #p = (x + z**2/2) / (n + z**2).
compute #q = 1 - #p.
compute #se = SQRT(#p*#q/(n+z**2)).
compute Lower2 = #p - z*#se.
if Lower2 LT 0 Lower2 = 0.
compute Upper2 = #p + z*#se.
if Upper2 GT 1 Upper2 = 1.

*  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .

* Wilson score method (Method 3 in Newcombe, 1998) .
* Code adapted from Robert Newcombe's code posted here:
    http://archive.uwcm.ac.uk/uwcm/ms/Robert2.html .

* The method of Ghosh (1979), as described in Glass & Hopkins
* (1996, p 326) is identical to Wilson's method.
* Glass & Hopkins describe it as the "method of choice for all values
   of p and n" .

COMPUTE #x1 = 2*n*p+z**2 .
COMPUTE #x2 = z*(z**2+4*n*p*(1-p))**0.5 .
COMPUTE #x3 = 2*(n+z**2) .
COMPUTE Lower3 = (#x1 - #x2) / #x3 .
COMPUTE Upper3 = (#x1 + #x2) / #x3 .

*  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .

* Jeffreys method shown on the IBM-SPSS website at
* http://www-01.ibm.com/support/docview.wss?uid=swg21474963 .

compute Lower4 = idf.beta(alpha/2,x+.5,n-x+.5).
compute Upper4 = idf.beta(1-alpha/2,x+.5,n-x+.5).
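
* Optional boundary adjustment (an addition, not in the original syntax) .
* Brown, Cai & DasGupta (2001, Statistical Science) define the Jeffreys
   interval with the lower limit set to 0 when x = 0 and the upper limit
   set to 1 when x = n .
* The adjusted limits go in new variables created after Upper4, so the
   "Lower1 to Upper4" lists further down are unaffected .

compute Lower4adj = Lower4.
if (x EQ 0) Lower4adj = 0.
compute Upper4adj = Upper4.
if (x EQ n) Upper4adj = 1.
formats Lower4adj Upper4adj (f6.4).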

*  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .

* Format variables and list the results of all methods .

formats p q Lower1 to Upper4 (f5.4).
sort cases by p confid.

list var x n confid p Lower1 to Upper4 .

* Method 1:  Wald method (i.e., the usual normal approximation) .
* Method 2:  Adjusted Wald method (using z**2/2 and z**2 rather than 2 and 4).
* Method 3:  Wilson score method (from Newcombe paper), identical to Ghosh (1979).
* Method 4:  Jeffreys method (http://www-01.ibm.com/support/docview.wss?uid=swg21474963).

* Data from Newcombe (1998), Table I.

variable labels
 x "Successes"
 n "Trials"
 p "p(Success)"
 confid "Confidence Level"
 Lower1 "Wald: Lower"
 Upper1 "Wald: Upper"
 Lower2 "Adj Wald: Lower"
 Upper2 "Adj Wald: Upper"
 Lower3 "Wilson score/Ghosh: Lower"
 Upper3 "Wilson score/Ghosh: Upper"
 Lower4 "Jeffreys: Lower"
 Upper4 "Jeffreys: Upper"
.

SUMMARIZE
  /TABLES=x n p confid Lower1 Upper1 Lower2 Upper2 Lower3 Upper3 Lower4 Upper4
  /FORMAT=VALIDLIST NOCASENUM TOTAL
  /TITLE='Confidence Intervals for Binomial Proportions'
  /MISSING=VARIABLE
  /CELLS=NONE.

*  ================================================================== .
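
*  ================================================================== .
*  Added usage sketch (not part of either original syntax file) .
*  The counts below are hypothetical, chosen only to give proportions
   of 0.1% and 0%; replace them with your own x and n .
*  Note that DATA LIST replaces the active dataset .
*  ================================================================== .

DATA LIST LIST /x(f8.0) n(f8.0) confid(f5.3) .
BEGIN DATA.
1 1000 .95
0 1000 .95
END DATA.

compute p = x/n.
compute z = probit(1 - (1-confid)/2).

* Wald limits, left unclamped here to show that the lower limit
   falls below 0 for 1/1000 and that both limits equal 0 for 0/1000 .
compute #se = SQRT(p*(1-p)/n).
compute WaldLo = p - z*#se.
compute WaldHi = p + z*#se.

* Wilson score limits, same formula as Method 3 above .
* For x = 0 they reduce algebraically to 0 and z**2/(n + z**2) .
compute #x1 = 2*n*p + z**2.
compute #x2 = z*SQRT(z**2 + 4*n*p*(1-p)).
compute #x3 = 2*(n + z**2).
compute WilsonLo = (#x1 - #x2) / #x3.
compute WilsonHi = (#x1 + #x2) / #x3.

formats p WaldLo to WilsonHi (f8.5).
list var x n confid p WaldLo to WilsonHi .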



****************************************************************
** CONFIDENCE INTERVAL FOR A PROPORTION USING WILSON'S METHOD **
** DATA ARE EXTRACTED FROM CROSSTAB TABLES USING OMS, AND     **
** MULTIPLE DATASETS ARE USED (REQUIRES SPSS 14 OR NEWER)     **
****************************************************************

* Code posted to SPSSX-L mailing list by Marta Garcia-Granera, 20-Aug-2009.

* Example dataset (replace by your own) *.
GET FILE='GSS 93 for Missing Values.sav'.
DATASET NAME OriginalData.

* Don't change anything here *.
PRESERVE.
SET OLANG=ENGLISH.
OMS SELECT TABLES
/IF SUBTYPES=['Crosstabulation']
/DESTINATION FORMAT=SAV NUMBERED='id' OUTFILE='C:\Temp\table.sav'.

* Crosstabulations (replace by your own variables) *.
* Grouping variables in rows, the proportion variable alone in the column *.
CROSSTABS
  /TABLES=degree wrkstat polviews BY sex
  /FORMAT= AVALUE TABLES
  /CELLS= COUNT ROW
  /COUNT ROUND CELL .

* Don't change anything here *.
OMSEND.
GET FILE='C:\Temp\table.sav'
 /DROP= Command_ TO Var1.
DATASET NAME ProcessedData.

* Eliminate superfluous rows of data (step language-dependent,
  needs SET OLANG=ENGLISH) *.
SELECT IF (Var3 EQ 'Count') AND (Var2 NE '').
COMPUTE id=$casenum.
EXECUTE. /* Needed for next command *.
DELETE VARIABLES Var3.

* This SPSS code is adapted from a macro by Dr. Robert G. Newcombe,
* University of Wales College of Medicine, Cardiff, UK.
* It calculates a confidence interval for a proportion x/n,
* using an appropriate method
* (E.B.Wilson. J Am Stat Assoc 1927, 22, 209-212).

* This part of the code is dataset-independent
  (even the names of the variables are read automatically),
  and can be left unmodified unless a 99% CI is needed or a
  CI for the second column (instead of the first) is wanted .

MATRIX.
PRINT /TITLE='NEWCOMBE METHOD: CI FOR A PROPORTION'.
GET data /FILE = * /NAMES = namevec.
GET rnames /VAR = var2.
COMPUTE vnames = namevec(3:5).
PRINT data(:,3:5)
 /FORMAT='F8.0'
 /CNAMES=vnames
 /RNAMES=rnames
 /TITLE='Input data (first column is used to compute proportions & CI limits)'.
COMPUTE  id = data(:,1)./* Matching variable *.
COMPUTE num = data(:,3)./* Replace by data(:,4) if interested in 2nd value *.
COMPUTE den = data(:,5).
COMPUTE   p = num/den .
COMPUTE   z = MAKE(NROW(data),1,1.959964)./* Use MAKE(NROW(data),1,2.575829) for 99%CI *.
COMPUTE  x1 = 2*num+z&**2 .
COMPUTE  x2 = z&*SQRT(z&**2+4*num&*(1-p)).
COMPUTE  x3 = 2*(den+z&**2) .
COMPUTE  x4 = (x1-x2)/x3 .
COMPUTE  x5 = (x1+x2)/x3 .
PRINT {100*p,100*x4,100*x5}
 /FORMAT='F8.2'
 /TITLE='Point estimate & 95%CI for a proportion'
 /RNAMES=rnames
 /CLABELS='Point','Lower','Upper'.
* Export data *.
COMPUTE outdata = {id,100*p,100*x4,100*x5}.
COMPUTE outname = {'id','p','lower','upper'}.
SAVE outdata /OUTFILE = 'C:\Temp\ProportionCI.sav' /NAMES = outname.
END MATRIX.
MATCH FILES /FILE=*
 /FILE='C:\Temp\ProportionCI.sav'
 /BY id.
SUMMARIZE
  /TABLES=Var2 p lower upper
  /FORMAT=LIST NOCASENUM NOTOTAL
  /TITLE='Point estimates & 95%CI for one proportion'
  /FOOTNOTE 'Wilson method'
  /CELLS=NONE.

* HTH,
* Marta GG .


ipnyc wrote
Hello,

Per my title, is there a way to calculate a 95% CI for a proportion that is close to 0 (0.1% to be more specific)? There are plenty of online calculators but I was hoping to avoid calculating manually. I am using IBM SPSS 19.
Thank you!
--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING: 
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).