Hello,
Per my title, is there a way to calculate a 95% CI for a proportion that is close to 0 (0.1% to be more specific)? There are plenty of online calculators, but I was hoping to avoid calculating manually. I am using IBM SPSS 19. Thank you!
Administrator
What formula would one use to calculate this?
There is the COMPUTE statement in SPSS.
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"
In reply to this post by ipnyc
The PROPOR extension command provides binomial
and Poisson CIs. It can take aggregate counts in the dialog or pass
the data in syntax.
It requires the Python Essentials. Both the Essentials and the command can be downloaded via the SPSS Community website (www.ibm.com/developerworks/spssdevcentral).

Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
phone: 720-342-5621

From: ipnyc <[hidden email]>
To: [hidden email]
Date: 08/07/2013 05:04 PM
Subject: [SPSSX-L] Calculating CI in SPSS when percentage is close to 0
Sent by: "SPSSX(r) Discussion" <[hidden email]>

View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Calculating-CI-in-SPSS-when-percentage-is-close-to-0-tp5721509.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.

=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command INFO REFCARD
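PROPOR's own documentation is the authority on which interval methods it implements; the code below is not its implementation. As a rough illustration of what an exact binomial (Clopper-Pearson) or exact Poisson interval involves, here is a minimal Python sketch using only the standard library: each bound is found by bisection on the corresponding CDF. All function names are hypothetical.

```python
import math


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p); terms are built in log space."""
    if k < 0:
        return 0.0
    if k >= n:
        return 1.0
    if p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 0.0
    log_p, log_q = math.log(p), math.log1p(-p)
    total = 0.0
    for i in range(k + 1):
        log_choose = math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
        total += math.exp(log_choose + i * log_p + (n - i) * log_q)
    return min(total, 1.0)


def poisson_cdf(k: int, mu: float) -> float:
    """P(Y <= k) for Y ~ Poisson(mu)."""
    if k < 0:
        return 0.0
    if mu <= 0.0:
        return 1.0
    log_mu = math.log(mu)
    return min(1.0, sum(math.exp(-mu + i * log_mu - math.lgamma(i + 1)) for i in range(k + 1)))


def solve_decreasing(f, target: float, lo: float, hi: float, iters: int = 200) -> float:
    """Bisection for f(t) == target, where f is non-increasing on [lo, hi]."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) > target:
            lo = mid  # f is still above the target, so the root lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)


def clopper_pearson(x: int, n: int, confid: float = 0.95):
    """Exact (Clopper-Pearson) interval for a binomial proportion x/n."""
    alpha = 1.0 - confid
    # Lower bound: P(X >= x | p) = alpha/2, i.e. P(X <= x-1 | p) = 1 - alpha/2.
    lower = 0.0 if x == 0 else solve_decreasing(
        lambda p: binom_cdf(x - 1, n, p), 1.0 - alpha / 2, 0.0, 1.0)
    # Upper bound: P(X <= x | p) = alpha/2.
    upper = 1.0 if x == n else solve_decreasing(
        lambda p: binom_cdf(x, n, p), alpha / 2, 0.0, 1.0)
    return lower, upper


def poisson_exact(x: int, confid: float = 0.95):
    """Exact interval for a Poisson mean, given an observed count x."""
    alpha = 1.0 - confid
    hi = x + 20.0 + 10.0 * math.sqrt(x + 1.0)  # generous upper bracket for the search
    lower = 0.0 if x == 0 else solve_decreasing(
        lambda mu: poisson_cdf(x - 1, mu), 1.0 - alpha / 2, 0.0, hi)
    upper = solve_decreasing(lambda mu: poisson_cdf(x, mu), alpha / 2, 0.0, hi)
    return lower, upper


print(clopper_pearson(1, 1000))  # 1 event in 1,000 trials, i.e. 0.1%
print(poisson_exact(1))
```

Clopper-Pearson intervals are conservative by construction (coverage at or above the nominal level), which is one reason other methods are often preferred for everyday use.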
In reply to this post by David Marso
Friends,
I have been using SPSS, including syntax, for a while, but I couldn't figure out an easy solution to the following issue.

Issue: I need to APPEND (add) 10+ data files. The files share a few common variables, but each file also has its own variables. For example, file 1 has 10 common variables plus 30 variables unique to that file. I can easily add the files, but each file has a different number format for the common variables. For example, VAR1 is F30 in file 1 but F40 in file 2, and so on.

Objective:
1. Create a master list of variables that covers all variables across the 10 files.
2. Get the maximum column width for each common variable and apply it in all files. For example, if VAR1 is F30 in file 1 and F40 in file 2, then VAR1 in file 1 should be changed to F40. Currently I am doing this in Excel by comparing the data dictionaries of all 10 files, and it is a soul-destroying process.
3. Once the variable formats are the same across files, append all the files in one go.

I tried to find a solution on SPSS Tools (spsstools.net) but couldn't find anything except a comparison of two datasets, which doesn't really solve my problem.

Any help is greatly appreciated. Jon P, I would appreciate it if you could share your expertise here.

Manmit
For numeric variables, the format is irrelevant
for the merge. Each merged variable will have the format from
the first file where it is encountered, but no data will be lost. For
string variables the widths do have to match, unfortunately, but the STATS
ADJUST WIDTHS extension command available from the SPSS Community site
can synchronize string widths across files.
From: MR <[hidden email]>
To: [hidden email]
Date: 08/07/2013 06:59 PM
Subject: [SPSSX-L] Merging multiple challenging files
Sent by: "SPSSX(r) Discussion" <[hidden email]>
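Where STATS ADJUST WIDTHS is not available, the bookkeeping Manmit describes (a master variable list plus the widest string definition of each variable) can be sketched in Python as below. Assumptions: the file dictionaries have already been read into a hypothetical `{file: [(name, type_code), ...]}` structure, with type code 0 for numeric and the string width otherwise, and the generated `ALTER TYPE` lines assume an SPSS release that supports that command. Numeric formats are ignored, since they don't affect the merge.

```python
from typing import Dict, List, Tuple

# Hypothetical input format: for each file, its variables in file order as
# (name, type_code) pairs, where type_code is 0 for numeric and the string
# width (e.g. 20 for A20) for string variables.
FileDict = Dict[str, List[Tuple[str, int]]]


def plan_merge(files: FileDict):
    """Return (master variable list, widest type code per variable, ALTER TYPE lines per file)."""
    # Names are compared exactly as given; SPSS treats names case-insensitively,
    # so normalize case beforehand if the files disagree on capitalization.
    master: List[str] = []        # every variable, in order of first appearance
    widest: Dict[str, int] = {}   # 0 for numeric, else the maximum string width seen
    for fname, variables in files.items():
        for name, code in variables:
            if name not in widest:
                master.append(name)
                widest[name] = code
            elif (widest[name] == 0) != (code == 0):
                # A numeric/string clash cannot be fixed by widening; report it.
                raise ValueError(f"{name} has mixed numeric/string types (conflict found in {fname})")
            else:
                widest[name] = max(widest[name], code)

    # Only strings narrower than the widest definition need to change.
    commands: Dict[str, List[str]] = {}
    for fname, variables in files.items():
        lines = [f"ALTER TYPE {name} (A{widest[name]})."
                 for name, code in variables if 0 < code < widest[name]]
        if lines:
            commands[fname] = lines
    return master, widest, commands


files = {
    "file1.sav": [("ID", 0), ("VAR1", 0), ("CITY", 20), ("Q1", 0)],
    "file2.sav": [("ID", 0), ("VAR1", 0), ("CITY", 35), ("Q2", 0)],
}
master, widest, commands = plan_merge(files)
print(master)    # ['ID', 'VAR1', 'CITY', 'Q1', 'Q2']
print(commands)  # {'file1.sav': ['ALTER TYPE CITY (A35).']}
```

Each listed file would then be opened, its generated lines run, and the file saved before the final ADD FILES step.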
In reply to this post by MR
Please don't HIJACK other people's threads!
Begin a NEW topic rather than changing the subject line. When you do so I will consider reading and answering your question!
David,
My mistake; my sincere apologies. I will create a new thread.

On 2013-08-07, at 10:10 PM, David Marso <[hidden email]> wrote:
> Please don't HIJACK other people's threads!
> Begin a NEW topic rather than changing the subject line.
> When you do so I will consider reading and answering your question!
In reply to this post by Jon K Peck
Hi,
APPLY DICTIONARY has an option, /NEWVARS, that creates, well, new variables: variables that are in the source dataset but not in the target dataset will be created. You could use that to build a (dummy) master file with all the dictionary info. You might still run into problems with string variables, though.

Regards,
Albert-Jan

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
All right, but apart from the sanitation, the medicine, education, wine, public order, irrigation, roads, a fresh water system, and public health, what have the Romans ever done for us?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This problem reminds me of one grave risk: what if the same variable name exists in two files where it does NOT denote the same variable? This can be nasty to discover and track down if you don't prevent it in the first place. (Speaking from experience.) I learned to look with care at the MAPs provided for file merges. Most new/old variables occur in sets; when a variable appears by itself, ask whether it belongs there.

You have text variables? I think I would pre-process those files before tackling the fuller set.

Maybe there is a reason that one master file is needed. However, when this sort of question comes up, I think it is appropriate to mention the alternatives. I don't think I ever created one file with all the data from all the scales and reports, though I occasionally had to work from files made that way; I even had ones with multiple periods of data in one line. My first step in those cases was to "normalize" the data, a concept I learned by reading about database management. Data that occur together get their own file. Data that are parallel, like "periods", become separate lines. Data that are essential for many analyses (age, sex, etc.) are stored together in one well-maintained master file, which has every assigned ID, including the ones that might be ruled out for non-participation. Data from separate sources are "reduced" as needed, to summary totals, composite scores, etc. Related sets (ones always used together) may be maintained together in one file. This gives me dozens of variables to worry about or names to scan, instead of hundreds.

For analyses, MATCH FILES brings in the two, three, or four files that are needed for a particular analysis.

--
Rich Ulrich

Date: Wed, 7 Aug 2013 23:30:25 -0700
From: [hidden email]
Subject: Re: Merging multiple challenging files
To: [hidden email]
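Rich's rule of thumb (variables usually arrive in sets, so a variable that appears by itself deserves a second look) can be roughly automated by grouping variables on the set of files that contain them. Below is a minimal Python sketch, assuming each file's variable names are already available as a list; the input structure and function names are hypothetical. It only flags candidates: deciding whether a shared name means the same thing in both files still requires reading the codebooks.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List


def membership_map(files: Dict[str, List[str]]) -> Dict[str, FrozenSet[str]]:
    """For each variable name, the set of files that contain it."""
    where: Dict[str, set] = defaultdict(set)
    for fname, names in files.items():
        for name in names:
            where[name].add(fname)
    return {name: frozenset(found) for name, found in where.items()}


def lone_variables(files: Dict[str, List[str]]) -> List[str]:
    """Variables whose pattern of files is shared by no other variable."""
    groups: Dict[FrozenSet[str], List[str]] = defaultdict(list)
    for name, found in membership_map(files).items():
        groups[found].append(name)
    return sorted(names[0] for names in groups.values() if len(names) == 1)


files = {
    "a.sav": ["ID", "DATE", "AGE", "SEX", "Q1", "Q2"],
    "b.sav": ["ID", "DATE", "AGE", "SEX", "Q3", "Q4", "SCORE"],
    "c.sav": ["ID", "DATE", "Q5", "Q6", "SCORE"],
}
print(lone_variables(files))  # ['SCORE']: the only variable found in exactly b.sav and c.sav
```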
In reply to this post by ipnyc
I assume you are concerned about the usual Wald method not working very well for extreme proportions. As you probably know, there are several alternatives that perform better. Personally, I like the Wilson method. Find below some syntax I wrote to compute various CIs, and below that, some syntax Marta GG posted to this list a few years ago. HTH.
* ================================================================== .
* File: CI_for_proportion.SPS .
* Date: 19-Nov-2012 .
* Author: Bruce Weaver, bweaver@lakeheadu.ca .
* ================================================================== .

* Get confidence interval for a binomial proportion using:
  - Wald method
  - Adjusted Wald method (Agresti & Coull, 1998)
  - Wilson score method (identical to Ghosh's 1979 method)
  - Jeffreys method .

* The data used here are from Table I in Newcombe (1998),
  Statistics in Medicine, Vol 17, 857-872.

NEW FILE.
DATASET CLOSE ALL.
DATA LIST LIST /x(f8.0) n(f8.0) confid(f5.3) .
BEGIN DATA.
81 263 .95
15 148 .95
0 20 .95
1 29 .95
81 263 .90
15 148 .90
0 20 .90
1 29 .90
81 263 .99
15 148 .99
0 20 .99
1 29 .99
16 48 .95
16 48 .99
END DATA.

compute alpha = 1 - confid.
compute p = x/n.
compute q = 1-p.
compute z = probit(1-alpha/2).

* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .
* Wald method (i.e., the usual normal approximation).
compute #se = SQRT(p*q/n).
compute Lower1 = p - z*#se.
if Lower1 LT 0 Lower1 = 0.
compute Upper1 = p + z*#se.
if Upper1 GT 1 Upper1 = 1.

* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .
* Adjusted Wald method due to Agresti & Coull (1998).
compute #p = (x + z**2/2) / (n + z**2).
compute #q = 1 - #p.
compute #se = SQRT(#p*#q/(n+z**2)).
compute Lower2 = #p - z*#se.
if Lower2 LT 0 Lower2 = 0.
compute Upper2 = #p + z*#se.
if Upper2 GT 1 Upper2 = 1.

* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .
* Wilson score method (Method 3 in Newcombe, 1998) .
* Code adapted from Robert Newcombe's code posted here:
  http://archive.uwcm.ac.uk/uwcm/ms/Robert2.html .
* The method of Ghosh (1979), as described in Glass & Hopkins
* (1996, p 326) is identical to Wilson's method.
* Glass & Hopkins describe it as the "method of choice for all values of p and n" .
COMPUTE #x1 = 2*n*p+z**2 .
COMPUTE #x2 = z*(z**2+4*n*p*(1-p))**0.5 .
COMPUTE #x3 = 2*(n+z**2) .
COMPUTE Lower3 = (#x1 - #x2) / #x3 .
COMPUTE Upper3 = (#x1 + #x2) / #x3 .

* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .
* Jeffreys method shown on the IBM-SPSS website at
* http://www-01.ibm.com/support/docview.wss?uid=swg21474963 .
compute Lower4 = idf.beta(alpha/2,x+.5,n-x+.5).
compute Upper4 = idf.beta(1-alpha/2,x+.5,n-x+.5).

* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .
* Format variables and list the results of all methods .
formats p q Lower1 to Upper4 (f5.4).
sort cases by p confid.
list var x n confid p Lower1 to Upper4 .

* Method 1: Wald method (i.e., the usual normal approximation) .
* Method 2: Adjusted Wald method (using z**2/2 and z**2 rather than 2 and 4).
* Method 3: Wilson score method (from Newcombe paper), identical to Ghosh (1979).
* Method 4: Jeffreys method (http://www-01.ibm.com/support/docview.wss?uid=swg21474963).
* Data from Newcombe (1998), Table I.

variable labels
 x "Successes" n "Trials" p "p(Success)" confid "Confidence Level"
 Lower1 "Wald: Lower" Upper1 "Wald: Upper"
 Lower2 "Adj Wald: Lower" Upper2 "Adj Wald: Upper"
 Lower3 "Wilson score/Ghosh: Lower" Upper3 "Wilson score/Ghosh: Upper"
 Lower4 "Jeffreys: Lower" Upper4 "Jeffreys: Upper" .

SUMMARIZE
  /TABLES=x n p confid Lower1 Upper1 Lower2 Upper2 Lower3 Upper3 Lower4 Upper4
  /FORMAT=VALIDLIST NOCASENUM TOTAL
  /TITLE='Confidence Intervals for Binomial Proportions'
  /MISSING=VARIABLE
  /CELLS=NONE.
* ================================================================== .

****************************************************************
** CONFIDENCE INTERVAL FOR A PROPORTION USING WILSON'S METHOD **
** DATA ARE EXTRACTED FROM CROSSTAB TABLES USING OMS, AND     **
** MULTIPLE DATASETS ARE USED (REQUIRES SPSS 14 OR NEWER)     **
****************************************************************
* Code posted to SPSSX-L mailing list by Marta Garcia-Granera, 20-Aug-2009.

* Example dataset (replace by your own) *.
GET FILE='GSS 93 for Missing Values.sav'.
DATASET NAME OriginalData.

* Don't change anything here *.
PRESERVE.
SET OLANG=ENGLISH.
OMS SELECT TABLES
  /IF SUBTYPES=['Crosstabulation']
  /DESTINATION FORMAT=SAV NUMBERED='id' OUTFILE='C:\Temp\table.sav'.

* Crosstabulations (replace by your own variables) *.
* Grouping variables in rows, the proportion variable alone in the column *.
CROSSTABS
  /TABLES=degree wrkstat polviews BY sex
  /FORMAT= AVALUE TABLES
  /CELLS= COUNT ROW
  /COUNT ROUND CELL .

* Don't change anything here *.
OMSEND.
GET FILE='C:\Temp\table.sav'
  /DROP= Command_ TO Var1.
DATASET NAME ProcessedData.
* Eliminate superfluous rows of data (step language-dependent, needs SET OLANG=ENGLISH) *.
SELECT IF (Var3 EQ 'Count') AND (Var2 NE '').
COMPUTE id=$casenum.
EXECUTE. /* Needed for next command *.
DELETE VARIABLES Var3.

* This SPSS code is adapted from a macro by Dr. Robert G. Newcombe,
* University of Wales College of Medicine, Cardiff, UK.
* It calculates a confidence interval for a proportion x/n,
* using an appropriate method
* (E.B.Wilson. J Am Stat Assoc 1927, 22, 209-212).
* This part of the code is dataset-independent (even the names of the variables are automatically read),
  and can be left unmodified, unless 99% CI are needed or CI for the second column (instead of first) is wanted .
MATRIX.
PRINT /TITLE='NEWCOMBE METHOD: CI FOR A PROPORTION'.
GET data /FILE = * /NAMES = namevec.
GET rnames /VAR = var2.
COMPUTE vnames = namevec(3:5).
PRINT data(:,3:5)
  /FORMAT='F8.0'
  /CNAMES=vnames
  /RNAMES=rnames
  /TITLE='Input data (first column is used to compute proportions & CI limits)'.
COMPUTE id = data(:,1)./* Matching variable *.
COMPUTE num = data(:,3)./* Replace by data(:,4) if interested in 2nd value *.
COMPUTE den = data(:,5).
COMPUTE p = num/den .
COMPUTE z = MAKE(NROW(data),1,1.959964)./* Use MAKE(NROW(data),1,2.575829) for 99%CI *.
COMPUTE x1 = 2*num+z&**2 .
COMPUTE x2 = z&*SQRT(z&**2+4*num&*(1-p)).
COMPUTE x3 = 2*(den+z&**2) .
COMPUTE x4 = (x1-x2)/x3 .
COMPUTE x5 = (x1+x2)/x3 .
PRINT {100*p,100*x4,100*x5}
  /FORMAT='F8.2'
  /TITLE='Point estimate & 95%CI for a proportion'
  /RNAMES=rnames
  /CLABELS='Point','Lower','Upper'.
* Export data *.
COMPUTE outdata = {id,100*p,100*x4,100*x5}.
COMPUTE outname = {'id','p','lower','upper'}.
SAVE outdata /OUTFILE = 'C:\Temp\ProportionCI.sav' /NAMES = outname.
END MATRIX.
MATCH FILES /FILE=* /FILE='C:\Temp\ProportionCI.sav' /BY id.
SUMMARIZE
  /TABLES=Var2 p lower upper
  /FORMAT=LIST NOCASENUM NOTOTAL
  /TITLE='Point estimates & 95%CI for one proportion'
  /FOOTNOTE 'Wilson method'
  /CELLS=NONE.
* HTH,
* Marta GG .
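For readers who want to check the COMPUTE formulas above outside SPSS, here is a minimal Python translation of the Wald, adjusted Wald (Agresti-Coull), and Wilson score intervals, using only the standard library (`statistics.NormalDist` supplies the z value in place of `probit`). Jeffreys is left out because the standard library has no inverse beta function. This is a cross-check sketch, not part of the original posts.

```python
import math
from statistics import NormalDist


def z_value(confid: float) -> float:
    """Two-sided critical value, the equivalent of probit(1 - alpha/2) in the syntax."""
    alpha = 1.0 - confid
    return NormalDist().inv_cdf(1.0 - alpha / 2)


def wald(x: int, n: int, confid: float = 0.95):
    z, p = z_value(confid), x / n
    se = math.sqrt(p * (1 - p) / n)
    # Clip to [0, 1], as the IF statements in the syntax do.
    return max(0.0, p - z * se), min(1.0, p + z * se)


def agresti_coull(x: int, n: int, confid: float = 0.95):
    z = z_value(confid)
    n_adj = n + z ** 2
    p_adj = (x + z ** 2 / 2) / n_adj
    se = math.sqrt(p_adj * (1 - p_adj) / n_adj)
    return max(0.0, p_adj - z * se), min(1.0, p_adj + z * se)


def wilson(x: int, n: int, confid: float = 0.95):
    z, p = z_value(confid), x / n
    x1 = 2 * n * p + z ** 2
    x2 = z * math.sqrt(z ** 2 + 4 * n * p * (1 - p))
    x3 = 2 * (n + z ** 2)
    return (x1 - x2) / x3, (x1 + x2) / x3


for method in (wald, agresti_coull, wilson):
    print(method.__name__, method(1, 1000))  # 1 success in 1,000 trials (0.1%)
```

For 1 success in 1,000 trials (the 0.1% in the original question), the Wald lower limit falls below zero and is clipped to 0, while the Wilson interval runs from roughly 0.018% to 0.56%.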
--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING:
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).