First of all, I agree with Hector that there is no need to go on thanking me again and again (not even privately, I assure you). If these tutorials are useful to someone on this list, that will be enough for me. Anyway, thanks to all who wrote to me about them.

Second, I hope this message reaches the list. It looks like my ISP (Terra) has been included in SpamHaus blacklists for allowing some SPAM, and I don't know whether the problem has been solved or not, but most of my outgoing mail has been blocked (they blacklisted the mail server, not a range of IPs).

Now, the next lesson. I'm taking the weekend off, so don't expect many messages from me until Monday...

************ COMPARING TWO SAMPLES (CONTINUOUS VARIABLES) ************

Again, sample data are extracted from Swinscow & Campbell's "Statistics at Square One" (http://www.bmj.com/collections/statsbk/)

* A) PAIRED DATA *.

BACKGROUND (from the book): The addition of bran to the diet has been reported to benefit patients with diverticulosis. Several different bran preparations are available, and a clinician wants to test the efficacy of two of them on patients, since favourable claims have been made for each. Among the consequences of administering bran that requires testing is the transit time through the alimentary canal. A random sample of patients with disease of comparable severity and aged 20-44 is chosen and the two treatments administered on two successive occasions, the order of the treatments also being determined from the table of random numbers.

* Sample dataset (Table 7.2 Transit times: paired comparison) *.
DATA LIST LIST/time_a time_b (2 F8.0).
BEGIN DATA
63 55
54 62
79 108
68 77
87 83
84 78
92 79
57 94
66 69
53 66
76 72
63 77
END DATA.
VARIABLE LABEL time_a 'Transit times with A (h)' /time_b 'Transit times with B (h)'.

These are clearly paired data. Quoting from Swinscow & Campbell's book:

"Why should I use a paired test if my data are paired?
What happens if I don't?
Pairing provides information about an experiment, and the more information that can be provided in the analysis the more sensitive the test. One of the major sources of variability is between-subjects variability. By repeating measures within subjects, each subject acts as its own control, and the between-subjects variability is removed. In general this means that if there is a true difference between the pairs the paired test is more likely to pick it up: it is more powerful. When the pairs are generated by matching, the matching criteria may not be important. In this case, the paired and unpaired tests could give similar results."

Statistical analyses: Again, we have to check for normality, but only the differences will be studied (normality of time_a and time_b is not required). We need to compute the differences manually and run a full EDA on them (but with this small sample size we will not ask for a histogram).

Warning: this small sample size also makes normality testing problematic, because normality tests lack power with few cases. We should also look at the skewness coefficient (and at outliers...). With fewer than 10 cases, all normality tests should be discarded. The situation is even worse with K-S (Lilliefors) (see "Preliminary testing for normality: some statistical aspects of a common concept" by V. Schoder, A. Himmelmann and K. P. Wilhelm, published in Clinical and Experimental Dermatology in 2006). I don't have a reference here right now, but I recall that the Shapiro-Wilk test performed a bit better than K-S or other normality tests (like D'Agostino-Pearson...).

COMPUTE DiffTimes=time_b - time_a.

*** EDA ***.
EXAMINE VARIABLES=DiffTimes
 /PLOT BOXPLOT STEMLEAF NPPLOT
 /COMPARE GROUP
 /PERCENTILES(5,10,25,50,75,90,95)
 /STATISTICS DESCRIPTIVES
 /CINTERVAL 95.
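For readers who want to check these EDA figures outside SPSS, here is a minimal Python sketch (standard library only). It recomputes the differences and the skewness "Z test" mentioned in the next paragraph. It assumes the usual bias-adjusted sample skewness (G1) and its standard error sqrt(6n(n-1)/((n-2)(n+1)(n+3))), which are the formulas SPSS is generally understood to use. The function name `skewness_with_se` is illustrative only.

```python
import math
import statistics

# Table 7.2 transit times (hours), one (A, B) pair per patient
time_a = [63, 54, 79, 68, 87, 84, 92, 57, 66, 53, 76, 63]
time_b = [55, 62, 108, 77, 83, 78, 79, 94, 69, 66, 72, 77]


def skewness_with_se(x):
    """Bias-adjusted sample skewness (G1) and its standard error."""
    n = len(x)
    mean = statistics.mean(x)
    sd = statistics.stdev(x)  # n - 1 denominator
    g1 = n / ((n - 1) * (n - 2)) * sum(((v - mean) / sd) ** 3 for v in x)
    se = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    return g1, se


# In a paired design only the differences are analysed
diffs = [b - a for a, b in zip(time_a, time_b)]
skew, se_skew = skewness_with_se(diffs)
z_skew = skew / se_skew  # |z| < 2: no important asymmetry

print(f"n={len(diffs)} mean={statistics.mean(diffs):.2f} "
      f"SD={statistics.stdev(diffs):.2f} median={statistics.median(diffs)}")
print(f"skewness={skew:.3f} SE={se_skew:.3f} z={z_skew:.2f}")
```

The skewness comes out positive but below 1, and its z value is below 2. This matches the verbal check made on the SPSS output.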
The variable (differences) has no outliers (see the stem-and-leaf and box-plot graphs). It is also fairly symmetric: skewness is not important, since its absolute value is below 1 and less than twice its standard error (a simple Z test). The SW and KS tests are clearly non-significant (but of little use given the sample size). We can therefore use a parametric method: the t-test for paired data. The differences can then be summarized (for descriptive purposes) using their mean and standard deviation (SD). It is a common mistake to present the mean & SD of time_a and time_b instead of summarizing the differences.

*** PARAMETRIC TEST ***.
T-TEST PAIRS = time_a WITH time_b
 /CRITERIA = CI(.95).

The output includes a 95% CI for the mean difference. Some statisticians consider that p-values should be abandoned and only CIs presented. A CI gives the same information a p-value gives, namely whether or not there is a significant difference between time_a and time_b: a 95% CI that does not include 0 between its limits corresponds to p<0.05, whereas one that includes 0 does not. It also shows the size and precision of the difference. See "Confidence Intervals as an Alternative to Significance Testing" by Eduard Brandstätter (Johannes Kepler University of Linz), Methods of Psychological Research Online 1999, Vol. 4, No. 2, for a full discussion.

*********************************************

Again, if the differences were not normally distributed, non-parametric tests should be used. Symmetry plays the same role as in one-sample testing when choosing between the Wilcoxon and the Sign test. The correct summary is then the median & IQR of the differences (we got them with the EDA, above).

*** Wilcoxon test ***.
NPAR TEST /WILCOXON=time_a WITH time_b.

SPSS will always give the asymptotic p-value for this test, even when the sample size is small (and the asymptotic p-value becomes unreliable).
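The paired t-test and the Wilcoxon signed-rank statistic can also be checked by hand. The Python sketch below (standard library only) shows one way to do it, under these assumptions:

- The two-sided 95% t quantile for 11 df (2.201) is hard-coded rather than computed.
- Tied absolute differences get mid-ranks.
- The exact two-sided p-value of the signed-rank test is obtained by enumerating all 2^12 sign patterns, which is only feasible for small samples.

This is not how SPSS computes its asymptotic p-value. The helper `midranks` is a hypothetical name.

```python
import itertools
import math
import statistics

# Table 7.2 transit times (hours), one (A, B) pair per patient
time_a = [63, 54, 79, 68, 87, 84, 92, 57, 66, 53, 76, 63]
time_b = [55, 62, 108, 77, 83, 78, 79, 94, 69, 66, 72, 77]
diffs = [b - a for a, b in zip(time_a, time_b)]
n = len(diffs)

# Paired t-test = one-sample t-test on the differences (B - A)
mean_d = statistics.mean(diffs)
se_d = statistics.stdev(diffs) / math.sqrt(n)
t_stat = mean_d / se_d
T_CRIT_11DF = 2.201  # two-sided 95% t quantile for n - 1 = 11 df, hard-coded
ci_mean = (mean_d - T_CRIT_11DF * se_d, mean_d + T_CRIT_11DF * se_d)


def midranks(values):
    """Ranks 1..n; tied values get the average of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


# Wilcoxon signed-rank test: drop zero differences, rank the absolute values
nonzero = [d for d in diffs if d != 0]
ranks = midranks([abs(d) for d in nonzero])
w_plus = sum(r for r, d in zip(ranks, nonzero) if d > 0)
w_minus = sum(r for r, d in zip(ranks, nonzero) if d < 0)
T = min(w_plus, w_minus)

# Exact two-sided p: under H0 all 2**N sign patterns are equally likely
total = sum(ranks)
as_extreme = 0
for signs in itertools.product((False, True), repeat=len(ranks)):
    s = sum(r for r, positive in zip(ranks, signs) if positive)
    if min(s, total - s) <= T:
        as_extreme += 1
p_exact = as_extreme / 2 ** len(ranks)

print(f"t={t_stat:.3f}  95% CI for mean(B-A)=({ci_mean[0]:.1f}, {ci_mean[1]:.1f})")
print(f"W+={w_plus} W-={w_minus} T={T} exact two-sided p={p_exact:.3f}")
```

The smaller rank sum T should equal the one read from the SPSS Ranks table. The t-test CI should include 0, which matches the non-significant result.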
If the Exact Tests module is installed and the sample size (excluding ties) is below 25 (as in this case, n=12), you can ask for the exact p-value by adding "/METHOD=EXACT TIMER(1)" to the above syntax. If you don't have that module, you have to check significance with a Wilcoxon table, like this one:

Critical values of the Wilcoxon Matched Pairs Signed Rank Test
For any N (number of subjects minus ties), the observed value is significant at a given level of significance if it is equal to or less than the critical value shown in the table below.

        1-tailed test     2-tailed test
  N   p<0.05  p<0.01    p<0.05  p<0.01
---------------------------------------
  5      1       -         -       -
  6      2       -         1       -
  7      4       0         2       -
  8      6       2         4       0
  9      8       3         6       2
 10     11       5         8       3
 11     14       7        11       5
 12     17      10        14       7
 13     21      13        17      10
 14     26      16        21      13
 15     30      20        25      16
 16     36      24        30      19
 17     41      28        36      23
 18     47      33        40      28
 19     54      38        46      32
 20     60      43        52      37
 21     68      49        59      43
 22     75      56        66      49
 23     83      62        73      55
 24     92      69        81      61
 25    101      77        90      68
---------------------------------------

How is this table used? First, from the SPSS output (Ranks table), we locate two important pieces of information:
- Are there ties (pairs with time_a = time_b, i.e. zero differences)? If the answer is yes, compute N=total-ties. In this case there are no ties, and N=12.
- The smaller of the two sums of ranks. We will call this statistic T. In this case T=23.

Now we go to the Wilcoxon table, locate the row where N=12, move to the "2-tailed test p<0.05" column and read the value: Tcrit=14. The result of the test is significant if T (the statistic we got from the Ranks table) is lower than or equal to Tcrit (the value we read from the Wilcoxon table). In this case, T=23 > Tcrit=14, so the result is clearly non-significant.

*** Sign test ***.
NPAR TEST /SIGN= time_a WITH time_b.

Here SPSS will give either the exact or the asymptotic p-value, depending on sample size (the exact p-value in this case, because n=12).

For the 95% CI for the median, we could use the same trick I explained in my previous tutorial, but...
- We must use the differences. Since some of them can be zero or even negative, RATIO will discard those data and give a false CI, based only on the positive differences.
- This can be avoided by adding a constant, like 100, to all data before running RATIO, so that they are all positive (and subtracting 100 afterwards from the CI limits we get).

* We could make this task automatic with a MACRO that used OMS to extract the CI limits, subtract 100 and then print the results (up to you...) *.
TEMPORARY.
COMPUTE One=1.
COMPUTE DiffTimes= 100 + DiffTimes.
RATIO STATISTICS DiffTimes WITH one
 /PRINT = CIN(95) MEDIAN .

The 95% CI goes from 94 to 114. Subtracting 100 gives (-6; 14), and the median difference is 5.5. This method, based on the RATIO procedure, doesn't need symmetry of the differences (it is adequate if we are complementing the sign test). If we can assume symmetry, we can use another method, based on the Wilcoxon test, which is described in detail in Douglas Altman's book "Statistics with Confidence":

* Method based on Wilcoxon test (with a MACRO) *.
DEFINE MedianCIPaired (!POS=!TOKENS(2)/k=!DEFAULT(1000) !TOKENS(1)).
PRESERVE.
SET MXLOOPS=!k. /* If [n(n+1)/2]>1000, change MXLOOPS accordingly *.
MATRIX.
PRINT /TITLE='CONFIDENCE INTERVAL FOR DIFFERENCES BETWEEN TWO MEDIANS (PAIRED)'.
PRINT /TITLE='Underlying assumption: distribution of differences is symmetrical'.
* Get data: replace variable names by your own (order will affect the sign) *.
GET data /VAR = !1 /NAMES = vname /MISSING = OMIT.
COMPUTE n=NROW(data).
COMPUTE d=data(:,1)-data(:,2).
RELEASE data.
* Compute all averages *.
COMPUTE nt=n*(n+1)/2.
COMPUTE dmean=MAKE(1,nt,0).
COMPUTE counter=1.
LOOP i=1 TO n.
- LOOP j=i TO n.
- COMPUTE dmean(counter)=(d(i)+d(j))/2.
- COMPUTE counter=counter+1.
- END LOOP.
END LOOP.
* Sorting algorithm (R Ristow & J Peck) *.
COMPUTE sdmean=dmean.
COMPUTE sdmean(GRADE(dmean)) = dmean.
RELEASE counter,dmean.
* Compute median of all averages *.
COMPUTE pair=(TRUNC(nt/2) EQ nt/2).
/* Check if nt is odd (0) or even (1) *.
DO IF pair EQ 0. /* Median formula for odd samples *.
- COMPUTE median=sdmean((nt+1)/2).
ELSE. /* Median formula for even samples *.
- COMPUTE median=(sdmean(nt/2)+sdmean(1+nt/2))/2.
END IF.
PRINT median
 /TITLE='Difference between population medians (A - B) '
 /RLABEL='Point'
 /FORMAT='F8.1'.
* Exact or asymptotic 95% & 99% CI *.
DO IF n LE 50. /* Exact Wilcoxon's critical values (Table 18.6 of Altman's book) *.
- COMPUTE w95={0,0,0,0,0,1,3,4,6,9,11,14,18,22,26,30,35,41,47,53,59,66,74,82,90,
 99,108,117,127,138,148,160,171,183,196,209,222,236,250,265,280,295,
 311,328,344,362,379,397,416,435}.
- COMPUTE w99={0,0,0,0,0,0,0,1,2,4,6,8,10,13,16,20,24,28,33,38,43,49,55,62,69,
 76,84,92,101,110,119,129,139,149,160,172,183,195,208,221,234,248,
 262,277,292,308,323,340,356,374}.
- COMPUTE u95=w95(n).
- COMPUTE u99=w99(n).
- RELEASE w95,w99.
- PRINT /TITLE='Exact confidence intervals calculated (N<=50)'.
ELSE. /* Asymptotic Wilcoxon's critical values *.
- COMPUTE u95=1+TRUNC(nt/2-1.959964*SQRT(n*(n+1)*(2*n+1)/24)).
- COMPUTE u99=1+TRUNC(nt/2-2.575829*SQRT(n*(n+1)*(2*n+1)/24)).
- PRINT /TITLE='Asymptotic confidence intervals calculated (N>50)'.
END IF.
DO IF u95 EQ 0.
- PRINT /TITLE='Warning: sample size is too low (<6) for accurate 95%CI. Discard it.'.
- COMPUTE u95=1. /* Replace 0 by 1 to avoid a computing error *.
END IF.
DO IF u99 EQ 0.
- PRINT /TITLE='Warning: sample size is too low (<8) for accurate 99%CI. Discard it.'.
- COMPUTE u99=1. /* Replace 0 by 1 to avoid a computing error *.
END IF.
COMPUTE low95=sdmean(u95).
COMPUTE high95=sdmean(nt+1-u95).
COMPUTE low99=sdmean(u99).
COMPUTE high99=sdmean(nt+1-u99).
PRINT {low95,high95;low99,high99}
 /FORMAT='F8.1'
 /TITLE='CI for difference between medians'
 /RLABEL='95%Level' '99%Level'
 /CLABEL='Lower' 'Upper'.
END MATRIX.
RESTORE.
!ENDDEFINE.

* MACRO call *.
MedianCIPaired time_b time_a.

* B) UNPAIRED DATA *.
Background: The addition of bran to the diet has been reported to benefit patients with diverticulosis. Several different bran preparations are available, and a clinician wants to test the efficacy of two of them on patients, since favourable claims have been made for each. Among the consequences of administering bran that requires testing is the transit time through the alimentary canal. Does it differ in the two groups of patients taking these two preparations?

The assumptions are:
1. that the data are plausibly Normal
2. that the two samples come from distributions that may differ in their mean value, but not in the standard deviation (Homogeneity of Variances - HOV - condition)

The second condition must be tested with the Levene test (although other methods exist: Bartlett, Cochran, Hartley's test...). If sample sizes are small, we'll face the same problem we had when testing for normality: lack of power. One rule of thumb, independent of sample size, is to suspect lack of HOV if the ratio of the biggest to the smallest SD is over 2. If HOV fails, the t-test must be modified (the modified version is called the Welch test): both the degrees of freedom and the standard error of the difference are corrected before computing the t statistic and its significance.

* Table 7.1 Transit times: unpaired comparison *.
DATA LIST FREE/treatmnt trantime (2 F8.0).
BEGIN DATA
1 44 1 51 1 52 1 55 1 60 1 62 1 66 1 68 1 69 1 71 1 71 1 76 1 82 1 91 1 108
2 52 2 64 2 68 2 74 2 79 2 83 2 84 2 88 2 95 2 97 2 101 2 116
END DATA.
VARIABLE LABEL treatmnt 'Treatments'/trantime 'Transit times (h)'.
VALUE LABELS treatmnt 1'Treatment A' 2'Treatment B'.
VAR LEVEL treatmnt (NOMINAL).

Statistical analyses: Again, we have to check for normality, but within each treatment group. Since sample sizes are a bit low, we will not pay too much attention to the SW or KS (Lilliefors) p-values, but will focus on outliers and skewness.
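The outlier part of this per-group check can be approximated outside SPSS. The Python sketch below (standard library only) makes these assumptions:

- The box-plot fences are built from Tukey's hinges: for odd n, the median is included in both halves.
- An outlier is more than 1.5 hinge-spreads beyond a hinge, and an extreme value is more than 3 hinge-spreads beyond it.

SPSS's own hinge computation may differ in edge cases, so treat this as an approximation of the EXAMINE box-plot, not a reimplementation. The helper names are hypothetical.

```python
import statistics

# Table 7.1 transit times (hours) by treatment
group_a = [44, 51, 52, 55, 60, 62, 66, 68, 69, 71, 71, 76, 82, 91, 108]
group_b = [52, 64, 68, 74, 79, 83, 84, 88, 95, 97, 101, 116]


def tukey_hinges(x):
    """Medians of the lower and upper halves (median shared when n is odd)."""
    s = sorted(x)
    n = len(s)
    lower = s[: (n + 1) // 2]
    upper = s[n // 2:]
    return statistics.median(lower), statistics.median(upper)


def flag_outliers(x):
    """Values beyond 1.5 (outliers) and 3 (extremes) hinge-spreads."""
    lo, hi = tukey_hinges(x)
    spread = hi - lo
    outliers = [v for v in x if v < lo - 1.5 * spread or v > hi + 1.5 * spread]
    extremes = [v for v in x if v < lo - 3 * spread or v > hi + 3 * spread]
    return outliers, extremes


for name, data in (("A", group_a), ("B", group_b)):
    outliers, extremes = flag_outliers(data)
    print(f"Group {name}: hinges={tukey_hinges(data)} "
          f"outliers={outliers} extremes={extremes}")
```

Under these assumptions, 108 in group A is flagged as an outlier but not as an extreme value, and group B has none. This agrees with the box-plot reading given below.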
We will also ask for the Levene test ("hidden" inside the /PLOT subcommand: SPREADLEVEL).

EXAMINE VARIABLES=trantime BY treatmnt
 /PLOT BOXPLOT STEMLEAF NPPLOT SPREADLEVEL(1)
 /COMPARE GROUP
 /PERCENTILES(5,10,25,50,75,90,95)
 /STATISTICS DESCRIPTIVES
 /CINTERVAL 95
 /NOTOTAL.

The box-plot shows an outlier in sample A. It is not an extreme outlier (it is not more than 3 IQR above the 3rd quartile), so its impact on overall normality is low (skewness is not very big). Still, we could run both tests (parametric & non-parametric) and compare them, just in case. We don't suspect lack of HOV: the ratio of the biggest to the smallest SD is 17.6/16.5, clearly below 2, and the Levene test based on means is clearly non-significant. We'll discuss the "other" Levene tests later (when we talk about non-parametric testing).

T-TEST GROUPS = treatmnt(1 2)
 /VARIABLES = trantime
 /CRITERIA = CI(.95) .

Results: the Levene test (based on the mean, the same one we got with EDA) helps us choose between the first row ("Equal variances assumed") and the second row ("Equal variances not assumed"). It only compares variances: sometimes I've seen people conclude that the means were not different just because the HOV test was non-significant. The test for the equality of means is shown to the right (2-tailed p=0.031). The 95% CI (last columns to the right) goes from -28.6 to -1.5, which also indicates a significant result (the limits don't include 0). If we had suspected lack of HOV, the Welch test would have been used (2nd row), with the following results: p=0.033; Mean Diff.=-15.0, 95% CI: (-28.7 to -1.3).

*****************************************************************************

If we suspected that the variable is not normally distributed (in at least one of the groups), then, since our sample sizes are too small to rely on the robustness of the t-test, we could use a non-parametric test: Mann-Whitney's U test or the median test.
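The parametric results quoted above can be cross-checked with a short Python sketch (standard library only). It rests on these assumptions:

- The pooled-variance and Welch (Satterthwaite) formulas are the textbook ones.
- Levene's test is computed as a one-way ANOVA F on absolute deviations from each group mean (the "based on mean" version).
- To avoid depending on SciPy, no p-values are computed, and the 95% CI uses a hard-coded t quantile (2.060 for 25 df).

`levene_mean_f` is a hypothetical helper name.

```python
import math
import statistics

# Table 7.1 transit times (hours) by treatment
group_a = [44, 51, 52, 55, 60, 62, 66, 68, 69, 71, 71, 76, 82, 91, 108]
group_b = [52, 64, 68, 74, 79, 83, 84, 88, 95, 97, 101, 116]
n1, n2 = len(group_a), len(group_b)
m1, m2 = statistics.mean(group_a), statistics.mean(group_b)
v1, v2 = statistics.variance(group_a), statistics.variance(group_b)

# Rule of thumb: suspect lack of HOV if biggest/smallest SD > 2
sd_ratio = math.sqrt(max(v1, v2) / min(v1, v2))

# Student t-test with pooled variance ("Equal variances assumed")
df_pooled = n1 + n2 - 2
sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / df_pooled
se_pooled = math.sqrt(sp2 * (1 / n1 + 1 / n2))
t_pooled = (m1 - m2) / se_pooled
T_CRIT_25DF = 2.060  # two-sided 95% t quantile for 25 df, hard-coded
ci_pooled = (m1 - m2 - T_CRIT_25DF * se_pooled, m1 - m2 + T_CRIT_25DF * se_pooled)

# Welch test ("Equal variances not assumed"): separate SEs, corrected df
se_welch = math.sqrt(v1 / n1 + v2 / n2)
t_welch = (m1 - m2) / se_welch
df_welch = (v1 / n1 + v2 / n2) ** 2 / (
    (v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1)
)


def levene_mean_f(*groups):
    """Levene's F: one-way ANOVA on |x - group mean|."""
    z = [[abs(v - statistics.mean(g)) for v in g] for g in groups]
    all_z = [v for g in z for v in g]
    grand = statistics.mean(all_z)
    k, total_n = len(z), len(all_z)
    ss_between = sum(len(g) * (statistics.mean(g) - grand) ** 2 for g in z)
    ss_within = sum((v - statistics.mean(g)) ** 2 for g in z for v in g)
    return (ss_between / (k - 1)) / (ss_within / (total_n - k))


f_levene = levene_mean_f(group_a, group_b)
print(f"SD ratio={sd_ratio:.2f}  Levene F={f_levene:.3f}")
print(f"pooled t={t_pooled:.3f} df={df_pooled} "
      f"95% CI=({ci_pooled[0]:.1f}, {ci_pooled[1]:.1f})")
print(f"Welch t={t_welch:.3f} df={df_welch:.2f}")
```

The Welch df comes out non-integer (about 22.9). This is the "correction" of the degrees of freedom mentioned in the HOV discussion.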
The Mann-Whitney test is also called the Wilcoxon-Mann-Whitney test, because there is a test called the Wilcoxon Rank Sum test that is mathematically equivalent to it. (Do not confuse it with the Wilcoxon Signed Ranks test for paired data.) To be able to use Mann-Whitney's U test, we first have to check that both sample distributions are similar in shape (see SPSS help) and spread (see: http://www.bmj.com/cgi/content/full/323/7309/391).

Quoting again from Campbell & Swinscow's book (chapter 10):

"Do non-parametric tests compare medians? It is a commonly held belief that a Mann-Whitney U test is in fact a test for differences in medians. However, two groups could have the same median and yet have a significant Mann-Whitney U test. Consider the following data for two groups, each with 100 observations. Group 1: 98 (0), 1, 2; Group 2: 51 (0), 1, 48 (2). The median in both cases is 0, but from the Mann-Whitney test P<0.0001. Only if we are prepared to make the additional assumption that the difference in the two groups is simply a shift in location (that is, the distribution of the data in one group is simply shifted by a fixed amount from the other) can we say that the test is a test of the difference in medians... "

(In that example, "98 (0)" means 98 observations equal to 0, and so on.)

All this means that before using the MW test we should look at all indicators of shape and spread. These are the skewness & kurtosis of both samples, and robust tests for spread: the Levene test based on the median (adequate when the distributions to be compared are skewed) and the Levene test based on the trimmed mean (adequate when the distributions have high kurtosis). If we use the MW test on distributions with different shapes/spread, we should take into account that we are not testing differences in medians only. The median test is robust to differences in shape/spread. It can be used when the MW test is not adequate (when we want to focus on median differences only).

*** Mann-Whitney's U test ***.
NPAR TESTS /M-W= trantime BY treatmnt(1 2).
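The point made in the quote, and the Mann-Whitney and median-difference figures for the transit data, can be checked with a short Python sketch (standard library only). It is an illustration under these assumptions:

- U is counted directly over all pairs, with ties counting 1/2.
- z uses the usual normal approximation with tie correction and no continuity correction.
- The point estimate is the median of all n1*n2 differences A - B, and the 95% limits are the K-th smallest and K-th largest difference.
- K=50 is the value stored for n1=15, n2=12 in the d95 matrix of the macro that follows (Altman's Table 18.5).

`mann_whitney` is a hypothetical helper name.

```python
import math
import statistics
from collections import Counter


def mann_whitney(x, y):
    """U for x (pairs with x > y, ties counting 1/2) and tie-corrected z."""
    u = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    n1, n2 = len(x), len(y)
    n = n1 + n2
    tie_term = sum(t ** 3 - t for t in Counter(x + y).values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u - n1 * n2 / 2) / math.sqrt(var_u)
    return u, z


# Swinscow & Campbell's example: equal medians, yet a "significant" U test
g1 = [0] * 98 + [1, 2]
g2 = [0] * 51 + [1] + [2] * 48
u_ex, z_ex = mann_whitney(g1, g2)
print(statistics.median(g1), statistics.median(g2), round(z_ex, 2))

# Transit times (Table 7.1)
group_a = [44, 51, 52, 55, 60, 62, 66, 68, 69, 71, 71, 76, 82, 91, 108]
group_b = [52, 64, 68, 74, 79, 83, 84, 88, 95, 97, 101, 116]
u_ab, z_ab = mann_whitney(group_a, group_b)

# Median of all A - B differences; K-th smallest/largest as 95% limits
diffs = sorted(a - b for a in group_a for b in group_b)
K = 50  # from the macro's d95 matrix for n1=15, n2=12
estimate = statistics.median(diffs)
ci95 = (diffs[K - 1], diffs[len(diffs) - K])
print(u_ab, round(z_ab, 2), estimate, ci95)
```

Both medians in the book's example are 0, yet |z| is far beyond 3.9 (two-sided p<0.0001). For the transit data, the estimate and limits should reproduce the macro and WinPepi figures.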
When sample sizes are small (smaller than 25? I'm not really sure of this figure), SPSS gives both the exact and the asymptotic (corrected for ties) p-values. The general rule is that, if the exact p-value is given, it will be more accurate than the asymptotic one. The exception is heavily tied data: look at the data before the analysis, or rank them manually using "RANK VAR=trantime(A)." and run FREQUENCIES on the ranked variable.

Confidence interval for the difference between medians: this time RATIO can't help us... We will use a method, described in detail in Altman's book "Statistics with Confidence", that is based on the MW test and assumes distributions similar in shape/spread:

* MACRO definition *.
DEFINE MedianCIUnpaired (!POS=!TOKENS(2)/k=!DEFAULT(1000) !TOKENS(1)).
PRESERVE.
SET MXLOOPS=!k. /* If n1*n2>1000, change MXLOOPS accordingly *.
MATRIX.
PRINT /TITLE='CONFIDENCE INTERVAL FOR DIFFERENCES BETWEEN TWO MEDIANS (UNPAIRED)'.
PRINT /TITLE='Underlying assumption: the distributions are similar in shape'.
* Get unsorted data; replace variable names by your own (grouping variable goes first) *.
GET nsdata /VAR = !1 /NAMES = vname /MISSING = OMIT.
* First of all, sort them by grouping variable (algorithm by R Ristow & J Peck) *.
COMPUTE data = nsdata.
COMPUTE data(GRADE(nsdata(:,1)),:) = nsdata.
RELEASE nsdata.
* Get group values, count sample sizes & split data in two groups *.
COMPUTE totaln=NROW(data).
COMPUTE ngrp1=data(1,1).
COMPUTE ngrp2=data(totaln,1).
COMPUTE groupvar=vname(1).
PRINT {ngrp1;ngrp2}
 /FORMAT='F8.0'
 /RLABEL='Group A=' 'Group B='
 /CNAME=groupvar
 /TITLE='Group values'.
COMPUTE n=CSUM(DESIGN(data(:,1))).
PRINT {T(n)}
 /FORMAT='F8.0'
 /RLABEL=' N(a)=' ' N(b)='
 /CNAME=groupvar
 /TITLE='Sample sizes'.
DO IF RMIN(n) GT 1. /* Don't compute CI if any n(i)=1 *.
COMPUTE group1=data(1:n(1),2).
COMPUTE group2=data((n(1)+1):totaln,2).
* Compute vector of all differences *.
COMPUTE n1n2=n(1)&*n(2).
COMPUTE diff=MAKE(1,n1n2,0).
COMPUTE counter=1.
LOOP i=1 TO n(1).
- LOOP j=1 TO n(2).
- COMPUTE diff(counter)=group1(i)-group2(j).
- COMPUTE counter=counter+1.
- END LOOP.
END LOOP.
* Sort differences (R Ristow & J Peck) *.
COMPUTE sdiff = diff.
COMPUTE sdiff(grade(diff)) = diff.
RELEASE data,diff,group1,group2. /* Get rid of now useless data *.
* Compute median of differences *.
COMPUTE pair=(TRUNC(n1n2/2) EQ n1n2/2). /* Check if n1n2 is odd (0) or even (1) *.
DO IF pair EQ 0. /* Median formula for odd samples *.
- COMPUTE median=sdiff((n1n2+1)/2).
ELSE. /* Median formula for even samples *.
- COMPUTE median=(sdiff(n1n2/2)+sdiff(1+(n1n2/2)))/2.
END IF.
RELEASE pair.
COMPUTE depvarn=vname(2).
PRINT median
 /TITLE='Difference between population medians (A - B) '
 /CNAME=depvarn
 /RLABEL=' Point'
 /FORMAT='F8.1'.
* Exact or asymptotic 95% & 99% CI *.
DO IF ((n(1) LE 25) AND (n(2) LE 25)). /* Exact critical values (Table 18.5 of Altman's book) *.
- COMPUTE d95={0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0;
 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0;
 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0;
 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0;
 0,0,0,0,3,4,6,7,8,9,10,12,13,14,15,16,18,19,20,21,23,24,25,26,28;
 0,0,0,0,4,6,7,9,11,12,14,15,17,18,20,22,23,25,26,28,30,31,33,34,36;
 0,0,0,0,6,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39,41,43,45;
 0,0,0,0,7,9,11,14,16,18,20,23,25,27,30,32,35,37,39,42,44,46,49,51,54;
 0,0,0,0,8,11,13,16,18,21,24,27,29,32,35,38,40,43,46,49,51,54,57,60,63;
 0,0,0,0,9,12,15,18,21,24,27,30,34,37,40,43,46,49,53,56,59,62,65,68,72;
 0,0,0,0,10,14,17,20,24,27,31,34,38,41,45,48,52,56,59,63,66,70,74,77,81;
 0,0,0,0,12,15,19,23,27,30,34,38,42,46,50,54,58,62,66,70,74,78,82,86,90;
 0,0,0,0,13,17,21,25,29,34,38,42,46,51,55,60,64,68,73,77,81,86,90,95,99;
 0,0,0,0,14,18,23,27,32,37,41,46,51,56,60,65,70,75,79,84,89,94,99,103,108;
 0,0,0,0,15,20,25,30,35,40,45,50,55,60,65,71,76,81,86,91,97,102,107,112,118;
 0,0,0,0,16,22,27,32,38,43,48,54,60,65,71,76,82,87,93,99,104,110,116,121,127;
 0,0,0,0,18,23,29,35,40,46,52,58,64,70,76,82,88,98,100,106,112,118,124,130,136;
 0,0,0,0,19,25,31,37,43,49,56,62,68,75,81,87,98,100,107,113,120,126,133,139,146;
 0,0,0,0,20,26,33,39,46,53,59,66,73,79,86,93,100,107,114,120,127,134,141,148,155;
 0,0,0,0,21,28,35,42,49,56,63,70,77,84,91,99,106,113,120,128,135,142,150,157,164;
 0,0,0,0,23,30,37,44,51,59,66,74,81,89,97,104,112,120,127,135,143,151,158,166,174;
 0,0,0,0,24,31,39,46,54,62,70,78,86,94,102,110,118,126,134,142,151,159,167,175,183;
 0,0,0,0,25,33,41,49,57,65,74,82,90,99,107,116,124,133,141,150,158,167,176,184,193;
 0,0,0,0,26,34,43,51,60,68,77,86,95,103,112,121,130,139,148,157,166,175,184,193,202;
 0,0,0,0,28,36,45,54,63,72,81,90,99,108,118,127,136,146,155,164,174,183,193,202,212}.
- COMPUTE d99={0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0;
 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0;
 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0;
 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0;
 0,0,0,0,1,2,2,3,4,5,6,7,8,8,9,10,11,12,13,14,15,15,16,17,18;
 0,0,0,0,2,3,4,5,6,7,8,10,11,12,13,14,16,17,18,19,20,22,23,24,25;
 0,0,0,0,2,4,5,7,8,10,11,13,14,16,17,19,20,22,23,25,26,28,30,31,33;
 0,0,0,0,3,5,7,8,10,12,14,16,18,19,21,23,25,27,29,31,33,35,36,38,40;
 0,0,0,0,4,6,8,10,12,14,17,19,21,23,25,28,30,32,34,37,39,41,44,46,48;
 0,0,0,0,5,7,10,12,14,17,19,22,25,27,30,32,35,38,40,43,45,48,51,53,56;
 0,0,0,0,6,8,11,14,17,19,22,25,28,31,34,37,40,43,46,49,52,55,58,61,64;
 0,0,0,0,7,10,13,16,19,22,25,28,32,35,38,42,45,48,52,55,59,62,65,69,72;
 0,0,0,0,8,11,14,18,21,25,28,32,35,39,43,46,50,54,58,61,65,69,73,76,80;
 0,0,0,0,8,12,16,19,23,27,31,35,39,43,47,51,55,59,64,68,72,76,80,84,88;
 0,0,0,0,9,13,17,21,25,30,34,38,43,47,52,56,61,65,70,74,79,83,88,92,97;
 0,0,0,0,10,14,19,23,28,32,37,42,46,51,56,61,66,71,75,80,85,90,95,100,105;
 0,0,0,0,11,16,20,25,30,35,40,45,50,55,61,66,71,76,82,87,92,97,103,108,113;
 0,0,0,0,12,17,22,27,32,38,43,48,54,59,65,71,76,82,88,93,99,105,110,116,122;
 0,0,0,0,13,18,23,29,34,40,46,52,58,64,70,75,82,88,94,100,106,112,118,124,130;
 0,0,0,0,14,19,25,31,37,43,49,55,61,68,74,80,87,93,100,106,113,119,126,132,139;
 0,0,0,0,15,20,26,33,39,45,52,59,65,72,79,85,92,99,106,113,119,126,133,140,147;
 0,0,0,0,15,22,28,35,41,48,55,62,69,76,83,90,97,105,112,119,126,134,141,148,156;
 0,0,0,0,16,23,30,36,44,51,58,65,73,80,88,95,103,110,118,126,133,141,149,156,164;
 0,0,0,0,17,24,31,38,46,53,61,69,76,84,92,100,108,116,124,132,140,148,156,165,173;
 0,0,0,0,18,25,33,40,48,56,64,72,80,88,97,105,113,122,130,139,147,156,164,173,181}.
- COMPUTE u95=d95(n(1),n(2)).
- COMPUTE u99=d99(n(1),n(2)).
- PRINT /TITLE='Exact confidence intervals calculated (N(a) and N(b) <= 25)'.
- DO IF u95 EQ 0.
- PRINT /TITLE='Warning: sample sizes are too low for accurate 95%CI. Discard it.'.
- COMPUTE u95=1. /* Replace 0 by 1 to avoid a computing error *.
- END IF.
- DO IF u99 EQ 0.
- PRINT /TITLE='Warning: sample sizes are too low for accurate 99%CI. Discard it.'.
- COMPUTE u99=1. /* Replace 0 by 1 to avoid a computing error *.
- END IF.
- RELEASE d95,d99.
ELSE. /* Asymptotic critical values *.
- COMPUTE u95=1+TRUNC(n1n2/2-1.959964*sqrt(n1n2*(n(1)+n(2)+1)/12)).
- COMPUTE u99=1+TRUNC(n1n2/2-2.575829*sqrt(n1n2*(n(1)+n(2)+1)/12)).
- PRINT /TITLE='Asymptotic confidence intervals calculated (N(a) and/or N(b) > 25)'.
END IF.
COMPUTE low95=sdiff(u95).
COMPUTE high95=sdiff(n1n2+1-u95).
COMPUTE low99=sdiff(u99).
COMPUTE high99=sdiff(n1n2+1-u99).
PRINT {low95,high95;low99,high99}
 /FORMAT='F8.1'
 /TITLE='CI for difference between medians'
 /RLABEL='95%Level' '99%Level'
 /CLABEL='Lower' 'Upper'.
ELSE.
- PRINT /TITLE='At least one sample size = 1. No CI can be calculated.'.
END IF.
END MATRIX.
RESTORE.
!ENDDEFINE.

* MACRO call (grouping variable goes first) *.
MedianCIUnpaired treatmnt trantime.

WinPepi output for the same data (the same result as my MACRO, of course: I checked the accuracy of the MACRO before posting it):

Difference between population medians* (A - B) = -16.0
Approx. 95% C.I. = -29.0 to -2.0
*Assuming the distributions are similar in shape.

*** Median test ***.
NPAR TESTS /MEDIAN=trantime BY treatmnt(1 2).

No 95% CI for the difference between medians can be computed when the distributions are different (at least, none that I have found). WinPepi doesn't include one either.

Regards,
Marta García-Granero