Graduate course on SPSS - Lesson 3

Marta García-Granero
Again, sample data are extracted from Swinscow & Campbell's "Statistics
at Square One".


BACKGROUND (from the book): The addition of bran to the diet  has been
reported to benefit patients with diverticulosis. Several different
bran preparations are available, and a clinician wants to test the
efficacy of two of them on patients, since favourable claims have been
made for each. Among the consequences of administering bran that
requires testing is the transit time through the alimentary canal. A
random sample of patients with disease of comparable severity and aged
20-44 is chosen and the two treatments administered on two successive
occasions, the order of the treatments also being determined from the
table of random numbers.

* Sample dataset (Table 7.2 Transit times: paired comparison) *.
DATA LIST LIST/time_a time_b (2 F8.0).
BEGIN DATA
63  55
54  62
79 108
68  77
87  83
84  78
92  79
57  94
66  69
53  66
76  72
63  77
END DATA.
VARIABLE LABELS time_a 'Transit times with A (h)'/
                time_b 'Transit times with B (h)'.

These are clearly paired data. Quoting from Swinscow & Campbell's book:

"Why should I use a paired test if my data are paired? What happens if
I don't?:

Pairing provides information about an experiment, and the more
information that can be provided in the analysis the more sensitive
the test. One of the major sources of variability is between subjects
variability. By repeating measures within subjects, each subject acts
as its own control, and the between subjects variability is removed.
In general this means that if there is a true difference between the
pairs the paired test is more likely to pick it up: it is more
powerful. When the pairs are generated by matching the matching
criteria may not be important. In this case, the paired and unpaired
tests could give similar results."

Statistical analyses:

Again, we have to check for normality, but only the differences need
to be studied (normality of time_a and time_b is not a condition). We
need to compute them manually and run a full EDA on them (with a
sample this small we will not ask for a histogram). Warning: such a
small sample also makes normality testing problematic, because
normality tests lack power; we should look at the skewness coefficient
and at outliers as well. With fewer than 10 cases, all normality tests
should be discarded. The situation is even worse with K-S (Lilliefors)
(see "Preliminary testing for normality: some statistical aspects of a
common concept" by V. Schoder, A. Himmelmann and K. P. Wilhelm,
published in Clinical and Experimental Dermatology in 2006). I don't
have a reference at hand, but I recall that the Shapiro-Wilk test
performed a bit better than K-S or other normality tests.

COMPUTE DiffTimes=time_b - time_a.

*** EDA ***.
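
The EDA itself can be requested with EXAMINE. What follows is only a
minimal sketch, assuming DiffTimes has just been computed as above; the
choice of plots is my assumption (box-plot, stem&leaf, and the normal
plots that bring the SW and KS(Lilliefors) tests with them):

```spss
* Exploratory analysis of the paired differences (sketch) *.
EXAMINE VARIABLES=DiffTimes
  /PLOT BOXPLOT STEMLEAF NPPLOT
  /STATISTICS DESCRIPTIVES.
```

The Descriptives table also gives the median, the IQR and the skewness
with its standard error.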

The variable (differences) doesn't have outliers (see the stem&leaf and
box-plot graphs) and it is fairly symmetric (skewness is not
important: its absolute value is below 1 and less than twice its
standard error - a simple Z test). The SW and KS tests are clearly
non-significant (but of little use given the sample size). We can use a
parametric method: the t-test for paired data (Paired-Samples T Test in
SPSS). The data (differences) can then be summarized (for descriptive
purposes) using the mean and the standard deviation (SD) of the
differences. It is a common mistake to present the mean&SD of the
time_a and time_b variables instead of summarizing the differences.
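
If only the summary figures are wanted, something along these lines
should do. It is a sketch, assuming DiffTimes still holds the raw
differences (time_b - time_a); the skewness is requested too, because
DESCRIPTIVES prints it together with its standard error, the two
ingredients of the simple Z test mentioned above:

```spss
* Mean, SD and skewness (with its SE) of the paired differences *.
DESCRIPTIVES VARIABLES=DiffTimes
  /STATISTICS=MEAN STDDEV SKEWNESS.
```

For the 12 pairs listed above, the mean difference (B - A) works out
at 6.5 h.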

T-TEST
  PAIRS = time_a WITH time_b
  /CRITERIA = CI(.95).

The output includes a 95%CI for the mean difference. Some
statisticians consider that p-values should be abandoned and only
CIs should be presented, since they carry the same information as a
p-value about whether time_a and time_b differ significantly (a CI
that does not include 0 is significant, whereas a CI that includes 0
is not). See "Confidence Intervals as an Alternative to Significance
Testing" by Eduard Brandstätter (Methods of Psychological Research
Online 1999, Vol.4, No.2) for a full discussion.


Again, if the differences were not normally distributed, non-parametric
tests should be used. Symmetry plays the same role as in one-sample
testing for choosing between the Wilcoxon and Sign tests. The correct
summary is the median&IQR of the differences (we got them with the
EDA, above).

*** Wilcoxon test ***.
NPAR TESTS
 /WILCOXON=time_a WITH time_b.

SPSS will always give the asymptotic p-value for this test, even if
the sample size is small (when it becomes unreliable). If the Exact
Tests module is installed and the sample size - excluding ties - is
below 25 (as in this case, n=12), you can ask for the exact p-value by
adding "/METHOD=EXACT TIMER(1)" to the above syntax. If you don't have
that module, then you have to check the significance with a Wilcoxon
table, like this one:

Critical values of the Wilcoxon Matched Pairs Signed Rank Test. For any
N (number of subjects minus ties) the observed value is significant at
a given level of significance if it is equal to or less than the
critical value shown in the table below.

   1-tailed test 2-tailed test
 N p<0.05 p<0.01 p<0.05 p<0.01
 5    1      -      -      -
 6    2      -      1      -
 7    4      0      2      -
 8    6      2      4      0
 9    8      3      6      2
10   11      5      8      3
11   14      7     11      5
12   17     10     14      7
13   21     13     17     10
14   26     16     21     13
15   30     20     25     16
16   36     24     30     19
17   41     28     36     23
18   47     33     40     28
19   54     38     46     32
20   60     43     52     37
21   68     49     59     43
22   75     56     66     49
23   83     62     73     55
24   92     69     81     61
25  101     77     90     68

How is this table used? First, from the SPSS output (Ranks table), we
locate two pieces of information:

- Are there tied data (pairs with a zero difference)? If the answer is
  yes, then compute N=total-ties. In this case, there are no tied data,
  and N=12.
- The smaller of the two sums of ranks. We will call this statistic
  T. In this case T=23.

Now we go to the Wilcoxon table and locate the row where N=12, advance
to the "2-tailed test p<0.05" column and read the value: Tcrit=14. The
result of the test will be significant if T (the statistic we got from
the Ranks table) is lower than or equal to Tcrit (the value we read
from the Wilcoxon table). In this case, T=23 > Tcrit=14. The result is
clearly non-significant.
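
If you want to see where T comes from, the two sums of ranks can be
rebuilt by hand. This is only an illustrative sketch: the helper
variables (d, absd, rabsd, rneg, rpos) are names I made up, and it
assumes there are no zero differences (they would have to be dropped
before ranking):

```spss
* Signed ranks by hand: rank the absolute differences, then split by sign *.
COMPUTE d = time_b - time_a.
COMPUTE absd = ABS(d).
RANK VARIABLES=absd (A)
  /RANK INTO rabsd
  /TIES=MEAN
  /PRINT=NO.
* A logical expression is 1 when true and 0 when false *.
COMPUTE rneg = (d LT 0) * rabsd.
COMPUTE rpos = (d GT 0) * rabsd.
DESCRIPTIVES VARIABLES=rneg rpos
  /STATISTICS=SUM.
```

With the data above the two sums should be 23 (negative ranks) and 55
(positive ranks); the smaller one is T.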

*** Sign test ***.
NPAR TESTS
 /SIGN= time_a WITH time_b.

Here SPSS will give either the exact or the asymptotic p-value,
depending on the sample size (exact p-value in this case, because n=12).

For the 95%CI for the median, we could use the same trick I explained
in my previous tutorial, but...

- We must use the differences, and since some of them can be null or
  even negative, RATIO will discard those data and give a false CI,
  based only on positive differences.
- This can be avoided by adding a constant, like 100, to all data
  before running RATIO to render them all positive (and subtracting
  100 afterwards from the CI limits we get).

* We could make this task automatic with a MACRO that used OMS to
  extract the CI limits, subtract 100 and then print the results (up
  to you...) *.

COMPUTE DiffTimes= 100 + DiffTimes.
* Constant denominator for RATIO (the name unit is assumed here) *.
COMPUTE unit=1.
RATIO STATISTICS DiffTimes WITH unit
  /PRINT =  CIN(95) MEDIAN .

The 95%CI goes from 94 to 114; subtracting 100: (-6; 14), median diff = 5.5.

This method, based on RATIO procedure, doesn't need symmetry of the
differences (it is adequate if we are complementing sign test).

If we can assume symmetry, then we can use another method, based on
Wilcoxon test, that is described in detail in Douglas Altman's book
"Statistics with Confidence":

* Method based on Wilcoxon test (with a MACRO) *.

DEFINE MedianCIPaired (!POS=!TOKENS(2)/k=!DEFAULT(1000) !TOKENS(1)).
SET MXLOOPS=!k.  /* If [n(n+1)/2]>1000, change MXLOOPS accordingly *.
MATRIX.
PRINT /TITLE='Underlying assumption: distribution of differences is symmetrical'.
* Get data: replace variable names by your own (order will affect the sign) *.
GET data /VAR = !1
         /NAMES = vname
         /MISSING = OMIT.
COMPUTE d=data(:,1)-data(:,2).
COMPUTE n=NROW(d).                        /* Number of pairs *.
* Compute all averages *.
COMPUTE nt=n*(n+1)/2.
COMPUTE dmean=MAKE(1,nt,0).
COMPUTE counter=1.
LOOP i=1 TO n.
- LOOP j=i TO n.
-  COMPUTE dmean(counter)=(d(i)+d(j))/2.
-  COMPUTE counter=counter+1.
- END LOOP.
END LOOP.
* Sorting algorithm (R Ristow & J Peck) *.
COMPUTE sdmean=dmean.
COMPUTE sdmean(GRADE(dmean)) = dmean.
RELEASE counter,dmean.
* Compute median of all averages *.
COMPUTE pair=(TRUNC(nt/2) EQ nt/2).       /* Check if nt is odd (0) or even (1) *.
DO IF pair EQ 0.                          /* Median formula for odd samples *.
- COMPUTE median=sdmean((nt+1)/2).
ELSE.                                     /* Median formula for even samples *.
- COMPUTE median=(sdmean(nt/2)+sdmean(1+nt/2))/2.
END IF.
PRINT median
 /TITLE='Difference between population medians (A - B) '.
* Exact or asymptotic 95% & 99% CI *.
DO IF n LE 50.       /* Exact Wilcoxon's critical values (Table 18.6 of Altman's book) *.
- COMPUTE w95={0,0,0,0,0,1,3,4,6,9,11,14,18,22,26,30,35,41,47,53,59,66,74,82,90,
- COMPUTE w99={0,0,0,0,0,0,0,0,1,2,4,6,8,10,13,16,20,24,28,33,38,43,49,55,62,69,
- COMPUTE u95=w95(n).
- COMPUTE u99=w99(n).
- RELEASE w95,w99.
- PRINT /TITLE='Exact confidence intervals calculated (N=<50)'.
ELSE.                /* Asymptotic Wilcoxon's critical values *.
- COMPUTE u95=1+TRUNC(nt/2-1.959964*sqrt(n*(n+1)*(2*n+1)/24)).
- COMPUTE u99=1+TRUNC(nt/2-2.575829*sqrt(n*(n+1)*(2*n+1)/24)).
- PRINT /TITLE='Asymptotic confidence intervals calculated (N>50)'.
END IF.
DO IF u95 EQ 0.
- PRINT /TITLE='Warning: sample size is too low (<6) for accurate 95%CI. Discard it.'.
- COMPUTE u95=1. /* Replace 0 by 1 to avoid a computing error *.
END IF.
DO IF u99 EQ 0.
- PRINT /TITLE='Warning: sample size is too low (<9) for accurate 99%CI. Discard it.'.
- COMPUTE u99=1. /* Replace 0 by 1 to avoid a computing error *.
END IF.
COMPUTE low95=sdmean(u95).
COMPUTE high95=sdmean(nt+1-u95).
COMPUTE low99=sdmean(u99).
COMPUTE high99=sdmean(nt+1-u99).
PRINT {low95,high95;low99,high99}
 /TITLE='CI for difference between medians'
 /RLABEL='95%Level' '99%Level'
 /CLABEL='Lower' 'Upper'.
END MATRIX.
!ENDDEFINE.

* MACRO call *.

MedianCIPaired time_b time_a.


Background: The addition of bran to the diet has been reported to
benefit patients with diverticulosis. Several different bran
preparations are available, and a clinician wants to test the efficacy
of two of them on patients, since favourable claims have been made for
each. Among the consequences of administering bran that requires
testing is the transit time through the alimentary canal. Does it
differ in the two groups of patients taking these two preparations?

The assumptions are:

   1. that the data are plausibly Normal
   2. that the two samples come from distributions  that may differ in
   their mean value, but not in the standard deviation (Homogeneity
   of Variances - HOV - condition)

The second condition must be tested with the Levene test (although other
methods exist: Bartlett, Cochran, Hartley's test...). If sample sizes
are small, we'll face the same problem we had when testing for
normality: lack of power. One rule of thumb, independent of sample
size, is to suspect lack of HOV if the ratio of the biggest/smallest
SD is over 2.

If HOV fails, then the t-test must be modified (it is called the Welch
test): both the degrees of freedom and the standard error of the
difference are corrected before computing the t statistic and its
p-value.
* Table 7.1 Transit times: unpaired comparison *.
DATA LIST FREE/treatmnt trantime (2 F8.0).
BEGIN DATA
1 44 1 51 1 52 1 55 1  60 1 62 1 66 1 68 1 69 1 71
1 71 1 76 1 82 1 91 1 108 2 52 2 64 2 68 2 74 2 79
2 83 2 84 2 88 2 95 2  97 2 101 2 116
END DATA.
VARIABLE LABELS treatmnt 'Treatments'/trantime 'Transit times (h)'.
VALUE LABELS treatmnt 1 'Treatment A' 2 'Treatment B'.

Statistical analyses:

Again, we have to check for normality, but inside each treatment
group. Since sample sizes are a bit low we will not pay too much
attention to the SW or KS(Lilliefors) p-values, but will focus on
outliers and skewness. We will also ask for the Levene test ("hidden"
inside the /PLOT subcommand - SPREAD -).

EXAMINE VARIABLES=trantime BY treatmnt
  /PLOT BOXPLOT STEMLEAF NPPLOT SPREADLEVEL(1).

The box-plot shows an outlier in sample A. Since it is not an extreme
outlier (it is not more than 3 IQR from the 3rd quartile), its impact
on overall normality is low (skewness is not very big), but we could
do both tests (parametric & non-parametric) and compare them, just in
case. We don't suspect lack of HOV: the ratio of biggest/smallest SD is
17.6/16.5, clearly below 2, and the Levene test, based on means, is
clearly non-significant. We'll discuss the "other" Levenes later (when
we talk about non-parametric testing).
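
The two SDs behind that rule of thumb can be read off the EDA output,
or requested directly. A minimal sketch (MEANS is just one of several
procedures that would give the same figures):

```spss
* N, mean and SD of transit time in each treatment group *.
MEANS TABLES=trantime BY treatmnt
  /CELLS=COUNT MEAN STDDEV.
```

Divide the bigger SD by the smaller one and compare the ratio with 2.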

T-TEST
  GROUPS = treatmnt(1 2)
  /VARIABLES = trantime
  /CRITERIA = CI(.95) .

Results: the Levene test (based on the mean, the same we got with EDA)
helps us to choose between the first row ("Equal variances assumed")
and the second row ("Equal variances not assumed"). It says nothing
about the means: sometimes I've seen people conclude that the means
were not different just because the HOV test was non-significant. The
test for the equality of means is shown to the right (2-tailed
p=0.031). The 95%CI (last columns to the right) goes from -28.6 to
-1.5, also indicating a significant result (the limits don't include
0).

If we had suspected lack of HOV, Welch test would have been used (2nd
row), with the following results: p=0.033; Mean Diff.=-15.0,
95%CI:(-28.7 to -1.3).


If we suspected that the variable is not normally distributed (in at
least one of the groups), since our sample sizes are too low to rely
on the robustness of the t-test, we could use a non-parametric test:
Mann-Whitney's U test or the median test. The Mann-Whitney test is also
called the Wilcoxon-Mann-Whitney test, since there is a test, called
the Wilcoxon Rank Sum test (not to be mistaken for the one for paired
data, called the Wilcoxon Signed Ranks test), that is mathematically
equivalent to the Mann-Whitney test. To be able to use Mann-Whitney's U
test, we have to check first that both sample distributions are
similar in shape (see SPSS help) and spread.

Quoting again from Campbell&Swinscow's book (chapter 10):

"Do non-parametric tests compare medians?
It is a commonly held belief that a Mann-Whitney U test is in fact a
test for differences in medians. However, two groups could have the
same median and yet have a significant Mann-Whitney U test. Consider
the following data for two groups, each with 100 observations. Group
1: 98 (0), 1, 2; Group 2: 51 (0), 1, 48 (2). The median in both cases
is 0, but from the Mann-Whitney test P<0.0001. Only if we are prepared
to make the additional assumption that the difference in the two
groups is simply a shift in location (that is, the distribution of the
data in one group is simply shifted by a fixed amount from the other)
can we say that the test is a test of the difference in medians... ".

All this means that before using the MW test, we should take a look at
all indicators of shape and spread: skewness & kurtosis of both samples
and robust tests for spread (Levene based on the median - adequate
when the distributions to be compared are skewed -, and Levene based
on the trimmed mean - adequate when the distributions have high
kurtosis). If we use the MW test on distributions with different
shapes/spread, then we should take into account that we are not
testing differences in medians only.
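
All those indicators come out of a single EXAMINE run. A sketch,
assuming the same variables as above: the Descriptives table gives
skewness and kurtosis for each group, and the homogeneity of variance
table that comes with SPREADLEVEL lists Levene based on the mean, on
the median and on the trimmed mean.

```spss
* Shape and spread indicators per treatment group *.
EXAMINE VARIABLES=trantime BY treatmnt
  /PLOT SPREADLEVEL(1)
  /STATISTICS DESCRIPTIVES
  /NOTOTAL.
```

SPREADLEVEL(1) means a power of 1, that is, untransformed data.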

The median test is robust to differences in shape/spread, and can be
used when the MW test is not adequate (when we want to focus on median
differences only).

*** Mann-Whitney's U test ***.

NPAR TESTS
  /M-W= trantime BY treatmnt(1 2).

When sample sizes are small (smaller than 25? I'm not really sure of
this figure), SPSS gives both the exact and the asymptotic - corrected
for ties - p-values. Unless your data are heavily tied (take a look at
them before the analysis, or rank them manually using "RANK
VAR=trantime(A)." and run FREQUENCIES on the ranked variable), the
general rule is that if the exact p-value is given, it will be more
accurate than the asymptotic one.
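
Spelled out, the check for ties could look like this. It is a sketch;
rtime is a hypothetical name for the ranked variable (without /RANK
INTO, SPSS picks a name of its own):

```spss
* Rank the pooled transit times; tied values share their mean rank *.
RANK VARIABLES=trantime (A)
  /RANK INTO rtime
  /TIES=MEAN
  /PRINT=NO.
* Any rank with a frequency above 1 marks a group of tied values *.
FREQUENCIES VARIABLES=rtime.
```

Ranks that appear more than once in the frequency table (they will
often be fractional, like 2.5) show where the ties are.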

Confidence interval for the median differences: this time RATIO can't
help us...

We will use a method (based on MW test - it assumes distributions
similar in shape/spread) described in detail in Altman's book
"Statistics with Confidence":

* MACRO definition *.

DEFINE MedianCIUnpaired (!POS=!TOKENS(2)/k=!DEFAULT(1000) !TOKENS(1)).
SET MXLOOPS=!k.  /* If n1*n2>1000, change MXLOOPS accordingly *.
MATRIX.
PRINT /TITLE='Underlying assumption: the distributions are similar in shape'.
* Get unsorted data; replace variable names by your own (grouping variable goes first) *.
GET nsdata /VAR = !1
           /NAMES = vname /MISSING = OMIT.
* First of all, sort them by grouping variable (algorithm by R Ristow & J Peck) *.
COMPUTE data = nsdata.
COMPUTE data(GRADE(nsdata(:,1)),:) = nsdata.
RELEASE nsdata.
* Get group values, count sample sizes & split data in two groups *.
COMPUTE totaln=NROW(data).
COMPUTE ngrp1=data(1,1).
COMPUTE ngrp2=data(totaln,1).
COMPUTE groupvar=vname(1).
PRINT {ngrp1;ngrp2}
 /RLABEL='Group A=' 'Group B='
 /TITLE='Group values'.
COMPUTE n=CSUM(DESIGN(data(:,1))).
PRINT {T(n)}
 /RLABEL='   N(a)=' '   N(b)='
 /TITLE='Sample sizes'.
DO IF RMIN(n) GT 1. /* Don't compute CI if any n(i)=1 *.
COMPUTE group1=data(1:n(1),2).
COMPUTE group2=data((n(1)+1):totaln,2).
* Compute vector of all differences *.
COMPUTE n1n2=n(1)&*n(2).
COMPUTE diff=MAKE(1,n1n2,0).
COMPUTE counter=1.
LOOP i=1 TO n(1).
- LOOP j=1 TO n(2).
-  COMPUTE diff(counter)=group1(i)-group2(j).
-  COMPUTE counter=counter+1.
- END LOOP.
END LOOP.
* Sort differences (R Ristow & J Peck) *.
COMPUTE sdiff = diff.
COMPUTE sdiff(GRADE(diff)) = diff.
RELEASE data,diff,group1,group2. /* Get rid of now useless data *.
* Compute median of differences *.
COMPUTE pair=(TRUNC(n1n2/2) EQ n1n2/2). /* Check if n1n2 is odd (0) or even (1) *.
DO IF pair EQ 0.                        /* Median formula for odd samples *.
- COMPUTE median=sdiff((n1n2+1)/2).
ELSE.                                   /* Median formula for even samples *.
- COMPUTE median=(sdiff(n1n2/2)+sdiff(1+(n1n2/2)))/2.
END IF.
COMPUTE depvarn=vname(2).
PRINT median
 /TITLE='Difference between population medians (A - B) '
 /RLABEL='   Point'.
* Exact or asymptotic 95% & 99% CI *.
DO IF ((n(1) LE 25) AND (n(2) LE 25)). /* Exact critical values (Table 18.5 of Altman's book)*.
- COMPUTE d95={0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0;
- COMPUTE d99={0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0;
- COMPUTE u95=d95(n(1),n(2)).
- COMPUTE u99=d99(n(1),n(2)).
- PRINT /TITLE='Exact confidence intervals calculated (N(a) and N(b) =< 25)'.
- DO IF u95 EQ 0.
-  PRINT /TITLE='Warning: sample sizes are too low for accurate 95%CI. Discard it.'.
-  COMPUTE u95=1. /* Replace 0 by 1 to avoid a computing error *.
- END IF.
- DO IF u99 EQ 0.
-  PRINT /TITLE='Warning: sample sizes are too low for accurate 99%CI. Discard it.'.
-  COMPUTE u99=1. /* Replace 0 by 1 to avoid a computing error *.
- END IF.
- RELEASE d95,d99.
ELSE. /* Asymptotic critical values *.
- COMPUTE u95=1+TRUNC(n1n2/2-1.959964*sqrt(n1n2*(n(1)+n(2)+1)/12)).
- COMPUTE u99=1+TRUNC(n1n2/2-2.575829*sqrt(n1n2*(n(1)+n(2)+1)/12)).
- PRINT /TITLE='Asymptotic confidence intervals calculated (N(a) and/or N(b) > 25)'.
END IF.
COMPUTE low95=sdiff(u95).
COMPUTE high95=sdiff(n1n2+1-u95).
COMPUTE low99=sdiff(u99).
COMPUTE high99=sdiff(n1n2+1-u99).
PRINT {low95,high95;low99,high99}
 /TITLE='CI for difference between medians'
 /RLABEL='95%Level' '99%Level'
 /CLABEL='Lower' 'Upper'.
ELSE.
- PRINT /TITLE='At least one sample size = 1. No CI can be calculated.'.
END IF.
END MATRIX.
!ENDDEFINE.

* MACRO call (grouping variable goes first) *.

MedianCIUnpaired treatmnt trantime.

WINPEPI output for the same data (same result as my MACRO, of course -
I checked the accuracy of MACRO before posting it):

   Difference between population medians* (A - B) = -16.0
     Approx. 95% C.I. = -29.0 to -2.0
     *Assuming the distributions are similar in shape.

*** Median test ***.
NPAR TESTS
  /MEDIAN=trantime BY treatmnt(1 2).

No 95%CI for median differences can be computed when distributions are
different (at least, none that I have found). WinPepi doesn't include
one either.

Marta García-Granero