Graduate course on SPSS - Lesson 2


Graduate course on SPSS - Lesson 2

Marta García-Granero
Later than I expected, but here it comes at last:

Background of the data we are going to analyze (entirely invented by
me): some diseases (like Wilson's disease) are associated with high
urinary copper levels. Suppose we are interested in testing whether a
group of children (all from the same region) shows abnormal copper
levels. We are told that normal levels are below 0.6 µmol/24 h.

* Sample dataset (Exercise 1.1 from "Statistics at Square One"
  Swinscow&Campbell: http://www.bmj.com/collections/statsbk/) *.

DATA LIST FREE/copper(F8.2).
BEGIN DATA
0.70 0.45 0.72 0.30 1.16 0.69 0.83 0.74 1.24 0.77
0.65 0.76 0.42 0.94 0.36 0.98 0.64 0.90 0.63 0.55
0.78 0.10 0.52 0.42 0.58 0.62 1.12 0.86 0.74 1.04
0.65 0.66 0.81 0.48 0.85 0.75 0.73 0.50 0.34 0.88
END DATA.
VAR LABEL copper 'Urinary copper levels (µmol/24 h)'.

First, we have to check whether the variable is normally distributed
(to choose between a parametric and a non-parametric method). We'll
run a full Exploratory Data Analysis (EDA) on the variable:
descriptives (mean, SD, skewness, kurtosis...), percentiles, graphs
(histogram - useful if the sample size is not too small -, stem & leaf
and box-plot - both good for checking for outliers) and normality
tests (Kolmogorov-Smirnov with Lilliefors correction and
Shapiro-Wilk); both are given when NPPLOT is added to the /PLOT
subcommand. VERY IMPORTANT: don't use the one-sample
Kolmogorov-Smirnov test(*), since it has very low power to detect
non-normality.
(*) NPAR TESTS /K-S(NORMAL)= copper.

*** EDA ***.
EXAMINE
 VARIABLES=copper
 /PLOT BOXPLOT STEMLEAF HISTOGRAM NPPLOT
 /COMPARE GROUP
 /PERCENTILES(5,10,25,50,75,90,95)
 /STATISTICS DESCRIPTIVES
 /CINTERVAL 95.

Discussion of the results: the variable has no outliers (see the
stem & leaf and box-plot graphs) and is fairly symmetric (the skewness
is not important: its absolute value is below 1 and less than twice
its standard error - a simple Z test). We can therefore use a
parametric method: the one-sample t-test (to compare the sample mean
with the expected - population - mean of 0.6). Data can then be
summarized (for descriptive purposes) using the mean and the standard
deviation (SD). Don't use the standard error of the mean (SEM) as a
descriptive spread measure (it isn't one). The 95% confidence interval
(95%CI) for the mean is also sometimes reported as complementary
information.
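The "less than twice its standard error" rule for skewness is easy to verify outside SPSS. A minimal Python sketch (using the G1 sample-skewness and standard-error formulas that SPSS reports; the data are the 40 copper values above):

```python
import math

copper = [0.70, 0.45, 0.72, 0.30, 1.16, 0.69, 0.83, 0.74, 1.24, 0.77,
          0.65, 0.76, 0.42, 0.94, 0.36, 0.98, 0.64, 0.90, 0.63, 0.55,
          0.78, 0.10, 0.52, 0.42, 0.58, 0.62, 1.12, 0.86, 0.74, 1.04,
          0.65, 0.66, 0.81, 0.48, 0.85, 0.75, 0.73, 0.50, 0.34, 0.88]

n = len(copper)
mean = sum(copper) / n
m2 = sum((x - mean) ** 2 for x in copper) / n   # 2nd central moment
m3 = sum((x - mean) ** 3 for x in copper) / n   # 3rd central moment
g1 = m3 / m2 ** 1.5                             # moment skewness
G1 = g1 * math.sqrt(n * (n - 1)) / (n - 2)      # sample skewness (what SPSS prints)
se_skew = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))

print(f"skewness = {G1:.3f}, SE = {se_skew:.3f}, z = {G1 / se_skew:.2f}")
# |z| below 2 supports treating the distribution as roughly symmetric
```

For n = 40 the standard error of skewness works out to about 0.374, so any |G1| below about 0.75 passes the simple Z test mentioned above.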

*** PARAMETRIC TEST ***.
T-TEST
 /TESTVAL = 0.6
 /VARIABLES = copper
 /CRITERIA = CI(.95) .
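For readers who want to check the arithmetic behind this command, the one-sample t statistic can be reproduced with a short Python sketch (the 2.023 critical value for df = 39 is taken from a t table; everything else is computed from the data above):

```python
import math
from statistics import mean, stdev

copper = [0.70, 0.45, 0.72, 0.30, 1.16, 0.69, 0.83, 0.74, 1.24, 0.77,
          0.65, 0.76, 0.42, 0.94, 0.36, 0.98, 0.64, 0.90, 0.63, 0.55,
          0.78, 0.10, 0.52, 0.42, 0.58, 0.62, 1.12, 0.86, 0.74, 1.04,
          0.65, 0.66, 0.81, 0.48, 0.85, 0.75, 0.73, 0.50, 0.34, 0.88]

n = len(copper)
testval = 0.6                       # hypothesized population mean (TESTVAL)
xbar = mean(copper)
sem = stdev(copper) / math.sqrt(n)  # standard error of the mean
t = (xbar - testval) / sem          # one-sample t, df = n - 1

# 95% CI for the mean; 2.023 is the two-tailed 5% critical t for df = 39
t_crit = 2.023
print(f"mean = {xbar:.4f}, t = {t:.2f}, df = {n - 1}")
print(f"95% CI: ({xbar - t_crit * sem:.3f}, {xbar + t_crit * sem:.3f})")
```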

The result is significant (two-tailed p-value = 0.015). If we want a
one-tailed p-value (because we have solid a priori evidence that the
difference can go in only one direction), we can halve the two-tailed
p-value given by SPSS (one-tailed = 0.01548/2 = 0.0077 --> 0.008).
This can't be done lightly (it is sometimes considered a form of
"cheating" used to render significant results that only showed a
tendency towards significance); see these references for a good
discussion of the topic:

- http://www.bmj.com/cgi/content/full/309/6949/248
- http://circ.ahajournals.org/cgi/content/full/105/25/3062
See also the chapter "One-tail vs. two-tail p-values" in this book:
- http://www.graphpad.com/manuals/Prism4/StatisticsGuide.pdf

*************************************

Had normality failed, we should have used a non-parametric test
instead of the t-test: the Wilcoxon or sign tests. The choice between
them depends on symmetry: the Wilcoxon test, although non-parametric,
needs symmetry (it can give falsely significant results on highly
asymmetric variables). Mean and SD shouldn't be used to summarize
non-normal data; the median and IQR (interquartile range, the interval
formed by percentiles 25 & 75) should be used instead. See Lang & Secic,
"How to Report Statistics in Medicine" (ACP series), for more details.

Although SPSS apparently doesn't offer these tests for one sample,
they can easily be obtained by "tricking" the program a bit: we only
need to compute an extra variable holding the population parameter
(the median, in this case).

COMPUTE PopMedian=0.6.

* Wilcoxon test *.
NPAR TEST
 /WILCOXON=copper WITH PopMedian
 /STATISTICS QUARTILES.

SPSS always gives the asymptotic p-value for this test, even when the
sample size is small (which makes it unreliable). If the Exact Tests
module is installed and the sample size - excluding ties - is below 25
(not the case here, since our sample size is 40), you can request the
exact p-value by adding "/METHOD=EXACT TIMER(1)" to the above syntax.
If you don't have that module, you have to check significance against
a Wilcoxon table, like this one:

Critical values of the Wilcoxon matched-pairs signed-rank test. For
any N (number of subjects minus ties), the observed value is
significant at a given level if it is equal to or less than the
critical value shown in the table below.

   1-tailed test 2-tailed test
 N p<0.05 p<0.01 p<0.05 p<0.01
-------------------------------
 5    1      -      -      -
 6    2      -      1      -
 7    4      0      2      -
 8    6      2      4      0
 9    8      3      6      2
10   11      5      8      3
11   14      7     11      5
12   17     10     14      7
13   21     13     17     10
14   26     16     21     13
15   30     20     25     16
16   36     24     30     19
17   41     28     36     23
18   47     33     40     28
19   54     38     46     32
20   60     43     52     37
21   68     49     59     43
22   75     56     66     49
23   83     62     73     55
24   92     69     81     61
25  101     77     90     68
-------------------------------
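As a cross-check on the table (or on SPSS's asymptotic output), the Wilcoxon rank sum and its normal-approximation z can be computed directly. A Python sketch, assuming the usual average-rank treatment of tied absolute differences and no tie correction in the variance (so the z may differ slightly from SPSS's):

```python
import math

copper = [0.70, 0.45, 0.72, 0.30, 1.16, 0.69, 0.83, 0.74, 1.24, 0.77,
          0.65, 0.76, 0.42, 0.94, 0.36, 0.98, 0.64, 0.90, 0.63, 0.55,
          0.78, 0.10, 0.52, 0.42, 0.58, 0.62, 1.12, 0.86, 0.74, 1.04,
          0.65, 0.66, 0.81, 0.48, 0.85, 0.75, 0.73, 0.50, 0.34, 0.88]
median0 = 0.6

# signed differences from the hypothesized median; zeros (ties with the
# median) are dropped; rounding kills floating-point noise so equal
# differences are recognized as tied
diffs = [round(x - median0, 10) for x in copper if round(x - median0, 10) != 0]
n = len(diffs)

# rank the absolute differences, averaging ranks within tied groups
ordered = sorted(abs(d) for d in diffs)
rank_of = {}
i = 0
while i < len(ordered):
    j = i
    while j < len(ordered) and ordered[j] == ordered[i]:
        j += 1
    rank_of[ordered[i]] = (i + 1 + j) / 2  # average of ranks i+1 .. j
    i = j

w_pos = sum(rank_of[abs(d)] for d in diffs if d > 0)  # positive-rank sum
mu = n * (n + 1) / 4
sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
z = (w_pos - mu) / sigma  # asymptotic z (no tie correction in sigma)
print(f"n = {n}, W+ = {w_pos}, z = {z:.2f}")
```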

* Sign test *.
NPAR TEST
 /SIGN= copper WITH PopMedian
 /STATISTICS QUARTILES.

Here SPSS gives either the exact or the asymptotic p-value, depending
on the sample size.
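The exact sign-test p-value is just a binomial tail probability, so it is easy to reproduce by hand. A Python sketch on the same data (values equal to the hypothesized median would be dropped; there happen to be none here):

```python
import math

copper = [0.70, 0.45, 0.72, 0.30, 1.16, 0.69, 0.83, 0.74, 1.24, 0.77,
          0.65, 0.76, 0.42, 0.94, 0.36, 0.98, 0.64, 0.90, 0.63, 0.55,
          0.78, 0.10, 0.52, 0.42, 0.58, 0.62, 1.12, 0.86, 0.74, 1.04,
          0.65, 0.66, 0.81, 0.48, 0.85, 0.75, 0.73, 0.50, 0.34, 0.88]
median0 = 0.6

above = sum(1 for x in copper if x > median0)
below = sum(1 for x in copper if x < median0)
n = above + below              # values equal to the median are dropped
k = min(above, below)

# exact two-tailed p: twice the smaller tail of Binomial(n, 1/2)
tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
p_two = min(1.0, 2 * tail)
print(f"above = {above}, below = {below}, exact two-tailed p = {p_two:.4f}")
```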

To get a 95%CI for the median, again SPSS has to be "tricked" into it
(using RATIO).

COMPUTE One=1.

RATIO STATISTICS copper WITH One
 /PRINT = CIN(95) MEDIAN .

Of course, all these tricks can be wrapped in MACROS (to create the
PopMedian and One variables, run the tests, and then discard the
now-useless extra variables).
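As a sanity check on the RATIO trick, a distribution-free CI for the median can also be built from order statistics: a binomial argument picks the pair of order statistics closest to the median whose coverage is still at least 95%. A Python sketch of that construction:

```python
import math

copper = [0.70, 0.45, 0.72, 0.30, 1.16, 0.69, 0.83, 0.74, 1.24, 0.77,
          0.65, 0.76, 0.42, 0.94, 0.36, 0.98, 0.64, 0.90, 0.63, 0.55,
          0.78, 0.10, 0.52, 0.42, 0.58, 0.62, 1.12, 0.86, 0.74, 1.04,
          0.65, 0.66, 0.81, 0.48, 0.85, 0.75, 0.73, 0.50, 0.34, 0.88]
xs = sorted(copper)
n = len(xs)
alpha = 0.05

# largest d such that P(Binomial(n, 1/2) <= d - 1) <= alpha/2; then the
# order statistics (x_(d), x_(n+1-d)) cover the median with >= 95% confidence
d = 1
while sum(math.comb(n, i) for i in range(d)) / 2 ** n <= alpha / 2:
    d += 1
d -= 1
lo, hi = xs[d - 1], xs[n - d]
med = (xs[n // 2 - 1] + xs[n // 2]) / 2   # sample median (n is even here)
print(f"median = {med:.3f}, distribution-free 95% CI: ({lo}, {hi})")
```

The resulting interval is conservative (its coverage is at least, not exactly, 95%), which is typical of distribution-free procedures.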

Tomorrow (hopefully) we'll discuss two-sample testing of continuous
variables, paired and unpaired, by both parametric and non-parametric
methods.



Marta Garcia-Granero, PhD
Statistician

Re: Graduate course on SPSS - Lesson 2

Edward Boadi
Hi Marta,
I can't find a post on "Graduate course on SPSS - Lesson 1" in the archives.
Was there a post on "Graduate course on SPSS - Lesson 1"?

Thanks.
Edward.


-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]]On Behalf Of
Marta García-Granero
Sent: Thursday, January 25, 2007 6:24 AM
To: [hidden email]
Subject: Graduate course on SPSS - Lesson 2



Re: Graduate course on SPSS - Lesson 2

Marta García-Granero
Hi Edward

Yes, there was (January 22nd), and with exactly that name. Since I got
some replies to it, it was distributed. If you still can't find it,
tell me and I'll send it to you again privately.

EB> I cant find a post on  "Graduate course on SPSS - Lesson 1" in the
EB> archives. Was there a post on "Graduate course on SPSS - Lesson
EB> 1"?


Regards,

Marta

Re: Graduate course on SPSS - Lesson 2

Edward Boadi
In reply to this post by Marta García-Granero
Hi Marta,
Melissa has given me a copy of the said post.

Thanks.
Edward.


-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]]On Behalf Of
Marta García-Granero
Sent: Thursday, January 25, 2007 9:56 AM
To: [hidden email]
Subject: Re: Graduate course on SPSS - Lesson 2



Graduate course on SPSS - Lesson 2 (addendum)

Marta García-Granero
Hi everybody

The MACROS I mentioned (to ease the task of running the
non-parametric one-sample tests):

DEFINE OneSampleWilcoxon(!POS=!TOKENS(1)/
                         median=!TOKENS(1)/
                         EXACT=!DEFAULT(NO) !TOKENS(1)).
TEMPORARY.
COMPUTE PopMedian=!median.
!IF (!UPCASE(!EXACT) !EQ 'NO') !THEN
NPAR TEST
 /WILCOXON=!1 WITH PopMedian
 /STATISTICS QUARTILES.
!ELSE
NPAR TEST
 /WILCOXON=!1 WITH PopMedian
 /STATISTICS QUARTILES
 /METHOD=EXACT TIMER(1).
!IFEND.
!ENDDEFINE.

DEFINE OneSampleSign(!POS=!TOKENS(1)/
                     median=!TOKENS(1)).
TEMPORARY.
COMPUTE PopMedian=!median.
NPAR TEST
 /SIGN=!1 WITH PopMedian
 /STATISTICS QUARTILES.
!ENDDEFINE.

DEFINE OneMedianCI(!POS=!TOKENS(1)/
                   !POS=!DEFAULT(95) !TOKENS(1)).
TEMPORARY.
COMPUTE one=1.
RATIO STATISTICS !1 WITH one
 /PRINT = CIN(!2) MEDIAN .
!ENDDEFINE.

* MACRO calls (on the Copper dataset) *.
OneSampleWilcoxon copper median=0.6.
OneSampleWilcoxon copper median=0.6 EXACT=Yes.
OneSampleSign copper median=0.6.
OneMedianCI copper.
OneMedianCI copper 99.

Regards,
Marta

(Working on next chapter)

Re: Graduate course on SPSS - Lesson 2

Manmit Shrimali-2
In reply to this post by Marta García-Granero
Marta:

I wanted to thank you for sharing your knowledge with all. Your inputs, responses, contributions and efforts are highly appreciated.

Thanks,

Manmit

        -----Original Message-----
        From: SPSSX(r) Discussion on behalf of Marta García-Granero
        Sent: Thu 1/25/2007 4:54 PM
        To: [hidden email]
        Cc:
        Subject: Graduate course on SPSS - Lesson 2




Re: Graduate course on SPSS - Lesson 2

Roberts, Michael
Ditto that!


Mike


-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Manmit Shrimali
Sent: Thursday, January 25, 2007 10:29 AM
To: [hidden email]
Subject: Re: Graduate course on SPSS - Lesson 2


Re: Graduate course on SPSS - Lesson 2

Hector Maletta
        I hope the entire list is not now going to thank Marta one at a
time! We are all grateful, and she knows it. More generally, in the interest
of parsimony, may I suggest that messages conveying only gratitude for help
received be sent privately; they should go to the list only if they add some
value, such as a summary of the discussion or the conclusions reached.

        Hector

        -----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
Roberts, Michael
Sent: 25 January 2007 12:43
To: [hidden email]
Subject: Re: Graduate course on SPSS - Lesson 2

        Ditto that!


        Mike


        -----Original Message-----
        From: SPSSX(r) Discussion [mailto:[hidden email]] On
Behalf Of Manmit Shrimali
        Sent: Thursday, January 25, 2007 10:29 AM
        To: [hidden email]
        Subject: Re: Graduate course on SPSS - Lesson 2

        Marta:

        I wanted to thank you for sharing your knowledge with all. Your
inputs, responses, contributions and efforts are highly appreciated.

        Thanks,

        Manmit


Re: Graduate course on SPSS - Lesson 2

Cleland, Patricia (EDU)
In reply to this post by Marta García-Granero
Now that Marta has shown us how to 'trick' SPSS into calculating the 95%CI for the median, I'm wondering if there's a way to 'trick' SPSS into calculating the 95%CI for the 5th, 10th, 25th, 75th, and 90th percentiles, similar to SAS PROC UNIVARIATE?

Thanks

Pat


Re: Graduate course on SPSS - Lesson 2

Marta García-Granero
Hi Pat

CPE> Now that Marta has shown us how to 'trick' SPSS into calculating
CPE> the 95%CI for the median.  I'm wondering if there's a way to
CPE> 'trick' SPSS into calculating the 95%CI for the 5th, 10th, 25th,
CPE> 75th, and 90th percentiles, similar to SAS PROC UNIVARIATE?

Do you have access to a description of the statistical methods used by
SAS? I could try to work out some solution with SPSS.

With SPSS, another option is the bootstrap. It should be quite easy
with MATRIX (I've already done some bootstrapping with MATRIX, but not
for percentiles, only for means, multiple R-squared and Spearman's
rho).
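As an illustration of the idea (not the MATRIX code mentioned above), a percentile-method bootstrap CI fits in a few lines of pure Python; the nearest-rank quantile definition used here is an arbitrary choice:

```python
import random

# Copper data from Lesson 2; any variable would do
copper = [0.70, 0.45, 0.72, 0.30, 1.16, 0.69, 0.83, 0.74, 1.24, 0.77,
          0.65, 0.76, 0.42, 0.94, 0.36, 0.98, 0.64, 0.90, 0.63, 0.55,
          0.78, 0.10, 0.52, 0.42, 0.58, 0.62, 1.12, 0.86, 0.74, 1.04,
          0.65, 0.66, 0.81, 0.48, 0.85, 0.75, 0.73, 0.50, 0.34, 0.88]

def bootstrap_quantile_ci(x, q, n_boot=10000, conf=0.95, seed=1234):
    """Percentile-method bootstrap CI for the q-th sample quantile."""
    rng = random.Random(seed)
    n = len(x)
    idx = max(0, min(n - 1, round(q * n) - 1))    # nearest-rank index
    est = sorted(sorted(rng.choices(x, k=n))[idx] for _ in range(n_boot))
    alpha = 1 - conf
    return est[int(n_boot * alpha / 2)], est[int(n_boot * (1 - alpha / 2)) - 1]

lo, hi = bootstrap_quantile_ci(copper, 0.5)       # CI for the median
print(f"bootstrap 95% CI for the median: ({lo}, {hi})")
```

Passing q=0.05, 0.10, 0.25, 0.75 or 0.90 gives CIs for the other percentiles Pat asked about, though the extreme percentiles will be unstable with only 40 observations.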

The only reference I found after Googling (I mean, the only one I
could access from outside the University; right now I'm at home coping
with a cold) describes the use of the binomial distribution for that:
it explains in detail how to do it for the median, and says it "could
be used for other percentiles".
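That binomial approach generalises directly: the number of observations below the population q-th quantile is Binomial(n, q), so a conservative CI is a pair of order statistics. A pure-Python sketch (rank conventions vary slightly between texts; this is the symmetric, conservative version):

```python
from math import comb

# Copper data from Lesson 2
copper = [0.70, 0.45, 0.72, 0.30, 1.16, 0.69, 0.83, 0.74, 1.24, 0.77,
          0.65, 0.76, 0.42, 0.94, 0.36, 0.98, 0.64, 0.90, 0.63, 0.55,
          0.78, 0.10, 0.52, 0.42, 0.58, 0.62, 1.12, 0.86, 0.74, 1.04,
          0.65, 0.66, 0.81, 0.48, 0.85, 0.75, 0.73, 0.50, 0.34, 0.88]

def quantile_ci_ranks(n, q, conf=0.95):
    """1-based order-statistic ranks (r, s) such that
    P(X_(r) <= population q-th quantile <= X_(s)) >= conf."""
    alpha = 1 - conf
    # cdf[k] = P(B <= k) for B ~ Binomial(n, q)
    cdf, total = [], 0.0
    for k in range(n + 1):
        total += comb(n, k) * q ** k * (1 - q) ** (n - k)
        cdf.append(total)
    r = 0                              # largest r with P(B <= r-1) <= alpha/2
    while r < n and cdf[r] <= alpha / 2:
        r += 1
    m = 0                              # smallest s with P(B >= s) <= alpha/2
    while m < n and cdf[m] < 1 - alpha / 2:
        m += 1
    return max(1, r), min(n, m + 1)

xs = sorted(copper)
r, s = quantile_ci_ranks(len(xs), 0.5)        # q = 0.05, 0.25, etc. work too
print(f"95% CI for the median: ({xs[r - 1]}, {xs[s - 1]}), ranks ({r}, {s})")
```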



--
Regards,
Dr. Marta García-Granero, PhD          mailto:[hidden email]
Statistician

---
"It is unwise to use a statistical procedure whose use one does
not understand. SPSS syntax guide cannot supply this knowledge, and it
is certainly no substitute for the basic understanding of statistics
and statistical thinking that is essential for the wise choice of
methods and the correct interpretation of their results".

(Adapted from WinPepi manual - I'm sure Joe Abrahmson will not mind)