Graduate course on SPSS - Lesson 2


Graduate course on SPSS - Lesson 2

Marta García-Granero
Later than I expected, but here it comes at last:

Background of the data we are going to analyze (entirely invented by
me): some diseases (like Wilson's disease) are associated with high
urinary copper levels. Suppose we are interested in testing whether a
group of children (all from the same region) shows abnormal copper
levels. We are told that normal levels are below 0.6 µmol/24 h.

* Sample dataset (Exercise 1.1 from "Statistics at Square One"
  Swinscow&Campbell: http://www.bmj.com/collections/statsbk/) *.

DATA LIST FREE/copper(F8.2).
BEGIN DATA
0.70 0.45 0.72 0.30 1.16 0.69 0.83 0.74 1.24 0.77
0.65 0.76 0.42 0.94 0.36 0.98 0.64 0.90 0.63 0.55
0.78 0.10 0.52 0.42 0.58 0.62 1.12 0.86 0.74 1.04
0.65 0.66 0.81 0.48 0.85 0.75 0.73 0.50 0.34 0.88
END DATA.
VAR LABEL copper 'Urinary copper levels (µmol/24 h)'.

First, we have to check whether the variable is normally distributed
(to choose between a parametric and a non-parametric method). We'll
run a full Exploratory Data Analysis (EDA) on the variable:
descriptives (mean, SD, skewness, kurtosis...), percentiles, graphs
(histogram - useful if the sample size is not too small -, stem & leaf
and box-plot - both good for checking for outliers) and normality
tests (Kolmogorov-Smirnov with Lilliefors correction and
Shapiro-Wilk); both are given when NPPLOT is added to the /PLOT
subcommand. VERY IMPORTANT: don't use the one-sample
Kolmogorov-Smirnov test(*), since it has very low power to detect
non-normality.
(*) NPAR TESTS /K-S(NORMAL)= copper.

*** EDA ***.
EXAMINE
 VARIABLES=copper
 /PLOT BOXPLOT STEMLEAF HISTOGRAM NPPLOT
 /COMPARE GROUP
 /PERCENTILES(5,10,25,50,75,90,95)
 /STATISTICS DESCRIPTIVES
 /CINTERVAL 95.

Discussion of the results: the variable has no outliers (see the
stem & leaf and box-plot graphs) and is fairly symmetric (the skewness
is not important: its absolute value is below 1 and less than twice
its standard error - a simple Z test). We can therefore use a
parametric method: the one-sample t-test (to compare the sample mean
with the expected - population - mean of 0.6). Data can then be
summarized (for descriptive purposes) using the mean and the standard
deviation (SD). Don't use the standard error of the mean (SEM) as a
descriptive spread measure (it isn't one). The 95% confidence interval
(95%CI) for the mean is also sometimes reported as complementary
information.
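The "less than twice its standard error" rule for skewness is easy to verify outside SPSS. A minimal Python sketch (using the G1 sample-skewness and standard-error formulas that SPSS reports; the data are the 40 copper values above):

```python
import math

copper = [0.70, 0.45, 0.72, 0.30, 1.16, 0.69, 0.83, 0.74, 1.24, 0.77,
          0.65, 0.76, 0.42, 0.94, 0.36, 0.98, 0.64, 0.90, 0.63, 0.55,
          0.78, 0.10, 0.52, 0.42, 0.58, 0.62, 1.12, 0.86, 0.74, 1.04,
          0.65, 0.66, 0.81, 0.48, 0.85, 0.75, 0.73, 0.50, 0.34, 0.88]

n = len(copper)
mean = sum(copper) / n
m2 = sum((x - mean) ** 2 for x in copper) / n   # 2nd central moment
m3 = sum((x - mean) ** 3 for x in copper) / n   # 3rd central moment
g1 = m3 / m2 ** 1.5                             # moment skewness
G1 = g1 * math.sqrt(n * (n - 1)) / (n - 2)      # sample skewness (what SPSS prints)
se_skew = math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))

print(f"skewness = {G1:.3f}, SE = {se_skew:.3f}, z = {G1 / se_skew:.2f}")
# |z| below 2 supports treating the distribution as roughly symmetric
```

For n = 40 the standard error of skewness works out to about 0.374, so any |G1| below about 0.75 passes the simple Z test mentioned above.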

*** PARAMETRIC TEST ***.
T-TEST
 /TESTVAL = 0.6
 /VARIABLES = copper
 /CRITERIA = CI(.95) .
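For readers who want to check the arithmetic behind this command, the one-sample t statistic can be reproduced with a short Python sketch (the 2.023 critical value for df = 39 is taken from a t table; everything else is computed from the data above):

```python
import math
from statistics import mean, stdev

copper = [0.70, 0.45, 0.72, 0.30, 1.16, 0.69, 0.83, 0.74, 1.24, 0.77,
          0.65, 0.76, 0.42, 0.94, 0.36, 0.98, 0.64, 0.90, 0.63, 0.55,
          0.78, 0.10, 0.52, 0.42, 0.58, 0.62, 1.12, 0.86, 0.74, 1.04,
          0.65, 0.66, 0.81, 0.48, 0.85, 0.75, 0.73, 0.50, 0.34, 0.88]

n = len(copper)
testval = 0.6                       # hypothesized population mean (TESTVAL)
xbar = mean(copper)
sem = stdev(copper) / math.sqrt(n)  # standard error of the mean
t = (xbar - testval) / sem          # one-sample t, df = n - 1

# 95% CI for the mean; 2.023 is the two-tailed 5% critical t for df = 39
t_crit = 2.023
print(f"mean = {xbar:.4f}, t = {t:.2f}, df = {n - 1}")
print(f"95% CI: ({xbar - t_crit * sem:.3f}, {xbar + t_crit * sem:.3f})")
```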

The result is significant (two-tailed p-value = 0.015). If we want a
one-tailed p-value (because we have solid a priori evidence that the
difference can go in only one direction), we can halve the two-tailed
p-value given by SPSS (one-tailed = 0.01548/2 = 0.0077 --> 0.008).
This can't be done lightly (it is sometimes considered a form of
"cheating" used to render significant results that only showed a
tendency towards significance); see these references for a good
discussion of the topic:

- http://www.bmj.com/cgi/content/full/309/6949/248
- http://circ.ahajournals.org/cgi/content/full/105/25/3062
See also the chapter "One-tail vs. two-tail p-values" in this book:
- http://www.graphpad.com/manuals/Prism4/StatisticsGuide.pdf

*************************************

Had normality failed, we should have used a non-parametric test
instead of the t-test: the Wilcoxon or sign tests. The choice between
them depends on symmetry: the Wilcoxon test, although non-parametric,
needs symmetry (it can give falsely significant results on highly
asymmetric variables). Mean and SD shouldn't be used to summarize
non-normal data; the median and IQR (interquartile range, the interval
formed by percentiles 25 & 75) should be used instead. See Lang & Secic,
"How to Report Statistics in Medicine" (ACP series), for more details.

Although SPSS apparently doesn't offer these tests for one sample,
they can easily be obtained by "tricking" the program a bit: we only
need to compute an extra variable holding the population parameter
(the median, in this case).

COMPUTE PopMedian=0.6.

* Wilcoxon test *.
NPAR TEST
 /WILCOXON=copper WITH PopMedian
 /STATISTICS QUARTILES.

SPSS always gives the asymptotic p-value for this test, even when the
sample size is small (which makes it unreliable). If the Exact Tests
module is installed and the sample size - excluding ties - is below 25
(not the case here, since our sample size is 40), you can request the
exact p-value by adding "/METHOD=EXACT TIMER(1)" to the above syntax.
If you don't have that module, you have to check significance against
a Wilcoxon table, like this one:

Critical values of the Wilcoxon matched-pairs signed-rank test. For
any N (number of subjects minus ties), the observed value is
significant at a given level if it is equal to or less than the
critical value shown in the table below.

   1-tailed test 2-tailed test
 N p<0.05 p<0.01 p<0.05 p<0.01
-------------------------------
 5    1      -      -      -
 6    2      -      1      -
 7    4      0      2      -
 8    6      2      4      0
 9    8      3      6      2
10   11      5      8      3
11   14      7     11      5
12   17     10     14      7
13   21     13     17     10
14   26     16     21     13
15   30     20     25     16
16   36     24     30     19
17   41     28     36     23
18   47     33     40     28
19   54     38     46     32
20   60     43     52     37
21   68     49     59     43
22   75     56     66     49
23   83     62     73     55
24   92     69     81     61
25  101     77     90     68
-------------------------------
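As a cross-check on the table (or on SPSS's asymptotic output), the Wilcoxon rank sum and its normal-approximation z can be computed directly. A Python sketch, assuming the usual average-rank treatment of tied absolute differences and no tie correction in the variance (so the z may differ slightly from SPSS's):

```python
import math

copper = [0.70, 0.45, 0.72, 0.30, 1.16, 0.69, 0.83, 0.74, 1.24, 0.77,
          0.65, 0.76, 0.42, 0.94, 0.36, 0.98, 0.64, 0.90, 0.63, 0.55,
          0.78, 0.10, 0.52, 0.42, 0.58, 0.62, 1.12, 0.86, 0.74, 1.04,
          0.65, 0.66, 0.81, 0.48, 0.85, 0.75, 0.73, 0.50, 0.34, 0.88]
median0 = 0.6

# signed differences from the hypothesized median; zeros (ties with the
# median) are dropped; rounding kills floating-point noise so equal
# differences are recognized as tied
diffs = [round(x - median0, 10) for x in copper if round(x - median0, 10) != 0]
n = len(diffs)

# rank the absolute differences, averaging ranks within tied groups
ordered = sorted(abs(d) for d in diffs)
rank_of = {}
i = 0
while i < len(ordered):
    j = i
    while j < len(ordered) and ordered[j] == ordered[i]:
        j += 1
    rank_of[ordered[i]] = (i + 1 + j) / 2  # average of ranks i+1 .. j
    i = j

w_pos = sum(rank_of[abs(d)] for d in diffs if d > 0)  # positive-rank sum
mu = n * (n + 1) / 4
sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
z = (w_pos - mu) / sigma  # asymptotic z (no tie correction in sigma)
print(f"n = {n}, W+ = {w_pos}, z = {z:.2f}")
```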

* Sign test *.
NPAR TEST
 /SIGN= copper WITH PopMedian
 /STATISTICS QUARTILES.

Here SPSS gives either the exact or the asymptotic p-value, depending
on the sample size.
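The exact sign-test p-value is just a binomial tail probability, so it is easy to reproduce by hand. A Python sketch on the same data (values equal to the hypothesized median would be dropped; there happen to be none here):

```python
import math

copper = [0.70, 0.45, 0.72, 0.30, 1.16, 0.69, 0.83, 0.74, 1.24, 0.77,
          0.65, 0.76, 0.42, 0.94, 0.36, 0.98, 0.64, 0.90, 0.63, 0.55,
          0.78, 0.10, 0.52, 0.42, 0.58, 0.62, 1.12, 0.86, 0.74, 1.04,
          0.65, 0.66, 0.81, 0.48, 0.85, 0.75, 0.73, 0.50, 0.34, 0.88]
median0 = 0.6

above = sum(1 for x in copper if x > median0)
below = sum(1 for x in copper if x < median0)
n = above + below              # values equal to the median are dropped
k = min(above, below)

# exact two-tailed p: twice the smaller tail of Binomial(n, 1/2)
tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
p_two = min(1.0, 2 * tail)
print(f"above = {above}, below = {below}, exact two-tailed p = {p_two:.4f}")
```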

To get a 95%CI for the median, again SPSS has to be "tricked" into it
(using RATIO).

COMPUTE One=1.

RATIO STATISTICS copper WITH One
 /PRINT = CIN(95) MEDIAN .

Of course, all these tricks can be wrapped in MACROS (to create the
PopMedian and One variables, run the tests, and then discard the
now-useless extra variables).
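As a sanity check on the RATIO trick, a distribution-free CI for the median can also be built from order statistics: a binomial argument picks the pair of order statistics closest to the median whose coverage is still at least 95%. A Python sketch of that construction:

```python
import math

copper = [0.70, 0.45, 0.72, 0.30, 1.16, 0.69, 0.83, 0.74, 1.24, 0.77,
          0.65, 0.76, 0.42, 0.94, 0.36, 0.98, 0.64, 0.90, 0.63, 0.55,
          0.78, 0.10, 0.52, 0.42, 0.58, 0.62, 1.12, 0.86, 0.74, 1.04,
          0.65, 0.66, 0.81, 0.48, 0.85, 0.75, 0.73, 0.50, 0.34, 0.88]
xs = sorted(copper)
n = len(xs)
alpha = 0.05

# largest d such that P(Binomial(n, 1/2) <= d - 1) <= alpha/2; then the
# order statistics (x_(d), x_(n+1-d)) cover the median with >= 95% confidence
d = 1
while sum(math.comb(n, i) for i in range(d)) / 2 ** n <= alpha / 2:
    d += 1
d -= 1
lo, hi = xs[d - 1], xs[n - d]
med = (xs[n // 2 - 1] + xs[n // 2]) / 2   # sample median (n is even here)
print(f"median = {med:.3f}, distribution-free 95% CI: ({lo}, {hi})")
```

The resulting interval is conservative (its coverage is at least, not exactly, 95%), which is typical of distribution-free procedures.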

Tomorrow (hopefully) we'll discuss two-sample testing of continuous
variables, paired and unpaired, by both parametric and non-parametric
methods.



Marta Garcia-Granero, PhD
Statistician

Re: Graduate course on SPSS - Lesson 2

Edward Boadi
Hi Marta,
I can't find a post on "Graduate course on SPSS - Lesson 1" in the archives.
Was there a post on "Graduate course on SPSS - Lesson 1"?

Thanks.
Edward.


-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]]On Behalf Of
Marta García-Granero
Sent: Thursday, January 25, 2007 6:24 AM
To: [hidden email]
Subject: Graduate course on SPSS - Lesson 2



Re: Graduate course on SPSS - Lesson 2

Marta García-Granero
Hi Edward

Yes, there was (January 22nd), and with exactly that name. Since I got
some replies to it, it was distributed. If you still can't find it,
tell me and I'll send it to you again privately.

EB> I cant find a post on  "Graduate course on SPSS - Lesson 1" in the
EB> archives. Was there a post on "Graduate course on SPSS - Lesson
EB> 1"?


Regards,

Marta

Re: Graduate course on SPSS - Lesson 2

Edward Boadi
In reply to this post by Marta García-Granero
Hi Marta,
Melissa has given me a copy of the said post.

Thanks.
Edward.


-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]]On Behalf Of
Marta García-Granero
Sent: Thursday, January 25, 2007 9:56 AM
To: [hidden email]
Subject: Re: Graduate course on SPSS - Lesson 2



Graduate course on SPSS - Lesson 2 (addendum)

Marta García-Granero
Hi everybody

The MACROS I mentioned (to ease the task of running the
non-parametric one-sample tests):

DEFINE OneSampleWilcoxon(!POS=!TOKENS(1)/
                         median=!TOKENS(1)/
                         EXACT=!DEFAULT(NO) !TOKENS(1)).
TEMPORARY.
COMPUTE PopMedian=!median.
!IF (!UPCASE(!EXACT) !EQ 'NO') !THEN
NPAR TEST
 /WILCOXON=!1 WITH PopMedian
 /STATISTICS QUARTILES.
!ELSE
NPAR TEST
 /WILCOXON=!1 WITH PopMedian
 /STATISTICS QUARTILES
 /METHOD=EXACT TIMER(1).
!IFEND.
!ENDDEFINE.

DEFINE OneSampleSign(!POS=!TOKENS(1)/
                     median=!TOKENS(1)).
TEMPORARY.
COMPUTE PopMedian=!median.
NPAR TEST
 /SIGN=!1 WITH PopMedian
 /STATISTICS QUARTILES.
!ENDDEFINE.

DEFINE OneMedianCI(!POS=!TOKENS(1)/
                   !POS=!DEFAULT(95) !TOKENS(1)).
TEMPORARY.
COMPUTE one=1.
RATIO STATISTICS !1 WITH one
 /PRINT = CIN(!2) MEDIAN .
!ENDDEFINE.

* MACRO calls (on the Copper dataset) *.
OneSampleWilcoxon copper median=0.6.
OneSampleWilcoxon copper median=0.6 EXACT=Yes.
OneSampleSign copper median=0.6.
OneMedianCI copper.
OneMedianCI copper 99.

Regards,
Marta

(Working on next chapter)

Re: Graduate course on SPSS - Lesson 2

Manmit Shrimali-2
In reply to this post by Marta García-Granero
Marta:

I wanted to thank you for sharing your knowledge with all. Your inputs, responses, contributions and efforts are highly appreciated.

Thanks,

Manmit

        -----Original Message-----
        From: SPSSX(r) Discussion on behalf of Marta García-Granero
        Sent: Thu 1/25/2007 4:54 PM
        To: [hidden email]
        Cc:
        Subject: Graduate course on SPSS - Lesson 2




Re: Graduate course on SPSS - Lesson 2

Roberts, Michael
Ditto that!


Mike


-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Manmit Shrimali
Sent: Thursday, January 25, 2007 10:29 AM
To: [hidden email]
Subject: Re: Graduate course on SPSS - Lesson 2


Re: Graduate course on SPSS - Lesson 2

Hector Maletta
        I hope the entire list is not now going to thank Marta one at a
time! We are all grateful, and she knows it. More generally, in the interest
of parsimony, may I suggest that messages conveying only gratitude for help
received be sent privately; they should go to the list only if they add some
value, such as a summary of the discussion or the conclusions reached.

        Hector

        -----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
Roberts, Michael
Sent: 25 January 2007 12:43
To: [hidden email]
Subject: Re: Graduate course on SPSS - Lesson 2

        Ditto that!


        Mike


        -----Original Message-----
        From: SPSSX(r) Discussion [mailto:[hidden email]] On
Behalf Of Manmit Shrimali
        Sent: Thursday, January 25, 2007 10:29 AM
        To: [hidden email]
        Subject: Re: Graduate course on SPSS - Lesson 2

        Marta:

        I wanted to thank you for sharing your knowledge with all. Your
inputs, responses, contributions and efforts are highly appreciated.

        Thanks,

        Manmit


Re: Graduate course on SPSS - Lesson 2

Cleland, Patricia (EDU)
In reply to this post by Marta García-Granero
Now that Marta has shown us how to 'trick' SPSS into calculating the 95%CI for the median, I'm wondering if there's a way to 'trick' SPSS into calculating the 95%CI for the 5th, 10th, 25th, 75th, and 90th percentiles, similar to SAS PROC UNIVARIATE?

Thanks

Pat


Re: Graduate course on SPSS - Lesson 2

Marta García-Granero
Hi Pat

CPE> Now that Marta has shown us how to 'trick' SPSS into calculating
CPE> the 95%CI for the median.  I'm wondering if there's a way to
CPE> 'trick' SPSS into calculating the 95%CI for the 5th, 10th, 25th,
CPE> 75th, and 90th percentiles, similar to SAS PROC UNIVARIATE?

Do you have access to a description of the statistical methods used by
SAS? I could try to work out some solution with SPSS.

With SPSS, another option is the bootstrap. It should be quite easy
with MATRIX (I've already done some bootstrapping with MATRIX, but not
for percentiles, only for means, multiple R-squared and Spearman's
rho).
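As an illustration of the idea (not the MATRIX code mentioned above), a percentile-method bootstrap CI fits in a few lines of pure Python; the nearest-rank quantile definition used here is an arbitrary choice:

```python
import random

# Copper data from Lesson 2; any variable would do
copper = [0.70, 0.45, 0.72, 0.30, 1.16, 0.69, 0.83, 0.74, 1.24, 0.77,
          0.65, 0.76, 0.42, 0.94, 0.36, 0.98, 0.64, 0.90, 0.63, 0.55,
          0.78, 0.10, 0.52, 0.42, 0.58, 0.62, 1.12, 0.86, 0.74, 1.04,
          0.65, 0.66, 0.81, 0.48, 0.85, 0.75, 0.73, 0.50, 0.34, 0.88]

def bootstrap_quantile_ci(x, q, n_boot=10000, conf=0.95, seed=1234):
    """Percentile-method bootstrap CI for the q-th sample quantile."""
    rng = random.Random(seed)
    n = len(x)
    idx = max(0, min(n - 1, round(q * n) - 1))    # nearest-rank index
    est = sorted(sorted(rng.choices(x, k=n))[idx] for _ in range(n_boot))
    alpha = 1 - conf
    return est[int(n_boot * alpha / 2)], est[int(n_boot * (1 - alpha / 2)) - 1]

lo, hi = bootstrap_quantile_ci(copper, 0.5)       # CI for the median
print(f"bootstrap 95% CI for the median: ({lo}, {hi})")
```

Passing q=0.05, 0.10, 0.25, 0.75 or 0.90 gives CIs for the other percentiles Pat asked about, though the extreme percentiles will be unstable with only 40 observations.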

The only reference I found after Googling (I mean, the only one I
could access from outside the University; right now I'm at home coping
with a cold) describes the use of the binomial distribution for that:
it explains in detail how to do it for the median, and says it "could
be used for other percentiles".
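That binomial approach generalises directly: the number of observations below the population q-th quantile is Binomial(n, q), so a conservative CI is a pair of order statistics. A pure-Python sketch (rank conventions vary slightly between texts; this is the symmetric, conservative version):

```python
from math import comb

# Copper data from Lesson 2
copper = [0.70, 0.45, 0.72, 0.30, 1.16, 0.69, 0.83, 0.74, 1.24, 0.77,
          0.65, 0.76, 0.42, 0.94, 0.36, 0.98, 0.64, 0.90, 0.63, 0.55,
          0.78, 0.10, 0.52, 0.42, 0.58, 0.62, 1.12, 0.86, 0.74, 1.04,
          0.65, 0.66, 0.81, 0.48, 0.85, 0.75, 0.73, 0.50, 0.34, 0.88]

def quantile_ci_ranks(n, q, conf=0.95):
    """1-based order-statistic ranks (r, s) such that
    P(X_(r) <= population q-th quantile <= X_(s)) >= conf."""
    alpha = 1 - conf
    # cdf[k] = P(B <= k) for B ~ Binomial(n, q)
    cdf, total = [], 0.0
    for k in range(n + 1):
        total += comb(n, k) * q ** k * (1 - q) ** (n - k)
        cdf.append(total)
    r = 0                              # largest r with P(B <= r-1) <= alpha/2
    while r < n and cdf[r] <= alpha / 2:
        r += 1
    m = 0                              # smallest s with P(B >= s) <= alpha/2
    while m < n and cdf[m] < 1 - alpha / 2:
        m += 1
    return max(1, r), min(n, m + 1)

xs = sorted(copper)
r, s = quantile_ci_ranks(len(xs), 0.5)        # q = 0.05, 0.25, etc. work too
print(f"95% CI for the median: ({xs[r - 1]}, {xs[s - 1]}), ranks ({r}, {s})")
```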



--
Regards,
Dr. Marta García-Granero, PhD          mailto:[hidden email]
Statistician

---
"It is unwise to use a statistical procedure whose use one does
not understand. SPSS syntax guide cannot supply this knowledge, and it
is certainly no substitute for the basic understanding of statistics
and statistical thinking that is essential for the wise choice of
methods and the correct interpretation of their results".

(Adapted from WinPepi manual - I'm sure Joe Abrahmson will not mind)