Later than I expected, but here it comes at last:
Background of the data we are going to analyze (invented by me): some diseases (like Wilson's disease) are associated with high urinary copper levels. Suppose we are interested in testing whether a group of children (all from the same region) shows abnormal copper levels. We are told that normal levels are <0.6 µmol/24 h.

* Sample dataset (Exercise 1.1 from "Statistics at Square One", Swinscow & Campbell: http://www.bmj.com/collections/statsbk/) *.
DATA LIST FREE/copper(F8.2).
BEGIN DATA
0.70 0.45 0.72 0.30 1.16 0.69 0.83 0.74 1.24 0.77
0.65 0.76 0.42 0.94 0.36 0.98 0.64 0.90 0.63 0.55
0.78 0.10 0.52 0.42 0.58 0.62 1.12 0.86 0.74 1.04
0.65 0.66 0.81 0.48 0.85 0.75 0.73 0.50 0.34 0.88
END DATA.
VAR LABEL copper 'Urinary copper levels (µmol/24 h)'.

First, we have to check whether the variable is normally distributed (to choose between a parametric and a non-parametric method). We'll run a full Exploratory Data Analysis (EDA) on the variable: descriptives (mean, SD, skewness, kurtosis...), percentiles, graphics (histogram - useful if the sample size is not too small -, stem&leaf and box-plot - both good for checking for outliers) and normality tests (Kolmogorov-Smirnov with Lilliefors correction and Shapiro-Wilk); both tests are given when NPPLOT is added to the /PLOT subcommand.

VERY IMPORTANT: don't use the one-sample Kolmogorov-Smirnov test(*), since it has very low power to detect non-normality.

(*) NPAR TESTS /K-S(NORMAL)= copper.

*** EDA ***.
EXAMINE VARIABLES=copper
 /PLOT BOXPLOT STEMLEAF HISTOGRAM NPPLOT
 /COMPARE GROUP
 /PERCENTILES(5,10,25,50,75,90,95)
 /STATISTICS DESCRIPTIVES
 /CINTERVAL 95.

Discussion of the results: the variable doesn't have outliers (see the stem&leaf and box-plot graphs) and is fairly symmetric (skewness is not important: its absolute value is below 1 and less than twice its standard error - a simple Z test). We can use a parametric method: the one-sample t-test (to compare the sample mean with the expected - population - mean of 0.6).
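For readers who want to cross-check the EDA conclusions outside SPSS, here is a minimal sketch in Python (my own addition, not part of the original lesson; it assumes scipy is installed and reuses the dataset above):

```python
# Normality check on the copper data, mirroring the EDA step above:
# Shapiro-Wilk test plus the simple skewness Z test (|skew| < 2*SE).
import math
from scipy import stats

copper = [0.70, 0.45, 0.72, 0.30, 1.16, 0.69, 0.83, 0.74, 1.24, 0.77,
          0.65, 0.76, 0.42, 0.94, 0.36, 0.98, 0.64, 0.90, 0.63, 0.55,
          0.78, 0.10, 0.52, 0.42, 0.58, 0.62, 1.12, 0.86, 0.74, 1.04,
          0.65, 0.66, 0.81, 0.48, 0.85, 0.75, 0.73, 0.50, 0.34, 0.88]

n = len(copper)
skew = stats.skew(copper, bias=False)        # sample skewness (G1)
# Standard error of skewness for a sample of size n
se_skew = math.sqrt(6.0 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
w, p_sw = stats.shapiro(copper)              # Shapiro-Wilk test

print(f"n={n}, skewness={skew:.3f}, SE={se_skew:.3f}, Z={skew/se_skew:.2f}")
print(f"Shapiro-Wilk W={w:.3f}, p={p_sw:.3f}")
```

If |Z| is below 2 and the Shapiro-Wilk p-value is not small, the parametric route taken in the lesson is reasonable.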
Data can then be summarized (for descriptive purposes) using the mean and the standard deviation (SD). Don't use the standard error of the mean (SEM) as a descriptive spread measure (it isn't one). The 95% confidence interval (95%CI) for the mean is also sometimes given as complementary information.

*** PARAMETRIC TEST ***.
T-TEST
 /TESTVAL = 0.6
 /VARIABLES = copper
 /CRITERIA = CI(.95).

The result is significant (two-tailed p-value=0.015). If we want a one-tailed p-value (because we have solid a priori evidence that the difference can only go in one direction), we can halve the two-tailed p-value given by SPSS (one-tailed = 0.01548/2 = 0.00774 --> 0.008). This can't be done lightly (it is sometimes considered a form of "cheating" used to render significant results that only showed a tendency towards significance); see these references for a good discussion of the topic:
- http://www.bmj.com/cgi/content/full/309/6949/248
- http://circ.ahajournals.org/cgi/content/full/105/25/3062
See also the chapter "One-tail vs. two-tail p-values" in this book:
- http://www.graphpad.com/manuals/Prism4/StatisticsGuide.pdf

*************************************

Had normality failed, we should have used a non-parametric test instead of the t-test: the Wilcoxon or sign tests. The choice between them depends on symmetry: the Wilcoxon test, although non-parametric, needs symmetry (it can give falsely significant results on highly asymmetric variables). Mean and SD shouldn't be used to summarize non-normal data; the median and IQR (interquartile range, the interval formed by percentiles 25 & 75) should be used instead. See Lang & Secic, "How to Report Statistics in Medicine" (ACP series), for more details. Although SPSS apparently doesn't offer these tests for one sample, they can easily be obtained by "tricking" the program a bit: we only need to compute an extra variable holding the population parameter (the median, in this case).

COMPUTE PopMedian=0.6.

* Wilcoxon test *.
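The t-test and the halved one-tailed p-value can be cross-checked in Python (my own sketch, assuming scipy; the two-tailed p-value should match the SPSS output of about 0.015):

```python
# One-sample t-test against the hypothesized population mean of 0.6,
# mirroring the T-TEST /TESTVAL=0.6 syntax above.
from scipy import stats

copper = [0.70, 0.45, 0.72, 0.30, 1.16, 0.69, 0.83, 0.74, 1.24, 0.77,
          0.65, 0.76, 0.42, 0.94, 0.36, 0.98, 0.64, 0.90, 0.63, 0.55,
          0.78, 0.10, 0.52, 0.42, 0.58, 0.62, 1.12, 0.86, 0.74, 1.04,
          0.65, 0.66, 0.81, 0.48, 0.85, 0.75, 0.73, 0.50, 0.34, 0.88]

t, p_two = stats.ttest_1samp(copper, popmean=0.6)
# Halving is legitimate only with solid a priori directional evidence
p_one = p_two / 2.0

print(f"t={t:.3f}, two-tailed p={p_two:.5f}, one-tailed p={p_one:.5f}")
```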
NPAR TEST /WILCOXON=copper WITH PopMedian /STATISTICS QUARTILES.

SPSS will always give the asymptotic p-value for this test, even if the sample size is small (when it becomes unreliable). If the Exact Tests module is installed and the sample size - excluding ties - is below 25 (not the case here, since our sample size is 40), you can ask for the exact p-value by adding "/METHOD=EXACT TIMER(1)" to the above syntax. If you don't have that module, then you have to check significance against a Wilcoxon table, like this one:

Critical values of the Wilcoxon matched-pairs signed-rank test. For any N (number of subjects minus ties), the observed value is significant at a given level if it is equal to or less than the critical value shown in the table below.

         1-tailed test    2-tailed test
 N      p<0.05  p<0.01   p<0.05  p<0.01
 --------------------------------------
 5        1       -        -       -
 6        2       -        1       -
 7        4       0        2       -
 8        6       2        4       0
 9        8       3        6       2
 10      11       5        8       3
 11      14       7       11       5
 12      17      10       14       7
 13      21      13       17      10
 14      26      16       21      13
 15      30      20       25      16
 16      36      24       30      19
 17      41      28       36      23
 18      47      33       40      28
 19      54      38       46      32
 20      60      43       52      37
 21      68      49       59      43
 22      75      56       66      49
 23      83      62       73      55
 24      92      69       81      61
 25     101      77       90      68
 --------------------------------------

* Sign test *.
NPAR TEST /SIGN= copper WITH PopMedian /STATISTICS QUARTILES.

Here SPSS will give either the exact or the asymptotic p-value, depending on the sample size. To get a 95%CI for the median, SPSS again has to be "tricked" into it (using RATIO):

COMPUTE One=1.
RATIO STATISTICS copper WITH One /PRINT = CIN(95) MEDIAN.

Of course, all these tricks can be programmed as MACROS (to create the PopMedian and One variables, run the tests and then get rid of the now-useless extra variables). Tomorrow (hopefully) we'll discuss two-sample testing of continuous variables, paired and unpaired, both by parametric and non-parametric methods.

mailto:[hidden email] Marta Garcia-Granero, PhD Statistician |
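Both "tricked" one-sample tests can also be reproduced outside SPSS as a cross-check. This Python sketch is my own addition (it assumes scipy >= 1.7, where binomtest is available); the sign test is computed directly from the exact binomial distribution, counting values above vs. below the hypothesized median:

```python
# One-sample non-parametric tests against the hypothesized median 0.6:
# Wilcoxon signed-rank on (copper - 0.6), and an exact binomial sign test.
from scipy import stats

copper = [0.70, 0.45, 0.72, 0.30, 1.16, 0.69, 0.83, 0.74, 1.24, 0.77,
          0.65, 0.76, 0.42, 0.94, 0.36, 0.98, 0.64, 0.90, 0.63, 0.55,
          0.78, 0.10, 0.52, 0.42, 0.58, 0.62, 1.12, 0.86, 0.74, 1.04,
          0.65, 0.66, 0.81, 0.48, 0.85, 0.75, 0.73, 0.50, 0.34, 0.88]

median0 = 0.6
diffs = [x - median0 for x in copper if x != median0]  # drop ties

w_stat, p_wilcoxon = stats.wilcoxon(diffs)

n_above = sum(x > median0 for x in copper)
n_below = sum(x < median0 for x in copper)
sign = stats.binomtest(n_above, n_above + n_below, 0.5)  # two-sided

print(f"Wilcoxon: W={w_stat}, p={p_wilcoxon:.4f}")
print(f"Sign test: {n_above} above / {n_below} below, p={sign.pvalue:.4f}")
```

As in the lesson, both tests should come out significant for this dataset.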
Hi Marta,
I can't find a post on "Graduate course on SPSS - Lesson 1" in the archives. Was there a post on "Graduate course on SPSS - Lesson 1"? Thanks. Edward.

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Marta García-Granero
Sent: Thursday, January 25, 2007 6:24 AM
To: [hidden email]
Subject: Graduate course on SPSS - Lesson 2 |
Hi Edward
Yes, there was (January 22nd), and with exactly that name. Since I got some replies to it, it was distributed. If you still can't find it, tell me and I'll send it to you again privately.

EB> I cant find a post on "Graduate course on SPSS - Lesson 1" in the
EB> archives. Was there a post on "Graduate course on SPSS - Lesson
EB> 1"?

Regards, Marta |
In reply to this post by Marta García-Granero
Hi Marta,
Melissa has given me a copy of the said post. Thanks. Edward.

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Marta García-Granero
Sent: Thursday, January 25, 2007 9:56 AM
To: [hidden email]
Subject: Re: Graduate course on SPSS - Lesson 2 |
Hi everybody
The MACROS I mentioned (to ease the task of running non-parametric one-sample tests):

DEFINE OneSampleWilcoxon(!POS=!TOKENS(1)/ median=!TOKENS(1)/ EXACT=!DEFAULT(NO) !TOKENS(1)).
TEMPORARY.
COMPUTE PopMedian=!median.
!IF (!UPCASE(!EXACT) !EQ 'NO') !THEN
NPAR TEST /WILCOXON=!1 WITH PopMedian /STATISTICS QUARTILES.
!ELSE
NPAR TEST /WILCOXON=!1 WITH PopMedian /STATISTICS QUARTILES /METHOD=EXACT TIMER(1).
!IFEND.
!ENDDEFINE.

DEFINE OneSampleSign(!POS=!TOKENS(1)/ median=!TOKENS(1)).
TEMPORARY.
COMPUTE PopMedian=!median.
NPAR TEST /SIGN=!1 WITH PopMedian /STATISTICS QUARTILES.
!ENDDEFINE.

DEFINE OneMedianCI(!POS=!TOKENS(1)/ !POS=!DEFAULT(95) !TOKENS(1)).
TEMPORARY.
COMPUTE one=1.
RATIO STATISTICS !1 WITH one /PRINT = CIN(!2) MEDIAN.
!ENDDEFINE.

* MACRO calls (on the copper dataset) *.
OneSampleWilcoxon copper median=0.6.
OneSampleWilcoxon copper median=0.6 EXACT=Yes.
OneSampleSign copper median=0.6.
OneMedianCI copper.
OneMedianCI copper 99.

Regards, Marta (working on the next chapter) |
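The same wrapper idea - one reusable entry point that hides the PopMedian trick - can be sketched in Python for readers without SPSS. The function name and structure below are my own invention (assuming scipy >= 1.7 for binomtest):

```python
# A helper mirroring the idea of the SPSS macros above: a single call
# that runs both one-sample location tests against a hypothesized median.
from scipy import stats

def one_sample_median_tests(values, median0):
    """Return (wilcoxon_p, sign_p) for H0: population median == median0."""
    diffs = [x - median0 for x in values if x != median0]  # drop ties
    _, p_wilcoxon = stats.wilcoxon(diffs)
    n_above = sum(x > median0 for x in values)
    n_below = sum(x < median0 for x in values)
    p_sign = stats.binomtest(n_above, n_above + n_below, 0.5).pvalue
    return p_wilcoxon, p_sign

copper = [0.70, 0.45, 0.72, 0.30, 1.16, 0.69, 0.83, 0.74, 1.24, 0.77,
          0.65, 0.76, 0.42, 0.94, 0.36, 0.98, 0.64, 0.90, 0.63, 0.55,
          0.78, 0.10, 0.52, 0.42, 0.58, 0.62, 1.12, 0.86, 0.74, 1.04,
          0.65, 0.66, 0.81, 0.48, 0.85, 0.75, 0.73, 0.50, 0.34, 0.88]

p_w, p_s = one_sample_median_tests(copper, 0.6)
print(f"Wilcoxon p={p_w:.4f}, sign test p={p_s:.4f}")
```

Like the macros, this keeps the "extra variable" bookkeeping out of the analysis code: the caller supplies only the data and the hypothesized median.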
In reply to this post by Marta García-Granero
Marta:
I wanted to thank you for sharing your knowledge with all. Your inputs, responses, contributions and efforts are highly appreciated. Thanks, Manmit

-----Original Message-----
From: SPSSX(r) Discussion on behalf of Marta García-Granero
Sent: Thu 1/25/2007 4:54 PM
To: [hidden email]
Subject: Graduate course on SPSS - Lesson 2 |
Ditto that!
Mike

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Manmit Shrimali
Sent: Thursday, January 25, 2007 10:29 AM
To: [hidden email]
Subject: Re: Graduate course on SPSS - Lesson 2

Marta: I wanted to thank you for sharing your knowledge with all. Your inputs, responses, contributions and efforts are highly appreciated. Thanks, Manmit |
I hope it is not now the entire list that will thank Marta, one at a
time! We are all grateful, and she knows it. More generally, in the interest of parsimony I should recommend that messages conveying just a sentiment of gratefulness for help received are better sent privately; they may (or even should) be sent to the list only if they contain some value added, such as a summary of the discussion, conclusions reached or suchlike. Hector -----Mensaje original----- De: SPSSX(r) Discussion [mailto:[hidden email]] En nombre de Roberts, Michael Enviado el: 25 January 2007 12:43 Para: [hidden email] Asunto: Re: Graduate course on SPSS - Lesson 2 Ditto that! Mike -----Original Message----- From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Manmit Shrimali Sent: Thursday, January 25, 2007 10:29 AM To: [hidden email] Subject: Re: Graduate course on SPSS - Lesson 2 Marta: I wanted to thank you for sharing your knowledge with all. Your inputs, responses, contributions and efforts are highly appreciated. Thanks, Manmit -----Original Message----- From: SPSSX(r) Discussion on behalf of Marta García-Granero Sent: Thu 1/25/2007 4:54 PM To: [hidden email] Cc: Subject: Graduate course on SPSS - Lesson 2 Later than I expected, but here it comes at last: Background of the data we are going to analyze (absolutely invented by me): Some diseases (like Wilson's disease) are associated to high urinary copper levels. Suppose we are interested in testing if a group of children (all from the same region) show abnormal copper levels. We are told that normal levels are <0.6 µmol/24 h). * Sample dataset (Exercise 1.1 from "Statistics at Square One" Swinscow&Campbell: http://www.bmj.com/collections/statsbk/) *. DATA LIST FREE/copper(F8.2). BEGIN DATA 0.70 0.45 0.72 0.30 1.16 0.69 0.83 0.74 1.24 0.77 0.65 0.76 0.42 0.94 0.36 0.98 0.64 0.90 0.63 0.55 0.78 0.10 0.52 0.42 0.58 0.62 1.12 0.86 0.74 1.04 0.65 0.66 0.81 0.48 0.85 0.75 0.73 0.50 0.34 0.88 END DATA. VAR LABEL copper 'Urinary copper levels (µmol/24 h)'. 
First, we have to check if the variable is normally distributed (to choose between a parametric or a non-parametric method). We'll run a full Exploratory Data Analysis (EDA) on the variable: descriptives (mean, SD, skewness, kurtosis...), percentyles, graphics (histogram - useful if sample size is not too small -, stem&leaf, box-plot - both are good to check for outliers) and normality tests (Kolmogorov-Smirnov with Lilliefors correction and Shapiro-Wilk), both are given when NPPLOT is added to to the /PLOT subcommand. VERY IMPORTANT: Don't use one sample Kolmogorov-Smirnov(*), since it has very low power to detect non-normality. (*) NPAR TESTS /K-S(NORMAL)= copper. *** EDA ***. EXAMINE VARIABLES=copper /PLOT BOXPLOT STEMLEAF HISTOGRAM NPPLOT /COMPARE GROUP /PERCENTILES(5,10,25,50,75,90,95) /STATISTICS DESCRIPTIVES /CINTERVAL 95. Discussion of the results: The variable doesn't have outliers (see stem&leaf and box-plot graphs), the variable is fairly symmetric (skweness is not important: its absolute value is below 1 and less than twice its standard error - a simple Z test). We can use a parametric method: One-sample t-test (to compare the sample mean with the expected - population - mean of 0.6). Data can be then summarized (for descriptive purposes) using the mean and the standard deviation (SD). Don't use the Standard Error of the Mean (SEM) as a descriptive spread measure (it isn't). The confidence interval (95%CI) for the mean is also sometimes used as complementary information. *** PARAMETRIC TEST ***. T-TEST /TESTVAL = 0.6 /VARIABLES = copper /CRITERIA = CI(.95) . The result is significant (two-tailed p-value=0.015). If we want a one-tailed p-value (because we have solid a priori evidences that the difference should go only in one direction), we can halve the two-tailed p-value given by SPSS (one-tailed=0.01548/2=0.0077 --> 0.008). 
This can't be done lightly (it is sometimes considered a form of "cheating" to render significant results that only showed a tendency towards significance), see these references for a good discussion on the topic: - http://www.bmj.com/cgi/content/full/309/6949/248 - http://circ.ahajournals.org/cgi/content/full/105/25/3062 See also the chapter "One-tail vs. two-tail p-values in this book: - http://www.graphpad.com/manuals/Prism4/StatisticsGuide.pdf ************************************* Had normality failed, we should have used a non-parametric test instead of the t-test: Wilcoxon or sign tests. The election of one or the other depends on symmetry. Wilcoxon test, although non-parametric, needs symmetry (it could give false significant results on highly assymmetric variables). Mean and SD shouldn't be used to summarize non-normal data, median and IQR (InterQuartilic Range, the interval formed by percentyles 25 & 75 should be used instead). See Lang&Secic "How to Report Statistics in Medicine" (ACP series) for more details. Althoug SPSS apparently doesn't give those tests for one sample, they can be easily obtained "tricking" the program a bit: we only need to compute an extra variable with the population parameter (median, in this case). COMPUTE PopMedian=0.6. *** Wilcoxon test *. NPAR TEST /WILCOXON=copper WITH PopMedian /STATISTICS QUARTILES. SPSS will always give the asymptotic p-value for this test, even if sample size is small (when it becomes unreliable). If the Exact Tests module is installed and sample size - excluding ties - is below 25 (this is not the case since our sample size is 40), you can ask for the exact p-value adding "/METHOD=EXACT TIMER(1)" to the above syntax. 
If you don't have that module, then you have to check the significance against a Wilcoxon table, like this one:

Critical values of the Wilcoxon Matched-Pairs Signed-Rank Test.
For any N (number of subjects minus ties), the observed value is significant
at a given level if it is equal to or less than the critical value shown below.

      1-tailed test    2-tailed test
 N    p<0.05  p<0.01   p<0.05  p<0.01
-------------------------------------
 5      1       -        -       -
 6      2       -        1       -
 7      4       0        2       -
 8      6       2        4       0
 9      8       3        6       2
10     11       5        8       3
11     14       7       11       5
12     17      10       14       7
13     21      13       17      10
14     26      16       21      13
15     30      20       25      16
16     36      24       30      19
17     41      28       36      23
18     47      33       40      28
19     54      38       46      32
20     60      43       52      37
21     68      49       59      43
22     75      56       66      49
23     83      62       73      55
24     92      69       81      61
25    101      77       90      68
-------------------------------------

* Sign test *.
NPAR TEST
 /SIGN= copper WITH PopMedian
 /STATISTICS QUARTILES.

Here SPSS will give either the exact or the asymptotic p-value, depending on the sample size.

To get a 95%CI for the median, SPSS again has to be "tricked" into it (using RATIO STATISTICS):

COMPUTE One=1.
RATIO STATISTICS copper WITH One
 /PRINT = CIN(95) MEDIAN.

Of course, all these tricks can be programmed as MACROs (to create the PopMedian and One variables, run the tests, and then get rid of the now-useless extra variables). Tomorrow, hopefully, we'll discuss two-sample testing for continuous variables, paired and unpaired, both by parametric and non-parametric methods.

mailto:[hidden email]
Marta Garcia-Granero, PhD
Statistician
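Both the exact sign test and a distribution-free 95%CI for the median rest on the same idea: under the null, the count of observations above the median is Binomial(n, 0.5). As a cross-check of the SPSS tricks above, here is a Python sketch (my own illustration, not SPSS output; variable names are mine):

```python
from math import comb

copper = [0.70, 0.45, 0.72, 0.30, 1.16, 0.69, 0.83, 0.74, 1.24, 0.77,
          0.65, 0.76, 0.42, 0.94, 0.36, 0.98, 0.64, 0.90, 0.63, 0.55,
          0.78, 0.10, 0.52, 0.42, 0.58, 0.62, 1.12, 0.86, 0.74, 1.04,
          0.65, 0.66, 0.81, 0.48, 0.85, 0.75, 0.73, 0.50, 0.34, 0.88]

mu0 = 0.6

# --- Sign test: counts above/below mu0 are Binomial(m, 0.5) under H0
pos = sum(1 for x in copper if x > mu0)
neg = sum(1 for x in copper if x < mu0)
m = pos + neg                       # observations equal to mu0 are dropped
tail = sum(comb(m, k) for k in range(max(pos, neg), m + 1)) / 2 ** m
p_two = min(1.0, 2 * tail)          # exact two-tailed p-value

# --- 95%CI for the median from order statistics: take the largest
# depth dep with P(X <= dep-1) <= 0.025; the interval
# (x(dep), x(n+1-dep)) then has at least 95% coverage.
xs = sorted(copper)
n = len(xs)
def cdf(j):                         # P(X <= j), X ~ Binomial(n, 0.5)
    return sum(comb(n, i) for i in range(j + 1)) / 2 ** n
dep = max(j for j in range(1, n // 2) if cdf(j - 1) <= 0.025)
lo, hi = xs[dep - 1], xs[n - dep]
print(pos, neg, round(p_two, 4), lo, hi)
```

This binomial construction is the same one mentioned later in the thread as extendable to percentiles other than the median.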
Now that Marta has shown us how to 'trick' SPSS into calculating the 95%CI for the median, I'm wondering if there's a way to 'trick' SPSS into calculating the 95%CI for the 5th, 10th, 25th, 75th, and 90th percentiles, similar to SAS PROC UNIVARIATE?
Thanks,
Pat
Hi Pat
CPE> Now that Marta has shown us how to 'trick' SPSS into calculating
CPE> the 95%CI for the median, I'm wondering if there's a way to
CPE> 'trick' SPSS into calculating the 95%CI for the 5th, 10th, 25th,
CPE> 75th, and 90th percentiles, similar to SAS PROC UNIVARIATE?

Do you have access to a description of the statistical methods used by SAS? I could try to work out a solution with SPSS.

Another solution in SPSS is to use the bootstrap. It should be quite easy with MATRIX (I've already done some bootstrapping with MATRIX, but not for percentiles, only for means, multiple R-squared and Spearman's rho).

The only reference I found after Googling (I mean, the only one I could access from outside the University; right now I'm at home coping with a cold) describes the use of the binomial distribution for this: it explains in detail how to do it for the median and says "it could be used for other percentiles".

--
Regards,
Dr. Marta García-Granero, PhD
mailto:[hidden email]
Statistician

---
"It is unwise to use a statistical procedure whose use one does not understand. This SPSS syntax guide cannot supply this knowledge, and it is certainly no substitute for the basic understanding of statistics and statistical thinking that is essential for the wise choice of methods and the correct interpretation of their results."
(Adapted from the WinPepi manual - I'm sure Joe Abramson will not mind)
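Pending a proper MATRIX version, the bootstrap idea can be sketched quickly in Python (purely illustrative; the percentile definition, the choice of B=2000 resamples and the 10th percentile as the target are my assumptions, not anything SPSS or SAS PROC UNIVARIATE produces):

```python
import random

copper = [0.70, 0.45, 0.72, 0.30, 1.16, 0.69, 0.83, 0.74, 1.24, 0.77,
          0.65, 0.76, 0.42, 0.94, 0.36, 0.98, 0.64, 0.90, 0.63, 0.55,
          0.78, 0.10, 0.52, 0.42, 0.58, 0.62, 1.12, 0.86, 0.74, 1.04,
          0.65, 0.66, 0.81, 0.48, 0.85, 0.75, 0.73, 0.50, 0.34, 0.88]

def pctl(xs, p):
    # Percentile by linear interpolation between order statistics
    # (one common definition; PROC UNIVARIATE offers several).
    s = sorted(xs)
    h = (len(s) - 1) * p
    i = int(h)
    if i + 1 < len(s):
        return s[i] + (h - i) * (s[i + 1] - s[i])
    return s[i]

random.seed(1)                       # reproducible illustration
B = 2000                             # number of bootstrap resamples
est = pctl(copper, 0.10)             # sample 10th percentile
boot = sorted(pctl(random.choices(copper, k=len(copper)), 0.10)
              for _ in range(B))
# Percentile-method 95%CI: 2.5th and 97.5th points of the bootstrap
# distribution of the statistic
lo, hi = boot[int(0.025 * B)], boot[int(0.975 * B) - 1]
print(round(est, 3), round(lo, 3), round(hi, 3))
```

The same loop works for any percentile by changing the 0.10 argument, which is what makes the bootstrap attractive when no closed-form interval is at hand.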