I have thought of a liberal test as one that is more likely to find statistical significance (even where it does not truly exist): it is more likely to make a Type I error and less prone to Type II errors. A liberal test has more power.

A conservative test is less likely to find statistical significance (even where it does truly exist): it is less likely to make Type I errors and more prone to Type II errors. Conservative tests have less power.

There are many definitions out there, but my sense is that most are variations on the above themes.

Tim Daciuk
Director, Demo Team
SPSS, an IBM Company
Phone - 1-416-265-9789
Cell - 1-426-996-9789
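These definitions can be made concrete with a small Monte Carlo sketch. The code below is my addition, not part of Tim's post: with three truly equal population means, running all three pairwise z-tests at alpha = .05 apiece behaves liberally at the family level, while Bonferroni-adjusted tests behave conservatively. The sample size, number of replications, and seed are arbitrary illustrative choices.

```python
import random
from itertools import combinations
from statistics import NormalDist, mean

random.seed(20091209)

def fwe_rate(alpha_per_test, reps=4000, k=3, n=50):
    """Estimate P(at least one false rejection) when all k population
    means are truly equal. Sigma is 1 and known, so each pairwise
    comparison is an exact z-test."""
    crit = NormalDist().inv_cdf(1 - alpha_per_test / 2)
    se = (2 / n) ** 0.5                 # SE of a difference of two means
    hits = 0
    for _ in range(reps):
        m = [mean([random.gauss(0, 1) for _ in range(n)]) for _ in range(k)]
        if any(abs(m[i] - m[j]) / se > crit
               for i, j in combinations(range(k), 2)):
            hits += 1
    return hits / reps

liberal = fwe_rate(0.05)           # each test at .05: familywise rate inflates
conservative = fwe_rate(0.05 / 3)  # Bonferroni-adjusted: familywise rate stays <= .05
print(liberal, conservative)
```

The "liberal" estimate lands well above the nominal 0.05, the Bonferroni one at or below it, which is exactly the trade Tim describes.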
In reply to this post by E. Bernardo
Hi Eins,
Please follow the attached link ... it's a long one.
It quotes Robert P. Abelson, "Statistics as Principled Argument", which I highly recommend.
HTH,
Martin Holt
In reply to this post by E. Bernardo
Eins Bernardo wrote:
> LSD is considered as liberal among the post hoc tests in ANOVA, while
> the Duncan is more conservative than the LSD. Can someone
> differentiate/contrast between liberal and conservative in a
> statistical context?

A liberal test has a Type I error rate (or alpha level) that is larger than the stated value. So a test that claims to have a Type I error rate of 0.05 might actually have a rate of 0.08 or 0.13. A test can become liberal if you fail to properly adjust for multiple comparisons, or if you allow early stopping at several points during the trial without appropriate adjustments. Sometimes failure to meet the underlying assumptions of a statistical test can produce a liberal result.

A conservative test has a Type I error rate that is smaller than the stated value. So a test that claims to have a Type I error rate of 0.05 might actually have a rate of 0.03 or 0.01. Some adjustments for multiple comparisons can produce conservative tests. Also, failure to meet the underlying assumptions of a statistical test can produce a conservative result.

The research community generally shuns liberal tests, but do keep in mind that a conservative test often suffers from a loss of power and (equivalently) an increase in the Type II error rate.

--
Steve Simon, Standard Disclaimer

Two free webinars coming soon!
"What do all these numbers mean? Odds ratios, relative risks, and number needed to treat" Thursday, December 17, 2009, 11am-noon, CST.
"The first three steps in a descriptive data analysis, with examples in PASW/SPSS" Thursday, January 21, 2010, 11am-noon, CST.
Details at www.pmean.com/webinars

=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L. For a list of commands to manage subscriptions, send the command INFO REFCARD.
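Steve's point that assumption violations can make a test liberal is easy to see by simulation. The sketch below is my addition, not from Steve's post: a pooled-variance two-sample test is applied where the smaller group has the larger variance, a classic violation of the equal-variance assumption. All sample sizes and sigmas are invented for illustration.

```python
import random
from statistics import NormalDist, mean, variance

random.seed(42)

def pooled_test_type1_rate(n1=10, sd1=3.0, n2=40, sd2=1.0,
                           alpha=0.05, reps=4000):
    """Both population means are 0, so every rejection is a Type I error.
    Uses the normal critical value as a stand-in for the t critical value
    (close enough at df = 48 for this illustration)."""
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    hits = 0
    for _ in range(reps):
        x = [random.gauss(0, sd1) for _ in range(n1)]
        y = [random.gauss(0, sd2) for _ in range(n2)]
        # pooled variance assumes the two groups share one sigma -- false here
        sp2 = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
        se = (sp2 * (1 / n1 + 1 / n2)) ** 0.5
        if abs(mean(x) - mean(y)) / se > crit:
            hits += 1
    return hits / reps

rate = pooled_test_type1_rate()
print(rate)   # well above the nominal 0.05
```

The pooled variance is dominated by the large, low-variance group, so the standard error is badly understated and the actual Type I error rate runs several times the nominal 0.05.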
Very nicely stated, Steve.

Going back to the original post, let me add that Fisher's LSD is therefore neither liberal nor conservative when there are exactly 3 groups. In that situation, the family-wise alpha is maintained at exactly the same level as the per-contrast alpha. And so, given its greater power, Fisher's LSD ought to be used a lot more than it is WHEN there are 3 groups.

If anyone needs references to support the use of Fisher's LSD with 3 groups, here are two.

Howell, DC. Statistical Methods for Psychology (various editions & years, chapter on multiple comparison procedures).

Meier U. A note on the power of Fisher's least significant difference procedure. Pharmaceut. Statist. 2006; 5: 253–263.

Here is the abstract from Meier's article.

Fisher's least significant difference (LSD) procedure is a two-step testing procedure for pairwise comparisons of several treatment groups. In the first step of the procedure, a global test is performed for the null hypothesis that the expected means of all treatment groups under study are equal. If this global null hypothesis can be rejected at the pre-specified level of significance, then in the second step of the procedure, one is permitted in principle to perform all pairwise comparisons at the same level of significance (although in practice, not all of them may be of primary interest). Fisher's LSD procedure is known to preserve the experimentwise type I error rate at the nominal level of significance, if (and only if) the number of treatment groups is three. The procedure may therefore be applied to phase III clinical trials comparing two doses of an active treatment against placebo in the confirmatory sense (while in this case, no confirmatory comparison has to be performed between the two active treatment groups). The power properties of this approach are examined in the present paper. It is shown that the power of the first-step global test - and therefore the power of the overall procedure - may be relevantly lower than the power of the pairwise comparison between the more-favourable active dose group and placebo. Achieving a certain overall power for this comparison with Fisher's LSD procedure - irrespective of the effect size at the less-favourable dose group - may require slightly larger treatment groups than sizing the study with respect to the simple Bonferroni alpha adjustment. Therefore, if Fisher's LSD procedure is used to avoid an alpha adjustment for phase III clinical trials, the potential loss of power due to the first-step global test should be considered at the planning stage. Copyright © 2006 John Wiley & Sons, Ltd.
--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/
"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING:
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).
If I may add my two cents to this thread: the issue of whether a multiple comparison procedure (either a planned comparison or a post hoc test; see Kirk's Experimental Design text for background on these distinctions) is liberal or conservative requires one to make a decision about the overall Type I error rate [alpha(overall)] that one is willing to accept.

The two-stage procedure typically used for Fisher's LSD (the Least Significant Difference test, which is equivalent to doing all pairwise comparisons or t-tests between means but using the Mean Square Error from the ANOVA instead of just the variance in the two groups being compared) uses (a) one statistical test for the ANOVA and (b) a separate set of statistical tests for the comparisons. Statisticians such as Rand Wilcox argue against such two-stage procedures because, technically, one does not have to have a significant ANOVA to do multiple comparisons (e.g., an ANOVA is not really necessary for planned comparisons, though some of the values in an ANOVA table facilitate computations).

If one keeps alpha(overall) fixed at some reasonable level, such as alpha(overall) = 0.05, then all of the multiple comparisons have their per-comparison alpha [i.e., alpha(per comparison)] adjusted to a lower value. To make things clearer, consider: alpha(overall) can be calculated with the following formula:

alpha(overall) = 1 - (1 - alpha(per comparison))**K

where K is equal to the number of comparisons or tests that one is conducting. In the LSD framework, each test has alpha(per comparison) = 0.05; that is, the usual alpha level is used for each test. But after 3 statistical tests (e.g., all pairwise comparisons between three means), alpha(overall) = 0.14; that is, after doing three t-tests among the means, there is a probability of 0.14 that a Type I error has been committed.
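The formula above can be checked directly; this one-liner loop is my addition, just plugging in the alpha(per comparison) = 0.05 from the example:

```python
# alpha(overall) = 1 - (1 - alpha(per comparison))**K, for several K
alpha_pc = 0.05
for K in (1, 3, 6, 10):
    alpha_overall = 1 - (1 - alpha_pc) ** K
    print(K, round(alpha_overall, 4))
# K = 3 gives 1 - 0.95**3 = 0.142625, the ~0.14 quoted above
```

Note this formula assumes the K tests are independent, a point Bruce raises later in the thread; for correlated pairwise contrasts it is an approximation.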
As the number of comparisons increases, the overall probability that one has committed a Type I error also increases, until it is quite likely that one or more test results are Type I errors (this is also relevant when one is checking for significant correlations in a correlation matrix). The Bonferroni solution to this situation is to set alpha(overall) = 0.05 and divide alpha(overall) by the number of tests (i.e., K from above). Consider:

corrected alpha(per comparison) = alpha(overall)/K

However, one can change alpha(per comparison) to different values in the context of planned comparisons. All multiple comparison procedures keep alpha(overall) = 0.05 but calculate alpha(per comparison) in different ways, because they rely upon different distributions and take into account how many comparisons are being made. The Scheffe F is the most conservative multiple comparison procedure because it will have the largest critical value that an obtained difference is compared against. The Scheffe F can be used to compare pairs of means or complex combinations of means (the previous tests assume that one is only doing pairwise comparisons between means).

To make the discussion of multiple comparisons more rational, one has to adopt a Neyman-Pearson framework that requires one to specify a specific effect size (e.g., a standardized difference between population means) and allows one to identify the Type II error rate or, equivalently, the level of statistical power (= 1 - Type II error rate). Fisher's LSD can be interpreted in the context of the Neyman-Pearson framework, but Fisher himself did not accept it as a meaningful or valid statistical framework; consequently, issues of Type II errors and statistical power are irrelevant if one is really being "old school". Gerd Gigerenzer has written about this in his articles on the history of statistics.
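The Bonferroni correction can likewise be verified numerically (my sketch; the inequality checked is the standard Bonferroni bound under independence):

```python
# Divide the target familywise alpha by K, then confirm that the achieved
# familywise rate never exceeds the target (Bonferroni is slightly conservative).
alpha_overall_target = 0.05
for K in (3, 6, 10):
    alpha_pc = alpha_overall_target / K        # corrected per-comparison alpha
    achieved = 1 - (1 - alpha_pc) ** K         # familywise rate if tests independent
    assert achieved <= alpha_overall_target
    print(K, round(alpha_pc, 4), round(achieved, 4))
```

The achieved rate creeps slightly below 0.05 as K grows, which is the small built-in conservatism of the Bonferroni approach.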
But if one is willing to use the Neyman-Pearson framework and specify a fixed effect size that one wants to detect at a specific level of power while keeping alpha(overall) = 0.05, then one can ask which of the different pairwise comparison procedures produces the smallest critical difference that has to be exceeded by the obtained difference between sample means. I believe that the LSD procedure will produce the smallest critical difference, the Scheffe F the largest, and all other tests will provide critical differences between these extremes. In this sense, the LSD is the most liberal because it requires the smallest difference between means to achieve statistical significance (but at the cost of an increased alpha(overall)), while the Scheffe is the most conservative because it will have the largest critical difference (but it will also be the least powerful).

The SPSS procedures that provide these tests, such as GLM, do an odd thing. Instead of telling one what the actual LSD (or Bonferroni or Tukey or whatever) critical value is, they just tell you whether the observed difference between means exceeds this critical difference (i.e., whether it is identified as statistically significant) or not. Formulas for hand calculation of the actual critical difference are provided in a number of sources, such as Kirk's Experimental Design text and, if memory serves, Glass & Hopkins' Statistical Methods in Education & Psychology. If one has a dataset where a one-way ANOVA is appropriate, do the various multiple comparisons and see which results are significant by LSD but become nonsignificant with other tests. If the difference/effect size is really different from zero, then the nonsignificant tests are actually Type II errors. Increasing the sample size (thereby increasing statistical power) will usually change these test results from nonsignificant to significant.
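The LSD-smallest / Scheffe-largest ordering can be sanity-checked by hand for a hypothetical design. This is my sketch, not from the thread: k = 4 groups, n = 10 per group, MSE = 1.0, and alpha = .05 are all invented for illustration, and the two quantiles are approximate values read from standard t and F tables.

```python
from math import sqrt

k, n, mse = 4, 10, 1.0
df_error = k * (n - 1)                 # 36
se_diff = sqrt(2 * mse / n)            # SE of a difference of two means

t_crit = 2.028                         # approx. t(.975, df = 36), from tables
f_crit = 2.866                         # approx. F(.95, df1 = 3, df2 = 36), from tables

lsd_cd = t_crit * se_diff                        # Fisher's LSD critical difference
scheffe_cd = sqrt((k - 1) * f_crit) * se_diff    # Scheffe critical difference

print(round(lsd_cd, 3), round(scheffe_cd, 3))
assert lsd_cd < scheffe_cd   # LSD most liberal, Scheffe most conservative
```

Any obtained mean difference falling between the two values would be declared significant by LSD but not by Scheffe, which is exactly the liberal-versus-conservative contrast Mike describes.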
In summary, whether a multiple comparison procedure is liberal or conservative depends upon a number of factors, but what is critical is (1) keeping alpha(overall) equal to some specified value such as 0.05, (2) adjusting alpha(per comparison) appropriately, and (3) identifying the critical difference that a difference between obtained sample means has to exceed in order to claim that the difference is greater than zero (i.e., that the two means are different from each other).

I'll shut up now.

-Mike Palij
New York University
[hidden email]
But it is important to note that one only proceeds to (b) if the ANOVA at step (a) was significant. I agree that a significant omnibus F-test is not necessary for most multiple comparison procedures, despite what some of us might have been taught back in the day. But a significant omnibus F-test IS required before one proceeds to the pair-wise tests when doing Fisher's LSD.

Here is the meat of Dave Howell's argument about why Fisher's LSD controls the family-wise alpha at the per-contrast alpha level when there are 3 groups:

1. When the complete null hypothesis is true (i.e., all 3 population means are equal), "the requirement for a significant overall F [before proceeding to the pairwise tests] ensures that the familywise error rate will equal alpha" (6th Ed., p. 368).
2. If two of the population means are equal, but different from the 3rd, then there is only one opportunity for a Type I error to occur--i.e., the test that compares samples from the two populations with equal means.
3. If the 3 population means are all different, then there is no opportunity for a Type I error to occur.

Regarding the formula alpha(overall) = 1 - (1 - alpha(per comparison))**K: that formula applies when the contrasts are all mutually independent. But that's not the case for all pair-wise contrasts for a set of means. You're better off using the Bonferroni approximation, I think.

I think your argument might be right here if one proceeded with the pair-wise tests regardless of whether the omnibus F-test was significant or not. But remember that you only get to the pair-wise tests if you first reject the null for the overall F-test. Howell argues that this is what provides the "protection" for Fisher's protected t-tests, as they are sometimes called.

Me too. ;-)

Cheers,
Bruce
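Point 1 above lends itself to a quick Monte Carlo check. The sketch below is mine, not Howell's: with 3 equal population means (k = 3, n = 20 per group, both arbitrary), it contrasts the protected procedure (pairwise tests only after a significant F) with skipping the F gate entirely. The two critical values are approximate tabled quantiles for this design.

```python
import random
from itertools import combinations
from statistics import mean, variance

random.seed(7)

K_GROUPS, N = 3, 20
F_CRIT = 3.16    # approx. F(.95, 2, 57), from tables
T_CRIT = 2.00    # approx. t(.975, 57), from tables

def one_trial():
    """Simulate the complete null; return (protected error, unprotected error)."""
    groups = [[random.gauss(0, 1) for _ in range(N)] for _ in range(K_GROUPS)]
    gm = [mean(g) for g in groups]
    grand = mean(gm)                      # equal n, so mean of group means
    ms_between = N * sum((m - grand) ** 2 for m in gm) / (K_GROUPS - 1)
    ms_within = mean([variance(g) for g in groups])
    f_sig = ms_between / ms_within > F_CRIT
    se = (2 * ms_within / N) ** 0.5
    any_pair = any(abs(gm[i] - gm[j]) / se > T_CRIT
                   for i, j in combinations(range(K_GROUPS), 2))
    return f_sig and any_pair, any_pair

reps = 4000
protected = unprotected = 0
for _ in range(reps):
    p, u = one_trial()
    protected += p
    unprotected += u
print(protected / reps, unprotected / reps)
```

In runs like this, the protected rate stays near (just under) the nominal 0.05, while the unprotected pairwise tests show the familiar inflated familywise rate, consistent with Howell's argument.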
