Hi all,
I have a question about when to correct for multiple testing. The scenario seems to be a very common one, but we have been struggling to figure out the best way to analyze it, so any help would be greatly appreciated.

We conducted a usability test in which 40 participants were asked to complete three tasks with two different software versions (the main experimental factor) in a repeated measures design. The hypothesis was that one of the software versions would be more usable, and this was tested by collecting three objective measures (time, number of clicks, number of errors). For each of the objective measures and for each of the three tasks, we then used one-tailed Wilcoxon signed-rank tests for paired samples (because the data were not normally distributed).

Question: Do we have to correct for multiple testing in this setting, because the data were collected from the same participants?

Many thanks in advance,
Katharina

=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L. For a list of commands to manage subscriptions, send the command INFO REFCARD.
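The design described above (3 tasks x 3 measures = 9 paired, one-tailed Wilcoxon tests) can be sketched as follows. The data here are simulated purely for illustration, and all variable names are invented; the original analysis was run in SPSS.

```python
# Sketch of the 3 tasks x 3 measures = 9 paired Wilcoxon tests described above.
# Data are simulated for illustration; the original analysis was done in SPSS.
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(42)
n = 40  # participants
tasks = ["task1", "task2", "task3"]
measures = ["time", "clicks", "errors"]

p_values = {}
for task in tasks:
    for measure in measures:
        version_a = rng.gamma(shape=2.0, scale=10.0, size=n)  # e.g., seconds
        version_b = version_a + rng.normal(2.0, 5.0, size=n)  # paired scores
        # One-sided test: version A hypothesized to yield lower (better) scores
        stat, p = wilcoxon(version_a, version_b, alternative="less")
        p_values[(task, measure)] = p

print(len(p_values))  # 9 tests in total
```

Running nine separate tests like this is exactly what raises the multiplicity question the thread goes on to discuss.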
I don't think there is a single correct answer to your question. For example, where does your study fall on the "exploratory to confirmatory" spectrum? The closer you are to the confirmatory end, the greater the need to correct for multiple tests, IMO. On the other hand, I know of at least one article that argues no correction needs to be applied for purely exploratory studies: http://plog.yejh.tc.edu.tw/gallery/53/%E5%88%A4%E6%96%B7%E5%A4%9A%E5%85%83%E8%A9%95%E9%87%8F.pdf

The importance of correcting for multiple tests is also related to the number of tests, obviously. If you have only 3, it's not as big an issue as if you have dozens or hundreds. In the latter case, it's possible to draw some pretty outlandish conclusions, e.g.: http://prefrontal.org/files/posters/Bennett-Salmon-2009.pdf

HTH.
--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/
"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING:
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).
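Bruce's point about the number of tests can be made concrete. Below is a small sketch, with made-up p-values, of the plain Bonferroni adjustment next to Holm's step-down version, which controls the same family-wise error rate but is uniformly less conservative:

```python
# Bonferroni and Holm adjustments for a small family of p-values.
# The nine p-values below are invented for illustration.
def bonferroni(pvals):
    """Classic Bonferroni: multiply each p by the number of tests."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]

def holm(pvals):
    """Holm's step-down method: controls the family-wise error rate
    like Bonferroni, but is uniformly at least as powerful."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # multiplier shrinks from m down to 1 as p-values grow
        running_max = max(running_max, (m - rank) * pvals[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted

pvals = [0.004, 0.010, 0.019, 0.041, 0.090, 0.120, 0.300, 0.550, 0.810]
print(bonferroni(pvals))
print(holm(pvals))
```

With only a handful of tests, the two methods give similar answers; the gap (and the cost of being conservative) grows with the number of tests.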
I thought of a couple of questions when I read this current thread. If Katharina elects to correct for multiple tests, I'd guess that she'd use a Bonferroni correction. Her data are repeated measures, and I'd expect, a priori, that the measures are correlated. And when samples are drawn repeatedly from a population and analyzed, the test statistics would also be correlated. Given correlated data, is a Bonferroni correction correct, in the sense of preserving a specific overall p-value? I'd think not, but maybe I'm wrong. If I'm not, however, what would be the correct correction?

Gene Maguin
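On Gene's question, note that Bonferroni preserves the overall alpha only as an upper bound: it guarantees a family-wise error rate no larger than alpha under any dependence structure, but with positively correlated tests the true error rate falls below alpha, so the correction is conservative rather than exact. A quick numerical comparison of the Bonferroni and Sidak per-comparison alphas for nine tests (illustrative only):

```python
# Per-comparison alpha under Bonferroni vs. Sidak for k tests.
# Sidak is exact under independence; under positive correlation both
# methods bound the family-wise error rate from above (conservative).
k = 9          # e.g., the 3 tasks x 3 measures design in this thread
alpha = 0.05   # desired family-wise error rate

bonferroni_alpha = alpha / k
sidak_alpha = 1 - (1 - alpha) ** (1 / k)

print(round(bonferroni_alpha, 6))  # 0.005556
print(round(sidak_alpha, 6))       # 0.005683
```

The difference between the two is tiny for small k; the real issue Gene raises is that neither exploits the correlation among the tests, which is where the methods discussed later in the thread come in.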
Hi Gene. Here are a couple of excerpts from the Bender & Lange article (Bender, R., & Lange, S. (2001). Adjusting for multiple testing: when and how? Journal of Clinical Epidemiology, 54(4), 343-349).

"Bonferroni corrections should only be used in cases where the number of tests is quite small (say, less than 5) and the correlations among the test statistics are quite low." (p. 345)

And a longer one, from pages 345-346:

--- start of excerpt ---
The case of multiple endpoints is one of the most common multiplicity problems in clinical trials [29,30]. There are several possible strategies to deal with multiple endpoints. The simplest approach, which should always be considered first, is to specify a single primary endpoint. This approach makes adjustments for multiple endpoints unnecessary. However, all other endpoints are then subsidiary and results concerning secondary endpoints can only have an exploratory rather than a confirmatory interpretation.

The second possibility is to combine the outcomes in one aggregated endpoint (e.g., a summary score for quality of life data or the time to the first event in the case of survival data). The approach is adequate only if one is not interested in the results of the individual endpoints.

Thirdly, for significance testing multivariate methods [e.g., multivariate analysis of variance (MANOVA) or Hotelling's T^2 test] and global test statistics developed by O'Brien [31] and extended by Pocock et al. [32] can be used. Exact tests suitable for a large number of endpoints and small sample size have been developed by Läuter [33]. All these methods provide an overall assessment of effects in terms of statistical significance but offer no estimate of the magnitude of the effects. Again, information about the effects concerning the individual endpoints is lacking. In addition, Hotelling's T^2 test lacks power since it tests for unstructured alternative hypotheses, when in fact one is really interested in evidence from several outcomes pointing in the same direction [34].

Hence, in the case of several equally important endpoints for which individual results are of interest, multiple test adjustments are required, either alone or in combination with previously mentioned approaches. Possible methods to adjust for multiple testing in the case of multiple endpoints are given by the general adjustment methods based upon P values [35] and the resampling methods [22] introduced above. It is also possible to allocate different type 1 error rates to several not equally important endpoints [36,37].
--- end of excerpt ---

Cheers,
Bruce
--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/
Bruce's points below, especially about using Bonferroni corrections in situations where one is doing few tests and there are low correlations among measures, are good advice. As the number of tests increases and/or the correlations increase, the Bonferroni correction becomes too conservative.

There are procedures for using the information about correlations among measures, though I don't have a reference on this immediately at hand. However, the SISA website does allow one to use a web app to calculate the corrected per-comparison alpha; see: http://www.quantitativeskills.com/sisa/calculations/bonfer.htm The output provides both Bonferroni and Sidak corrections for the case of r = 0.00 (independent measures) and for the user-specified correlation (i.e., the mean correlation among measures). SISA provides background on this page but does not provide details on the calculations; see: http://www.quantitativeskills.com/sisa/calculations/bonhlp.htm

For those with a more technical statistical background, the situation we're discussing actually comes up often in genetics research, and one article that compares different adjustment procedures is available on the PubMed website; see: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2276357/ The reference is:

Conneely, K.N., & Boehnke, M. (2007). So many correlated tests, so little time! Rapid adjustment of p values for multiple correlated tests. American Journal of Human Genetics, 81(6), 1158-1168.

The article points out that in genetics, permutation tests are often used for correlated tests (the Bonferroni and Sidak adjustments are too conservative for most situations), and the authors provide a new method for calculating an adjusted p value (P_act) that takes less time to calculate than the permutation results.

For people proficient in R, the authors provide access to their R code for calculating P_act on their website: http://csg.sph.umich.edu/boehnke/p_act.php

-Mike Palij
New York University
[hidden email]
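To illustrate the resampling idea behind the methods Mike mentions, here is a max-statistic permutation (sign-flip) adjustment in the Westfall-Young style, applied to simulated correlated paired differences. This is a generic sketch, not the Conneely & Boehnke P_act method, and all data and parameters here are invented:

```python
# Max-statistic sign-flip permutation adjustment for correlated paired tests.
# Because the null distribution of the *maximum* statistic reflects the
# actual correlation among tests, the adjustment is less conservative than
# Bonferroni when tests are correlated. Simulated data for illustration.
import numpy as np

rng = np.random.default_rng(0)
n, k = 40, 9  # participants, tests (cf. 3 tasks x 3 measures)

# Correlated paired differences: a shared component induces correlation
shared = rng.normal(0.0, 1.0, size=(n, 1))
diffs = 0.4 + shared + rng.normal(0.0, 1.0, size=(n, k))

observed_t = diffs.mean(axis=0) / (diffs.std(axis=0, ddof=1) / np.sqrt(n))

n_perm = 2000
max_null = np.empty(n_perm)
for b in range(n_perm):
    # Under H0 (no version difference), each participant's sign is arbitrary
    signs = rng.choice([-1.0, 1.0], size=(n, 1))
    d = diffs * signs
    t = d.mean(axis=0) / (d.std(axis=0, ddof=1) / np.sqrt(n))
    max_null[b] = np.abs(t).max()

# Adjusted p: how often the null max statistic exceeds each observed |t|
p_adjusted = (1 + (max_null[:, None] >= np.abs(observed_t)).sum(axis=0)) / (n_perm + 1)
print(p_adjusted.shape)  # (9,)
```

As the article notes, the price of the permutation approach is computation time, which is what analytic shortcuts like P_act aim to avoid.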
