Bonferroni-adjusted t-tests

Bonferroni-adjusted t-tests

Shin-7
Dear all:

I have a question regarding t-tests. One of the reviewers suggested that I
perform "Bonferroni-adjusted t-tests". I understand that Bonferroni is one
of the post hoc methods for MULTIPLE group comparisons in ANOVA. However,
our study deals with only TWO groups.

His/her main argument is that our two-group comparisons yielded very
small mean differences and that, probably because of the relatively large
sample sizes (n1=870, n2=780), the t-tests detected statistical significance
at 0.05 that may not have any practical meaning. Therefore, he/she suggested
that we "adjust the alpha level." He/she implies that if I do adjust the
alpha, some of the comparisons may no longer be statistically significant.

Could anybody explain how we could perform Bonferroni-adjusted t-tests IN
A TWO-GROUP COMPARISON via SPSS? I have looked at the main menu but could
not find any option of this kind, except in ANOVA as a post hoc option.
Thank you for your help in advance.

Shin

S. Okazaki
Autonomous University of Madrid

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Re: Bonferroni-adjusted t-tests

Marta Garcia-Granero
Shin wrote:

> Dear all:
>
> I have a question regarding t-tests. One of the reviewers suggested that I
> perform "Bonferroni-adjusted t-tests". I understand that Bonferroni is one
> of the post hoc methods for MULTIPLE group comparisons in ANOVA. However,
> our study deals with only TWO groups.
>
> His/her main argument is that our two-group comparisons yielded very
> small mean differences and that, probably because of the relatively large
> sample sizes (n1=870, n2=780), the t-tests detected statistical significance
> at 0.05 that may not have any practical meaning. Therefore, he/she suggested
> that we "adjust the alpha level." He/she implies that if I do adjust the
> alpha, some of the comparisons may no longer be statistically significant.
>
> Could anybody explain how we could perform Bonferroni-adjusted t-tests IN
> A TWO-GROUP COMPARISON via SPSS? I have looked at the main menu but could
> not find any option of this kind, except in ANOVA as a post hoc option.
> Thank you for your help in advance.

Hi Shin

There are several things you can do:

1) Fight the reviewer's suggestion: Bonferroni adjustment can be too
stringent and do more harm than good to your statistical analysis.
See Perneger's paper "What's wrong with Bonferroni adjustments" in the
British Medical Journal (available for free online at the journal's web page).

2) Replace Bonferroni adjustment with something a bit less conservative,
like some FDR algorithm (see some references and documents at the end of
this message).

3) Acknowledge the reviewer's suggestion and perform Bonferroni
adjustment on your t tests. You don't need SPSS for that. The logic
behind the reviewer's request is simply to take the significance alpha
level (0.05 usually) and divide it by the number of tests you've run
(you'll get an adjusted alpha level, usually quite low). Then, you
declare significant only those tests with p-values lower than the
adjusted alpha level. Simple, and sometimes catastrophic for the
significance of the tests you performed (sometimes, not a single result
is significant after that, if the number of tests run was high).
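
The adjustment in option 3 can be sketched in a few lines of Python. This is only a minimal illustration, not SPSS syntax: the function name bonferroni and the five p-values are made up for the example, and you would replace them with the p-values SPSS reports for your own t tests.

```python
def bonferroni(p_values, alpha=0.05):
    """Return the Bonferroni-adjusted alpha and one significance flag per test."""
    m = len(p_values)
    if m == 0:
        raise ValueError("need at least one p-value")
    # Divide the nominal alpha by the number of tests that were run.
    adjusted_alpha = alpha / m
    # A test stays significant only if its p-value is below the adjusted alpha.
    flags = [p < adjusted_alpha for p in p_values]
    return adjusted_alpha, flags


# Hypothetical p-values from five two-group t tests (illustration only).
p_values = [0.001, 0.012, 0.025, 0.044, 0.200]
adjusted_alpha, significant = bonferroni(p_values)
print(adjusted_alpha)
print(significant)  # [True, False, False, False, False]
```

With five tests the adjusted alpha is 0.05 / 5, so four results that were significant at 0.05 are no longer declared significant.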


HTH,
Marta García-Granero

--------------------------------
Appendix:

Extract of a document I downloaded from a web page that doesn't exist
anymore (http://www.math.tau.ac.il/~roee/FDR_Downloads2.htm). A pity: it
included a program called FDRAlgo.exe that could have helped you
get a better adjustment method for your p-values than
Bonferroni's. Anyway, I still keep a copy of it, and I can send you the
program privately at your e-mail address.

THE MULTIPLICITY PROBLEM

Every day, one can find, in the newspaper or in other popular press, some
claim of association between a stimulus and an outcome, with
consequences for health or general welfare of the population at large.
Many of these associations are suspect at best, and often will not hold
up under scrutiny. Examples of such associations are: coffee and heart
attacks, vitamins and IQ, tomato sauce and cancer, and on and on. Many
of these claims have shaky foundations, and some have not been
replicated in further research. With so much conflicting information in
the popular press, the general public has learned to mistrust
statistical studies, and to shy away from the use of statistics in general.

There are several reasons why these incorrect conclusions
become part of the scientific and popular press; usually scientists
fault such things as improper study design and poor data. Another source
of these claims is large studies, where data analysts
report all the tests that are “statistically significant” (usually
defined as p < 0.05, where “p” denotes the p-value) as “real” effects. On
the surface, this practice seems innocuous, since this is the rule
learned in statistics classes. The problem arises when multiple tests are
performed: a “p < 0.05” outcome can then often occur when there is no real
effect at all. Historically, the “p < 0.05” rule was devised for a
single test with the following logic: if a p < 0.05 outcome was observed,
then the analyst has two options. Either he/she can believe that there is
no real effect and that the data are so anomalous that they fall within the
range of values that would be observed only 1 time in 20, or he/she may
choose to believe that the observed association is real. Because the 1
in 20 chance is relatively small, the common practice is to “reject” the
hypothesis of no real effect and “accept” the conclusion that the effect
is real.

The logic breaks down when more than one test or comparison is
considered in a single study. If one considers 20 or more tests, then
one expects at least one “1 in 20” significant outcome, even when none
of the effects are real. Thus, there is little protection offered by the
“1 in 20” rule, and incorrect claims can result.
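
The arithmetic behind the "20 tests" remark can be checked with a small Python sketch. It rests on two assumptions that the extract does not state explicitly: the tests are independent, and every null hypothesis is true. The function names are made up for the illustration.

```python
def familywise_error_rate(m, alpha=0.05):
    """Chance of at least one false positive among m independent tests
    when every null hypothesis is true."""
    return 1.0 - (1.0 - alpha) ** m


def expected_false_positives(m, alpha=0.05):
    """Expected number of false positives among m tests of true nulls."""
    return m * alpha


for m in (1, 5, 20, 100):
    fwe = familywise_error_rate(m)
    print(m, round(fwe, 3), expected_false_positives(m))
```

Under these assumptions, 20 tests at alpha = 0.05 give one expected false positive and a chance of roughly 0.64 of seeing at least one.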

Although incorrect decisions can be blamed on poor design, bad data,
etc., one should be aware that multiplicity can cause faulty
conclusions, and should be taken care of in large studies that include
many tests and comparisons.

One example of this kind of study is subgroup analysis in a
clinical trial.

As a part of the pharmaceutical development process, new therapies
are usually evaluated using randomized clinical trials. In such studies,
patients are randomly assigned to either active or placebo therapy.
After the conclusion of the study, the active and placebo groups are
compared to see which is better, using a single pre-defined outcome of
interest. At this stage, there is no multiplicity problem, since there
is only one test. However, there are many reasons to evaluate patient
subgroups. The therapy might work better for younger patients, or better
for patients with mild conditions as opposed to severe ones, etc. While it is
good to ask such questions, such data must be analyzed carefully, and
with the multiplicity problem in mind. If the data are thus subdivided into
many subgroups, it can easily happen that a patient subgroup shows
“statistical significance” by chance alone, leading the analyst to
incorrectly recommend the therapy for that subgroup, or worse yet, to
recommend it for all groups based on the evidence from the single subgroup.

Classical Multiple Comparison Procedures aim at controlling the
probability of committing even a single type-I error within the tested
family of hypotheses. The main problem with such classical procedures,
which hinders their application in applied research, is that they tend to
have substantially less power than uncorrected procedures. In many
instances, lack of multiplicity control is too permissive, while the full
protection resulting from controlling the family-wise error rate (FWE) is
too restrictive. This is the case when the overall conclusion from the
various individual inferences is not necessarily erroneous as soon as one
of them is, yet the selection effect is still of concern.


The FDR is a new approach to multiple hypotheses testing. The FDR is the
expected proportion of true null hypotheses rejected out of the total
number of null hypotheses rejected. Multiple comparison procedures
controlling the FDR are more powerful than the commonly used multiple
comparison procedures based on the Family Wise Error Rate. FDR
controlling procedures are especially suited to large multiple
comparison problems in which existing procedures lack power.

FDR methodology: http://www.math.tau.ac.il/~roee/meth.htm

The FDR is the expected proportion of true null hypotheses rejected out
of the total number of rejections. If R null hypotheses are rejected in
a multiple testing procedure, and V is the number of true null hypotheses
rejected, the FDR is defined as:

\[
\mathrm{FDR} = E(Q), \qquad
Q = \begin{cases} V/R & \text{if } R > 0 \\ 0 & \text{if } R = 0 \end{cases}
\]

The concept:


Classical Multiple Comparison Procedures aim at controlling the
probability of committing even a single type-I error within the tested
family of hypotheses. The main problem with such classical procedures,
which hinders their application in applied research, is that they tend to
have substantially less power than uncorrected procedures. In many
instances, lack of multiplicity control is too permissive, while the full
protection resulting from controlling the family-wise error rate (FWE) is
too restrictive. This is the case when the overall conclusion from the
various individual inferences is not necessarily erroneous as soon as one
of them is, yet the selection effect is still of concern. Benjamini and
Hochberg (1995) introduced the False Discovery Rate (FDR), the expected
ratio of erroneous rejections to the number of rejected hypotheses, as an
appropriate error rate to control. The FDR is equal to the family-wise
error rate when the number of true null hypotheses m0 equals the number
of all hypotheses under test m, so in such a situation controlling the
FDR controls the FWE as well. But the FDR criterion is adaptive, in the
sense that when some of the tested hypotheses are not true (i.e. m0 <
m), the FDR is smaller, and more so when more of the hypotheses are not
true. Hence FDR controlling procedures can be more powerful than FWE
controlling procedures at the same level.
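
For readers without the FDRAlgo.exe program, here is a minimal Python sketch of the step-up procedure of Benjamini and Hochberg (1995), which I take to be the "Linear Step Up procedure" in the list of procedures in this extract (that identification is my assumption). The function name and the p-values are made up for the illustration; it is not the code of the original program.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return one flag per test: True where the null hypothesis is rejected
    by the Benjamini-Hochberg step-up procedure at FDR level q."""
    m = len(p_values)
    # Indices of the tests, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) such that p_(k) <= (k / m) * q.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            k_max = rank
    # Reject the k_max hypotheses with the smallest p-values.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values, deliberately unsorted (illustration only).
p_values = [0.044, 0.001, 0.200, 0.012, 0.025]
print(benjamini_hochberg(p_values))  # [False, True, False, True, True]
```

With these five made-up p-values, three tests are rejected at q = 0.05, whereas dividing alpha by five would leave only the smallest p-value significant.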

List of FDR controlling procedures: when can they be used?

Procedure                              Independence  Positive     Pairwise  General
                                                     dependence             dependence
Linear Step Up procedure               +             +            +         -
Generalized Linear Step Up procedure   +             +            +         +
Step Down                              +             -            -         -
Generalized Step Down                  +             +            +         -
Two Stage Linear Step Up procedure     +             ?            ?         -
Adaptive procedure                     +             ?            +         -
Troendle’s Step-up and Step-down
procedures                             -             Many to one  -         -
Resampling procedure                   +             +            +         +


Historical perspective

The proposal for FDR control in Benjamini and Hochberg (1995), BH for
short, was motivated by the paper of Sorić (1987), which was a strong and
emotional call for the necessity of controlled inference because of the
increased error resulting from multiple inferences. Otherwise, he warned,
the expected number of 'false discoveries' becomes large relative to the
number of discoveries. Since BH was published we have learned of
independent previous efforts in the direction in which we went: looking
for suitable error control in the face of multiplicity, when the full
protective power of the FWE is not necessary.

Shaffer (1997) has noted that an informal effort in this direction had
already been attempted by Elkund in an unpublished work in Swedish. This
work has been reported by Seeger (1968), who also attributes the
procedure to Elkund. Seeger proved that when all tested null hypotheses
are true the procedure controls the FWE at level q, but when some
hypotheses are true while others are false (i.e. when m0 < m), this is
not the case. Apparently Seeger's second result, that the procedure does
not always control the FWE at the desired level, diminished
interest in the procedure at the time it was proposed, to the point that
it became completely unknown (e.g. no mention in Hochberg and Tamhane,
1987).

Independently, Simes (1986) proposed a global test of the single
intersection hypothesis. He gave a nice proof (by induction) of the
error controlling property of the test, which is essentially Seeger’s
first result. Simes also suggested this procedure as an
informal multiple testing procedure, but then Hommel (1988) showed - as
Seeger had done before - that it does not control the FWE in the strong
sense. Therefore, in the realm of FWE control, the procedure cannot be
used for making the multiple inferences about the individual hypotheses.
It can be, and has been, used to derive several other testing procedures,
e.g. by Hochberg (1988) and Hommel (1988), but these procedures are less
powerful. Sen (1998a) points out that this equality is actually the
classical Ballot Theorem related to uniform order statistics. Interest
in the procedure as a multiple testing procedure arose in view of the
FDR criterion it controls (BH): see, for example, its implementation in
the new SAS MULTPROC software. For a review of the global testing
procedure and its extensions see Hochberg and Hommel (1997).

Partial list of FDR references:

Benjamini, Y., Hochberg, Y. (1995). "Controlling the False Discovery
Rate: a Practical and Powerful Approach to Multiple Testing", Journal
of the Royal Statistical Society B, 57, 289-300.

Benjamini, Y., Hochberg, Y., Kling, Y. (1993). "False Discovery Rate
control in pairwise comparisons", Working Paper 93-2, Dept. of Statist.
and O.R., Tel Aviv Univ.

Benjamini, Y., Hochberg, Y., Kling, Y. (1997). "False Discovery Rate
control in multiple hypotheses testing using dependent test statistics",
Research Paper 97-1, Dept. of Statist. and O.R., Tel Aviv Univ.

Benjamini, Y., Kling, Y. (1999). "A look at Statistical Process Control
through the P-Value", Research Paper 99-8, Dept. of Statist. and O.R.,
Tel Aviv Univ.

Benjamini, Y., Liu, W. (1999). "A distribution-free multiple test
procedure that controls the false discovery rate", Research Paper 99-3,
Dept. of Statist. and O.R., Tel Aviv Univ.

Benjamini, Y., Yekutieli, D. (1997). "The control of the False Discovery
Rate under dependence", Research Paper 97-4, Dept. of Statist. and O.R.,
Tel Aviv Univ.

Hochberg, Y. (1988). "A sharper Bonferroni procedure for multiple tests
of significance", Biometrika 75, 800-803.

Hochberg, Y., Hommel, G. (1998). "Step-up multiple testing procedures",
Encyc. Statist. Sc., Supplementary Vol. 2.

Hochberg, Y., Rom, D. (1996). "Extensions of multiple testing procedures
based on Simes' test", J. Statist. Plann. Inference 48, 141-152.

Hommel, G. (1988). "A stagewise rejective multiple test procedure based
on a modified Bonferroni test", Biometrika 75, 383-386.

Needleman et al. (1979). "Deficits in psychologic and classroom
performance of children with elevated dentine lead levels", New England
Journal of Medicine 300, 689-695.

Sarkar, S. K. (1998). "Some probability inequalities for ordered random
variables: A proof of Simes' conjecture", The Annals of Statistics, 2,
494-504.

Seeger (1968). "A note on a method for the analysis of significances en
masse", Technometrics 10, 586-593.

Simes, R. J. (1986). "An improved Bonferroni procedure for multiple
tests of significance", Biometrika 73, 751-754.

Wassmer, G., Reitmer, P., Kieser, M., Lehmacher, W. (1998). "Procedures
for testing multiple endpoints in clinical trials: an overview", J.
Statist. Plann. Inference (in press).

Westfall, P. H., Young, S. S. (1993). Resampling based multiple testing,
Wiley, New York.

Yekutieli, D. (1999). "Elkund-Seeger-Simes is Conservative for testing
all pairwise comparison", Research Paper 99-7, Dept. of Statist. and
O.R., Tel Aviv Univ.

Yekutieli, D., Benjamini, Y. (1999). "Resampling based false discovery
controlling multiple test procedures for correlated test statistics", J.
Statist. Plann. Inference.

FDR in Applications

A discussion of applied statistical work using FDR, organized by
scientific area.

Medicine (general)

Reviews: Curran-Everett [41] mostly pairwise in Physiology; Ottenbacher
[15] talks about FDR control in Epidemiology, not being aware of FDR
controlling procedures.

Methodology: Brown & Russell [10] operating characteristics; Witte et al
[35] sample size calculations; Stine & Heyse [54] estimates of overlap.

Applications: comparison of powder inhalers [1,2]; Pharmacological
comparisons [19]; Public health policy [37, 22]; Shuttle/Mir space
missions (Americans vs Russians) [44,45]; Quality of life comparisons [52].

Psychiatry and Neurology

Methodology: Ellis et al [43] develop the analysis of neurochemical maps;

Applications: Study of organ transplant patients [3]; Effects of ethanol
exposure [31]; Dose effect study in child psychotherapy [34]; Multiple
Sclerosis [38]; post-polio syndrome [53]; HIV associated dementia [13];

Psychology, Psychometrics and Education

Reviews: Wilkinson [25] sets guidelines.

Methodology: Keselman, Cribbie & Holland [23], Williams, Jones & Tukey
[24] - develop pairwise comparisons; Wainer [8] adjusts graphical
displays. Ip [56] develops the study of local dependencies in item
response problems.

Applications: National Assessment of Educational Progress data
[5,8,39,24]; Drug abuse [12]; Meta-analysis of psychoeducational
programs for heart disease patients [26]; Mental health outcomes [42];
cross sectional study of bus drivers [55]

Genetics and Biology

Methodology: Basford & Tukey [6] develop graphical profiles aiding in
plant breeding experiments; Weller et al [9,21] use it in genetic
mapping (QTL analysis), eliciting a response by Zaykin, Young & Westfall
[36]; Bovenhuis & Spelman [33] adapt the methods for selective
genotyping; Drigalenko & Elston [11] find the concept useful in mapping;

Applications: QTL analysis for milk production [20,43] and milk yield
monitoring [29]; Breeding [6,50]; Defences in Daphnia [46]; Song
imitation in Zebra Finches [16];

Signal Denoising and Image Processing

Methodology: Benjamini & Abramovich [4] develop adaptive thresholding of
wavelet coefficients;

Applications: High-resolution defect detection [49]

Economics and Marketing

Methodology: Green & Babyak [7,20,40] develop multiple testing of
constraints and individual parameters in structural equations models;
Schaffer and Green [28] develop cluster based market segmentation

Meteorology

Applications: Long distances dependencies in pressure [32]

Statistical Theory

Reviews: Brown [48] reviews FDR development in an essay on statistical
decision theory; Pigeot [32] reviews multiple testing.

Methodology: The use of FDR as variable selection strategy was developed
by Abramovich et al, and other interpretations can be found in George
[47], George and Foster [51]. Shaffer found a connection to Duncan's
semi Bayesian procedure.

List of References for Applications

[1] Schlaeppi M, Edwards K, Fuller RW, et al.
Patient perception of the Diskus inhaler: A comparison with the
Turbuhaler inhaler.
BRIT J CLIN PRACT 50 (1): 14-19 JAN-FEB 1996

[2] Sharma RK, Edwards K, Hallett C, et al.
Perception among paediatric patients of the Diskus(R) inhaler, a novel
multidose powder inhaler
for use in the treatment of asthma - Comparison with the Turbuhaler(R)
inhaler.
CLIN DRUG INVEST 11 (3): 145-153 MAR 1996

[3] Thomason JM, Seymour RA, Ellis JS, et al.
Determinants of gingival overgrowth severity in organ transplant
patients - An examination of the role of HLA phenotype.
J CLIN PERIODONTOL 23 (7): 628-634 JUL 1996

[4] Abramovich F, Benjamini Y
Adaptive thresholding of wavelet coefficients.
COMPUT STAT DATA AN 22 (4): 351-361 AUG 10 1996

[5] Wainer H
Depicting error.
AM STAT 50 (2): 101-111 MAY 1996

[6] Basford KE, Tukey JW
Graphical profiles as an aid to understanding plant breeding experiments.
J STAT PLAN INFER 57 (1): 93-107 JAN 15 1997

[7] Green SB, Babyak MA
Control of type I errors with multiple tests of constraints in
structural equation modeling.
MULTIVAR BEHAV RES 32 (1): 39-51 1997

[8] Wainer H
Improving tabular displays, with NAEP tables as examples and inspirations.
J EDUC BEHAV STAT 22 (1): 1-30 SPR 1997

[9] Weller JI, Song JZ, Ronin YI, et al.
Designs and solutions to multiple trait comparisons.
ANIM BIOTECHNOL 8 (1): 107-122 1997

[10] Brown BW, Russell K
Methods correcting for multiple testing: Operating characteristics.
STAT MED 16 (22): 2511-2528 NOV 30 1997

[11] Drigalenko EI, Elston RC
False discoveries in genome scanning.
GENET EPIDEMIOL 14 (6): 779-784 1997

[12] Hubbard RL, Craddock SG, Flynn PM, et al.
Overview of 1-year follow-up outcomes in the Drug Abuse Treatment
Outcome Study (DATOS).
PSYCHOL ADDICT BEHAV 11 (4): 261-278 DEC 1997

[13] Robertson K, Fiscus S, Kapoor C, et al.
CSF, plasma viral load and HIV associated dementia.
J NEUROVIROL 4 (1): 90-94 FEB 1998

[14] Westfall PH, Young SS, Lin DKJ
Forward selection error control in the analysis of supersaturated designs.
STAT SINICA 8 (1): 101-117 JAN 1998

[15] Ottenbacher KJ
Quantitative evaluation of multiplicity in epidemiology and public
health research.
AM J EPIDEMIOL 147 (7): 615-619 APR 1 1998

[16] Tchernichovski O, Nottebohm F
Social inhibition of song imitation among sibling male zebra finches.
P NATL ACAD SCI USA 95 (15): 8951-8956 JUL 21 1998



[17] Mallet L, Mazoyer B, Martinot JL
Functional connectivity in depressive, obsessive-compulsive, and
schizophrenic disorders: an
explorative correlational analysis of regional cerebral metabolism.
PSYCHIAT RES-NEUROIM 82 (2): 83-93 MAY 20 1998

[18] Spelman RJ, Bovenhuis H
Moving from QTL experimental results to the utilization of QTL in
breeding programmes.
ANIM GENET 29 (2): 77-84 APR 1998

[19] Chen VJ, Bewley JR, Andis SL, et al.
Preclinical cellular pharmacology of LY231514 (MTA): a comparison with
methotrexate, LY309887 and
raltitrexed for their effects on intracellular folate and nucleoside
triphosphate pools in CCRF-CEM cells.
BRIT J CANCER 78: 27-34 Suppl. 3 1998

[20] Green SB, Thompson MS, Babyak MA
A Monte Carlo investigation of methods for controlling type I errors
with specification searches in
structural equation modeling.
MULTIVAR BEHAV RES 33 (3): 365-383 1998

[21] Weller JI, Song JZ, Heyen DW, et al.
A new approach to the problem of multiple comparisons in the genetic
dissection of complex traits.
GENETICS 150 (4): 1699-1706 DEC 1998

[22] Harris LE, Luft FC, Rudy DW, et al.
Effects of multidisciplinary case management in patients with chronic
renal insufficiency.
AM J MED 105 (6): 464-471 DEC 1998

[23] Keselman HJ, Cribbie R, Holland B
The pairwise multiple comparison multiplicity problem: An alternative
approach to familywise and
comparisonwise type I error control.
PSYCHOL METHODS 4 (1): 58-69 MAR 1999

[24] Williams VSL, Jones LV, Tukey JW
Controlling error in multiple comparisons, with examples from
state-to-state differences in
educational achievement.
J EDUC BEHAV STAT 24 (1): 42-69 SPR 1999

[25] Wilkinson L
Statistical methods in psychology journals - Guidelines and explanations.
AM PSYCHOL 54 (8): 594-604 AUG 1999

[26] Dusseldorp E, van Elderen T, Maes S, et al.
A meta-analysis of psychoeducational programs for coronary heart disease
patients.
HEALTH PSYCHOL 18 (5): 506-519 SEP 1999

[27] Yekutieli D, Benjamini Y
Resampling-based false discovery rate controlling multiple test
procedures for correlated test statistics.
J STAT PLAN INFER 82 (1-2): 171-196 DEC 1 1999

[28] Schaffer CM, Green PE
Cluster-based market segmentation: some further comparisons of
alternative approaches.
J MARKET RES SOC 40 (2): 155-163 APR 1998

[29] Lark RM, Nielsen BL, Mottram TT
A time series model of daily milk yields and its possible use for
detection of a disease (ketosis).
ANIM SCI 69: 573-582 Part 3 DEC 1999

[30] Heyen DW, Weller JI, Ron M, et al.
A genome scan for QTL influencing milk production and health traits in
dairy cattle.
PHYSIOL GENOMICS 1 (3): 165-175 NOV 11 1999

[31] Slawecki CJ, Somes C, Ehlers CL
Effects of prolonged ethanol exposure on neurophysiological measures
during an associative learning paradigm.
DRUG ALCOHOL DEPEN 58 (1-2): 125-132 FEB 1 2000

[32] Pigeot I
Basic concepts of multiple tests - A survey.
STAT PAP 41 (1): 3-36 JAN 2000

[33] Bovenhuis H, Spelman RJ
Selective genotyping to detect quantitative trait loci for multiple
traits in outbred populations.
J DAIRY SCI 83 (1): 173-180 JAN 2000

[34] Andrade AR, Lambert EW, Bickman L
Dose effect in child psychotherapy: Outcomes associated with negligible
treatment.
J AM ACAD CHILD PSY 39 (2): 161-168 FEB 2000

[35] Witte JS, Elston RC, Cardon LR
On the relative sample size required for multiple comparisons.
STAT MED 19 (3): 369-372 FEB 15 2000

[36] Zaykin DV, Young SS, Westfall PH
Using the false discovery rate approach in the genetic dissection of
complex traits: A response to
Weller et al.
GENETICS 154 (4): 1917-1918 APR 2000

[37] Tierney WM, Harris LE, Gaskins DL, et al.
Restricting Medicaid payments for transportation: Effects on inner-city
patients' health care.
AM J MED SCI 319 (5): 326-333 MAY 2000

[38] Suhy J, Rooney WD, Goodkin DE, et al.
H-1 MRSI comparison of white matter and lesions in primary progressive
and relapsing-remitting MS.
MULT SCLER 6 (3): 148-155 JUN 2000

[39] Benjamini Y, Hochberg Y
On the adaptive control of the false discovery rate in multiple testing
with independent statistics.
J EDUC BEHAV STAT 25 (1): 60-83 SPR 2000

[40] Cribbie RA
Evaluating the importance of individual parameters in structural
equation modeling: the need for type I error control.
PERS INDIV DIFFER 29 (3): 567-577 SEP 2000

[41] Curran-Everett D
Multiple comparisons: philosophies and illustrations.
AM J PHYSIOL-REG I 279 (1): R1-R8 JUL 2000

[42] Bickman L, Lambert EW, Andrade AR, et al.
The Fort Bragg continuum of care for children and adolescents: Mental
health outcomes over 5 years.
J CONSULT CLIN PSYCH 68 (4): 710-716 AUG 2000

[43] Ellis SP, Underwood MD, Arango V, et al.
Mixed models and multiple comparisons in analysis of human neurochemical
maps.
PSYCHIAT RES-NEUROIM 99 (2): 111-119 AUG 28 2000

[44] Kanas N, Salnitskiy V, Grund EM, et al.
Social and cultural issues during Shuttle/Mir space missions.
ACTA ASTRONAUT 47 (2-9): 647-655 Sp. Iss. SI JUL-NOV 2000

[45] Kanas N, Salnitskiy V, Grund EM, et al.
Interpersonal and cultural issues involving crews and ground personnel
during Shuttle/Mir space
Missions.
AVIAT SPACE ENVIR MD 71 (9): A11-A16 Suppl. S SEP 2000

[46] Barry MJ
Inducible defences in Daphnia: responses to two closely related predator
species.
OECOLOGIA 124 (3): 396-401 AUG 2000

[47] George EI
The variable selection problem.
J AM STAT ASSOC 95 (452): 1304-1308 DEC 2000

[48] Brown LD
An essay on statistical decision theory.
J AM STAT ASSOC 95 (452): 1277-1281 DEC 2000

[49] Recknagel RJ, Kowarschik R, Notni G
High-resolution defect detection and noise reduction using wavelet
methods for surface measurement.
J OPT A-PURE APPL OP 2 (6): 538-545 NOV 2000

[50] Luijten SH, Dierick A, Oostermeijer JGB, et al.
Population size, genetic variation, and reproductive success in a
rapidly declining, self-incompatible
perennial (Arnica montana) in The Netherlands.
CONSERV BIOL 14 (6): 1776-1787 DEC 2000

[51] George EI, Foster DP
Calibration and empirical Bayes variable selection.
BIOMETRIKA 87 (4): 731-747 DEC 2000

[52] Khatri P, Babyak M, Croughwell ND, et al.
Temperature during coronary artery bypass surgery affects quality of life.
ANN THORAC SURG 71 (1): 110-116 JAN 2001

[53] Trojan DA, Collet JP, Pollak MN, et al.
Serum insulin-like growth factor-I (IGF-I) does not correlate positively
with isometric strength, fatigue,
and quality of life in post-polio syndrome.
J NEUROL SCI 182 (2): 107-115 JAN 1 2001

[54] Stine RA, Heyse JF
Non-parametric estimates of overlap.
STAT MED 20 (2): 215-236 JAN 30 2001

[55] Vedantham K, Brunet A, Boyer R, et al.
Posttraumatic stress disorder, trauma exposure, and the current health
of Canadian bus drivers.
CAN J PSYCHIAT 46 (2): 149-155 MAR 2001

[56] Ip EH
Testing for local dependency in dichotomous and polytomous item response
models.
PSYCHOMETRIKA 66 (1): 109-132 MAR 2001

Re: Bonferroni-adjusted t-tests

Arthur Kramer
Because SPSS provides the exact p level, why not report the p levels as
"...significant at 'p=.0-whatever is obtained' " and follow that up with the
confidence intervals and an assessment of effect size.  With the sample
sizes you are dealing with, that may be more informative.
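
As a rough illustration of that suggestion, here is a Python sketch that turns summary statistics into a mean difference, its confidence interval, Cohen's d and a p-value. The means and standard deviations are made up (only n1=870 and n2=780 come from the thread), the function name is hypothetical, and the sketch assumes equal variances and uses a normal approximation to the t distribution, which is reasonable only because the samples are large.

```python
from math import sqrt
from statistics import NormalDist


def two_group_summary(mean1, sd1, n1, mean2, sd2, n2, conf=0.95):
    """Mean difference, its confidence interval, Cohen's d and a two-sided
    p-value, using pooled-variance formulas and a normal approximation."""
    diff = mean1 - mean2
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    pooled_sd = sqrt(pooled_var)
    # Standard error of the difference between the two means.
    se = pooled_sd * sqrt(1 / n1 + 1 / n2)
    z = NormalDist()
    crit = z.inv_cdf(0.5 + conf / 2)
    p_value = 2 * (1 - z.cdf(abs(diff) / se))
    return {
        "diff": diff,
        "ci": (diff - crit * se, diff + crit * se),
        "cohens_d": diff / pooled_sd,
        "p_value": p_value,
    }


# Hypothetical summary statistics: the two group means differ by one tenth
# of a standard deviation.
result = two_group_summary(3.6, 1.0, 870, 3.5, 1.0, 780)
for name, value in result.items():
    print(name, value)
```

With these made-up numbers the p-value falls just under 0.05 while Cohen's d is only about 0.1, which is the situation the reviewer seems to be worried about: significant, but a very small effect.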

Arthur Kramer, Ph.D.
Director of Institutional Research
New Jersey City University
-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
Marta García-Granero
Sent: Tuesday, January 29, 2008 10:04 AM
To: [hidden email]
Subject: Re: Bonferroni-adjusted t-tests

Shin wrote:

> Dear all:
>
> I have a question regarding t-tests. One of the reviewers suggested that I
> perform "Bonferroni-adjusted t-tests". I understand that Bonferroni is one
> of the post hoc methods for MULTIPLE group comparisons in ANOVA. However,
> our study deals with only TWO groups.
>
> His/her main argument is that our two-group comparisons yielded very
> small mean differences and that, probably because of the relatively large
> sample sizes (n1=870, n2=780), the t-tests detected statistical significance
> at 0.05 that may not have any practical meaning. Therefore, he/she suggested
> that we "adjust the alpha level." He/she implies that if I do adjust the
> alpha, some of the comparisons may no longer be statistically significant.
>
> Could anybody explain how we could perform Bonferroni-adjusted t-tests IN
> A TWO-GROUP COMPARISON via SPSS? I have looked at the main menu but could
> not find any option of this kind, except in ANOVA as a post hoc option.
> Thank you for your help in advance.

Hi Shin

There are several things you can do:

1) Fight the reviewer's suggestion: Bonferroni adjustment can be too
stringent and do more harm than good to your statistical analysis.
See Perneger's paper "What's wrong with Bonferroni adjustments" in the
British Medical Journal (available for free online at the journal's web page).

2) Replace Bonferroni adjustment with something a bit less conservative,
like some FDR algorithm (see some references and documents at the end of
this message).

3) Acknowledge the reviewer's suggestion and perform Bonferroni
adjustment on your t tests. You don't need SPSS for that. The logic
behind the reviewer's request is simply to take the significance alpha
level (0.05 usually) and divide it by the number of tests you've run
(you'll get an adjusted alpha level, usually quite low). Then, you
declare significant only those tests with p-values lower than the
adjusted alpha level. Simple, and sometimes catastrophic for the
significance of the tests you performed (sometimes, not a single result
is significant after that, if the number of tests run was high).


HTH,
Marta García-Granero

--------------------------------
Appendix:

Extract of a document I downloaded from a web page that doesn't exist
anymore (http://www.math.tau.ac.il/~roee/FDR_Downloads2.htm). A pity: it
included a program called FDRAlgo.exe that could have helped you
get a better adjustment method for your p-values than
Bonferroni's. Anyway, I still keep a copy of it, and I can send you the
program privately at your e-mail address.

THE MULTIPLICITY PROBLEM

Every day, one can find, in the newspaper or in other popular press, some
claim of association between a stimulus and an outcome, with
consequences for health or general welfare of the population at large.
Many of these associations are suspect at best, and often will not hold
up under scrutiny. Examples of such associations are: coffee and heart
attacks, vitamins and IQ, tomato sauce and cancer, and on and on. Many
of these claims have shaky foundations, and some have not been
replicated in further research. With so much conflicting information in
the popular press, the general public has learned to mistrust
statistical studies, and to shy away from the use of statistics in general.

There are several reasons why these incorrect conclusions become part
of the scientific and popular press; scientists usually fault such
things as improper study design and poor data. Another source of these
claims is large studies in which data analysts report every test that
is “statistically significant” (usually defined as p < 0.05, where “p”
denotes the p-value) as a “real” effect. On the surface, this practice
seems innocuous, since this is the rule learned in statistics classes.
The problem arises when multiple tests are performed: a “p < 0.05”
outcome can often occur when there is no real effect at all.
Historically, the “p < 0.05” rule was devised for a single test with
the following logic: if a p < 0.05 outcome was observed, then the
analyst has two options. Either he/she can believe that there is no
real effect and that the data are so anomalous that they fall within
the range of values that would be observed only 1 time in 20, or he/she
may choose to believe that the observed association is real. Because
the 1 in 20 chance is relatively small, the common practice is to
“reject” the hypothesis of no real effect and “accept” the conclusion
that the effect is real.

The logic breaks down when more than one test or comparison is
considered in a single study. If one considers 20 or more tests, then
one expects at least one “1 in 20” significant outcome, even when none
of the effects are real. Thus, there is little protection offered by the
“1 in 20” rule, and incorrect claims can result.
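
The "1 in 20" argument can be made concrete with a small calculation. The sketch below assumes that all null hypotheses are true and that the tests are independent (an assumption added here for illustration; the text above does not state it). The function names are hypothetical.

```python
def familywise_error_rate(m, alpha=0.05):
    """Probability of at least one false positive among m independent tests
    when every null hypothesis is true."""
    return 1.0 - (1.0 - alpha) ** m


def expected_false_positives(m, alpha=0.05):
    """Expected number of false positives among m tests of true nulls."""
    return m * alpha


for m in (1, 5, 20, 100):
    print(m,
          round(familywise_error_rate(m), 3),
          round(expected_false_positives(m), 2))
```

Under these assumptions, 20 tests give one expected false positive and a chance of roughly 64% of seeing at least one.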

Although incorrect decisions can be blamed on poor design, bad data,
etc., one should be aware that multiplicity can cause faulty
conclusions, and should be taken care of in large studies that include
many tests and comparisons.

One example for these kinds of studies is: subgroup analysis in a
clinical trial.

As a part of the pharmaceutical development process new therapies
usually are evaluated using randomized clinical trials. In such studies,
patients are randomly assigned to either active or placebo therapy.
After the conclusion of the study, the active or placebo groups are
compared to see which is better, using a single pre-defined outcome of
interest. At this stage, there is no multiplicity problem, since there
is only one test. However, there are many reasons to evaluate patient
subgroups. The therapy might work better for younger patients, better
for patients with mild conditions as opposed to severe ones, etc. While
it is good to ask such questions, such data must be analyzed carefully,
and with the multiplicity problem in mind. If the data are thus
subdivided into many subgroups, it can easily happen that a patient
subgroup shows “statistical significance” by chance alone, leading the
analyst to incorrectly recommend the therapy for that subgroup, or worse
yet, to recommend it for all groups based on the evidence from the
single subgroup.

Classical Multiple Comparison Procedures aim at controlling the
probability of committing even a single type-I error within the tested
family of hypotheses. The main problem with such classical procedures,
which hinders their application in applied research, is that they tend to
have substantially less power than uncorrected procedures. In many
instances, lack of multiplicity control is too permissive; the full
protection resulting from controlling the FWE is too restrictive. This
is the case when the overall conclusion from the various individual
inferences is not necessarily erroneous as soon as one of them is, yet
selection effect is still of concern.


The FDR is a new approach to multiple hypotheses testing. The FDR is the
expected proportion of true null hypotheses rejected out of the total
number of null hypotheses rejected. Multiple comparison procedures
controlling the FDR are more powerful than the commonly used multiple
comparison procedures based on the Family Wise Error Rate. FDR
controlling procedures are especially suited to large multiple
comparison problems in which existing procedures lack power.

FDR methodology: http://www.math.tau.ac.il/~roee/meth.htm

The FDR is the expected proportion of true null hypotheses rejected out
of the total number of rejections. If R null hypotheses are rejected in
a multiple testing procedure, and V is the number of true null
hypotheses rejected, the FDR is defined as:

$$
\mathrm{FDR} = E(Q), \qquad
Q = \begin{cases} V/R & \text{if } R > 0 \\ 0 & \text{if } R = 0 \end{cases}
$$

The concept:


Classical Multiple Comparison Procedures aim at controlling the
probability of committing even a single type-I error within the tested
family of hypotheses. The main problem with such classical procedures,
which hinders their application in applied research, is that they tend to
have substantially less power than uncorrected procedures. In many
instances, lack of multiplicity control is too permissive; the full
protection resulting from controlling the FWE is too restrictive. This
is the case when the overall conclusion from the various individual
inferences is not necessarily erroneous as soon as one of them is, yet
selection effect is still of concern. Benjamini and Hochberg (1995)
introduced the False Discovery Rate (FDR) - the expected ratio of
erroneous rejections to the number of rejected hypotheses, as an
appropriate error rate to control. The FDR is equal to the family wise
error rate when the number of true null hypotheses m0 equals the number
of all hypotheses under test m, so in such a situation controlling the
FDR controls the FWE as well. But the FDR criterion is adaptive, in the
sense that when some of the tested hypotheses are not true (i.e. m0 <
m), the FDR is smaller, and more so when more of the hypotheses are not
true. Hence FDR controlling procedures can be more powerful than FWE
controlling procedures at the same level.
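
As an illustration, here is a minimal Python sketch of the linear step-up procedure of Benjamini and Hochberg (1995): sort the m p-values, find the largest rank k with p(k) <= k*q/m, and reject the k hypotheses with the smallest p-values. The function name and the p-values are hypothetical, and this is not the FDRAlgo.exe program mentioned above, only a sketch of the same idea.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Linear step-up procedure: return a list of booleans (rejected or not),
    in the same order as the input p-values."""
    m = len(p_values)
    # Indices of the p-values, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Largest rank k (1-based) whose ordered p-value satisfies p(k) <= k*q/m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k_max = rank

    # Reject every hypothesis ranked at or below k_max.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= k_max:
            rejected[idx] = True
    return rejected


# Hypothetical p-values from five tests (illustrative only).
p_values = [0.001, 0.012, 0.025, 0.048, 0.200]
print(benjamini_hochberg(p_values, q=0.05))
```

With these made-up p-values the procedure rejects the three smallest, whereas comparing each p-value with 0.05 / 5 = 0.01 (Bonferroni) would reject only the first.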

List of FDR controlling procedures, when can they be used?

Procedure                              Independence  Positive     Pairwise  General
                                                     dependence             dependence
Linear Step Up procedure               +             +            +         -
Generalized Linear Step Up procedure   +             +            +         +
Step Down                              +             -            -         -
Generalized Step Down                  +             +            +         -
Two Stage Linear Step Up procedure     +             ?            ?         -
Adaptive procedure                     +             ?            +         -
Troendle’s Step-up and Step-down
procedures                             -             Many to one  -         -
Resampling procedure                   +             +            +         +


Historical perspective

The proposal for FDR control in BH was motivated by the paper of Sorić
(1987), which was a strong and emotional call for the necessity of
controlled inference because of the increased error resulting from
multiple inferences. Otherwise, he warned, the expected number of `false
discoveries` becomes large relative to the number of discoveries. Since
BH has been published we have learned of independent previous efforts in
the direction in which we went: looking for suitable error control in
face of multiplicity, when the full protective power of the FWE is not
necessary.

Shaffer (1997) has noted that an informal effort in this direction had
already been attempted by Elkund in an unpublished work in Swedish. This
work has been reported by Seeger (1968), who also attributes the
procedure to Elkund. Seeger proved that when all tested null hypotheses
are true the procedure controls the FWE at level q, but when some
hypotheses are true while others are false (i.e. when m0 < m), this is
not the case. Apparently Seeger's second result, that the procedure does
not always control the FWE at the desired level, diminished the
interest in the procedure at the time it was proposed, to the point that
it became completely unknown (e.g. it is not mentioned in Hochberg and
Tamhane, 1987).

Independently, Simes (1986) proposed a global test of the single
intersection hypothesis. He gave a nice proof (by induction) of the
error controlling property of the test, which is essentially Seeger’s
first result. Simes also suggested this procedure as an
informal multiple testing procedure, but then Hommel (1988) showed, as
Seeger had done before, that it does not control the FWE in the strong
sense. Therefore, in the realm of FWE control, the procedure cannot be
used for making multiple inferences about the individual hypotheses.
It can be, and was, used to derive several other testing procedures,
e.g. by Hochberg (1988) and Hommel (1988), but these procedures are less
powerful. Sen (1998a) points out that this equality is actually the
classical Ballot Theorem related to uniform order statistics. Interest
in the procedure as a multiple testing procedure came, in view of the
FDR criterion it controls (BH): see, for example, its implementation in
the new SAS MULTPROC software. For a review of the global testing
procedure and its extensions see Hochberg and Hommel (1997).

Partial list of FDR references:

Benjamini, Y., Hochberg, Y. (1995). " Controlling the False Discovery
Rate: a Practical and Powerful Approach to Multiple Testing ", Journal
of the Royal Statistical Society B, 57 289-300.

Benjamini, Y., Hochberg, Y., Kling, Y. (1993)." False Discovery Rate
control in pairwise comparisons ", Working Paper 93-2, Dept. of Statist.
and O.R., Tel Aviv Univ.

Benjamini, Y., Hochberg, Y., Kling, Y. (1997)." False Discovery Rate
control in multiple hypotheses testing using dependent test statistics
", Research Paper 97-1, Dept. of Statist. and O.R., Tel Aviv Univ.

Benjamini, Y., Kling, Y. (1999). "A look at Statistical Process Control
through the P-Value". Research Paper 99-8, Dept. of Statist. and O.R.,
Tel Aviv Univ.

Benjamini, Y., Liu, W., (1999) "A distribution-free multiple test
procedure that controls the false discovery rate", Research Paper 99-3,
Dept. of Statist. and O.R., Tel Aviv Univ.

Benjamini, Y., Yekutieli, D. (1997) "The control of the False Discovery
Rate under dependence". Research Paper 97-4, Dept. of Statist. and O.R.,
Tel Aviv Univ.

Hochberg, Y. (1988). " A sharper Bonferroni procedure for multiple tests
of significance ", Biometrika 75, 800-803.

Hochberg, Y., Hommel, G. (1998) " Step-up multiple testing procedures "
Encyc. Statist. Sc. Supplementary Vol, 2.

Hochberg, Y., Rom, D. (1996). "Extensions of multiple testing procedures
based on Simes' test " J. Statist. Plann. Inference 48, 141-152.

Hommel, G. (1988). "A stagewise rejective multiple test procedure based
on a modified Bonferroni test". Biometrika 75, 383-386.

Needleman et al. (1979). "Deficits in psychologic and classroom
performance of children with elevated dentine lead levels", New England
Journal of Medicine, 300, 689-695.

Sarkar, S. K., (1998) " Some probability inequalities for ordered random
variables: A proof of Simes' conjecture " The Annals of Statistics, 2,
494-504.

Seeger. (1968). "A note on a method for the analysis of significances en
masse" Technometrics, 10 586-593.

Simes, R. J., (1986) "An improved Bonferroni procedure for multiple
tests of significance", Biometrika 73, 751-754.

Wassmer, G., Reitmer, P., Kieser, M., Lehmacher, W. (1998) "Procedures
for testing multiple endpoints in clinical trials: an overview" J.
Statist. Plann. Inference (In press).

Westfall, P. H., Young, S. S.(1993), Resampling based multiple testing,
Wiley, New York.

Yekutieli, D., (1999) "Elkund-Seeger-Simes is Conservative for testing
all pairwise comparison" . Research Paper 99-7, Dept. of Statist. and
O.R., Tel Aviv Univ.

Yekutieli, D., Benjamini, Y., (1999) "Resampling based false discovery
controlling multiple test procedures for correlated test statistics" J.
Statist. Plann. Inference

FDR in Applications

A discussion of applied statistical work using FDR, organized by
scientific area.

Medicine (general)

Reviews: Curran-Everett [41] mostly pairwise in Physiology; Ottenbacher
[15] talks about FDR control in Epidemiology, not being aware of FDR
controlling procedures.

Methodology: Brown & Russell [10] operating characteristics; Witte et al
[35] sample size calculations; Stine & Heyse [54] estimates of overlap.

Applications: comparison of powder inhalers [1,2]; Pharmacological
comparisons [19]; Public health policy [37, 22]; Shuttle/Mir space
missions (Americans vs Russians) [44,45]; Quality of life comparisons [52].

Psychiatry and Neurology

Methodology: Ellis et al [43] develop the analysis of neurochemical maps.

Applications: Study of organ transplant patients [3]; Effects of ethanol
exposure [31]; Dose effect study in child psychotherapy [34]; Multiple
Sclerosis [19]; post-polio syndrome [53]; HIV associated dementia [13].

Psychology, Psychometrics and Education

Reviews: Wilkinson [25] sets guidelines.

Methodology: Keselman, Cribbie & Holland [23], Williams, Jones & Tukey
[24] - develop pairwise comparisons; Wainer [8] adjusts graphical
displays. Ip [56] develops the study of local dependencies in item
response problems.

Applications: National Assessment of Educational Progress data
[5,8,39,24]; Drug abuse [12]; Meta-analysis of psychoeducational
programs for heart disease patients [26]; Mental health outcomes [42];
cross sectional study of bus drivers [55]

Genetics and Biology

Methodology: Basford & Tukey [6] develop graphical profiles aiding in
plant breeding experiments; Weller et al [9,43] use it in genetic
mapping (QTL analysis), eliciting a response by Zaykin, Young & Westfall
[22]; Bovenhuis & Spelman [33] adapt the methods for selective
genotyping; Drigalenko & Elston [11] find the concept useful in mapping.

Applications: QTL analysis for milk production [20,43] and milk yield
monitoring [29]; Breeding [6,50]; Defences in Daphnia [46]; Song
imitation in Zebra Finches [16];

Signal Denoising and Image Processing

Methodology: Benjamini & Abramovich [4] develop adaptive thresholding of
wavelet coefficients;

Applications: High resolution defects detection [49]

Economics and Marketing

Methodology: Green & Babyak [7,20,40] develop multiple testing of
constraints and individual parameters in structural equations models;
Schaffer and Green [28] develop cluster based market segmentation

Meteorology

Applications: Long-distance dependencies in pressure [32]

Statistical Theory

Reviews: Brown [48] reviews FDR development in an essay on statistical
decision theory; Pigeot [32] reviews multiple testing.

Methodology: The use of FDR as variable selection strategy was developed
by Abramovich et al, and other interpretations can be found in George
[47], George and Foster [51]. Shaffer found a connection to Duncan's
semi Bayesian procedure.

List of References for Applications

[1] Schlaeppi M, Edwards K, Fuller RW, et al.
Patient perception of the Diskus inhaler: A comparison with the
Turbuhaler inhaler.
BRIT J CLIN PRACT 50 (1): 14-19 JAN-FEB 1996

[2] Sharma RK, Edwards K, Hallett C, et al.
Perception among paediatric patients of the Diskus(R) inhaler, a novel
multidose powder inhaler
for use in the treatment of asthma - Comparison with the Turbuhaler(R)
inhaler.
CLIN DRUG INVEST 11 (3): 145-153 MAR 1996

[3] Thomason JM, Seymour RA, Ellis JS, et al.
Determinants of gingival overgrowth severity in organ transplant
patients - An examination of the role of HLA phenotype.
J CLIN PERIODONTOL 23 (7): 628-634 JUL 1996

[4] Abramovich F, Benjamini Y
Adaptive thresholding of wavelet coefficients.
COMPUT STAT DATA AN 22 (4): 351-361 AUG 10 1996

[5] Wainer H
Depicting error.
AM STAT 50 (2): 101-111 MAY 1996

[6] Basford KE, Tukey JW
Graphical profiles as an aid to understanding plant breeding experiments.
J STAT PLAN INFER 57 (1): 93-107 JAN 15 1997

[7] Green SB, Babyak MA
Control of type I errors with multiple tests of constraints in
structural equation modeling.
MULTIVAR BEHAV RES 32 (1): 39-51 1997

[8] Wainer H
Improving tabular displays, with NAEP tables as examples and inspirations.
J EDUC BEHAV STAT 22 (1): 1-30 SPR 1997

[9] Weller JI, Song JZ, Ronin YI, et al.
Designs and solutions to multiple trait comparisons.
ANIM BIOTECHNOL 8 (1): 107-122 1997

[10] Brown BW, Russell K
Methods correcting for multiple testing: Operating characteristics.
STAT MED 16 (22): 2511-2528 NOV 30 1997

[11] Drigalenko EI, Elston RC
False discoveries in genome scanning.
GENET EPIDEMIOL 14 (6): 779-784 1997

[12] Hubbard RL, Craddock SG, Flynn PM, et al.
Overview of 1-year follow-up outcomes in the Drug Abuse Treatment
Outcome Study (DATOS).
PSYCHOL ADDICT BEHAV 11 (4): 261-278 DEC 1997

[13] Robertson K, Fiscus S, Kapoor C, et al.
CSF, plasma viral load and HIV associated dementia.
J NEUROVIROL 4 (1): 90-94 FEB 1998

[14] Westfall PH, Young SS, Lin DKJ
Forward selection error control in the analysis of supersaturated designs.
STAT SINICA 8 (1): 101-117 JAN 1998

[15] Ottenbacher KJ
Quantitative evaluation of multiplicity in epidemiology and public
health research.
AM J EPIDEMIOL 147 (7): 615-619 APR 1 1998

[16] Tchernichovski O, Nottebohm F
Social inhibition of song imitation among sibling male zebra finches.
P NATL ACAD SCI USA 95 (15): 8951-8956 JUL 21 1998



[17] Mallet L, Mazoyer B, Martinot JL
Functional connectivity in depressive, obsessive-compulsive, and
schizophrenic disorders: an
explorative correlational analysis of regional cerebral metabolism.
PSYCHIAT RES-NEUROIM 82 (2): 83-93 MAY 20 1998

[18] Spelman RJ, Bovenhuis H
Moving from QTL experimental results to the utilization of QTL in
breeding programmes.
ANIM GENET 29 (2): 77-84 APR 1998

[19] Chen VJ, Bewley JR, Andis SL, et al.
Preclinical cellular pharmacology of LY231514 (MTA): a comparison with
methotrexate, LY309887 and
raltitrexed for their effects on intracellular folate and nucleoside
triphosphate pools in CCRF-CEM cells.
BRIT J CANCER 78: 27-34 Suppl. 3 1998

[20] Green SB, Thompson MS, Babyak MA
A Monte Carlo investigation of methods for controlling type I errors
with specification searches in
structural equation modeling.
MULTIVAR BEHAV RES 33 (3): 365-383 1998

[21] Weller JI, Song JZ, Heyen DW, et al.
A new approach to the problem of multiple comparisons in the genetic
dissection of complex traits.
GENETICS 150 (4): 1699-1706 DEC 1998

[22] Harris LE, Luft FC, Rudy DW, et al.
Effects of multidisciplinary case management in patients with chronic
renal insufficiency.
AM J MED 105 (6): 464-471 DEC 1998

[23] Keselman HJ, Cribbie R, Holland B
The pairwise multiple comparison multiplicity problem: An alternative
approach to familywise and
comparisonwise type I error control.
PSYCHOL METHODS 4 (1): 58-69 MAR 1999

[24] Williams VSL, Jones LV, Tukey JW
Controlling error in multiple comparisons, with examples from
state-to-state differences in
educational achievement.
J EDUC BEHAV STAT 24 (1): 42-69 SPR 1999

[25] Wilkinson L
Statistical methods in psychology journals - Guidelines and explanations.
AM PSYCHOL 54 (8): 594-604 AUG 1999

[26] Dusseldorp E, van Elderen T, Maes S, et al.
A meta-analysis of psychoeducational programs for coronary heart disease
patients.
HEALTH PSYCHOL 18 (5): 506-519 SEP 1999

[27] Yekutieli D, Benjamini Y
Resampling-based false discovery rate controlling multiple test
procedures for correlated test statistics.
J STAT PLAN INFER 82 (1-2): 171-196 DEC 1 1999

[28] Schaffer CM, Green PE
Cluster-based market segmentation: some further comparisons of
alternative approaches.
J MARKET RES SOC 40 (2): 155-163 APR 1998

[29] Lark RM, Nielsen BL, Mottram TT
A time series model of daily milk yields and its possible use for
detection of a disease (ketosis).
ANIM SCI 69: 573-582 Part 3 DEC 1999

[30] Heyen DW, Weller JI, Ron M, et al.
A genome scan for QTL influencing milk production and health traits in
dairy cattle.
PHYSIOL GENOMICS 1 (3): 165-175 NOV 11 1999

[31] Slawecki CJ, Somes C, Ehlers CL
Effects of prolonged ethanol exposure on neurophysiological measures
during an associative learning paradigm.
DRUG ALCOHOL DEPEN 58 (1-2): 125-132 FEB 1 2000

[32] Pigeot I
Basic concepts of multiple tests - A survey.
STAT PAP 41 (1): 3-36 JAN 2000

[33] Bovenhuis H, Spelman RJ
Selective genotyping to detect quantitative trait loci for multiple
traits in outbred populations.
J DAIRY SCI 83 (1): 173-180 JAN 2000

[34] Andrade AR, Lambert EW, Bickman L
Dose effect in child psychotherapy: Outcomes associated with negligible
treatment.
J AM ACAD CHILD PSY 39 (2): 161-168 FEB 2000

[35] Witte JS, Elston RC, Cardon LR
On the relative sample size required for multiple comparisons.
STAT MED 19 (3): 369-372 FEB 15 2000

[36] Zaykin DV, Young SS, Westfall PH
Using the false discovery rate approach in the genetic dissection of
complex traits: A response to
Weller et al.
GENETICS 154 (4): 1917-1918 APR 2000

[37] Tierney WM, Harris LE, Gaskins DL, et al.
Restricting Medicaid payments for transportation: Effects on inner-city
patients' health care.
AM J MED SCI 319 (5): 326-333 MAY 2000

[38] Suhy J, Rooney WD, Goodkin DE, et al.
H-1 MRSI comparison of white matter and lesions in primary progressive
and relapsing-remitting MS.
MULT SCLER 6 (3): 148-155 JUN 2000

[39] Benjamini Y, Hochberg Y
On the adaptive control of the false discovery rate in multiple testing
with independent statistics.
J EDUC BEHAV STAT 25 (1): 60-83 SPR 2000

[40] Cribbie RA
Evaluating the importance of individual parameters in structural
equation modeling: the need for type I error control.
PERS INDIV DIFFER 29 (3): 567-577 SEP 2000

[41] Curran-Everett D
Multiple comparisons: philosophies and illustrations.
AM J PHYSIOL-REG I 279 (1): R1-R8 JUL 2000

[42] Bickman L, Lambert EW, Andrade AR, et al.
The Fort Bragg continuum of care for children and adolescents: Mental
health outcomes over 5 years.
J CONSULT CLIN PSYCH 68 (4): 710-716 AUG 2000

[43] Ellis SP, Underwood MD, Arango V, et al.
Mixed models and multiple comparisons in analysis of human neurochemical
maps.
PSYCHIAT RES-NEUROIM 99 (2): 111-119 AUG 28 2000

[44] Kanas N, Salnitskiy V, Grund EM, et al.
Social and cultural issues during Shuttle/Mir space missions.
ACTA ASTRONAUT 47 (2-9): 647-655 Sp. Iss. SI JUL-NOV 2000

[45] Kanas N, Salnitskiy V, Grund EM, et al.
Interpersonal and cultural issues involving crews and ground personnel
during Shuttle/Mir space
Missions.
AVIAT SPACE ENVIR MD 71 (9): A11-A16 Suppl. S SEP 2000

[46] Barry MJ
Inducible defences in Daphnia: responses to two closely related predator
species.
OECOLOGIA 124 (3): 396-401 AUG 2000

[47] George EI
The variable selection problem.
J AM STAT ASSOC 95 (452): 1304-1308 DEC 2000

[48] Brown LD
An essay on statistical decision theory.
J AM STAT ASSOC 95 (452): 1277-1281 DEC 2000

[49] Recknagel RJ, Kowarschik R, Notni G
High-resolution defect detection and noise reduction using wavelet
methods for surface measurement.
J OPT A-PURE APPL OP 2 (6): 538-545 NOV 2000

[50] Luijten SH, Dierick A, Oostermeijer JGB, et al.
Population size, genetic variation, and reproductive success in a
rapidly declining, self-incompatible
perennial (Arnica montana) in The Netherlands.
CONSERV BIOL 14 (6): 1776-1787 DEC 2000

[51] George EI, Foster DP
Calibration and empirical Bayes variable selection.
BIOMETRIKA 87 (4): 731-747 DEC 2000

[52] Khatri P, Babyak M, Croughwell ND, et al.
Temperature during coronary artery bypass surgery affects quality of life.
ANN THORAC SURG 71 (1): 110-116 JAN 2001

[53] Trojan DA, Collet JP, Pollak MN, et al.
Serum insulin-like growth factor-I (IGF-I) does not correlate positively
with isometric strength, fatigue,
and quality of life in post-polio syndrome.
J NEUROL SCI 182 (2): 107-115 JAN 1 2001

[54] Stine RA, Heyse JF
Non-parametric estimates of overlap.
STAT MED 20 (2): 215-236 JAN 30 2001

[55] Vedantham K, Brunet A, Boyer R, et al.
Posttraumatic stress disorder, trauma exposure, and the current health
of Canadian bus drivers.
CAN J PSYCHIAT 46 (2): 149-155 MAR 2001

[56] Ip EH
Testing for local dependency in dichotomous and polytomous item response
models.
PSYCHOMETRIKA 66 (1): 109-132 MAR 2001


Re: Bonferroni-adjusted t-tests

Leah Quinlivan
Effect size for the t-test is eta squared (η²).

This is the formula: t squared / (t squared + df)

$$
\eta^2 = \frac{t^2}{t^2 + df}
$$

Reporting the CI and effect size, as well as using an alpha of .01, is
more appropriate for large samples.
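
A quick sketch of that formula in Python, for anyone who wants to check the numbers by hand. The group sizes are the ones given in the original question; the t value is hypothetical, and df = n1 + n2 - 2 assumes the equal-variances version of the independent-samples t test.

```python
def eta_squared(t, df):
    """Eta squared for a t test: t^2 / (t^2 + df)."""
    return t ** 2 / (t ** 2 + df)


n1, n2 = 870, 780      # group sizes from the original question
df = n1 + n2 - 2       # degrees of freedom, assuming equal variances
t = 2.0                # hypothetical t statistic, for illustration only

effect = eta_squared(t, df)
print("df:", df)
print("eta squared:", round(effect, 4))
```

With samples this large, a t value around 2 is significant at the 0.05 level while eta squared stays well below 0.01, which is the reviewer's point about practical meaning.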



On 29/01/2008, Arthur Kramer <[hidden email]> wrote:

>
> Because SPSS provides the exact p level, why not report the p levels as
> "...significant at 'p=.0-whatever is obtained' " and follow that up with
> the
> confidence intervals and an assessment of effect size.  With the sample
> sizes you are dealing with, that may be more informative.
>
> Arthur Kramer, Ph.D.
> Director of Institutional Research
> New Jersey City University
> -----Original Message-----
> From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
> Marta García-Granero
> Sent: Tuesday, January 29, 2008 10:04 AM
> To: [hidden email]
> Subject: Re: Bonferroni-adjusted t-tests
>
> Shin escribió:
> > Dear all:
> >
> > I have a question regarding t-tests. One of the reviewers suggested me
> to
> > perform "Bonferroni-adjusted t-tests". I understand that Bonferroni is
> one
> > of the post hoc methods in MULTIPLE group comparisons of ANOVA. However,
> > our study deals with only TWO groups.
> >
> > His/her main argument is that our results of two-group comparison
> > generated very small means, but probably due to a relatively large
> sample
> > size (n1=870, n2=780), t-tests detected statistical significance at 0.05
> ,
> > which may not have any practical meaning. Therefore, he/she suggested us
> > to "adjuste the alpha level." He/She implies that if I indeed adjust the
> > alpha, some of the comparisons may not result in statistical
> significance.
> >
> > Could anybody explain how we could perform Bonferroni-adjusted t-tests
> IN
> > A TWO-GROUP COMPARISON via SPSS? I have looked at the main menu but
> could
> > not find any option of this kind, except in ANOVA as a post hoc option.
> > Thank you for your help in advance.
>
> Hi Shin
>
> There are several things you can do:
>
> 1) Fight the reviewer's suggestion: Bonferroni adjustment can be too
> stringent and do more harm than good to you statistical analysis
> See Perneger's paper "What's wrong with Bonferroni adjustment" in
> British Medical Journal (avalaible for free online at the web page).
>
> 2) Replace Bonferroni adjustment for something a bit less conservative,
> like some FDR algorithm (see some references and documents at the end of
> this message)
>
> 3) Acknowledge the reviewer's suggestion and perform BOnferroni
> adjustment on your t tests. You don't nedd SPSS for that. The logic
> behind the reviewer's request is simply to take the significance alpha
> level (0.05 usually) and divide it by the numbers of tests you've run
> (you'll get and adjusted alpha level, usually quite low). Then, you
> declare significant only those tests with p-values lower than the
> adjusted alpha level. Simple, and some times catastrophic for the
> significance of the tests you performed (sometimes, not a single result
> is significant after that, if the number of tests run was high).
>
>
> HTH,
> Marta García-Granero
>
> --------------------------------
> Appendix:
>
> Extract of a document I downloaded from a web page that doesn't exist
> anymore (http://www.math.tau.ac.il/~roee/FDR_Downloads2.htm a pity,it
> included a program, called FDRAlgo.exe that could have helped you in the
> task of getting a better adjustment method for your p-values than
> Bonferroni's. Anyway, I still keep a copy of it, and I can send you the
> program privately to your e-mail address)
>
> THE MULTIPLICITY PROBLEM
>
> Everyday, one can find, in the newspaper or in other popular press, some
> claim of association between a stimulus and an outcome, with
> consequences for health or general welfare of the population at large.
> Many of these associations are suspect at best, and often will not hold
> up under scrutiny. Examples of such association are: coffee and heart
> attacks, vitamins and IQ, tomato sauce and cancer, and on and on. Many
> of these claims have shaky foundations, and some have not been
> replicated in further research. With so much conflicting information in
> the popular press, the general public has learned to mistrust
> statistical studies, and to shy away from the use of statistics in
> general.
>
> There are several reasons that cause these incorrect conclusions to
> become part of the scientific and popular press; usually scientists
> fault such things as improper study design and poor data. Another reason
> for these claims originates from large studies, where data analysts
> report all the tests that are "statistically significant" (usually
> defined as p < 0.05, where "p" denotes p-value) as a "real" effect. On
> the surface, this practice seems innocuousness; since this is the rule
> learned in statistics classes. The problem arises when multiple test are
> performed "p < 0.05" outcome can often occur when there is no real
> effect at all. Historically, the "p < 0.05" rule was devised for a
> single test with the following logic: if p < 0.05 outcome was observed,
> than the analyst has two options Either he\she can believe that there is
> no real effect and that the data is so anomalous that it is within the
> range of values that would be observed only 1 in 20, or he/she may
> choose to believe that the observed association is real. Because the 1
> in 20 chance is relatively small, the common practice is to "reject" the
> hypothesis of no real effect and "accept" the conclusion that the effect
> is real.
>
> The logic breaks down when more than one test and comparisons are
> considered in a single study. If one considers 20 or more tests, than
> one expects at least one "1 in 20" significant outcome, even when none
> of the effects are real. Thus, there is little protection offered by the
> "1 in 20" rule, and incorrect claims can result.
>
> Although incorrect decisions can be blamed on poor design, bad data,
> etc., one should be aware that multiplicity can cause faulty
> conclusions, and should be taken care off in large studies that includes
> many tests and comparisons.
>
> One example of this kind of study is subgroup analysis in a
> clinical trial.
>
> As a part of the pharmaceutical development process, new therapies
> are usually evaluated using randomized clinical trials. In such studies,
> patients are randomly assigned to either active or placebo therapy.
> After the conclusion of the study, the active and placebo groups are
> compared to see which is better, using a single pre-defined outcome of
> interest. At this stage, there is no multiplicity problem, since there
> is only one test. However, there are many reasons to evaluate patient
> subgroups. The therapy might work better for younger patients, or better
> for patients with mild conditions as opposed to severe ones. While it is
> good to ask such questions, such data must be analyzed carefully, and
> with the multiplicity problem in mind. If the data are thus subdivided into
> many subgroups, it can easily happen that a patient subgroup shows
> "statistical significance" by chance alone, leading the analyst to
> incorrectly recommend the therapy for that subgroup, or worse yet, to
> recommend it for all groups based on the evidence from the single subgroup.
>
> Classical Multiple Comparison Procedures aim at controlling the
> probability of committing even a single type-I error within the tested
> family of hypotheses, i.e. the family-wise error rate (FWE). The main
> problem with such classical procedures, which hinders their application
> in applied research, is that they tend to have substantially less power
> than uncorrected procedures. In many instances, lack of multiplicity
> control is too permissive, while the full protection resulting from
> controlling the FWE is too restrictive. This is the case when the
> overall conclusion from the various individual inferences is not
> necessarily erroneous as soon as one of them is, yet the selection
> effect is still of concern.
>
>
> The FDR is a new approach to multiple hypotheses testing. The FDR is the
> expected proportion of true null hypotheses rejected out of the total
> number of null hypotheses rejected. Multiple comparison procedures
> controlling the FDR are more powerful than the commonly used multiple
> comparison procedures based on the Family Wise Error Rate. FDR
> controlling procedures are especially suited to large multiple
> comparison problems in which existing procedures lack power.
>
> FDR methodology: http://www.math.tau.ac.il/~roee/meth.htm
>
> The FDR is the expected proportion of true null hypotheses rejected out
> of the total number of rejections. If R null hypotheses are rejected in a
> multiple testing procedure, and V is the number of true null hypotheses
> rejected, the FDR is defined as:
>
> \[
>   \mathrm{FDR} = E(Q), \qquad
>   Q = \begin{cases} V/R & \text{if } R > 0 \\ 0 & \text{if } R = 0 \end{cases}
> \]
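
The definition above can be spelled out in a few lines. This is only an illustrative sketch: the function name and the (V, R) pairs are hypothetical, and in real data V is not observable, which is why the FDR is defined as an expectation.

```python
from statistics import mean


def false_discovery_proportion(v: int, r: int) -> float:
    """Q = V/R when R > 0, and Q = 0 when nothing was rejected (R = 0)."""
    if r == 0:
        return 0.0
    return v / r


# Hypothetical (V, R) outcomes from four imagined repetitions of a study:
# V = true null hypotheses rejected, R = all hypotheses rejected.
outcomes = [(0, 0), (1, 4), (0, 5), (2, 10)]

# The FDR is the expectation of Q; here it is approximated by the average.
q_values = [false_discovery_proportion(v, r) for v, r in outcomes]
print(q_values, mean(q_values))
```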
>
> The concept:
>
>
> Benjamini and Hochberg (1995)
> introduced the False Discovery Rate (FDR) - the expected ratio of
> erroneous rejections to the number of rejected hypotheses, as an
> appropriate error rate to control. The FDR is equal to the family wise
> error rate when the number of true null hypotheses m0 equals the number
> of all hypotheses under test m, so in such a situation controlling the
> FDR controls the FWE as well. But the FDR criterion is adaptive, in the
> sense that when some of the tested hypotheses are not true (i.e. m0 <
> m), the FDR is smaller, and more so when more of the hypotheses are not
> true. Hence FDR controlling procedures can be more powerful than FWE
> controlling procedures at the same level.
>
> List of FDR controlling procedures: when can they be used?
>
> Procedure                                    Independence  Positive     Pairwise  General
>                                                            dependence             dependence
> Linear Step Up procedure                     +             +            +         -
> Generalized Linear Step Up procedure         +             +            +         +
> Step Down                                    +             -            -         -
> Generalized Step Down                        +             +            +         -
> Two Stage Linear Step Up procedure           +             ?            ?         -
> Adaptive procedure                           +             ?            +         -
> Troendle's Step-up and Step-down procedures  -             Many to one  -         -
> Resampling procedure                         +             +            +         +
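
As an illustration of the first row of the table, below is a minimal sketch of the linear step-up procedure as it is usually stated following Benjamini and Hochberg (1995): sort the p-values, find the largest rank k with p(k) <= k*q/m, and reject the k smallest. The function name and the example p-values are hypothetical, and this is not code from the quoted page.

```python
def linear_step_up(p_values, q=0.05):
    """Return a list of booleans (True = rejected), in the input order,
    using the linear step-up rule at FDR level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])

    # Largest rank k (1-based) whose ordered p-value is at most k * q / m.
    k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k = rank

    rejected = [False] * m
    for idx in order[:k]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values from ten comparisons between two groups.
p_example = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
print(linear_step_up(p_example, q=0.05))
```

In this made-up example the step-up rule rejects the two smallest p-values, whereas a Bonferroni threshold of 0.05/10 = 0.005 would reject only the first.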
>
>
> Historical perspective
>
> The proposal for FDR control in BH was motivated by the paper of Sorić
> (1987), which was a strong and emotional call for the necessity of
> controlled inference because of the increased error resulting from
> multiple inferences. Otherwise, he warned, the expected number of "false
> discoveries" becomes large relative to the number of discoveries. Since
> BH was published we have learned of independent previous efforts in
> the direction in which we went: looking for suitable error control in the
> face of multiplicity, when the full protective power of the FWE is not
> necessary.
>
> Shaffer (1997) has noted that an informal effort in this direction had
> already been attempted by Elkund in an unpublished work in Swedish. This
> work has been reported by Seeger (1968), who also attributes the
> procedure to Elkund. Seeger proved that when all tested null hypotheses
> are true the procedure controls the FWE at level q, but when some
> hypotheses are true while others are false (i.e. when m0 < m), this is
> not the case. Apparently Seeger's second result, that the procedure does
> not always control the FWE at the desired level, diminished the
> interest in the procedure at the time it was proposed, to the point that it
> became completely unknown (e.g. it is not mentioned in Hochberg and
> Tamhane, 1987).
>
> Independently, Simes (1986) proposed a global test of the single
> intersection hypothesis. He gave a nice proof (by induction) of the
> error controlling property of the test, which is essentially Seeger's
> first result. Simes also suggested this procedure as an
> informal multiple testing procedure, but then Hommel (1988) showed - as
> Seeger had done before - that it does not control the FWE in the strong
> sense. Therefore, in the realm of FWE control, the procedure cannot be
> used for making the multiple inferences about the individual hypotheses.
> It can be, and has been, used to derive several other testing procedures,
> e.g. by Hochberg (1988) and Hommel (1988), but these procedures are less
> powerful. Sen (1998a) points out that this equality is actually the
> classical Ballot Theorem related to uniform order statistics. Interest
> in the procedure as a multiple testing procedure came in view of the
> FDR criterion it controls (BH): see, for example, its implementation in
> the new SAS MULTPROC software. For a review of the global testing
> procedure and its extensions see Hochberg and Hommel (1997).
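
For readers who have not seen it, the global test attributed to Simes (1986) above is commonly stated as: reject the intersection hypothesis (all m nulls true) at level alpha if p(i) <= i*alpha/m for at least one ordered p-value p(i). A minimal sketch under that usual statement follows; the function names and p-values are hypothetical. It says nothing about which individual hypotheses are false, which is the limitation discussed in the paragraph above.

```python
def simes_global_test(p_values, alpha=0.05):
    """True if the global (intersection) null hypothesis is rejected."""
    m = len(p_values)
    ordered = sorted(p_values)
    return any(p <= rank * alpha / m for rank, p in enumerate(ordered, start=1))


def bonferroni_global_test(p_values, alpha=0.05):
    """Bonferroni version for comparison: only the smallest p-value counts."""
    return min(p_values) <= alpha / len(p_values)


# Hypothetical p-values: no single one passes 0.05/3, yet Simes rejects.
p_example = [0.02, 0.03, 0.04]
print(simes_global_test(p_example), bonferroni_global_test(p_example))
```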
>
> Partial list of FDR references:
>
> Benjamini, Y., Hochberg, Y. (1995). " Controlling the False Discovery
> Rate: a Practical and Powerful Approach to Multiple Testing ", Journal
> of the Royal Statistical Society B, 57 289-300.
>
> Benjamini, Y., Hochberg, Y., Kling, Y. (1993)." False Discovery Rate
> control in pairwise comparisons ", Working Paper 93-2, Dept. of Statist.
> and O.R., Tel Aviv Univ.
>
> Benjamini, Y., Hochberg, Y., Kling, Y. (1997)." False Discovery Rate
> control in multiple hypotheses testing using dependent test statistics
> ", Research Paper 97-1, Dept. of Statist. and O.R., Tel Aviv Univ.
>
> Benjamini, Y., Kling, Y. (1999). "A look at Statistical Process Control
> through the P-Value", Research Paper 99-8, Dept. of Statist. and O.R.,
> Tel Aviv Univ.
>
> Benjamini, Y., Liu, W., (1999) "A distribution-free multiple test
> procedure that controls the false discovery rate", Research Paper 99-3,
> Dept. of Statist. and O.R., Tel Aviv Univ.
>
> Benjamini, Y., Yekutieli, D. (1997) "The control of the False Discovery
> Rate under dependence". Research Paper 97-4, Dept. of Statist. and O.R.,
> Tel Aviv Univ.
>
> Hochberg, Y. (1988). " A sharper Bonferroni procedure for multiple tests
> of significance ", Biometrika 75, 800-803.
>
> Hochberg, Y., Hommel, G. (1998) " Step-up multiple testing procedures "
> Encyc. Statist. Sc. Supplementary Vol, 2.
>
> Hochberg, Y., Rom, D. (1996). "Extensions of multiple testing procedures
> based on Simes' test " J. Statist. Plann. Inference 48, 141-152.
>
> Hommel, G. (1988). "A stagewise rejective multiple test procedure based
> on a modified Bonferroni test", Biometrika 75, 383-386.
>
> Needleman et al. (1979). "Deficits in psychologic and classroom
> performance of children with elevated dentine lead levels", New England
> Journal of Medicine, 300, 689-695.
>
> Sarkar, S. K., (1998) " Some probability inequalities for ordered random
> variables: A proof of Simes' conjecture " The Annals of Statistics, 2,
> 494-504.
>
> Seeger (1968). "A note on a method for the analysis of significances en
> masse", Technometrics 10, 586-593.
>
> Simes, R. J. (1986). "An improved Bonferroni procedure for multiple
> tests of significance", Biometrika 73, 751-754.
>
> Wassmer, G., Reitmer, P., Kieser, M., Lehmacher, W. (1998) "Procedures
> for testing multiple endpoints in clinical trials: an overview" J.
> Statist. Plann. Inference (In press).
>
> Westfall, P. H., Young, S. S.(1993), Resampling based multiple testing,
> Wiley, New York.
>
> Yekutieli, D., (1999) "Elkund-Seeger-Simes is Conservative for testing
> all pairwise comparison" . Research Paper 99-7, Dept. of Statist. and
> O.R., Tel Aviv Univ.
>
> Yekutieli, D., Benjamini, Y., (1999) "Resampling based false discovery
> controlling multiple test procedures for correlated test statistics" J.
> Statist. Plann. Inference
>
> FDR in Applications
>
> A discussion of applied statistical work using FDR, organized by
> scientific area.
>
> Medicine (general)
>
> Reviews: Curran-Everett [41], mostly pairwise comparisons in physiology;
> Ottenbacher [15] discusses FDR control in epidemiology, without being
> aware of FDR controlling procedures.
>
> Methodology: Brown & Russell [10] operating characteristics; Witte et al
> [35] sample size calculations; Stine & Heyse [54] estimates of overlap.
>
> Applications: comparison of powder inhalers [1,2]; Pharmacological
> comparisons [19]; Public health policy [37, 22]; Shuttle/Mir space
> missions (Americans vs Russians) [44,45]; Quality of life comparisons
> [52].
>
> Psychiatry and Neurology
>
> Methodology: Ellis et al [43] develop the analysis of neurochemical
> maps.
>
> Applications: Study of organ transplant patients [3]; Effects of ethanol
> exposure [31]; Dose effect study in child psychotherapy [34]; Multiple
> Sclerosis [38]; post-polio syndrome [53]; HIV associated dementia [13].
>
> Psychology, Psychometrics and Education
>
> Reviews: Wilkinson [25] sets guidelines.
>
> Methodology: Keselman, Cribbie & Holland [23], Williams, Jones & Tukey
> [24] develop pairwise comparisons; Wainer [8] adjusts graphical
> displays; Ip [56] develops the study of local dependencies in item
> response problems.
>
> Applications: National Assessment of Educational Progress data
> [5,8,39,24]; Drug abuse [12]; Meta-analysis of psychoeducational
> programs for heart disease patients [26]; Mental health outcomes [42];
> cross sectional study of bus drivers [55]
>
> Genetics and Biology
>
> Methodology: Basford & Tukey [6] develop graphical profiles aiding in
> plant breeding experiments; Weller et al [9,21] use it in genetic
> mapping (QTL analysis), eliciting a response by Zaykin, Young & Westfall
> [36]; Bovenhuis & Spelman [33] adapt the methods for selective
> genotyping; Drigalenko & Elston [11] find the concept useful in mapping.
>
> Applications: QTL analysis for milk production [20,43] and milk yield
> monitoring [29]; Breeding [6,50]; Defences in Daphnia [46]; Song
> imitation in Zebra Finches [16].
>
> Signal Denoising and Image Processing
>
> Methodology: Benjamini & Abramovich [4] develop adaptive thresholding of
> wavelet coefficients;
>
> Applications: High-resolution defect detection [49]
>
> Economics and Marketing
>
> Methodology: Green & Babyak [7,20,40] develop multiple testing of
> constraints and individual parameters in structural equations models;
> Schaffer and Green [28] develop cluster based market segmentation
>
> Meteorology
>
> Applications: Long-distance dependencies in pressure [32]
>
> Statistical Theory
>
> Reviews: Brown [48] reviews FDR development in an essay on statistical
> decision theory; Pigeot [32] reviews multiple testing.
>
> Methodology: The use of FDR as variable selection strategy was developed
> by Abramovich et al, and other interpretations can be found in George
> [47], George and Foster [51]. Shaffer found a connection to Duncan's
> semi Bayesian procedure.
>
> List of References for Applications
>
> [1] Schlaeppi M, Edwards K, Fuller RW, et al.
> Patient perception of the Diskus inhaler: A comparison with the
> Turbuhaler inhaler.
> BRIT J CLIN PRACT 50 (1): 14-19 JAN-FEB 1996
>
> [2] Sharma RK, Edwards K, Hallett C, et al.
> Perception among paediatric patients of the Diskus(R) inhaler, a novel
> multidose powder inhaler
> for use in the treatment of asthma - Comparison with the Turbuhaler(R)
> inhaler.
> CLIN DRUG INVEST 11 (3): 145-153 MAR 1996
>
> [3] Thomason JM, Seymour RA, Ellis JS, et al.
> Determinants of gingival overgrowth severity in organ transplant
> patients - An examination of the role of HLA phenotype.
> J CLIN PERIODONTOL 23 (7): 628-634 JUL 1996
>
> [4] Abramovich F, Benjamini Y
> Adaptive thresholding of wavelet coefficients.
> COMPUT STAT DATA AN 22 (4): 351-361 AUG 10 1996
>
> [5] Wainer H
> Depicting error.
> AM STAT 50 (2): 101-111 MAY 1996
>
> [6] Basford KE, Tukey JW
> Graphical profiles as an aid to understanding plant breeding experiments.
> J STAT PLAN INFER 57 (1): 93-107 JAN 15 1997
>
> [7] Green SB, Babyak MA
> Control of type I errors with multiple tests of constraints in
> structural equation modeling.
> MULTIVAR BEHAV RES 32 (1): 39-51 1997
>
> [8] Wainer H
> Improving tabular displays, with NAEP tables as examples and inspirations.
> J EDUC BEHAV STAT 22 (1): 1-30 SPR 1997
>
> [9] Weller JI, Song JZ, Ronin YI, et al.
> Designs and solutions to multiple trait comparisons.
> ANIM BIOTECHNOL 8 (1): 107-122 1997
>
> [10] Brown BW, Russell K
> Methods correcting for multiple testing: Operating characteristics.
> STAT MED 16 (22): 2511-2528 NOV 30 1997
>
> [11] Drigalenko EI, Elston RC
> False discoveries in genome scanning.
> GENET EPIDEMIOL 14 (6): 779-784 1997
>
> [12] Hubbard RL, Craddock SG, Flynn PM, et al.
> Overview of 1-year follow-up outcomes in the Drug Abuse Treatment
> Outcome Study (DATOS).
> PSYCHOL ADDICT BEHAV 11 (4): 261-278 DEC 1997
>
> [13] Robertson K, Fiscus S, Kapoor C, et al.
> CSF, plasma viral load and HIV associated dementia.
> J NEUROVIROL 4 (1): 90-94 FEB 1998
>
> [14] Westfall PH, Young SS, Lin DKJ
> Forward selection error control in the analysis of supersaturated designs.
> STAT SINICA 8 (1): 101-117 JAN 1998
>
> [15] Ottenbacher KJ
> Quantitative evaluation of multiplicity in epidemiology and public
> health research.
> AM J EPIDEMIOL 147 (7): 615-619 APR 1 1998
>
> [16] Tchernichovski O, Nottebohm F
> Social inhibition of song imitation among sibling male zebra finches.
> P NATL ACAD SCI USA 95 (15): 8951-8956 JUL 21 1998
>
>
>
> [17] Mallet L, Mazoyer B, Martinot JL
> Functional connectivity in depressive, obsessive-compulsive, and
> schizophrenic disorders: an
> explorative correlational analysis of regional cerebral metabolism.
> PSYCHIAT RES-NEUROIM 82 (2): 83-93 MAY 20 1998
>
> [18] Spelman RJ, Bovenhuis H
> Moving from QTL experimental results to the utilization of QTL in
> breeding programmes.
> ANIM GENET 29 (2): 77-84 APR 1998
>
> [19] Chen VJ, Bewley JR, Andis SL, et al.
> Preclinical cellular pharmacology of LY231514 (MTA): a comparison with
> methotrexate, LY309887 and
> raltitrexed for their effects on intracellular folate and nucleoside
> triphosphate pools in CCRF-CEM cells.
> BRIT J CANCER 78: 27-34 Suppl. 3 1998
>
> [20] Green SB, Thompson MS, Babyak MA
> A Monte Carlo investigation of methods for controlling type I errors
> with specification searches in
> structural equation modeling.
> MULTIVAR BEHAV RES 33 (3): 365-383 1998
>
> [21] Weller JI, Song JZ, Heyen DW, et al.
> A new approach to the problem of multiple comparisons in the genetic
> dissection of complex traits.
> GENETICS 150 (4): 1699-1706 DEC 1998
>
> [22] Harris LE, Luft FC, Rudy DW, et al.
> Effects of multidisciplinary case management in patients with chronic
> renal insufficiency.
> AM J MED 105 (6): 464-471 DEC 1998
>
> [23] Keselman HJ, Cribbie R, Holland B
> The pairwise multiple comparison multiplicity problem: An alternative
> approach to familywise and
> comparisonwise type I error control.
> PSYCHOL METHODS 4 (1): 58-69 MAR 1999
>
> [24] Williams VSL, Jones LV, Tukey JW
> Controlling error in multiple comparisons, with examples from
> state-to-state differences in
> educational achievement.
> J EDUC BEHAV STAT 24 (1): 42-69 SPR 1999
>
> [25] Wilkinson L
> Statistical methods in psychology journals - Guidelines and explanations.
> AM PSYCHOL 54 (8): 594-604 AUG 1999
>
> [26] Dusseldorp E, van Elderen T, Maes S, et al.
> A meta-analysis of psychoeducational programs for coronary heart disease
> patients.
> HEALTH PSYCHOL 18 (5): 506-519 SEP 1999
>
> [27] Yekutieli D, Benjamini Y
> Resampling-based false discovery rate controlling multiple test
> procedures for correlated test statistics.
> J STAT PLAN INFER 82 (1-2): 171-196 DEC 1 1999
>
> [28] Schaffer CM, Green PE
> Cluster-based market segmentation: some further comparisons of
> alternative approaches.
> J MARKET RES SOC 40 (2): 155-163 APR 1998
>
> [29] Lark RM, Nielsen BL, Mottram TT
> A time series model of daily milk yields and its possible use for
> detection of a disease (ketosis).
> ANIM SCI 69: 573-582 Part 3 DEC 1999
>
> [30] Heyen DW, Weller JI, Ron M, et al.
> A genome scan for QTL influencing milk production and health traits in
> dairy cattle.
> PHYSIOL GENOMICS 1 (3): 165-175 NOV 11 1999
>
> [31] Slawecki CJ, Somes C, Ehlers CL
> Effects of prolonged ethanol exposure on neurophysiological measures
> during an associative learning paradigm.
> DRUG ALCOHOL DEPEN 58 (1-2): 125-132 FEB 1 2000
>
> [32] Pigeot I
> Basic concepts of multiple tests - A survey.
> STAT PAP 41 (1): 3-36 JAN 2000
>
> [33] Bovenhuis H, Spelman RJ
> Selective genotyping to detect quantitative trait loci for multiple
> traits in outbred populations.
> J DAIRY SCI 83 (1): 173-180 JAN 2000
>
> [34] Andrade AR, Lambert EW, Bickman L
> Dose effect in child psychotherapy: Outcomes associated with negligible
> treatment.
> J AM ACAD CHILD PSY 39 (2): 161-168 FEB 2000
>
> [35] Witte JS, Elston RC, Cardon LR
> On the relative sample size required for multiple comparisons.
> STAT MED 19 (3): 369-372 FEB 15 2000
>
> [36] Zaykin DV, Young SS, Westfall PH
> Using the false discovery rate approach in the genetic dissection of
> complex traits: A response to
> Weller et al.
> GENETICS 154 (4): 1917-1918 APR 2000
>
> [37] Tierney WM, Harris LE, Gaskins DL, et al.
> Restricting Medicaid payments for transportation: Effects on inner-city
> patients' health care.
> AM J MED SCI 319 (5): 326-333 MAY 2000
>
> [38] Suhy J, Rooney WD, Goodkin DE, et al.
> H-1 MRSI comparison of white matter and lesions in primary progressive
> and relapsing-remitting MS.
> MULT SCLER 6 (3): 148-155 JUN 2000
>
> [39] Benjamini Y, Hochberg Y
> On the adaptive control of the false discovery rate in multiple testing
> with independent statistics.
> J EDUC BEHAV STAT 25 (1): 60-83 SPR 2000
>
> [40] Cribbie RA
> Evaluating the importance of individual parameters in structural
> equation modeling: the need for type I error control.
> PERS INDIV DIFFER 29 (3): 567-577 SEP 2000
>
> [41] Curran-Everett D
> Multiple comparisons: philosophies and illustrations.
> AM J PHYSIOL-REG I 279 (1): R1-R8 JUL 2000
>
> [42] Bickman L, Lambert EW, Andrade AR, et al.
> The Fort Bragg continuum of care for children and adolescents: Mental
> health outcomes over 5 years.
> J CONSULT CLIN PSYCH 68 (4): 710-716 AUG 2000
>
> [43] Ellis SP, Underwood MD, Arango V, et al.
> Mixed models and multiple comparisons in analysis of human neurochemical
> maps.
> PSYCHIAT RES-NEUROIM 99 (2): 111-119 AUG 28 2000
>
> [44] Kanas N, Salnitskiy V, Grund EM, et al.
> Social and cultural issues during Shuttle/Mir space missions.
> ACTA ASTRONAUT 47 (2-9): 647-655 Sp. Iss. SI JUL-NOV 2000
>
> [45] Kanas N, Salnitskiy V, Grund EM, et al.
> Interpersonal and cultural issues involving crews and ground personnel
> during Shuttle/Mir space
> Missions.
> AVIAT SPACE ENVIR MD 71 (9): A11-A16 Suppl. S SEP 2000
>
> [46] Barry MJ
> Inducible defences in Daphnia: responses to two closely related predator
> species.
> OECOLOGIA 124 (3): 396-401 AUG 2000
>
> [47] George EI
> The variable selection problem.
> J AM STAT ASSOC 95 (452): 1304-1308 DEC 2000
>
> [48] Brown LD
> An essay on statistical decision theory.
> J AM STAT ASSOC 95 (452): 1277-1281 DEC 2000
>
> [49] Recknagel RJ, Kowarschik R, Notni G
> High-resolution defect detection and noise reduction using wavelet
> methods for surface measurement.
> J OPT A-PURE APPL OP 2 (6): 538-545 NOV 2000
>
> [50] Luijten SH, Dierick A, Oostermeijer JGB, et al.
> Population size, genetic variation, and reproductive success in a
> rapidly declining, self-incompatible
> perennial (Arnica montana) in The Netherlands.
> CONSERV BIOL 14 (6): 1776-1787 DEC 2000
>
> [51] George EI, Foster DP
> Calibration and empirical Bayes variable selection.
> BIOMETRIKA 87 (4): 731-747 DEC 2000
>
> [52] Khatri P, Babyak M, Croughwell ND, et al.
> Temperature during coronary artery bypass surgery affects quality of life.
> ANN THORAC SURG 71 (1): 110-116 JAN 2001
>
> [53] Trojan DA, Collet JP, Pollak MN, et al.
> Serum insulin-like growth factor-I (IGF-I) does not correlate positively
> with isometric strength, fatigue,
> and quality of life in post-polio syndrome.
> J NEUROL SCI 182 (2): 107-115 JAN 1 2001
>
> [54] Stine RA, Heyse JF
> Non-parametric estimates of overlap.
> STAT MED 20 (2): 215-236 JAN 30 2001
>
> [55] Vedantham K, Brunet A, Boyer R, et al.
> Posttraumatic stress disorder, trauma exposure, and the current health
> of Canadian bus drivers.
> CAN J PSYCHIAT 46 (2): 149-155 MAR 2001
>
> [56] Ip EH
> Testing for local dependency in dichotomous and polytomous item response
> models.
> PSYCHOMETRIKA 66 (1): 109-132 MAR 2001
>



--
Leah Quinlivan

Re: Bonferroni-adjusted t-tests

Simon, Steve, PhD
In reply to this post by Shin-7
S. Okazaki writes:

> I have a question regarding t-tests. One of the reviewers suggested me to
> perform "Bonferroni-adjusted t-tests". I understand that Bonferroni is one
> of the post hoc methods in MULTIPLE group comparisons of ANOVA. However,
> our study deals with only TWO groups.
>
> His/her main argument is that our results of two-group comparison
> generated very small means, but probably due to a relatively large sample
> size (n1=870, n2=780), t-tests detected statistical significance at 0.05,
> which may not have any practical meaning. Therefore, he/she suggested us
> to "adjuste the alpha level." He/She implies that if I indeed adjust the
> alpha, some of the comparisons may not result in statistical significance.
>
> Could anybody explain how we could perform Bonferroni-adjusted t-tests IN
> A TWO-GROUP COMPARISON via SPSS? I have looked at the main menu but could
> not find any option of this kind, except in ANOVA as a post hoc option.

I presume that you have multiple outcome measures. Take the p-value that
SPSS produces and multiply by the number of outcome measures. If there
are 10 outcome measures, a p-value of 0.0014 becomes 0.014.
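
The multiplication described above can be sketched as follows. The function name is hypothetical, and capping the result at 1 is a common convention that the message itself does not mention.

```python
def bonferroni_adjusted_p(p_value: float, n_outcomes: int) -> float:
    """Multiply a raw p-value by the number of outcome measures tested,
    capping the result at 1 so that it remains a valid probability."""
    return min(1.0, p_value * n_outcomes)


# The example from the message: 10 outcome measures, raw p-value 0.0014.
print(round(bonferroni_adjusted_p(0.0014, 10), 4))

# A large raw p-value is capped at 1 instead of exceeding it.
print(bonferroni_adjusted_p(0.30, 10))
```

The adjusted p-values are then compared with the usual 0.05 level.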

I discuss this in more detail and suggest some alternatives at

www.childrensmercy.org/stats/ask/bonferroni.asp

As a general rule, if a referee complains about how the data were
analyzed, you should rejoice and comply. It's a much happier outcome
than when they say "this data set is so bad that no re-analysis can
salvage it."

Steve Simon, [hidden email], Standard Disclaimer
CMH (Kansas City) is hiring a second statistician. See
www.childrensmercy.org/stats/JobOpening.asp for details.
Evidence Based Medicine gives my book 4/4.5 stars out of five!
Full text is at http://ebm.bmj.com/cgi/content/full/12/2/59


Re: Bonferroni-adjusted t-tests

Shin-7
In reply to this post by Shin-7
Hi Billy, Steve, and Stanley,

Thank you very much for your explanations regarding my question. In my
study, I have two groups, and have to compare several means based on the
same datasets. Therefore, I have to perform multiple t-tests. In my
original paper, I conducted a t-test for each set of means between the two
groups without any kind of alpha adjustment. Two groups are independent.

Based on your suggestions/comments, I now understand that:

1. In my case, there is NO Bonferroni adjustment with 2 groups ("with ‘no’
levels").

2. I just take the number of comparisons and divide that into .05 to obtain
the "adjusted alpha level".
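
Point 2 can be sketched like this: each raw p-value from the separate t-tests is compared with .05 divided by the number of comparisons. The function names and the p-values are hypothetical and are not results from the study.

```python
def adjusted_alpha(alpha: float, n_comparisons: int) -> float:
    """Per-comparison significance level: alpha divided by the number of tests."""
    return alpha / n_comparisons


def significant_after_adjustment(p_values, alpha=0.05):
    """Flag each raw p-value as significant if it is at or below alpha / m."""
    threshold = adjusted_alpha(alpha, len(p_values))
    return [p <= threshold for p in p_values]


# Hypothetical raw p-values from four independent-samples t-tests.
raw_p = [0.0014, 0.004, 0.03, 0.2]
print(adjusted_alpha(0.05, len(raw_p)))   # threshold applied to every test
print(significant_after_adjustment(raw_p))
```

Comparing raw p-values with alpha divided by m leads to the same decisions as multiplying each p-value by m and comparing with alpha.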

Then, my final question is as follows: is there any specific (statistical)
term for this alpha adjustment?

Thank you for your help.

Shin


Re: Bonferroni-adjusted t-tests

Shin-7
In reply to this post by Shin-7
Dear Marta:

Thank you very much for your thorough explanations of my question. The
information you provided is extremely useful, and I sincerely
appreciate it.

Shin


Re: Bonferroni-adjusted t-tests

Ornelas, Fermin-2
In reply to this post by Shin-7
It is not clear from your explanation how many comparisons you have. In my experience with nonparametric testing, alpha is usually set at .2 so that some of the tests will remain significant. Here is a sample from an assignment.

Problem 6.13 Multiple Comparisons Tests, STP532 Homework #6:

Comparison  Value  Significance comparison  Result
D12         8.5    8.5 > 7.10               Significant; thus M_1 and M_2 are different and the distributions are not the same
D13         6.7    6.7 < 6.75               Not significant; M_1 and M_3 are similar and the distributions are the same
D23         1.8    1.8 < 6.85               Not significant; M_2 and M_3 are similar and the distributions are the same

k            = 3
alpha        = 0.2
Comparisons  = 3
alpha/k(k-1) = 0.033333333
Z_table      = 1.84
sum_t^3      = 206
sum_t        = 20

Calculations of the RHS term in the inequality, adjusted for ties:

RHS Term_12 = 6.75    Unequal samples
RHS Term_13 = 6.75    Unequal samples
RHS Term_23 = 6.85    Equal samples
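
The per-comparison level and the normal critical value in the worksheet can be reproduced roughly as below. This is a sketch that assumes the critical value is the upper-tail standard normal quantile at alpha/(k(k-1)); the variable names are made up, and the RHS terms are not recomputed because the sample sizes and ranks are not given.

```python
from statistics import NormalDist

k = 3          # number of groups, as in the worksheet
alpha = 0.2    # overall (experimentwise) level used in the worksheet

n_comparisons = k * (k - 1) // 2        # all pairwise comparisons
per_tail_level = alpha / (k * (k - 1))  # the worksheet's alpha/k(k-1)

# Upper-tail critical value of the standard normal distribution.
z_critical = NormalDist().inv_cdf(1.0 - per_tail_level)

print(n_comparisons, round(per_tail_level, 9), round(z_critical, 3))
```

The computed quantile falls between 1.83 and 1.84, in line with the tabled value of 1.84 used above.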

Fermin Ornelas, Ph.D.
Management Analyst III, AZ DES
1789 W. Jefferson Street
Phoenix, AZ 85032
Tel: (602) 542-5639
E-mail: [hidden email]

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of S. Okazaki
Sent: Wednesday, January 30, 2008 1:34 AM
To: [hidden email]
Subject: Re: Bonferroni-adjusted t-tests

Hi Billy, Steve, and Stanley,

Thank you very much for your explanations regarding my question. In my
study, I have two groups, and have to compare several means based on the
same datasets. Therefore, I have to perform multiple t-tests. In my
original paper, I conducted a t-test for each set of means between the two
groups without any kind of alpha adjustment. Two groups are independent.

Based on your suggestions/comments, I now understand that:

1. In my case, there is NO bonferroni adjustment with 2 groups ("with ‘no’
levels").

2. I just take the number of comparisons and divide that into .05 to obtain
the "adjusted alpha level".

Then, my final question is as follows: is there any specific (statistical)
term for this alpha adjustment?

Thank you for your help.

Shin


NOTICE: This e-mail (and any attachments) may contain PRIVILEGED OR CONFIDENTIAL information and is intended only for the use of the specific individual(s) to whom it is addressed.  It may contain information that is privileged and confidential under state and federal law.  This information may be used or disclosed only in accordance with law, and you may be subject to penalties under law for improper use or further disclosure of the information in this e-mail and its attachments. If you have received this e-mail in error, please immediately notify the person named above by reply e-mail, and then delete the original e-mail.  Thank you.


Recall: Bonferroni-adjusted t-tests

Ornelas, Fermin-2
In reply to this post by Shin-7
Ornelas, Fermin would like to recall the message, "Bonferroni-adjusted t-tests".
