Re: How should I account for multiple comparisons when looking at p values?
Posted by Martha Hewett on Aug 14, 2011; 8:10pm
URL: http://spssx-discussion.165.s1.nabble.com/How-should-I-account-for-multiple-comparisons-when-looking-at-p-values-tp4698517p4698982.html
Rich - Thanks very much for your input.

The respondent n's for the 7 groups surveyed range from 380 to 569. Mailout for each group was 800, except for the smallest group, where the maximum possible mailout was 441. Excluding that group, respondent n's range from 467 to 569. (We worked very hard to get these high response rates.)
A key question is whether the treatments caused people to take any actions that fall within a broad category of actions. To examine this, we asked about many actions (circa 40, without actually counting them). These are being tested individually and will probably also be put into groups of about 3 or 4 broad types of actions and tested as composites.
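Composite scores of this sort can be formed by simply counting the actions taken within each broad type. A minimal sketch (the respondent data, item names, and groupings below are all invented for illustration):

```python
# Sketch: collapsing many yes/no action items into a few composite scores.
# The answers, item names, and groupings are hypothetical.

# One respondent's answers (1 = took the action, 0 = did not)
respondent = {
    "installed_cfl": 1, "sealed_ducts": 0, "added_insulation": 1,
    "lowered_thermostat": 1, "used_fan": 0,
    "bought_efficient_appliance": 0, "serviced_furnace": 1,
}

# Hypothetical grouping of the items into broad action types
composites = {
    "envelope_lighting": ["installed_cfl", "sealed_ducts", "added_insulation"],
    "behavior": ["lowered_thermostat", "used_fan"],
    "equipment": ["bought_efficient_appliance", "serviced_furnace"],
}

# Composite score = count of actions taken within each broad type
scores = {name: sum(respondent[item] for item in items)
          for name, items in composites.items()}
print(scores)  # {'envelope_lighting': 2, 'behavior': 1, 'equipment': 1}
```

Testing a handful of such composites instead of ~40 individual items also greatly reduces the multiple-comparisons burden discussed below.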
I should explain that there is an objective measure of the impact of the treatments, and that is also being analyzed, but the examination of the self-reported actions attempts to determine why there is or isn't a measurable impact.
What kind of statement are you making? Who are you making it to, and what do they expect? What is the N? If you have tens of thousands, a journal like NEJM will suggest that you ignore all tests and focus on "effect size", because your power would be so large.

On the other hand, with 7 groups, observational data, and a much smaller N... you could have a serious problem with power for some analyses, especially if your group sizes are grossly unequal. I'll skip by those concerns.
If the study is exploratory, then it is probably fair to report the straight
p-values -- with suitable warning to the readers. That is the simplest
case. Otherwise, corrections for multiple tests are needed for the
important tests.
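One common choice for such corrections is the Holm step-down procedure, which controls the familywise error rate and is never less powerful than plain Bonferroni. A minimal sketch in Python (the raw p-values are invented for illustration):

```python
def holm_adjust(pvalues):
    """Holm step-down adjusted p-values (familywise error control).

    Multiply the k-th smallest raw p-value by (m - k + 1), then
    enforce monotonicity so adjusted values never decrease.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        adj = min(1.0, (m - rank) * pvalues[i])
        running_max = max(running_max, adj)  # keep adjusted values monotone
        adjusted[i] = running_max
    return adjusted

# Invented raw p-values from, say, five of the individual action-item tests
raw = [0.004, 0.030, 0.001, 0.047, 0.200]
print(holm_adjust(raw))  # smallest raw p gets the largest multiplier
```

An equivalent routine is available as `multipletests(..., method="holm")` in statsmodels, if that package is at hand.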
If you want to actually *conclude* something, about *hypotheses*, then you should draw up your small number of hypotheses in advance, and figure out what variables or composite scores will be able to test them. As you describe it, there are dozens of possible hypotheses. These should be arranged in a hierarchy: these few are *primary*, the main reason we collected the data, and (if any will be) these will be tested with correction; these next are also interesting in an exploratory mode, and are reported without correction.
When you are looking at a variable that might *bias* the other tests or comparisons, then it is proper, if not mandatory, to report the single test with its nominal p-value -- these are warnings, and you don't want to under-rate a warning. If a variable plays both roles (being a possible bias-factor, and being of intrinsic interest), you need to report both sorts of test-outcome if they are different: "Sex shows enough difference between groups that it could be an important biasing factor, even though the p-value is not significant after it is corrected for the multiple testing."
Some tests fall "under" the original important tests, so that they may be regarded as (say) explaining or detailing the reasons for the significant (or non-significant) results in the primary tests. You can point to the original, significant test as an "overall" test on the area, which then justifies using the nominal test size in followup.
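That "overall test first, nominal followups after" rule is essentially a gatekeeping procedure; the control flow can be sketched as follows (the alpha level and all p-values are invented, and the comparison labels are hypothetical):

```python
ALPHA = 0.05

def gated_followups(overall_p, followup_p, alpha=ALPHA):
    """Report followup tests at the nominal alpha only if the overall
    ("gate") test in that area is itself significant."""
    if overall_p >= alpha:
        return {}  # gate closed: no followups reported in this area
    return {name: p for name, p in followup_p.items() if p < alpha}

# Invented example: an overall test across groups, then pairwise followups
overall_p = 0.012
followup_p = {"group1 vs group2": 0.041, "group1 vs group3": 0.130}
print(gated_followups(overall_p, followup_p))  # {'group1 vs group2': 0.041}
```

The design choice here is that the single overall test spends the alpha for the whole area, so the followups inherit its protection rather than each paying a correction of their own.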
- Of course, that organization of tests should have been done before you ever collected the data. Then, you probably would have done some things a little differently. After the fact, you can only try to achieve the same "fair" state of mind; do not try to draw on what you have seen in the results, because whatever critics exist will probably catch you at it.
Hope this helps.
--
Rich Ulrich
> Date: Sun, 14 Aug 2011 12:52:01 -0400
> From: [hidden email]
> Subject: How should I account for multiple comparisons when looking at p values?
> To: [hidden email]
>
> I am comparing 7 groups on multiple dimensions (demographics, attitudes,
> actions). Some comparisons are across all groups and some are among 2 or 3
> or 4 of the groups. They include ANOVAs, chi-squares and t tests. I know
> with so many comparisons some will be significant by chance. How should I
> adjust for this? Also, should such an adjustment be made within types of
> questions (e.g. within demographics, within attitudes, and within actions)
> or across all items compared?
>
> Thanks for any help you can provide.