
Re: How should I account for multiple comparisons when looking at p values?

Posted by Martha Hewett on Aug 14, 2011; 8:10pm
URL: http://spssx-discussion.165.s1.nabble.com/How-should-I-account-for-multiple-comparisons-when-looking-at-p-values-tp4698517p4698982.html


Rich - Thanks very much for your input.

The respondent n's for the 7 groups surveyed range from 380 to 569.  The mailout for each group was 800, except for the smallest group, where the maximum possible mailout was 441.  Excluding that group, respondent n's range from 467 to 569.  (We worked very hard to get these high response rates.)

A key question is whether the treatments caused people to take any actions that fall within a broad category of actions.  To examine this we asked about many actions (roughly 40, without actually counting them).  These are being tested individually and will probably also be grouped into about 3 or 4 broad types of actions and tested as composites.

I should explain that there is an objective measure of the impact of the treatments, and that is also being analyzed, but the examination of the self-reported actions attempts to determine why there is or isn't a measurable impact.  





From: Rich Ulrich <[hidden email]>
To: <[hidden email]>, <[hidden email]>
Date: 08/14/2011 02:56 PM
Subject: RE: How should I account for multiple comparisons when looking at p values?





What kind of statement are you making?
Who are you making it to, and what do they expect?

What is the N?  If you have tens of thousands, a journal like NEJM
will suggest that you ignore all tests, and focus on "effect size",
because your power would be so large.  

On the other hand, with 7 groups, observational data, and a much
smaller N...  you could have a serious problem with power for some
analyses, especially if your group sizes are grossly unequal.  I'll skip
over those concerns.

If the study is exploratory, then it is probably fair to report the straight
p-values -- with suitable warning to the readers.  That is the simplest
case.  Otherwise, corrections for multiple tests are needed for the
important tests.

If you want to actually *conclude*  something, about *hypotheses*,
then you should draw up your small number of hypotheses in advance,
and figure out what variables or composite scores will be able to test
them.  As you describe it, there are dozens of possible hypotheses.
These should be arranged in a hierarchy:  these few are *primary*,
the main reason we collected the data, and (if any tests are to be
corrected) these are the ones tested with correction; these next are also
interesting, in an exploratory mode, and are reported without correction.
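To make the "tested with correction" part concrete, here is a minimal sketch of the Holm step-down adjustment, one common family-wise correction for a small set of primary hypotheses. The p-values are invented for illustration, and the code is plain Python only to show the arithmetic; this is not from the original thread, and SPSS users would typically apply the same logic by hand or in syntax.

```python
# Holm-Bonferroni step-down correction for a family of p-values.
# Illustrative sketch; the raw p-values below are made up.

def holm_adjust(pvals):
    """Return Holm-adjusted p-values, preserving the input order."""
    m = len(pvals)
    # Sort p-values ascending, remembering their original positions.
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # Step-down multiplier: the rank-th smallest p-value is
        # multiplied by (m - rank), capped at 1.
        adj = min(1.0, (m - rank) * pvals[idx])
        # Enforce monotonicity so adjusted p-values never decrease.
        running_max = max(running_max, adj)
        adjusted[idx] = running_max
    return adjusted

# Example: four primary hypotheses, family-wise alpha = 0.05.
raw = [0.001, 0.02, 0.04, 0.30]
for p, q in zip(raw, holm_adjust(raw)):
    print(f"raw p = {p:.3f} -> Holm-adjusted p = {q:.3f}, reject: {q < 0.05}")
```

With these made-up values, only the smallest p-value survives the correction; the other three would be reported as exploratory, uncorrected findings under the hierarchy described above.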

When you are looking at a variable that might *bias* the other
tests or comparisons, then it is proper, if not mandatory, to report the
single test with its nominal p-value -- These are warnings, and you don't
want to under-rate a warning.  If a variable plays both roles (being a
possible bias-factor, and being of intrinsic interest), you need to report
both sorts of test-outcome if they are different.  "Sex shows enough
difference between groups that it could be an important biasing factor,
even though the p-value is not significant after it is corrected for the
multiple testing."

Some tests fall "under" the original important tests, so that they may
be regarded as (say) explaining or detailing the reasons for the
significant (or non-significant) results in the primary tests.  You can point
to the original, significant test as an "overall" test on the area, which
then justifies using the nominal test size in followup.

- Of course, that organization of tests should have been done before
you ever collected the data.  Then, you probably would have done some
things a little differently.  After the fact, you can only try to achieve the
same "fair" state of mind; do not try to draw on what you have seen in
the results, because whatever critics exist will probably catch you at it.

Hope this helps.
--
Rich Ulrich


> Date: Sun, 14 Aug 2011 12:52:01 -0400
> From: [hidden email]
> Subject: How should I account for multiple comparisons when looking at p values?
> To: [hidden email]
>
> I am comparing 7 groups on multiple dimensions (demographics, attitudes,
> actions). Some comparisons are across all groups and some are among 2 or 3
> or 4 of the groups. They include ANOVAs, chi-squares and t tests. I know
> with so many comparisons some will be significant by chance. How should I
> adjust for this? Also, should such an adjustment be made within types of
> questions (e.g. within demographics, within attitudes, and within actions)
> or across all items compared?
>
> Thanks for any help you can provide.
>