Help with Mann-Whitney


Help with Mann-Whitney

Baker, Harley
Colleagues,

I have been given data from a pre/post study of the effects of a half-day intervention on middle school students' (N ~ 140) attitudes toward science. The data consist of answers to 12 questions presented in a 5-point Likert-type format. For both IRB and logistical reasons, identifying information that could be used to match pre with post questionnaires was not collected. A typical question is “I think science is boring.” I have been asked to analyze the data. There are 121 pre-assessment questionnaires and 109 post-assessment questionnaires. There was a substantial amount (6%) of missing data. MVA shows that the data easily passed Little’s MCAR test. Based on that and other indices, I have decided to use the EM-augmented data.
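(For anyone working outside SPSS, a rough Python analogue of that imputation step might look like the sketch below. sklearn's IterativeImputer is only a stand-in for MVA's EM routine, not the same algorithm, and the file and item names are invented.)

    # Hedged sketch: EM-style imputation of 5-point Likert items.
    # IterativeImputer approximates, but is not, SPSS MVA's EM algorithm;
    # "attitudes.csv" and the q1..q12 item names are invented.
    import numpy as np
    import pandas as pd
    from sklearn.experimental import enable_iterative_imputer  # noqa: F401
    from sklearn.impute import IterativeImputer

    df = pd.read_csv("attitudes.csv")
    items = [f"q{i}" for i in range(1, 13)]

    imputed = IterativeImputer(max_iter=25, random_state=0).fit_transform(df[items])

    # The imputer returns continuous values; round back onto the 1-5 scale.
    df[items] = np.clip(np.rint(imputed), 1, 5)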

Now to the issue at hand . . . 

I have analyzed the data using crosstabs. The chi-squared test here can be interpreted as assessing the distributional differences between the pre and post assessments. Arguably, if the distributions are the same, the “intervention” did little; if the distributions differ significantly, the intervention probably had an effect. I won’t bore you with the details, but the results of the crosstab analyses mostly make very good sense and paint a mostly consistent picture. Great! However, two of the items have quite inconsistent outcomes. For example, there were ‘increases’ for the question “I think science is enjoyable” (p = .009) but no differences for “I think science is boring” (p = .335). Note: r = -0.56 for these two questions, so I am surprised by the discrepancy in the p-values. A similar discrepancy arises for another item dealing with science self-efficacy, which differs from the other science self-efficacy questions. These two major discrepancies make very little conceptual sense and are troubling given the whole of the data.
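(For concreteness, each per-item test is just a 5×2 table; here is a sketch with invented counts, chosen only to match the 121/109 sample sizes.)

    # Hedged sketch: chi-squared test of pre vs post response distributions
    # for one Likert item. Counts are invented for illustration.
    from scipy.stats import chi2_contingency

    # Rows: responses 1..5; columns: pre (n = 121), post (n = 109)
    table = [[30, 15],
             [25, 20],
             [30, 25],
             [20, 28],
             [16, 21]]

    chi2, p, dof, expected = chi2_contingency(table)
    print(f"chi2 = {chi2:.2f}, df = {dof}, p = {p:.3f}")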

Just for fun, I used Mann-Whitney, even though it requires independent groups. (And, remember, I have no way to establish pre-post correspondence for these data.) It leads to the same conclusions about the ‘well-behaved’ questions. However, it also leads to consistent findings for the two troubling questions identified from the chi-squared results. (It resolves them in ways that are very consistent with the rest of the findings.)
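(The Mann-Whitney run itself is nearly a one-liner; a sketch with randomly generated stand-in responses, not the real data:)

    # Hedged sketch: Mann-Whitney U treating pre and post as independent
    # groups. These responses are randomly generated placeholders.
    import numpy as np
    from scipy.stats import mannwhitneyu

    rng = np.random.default_rng(0)
    pre = rng.integers(1, 6, size=121)   # 5-point responses, pre
    post = rng.integers(1, 6, size=109)  # 5-point responses, post

    u, p = mannwhitneyu(pre, post, alternative="two-sided")
    print(f"U = {u}, p = {p:.3f}")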

I would really like to use Mann-Whitney (MW) on these, with testing occasion (pre/post) as the IV and each individual question as a DV. But I know that MW requires independent samples, and we do not have independent samples. I have done a quick-and-dirty search of the appropriate literatures (psych, educational, math/stat) to find out the distortion that would occur when this independence assumption is violated and how I might compensate for it. If I were using a parametric analysis, I would know how to handle this. For example, with a t-test, the difference is simply the result of a change in the error term: the squared error term for the dependent t-test is the squared error term of the independent t-test reduced by 2(r)(sd1)(sd2). So, using the independent-t approach just gives a more conservative result than the dependent t-test would. And the loss in sensitivity can be ‘fixed’ with a larger sample size to increase power.
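(That relationship is easy to verify numerically; a minimal sketch with simulated, positively correlated pre/post scores, all numbers invented:)

    # Hedged sketch: with r > 0, the independent-samples t-test is
    # conservative relative to the paired t-test, because the paired error
    # term subtracts 2*r*sd1*sd2 from the sum of the two variances.
    import numpy as np
    from scipy.stats import ttest_ind, ttest_rel

    rng = np.random.default_rng(1)
    n = 120
    pre = rng.normal(3.0, 1.0, n)
    post = 0.6 * pre + rng.normal(1.4, 0.8, n)  # correlated with pre, r ~ 0.6

    t_ind, p_ind = ttest_ind(pre, post)
    t_rel, p_rel = ttest_rel(pre, post)
    print(f"independent: p = {p_ind:.4f}   paired: p = {p_rel:.4f}")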

But I am not sure this logic carries over to the MW – that it would just provide a more conservative estimate of the probability of a ‘median’ shift. So here is my question: to what extent am I justified in using the MW in this situation? Does it simply provide a more conservative result, as in the t-test/ANOVA analogy? Or does it produce an unpredictable, volatile set of results that would completely invalidate its use here? As I said, I have not found any literature that indicates the effects of violating the independence assumption.
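(Lacking literature, one way to get traction is direct simulation: generate correlated pairs under the null, apply MW as if the groups were independent, and check the rejection rate against its nominal level. A sketch – the correlation and distribution are arbitrary choices, not claims about these data:)

    # Hedged simulation sketch: type I error of Mann-Whitney when the two
    # "groups" are really positively correlated pairs. An empirical rate at
    # or below 0.05 would suggest the test is conservative here.
    import numpy as np
    from scipy.stats import mannwhitneyu

    rng = np.random.default_rng(2)
    n, reps, alpha = 110, 2000, 0.05
    rejections = 0
    for _ in range(reps):
        pre = rng.normal(0, 1, n)
        post = 0.5 * pre + rng.normal(0, np.sqrt(0.75), n)  # r = 0.5, same null mean
        rejections += mannwhitneyu(pre, post)[1] < alpha
    print(f"empirical type I error: {rejections / reps:.3f} (nominal {alpha})")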

I would appreciate and value any and all comments. Clearly, my grasp of nonparametric analysis is, well, not what it should be!

Thank you.

Harley

Dr. Harley Baker
Professor of Psychology
Internal Evaluator, Project ACCESO
Madera Hall 2413
California State University Channel Islands
One University Drive
Camarillo, CA 93012
 
805.437.8997 (p)
805.437.8951 (f)
 

Re: Help with Mann-Whitney

Rich Ulrich

Factor analyze the 12 questions to pull out a scale for "science self-efficacy"; score the average; and do the t-test, which is (as you note) conservative when the positive correlation is not available to reduce the error term. That seems to be the hypothesis, so that is the test.
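(A sketch of that route, with FactorAnalysis as a generic stand-in for whatever factoring one prefers; the file, the column names, and the top-4 item cutoff are all invented for illustration:)

    # Hedged sketch of the suggestion: factor the 12 items, average the
    # items loading on a "science self-efficacy" factor, and t-test the
    # scale score. Names and the top-4 cutoff are illustrative choices.
    import pandas as pd
    from scipy.stats import ttest_ind
    from sklearn.decomposition import FactorAnalysis

    df = pd.read_csv("attitudes.csv")  # imputed data; 'phase' marks pre/post
    items = [f"q{i}" for i in range(1, 13)]

    fa = FactorAnalysis(n_components=2, rotation="varimax", random_state=0)
    fa.fit(df[items])
    loadings = pd.DataFrame(fa.components_.T, index=items)

    efficacy_items = loadings[0].abs().sort_values(ascending=False).index[:4]
    df["efficacy"] = df[efficacy_items].mean(axis=1)

    pre = df.loc[df["phase"] == "pre", "efficacy"]
    post = df.loc[df["phase"] == "post", "efficacy"]
    print(ttest_ind(pre, post))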

Isn't it bad practice to create a Likert-scaled scoring instrument and then focus on the individual items?

You can find plenty of comments about how treating Likert data as ranks wastes a useful presentation while subscribing to assumptions that are just about as onerous (if you want to take them seriously).

If you test with a 5×2 contingency chi-squared, THAT is a thorough waste of power unless you have the odd hypothesis that the difference is NOT a shift along the scale.
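(The power point can also be checked by simulation; a sketch under an invented location-shift alternative:)

    # Hedged sketch: under a modest shift on a 5-point scale, the 5x2
    # chi-squared (df = 4) typically rejects less often than Mann-Whitney,
    # which concentrates its power on ordered alternatives. The response
    # probabilities below are invented.
    import numpy as np
    from scipy.stats import chi2_contingency, mannwhitneyu

    rng = np.random.default_rng(3)
    reps, alpha = 2000, 0.05
    p_pre = [0.20, 0.25, 0.30, 0.15, 0.10]   # response probabilities, pre
    p_post = [0.12, 0.20, 0.30, 0.22, 0.16]  # shifted toward 5, post
    hits_chi2 = hits_mw = 0
    for _ in range(reps):
        pre = rng.choice(5, size=121, p=p_pre) + 1
        post = rng.choice(5, size=109, p=p_post) + 1
        table = [np.bincount(pre, minlength=6)[1:],
                 np.bincount(post, minlength=6)[1:]]
        hits_chi2 += chi2_contingency(table)[1] < alpha
        hits_mw += mannwhitneyu(pre, post)[1] < alpha
    print(f"power: chi-squared = {hits_chi2/reps:.2f}, MW = {hits_mw/reps:.2f}")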

--
Rich Ulrich


Date: Wed, 29 Jul 2015 02:09:14 +0000
From: [hidden email]
Subject: Help with Mann-Whitney
To: [hidden email]

...