interrater agreement, intrarater agreement

interrater agreement, intrarater agreement

Vassilis Hartzoulakis
Hi everyone
I have a dataset of 10000 subjects and their scores on a composition they
wrote. Each composition was scored by 2 different raters (randomly selected
from a pool of 70). The scores could range from 0 to 15.
So far I have set up a table with 10000 rows/cases and 5 columns (IDsubject,
IDraterA, rateA, IDraterB, rateB)
e.g.
00001, 1200, 12, 1300, 14 (the 1st rater gave a 12/15 and the 2nd a 14/15)
00002, 1200, 09, 1300, 12
00003, 1400, 15, 1200, 13
00004, 1400, 02, 1200, 08 etc.
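
In SPSS terms the layout is roughly this (just a minimal sketch using the same example rows; the real file is of course read from disk, and IDsubject is read as a string to keep the leading zeros):

DATA LIST FREE / IDsubject (A5) IDraterA rateA IDraterB rateB.
BEGIN DATA
00001 1200 12 1300 14
00002 1200 09 1300 12
00003 1400 15 1200 13
00004 1400 02 1200 08
END DATA.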

Can someone suggest the best possible layout and analysis to investigate
inter-rater and intra-rater agreement?
Thank you
Vassilis

Re: interrater agreement, intrarater agreement

Maguin, Eugene
Vassilis,

I haven't seen any other replies. Yes, I think your data are set up correctly.
As shown, you have them arranged in a multivariate ('wide') setup. From there
you can do a repeated measures ANOVA or use RELIABILITY. However, I think
there are a number of different formulas to use depending on whether you
have the same raters rating everybody or, as you have, two raters
randomly selected to rate each person. I'd bet anything the computational
formulas are different, and I'll bet almost anything that SPSS can't
accommodate both. There's a literature on rater agreement and on intraclass
correlation; if you haven't looked at that, you should, though I can't
help you there. One thing you might do is google 'intraclass correlation';
there's a citation in the top 20 or 30 that references a book by, I
think, Fleiss and Shrout. Another term to google is 'kappa'
(which is available from SPSS CROSSTABS).
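
If you go the kappa route, it is a one-line job in syntax. Just as a sketch, assuming the variable names from your e-mail (and note that kappa in CROSSTABS treats the 0-15 scores as nominal categories, and I believe it needs a square table, i.e. both raters using the same set of score values):

CROSSTABS
  /TABLES=rateA BY rateB
  /STATISTICS=KAPPA.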

I'm hoping that you have other responses that are more helpful than I am
able to be.

Gene Maguin

Re: interrater agreement, intrarater agreement

Meyer, Gregory J
Vassilis,

Building on Gene's suggestions, I would recommend you compute an
intraclass correlation using a oneway random effects design. SPSS offers
three different kinds of models (and for two-way models gives the option
of computing a consistency ICC or an exact agreement ICC), each of which
has a slightly different definition of error. The oneway model is the
one that would be appropriate for you to assess interrater reliability.
An update of the classic Shrout and Fleiss (1979) article is this paper:

McGraw, K. O., & Wong, S. P. (1996). Forming inferences about some
intraclass correlation coefficients. Psychological Methods, 1, 30-46.

For the data you provided, the default syntax would be as given below,
though you can modify the confidence interval and null value. In the
output you would typically focus on the "Single Measures" reliability
rather than the Spearman-Brown estimate of "Average Measures"
reliability.

RELIABILITY
  /VARIABLES=rateA rateB
  /SCALE('ALL VARIABLES') ALL
  /MODEL=ALPHA
  /ICC=MODEL(ONEWAY) CIN=95 TESTVAL=0.

I am not certain how you would examine intrarater reliability with these
data. For that analysis the scorers would have to rate the same essay at
least twice, and it does not look like this was done. However, if it was,
you would continue to have participants in the rows and a column for
IDsubject. In addition, you would want columns for IDrater, rateT1, and
rateT2, with T designating the time of the rating. For this design you
could run a two-way random effects model, because the second ratings are
always differentiated from the first. Using the same model, you could
also split the file by IDrater and generate reliability values for
each scorer, as sketched below. To obtain findings that parallel the
interrater analyses, you would want to use the Absolute Agreement
coefficient rather than the Consistency coefficient. McGraw and Wong
discuss all these models and types.
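
As a sketch only, and assuming the hypothetical rateT1/rateT2 layout described above, the per-rater absolute agreement analysis could look like this; SPLIT FILE then gives you one ICC table per scorer.

SORT CASES BY IDrater.
SPLIT FILE LAYERED BY IDrater.
RELIABILITY
  /VARIABLES=rateT1 rateT2
  /SCALE('ALL VARIABLES') ALL
  /MODEL=ALPHA
  /ICC=MODEL(RANDOM) TYPE(ABSOLUTE) CIN=95 TESTVAL=0.
SPLIT FILE OFF.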

Good luck,

Greg

