Hi everyone
I have a dataset of 10,000 subjects and their scores on a composition they wrote. Each composition was scored by 2 different raters (randomly selected from a pool of 70). The scores could range from 0 to 15. So far I have set up a table with 10,000 rows/cases and 5 columns (IDsubject, IDraterA, rateA, IDraterB, rateB), e.g.:

00001, 1200, 12, 1300, 14   (the 1st rater gave a 12/15 and the 2nd a 14/15)
00002, 1200, 09, 1300, 12
00003, 1400, 15, 1200, 13
00004, 1400, 02, 1200, 08
etc.

Can someone suggest the best possible layout and analysis to investigate inter- and intra-rater agreement?

Thank you
Vassilis
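In syntax form, the table looks roughly like this (a minimal sketch showing only the four example cases above; the full file would of course be read in rather than typed):

* Wide layout: one row per composition, two rater-ID/score pairs per row.
* IDsubject is kept as a string so the leading zeros survive.
DATA LIST LIST / IDsubject (A5) IDraterA rateA IDraterB rateB.
BEGIN DATA
00001 1200 12 1300 14
00002 1200 09 1300 12
00003 1400 15 1200 13
00004 1400 02 1200 08
END DATA.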
Vassilis,
I haven't seen any other replies. Yes, I think your data are set up correctly. As shown, you have them arranged in a multivariate ('wide') setup. From there you can do a repeated measures ANOVA or use RELIABILITY. However, I think there are a number of different formulas to use depending on whether you have the same raters rating everybody or, as you have, two raters randomly selected to rate each person. I'd bet anything the computational formulas are different, and I'll bet almost anything that SPSS can't accommodate both.

There's a literature on rater agreement and on intraclass correlation. If you haven't looked at that, you should. However, I can't help you on that. One thing you might do is google 'intraclass correlation'; there's a citation in the top 20 or 30 results that references a book by, I think, Fleiss (or Fliess) and Shrout. Another term to google is 'kappa' (which is available from SPSS CROSSTABS).

I'm hoping that you have other responses that are more helpful than I am able to be.

Gene Maguin
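A minimal sketch of the kappa request mentioned above, assuming the two score columns are named rateA and rateB as in the original layout:

* Cohen's kappa on the two sets of scores (treats the 0-15 scores as unordered categories).
CROSSTABS
  /TABLES=rateA BY rateB
  /STATISTICS=KAPPA.

Note that CROSSTABS reports kappa only when the table is square, i.e. when both columns contain the same set of score values, and unweighted kappa gives no credit for near-agreement (e.g. 12 vs. 13), which is one reason the intraclass correlation is usually preferred for scores like these.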
Vassilis,
Building on Gene's suggestions, I would recommend you compute an intraclass correlation using a one-way random effects design. SPSS offers three different kinds of models (and for the two-way models gives the option of computing a consistency ICC or an absolute agreement ICC), each of which has a slightly different definition of error. The one-way model is the one that would be appropriate for you to assess interrater reliability. An update of the classic Shrout and Fleiss (1979) article is this paper:

McGraw, K. O., & Wong, S. P. (1996). Forming inferences about some intraclass correlation coefficients. Psychological Methods, 1, 30-46.

For the data you provided, the default syntax would be as given below, though you can modify the confidence interval and null value. In the output you would typically focus on the "Single Measures" reliability rather than the Spearman-Brown estimate of "Average Measures" reliability.

RELIABILITY
  /VARIABLES=rateA rateB
  /SCALE('ALL VARIABLES') ALL
  /MODEL=ALPHA
  /ICC=MODEL(ONEWAY) CIN=95 TESTVAL=0.

I am not certain how you would examine intrarater reliability with these data. For that analysis the scorers would have to rate the same essay at least twice, and it doesn't look like this was done. However, if it was, you would continue to have participants in the rows and a column for IDsubject. In addition, you would want columns for IDrater, rateT1, and rateT2, with T designating the time of the rating. For this design you could run a two-way random effects model, because the 2nd ratings are always differentiated from the first. Using the same model, you could also split the file by IDrater and generate reliability values for each scorer (see the sketch below). In order to obtain findings that parallel the findings from the interrater analyses, you would want to use the Absolute Agreement coefficient rather than the Consistency coefficient. McGraw and Wong discuss all these models and types.

Good luck,
Greg
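A minimal sketch of that split-file intrarater analysis, assuming a hypothetical second data file with one row per twice-rated essay and columns IDrater, rateT1, and rateT2 (the column names follow the description above; no such file exists in the original post):

* Two-way random effects ICC with absolute agreement, run separately for each rater.
SORT CASES BY IDrater.
SPLIT FILE BY IDrater.
RELIABILITY
  /VARIABLES=rateT1 rateT2
  /SCALE('ALL VARIABLES') ALL
  /MODEL=ALPHA
  /ICC=MODEL(RANDOM) TYPE(ABSOLUTE) CIN=95 TESTVAL=0.
SPLIT FILE OFF.

With the file split by IDrater, the reliability results are reported for each rater separately, which makes it easy to spot scorers whose self-consistency falls well below the overall interrater figure.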