SPSSX Discussion

Help with Calculating the Inter-Rater Reliability for four raters.


gastonshack

Sep 07, 2012; 11:58pm


Hello,

I read the earlier post about calculating the Fleiss statistic for raters scoring a video consultation. However, I am still not sure how to calculate my own results, and I am hoping somebody can help me out.

My goal is to show that a rating system for assessing sinuses has good inter-rater reliability. Each rater was given a video to watch covering four different subsites of the paranasal sinuses (i.e., the frontal, maxillary, ethmoidal and sphenoidal). Each of these subsites was assessed for 9 different findings (clear mucus, eos mucus, crust, pus, polypoid, polyps, cystic polyp, mucoid and edema). For each finding, the rater assigned a 0 (absent) or 1 (present) at each subsite, so a patient can receive a score in more than one category. For example, if a patient had mucoid, polypoid and crust findings in the ethmoid cavity, the rater would mark a 1 next to each of these findings on the score sheet. In summary:

Number of subjects = 20
Number of raters = 4
Number of subsites = 4
Number of characteristics = 9 (not mutually exclusive)
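
To make this layout concrete, here is a minimal sketch in Python (not SPSS syntax). It assumes that each finding at each subsite is treated as its own absent/present item, which is only one possible reading of the design and not a confirmed answer to the questions below. The array `ratings`, the helper names `binary_counts` and `fleiss_kappa`, and the random placeholder data are all hypothetical; real score sheets would replace the random array.

```python
import numpy as np

# Dimensions taken from the summary above.
N_SUBJECTS, N_RATERS, N_SUBSITES, N_FINDINGS = 20, 4, 4, 9


def binary_counts(item_ratings):
    """Turn a (subjects x raters) 0/1 array into (subjects x 2) counts:
    column 0 = raters marking absent, column 1 = raters marking present."""
    item_ratings = np.asarray(item_ratings)
    present = item_ratings.sum(axis=1)
    absent = item_ratings.shape[1] - present
    return np.column_stack([absent, present])


def fleiss_kappa(counts):
    """Fleiss' kappa for a (subjects x categories) table of rater counts.
    Every row must sum to the same number of raters."""
    counts = np.asarray(counts, dtype=float)
    n_subjects = counts.shape[0]
    n_raters = counts[0].sum()
    # Share of all assignments that fell in each category.
    p_cat = counts.sum(axis=0) / (n_subjects * n_raters)
    # Observed agreement for each subject, then averaged.
    p_subj = (np.sum(counts ** 2, axis=1) - n_raters) / (n_raters * (n_raters - 1))
    p_obs = p_subj.mean()
    # Agreement expected by chance.
    p_exp = np.sum(p_cat ** 2)
    if p_exp == 1.0:
        # Everyone used one category for every subject: kappa is undefined.
        return float("nan")
    return (p_obs - p_exp) / (1.0 - p_exp)


# Placeholder data only: random 0/1 marks standing in for real score sheets.
rng = np.random.default_rng(0)
ratings = rng.integers(0, 2, size=(N_SUBJECTS, N_RATERS, N_SUBSITES, N_FINDINGS))

# One kappa per (subsite, finding) pair, each treated as a binary rating.
kappas = np.empty((N_SUBSITES, N_FINDINGS))
for s in range(N_SUBSITES):
    for f in range(N_FINDINGS):
        kappas[s, f] = fleiss_kappa(binary_counts(ratings[:, :, s, f]))
```

Under this assumption the result is a 4 x 9 table of kappas instead of a single statistic. A finding that no rater ever marks as present at a given subsite gives an undefined kappa, which the sketch reports as NaN.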

My questions are:

1. Is it possible to use the Fleiss Kappa if the rater is allowed to mark a subject in more than one category?
2. If not, is there a test that I can use that takes this into account?
3. If this is not possible, do you have a suggestion for how I can best rearrange my data? (The goal is to show that different raters can reliably score sinuses using our method)

Thank you in advance for any help you can offer,

Mike