http://spssx-discussion.165.s1.nabble.com/Interrater-reliability-tp4663573p4669746.html
First, I must say that I am puzzled by the data description, and
possibly by the layout.
You say there are 125 subjects. You say there are 125 "different
pairs" of staff. Are there only 125 ratings, each with a new pair?
How many staff are involved?
Do the data identify, at all, which staff made which ratings?
- The way that I read the data format: there is just one line
per item, with two columns per subject; there is *no*
identification of rater except as a/b; and each line has
125 x 2 scores. That puts multiple subjects on one line, which
is certainly not a reasonable form for prospective statistical
analyses.
You certainly cannot do any reasonable analog of a kappa if
the raters are not identified. One usual and reasonable way to
organize the data, at the start, would be as one line per rating,
with SubjectID and RaterID followed by a set of items.
If you want a pair of ratings on one line, then the line should
include the SubjectID and a RaterID with each rating.
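As a minimal sketch of that restructuring in SPSS syntax,
assuming the file has first been arranged as one case per
subject, with hypothetical variable names SubjID, Item1_a,
Item1_b, Item2_a, Item2_b (a and b being the two members of the
pair), VARSTOCASES can stack it to one line per rating:

   VARSTOCASES
     /MAKE Item1 FROM Item1_a Item1_b
     /MAKE Item2 FROM Item2_a Item2_b
     /INDEX=Rater(2)
     /KEEP=SubjID.

That leaves one case per (subject, rater) combination with the
items as columns; Rater is only 1 or 2 within each pair.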
I remember reading the observation that direct computation
of kappa (dichotomous or weighted, the only respectable versions)
is not really necessary for complicated designs, since it usually
agrees, to two decimals, with the intraclass correlation.
That is, you would have a good approximation of kappa if you
identify Subject and Rater, do the two-way ANOVA, and compute
the ICC. Perhaps you can try that.
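In SPSS, the RELIABILITY procedure will compute the ICC
directly. A minimal sketch for one item, assuming one case per
subject with the pair's two ratings in hypothetical columns
Item1_a and Item1_b:

   RELIABILITY
     /VARIABLES=Item1_a Item1_b
     /SCALE('Item 1') ALL
     /MODEL=ALPHA
     /ICC=MODEL(ONEWAY) CIN=95 TESTVAL=0.

Note that with a different pair for every subject, rater is
nested within subject, so the one-way model is the version that
applies here; run it once per item.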
Unfortunately, computing the ICC with unequal Ns is not well
documented. As a further misfortune, any newbie computation of
the ICC is plagued with errors, because there is an extremely
strong tendency to swap the d.f. counts (groups, subjects) in
the formulas.
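For the record, the usual one-way, single-rating ICC (Shrout
and Fleiss's ICC(1,1)), with n subjects and k ratings per
subject, is

   ICC(1) = (BMS - WMS) / (BMS + (k - 1)*WMS)

where BMS is the between-subjects mean square on n - 1 d.f. and
WMS is the within-subjects mean square on n(k - 1) d.f.; those
two d.f. counts are exactly the ones that are easy to swap.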
--
Rich Ulrich
Date: Wed, 3 Aug 2011 14:41:37 -0400
From: [hidden email]
Subject: Interrater reliability
To: [hidden email]
Good afternoon,
I am looking to calculate Kappa to determine a measure of interrater reliability. I currently have 125 subjects being rated by different pairs of staff. Each pair assesses the same person, but there are 125 different pairs of staff. I want to calculate the overall Kappa for the entire group. I can do it for the individual pairs and average the scores, but was hoping there was a syntax/macro that I could use that would calculate the overall Kappa. The data are formatted as follows, but I can restructure the data if needed:
Rater1a            | Rater1b            | Rater2a            | Rater2b
Item 1 - Subject 1 | Item 1 - Subject 1 | Item 1 - Subject 2 | Item 1 - Subject 2
Item 2 - Subject 1 | Item 2 - Subject 1 | Item 2 - Subject 2 | Item 2 - Subject 2
Thanks
Brian