ICCs with unequal Ns -
In 1995 I got a reference for computing a simple ICC with unequal Ns from the
Usenet stats group, sci.stat.consult. I put the formula (below) in my stats-FAQ,
which I maintained from 1997 to about 2006; no one ever provided newer references.
The formula is owed to Ernest Haggard, Intraclass Correlation and the Analysis of
Variance (1958), as posted by Michael Bailey and reformatted and adapted by me.
I hope I have not screwed it up.
Let R = intraclass correlation,
    BSMS = between-Subject mean square,
    WMS = within mean square,
    c = number of Subjects, and
    ki = number of ratings for the ith Subject.  Then:
R = (BSMS - WMS) / ( BSMS + (k' - 1)*WMS )

where  k' = [ sum(ki) - (sum(ki**2))/sum(ki) ] / (c - 1)
The value of k' does need to work out to something in the range of an average ki,
that is, an average number of ratings per Subject. I seem to remember using the
"reciprocal mean" of the counts, but I don't remember whether I used this formula
for k' to get it.
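For concreteness, the formula above can be sketched in a few lines of Python (my own translation; the function and variable names are mine, not Haggard's):

```python
def unequal_n_icc(groups):
    """One-way ICC for unequal group sizes, using Haggard's k' adjustment.

    groups: list of lists, one inner list of ratings per Subject.
    """
    c = len(groups)                       # number of Subjects
    ks = [len(g) for g in groups]         # ki, ratings for the ith Subject
    n = sum(ks)
    grand_mean = sum(x for g in groups for x in g) / n

    # Between-Subject and within-Subject sums of squares -> mean squares
    ss_between = sum(len(g) * ((sum(g) / len(g)) - grand_mean) ** 2
                     for g in groups)
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    bsms = ss_between / (c - 1)           # between-Subject mean square
    wms = ss_within / (n - c)             # within mean square

    # k' = [ sum(ki) - sum(ki**2)/sum(ki) ] / (c - 1)
    k_prime = (n - sum(k * k for k in ks) / n) / (c - 1)

    return (bsms - wms) / (bsms + (k_prime - 1) * wms)
```

With equal ki, k' works out to exactly k, so this agrees with the usual balanced one-way ICC.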
--
Rich Ulrich
Date: Tue, 17 Jun 2014 22:49:46 -0400
From: [hidden email]
Subject: Re: inter-rater reliability with multiple raters
To: [hidden email]

Okay. Let me start by saying I'm a bit (okay, maybe very) under the weather and paid work is catching up with me, so apologies for any typos/mistakes. Having said that, this topic is quite interesting, as it relates to showing some connections between generalizability coefficients, various forms of ICCs, and coefficient alpha.
Before discussing those connections, however, the short answer to the question about whether there are valid estimators of an ICC in an unbalanced design would be a solid "it depends." I would argue that to obtain a valid ICC we need to appropriately decompose the variance to obtain between subject variance and variance attributable to all other sources. This could prove challenging.
At any rate, if the raters tend to agree in their ratings of each subject, then the between subject variance will tend to be much larger than other sources of variances, and the ICC should approach 1.0.
With that said, the ICC is defined as:

ICC = var(between Ss) / Total Variance

where Total Variance is the sum of:
(1) between-Ss variance,
(2) between-rater variance, and
(3) error variance.
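Plugging in made-up numbers (the variance components below are purely hypothetical, just to show the ratio):

```python
# Hypothetical variance components from a subjects-by-raters decomposition
var_between_ss = 4.0   # between-Ss variance
var_rater = 0.5        # between-rater variance
var_error = 1.5        # error variance

total_variance = var_between_ss + var_rater + var_error
icc = var_between_ss / total_variance
print(icc)
```

As the between-Ss component comes to dominate the other two, this ratio approaches 1.0, which is the point made above about raters tending to agree.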
If one initially planned a crossed design (all subjects were intended to be rated by all raters) but, due to random circumstances, some raters were unable to rate some subjects, and those missing data can be assumed to be missing at random (MAR), then I would suggest that one could in theory obtain a valid ML estimate of the variance components via the MIXED procedure in SPSS and insert them into the following ICC equation:
ICC = var(between Ss) / [var(between Ss) + error variance]
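There is no SPSS to run here, but the logic of the ML route can be sketched in plain Python: a small EM loop for the one-way random-intercept model y_ij = mu + b_i + e_ij, which copes with unequal ki naturally. This is my own illustration of the principle under that model, not the SPSS MIXED implementation, and the function name is mine:

```python
def icc_ml(groups, iters=500):
    """ML (EM) estimate of ICC = s2b / (s2b + s2e) for a one-way
    random-intercept model; groups may have unequal sizes."""
    n = sum(len(g) for g in groups)
    mu = sum(x for g in groups for x in g) / n
    s2b, s2e = 1.0, 1.0                  # starting values
    for _ in range(iters):
        # E-step: posterior mean m and variance v of each Subject effect b_i
        post = []
        for g in groups:
            v = 1.0 / (1.0 / s2b + len(g) / s2e)
            m = v * sum(x - mu for x in g) / s2e
            post.append((m, v))
        # M-step: update the grand mean and the two variance components;
        # floor the variances to keep EM off the zero boundary
        mu = sum(x - m for (m, v), g in zip(post, groups) for x in g) / n
        s2b = max(sum(m * m + v for m, v in post) / len(groups), 1e-12)
        s2e = max(sum((x - mu - m) ** 2 + v
                      for (m, v), g in zip(post, groups) for x in g) / n,
                  1e-12)
    return s2b / (s2b + s2e)
```

When raters agree perfectly within each subject, the residual component shrinks toward zero and the estimated ICC approaches 1.0, matching the earlier remark.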
I believe more sophisticated ways to handle unbalanced designs have been published in the past 5 years, but I am not fully versed in such methods. With that said, please see below for a small demonstration, using SPSS syntax, that might help make the connections between generalizability coefficients from a one-facet design, the ICC, and coefficient alpha using various procedures in SPSS:
[snip, SPSS code for alpha examples]
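The SPSS code is snipped above, but for readers without SPSS, the coefficient-alpha end of that connection is easy to sketch in plain Python using the standard alpha formula (the function name and data layout are my own):

```python
def cronbach_alpha(items):
    """items: list of item-score lists, all the same length
    (one score per person per item). Uses sample variances (ddof = 1)."""
    k = len(items)                      # number of items (here: raters)
    n = len(items[0])                   # number of persons (here: subjects)

    def var(xs):                        # sample variance with ddof = 1
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    # Total score per person across all items
    totals = [sum(item[p] for item in items) for p in range(n)]
    return (k / (k - 1)) * (1 - sum(var(i) for i in items) / var(totals))
```

For a balanced design, alpha coincides with the average-measures consistency ICC, which is one of the connections the snipped demonstration illustrates.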