Inter-rater reliability statistic - which method?

Inter-rater reliability statistic - which method?

poloboyden
Hello

I wondered if you could help answer a quick (and hopefully easy) question.

I am currently trying to work out which inter-rater reliability method is appropriate for assessing agreement between two raters who each counted the number of times they saw a reference to the self in a set of transcripts.

Thus, the data are counts (treated as continuous). There are two raters, so I am thinking Cohen's kappa should be used, or should it be Pearson's correlation?

Re: Inter-rater reliability statistic - which method?

bdates
In reply to this post by poloboyden
Cohen's kappa is not appropriate here unless all you want is a measure of how much actual agreement there is. There are a number of articles, starting with Krippendorff (1970) and followed by Fleiss and Cohen (1973), on the equivalence of the ICC to agreement statistics when the data are ordinal or interval in nature. I'd recommend using the ICC for your work; it's more widely accepted in the literature than a simple correlation. Others on the list may have alternative views.

Brian
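
For anyone who wants to try the ICC directly on this kind of data, here is a minimal sketch of ICC(2,1), the two-way random effects, absolute agreement, single-rater form, written in plain Python with NumPy. The per-transcript counts are invented purely for illustration; SPSS users should be able to get the same family of coefficients from the ICC option of the RELIABILITY procedure.

import numpy as np

def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` is an (n_transcripts, n_raters) array; here each column is
    one rater's count of self-references per transcript.
    """
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand = x.mean()
    row_means = x.mean(axis=1)   # per-transcript means
    col_means = x.mean(axis=0)   # per-rater means

    ss_rows = k * ((row_means - grand) ** 2).sum()
    ss_cols = n * ((col_means - grand) ** 2).sum()
    ss_error = ((x - grand) ** 2).sum() - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)                # between-transcript mean square
    ms_cols = ss_cols / (k - 1)                # between-rater mean square
    ms_error = ss_error / ((n - 1) * (k - 1))  # residual mean square

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )

# Hypothetical counts: rows are transcripts, columns are the two raters.
counts = [[12, 14], [7, 7], [22, 19], [3, 5], [15, 15]]
print(round(icc_2_1(counts), 3))

With these made-up numbers the coefficient comes out around 0.96. Because this is the absolute-agreement form, a rater who systematically counts higher than the other would pull it down, which is exactly what a consistency-only correlation ignores.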


Re: Inter-rater reliability statistic - which method?

SR Millis-3
I'd suggest you consider using the root mean square differences and concordance correlation  coefficients.  See:

1. Barchard KA (University of Nevada, Las Vegas). Examining the reliability of interval level data using root mean square differences and concordance correlation coefficients. Psychol Methods. 2012 Jun;17(2):294-308. Epub 2011 May 16.

This article introduces new statistics for evaluating score consistency. Psychologists usually use correlations to measure the degree of linear relationship between 2 sets of scores, ignoring differences in means and standard deviations. In medicine, biology, chemistry, and physics, a more stringent criterion is often used: the extent to which scores are identically equal. For each test taker (or other unit of measurement), the difference between the 2 scores is calculated. The root mean square difference (RMSD) represents the average change from 1 set of scores to the other, and the concordance correlation coefficient (CCC) rescales this coefficient to have a maximum value of 1. This article shows the relationship of the RMSD and CCC to the intraclass correlation coefficients, product-moment correlation, and standard error of measurement. Finally, this article adapts the RMSD and the CCC for linear, consistency, and absolute definitions of agreement. (PsycINFO Database Record (c) 2012 APA, all rights reserved.)
 
~~~~~~~~~~~
Scott R Millis, PhD, ABPP, CStat, PStat®
Board Certified in Clinical Neuropsychology, Clinical Psychology, & Rehabilitation Psychology 
Professor
Wayne State University School of Medicine
Email: [hidden email]
Email: [hidden email]
Tel: 313-993-8085
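
The RMSD and CCC described in the abstract are straightforward to compute by hand. Here is a minimal sketch in Python with NumPy, again on invented counts; it implements the basic absolute-agreement forms, not the linear and consistency variants the article also develops.

import numpy as np

def rmsd(x, y):
    """Root mean square difference between two raters' scores."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    return np.sqrt(np.mean((x - y) ** 2))

def ccc(x, y):
    """Lin's concordance correlation coefficient (absolute agreement)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    sxy = np.cov(x, y, bias=True)[0, 1]   # population covariance
    return 2 * sxy / (x.var() + y.var() + (x.mean() - y.mean()) ** 2)

# Hypothetical per-transcript self-reference counts from the two raters.
r1 = [12, 7, 22, 3, 15]
r2 = [14, 7, 19, 5, 15]
print(rmsd(r1, r2), ccc(r1, r2))

Unlike the Pearson correlation, the CCC falls when one rater runs systematically higher or lower than the other, so it rewards scores that are identically equal rather than merely linearly related.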


From: "Dates, Brian" <[hidden email]>
To: [hidden email]
Sent: Tuesday, July 3, 2012 10:16 AM
Subject: Re: Inter-rater reliability statistic - which method?

Cohen's kappa is not appropriate for this, unless you want a measure of
how much actual agreement there is.  There are a number of articles
starting with Krippendorf (1970) and followed by Fleiss and Cohen (1973)
about the equivalence of the ICC to agreement statistics when the data
are ordinal or interval in nature.  I'd recommend using the ICC for your
work.  It's more accepted in the literature generally than simple
correlation. Others on the list may have alternative views.

Brian

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
poloboyden
Sent: Tuesday, July 03, 2012 10:07 AM
To: [hidden email]
Subject: Inter-rater reliability statistic - which method?

Hello

I wondered if you could help answer a quick (and hopefully easy)
question.

I am currently trying to suss out which method of inter-rater
reliability is
the appropriate one for working out the reliability between two raters
who
numbered the amount of time's they saw a reference to the self in a set
of
transcripts.

Thus, it is continuous data. There are two raters - so I am thinking
Cohen's
Kappa should be used, or should it be Pearson's correlation?

--
View this message in context:
http://spssx-discussion.1045642.n5.nabble.com/Inter-rater-reliability-st
atistic-which-method-tp5713982.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD



Re: Inter-rater reliability statistic - which method?

Rich Ulrich
In reply to this post by poloboyden
Cohen's kappa is mainly useful for 2x2 tables, or as a way of bragging about near-perfect concordance.

To "work out the relationship", the best general approach is to look at the Pearson correlation for the similarity and the paired t-test for systematic difference. For the 2x2 case, use McNemar's test to check the difference.

The ICC assumes a common mean for the raters, so it is less suited for this kind of examination. It is sometimes preferred for the summaries when publishing results of multiple tests.

--
Rich Ulrich
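
As a minimal sketch of that approach in Python with SciPy (the counts below are invented, not the original poster's data): Pearson r for how closely the raters track each other, plus a paired t-test for a systematic difference in level. For a genuine 2x2 case, the mcnemar function in statsmodels would cover the McNemar check.

import numpy as np
from scipy import stats

# Hypothetical per-transcript self-reference counts from the two raters.
r1 = np.array([12, 7, 22, 3, 15])
r2 = np.array([14, 7, 19, 5, 15])

r, r_p = stats.pearsonr(r1, r2)   # similarity: do the raters order transcripts alike?
t, t_p = stats.ttest_rel(r1, r2)  # systematic difference: does one rater count higher overall?

print(f"Pearson r = {r:.3f} (p = {r_p:.4f})")
print(f"Paired t  = {t:.3f} (p = {t_p:.4f})")

A high Pearson r together with a non-significant paired t is the pattern described above: the raters agree in ordering and show no consistent offset.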
