Inter-rater agreement for multiple raters or something else?


MJury

Hi everyone!

I would appreciate any comments that would help me analyse my data. The project involved sending the clinical data of 12 patients to 30 clinicians and asking each for a diagnosis (A, B, C or D). We have the results back and would like to assess agreement between raters. I computed Fleiss's generalized kappa across all raters and subjects and found that the raters differ a lot in their opinions. Now I would like to assess which particular patients had the highest variability in diagnosis, in order to identify the clinical phenotypes that are most challenging for clinicians. I tried Fleiss kappa for multiple raters and a single subject, but that does not seem to work here, as it returns similar values for all patients.
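(For concreteness, the syntax sketches later in this thread assume a layout roughly like the one below, with one row per patient and the 30 raters' diagnoses held in string variables R1 to R30; the variable names are hypothetical.

DATA LIST FREE / patient (F2) R1 TO R30 (A1).

Each row then holds one patient's 30 diagnoses coded 'A' to 'D'.)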

Could you please give me some tips, as I think I'm missing something here :)

With warm regards and many thanks,

Mack

Re: Inter-rater agreement for multiple raters or something else?

Art Kendall


There are SPSS macros for inter-coder and inter-rater reliability.

Search for Krippendorff in the archives of this list.
Art Kendall
Social Research Consultants

Re: Inter-rater agreement for multiple raters or something else?

bdates
I have a macro for Fleiss that provides kappas for each category as well as the overall kappa. I can send it to you offline.

Brian

Brian Dates, M.A.
Director of Evaluation and Research | Evaluation & Research | Southwest Counseling Solutions
Southwest Solutions
1700 Waterman, Detroit, MI 48209
313-841-8900 (x7442) office | 313-849-2702 fax
[hidden email] | www.swsol.org


Re: Inter-rater agreement for multiple raters or something else?

Art Kendall
In reply to this post by Art Kendall
Please give some more detail about your data.
Were the clinicians able to respond with more than one of A, B, C, D?
What are A, B, etc.? Are they yes/no choices, or are they ratings?

How is your data arranged now?
Art Kendall
Social Research Consultants

Re: Inter-rater agreement for multiple raters or something else?

MJury
Dear Art and David, thanks so much for your interest!

A, B, C and D are four possible diagnostic categories and raters were asked to choose only one of them (mutually exclusive and exhaustive categories).

Patients are arranged in rows and raters in columns. I calculated the overall Fleiss kappa for all patients and all raters, but I would also be interested in identifying the patients that are the most debatable/controversial from the diagnostic point of view. I am not sure whether Fleiss kappa is the best solution here, as I understand its main goal is to assess the reliability of raters, while I am more interested in recognizing the clinical phenotypes that cause disagreement between raters.

Maybe the per-subject agreement P_i from Fleiss's kappa (the extent to which the raters agree on the i-th subject) would be appropriate? However, the Fleiss kappa Excel spreadsheet I downloaded from Jason E. King's website does not calculate that. Brian, thanks a lot for your kindness; does your macro calculate P_i?
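(For reference, Fleiss defines the agreement for a single subject as the proportion of agreeing pairs of raters,

P_i = ( sum over categories j of n_ij^2  -  n ) / ( n * (n - 1) ),

where n is the number of raters (30 here) and n_ij is the number of raters who assigned category j to patient i. For example, if a patient received 20 A's, 6 B's, 3 C's and 1 D, then P_i = (400 + 36 + 9 + 1 - 30) / (30 * 29) = 416/870, or about 0.48. Patients with a low P_i are the ones the raters found hardest to classify.)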

I would appreciate any comments.

With best regards,

Mack

Re: Inter-rater agreement for multiple raters or something else?

bdates
Mack,

The Excel sheet on Jason's website is mine, but there are problems with it. For some reason it keeps getting corrupted, so I'd be careful about the results. I'll send my macro offline.

More to the point, Fleiss is one of the few authors who provided 'official' formulae for category kappas as well as an overall solution. I'd be interested to know whether he actually gave formulae for individual cases/subjects; I've never seen that in the literature.

As an idea, you could write syntax that, for each case, counts how many raters assigned each of the four diagnoses, and then computes a variable counting the number of diagnoses with more than one, or two, or ... occurrences (whatever value you set as a cutoff). That would give you an idea of the raw agreement and help distinguish 'difficult' patients from 'easy' patients.
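A rough sketch of that counting idea, assuming the rater responses sit in string variables R1 to R30 coded 'A' to 'D' (the variable names here are hypothetical; adjust them to your file):

COUNT nA = R1 TO R30 ('A').
COUNT nB = R1 TO R30 ('B').
COUNT nC = R1 TO R30 ('C').
COUNT nD = R1 TO R30 ('D').
* Size of the largest diagnostic camp and number of diagnoses used at least twice.
COMPUTE maxdiag = MAX(nA, nB, nC, nD).
COMPUTE ncut = (nA GE 2) + (nB GE 2) + (nC GE 2) + (nD GE 2).
EXECUTE.

A patient with maxdiag close to 30 (and ncut equal to 1) is an 'easy' case; a patient whose counts are spread across several categories is a 'difficult' one.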



Brian Dates, M.A.
Director of Evaluation and Research | Evaluation & Research | Southwest Counseling Solutions
Southwest Solutions
1700 Waterman, Detroit, MI 48209
313-841-8900 (x7442) office | 313-849-2702 fax
[hidden email] | www.swsol.org



Re: Inter-rater agreement for multiple raters or something else?

Art Kendall
In reply to this post by bdates
One way to look at your data would be to recode A, B, C, D to 1 through 4, use MULT RESPONSE, and treat the 30 judges as a set (or transpose the file and use the patient profiles as the repeats). Cross the set by itself, something like
/TABLES = judges BY judges
or
/TABLES = profiles BY profiles.

You should be able to do something similar with CTABLES.
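A rough, untested sketch of that approach, again assuming hypothetical string variables R1 to R30 coded 'A' to 'D':

RECODE R1 TO R30 ('A'=1) ('B'=2) ('C'=3) ('D'=4) INTO J1 TO J30.
MULT RESPONSE GROUPS=$judges 'Diagnoses from the 30 judges' (J1 TO J30 (1,4))
  /FREQUENCIES=$judges
  /TABLES=$judges BY $judges.

The frequencies show how often each diagnosis was used overall, and the table crosses the judges' choices against themselves, as described above.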
Art Kendall
Social Research Consultants

RE: Inter-rater agreement for multiple raters or something else?

MJury
In reply to this post by bdates

Dear Brian,

 

Thanks a lot for the macro; I was able to run it and it works fine. However, the overall Fleiss kappa is not ideal for my project, because it depends heavily on how the ratings are distributed across the categories in the whole cohort, not only on how well the raters agree on individual patients. For example, imagine the same high (but not perfect) level of agreement between raters in two situations: in the first, 6 patients are diagnosed with A and 6 with B; in the second, 2 patients are diagnosed with A and 10 with B. The overall Fleiss kappa will be lower in the second, more skewed situation even though the agreement on individual patients is the same. It is also worth mentioning that there is no gold standard for the diagnosis in my cohort (the ultimate diagnosis is not known).

 

I think the only thing I can do with Fleiss kappa is to calculate the proportion of inter-rater agreement for each patient.
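A minimal sketch of that per-patient proportion, reusing the hypothetical nA to nD counts from the COUNT syntax suggested earlier in the thread (this is the observed proportion of agreeing pairs of raters, with no chance correction):

COMPUTE p_agree = (nA*(nA-1) + nB*(nB-1) + nC*(nC-1) + nD*(nD-1)) / (30*29).
EXECUTE.

Sorting the file by p_agree should put the most controversial patients at one end of the list.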

 

Do you think Krippendorff's alpha or Gwet's AC would work better for this project?

 

With kind regards,

 

Mack

Re: Inter-rater agreement for multiple raters or something else?

MJury
In reply to this post by Art Kendall
Dear Art,

Thanks a lot for your reply. Could you please give me more details on this method? I am not familiar with these terms.

Kind regards,

Mack