Inter-rater agreement for multiple raters or something else?


MJury

Hi everyone!

I would appreciate any comments that would help me analyse my data. The project involved sending the clinical data of 12 patients to 30 clinicians and asking each for a diagnosis (A, B, C or D). We have the results back and would like to assess agreement between raters. I computed Fleiss's generalized kappa across all raters and subjects and found that the raters differ a lot in their opinions. Now I would like to assess which particular patients had the highest variability in diagnosis, in order to identify the clinical phenotypes that are most challenging for clinicians. I tried Fleiss kappa for multiple raters and a single subject, but that does not seem to work here, as it returns similar values for all patients.
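(For concreteness, the syntax sketches later in this thread assume a layout roughly like the one below, with one row per patient and the 30 raters' diagnoses held in string variables R1 to R30; the variable names are hypothetical.

DATA LIST FREE / patient (F2) R1 TO R30 (A1).

Each row then holds one patient's 30 diagnoses coded 'A' to 'D'.)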

Could you please give me some tips, as I think I'm missing something here :)

With warm regards and many thanks,

Mack

Re: Inter-rater agreement for multiple raters or something else?

Art Kendall


There are SPSS macros for inter-coder and inter-rater reliability.

Search for Krippendorff in the archives of this list.
Art Kendall
Social Research Consultants

Re: Inter-rater agreement for multiple raters or something else?

bdates
I have a macro for Fleiss that provides kappas for each category as well as the overall kappa. I can send it to you offline.

Brian

Brian Dates, M.A.
Director of Evaluation and Research | Evaluation & Research | Southwest Counseling Solutions
Southwest Solutions
1700 Waterman, Detroit, MI 48209
313-841-8900 (x7442) office | 313-849-2702 fax
[hidden email] | www.swsol.org


Re: Inter-rater agreement for multiple raters or something else?

Art Kendall
In reply to this post by Art Kendall
Please give some more detail about your data.
Were the clinicians able to respond with more than one of A, B, C, D?
What are A, B, etc.? Are they yes/no choices, or are they ratings?

How is your data arranged now?
Art Kendall
Social Research Consultants

Re: Inter-rater agreement for multiple raters or something else?

MJury
Dear Art and David, thanks so much for your interest!

A, B, C and D are four possible diagnostic categories and raters were asked to choose only one of them (mutually exclusive and exhaustive categories).

Patients are arranged in rows and raters in columns. I calculated the overall Fleiss kappa for all patients and all raters, but I would also be interested in identifying the patients that are the most debatable/controversial from the diagnostic point of view. I am not sure whether Fleiss kappa is the best solution here, as I understand its main goal is to assess the reliability of raters, while I am more interested in recognizing the clinical phenotypes that cause disagreement between raters.

Maybe the per-subject agreement P_i from Fleiss's kappa (the extent to which the raters agree on the i-th subject) would be appropriate? However, the Fleiss kappa Excel spreadsheet I downloaded from Jason E. King's website does not calculate that. Brian, thanks a lot for your kindness; does your macro calculate P_i?
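(For reference, Fleiss defines the agreement for a single subject as the proportion of agreeing pairs of raters,

P_i = ( sum over categories j of n_ij^2  -  n ) / ( n * (n - 1) ),

where n is the number of raters (30 here) and n_ij is the number of raters who assigned category j to patient i. For example, if a patient received 20 A's, 6 B's, 3 C's and 1 D, then P_i = (400 + 36 + 9 + 1 - 30) / (30 * 29) = 416/870, or about 0.48. Patients with a low P_i are the ones the raters found hardest to classify.)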

I would appreciate any comments.

With best regards,

Mack

Re: Inter-rater agreement for multiple raters or something else?

bdates
Mack,

The Excel sheet on Jason's website is mine, but there are problems with it. For some reason it keeps getting corrupted, so I'd be careful about the results. I'll send my macro offline.

More to the point, Fleiss is one of the few authors who provided 'official' formulae for category kappas as well as an overall solution. I'd be interested to know whether he actually gave formulae for individual cases/subjects; I've never seen that in the literature.

As an idea, you could write syntax that, for each case, counts how many raters assigned each of the four diagnoses, and then computes a variable counting the number of diagnoses with more than one, or two, or ... occurrences (whatever value you set as a cutoff). That would give you an idea of the raw agreement and help distinguish 'difficult' patients from 'easy' patients.
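A rough sketch of that counting idea, assuming the rater responses sit in string variables R1 to R30 coded 'A' to 'D' (the variable names here are hypothetical; adjust them to your file):

COUNT nA = R1 TO R30 ('A').
COUNT nB = R1 TO R30 ('B').
COUNT nC = R1 TO R30 ('C').
COUNT nD = R1 TO R30 ('D').
* Size of the largest diagnostic camp and number of diagnoses used at least twice.
COMPUTE maxdiag = MAX(nA, nB, nC, nD).
COMPUTE ncut = (nA GE 2) + (nB GE 2) + (nC GE 2) + (nD GE 2).
EXECUTE.

A patient with maxdiag close to 30 (and ncut equal to 1) is an 'easy' case; a patient whose counts are spread across several categories is a 'difficult' one.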



Brian Dates, M.A.
Director of Evaluation and Research | Evaluation & Research | Southwest Counseling Solutions
Southwest Solutions
1700 Waterman, Detroit, MI 48209
313-841-8900 (x7442) office | 313-849-2702 fax
[hidden email] | www.swsol.org



Re: Inter-rater agreement for multiple raters or something else?

Art Kendall
In reply to this post by bdates
One way to look at your data would be to recode A, B, C, D to 1 through 4, use MULT RESPONSE, and treat the 30 judges as a set (or transpose the file and use the patient profiles as the repeats). Cross the set by itself, something like
/TABLES = judges BY judges
or
/TABLES = profiles BY profiles.

You should be able to do something similar with CTABLES.
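A rough, untested sketch of that approach, again assuming hypothetical string variables R1 to R30 coded 'A' to 'D':

RECODE R1 TO R30 ('A'=1) ('B'=2) ('C'=3) ('D'=4) INTO J1 TO J30.
MULT RESPONSE GROUPS=$judges 'Diagnoses from the 30 judges' (J1 TO J30 (1,4))
  /FREQUENCIES=$judges
  /TABLES=$judges BY $judges.

The frequencies show how often each diagnosis was used overall, and the table crosses the judges' choices against themselves, as described above.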
Art Kendall
Social Research Consultants

RE: Inter-rater agreement for multiple raters or something else?

MJury
In reply to this post by bdates

Dear Brian,

 

Thanks a lot for the macro; I was able to run it and it works fine. However, the overall Fleiss kappa is not ideal for my project, because it depends heavily on how the ratings are distributed across the categories in the whole cohort, not only on how well the raters agree on individual patients. For example, imagine the same high (but not perfect) level of agreement between raters in two situations: in the first, 6 patients are diagnosed with A and 6 with B; in the second, 2 patients are diagnosed with A and 10 with B. The overall Fleiss kappa will be lower in the second, more skewed situation even though the agreement on individual patients is the same. It is also worth mentioning that there is no gold standard for the diagnosis in my cohort (the ultimate diagnosis is not known).

 

I think the only thing I can do with Fleiss kappa is to calculate the proportion of inter-rater agreement for each patient.
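A minimal sketch of that per-patient proportion, reusing the hypothetical nA to nD counts from the COUNT syntax suggested earlier in the thread (this is the observed proportion of agreeing pairs of raters, with no chance correction):

COMPUTE p_agree = (nA*(nA-1) + nB*(nB-1) + nC*(nC-1) + nD*(nD-1)) / (30*29).
EXECUTE.

Sorting the file by p_agree should put the most controversial patients at one end of the list.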

 

Do you think Krippendorff's alpha or Gwet's AC would work better for this project?

 

With kind regards,

 

Mack

Re: Inter-rater agreement for multiple raters or something else?

MJury
In reply to this post by Art Kendall
Dear Art,

Thanks a lot for your reply. Could you please give me more details on this method? I am not familiar with these terms.

Kind regards,

Mack