
Re: Fleiss Kappa

Posted by Rich Ulrich on Jan 24, 2015; 3:39am
URL: http://spssx-discussion.165.s1.nabble.com/Fleiss-Kappa-tp5728467p5728470.html

Here are my second reactions to the task --

The purpose here is NOT (as I see it) to develop a rating scale for
scoring the experts' discussions; you are concerned with the 0/1
content of those discussions, and the kappas would not address that.
 - The intention, rather, is to use the best items in some further
form, as some sort of checklist for future job applicants.

On the other hand, you are handing out a fairly tedious task, it
seems, to a set of recruited raters.  Whether the 40 separate
interviews to be rated are recorded or on paper, I imagine that the
raters are being asked to do them, essentially, at one sitting.  If that is
the case, then it *might* be of some interest to look at the pair-wise
kappas in order to detect whether one rater is unusually "random"
because of boredom and inattention.  - If you are sure that the raters are
always well-motivated and capable of doing these ratings, then this step
has nothing to show you.

A full set of small kappas for one rater would imply that it might be best
to ignore that rater.  Keep in mind that there will be a number of small
(or incomputable) kappas between raters whenever the agreement tends
to be in one direction; you only get large kappas when there are agreements
on both Yes and No.
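
A minimal sketch of that pair-wise screening (in Python rather than SPSS;
the 4 x 50 ratings matrix, the 0/1 coding, and every name in it are
illustrative assumptions, not data from this thread):

# Sketch: screen for an inattentive rater via pair-wise Cohen's kappa.
from itertools import combinations
import numpy as np

def cohen_kappa(a, b):
    # Cohen's kappa for two 0/1 rating vectors; returns None when incomputable.
    a, b = np.asarray(a), np.asarray(b)
    po = np.mean(a == b)                        # observed agreement
    pe = (np.mean(a) * np.mean(b) +             # chance agreement on Yes
          np.mean(1 - a) * np.mean(1 - b))      # chance agreement on No
    if pe == 1.0:                               # e.g. both raters all-Yes or all-No
        return None
    return (po - pe) / (1 - pe)

rng = np.random.default_rng(0)
ratings = rng.integers(0, 2, size=(4, 50))      # 4 raters x 50 items, fake 0/1 data

for i, j in combinations(range(ratings.shape[0]), 2):
    k = cohen_kappa(ratings[i], ratings[j])
    print(f"raters {i} vs {j}: kappa =",
          "incomputable" if k is None else round(k, 2))

A rater whose kappas are all near zero (or incomputable) is the one to look
at more closely.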

As before, the multi-rater Kappa has little to show you.

--
Rich Ulrich




From: [hidden email]
To: [hidden email]
Subject: RE: Fleiss Kappa
Date: Fri, 23 Jan 2015 22:00:43 -0500

Here are my reactions to the task --

It seems to me that you get everything that you want about an item
when you look at the mean of the 3 or 4 raters: "0.0" says that they
agreed on absence, and "1.0" says that they agreed on presence, and those
two agreements do not mean the same thing.  Once you have listed the items
in decreasing order of that mean, what is there to add?  Perhaps count how
many means are 0 and how many are 1?
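
A minimal sketch of that per-item summary (Python; the rater-by-item
layout, the fake data, and the characteristic names are assumptions for
illustration only):

# Sketch: per-item mean across raters, listed in decreasing order.
import numpy as np

rng = np.random.default_rng(1)
ratings = rng.integers(0, 2, size=(4, 50))       # 4 raters x 50 items, 0/1
items = [f"characteristic_{i + 1}" for i in range(ratings.shape[1])]

means = ratings.mean(axis=0)                     # 1.0 = agreed present, 0.0 = agreed absent
for idx in np.argsort(-means)[:10]:              # items in decreasing order (top 10 shown)
    print(f"{items[idx]:<20s} mean = {means[idx]:.2f}")

print("unanimous Yes:", int(np.sum(means == 1.0)))
print("unanimous No: ", int(np.sum(means == 0.0)))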

It seems unwarranted and not useful to compute Fleiss's Kappa between
pairs of raters....  "Between a pair" is how the original Kappa is used, and
how I prefer to use it.  I don't gain much useful insight from knowing that
they merely "agree," without the direction. 

--
Rich Ulrich

> Date: Fri, 23 Jan 2015 19:13:31 -0700

> From: [hidden email]
> Subject: Re: Fleiss Kappa
> To: [hidden email]
>
> Here is the text of the original...
> "I'd like feedback and suggestions on my intended use of Fleiss' Kappa to
> assess interrater agreement in a job analysis study that we are doing for a
> new job that has no existing incumbents.
>
> In this job analysis we are collecting interview data from about 40 subject
> matter experts. Our interviews are essentially detailed discussions of
> working conditions/environment, work tasks, and requisite KSAOs (knowledge,
> skills, abilities, and other characteristics) that are important for
> employees in this particular job. We have a list of about 50 personal
> characteristics (e.g., persistence, trustworthiness, creative thinking
> ability) that our literature review suggests are likely to be related to the
> particular job we are analyzing. We intend to have 3 or 4 raters read all
> interviews and rate each on the presence or absence of all 50 job-related
> factors. Since our ratings are categorical (yes/no), it appears that
> Fleiss' Kappa is the proper interrater agreement statistic, but all of the
> illustrations that I have seen of the use of this statistic are for a
> single assignment-to-category decision, not for multiple such assignments.
> This suggests that we will have to calculate a Fleiss Kappa for each of our
> 50 personal characteristics and then combine them (a simple mean?) to obtain
> an indication of overall interrater agreement.
>
> My questions are: (1) Is this approach of calculating 50 separate Fleiss
> Kappas and then averaging them the best approach? (2) Is there a way
> (existing SPSS tool or Excel spreadsheet) that allows all calculations to be
> done in one effort, or do we have to repeat the calculation 50 times? (3)
> Just to help settle my theoretical ruminations: If two raters do not see a
> given personal characteristic in a transcript, is this agreement as
> meaningful as if two raters do see a given personal characteristic?
> Intuitively, it seems that a positive affirmation of the presence of a
> personal characteristic is more meaningful to the aims of the study because
> absence of mention doesn't necessarily mean that the characteristic is not
> important.
>
> Thanks in advance for your thoughts."
> ------
> My suggestion was a search of this group, because this has been discussed
> in some detail many times in the past; see Brian Dates' posts in particular.
> It occurs to me also that there is an EXTENSION command for this (IIRC)
>
>
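
For question (1) in the quoted post, a minimal sketch of computing one
Fleiss kappa per characteristic and then taking a simple mean.  This is
Python, not the SPSS EXTENSION command mentioned above, and the
interviews-by-raters-by-characteristics array is an invented stand-in for
the real data:

# Sketch: one Fleiss kappa per characteristic, then a simple mean.
import numpy as np

def fleiss_kappa(counts):
    # Fleiss' kappa from an (n_subjects x n_categories) matrix of rater counts.
    counts = np.asarray(counts, dtype=float)
    n = counts.sum(axis=1)[0]                            # raters per subject (assumed equal)
    P_i = (np.sum(counts**2, axis=1) - n) / (n * (n - 1))  # per-subject agreement
    P_bar = P_i.mean()
    p_j = counts.sum(axis=0) / counts.sum()              # overall category proportions
    P_e = np.sum(p_j**2)                                 # chance agreement
    if P_e == 1.0:
        return None                                      # incomputable (all one category)
    return (P_bar - P_e) / (1 - P_e)

rng = np.random.default_rng(2)
data = rng.integers(0, 2, size=(40, 4, 50))   # interviews x raters x characteristics, 0/1

kappas = []
for c in range(data.shape[2]):
    yes = data[:, :, c].sum(axis=1)           # Yes-votes per interview
    counts = np.column_stack([4 - yes, yes])  # [No-count, Yes-count] per interview
    kappas.append(fleiss_kappa(counts))

valid = [k for k in kappas if k is not None]
print("mean Fleiss kappa over the 50 characteristics:", round(np.mean(valid), 3))

As noted above, though, an average of 50 kappas still tells you only that
the raters "agree," without the direction of the agreement.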
