David (and other Listserv readers),
I apologize for the uninformative subject line and quoting the whole digest! Responding to a digest email is clearly not the best way to post to the listserv. Thanks for the tip. I'll pursue it.
Here is the text of the original...

"I'd like feedback and suggestions on my intended use of Fleiss' Kappa to assess interrater agreement in a job analysis study that we are doing for a new job that has no existing incumbents.

In this job analysis we are collecting interview data from about 40 subject matter experts. Our interviews are essentially detailed discussions of working conditions/environment, work tasks, and requisite KSAOs (knowledge, skills, abilities, and other characteristics) that are important for employees in this particular job. We have a list of about 50 personal characteristics (e.g., persistence, trustworthiness, creative thinking ability) that our literature review suggests are likely to be related to the particular job we are analyzing. We intend to have 3 or 4 raters read all interviews and rate each on the presence or absence of all 50 job-related factors. Since our ratings are categorical (yes/no), it appears that Fleiss' Kappa is the proper interrater agreement statistic, but all of the illustrations I have seen of the use of this statistic are for a single assignment-to-category decision, not for multiple such assignments. This suggests that we will have to calculate a Fleiss Kappa for each of our 50 personal characteristics and then combine them (a simple mean?) to obtain an indication of overall interrater agreement.

My questions are: (1) Is this approach of calculating 50 separate Fleiss Kappas and then averaging them the best approach? (2) Is there a way (an existing SPSS tool or Excel spreadsheet) that allows all calculations to be done in one effort, or do we have to repeat the calculation 50 times? (3) Just to help settle my theoretical ruminations: if two raters do not see a given personal characteristic in a transcript, is this agreement as meaningful as when two raters do see it? Intuitively, it seems that a positive affirmation of the presence of a personal characteristic is more meaningful to the aims of the study, because absence of mention doesn't necessarily mean that the characteristic is not important.

Thanks in advance for your thoughts."
------
My suggestion was a search of this group, because this has been discussed in some detail many times in the past -- see Brian Dates' posts in particular. It also occurs to me that there is an EXTENSION command for this (IIRC).
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"
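For what it's worth, if the calculations end up being done outside SPSS rather than through that extension command, the Fleiss kappa arithmetic for a single yes/no characteristic across the ~40 interviews is short enough to script. Here is a minimal Python sketch; the function name, the yes_counts dictionary, and the toy numbers are all illustrative assumptions, not part of the study or of any SPSS tool:

    import numpy as np

    def fleiss_kappa(counts):
        # counts: N x k matrix; counts[i, j] = number of raters who put
        # interview i into category j.  Every row must sum to the same
        # number of raters n.
        counts = np.asarray(counts, dtype=float)
        N = counts.shape[0]
        n = counts.sum(axis=1)[0]                  # raters per interview
        p_j = counts.sum(axis=0) / (N * n)         # overall proportion per category
        P_i = (np.square(counts).sum(axis=1) - n) / (n * (n - 1))
        P_bar, P_e = P_i.mean(), np.square(p_j).sum()
        return (P_bar - P_e) / (1 - P_e)

    # One kappa per characteristic.  yes_counts is a made-up example:
    # for each characteristic, how many raters marked it "present" in
    # each of the interviews.
    n_raters = 4
    yes_counts = {"persistence": [4, 3, 0, 4], "creativity": [1, 0, 0, 2]}
    for name, yes in yes_counts.items():
        yes = np.asarray(yes)
        counts = np.column_stack([n_raters - yes, yes])   # columns: [absent, present]
        print(name, round(fleiss_kappa(counts), 3))

Looping over the 50 characteristics this way disposes of question (2) mechanically; whether averaging the 50 kappas into one number is meaningful is a separate matter, which the replies below take up.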
Here are my reactions to the task --
It seems to me that you get everything that you want about an item when you look at the mean of the 3 or 4 raters: "0.0" says that they agreed on absence, and "1.0" says that they agreed on presence -- which do not mean the same thing. Once you have listed the items in decreasing order, what is there to add? Count how many are 0 and how many are 1?

It seems unwarranted and not useful to compute Fleiss's Kappa between pairs of raters... "Between a pair" is how the original Kappa is used, and how I prefer to use it. I don't gain much useful insight from knowing that they merely "agree," without the direction.

-- Rich Ulrich
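To see what that per-item summary looks like in practice, here is a minimal pandas sketch for one interview; the frame, the column names, and the 0/1 values are invented for illustration:

    import pandas as pd

    # One interview; rows = the 3-4 raters, columns = characteristics,
    # values = 0/1 ("present"/"absent").
    ratings = pd.DataFrame({
        "persistence":     [1, 1, 1, 1],
        "trustworthiness": [1, 0, 1, 1],
        "creativity":      [0, 0, 0, 0],
    })

    item_means = ratings.mean()   # 1.0 = all raters saw it, 0.0 = none did
    print(item_means.sort_values(ascending=False))

Sorting the means in decreasing order gives the listing described above, with unanimous "present" at the top and unanimous "absent" at the bottom.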
In reply to this post by David Marso
I'd like to offer assistance, but I'm unclear on the final data structure. That you will have 3 or 4 raters is clear. Who are the subjects/what are the items? How many will you have? Are they the "...interview data from about 40 subject matter experts"?

By all means do not do 50 separate analyses, and definitely do not average anything. That fits Light's kappa very loosely, but certainly not Fleiss. One option, if you want the overall score, is to have 50 lines for each of the 40 experts, for a total of approximately 2,000 lines. That, however, will not allow you to examine agreement per expert or per characteristic. An option that fits Fleiss' model is to have 40 lines with 50 category options for agreement. The problem is that an infinitesimal agreement would be significant, and there'd be no practical application. Having more options than items is not recommended.

I wonder if you might not reduce the number of categories first. From a practical perspective, are persons applying for this position (or positions) going to be rated on 50 separate characteristics? Wouldn't it be more practical to examine the 50 characteristics for commonality? If, say, 15 or so summary categories emerged, the results would be more meaningful, and you could have the raters assign those categories to each of the 40 items. Far from perfect, but better.
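To make the roughly 2,000-line layout concrete, here is a rough pandas sketch; every frame and column name below is an invented illustration, and the data are toy values:

    import pandas as pd

    # One row per (interview, characteristic), one 0/1 column per rater.
    wide = pd.DataFrame({
        "interview":      [1, 1, 2, 2],
        "characteristic": ["persistence", "creativity", "persistence", "creativity"],
        "rater1":         [1, 0, 1, 1],
        "rater2":         [1, 0, 0, 1],
        "rater3":         [1, 1, 1, 1],
    })

    # Overall layout: every interview x characteristic pair is one "subject"
    # (about 40 x 50 = 2,000 rows), all fed to a single kappa.
    stacked = wide.set_index(["interview", "characteristic"])
    print(len(stacked), "rows would go into one overall kappa")

    # Keeping the breakdown instead: look at each characteristic separately,
    # e.g. the share of interviews on which the raters were unanimous.
    raters = ["rater1", "rater2", "rater3"]
    unanimous = wide.groupby("characteristic")[raters].apply(
        lambda g: (g.nunique(axis=1) == 1).mean())
    print(unanimous)

The stacked version yields the single overall figure; the grouped version keeps the per-characteristic detail that the stacked version gives up.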
Brian
In reply to this post by Rich Ulrich
Here are my second reactions to the task --
The purpose here is NOT (as I see it) to develop a rating scale to use for the discussions of experts; you are concerned with the 0/1 content of their discussions. The kappas would not address that.
- The intention (as I see it) is to use the best items in some further form, as some sort of checklist for future job applicants.

On the other hand, you are handing out a fairly tedious task, it seems, to a set of recruited raters. Whether the 40 separate interviews to be rated are recorded or on paper, I imagine that the raters are being asked to do them, essentially, at one sitting. If that is the case, then it *might* be of some interest to look at the pair-wise kappas in order to detect whether one rater is unusually "random" because of boredom and inattention.
- If you are sure that they are always well-motivated and capable of doing these ratings, then this step has nothing to show you. A full set of small kappas for one rater would imply that it might be best to ignore that rater.

Keep in mind that there will be a number of small kappas between raters (or "incomputable" ones) whenever the agreement tends to be in one direction; you only get large kappas when there are agreements on both Yes and No. As before, the multi-rater Kappa has little to show you.

-- Rich Ulrich
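A rough sketch of that pair-wise screen, using scikit-learn's cohen_kappa_score; the ratings dictionary and the 0/1 values are invented, with rater "C" playing the inattentive one:

    from itertools import combinations
    from sklearn.metrics import cohen_kappa_score

    # Flat 0/1 calls per rater, in the same (interview x characteristic)
    # order for everyone.
    ratings = {
        "A": [1, 0, 1, 1, 0, 1, 0, 1],
        "B": [1, 0, 1, 1, 0, 1, 1, 1],
        "C": [0, 1, 1, 0, 1, 0, 0, 1],
    }

    for r1, r2 in combinations(ratings, 2):
        print(r1, "vs", r2, round(cohen_kappa_score(ratings[r1], ratings[r2]), 2))

A rater whose kappas with everyone else are uniformly low stands out at once. And as noted above, kappa can be near zero or undefined even with high raw agreement when nearly all the calls fall in one category, so small values need that caveat before anyone is blamed for inattention.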
In reply to this post by Rich Ulrich
A very useful text for this is Streiner, D. L., & Norman, G. R., "Health Measurement Scales: A Practical Guide to Their Development and Use," 2nd ed., Oxford Medical Publications, 1995. You can obtain a cheap copy if you visit AbeBooks. You might also find that the Google Group "MedStats" serves you well with regard to this discussion.

Kind Regards,
Martin

Martin P. Holt
[hidden email]
Persistence and Determination Alone are Omnipotent!
"If you can't explain it simply, you don't understand it well enough." -- Einstein
LinkedIn: https://www.linkedin.com/profile/edit?trk=nav_responsive_sub_nav_edit_profile