SPSSX Discussion - Re: SPSS Python Extension for Fleiss Kappa

Posted by Bruce Weaver on Oct 11, 2017; 6:46pm
URL: http://spssx-discussion.165.s1.nabble.com/SPSS-Python-Extension-for-Fleiss-Kappa-tp5734963p5734998.html

Brian, when you've finished sorting it all out, I hope you'll post a summary.
(It might be cleaner to start a new thread, with a link to this one for
anyone who wants the history.) It would be great if you could upload the
data & output to Nabble again too.

Meanwhile, I've added Gwet's 2014 book (http://www.agreestat.com/book4/) to
my Amazon.ca wish list.

Cheers,
Bruce

bdates wrote
> Thanks, Jon. I have all those works. I'm in the process of making changes
> in my syntax and have alerted Kilem Gwet to check his syntax for Fleiss'
> kappa to make sure he also has those updates.
>
> Take care.
>
> Brian
> ________________________________________
> From: Jon Peck [jkpeck@]
> Sent: Tuesday, October 10, 2017 4:28 PM
> To: Dates, Brian; SPSSX-L@.uga
> Subject: Re: [SPSSX-L] SPSS Python Extension for Fleiss Kappa
>
> For the SPSS extension...
>
> From the 2003 text, the formula for the standard error for each category
> used in the extension is formula 18.53 on page 616. This is based on:
>
> Fleiss, J. L., J. C. M. Nee, and J. R. Landis. (1979). Large sample
> variance of kappa in the case of different sets of raters. Psychological
> Bulletin 86: 974–977.
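A minimal Python sketch may make the "one standard error for all categories" point concrete. It assumes a subjects-by-categories table of counts with the same number of raters m for every subject, and it uses the large-sample null standard error sqrt(2 / (n m (m - 1))) usually attributed to Fleiss, Nee, and Landis (1979). The function name and the toy data are hypothetical, and this is not the extension's actual code.

```python
import math


def fleiss_category_kappas(counts):
    """Per-category Fleiss kappas plus the common null standard error.

    counts[i][j] is the number of raters who put subject i in category j.
    Assumption: every subject is rated by the same number of raters m >= 2.
    """
    n = len(counts)
    m = sum(counts[0])
    if m < 2 or any(sum(row) != m for row in counts):
        raise ValueError("need the same number (>= 2) of ratings per subject")
    kappas = []
    for j in range(len(counts[0])):
        p_j = sum(row[j] for row in counts) / (n * m)
        q_j = 1.0 - p_j
        if p_j * q_j == 0.0:
            # Category never used (or always used): kappa is undefined.
            kappas.append(float("nan"))
            continue
        disagreement = sum(row[j] * (m - row[j]) for row in counts)
        kappas.append(1.0 - disagreement / (n * m * (m - 1) * p_j * q_j))
    # The null SE depends only on n and m, so it is identical for all categories.
    se0 = math.sqrt(2.0 / (n * m * (m - 1)))
    return kappas, se0


# Hypothetical toy data: 4 subjects, 3 raters, 2 categories.
toy = [[2, 1], [1, 2], [3, 0], [0, 3]]
kappas, se0 = fleiss_category_kappas(toy)
print(kappas, se0)
```

Because the standard error in this sketch involves only n and m, a column of identical category standard errors is what this formula would be expected to produce.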
>
> On Tue, Oct 10, 2017 at 1:52 PM Dates, Brian <BDATES@> wrote:
> Thank you, Bruce, and please thank Daniel Klein as well. First, let me
> clarify. The overall kappas, standard errors, etc. produced by each
> solution are the same. The category kappas produced by each solution
> are also the same. What differs is the category standard errors. My
> syntax and the SAS syntax written by Gwet (INTER_RATER.MAC) produce the
> same standard errors, which differ from those produced by the SPSS and
> Stata solutions. Fleiss, Nee, and Landis (1979) updated the formula for
> the standard error of the overall kappa. It seems that all the solutions
> have adopted that. There was also an update to the category standard
> error, which produces one standard error for all categories rather than
> a separate error for each category. The SPSS Python and Stata solutions
> have incorporated this. I need to update my syntax, and also to contact
> the authors of other syntax about this issue.
>
> Again, thank you very much for your facilitation in this situation. It
> has truly made things go much quicker than otherwise.
>
> Brian
>
> ________________________________________
> From: SPSSX(r) Discussion [SPSSX-L@.UGA] on behalf of Bruce Weaver [bruce.weaver@]
> Sent: Tuesday, October 10, 2017 3:02 PM
> To: SPSSX-L@.UGA
> Subject: Re: SPSS Python Extension for Fleiss Kappa
>
> Hi Brian. I also sent Daniel Klein an e-mail alerting him to this thread.
> He does not use SPSS, and is probably not going to join this list himself.
> But he did ask me to post the following comments he sent me.
>
> Cheers,
> Bruce
> -------------------------------------
>
> First, note that Gwet (2008, 2014) uses an approach to statistical
> inference that is quite different from anything that has been proposed so
> far in the area of inter-rater agreement. He calls what is usually done
> the "model-based" approach. Here, a hypothetical distribution of the
> kappa coefficient under H0 is used to derive test statistics for testing
> against 0. Such approaches are not necessarily valid for confidence
> interval construction (also see Reichenheim 2004). So I suppose there
> might be a reason why StataCorp decided not to return a standard error
> and confidence interval in their -kap- command. What Gwet (2008, 2014)
> suggests instead is a "design-based" approach that is based on finite
> sample theory and that is implemented in -kappaetc-. This explains the
> difference between the standard error that Brian obtains for combined
> kappa (0.05770; which is what you confirm using the -kap- command) and
> the SE reported by Gwet's SAS macro (0.08132) and -kappaetc- (0.08223).
> To say more about the (small) difference between the latter two, I would
> need to know which macro was used to produce the results. Brian states
> that he used INTER_RATER.MAC but states neither the software version nor
> where it is from.
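To illustrate what a "model-based" test against 0 looks like for the combined coefficient, here is a rough Python sketch. It assumes a constant number of raters per subject and uses the large-sample null standard error generally attributed to Fleiss, Nee, and Landis (1979), together with a one-sided test on the standard normal distribution. The function name and data are hypothetical, and it is not claimed to reproduce -kap-, the SAS macro, or -kappaetc-.

```python
import math


def fleiss_kappa_model_based(counts):
    """Overall Fleiss kappa with a model-based null SE and one-sided z test.

    counts[i][j] is the number of raters who put subject i in category j.
    Assumption: every subject is rated by the same number of raters m >= 2.
    """
    n = len(counts)
    m = sum(counts[0])
    if m < 2 or any(sum(row) != m for row in counts):
        raise ValueError("need the same number (>= 2) of ratings per subject")
    k = len(counts[0])
    p = [sum(row[j] for row in counts) / (n * m) for j in range(k)]
    sum_pq = sum(pj * (1.0 - pj) for pj in p)  # equals 1 - expected agreement
    if sum_pq == 0.0:
        raise ValueError("all ratings fall in one category; kappa is undefined")
    p_obs = sum(x * (x - 1) for row in counts for x in row) / (n * m * (m - 1))
    p_exp = sum(pj * pj for pj in p)
    kappa = (p_obs - p_exp) / (1.0 - p_exp)
    # Null SE of the overall kappa; max() guards against tiny negative rounding.
    inner = sum_pq ** 2 - sum(pj * (1.0 - pj) * (1.0 - 2.0 * pj) for pj in p)
    se0 = math.sqrt(2.0 * max(inner, 0.0)) / (sum_pq * math.sqrt(n * m * (m - 1)))
    z = kappa / se0 if se0 > 0.0 else float("nan")
    p_one_sided = 0.5 * math.erfc(z / math.sqrt(2.0))  # upper-tail normal prob.
    return kappa, se0, z, p_one_sided


# Hypothetical toy data: 4 subjects, 3 raters, 2 categories.
print(fleiss_kappa_model_based([[2, 1], [1, 2], [3, 0], [0, 3]]))
```

The standard error here is derived under the null hypothesis of no agreement beyond chance, which is why it serves a test against 0 and is not necessarily appropriate for building a confidence interval around the estimate.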
>
> [BW: I believe the loop Daniel mentions in the next paragraph refers to
> the Stata output I uploaded here:
> http://spssx-discussion.1045642.n5.nabble.com/file/t7186/Fleiss_kappa_problem.txt.]
>
> When you program the loop to get kappa for each category, note that
> -kappaetc- does not(!) report the same standard error every time. It
> differs for each category. It is the SE you obtain from -kap- that is
> constant across categories. As I have said, I do not know whether this SE
> is trustworthy for confidence interval construction, but it seems the
> Python extension uses the very same approach. Note that it is documented
> in [R] kappa. However, the SEs reported by -kappaetc- do not match
> Brian's or those from the SAS macro. I have no explanation for that. All
> I can tell you is that -kappaetc- does nothing special here, i.e., it
> retains Gwet's "design-based" approach. It seems the SAS macro and Brian
> have implemented another formula to get the SE for each category
> separately (and perhaps there is a reason for this). It seems a bit odd
> to me that the SEs are so much higher than the combined one, given that
> they are all based on the same number of subjects. Also, I find it
> somewhat counterintuitive to clearly reject the null for the combined
> coefficient when only one of the category-wise coefficients is
> statistically significantly different from zero.
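One plausible way to program such a loop (an assumption, since the thread does not show the actual code) is to collapse the ratings to "category j versus everything else" and compute Fleiss' kappa on each two-category table. The Python sketch below uses hypothetical function names and toy data, and assumes a constant number of raters per subject.

```python
def fleiss_kappa(counts):
    """Overall Fleiss kappa for a subjects-by-categories table of counts.

    Assumption: every subject is rated by the same number of raters m >= 2.
    """
    n = len(counts)
    m = sum(counts[0])
    if m < 2 or any(sum(row) != m for row in counts):
        raise ValueError("need the same number (>= 2) of ratings per subject")
    k = len(counts[0])
    p = [sum(row[j] for row in counts) / (n * m) for j in range(k)]
    p_obs = sum(x * (x - 1) for row in counts for x in row) / (n * m * (m - 1))
    p_exp = sum(pj * pj for pj in p)
    if p_exp >= 1.0:
        return float("nan")  # only one category in use: kappa is undefined
    return (p_obs - p_exp) / (1.0 - p_exp)


def category_kappas_by_collapsing(counts):
    """Kappa for each category, treating it as 'this category vs. the rest'."""
    m = sum(counts[0])
    return [
        fleiss_kappa([[row[j], m - row[j]] for row in counts])
        for j in range(len(counts[0]))
    ]


# Hypothetical toy data: 6 subjects, 3 raters, 3 categories.
toy = [[3, 0, 0], [2, 1, 0], [0, 3, 0], [0, 1, 2], [0, 0, 3], [1, 1, 1]]
print(category_kappas_by_collapsing(toy))
```

Each collapsed table has the same number of subjects and raters, so any standard error that depends only on those two quantities will come out identical for every pass through the loop.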
>
> I notice that the help file for -kappaetc- might be misleading or
> incomplete concerning differences from Stata's -kap- and -kappa-. Here is
> the already revised paragraph that will be included in an updated version
> of -kappaetc- (hopefully towards the end of the year).
>
> --- Start of excerpt from help for kappaetc ----
>
> Relation to official Stata's kap and kappa commands
>
> The percent agreement that is reported by kappaetc is the same as the
> observed agreement returned by Stata's kap command. The latter, however,
> does not provide a standard error or confidence intervals for the
> coefficient.
>
> For two unique raters and no missing ratings, both Stata's kap and the
> kappaetc command estimate the same Cohen's kappa coefficient. The
> standard error and p-value will differ between the two results, though.
> Stata's kap command implements a model-based (theoretical) approximate
> formula for the standard error and reports a one-sided test using the
> standard normal distribution. kappaetc implements a design-based formula
> to obtain the standard error and reports a two-sided t test with n-1
> degrees of freedom. kappaetc will, additionally, provide a confidence
> interval.
>
> For more than two (nonunique) raters and no missing ratings (i.e., a
> constant number of raters), both Stata's kap (or kappa) command and
> kappaetc (possibly with the frequency option) estimate the same Fleiss'
> kappa coefficient. kappaetc will, however, not report a kappa value for
> each rating category. Standard errors and p-values will differ between
> the commands for the same reasons explained in the two unique raters case
> above.
>
> --- End of excerpt from help for kappaetc ----
>
> A last minor comment on category 4 not being used by either rater. In
> such a case it is useful to specify all possible rating categories with
> the -categories()- option of -kappaetc-. Failing to do so results in
> wrong estimates for Gwet's AC and the Brennan and Prediger coefficient,
> both of which depend on the number of rating categories. Note that
> weighted kappa might also be off.
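A small Python sketch shows why the declared number of categories matters. It uses the two-rater form of the Brennan and Prediger coefficient, in which chance agreement is taken to be 1/q for q possible categories. The function name and the ratings are hypothetical and are not the data from this thread.

```python
def brennan_prediger(ratings_a, ratings_b, n_categories):
    """Two-rater Brennan-Prediger coefficient with chance agreement 1/q."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty rating lists of equal length")
    if n_categories < 2:
        raise ValueError("need at least two possible categories")
    # Observed proportion of subjects on which the two raters agree.
    p_a = sum(a == b for a, b in zip(ratings_a, ratings_b)) / len(ratings_a)
    p_e = 1.0 / n_categories
    return (p_a - p_e) / (1.0 - p_e)


# Hypothetical ratings on a 4-point scale where category 4 is never used.
rater_1 = [1, 2, 3, 1, 2, 3, 1, 2]
rater_2 = [1, 2, 3, 1, 3, 2, 1, 2]

observed_only = brennan_prediger(rater_1, rater_2, n_categories=3)
all_possible = brennan_prediger(rater_1, rater_2, n_categories=4)
print(observed_only, all_possible)
```

With these toy ratings the observed agreement is 0.75 in both calls, yet the coefficient changes from 0.625 to about 0.667 once the unused fourth category is counted.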
>
> Best
> Daniel
>
>
> Gwet, K. L. (2014). Handbook of Inter-Rater Reliability. Gaithersburg,
> MD: Advanced Analytics, LLC.
>
> Gwet, K. L. (2008). Computing inter-rater reliability and its variance
> in the presence of high agreement. British Journal of Mathematical and
> Statistical Psychology, 61, 29-48.
>
> Reichenheim, M. E. (2004). Confidence intervals for the kappa
> statistic. The Stata Journal, 4, 421-428.
>
>
>
> bdates wrote
>> Thanks, Bruce. I've sent an email to him. He mentions Gwet in his
>> acknowledgments, so it'll be interesting what Gwet says about all of
>> this.
>>
>> Brian
>>
>> ________________________________________
>> From: SPSSX(r) Discussion [SPSSX-L@.UGA] on behalf of Bruce Weaver [bruce.weaver@]
>> Sent: Monday, October 09, 2017 4:45 PM
>> To: SPSSX-L@.UGA
>> Subject: Re: SPSS Python Extension for Fleiss Kappa
>>
>> Hi Brian. The author of kappaetc can be reached via the e-mail address
>> at the bottom of that text file I uploaded. You could always ask him
>> directly what method(s) he used.
>>
>> Cheers,
>> Bruce
>>
>>
>>
>> bdates wrote
>>> Bruce,
>>>
>>> Thanks for this. I'm wondering what formulae they're using for category
>>> SEs. I'm going to check with other sources and get back to you. This is
>>> certainly a conundrum. We have multiple results from multiple sources. I
>>> didn't send this along, but I also used the application from
>>> RealStatistics in Excel and got the same results as my syntax and
>>> Gwet's SAS solution. I'm using the updated 1979 computation for the
>>> kappas and standard errors. Maybe they're using the 1971 version.
>>>
>>> Brian
>>>
>>>
>>> ________________________________________
>>> From: SPSSX(r) Discussion [SPSSX-L@.UGA] on behalf of Bruce Weaver [bruce.weaver@]
>>> Sent: Monday, October 09, 2017 2:23 PM
>>> To: SPSSX-L@.UGA
>>> Subject: Re: SPSS Python Extension for Fleiss Kappa
>>>
>>> Thanks Brian. I don't know if this will be helpful to you or not, but
>>> I've uploaded (in Nabble) a text file containing results from some
>>> analyses carried out using kappaetc, a user-written program for Stata.
>>> Unfortunately, kappaetc does not report a kappa for each category
>>> separately. But with a little programming, I was able to obtain those.
>>> But the way I did it yielded the same SE for every category, and it
>>> matched the value shown by the Python extension command for SPSS.
>>>
>>> Below my output, I appended the help file info for kappaetc (including a
>>> reference list), in case it is helpful to you.
>>>
>>> Cheers,
>>> Bruce
>>>
>>> Fleiss_kappa_problem.txt
>>> <http://spssx-discussion.1045642.n5.nabble.com/file/t7186/Fleiss_kappa_problem.txt>
>>>
>>>
>>> bdates wrote
>>>> I'm uploading a Word document with the results of my syntax, Kilem
>>>> Gwet's SAS macro, and the SPSS Python extension, in that order. I'm
>>>> also uploading an SPSS data file with the raw data. Notice the Standard
>>>> Error column in the Python extension solution. All values are the same.
>>>>
>>>> Brian
>>>>
>>>> Fleiss_kappa_Discrepancies.docx
>>>> <http://spssx-discussion.1045642.n5.nabble.com/file/t47633/Fleiss_kappa_Discrepancies.docx>
>>>> irr_test_data.sav
>>>> <http://spssx-discussion.1045642.n5.nabble.com/file/t47633/irr_test_data.sav>
>>>>
>>>>
>>>>
>>>> --
>>>> Sent from: http://spssx-discussion.1045642.n5.nabble.com/
>>>>
>>>> =====================
>>>> To manage your subscription to SPSSX-L, send a message to
>>>> LISTSERV@.UGA (not to SPSSX-L), with no body text except the
>>>> command. To leave the list, send the command
>>>> SIGNOFF SPSSX-L
>>>> For a list of commands to manage subscriptions, send the command
>>>> INFO REFCARD
>>>
>>> -----
>>> --
>>> Bruce Weaver
>>> bweaver@
>>> http://sites.google.com/a/lakeheadu.ca/bweaver/
>>>
>>> "When all else fails, RTFM."
>>>
>>> NOTE: My Hotmail account is not monitored regularly.
>>> To send me an e-mail, please use the address shown above.
>>>
>>> --
>>> Sent from: http://spssx-discussion.1045642.n5.nabble.com/
>>>
> --
> Jon K Peck
> jkpeck@

--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING:
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).