Fleiss’ kappa was designed for nominal data. If your data are ordinal, interval, or ratio, use the ICC or another procedure for continuous data: ICC analyses have been shown to be equivalent to weighted kappa (Fleiss and Cohen, 1973). The syntax that Max refers to looks like the most promising alternative, as long as you know which model you have. If you need Fleiss’ kappa syntax because you have nominal data, I can send it to you offline.
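For reference, a minimal sketch of the RELIABILITY procedure's ICC subcommand, assuming the ratings have been arranged with one row per patient-shift and one column per rater (rater1 to rater4 are placeholder names), and a one-way random model since different raters rate different patients:

* One-way random-effects ICC with a 95% confidence interval.
* rater1 TO rater4 are hypothetical columns of total ratings.
RELIABILITY
  /VARIABLES=rater1 rater2 rater3 rater4
  /SCALE('Total rating') ALL
  /MODEL=ALPHA
  /ICC=MODEL(ONEWAY) CIN=95 TESTVAL=0.

If the same set of raters rated every patient, MODEL(RANDOM) or MODEL(MIXED) with TYPE(ABSOLUTE) or TYPE(CONSISTENCY) would be the alternatives; that is the model choice I mentioned above.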
Brian
From: SPSSX(r) Discussion [mailto:
Sent: Friday, June 13, 2014 11:26 PM
To:
Subject: Re: inter-rater reliability with multiple raters
Check these out; they may help:
ftp://ftp.boulder.ibm.com/software/analytics/spss/support/Stats/Docs/Statistics/Macros/Iccsf.htm
Hi everyone! I need help with a research assignment. I'm new to IBM SPSS Statistics, and actually to statistics in general, so I'm pretty overwhelmed.
My coworkers and I created a new observation scale to improve the concise transfer of information between nurses and other psychiatric staff. This scale is designed to facilitate clinical care and outcomes related research.
Nurses and other staff members on our particular inpatient unit will use standard clinical observations to rate patient behaviors in eight categories:
1. abnormal motor activity
2. activities of daily living
3. bizarre/disorganized behavior
4. medication adherence
5. aggression
6. observation status
7. participation in assessment
8. quality of social interactions
Each category will be given a score of 0-4, and those ratings will be summed to create a total rating. At least two nurses will rate each patient during each shift, morning and evening (so one patient should theoretically have at least four ratings per day).
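If it helps, I think the total score would be computed in SPSS with something like this (the category variable names are just placeholders, since I'm still setting up the file):

* Sum the eight 0-4 category ratings into a 0-32 total score.
* SUM.8 returns missing unless all eight categories were rated.
COMPUTE total = SUM.8(motor, adl, bizarre, medadh, aggress, obsstat, assess, social).
EXECUTE.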
My assignment is to examine the reliability
and validity of this new scale, and determine its utility for transfer of information.
Right now I'm trying to figure out how to examine inter-rater reliability. IBM SPSS doesn't have a built-in procedure to calculate Fleiss' kappa (that I know of), and I'm not sure that's what I should be calculating anyway. I'm confused because there are multiple raters, multiple patients, and multiple dates/times/shifts. The raters differ from day to day, even on the same patient's chart, so there is a real lack of consistency in the data. Sometimes only one rating is done on a shift, and sometimes the nurses skip a shift of rating altogether. Also, lengths of stay differ, so the amount of data collected for each patient varies dramatically.
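From what I can tell, the data might first need restructuring from one row per rating to one row per patient-shift, with a column per rater; maybe something like this CASESTOVARS sketch (variable names made up)? I'm not sure this is right:

* Long to wide: one row per patient-shift, ratings spread across columns.
* patient, obs_date, shift, and total are hypothetical variable names.
* Without /INDEX, ratings are numbered within each group (total.1, total.2, ...).
SORT CASES BY patient obs_date shift.
CASESTOVARS
  /ID=patient obs_date shift.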
I've attached a screenshot of part of our de-identified data. Can anyone please help me figure out how to determine inter-rater reliability? (Or if anyone has any insight into how to determine validity, that'd be great too!)
Thanks so much!