Re: inter-rater reliability with multiple raters
Posted by Jon K Peck on Jun 16, 2014; 2:41pm
URL: http://spssx-discussion.165.s1.nabble.com/inter-rater-reliability-with-multiple-raters-tp5726465p5726483.html
"IBM SPSS doesn't have a program
to calculate Fleiss
kappa (that I know of) "
See the STATS FLEISS KAPPA custom dialog. It requires the Python Essentials and can be downloaded from the Utilities menu in Statistics 22 or from the Extension Commands collection of the SPSS Community website (www.ibm.com/developerworks/spssdevcentral).

It provides an overall estimate of kappa, along with its asymptotic standard error, Z statistic, significance (p value) under the null hypothesis of chance agreement, and a confidence interval for kappa. (Standard errors are based on Fleiss et al., 1979 and Fleiss et al., 2003; the test statistic is based on Fleiss et al., 2003.) It also provides these statistics for the individual categories, as well as the conditional probabilities for categories, which according to Fleiss (1971, p. 381) are the probabilities of a second object being assigned to a category given that the first object was assigned to that category.
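If you want to see what the overall estimate involves (or to check the extension's output by hand), the point estimate is easy to compute yourself. Here is a minimal Python sketch, assuming a subjects-by-categories count matrix with the same number of raters for every subject; the function and variable names are mine, not part of the extension command, and it omits the standard errors and per-category statistics the extension reports:

    import numpy as np

    def fleiss_kappa(counts):
        # counts[i, j] = number of raters who put subject i in category j;
        # every row must sum to the same number of raters n.
        counts = np.asarray(counts, dtype=float)
        N = counts.shape[0]              # number of subjects
        n = counts[0].sum()              # raters per subject (assumed constant)
        # Observed agreement: mean proportion of agreeing rater pairs per subject
        P_bar = ((np.square(counts).sum(axis=1) - n) / (n * (n - 1))).mean()
        # Chance agreement from the marginal category proportions
        p_j = counts.sum(axis=0) / (N * n)
        P_e = np.square(p_j).sum()
        return (P_bar - P_e) / (1 - P_e)

    # Example: 4 subjects, 3 raters, 3 categories
    print(fleiss_kappa([[3, 0, 0],
                        [0, 3, 0],
                        [1, 1, 1],
                        [2, 1, 0]]))     # ~0.27

Note the constant-raters-per-subject requirement; it matters for the design described below, where the number of ratings per patient varies.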
Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
phone: 720-342-5621
From: Max Jasper <[hidden email]>
To: [hidden email]
Date: 06/16/2014 08:33 AM
Subject: Re: [SPSSX-L] inter-rater reliability with multiple raters
Sent by: "SPSSX(r) Discussion" <[hidden email]>
Check these out; they may help:
ftp://ftp.boulder.ibm.com/software/analytics/spss/support/Stats/Docs/Statistics/Macros/Iccsf.htm
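If you go the intraclass-correlation route those macros cover, the core of the computation is a two-way ANOVA decomposition. A rough Python sketch of ICC(2,1) in Shrout & Fleiss's notation (two-way random effects, absolute agreement, single rating); the function name is mine, and it assumes a fully crossed subjects-by-raters matrix with no missing cells, a condition the design described below does not guarantee:

    import numpy as np

    def icc_2_1(x):
        # x[i, j] = rating of subject i by rater j (fully crossed, no missing cells)
        x = np.asarray(x, dtype=float)
        n, k = x.shape
        grand = x.mean()
        # Two-way ANOVA mean squares for subjects (rows) and raters (columns)
        ms_rows = k * np.square(x.mean(axis=1) - grand).sum() / (n - 1)
        ms_cols = n * np.square(x.mean(axis=0) - grand).sum() / (k - 1)
        ss_err = (np.square(x - grand).sum()
                  - ms_rows * (n - 1) - ms_cols * (k - 1))
        ms_err = ss_err / ((n - 1) * (k - 1))
        # ICC(2,1): two-way random effects, absolute agreement, single rater
        return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err
                                     + k * (ms_cols - ms_err) / n)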
Hi everyone! I need help with a research assignment. I'm new to IBM SPSS Statistics, and actually to statistics in general, so I'm pretty overwhelmed.

My coworkers and I created a new observation scale to improve the concise transfer of information between nurses and other psychiatric staff. This scale is designed to facilitate clinical care and outcomes-related research.
Nurses and other staff members on our particular inpatient unit will use standard clinical observations to rate patient behaviors in eight categories:
1. abnormal motor activity,
2. activities of daily living,
3. bizarre/disorganized behavior,
4. medication adherence,
5. aggression,
6. observation status,
7. participation in assessment, and
8. quality of social interactions.
Each category will be given a score of 0-4, and those ratings will be summed to create a total rating. At least two nurses will rate each patient during each shift, morning and evening (so one patient should theoretically have at least four ratings per day).
My assignment is to examine the reliability and validity of this new scale and determine its utility for transfer of information.
Right now I'm trying to figure out how to examine inter-rater reliability. IBM SPSS doesn't have a program to calculate Fleiss kappa (that I know of), and I'm not sure that's what I should be calculating anyway. I'm confused because there are multiple raters, multiple patients, and multiple dates/times/shifts. The raters differ from day to day even on the same patient's chart, so there is a real lack of consistency in the data. Sometimes only one rating is done on a shift, and sometimes the nurses skip a shift of rating altogether. Patients also have different lengths of stay, so the amount of data collected for each one differs dramatically.
I've attached a screenshot of part of our de-identified data. Can anyone please help me figure out how to determine inter-rater reliability? (Or if anyone has any insight into how to determine validity, that'd be great too!)

Thanks so much!