Re: inter-rater reliability with multiple raters
Posted by Jon K Peck on Jun 16, 2014; 2:41pm
URL: http://spssx-discussion.165.s1.nabble.com/inter-rater-reliability-with-multiple-raters-tp5726465p5726483.html
"IBM SPSS doesn't have a program
to calculate Fleiss
kappa (that I know of) "
See the STATS FLEISS KAPPA custom dialog. It requires the Python Essentials and can be downloaded from the Utilities menu in Statistics 22 or from the Extension Commands collection of the SPSS Community website (www.ibm.com/developerworks/spssdevcentral).

It provides an overall estimate of kappa, along with its asymptotic standard error, Z statistic, significance (p value) under the null hypothesis of chance agreement, and a confidence interval for kappa. (Standard errors are based on Fleiss et al., 1979 and Fleiss et al., 2003; the test statistic is based on Fleiss et al., 2003.) It also provides these statistics for the individual categories, as well as the conditional probabilities for categories, which according to Fleiss (1971, p. 381) are the probabilities of a second object being assigned to a category given that the first object was assigned to that category.
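If you want to see what the overall estimate involves (or to check the extension's output by hand), the point estimate is easy to compute yourself. Here is a minimal Python sketch, assuming a subjects-by-categories count matrix with the same number of raters for every subject; the function and variable names are mine, not part of the extension command, and it omits the standard errors and per-category statistics the extension reports:

    import numpy as np

    def fleiss_kappa(counts):
        # counts[i, j] = number of raters who put subject i in category j;
        # every row must sum to the same number of raters n.
        counts = np.asarray(counts, dtype=float)
        N = counts.shape[0]              # number of subjects
        n = counts[0].sum()              # raters per subject (assumed constant)
        # Observed agreement: mean proportion of agreeing rater pairs per subject
        P_bar = ((np.square(counts).sum(axis=1) - n) / (n * (n - 1))).mean()
        # Chance agreement from the marginal category proportions
        p_j = counts.sum(axis=0) / (N * n)
        P_e = np.square(p_j).sum()
        return (P_bar - P_e) / (1 - P_e)

    # Example: 4 subjects, 3 raters, 3 categories
    print(fleiss_kappa([[3, 0, 0],
                        [0, 3, 0],
                        [1, 1, 1],
                        [2, 1, 0]]))     # ~0.27

Note the constant-raters-per-subject requirement; it matters for the design described below, where the number of ratings per patient varies.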
Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
phone: 720-342-5621
From: Max Jasper <[hidden email]>
To: [hidden email]
Date: 06/16/2014 08:33 AM
Subject: Re: [SPSSX-L] inter-rater reliability with multiple raters
Sent by: "SPSSX(r) Discussion" <[hidden email]>
Check these out; they may help:
ftp://ftp.boulder.ibm.com/software/analytics/spss/support/Stats/Docs/Statistics/Macros/Iccsf.htm
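If you go the intraclass-correlation route those macros cover, the core of the computation is a two-way ANOVA decomposition. A rough Python sketch of ICC(2,1) in Shrout & Fleiss's notation (two-way random effects, absolute agreement, single rating); the function name is mine, and it assumes a fully crossed subjects-by-raters matrix with no missing cells, a condition the design described below does not guarantee:

    import numpy as np

    def icc_2_1(x):
        # x[i, j] = rating of subject i by rater j (fully crossed, no missing cells)
        x = np.asarray(x, dtype=float)
        n, k = x.shape
        grand = x.mean()
        # Two-way ANOVA mean squares for subjects (rows) and raters (columns)
        ms_rows = k * np.square(x.mean(axis=1) - grand).sum() / (n - 1)
        ms_cols = n * np.square(x.mean(axis=0) - grand).sum() / (k - 1)
        ss_err = (np.square(x - grand).sum()
                  - ms_rows * (n - 1) - ms_cols * (k - 1))
        ms_err = ss_err / ((n - 1) * (k - 1))
        # ICC(2,1): two-way random effects, absolute agreement, single rater
        return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err
                                     + k * (ms_cols - ms_err) / n)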
Hi everyone! I need help with a research assignment. I'm new to IBM SPSS Statistics, and actually to statistics in general, so I'm pretty overwhelmed.

My coworkers and I created a new observation scale to improve the concise transfer of information between nurses and other psychiatric staff. This scale is designed to facilitate clinical care and outcomes-related research.
Nurses and other staff members on our particular inpatient unit will use standard clinical observations to rate patient behaviors in eight categories:
1. abnormal motor activity,
2. activities of daily living,
3. bizarre/disorganized behavior,
4. medication adherence,
5. aggression,
6. observation status,
7. participation in assessment, and
8. quality of social interactions.
Each category will be given a score of 0-4, and those ratings will be summed to create a total rating. At least two nurses will rate each patient during each shift, morning and evening (so one patient should theoretically have at least four ratings per day).
My assignment is to examine the reliability and validity of this new scale and determine its utility for transfer of information.
Right now I'm trying to figure out how to examine inter-rater reliability. IBM SPSS doesn't have a program to calculate Fleiss kappa (that I know of), and I'm not sure that's what I should be calculating anyway. I'm confused because there are multiple raters, multiple patients, and multiple dates/times/shifts. The raters differ from day to day even on the same patient's chart, so there is a real lack of consistency in the data. Sometimes only one rating is done on a shift, and sometimes the nurses skip a shift of rating altogether. Patients also have different lengths of stay, so the amount of data collected for each one differs dramatically.
I've attached a screenshot of part of our de-identified data. Can anyone please help me figure out how to determine inter-rater reliability? (Or if anyone has any insight into how to determine validity, that'd be great too!)

Thanks so much!