Hello,
I read the earlier posting about calculating the Fleiss statistic for the fellow who wanted to measure agreement among people rating a video consultation. However, I am still not completely sure how to calculate my results, and I am hoping somebody can help me out.

My goal is to show that a rating system for examining the sinuses has good inter-rater reliability. Each rater was given a video to watch of four different subsites of the paranasal sinuses (i.e., the frontal, maxillary, ethmoidal and sphenoidal). Each of these subsites was assessed for 9 different findings (clear mucus, eos mucus, crust, pus, polypoid, polyps, cystic polyp, mucoid and edema). For each finding, the rater assigned a 0 (absent) or 1 (present) to each subsite, so a patient can receive a score in more than one category. Thus, if a patient had mucoid, polypoid and crust findings in the ethmoid cavity, the rater would mark a 1 next to each of these findings on the score sheet.

In summary:

Number of subjects = 20
Number of raters = 4
Number of subsites = 4
Number of findings per subsite = 9 (not mutually exclusive)

My questions are:

1. Is it possible to use the Fleiss kappa if a rater is allowed to mark a subject in more than one category?
2. If not, is there a test that takes this into account?
3. If neither is possible, do you have a suggestion for how I can best rearrange my data? (The goal is to show that different raters can reliably score sinuses using our method.)

Thank you in advance for any help you can offer,

Mike
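For concreteness, here is a minimal sketch in Python of the score sheet described above, in long format (one row per subject x rater x subsite x finding); the subject and rater labels are hypothetical illustrations, not the actual study data:

    import pandas as pd

    FINDINGS = ["clear mucus", "eos mucus", "crust", "pus", "polypoid",
                "polyps", "cystic polyp", "mucoid", "edema"]

    # The ethmoid example from the post: a (hypothetical) rater marks 1 next
    # to the mucoid, polypoid and crust findings and 0 next to everything else.
    rows = [{"subject": 1, "rater": "A", "subsite": "ethmoidal",
             "finding": f, "present": int(f in {"mucoid", "polypoid", "crust"})}
            for f in FINDINGS]
    sheet = pd.DataFrame(rows)
    print(sheet)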
Mike,
I didn't see any replies to this thread, so if for some reason they got caught in my organization's quarantine folder, I apologize. Here is my recommendation.

First, I would analyze each of the subsites separately. Understanding each of them independently will tell you more about which subsites, for example, may be giving your raters difficulty. The real difficulty is that your response categories are not mutually exclusive. So for each subsite, I'd treat each finding (e.g., clear mucus, eos mucus, etc.) as its own binomial response (present/not present). If you do this, use positive integer codes rather than the 0/1 coding you describe, as the nominal inter-rater agreement routines will not handle a code of 0; a simple recode takes care of this.

This approach will require a lot of data separation, because it yields four analyses of 180 records each (9 findings x 20 subjects per subsite). You might also analyze each finding separately, which would mean nine separate 20-subject analyses for each of the four subsites. That would probably be the most revealing: how well do raters agree per subsite per category of finding?

HTH,

Brian
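To make the per-finding analysis concrete, here is a minimal sketch of Fleiss' kappa for a single finding at a single subsite, implemented from the standard formula (Fleiss, 1971); the ratings below are randomly generated stand-ins, not the thread's actual data:

    import numpy as np

    def fleiss_kappa(counts):
        # counts[i, j] = number of raters assigning subject i to category j;
        # every row must sum to the same number of raters.
        counts = np.asarray(counts, dtype=float)
        n_subjects = counts.shape[0]
        n_raters = counts[0].sum()
        # Observed agreement: mean proportion of agreeing rater pairs per subject.
        p_i = (np.square(counts).sum(axis=1) - n_raters) / (n_raters * (n_raters - 1))
        p_bar = p_i.mean()
        # Chance agreement from the marginal category proportions.
        p_j = counts.sum(axis=0) / (n_subjects * n_raters)
        p_e = np.square(p_j).sum()
        return (p_bar - p_e) / (1 - p_e)

    # One finding at one subsite: 20 subjects, 4 raters, two categories
    # (column 0 = raters marking "absent", column 1 = raters marking "present").
    rng = np.random.default_rng(0)
    present = rng.integers(0, 5, size=20)
    counts = np.column_stack([4 - present, present])
    print(fleiss_kappa(counts))

Running this once per finding within each subsite (9 x 4 = 36 kappas) yields the per-subsite, per-finding agreement described above.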
|
Hi Brian,
Thank you for your reply! I was thinking the same thing. Initially, I was hoping to create a validated staging system, but it doesn't look like I'm heading in that direction. I will take your advice and look at each category within each subsite; at least then we can see which entities in which locations seem to cause the most problems.

The only thing I did not quite understand is the part about changing the zeros so that the response is binomial. I am new to SPSS, so please bear with me. Does this mean that I should make present = 1 and not present = 2? Also, if I had a third response (not visible), can I make that a 3?

Thanks,

Mike Chater
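For illustration, a minimal sketch of the recode in question, mapping each response onto a distinct positive integer; the exact codes are arbitrary for a nominal agreement statistic, and the raw values below are hypothetical:

    import pandas as pd

    raw = pd.Series([1, 0, 1, "NV", 0])    # hypothetical raw scores ("NV" = not visible)
    codes = {1: 1, 0: 2, "NV": 3}          # present = 1, absent = 2, not visible = 3
    print(raw.map(codes))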
