Testing agreement between two groups of raters


Testing agreement between two groups of raters

E. Bernardo
Dear everyone,
 
A 30-item competence inventory with a four-point response format (4 = always performed to 1 = never performed) was administered to two groups of raters (n1 = 40 and n2 = 45).  I want to test whether there is agreement between the responses of the two groups of raters on the items.  What test is appropriate?
 
Thank you in advance!
 
Eins


Re: Testing agreement between two groups of raters

Samir Paul-2
Kendall's Coefficient of Concordance is the test. SPSS doesn't calculate this directly, but you can use R from within SPSS to do it. SAS has an option to compute it. The algorithm is not very complex; with some programming knowledge you can implement it in any language.
 
HTH.
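
Kendall's W is indeed short enough to sketch in a few lines of base R. Below is a minimal version with the standard tie correction (ratings on a 4-point scale produce many tied ranks); the function name and the simulated data are illustrative only:

# Kendall's W for m raters each rating the same n items.
# 'ratings' is an n x m matrix: rows = items, columns = raters.
kendall_w <- function(ratings) {
  ranks <- apply(ratings, 2, rank)          # within-rater ranks; ties get average ranks
  n <- nrow(ranks)
  m <- ncol(ranks)
  S <- sum((rowSums(ranks) - m * (n + 1) / 2)^2)   # squared deviations of item rank sums
  Tcorr <- sum(apply(ranks, 2, function(r) {       # tie correction, summed over raters
    t <- table(r)
    sum(t^3 - t)
  }))
  W <- 12 * S / (m^2 * (n^3 - n) - m * Tcorr)      # 0 = random, 1 = unanimous
  chisq <- m * (n - 1) * W                         # approximate test, df = n - 1
  c(W = W, chisq = chisq, df = n - 1,
    p = pchisq(chisq, n - 1, lower.tail = FALSE))
}

# Illustrative data: 40 raters x 30 items on a 1-4 scale.
set.seed(42)
kendall_w(matrix(sample(1:4, 30 * 40, replace = TRUE), nrow = 30))

As the follow-ups note, though, W measures concordance within a single set of raters rather than between two groups.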



Re: Testing agreement between two groups of raters

Bruce Weaver
Hi Samir.  It's not clear to me how one would compare two groups of raters with Kendall's W.  Could you provide a little more detail?  

I ask because, as I understand it, Kendall's W gives a measure of inter-rater agreement within a single group of judges.  If W=0, the ratings are essentially random; and if W=1, there is unanimous agreement.  One could compute W for each of the groups separately.  The results would concern agreement WITHIN each of the groups.  But the OP is asking about agreement BETWEEN the groups.

Thanks for clarifying.

Bruce


--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."


Re: Testing agreement between two groups of raters

Bruce Weaver
Having looked at the Wikipedia page (http://en.wikipedia.org/wiki/Kendall%27s_W), I have a further question about how one would use Kendall's W here.  Even if there is a way to compare two groups, it seems to me that Kendall's W is appropriate when judges are ranking some set of objects from 1 to n.  But in the case Eins described, judges are not rank ordering the 30 items from the competence inventory.  Rather, they are assigning each one a value from 1-4.  In other words, for each item, there is a 2 x 4 (groups x response options) table.  

I'm now at the point of thinking out loud, so bear that in mind if I suggest something ridiculous.  ;-)

For each item, then, one could calculate a Pearson chi-square; or given the ordinal nature of the response options, the test of "linear-by-linear" association.  But there would be a big multiple testing problem if one did that:  Some of those 30 tests would be bound to be significant by chance alone.  So...I think I would look into using GENLIN to run some kind of ordinal logistic regression--but it would have to be with GEE to account for the 30 repeated items within ID (Analyze - Generalized Linear Models - GEE).

Bruce
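
To make the per-item idea concrete: for one 2 x 4 table, the linear-by-linear statistic is M^2 = (N - 1) * r^2, where r is the Pearson correlation between the (integer-scored) group and response variables, referred to a chi-square distribution with 1 df. A minimal base-R sketch, with made-up counts for a single item:

# Linear-by-linear association test for one item's 2 x 4 table
# (groups in rows, response options 1-4 in columns), using the
# default integer scores: M^2 = (N - 1) * r^2, df = 1.
lbl_test <- function(tab) {
  x <- rep(as.vector(row(tab)), as.vector(tab))   # group score for each rater
  y <- rep(as.vector(col(tab)), as.vector(tab))   # response score for each rater
  M2 <- (sum(tab) - 1) * cor(x, y)^2
  c(M2 = M2, p = pchisq(M2, df = 1, lower.tail = FALSE))
}

# Hypothetical counts for a single item:
item1 <- matrix(c(5, 10, 15, 10,    # group 1 (n1 = 40)
                  8, 12, 14, 11),   # group 2 (n2 = 45)
                nrow = 2, byrow = TRUE)
lbl_test(item1)

# Over 30 items, collect the p-values and apply p.adjust(p, "holm")
# (or similar) to deal with the multiple-testing problem noted above.

For the GEE route, ordinal GEE models also exist on the R side (e.g., in the geepack and multgee packages), though the GENLIN path described above is the natural choice within SPSS.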



Re: Testing agreement between two groups of raters

Hurtz, Gregory M
Eins,
A graduate student and I dealt with this issue in the following article:

Pasisz, D. J., & Hurtz, G. M. (2009). Testing for between-group differences in within-group interrater agreement. Organizational Research Methods, 12, 590-613.

We were focused on the r(wg) statistic, which has been shown to be an extension of kappa for use with multi-point scales like yours. We addressed testing at the level of each item and of the scale as a whole, and also the issue of alpha inflation from multiple tests. SPSS syntax is provided in the appendix of the article.

I should note that our methodology, being focused on the r(wg) statistic which has become fairly common in the industrial/organizational psychology field, ultimately boils down to testing for differences in rater variances between the two groups. Given prior discussion on this list regarding the "6-point scale" issue, there may be some differing views on using variances for conceptualizing agreement on a 4-point ordinal scale.
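
For readers without the article at hand, the core r(wg) computation for a single item is one line: one minus the ratio of the observed rating variance to the variance expected under a uniform "no agreement" null, which is (A^2 - 1)/12 for an A-option scale. A minimal R sketch with made-up ratings; the between-group significance test itself follows the procedure in the article and is not reproduced here:

# r(wg) for one item (James, Demaree, & Wolf, 1984):
# 1 - observed variance / uniform-null variance.
# For a 4-point scale the null variance is (4^2 - 1) / 12 = 1.25.
rwg <- function(x, A = 4) {
  1 - var(x) / ((A^2 - 1) / 12)   # 1 = perfect agreement; can go negative
}

# Hypothetical ratings of one item by the two groups:
set.seed(1)
g1 <- sample(1:4, 40, replace = TRUE, prob = c(.1, .2, .3, .4))
g2 <- sample(1:4, 45, replace = TRUE)
c(rwg_group1 = rwg(g1), rwg_group2 = rwg(g2))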

-Greg.


--
Greg Hurtz, Ph.D.
Associate Professor
Industrial & Organizational Psychology
California State University, Sacramento
http://www.csus.edu/indiv/h/hurtzg

Re: Testing agreement between two groups of raters

E. Bernardo
Dear Greg,
 
Thank you for your article.  How does your index of agreement differ from the one presented in the following article:
 
Vanbelle, S. (2009). Agreement between two groups of raters. Available online at http://orbi.ulg.ac.be/handle/2268/9366
 
Regards,
Eins


Re: Testing agreement between two groups of raters

Hurtz, Gregory M
Eins,
It appears that the two indices are answering different questions:
 
1) Vanbelle: Do Groups 1 and 2 agree with one another? For example, do expert and novice groups agree (or disagree) in their ratings of some stimulus?
 
2) Pasisz & Hurtz: Does the level of agreement within Group 1 differ from the level of agreement within Group 2? For example, is there more disagreement among novices than among experts in their ratings of some stimulus?
 
Perhaps our test doesn't answer the question you are asking. Thanks for sending the link to Vanbelle.
 
-Greg.
 
 
 
--
Greg Hurtz, Ph.D.
Associate Professor
Industrial & Organizational Psychology
California State University, Sacramento
http://www.csus.edu/indiv/h/hurtzg
 


Re: Testing agreement between two groups of raters

E. Bernardo
Hurtz, Gregory M wrote:
> 1) Vanbelle: Do Groups 1 and 2 agree with one another? For example, do expert and novice groups agree (or disagree) in their ratings of some stimulus?

Can we not just simply use a median test or a t-test to answer that question? (A sketch follows below.)

> 2) Pasisz & Hurtz: Does the level of agreement within Group 1 differ from the level of agreement within Group 2? For example, is there more disagreement among novices than among experts in their ratings of some stimulus?

I think your test is the one we are looking for.  Thank you for that article.

Eins
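
Note that a median test or t-test compares the groups' typical level rather than their agreement as such: similar levels are necessary but not sufficient for the between-group agreement in question (1). Still, the suggestion is easy to sketch per item in R, with a multiplicity correction over the 30 items; the data below are simulated stand-ins, not the actual ratings:

# Per-item location comparison of the two rater groups, in the spirit
# of a median test (Wilcoxon rank-sum used here). Simulated stand-ins:
# 85 raters (40 + 45) x 30 items on a 1-4 scale.
set.seed(2)
ratings <- matrix(sample(1:4, 85 * 30, replace = TRUE), nrow = 85, ncol = 30)
group <- rep(1:2, times = c(40, 45))

p <- apply(ratings, 2, function(item)
  wilcox.test(item[group == 1], item[group == 2], exact = FALSE)$p.value)
p.adjust(p, method = "holm")   # guard against alpha inflation over 30 tests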
 



Kendall's W [was Re: Testing agreement between two groups of raters]

Alex Reutter
In reply to this post by Hurtz, Gregory M

Just a minor correction to Samir's note.  SPSS does, in fact, offer Kendall's coefficient of concordance (W) for k related samples in both the NPAR TESTS and (if you have Statistics 18) NPTESTS procedures.

Alex



Re: Testing agreement between two groups of raters

E. Bernardo
In reply to this post by Samir Paul-2
Hi all,
 
My original post was this:
A 30-item competence inventory with a four-point response format (4 = always performed to 1 = never performed) was administered to two groups of raters (n1 = 40 and n2 = 45).  I want to test whether there is agreement between the responses of the two groups of raters on the items.  What test is appropriate?
 
Samir replied:
Kendall's Coefficient of Concordance is the test. SPSS doesn't calculate this directly, but you can use R from within SPSS to do it. SAS has an option to compute it. The algorithm is not very complex; with some programming knowledge you can implement it in any language.
 
Can we really use Kendall's Coefficient of Concordance?  I'd appreciate any input.
 
Eins
