SPSSX Discussion - Re: test-retest reliability and Spearman-Brown split half coefficient

Posted by Swank, Paul R on Jan 25, 2007; 4:23pm
URL: http://spssx-discussion.165.s1.nabble.com/test-retest-reliability-and-Spearman-Brown-split-half-coefficient-tp1073459p1073460.html

I would not use Spearman's correlation for the Likert items, since the
number of ties is likely to be large. For both item types you could use
the contingency coefficient or Cramer's V from Crosstabs. These are like
phi coefficients but for tables larger than 2 by 2. For the Likert items,
you could use a Mantel-Haenszel test for significance if needed, since
it takes the ordering of the categories into account. For the yes, no,
don't know items, the chi-square test would be the appropriate test
statistic.
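
As a rough illustration of the crosstab statistics mentioned above, here is a minimal Python sketch that computes the chi-square statistic, the contingency coefficient and Cramer's V for one question answered at two time points. The function name and the answers are hypothetical, made up for illustration; they are not data from this thread. With only 9 respondents the expected cell counts are small, so any p-value based on the chi-square approximation should be read with caution.

```python
from collections import Counter
from math import sqrt

def crosstab_association(time1, time2):
    """Return (chi_square, contingency_coefficient, cramers_v) for paired answers."""
    n = len(time1)
    rows = Counter(time1)               # row totals (time 1 categories)
    cols = Counter(time2)               # column totals (time 2 categories)
    cells = Counter(zip(time1, time2))  # observed cell counts
    chi2 = 0.0
    for r, r_total in rows.items():
        for c, c_total in cols.items():
            expected = r_total * c_total / n
            observed = cells.get((r, c), 0)
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(rows), len(cols))
    contingency_c = sqrt(chi2 / (chi2 + n))
    # Cramer's V is undefined when one variable has a single category.
    cramers_v = sqrt(chi2 / (n * (k - 1))) if k > 1 else float("nan")
    return chi2, contingency_c, cramers_v

# Hypothetical answers from 9 respondents (illustration only).
time1 = ["Yes", "Yes", "Yes", "No", "No", "No", "DK", "DK", "DK"]
time2 = ["Yes", "Yes", "No", "No", "No", "DK", "DK", "DK", "Yes"]

chi2, c, v = crosstab_association(time1, time2)
print(f"chi-square = {chi2:.3f}, C = {c:.3f}, Cramer's V = {v:.3f}")
```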

Paul R. Swank, Ph.D. Professor
Director of Research
Children's Learning Institute
University of Texas Health Science Center-Houston

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
Paul Mcgeoghan
Sent: Thursday, January 25, 2007 9:51 AM
To: [hidden email]
Subject: Re: test-retest reliability and Spearman-Brown split half
coefficient

Stephen,

Thanks, I was also thinking along the lines of Spearman rank-order as a
possibility.
She is only interested in comparing the single Likert items.

However, she also has the problem of the Yes, No and Don't Know
questions, which to me would be considered nominal rather than Likert
scale, so what does she do for these types of questions?

Paul

==================
Paul McGeoghan,
Application support specialist (Statistics and Databases), University
Infrastructure Group (UIG), Information Services, Cardiff University.
Tel. 02920 (875035).

>>> "Statisticsdoc" <[hidden email]> 25/01/2007 15:45:25 >>>
Paul,

I imagine that you have suggested that more data be collected (a sample
of nine is not enough to accurately estimate test-retest reliability).

Having said that, for the Likert items, you want to compute the
correlation between two time points, not the split-half reliability
(which applies to ratings that are summed into scale scores at the same
point in time). Is your customer interested in knowing about the
test-retest reliability of single items, or the test-retest reliability
of a score that has been computed from these items? For single Likert
items, a Spearman rank-order correlation would be appropriate, given the
ordinal nature of the data. For computing the test-retest reliability
of the scale total, you could use Pearson's correlation if you are
willing to assume that the sums behave like interval level data (this is
a topic that is frequently discussed on this list, as you know),
otherwise stick with Spearman's coefficient.
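
To make the two options above concrete, here is a minimal Python sketch of a Spearman rank-order correlation (tied values get the average of their ranks) and a Pearson correlation between time 1 and time 2. The function names and the item scores are hypothetical, made up for illustration; in SPSS the same coefficients come from the bivariate correlations procedure.

```python
from math import sqrt

def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if x or y is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical 5-point Likert answers from 9 respondents (illustration only).
item_t1 = [1, 2, 2, 3, 3, 4, 4, 5, 5]
item_t2 = [1, 2, 3, 3, 3, 4, 5, 5, 4]

print(f"Spearman = {spearman(item_t1, item_t2):.3f}")
print(f"Pearson  = {pearson(item_t1, item_t2):.3f}")
```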

HTH,

Stephen Brand

For personalized and professional consultation in statistics and
research design, visit www.statisticsdoc.com

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]]On Behalf Of
Paul Mcgeoghan
Sent: Thursday, January 25, 2007 10:03 AM
To: [hidden email]
Subject: test-retest reliability and Spearman-Brown split half
coefficient

Hi,

I have been reading a lot of websites relating to reliability and
agreement this week and have a customer who has carried out the
following:

She has 9 people who answered nominal questions:
Yes, No, Don't Know
on 2 separate occasions.
She is interested in knowing whether the respondents reliably answer the
questions the same over the 2 time periods.

She also has a number of Likert questions where the same 9
respondents answered the questions at time 1 and time 2, and she wants
to see if they have reliably answered each question the same over the
2 time periods.

From what I have read, this seems to be referred to as test-retest
reliability and a correlation between each pair of questions seems to be
the way to approach it.

So for the Likert scale questions, could I just use Spearman's
correlation coefficient?
For the nominal questions, what can I use?

I have also seen reference to the Spearman-Brown split-half coefficient
(Analyze > Scale > Reliability Analysis, with the Split-half model).
Can this be used to compare each pair of questions at the 2 time
intervals, and does the data have to be Likert-scale in this instance?
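
For reference, the Spearman-Brown coefficient mentioned above can be sketched in a few lines of Python. This is the standard equal-halves form, which steps up the correlation between two halves of a scale given on one occasion to an estimate for the full-length scale; it is not a comparison of the same item at two time points. The function name is hypothetical.

```python
def spearman_brown(r_half):
    """Step up the correlation between two equal half-tests to a
    full-length reliability estimate: 2r / (1 + r)."""
    return 2 * r_half / (1 + r_half)

# Example: a half-test correlation of 0.5 (an arbitrary illustrative value).
print(round(spearman_brown(0.5), 3))
```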

Hope I have explained the problem clearly.

Thanks,
Paul