Ordinal similarity

Ordinal similarity

Bob Schacht-3
I've been looking at correlations between Likert-scale items, such as an
agreement or satisfaction scale. I know that even though we often code them
numerically, such items are usually not truly numeric and should at least
be treated as ordinal.

In comparing results between ordinary PCA and CATPCA, in order to
understand my data better, I started doing crosstabs between variables
reported as highly correlated by PCA or CATPCA, to see what that actually
meant. I immediately gravitated toward counting the number of times the
answers were the same (the diagonal of the contingency table), similar
(adjacent categories, just off the diagonal, such as "strongly agree" and
"agree"), or dissimilar (categories not adjacent on the scale, off the
diagonal by more than one cell). I was soon calculating % same, % similar,
and % different, and felt I understood the data better, but wondered how
these three numbers might be combined into a single index. Then I began to
wonder whether I was reinventing a wheel already developed by someone else.
However, the logic of this comparison seems unlike Kendall's tau or
Spearman's rank correlation. Can someone rescue me from the gorse bushes
and point me to where the approach I was beginning to take is more fully
and properly developed?
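
For concreteness, the tallying described above can be sketched in a few lines of Python. This is only an illustration: the crosstab below is hypothetical, and the rule is the one from the paragraph (same = on the diagonal, similar = one cell off the diagonal, different = further off).

```python
import numpy as np

def ordinal_agreement(table):
    """Split a k x k contingency table into % same, % similar, % different.

    'Same' counts the diagonal, 'similar' the cells one step off the
    diagonal (adjacent categories), 'different' everything further out.
    """
    table = np.asarray(table, dtype=float)
    n = table.sum()
    rows, cols = np.indices(table.shape)
    dist = np.abs(rows - cols)          # distance from the diagonal
    same = table[dist == 0].sum() / n
    similar = table[dist == 1].sum() / n
    different = table[dist > 1].sum() / n
    return same, similar, different

# Hypothetical 5-point x 5-point crosstab of two Likert items
crosstab = [[10, 4, 1, 0, 0],
            [ 3, 8, 5, 1, 0],
            [ 1, 4, 9, 3, 1],
            [ 0, 2, 4, 7, 2],
            [ 0, 0, 1, 3, 6]]
same, similar, different = ordinal_agreement(crosstab)
print(f"% same: {same:.1%}  % similar: {similar:.1%}  % different: {different:.1%}")
```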

A slight complication arose with the midpoint of my 5-point scale, which
was intended as a neutral midpoint but was glossed as "don't know/not
applicable". Questions with an unusually large number of these answers seem
to have been treated very differently by conventional PCA and CATPCA,
resulting in different factor loadings or "dimensions". It occurred to me
that these DK/NA responses did not necessarily fit on the otherwise ordinal
scale, and I began thinking of them as missing data.

Suggestions appreciated.

Bob Schacht

Robert M. Schacht, Ph.D. <[hidden email]>
Pacific Basin Rehabilitation Research & Training Center
1268 Young Street, Suite #204
Research Center, University of Hawaii
Honolulu, HI 96814

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Re: Ordinal similarity

Kooij, A.J. van der
>I soon was calculating % same, % similar, % different, ...

This looks like the Rand index, used in cluster analysis as a measure of similarity between two cluster-membership variables.

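For readers who haven't met it, the (unadjusted) Rand index counts pairs of cases on which the two variables agree about "same category" versus "different category". A small illustrative sketch (the rating vectors are made up):

```python
from itertools import combinations

def rand_index(x, y):
    """Unadjusted Rand index between two labelings of the same cases.

    A pair of cases counts as an agreement if both labelings put the
    pair in the same category, or both put it in different categories.
    """
    assert len(x) == len(y)
    agree = sum((x[i] == x[j]) == (y[i] == y[j])
                for i, j in combinations(range(len(x)), 2))
    n_pairs = len(x) * (len(x) - 1) // 2
    return agree / n_pairs

# Hypothetical ratings of six cases on two items
print(rand_index([1, 1, 2, 2, 3, 3], [1, 1, 2, 3, 3, 3]))  # → 0.8
```

Note that this treats any two distinct categories as equally "different", which is exactly the point of contrast with the adjacency-based counting discussed in this thread.
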
To get an idea of the multivariate relations between the categories, you can look at the joint plot of category points in CATPCA (/PLOT joint(varlist)). The closer two categories lie to each other, the more often they were chosen together.

>It occurred to me that these DK/NA responses did not necessarily fit on the otherwise ordinal scale ...

CATPCA has the missing option "Impute Extra category". The quantification for the extra category is free, i.e., not restricted according to the optimal scaling level of the variable. So, if you specify the DK/NA categories as missing and choose "Impute Extra category", the quantified DK/NA categories will not be ordinally restricted. (If you have categories 1 to 5 and specify the middle category as missing, CATPCA will impute 6 for the middle category. The quantified values of categories 1, 2, 4 and 5 will be ordinally related to the original category values, but the quantified value of category 6 need not be higher than the quantified value of category 5.)

Regards,
Anita van der Kooij
Data Theory Group
Leiden University

________________________________

From: SPSSX(r) Discussion on behalf of Bob Schacht
Sent: Wed 28-Jan-09 03:23
To: [hidden email]
Subject: Ordinal similarity

Re: Ordinal similarity

Bob Schacht-3
At 12:56 AM 1/28/2009, Kooij, A.J. van der wrote:
> >I soon was calculating % same, % similar, % different, ...
>
>This looks like the Rand index, used in cluster analysis as a measure of
>similarity between 2 cluster membership variables.

Thank you for this suggestion. However, the Rand index handles the issue of
"similarity" (i.e., an adjacent rating) differently.

For my as-yet-unspecified index of similarity, under the hypothesis of
random (independent) responses, I would expect the cell totals to match the
marginal probabilities, i.e., row total * column total / grand total, the
same calculation as for the expected counts in a chi-square test.
For a correlation index of +1, I would expect "% same" = 100%.
For a correlation index of -1, I would expect "% different" = 100%. This
should happen, for example, if two questions are identical except that one
is the negative of the other.
For a correlation index of 0, I would expect the cell totals to match the
marginal probabilities.

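The independence baseline just described (row total * column total / grand total, as in the chi-square expected counts) can be computed directly; a sketch with a hypothetical crosstab:

```python
import numpy as np

def expected_counts(table):
    """Expected cell counts under independence of rows and columns."""
    table = np.asarray(table, dtype=float)
    row_totals = table.sum(axis=1, keepdims=True)   # shape (k, 1)
    col_totals = table.sum(axis=0, keepdims=True)   # shape (1, k)
    return row_totals * col_totals / table.sum()

# Hypothetical 3 x 3 crosstab
observed = np.array([[20,  5,  0],
                     [ 5, 20,  5],
                     [ 0,  5, 20]])
print(expected_counts(observed))
```

Comparing the observed % same / % similar / % different against the same percentages computed from this expected table would give the zero point of the proposed index.
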
>To get an idea of the multivariate relations between the categories, you can
>look at the joint plot of category points in CATPCA (/PLOT joint(varlist)).
>The closer two categories lie to each other, the more often they were
>chosen together.

How does this joint plot handle "similarity"? I'm trying to break out of
simple dichotomization.

> >It occurred to me that these DK/NA responses did not necessarily fit on
> the otherwise ordinal scale ...
>
>CATPCA has the missing option Impute Extra category. The quantification
>for the extra category is free, i.e, not restricted according to the
>optimal scaling level for the variable. So, if you specify the dk/na
>categories as missing and choose Impute Extra category, the quantified
>dk/na categories will not be ordinally restricted (if you have categories
>1 to 5 and specify the middle category as missing, CATPCA will impute 6
>for the middle category. The quantified values of categories 1, 2, 4 and 5
>will be ordinally related to the original category values, but the
>quantified value of category 6 does not need to be higher than the
>quantified value of category 5).

This is helpful. Thanks!

Bob Schacht


Robert M. Schacht, Ph.D. <[hidden email]>
Pacific Basin Rehabilitation Research & Training Center
1268 Young Street, Suite #204
Research Center, University of Hawaii
Honolulu, HI 96814
