I've been looking at correlations between Likert scale items, such as an agreement or satisfaction scale. I know that even though we often code them numerically, Likert items are not truly numeric; at best they are ordinal. While comparing results from ordinary PCA and CATPCA, in order to understand my data better, I started running crosstabs between variables that PCA or CATPCA reported as highly correlated, to see what that actually meant.

I immediately gravitated toward counting the number of times the answers were the same (the diagonal of the contingency table), similar (adjacent on the scale, just off the diagonal, such as "strongly agree" and "agree"), and dissimilar (not adjacent on the scale, off the diagonal by more than one cell). Soon I was calculating % same, % similar, and % different, and felt I understood the data better, but I wondered how these three numbers might be combined into a single index. Then I began to wonder whether I was reinventing a wheel already developed by someone else. However, the logic of this comparison seems unlike Kendall's tau or Spearman's rank correlation. Can someone rescue me from the gorse bushes and point me to where the approach I was beginning to take is more fully and properly developed?

A complication arose with the midpoint of my 5-point scale, which was meant to be a midpoint but was glossed as "don't know/not applicable". Questions with an unusually large number of these answers seem to have been treated very differently by conventional PCA and CATPCA, resulting in different factor loadings or "dimensions". It occurred to me that these DK/NA responses did not necessarily fit on the otherwise ordinal scale, and I began thinking of them as missing data.

Suggestions appreciated.

Bob Schacht

Robert M. Schacht, Ph.D. <[hidden email]>
Pacific Basin Rehabilitation Research & Training Center
1268 Young Street, Suite #204
Research Center, University of Hawaii
Honolulu, HI 96814
>I soon was calculating % same, % similar, % different, ...
This looks like the Rand index, used in cluster analysis as a measure of similarity between two cluster membership variables.

To get an idea about multivariate relations between categories, you can look at the joint plot of category points in CATPCA (/PLOT joint(varlist)). The closer categories lie to each other, the more often they have been chosen together.

>It occurred to me that these DK/NA responses did not necessarily fit on the otherwise ordinal scale ...

CATPCA has the missing option Impute Extra Category. The quantification for the extra category is free, i.e., not restricted according to the optimal scaling level of the variable. So, if you specify the DK/NA category as missing and choose Impute Extra Category, the quantified DK/NA category will not be ordinally restricted. (If you have categories 1 to 5 and specify the middle category as missing, CATPCA will impute 6 for the middle category. The quantified values of categories 1, 2, 4, and 5 will be ordinally related to the original category values, but the quantified value of category 6 need not be higher than the quantified value of category 5.)

Regards,
Anita van der Kooij
Data Theory Group
Leiden University
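A hedged sketch of the option described above, assuming three items q1, q2, and q3 coded 1 to 5 with the DK/NA midpoint coded 3. The names are hypothetical, and the subcommand details should be verified against the CATPCA syntax reference for your SPSS version.

* Treat the DK/NA midpoint (coded 3) as missing, then let CATPCA
* impute an extra, ordinally unrestricted category for it.
MISSING VALUES q1 TO q3 (3).
CATPCA VARIABLES=q1 q2 q3
  /ANALYSIS=q1 q2 q3 (LEVEL=ORDI)
  /MISSING=q1 q2 q3 (ACTIVE EXTRACAT)
  /PLOT=JOINT(q1 q2 q3).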
At 12:56 AM 1/28/2009, Kooij, A.J. van der wrote:
> >I soon was calculating % same, % similar, % different, ...
>
> This looks like the Rand index, used in cluster analysis as a measure
> of similarity between two cluster membership variables.

Thank you for this suggestion. However, the Rand index handles the issue of "similarity" (i.e., an adjacent rating) differently. For my as-yet-unspecified index of similarity:

- For a correlation index of +1, I would expect "% same" = 100%.
- For a correlation index of -1, I would expect "% different" = 100%. This should happen, for example, if two questions are identical except that one is the negative of the other.
- For a correlation index of 0 (the random hypothesis), I would expect the cell totals to match the marginal probabilities, i.e., row total * column total / grand total, the same calculation as for a chi-square.

> To get an idea about multivariate relations between categories, you can
> look at the joint plot of category points in CATPCA (/PLOT joint(varlist)).
> The closer categories lie to each other, the more often they have been
> chosen together.

How does this joint plot handle "similarity"? I'm trying to break out of simple dichotomization.

> >It occurred to me that these DK/NA responses did not necessarily fit on
> >the otherwise ordinal scale ...
>
> CATPCA has the missing option Impute Extra Category. [...]

This is helpful. Thanks!

Bob Schacht

Robert M. Schacht, Ph.D. <[hidden email]>
Pacific Basin Rehabilitation Research & Training Center
1268 Young Street, Suite #204
Research Center, University of Hawaii
Honolulu, HI 96814
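The expected-cell-total check described above (row total * column total / grand total) is exactly what CROSSTABS reports when expected counts are requested alongside observed counts. A minimal sketch, again with hypothetical item names q1 and q2:

* Observed vs. expected cell counts under independence, plus chi-square.
* Expected counts are row total * column total / grand total.
CROSSTABS
  /TABLES=q1 BY q2
  /CELLS=COUNT EXPECTED
  /STATISTICS=CHISQ.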
