|
Hi all,
I was wondering if it is appropriate to include rating scale data (1 to 7) of attitudes with other types of data such as practice size, physician age, and other practice descriptors in a k-means clustering procedure. I am not sure if you can mix data types.

Thanks.

Rodrigo.

The information transmitted is intended only for the addressee(s) and may contain confidential or privileged material, or both. Any review, receipt, dissemination or other use of this information by non-addressees is prohibited. If you received this in error or are a non-addressee, please contact the sender and delete the transmitted information.

=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command INFO REFCARD |
|
There is no clear-cut answer. A lot depends on what you want to ask of the data.

If you have a mix of categorical and scale data, TWOSTEP is designed for all scale variables, all categorical variables, or a mix of scale and categorical data. TwoStep also provides some help in deciding on the number of clusters to retain.

I would not use a single method of clustering, but would base my retained clusters on consensus among very different clustering methods and proximity measures. Also, k-means is very sensitive to the order of cases. You would want to sort the cases into a few random orders to see how good the consensus is among k-means runs.

K-means is for scale data (not very discrepant from interval level), so if you are worried about level of measurement, using attitude scale scores would be OK on that basis.

Substantively, without knowing the details of your situation, it seems unusual to have attitudes and practice characteristics in the same clustering. My knee-jerk reaction would be to see if there were clusters of practices, and then see if those clusters differed on attitudes. An additional exploration would be to cluster practices and to cluster attitude scale scores.

Art Kendall
Social Research Consultants |
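
As a rough illustration of the random-order check described in the reply above, here is a minimal Python sketch (not SPSS syntax). Everything in it is an assumption made for illustration: the toy cases, the helper names kmeans_first_k, rand_index, and cluster_in_random_order, the use of the first k cases as starting centers (a simple way to make the result depend on case order, not necessarily what SPSS does), and the Rand index as the measure of agreement between runs.

```python
import random
from itertools import combinations


def kmeans_first_k(cases, k, max_iter=100):
    """Plain k-means seeded with the first k cases, so case order can matter."""
    centers = [list(c) for c in cases[:k]]
    labels = None
    for _ in range(max_iter):
        # Assign every case to its nearest center (squared Euclidean distance).
        new_labels = [
            min(range(k), key=lambda j: sum((x - m) ** 2 for x, m in zip(case, centers[j])))
            for case in cases
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        # Move each center to the mean of its current members.
        for j in range(k):
            members = [c for c, lab in zip(cases, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def rand_index(a, b):
    """Share of case pairs on which two partitions agree (same vs. different cluster)."""
    pairs = list(combinations(range(len(a)), 2))
    return sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs) / len(pairs)


def cluster_in_random_order(cases, k, rng):
    """Shuffle the cases, cluster them, and return labels in the original case order."""
    order = list(range(len(cases)))
    rng.shuffle(order)
    shuffled_labels = kmeans_first_k([cases[i] for i in order], k)
    labels = [None] * len(cases)
    for position, original_index in enumerate(order):
        labels[original_index] = shuffled_labels[position]
    return labels


# Hypothetical toy cases: (attitude score, some practice descriptor).
cases = [(1, 2), (2, 1), (2, 2), (1, 3), (6, 6), (7, 5),
         (6, 7), (5, 6), (4, 4), (3, 5), (5, 3), (4, 3)]

rng = random.Random(2010)
runs = [cluster_in_random_order(cases, 3, rng) for _ in range(5)]

# Pairwise agreement between runs; values near 1 suggest a stable solution.
agreement = [rand_index(a, b) for a, b in combinations(runs, 2)]
print("lowest agreement:", min(agreement), "highest agreement:", max(agreement))
```

The Rand index ignores cluster numbering, so two runs that find the same groups under different labels still count as full agreement. Low agreement across orders would be a sign that the solution should not be trusted on the strength of a single run.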
|
In reply to this post by Guerrero, Rodrigo
Guerrero, Rodrigo wrote:
> I was wondering if it is appropriate to include rating scale data (1
> to 7) of attitudes with other types of data such as practice size,
> physician age, and other practice descriptors in a k-means clustering
> procedure. I am not sure if you can mix data types.

This is a commonly asked question in many statistical procedures. There is no consensus in the research community about this, so you have to be prepared for a peer-reviewer to complain, no matter which approach you take.

PASW Statistics/SPSS will let you include an ordinal scale variable in k-means clustering, and there are several reasons why you would want to do this.

First, k-means is a descriptive procedure rather than an inferential procedure. You can't screw up something like the Type I error rate, because there is no null hypothesis that you can mistakenly reject.

Second, k-means does not have a distributional requirement for the input variables. You can't blithely ignore things like extreme outliers, but the skewed pattern for much ordinal data caused by data piling up at one of the extremes is no more an issue than skewed data from a ratio scale measurement.

Third, the quality of the clusters produced is likely to be better when you include more information. You could examine this by clustering with and without your ordinal variable. Keep in mind that most measures of the value of the information produced by a cluster analysis are rather subjective in nature.

There are "purists" who will point out that ordinal data can never satisfy certain assumptions that would, for example, make the mean a meaningful measure. If a mean is meaningless, then k-means clustering is also meaningless.

I am an "impurist" (pragmatist would be a more flattering term). I find that the mean for ordinal variables usually behaves reasonably well and provides almost as good a summary as measures like the median, which are well defined even for ordinal data.

The question that distinguishes "purists" from "pragmatists" is whether you believe in grade point averages. I like them, but they do make the rather questionable assumption that a student with an A and an F is comparable to a student with a B and a D, and to a student with two Cs.

My new website has discussion of a similar question in the context of ANOVA at

http://www.pmean.com/08/LikertSum.html

I hope this helps.

--
Steve Simon, Standard Disclaimer
"The first three steps in a descriptive data analysis, with examples in PASW/SPSS"
Thursday, January 21, 2010, 11am-noon, CST. Details at www.pmean.com/webinars |
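
The grade point average example in the post above can be worked through in a few lines of Python. This is only a sketch: the 4-point coding (A=4 down to F=0) is the conventional one and is an assumption here, since the post names only the letter grades, and the variable names are made up for illustration.

```python
from statistics import mean

# Conventional 4-point coding (an assumption; the post gives only the letters).
points = {"A": 4, "B": 3, "C": 2, "D": 1, "F": 0}

students = {
    "A and F": ["A", "F"],
    "B and D": ["B", "D"],
    "two Cs": ["C", "C"],
}

# Mean grade points per student: the quantity a GPA reports.
gpas = {name: mean(points[g] for g in grades) for name, grades in students.items()}

# Gap between best and worst grade: the information the mean discards.
spreads = {name: max(points[g] for g in grades) - min(points[g] for g in grades)
           for name, grades in students.items()}

for name in students:
    print(name, "GPA:", gpas[name], "spread:", spreads[name])
```

All three students get the same average even though their grade profiles differ, which is the comparability assumption the post calls questionable. A k-means cluster center is a mean of the same kind, so it inherits the same assumption for any ordinal variable it includes.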
|
You might also consider TWOSTEP CLUSTER, which treats categorical and scale variables differently, although it makes a different set of assumptions about the variables.

Regards,
Jon Peck
SPSS, an IBM Company
[hidden email]
312-651-3435
|
|
In reply to this post by Steve Simon, P.Mean Consulting
2010/1/6 Steve Simon, P.Mean Consulting <[hidden email]>
> There are "purists" who will point out that ordinal data can never

I do not care if people call me "purist" or "practical". I believe that it is more practical to do right instead of wrong, and calculating a mean (and SD) on ordinal data is nothing but wrong.

All the best

Wilhelm (Wille) Landerholm
Queue/STATB
BOX 92
162 12 Vallingby
Sweden
+46-735-460000
http://www.qsweden.com
http://www.statb.com
QUEUE/STATB - your partner in data analysis, data modeling and data mining.
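
One argument commonly given for the position in the post above is that ordinal codes are only defined up to their order, so a summary should not change its verdict when the codes are respaced. The Python sketch below illustrates that argument; it is not taken from the post itself. The two groups and the recoding table are hypothetical, chosen so that the recoding keeps the order of the seven categories but changes the distances between them.

```python
from statistics import mean, median

# Hypothetical ratings on a 1-to-7 scale for two groups.
group_p = [3, 3, 3]
group_q = [1, 2, 7]

# Hypothetical recoding: same category order, different spacing.
recode = {1: 1, 2: 2, 3: 10, 4: 11, 5: 12, 6: 13, 7: 14}


def summarize(scores):
    """Return the mean and median of a list of scores."""
    return {"mean": mean(scores), "median": median(scores)}


original = {"P": summarize(group_p), "Q": summarize(group_q)}
recoded = {
    "P": summarize([recode[s] for s in group_p]),
    "Q": summarize([recode[s] for s in group_q]),
}

print("original codes:", original)
print("recoded:", recoded)
```

With the original codes group Q has the higher mean, and after the order-preserving recoding group P does, while the comparison of medians points the same way under both codings. Whether this matters in practice for a given scale is exactly what the two sides of the thread disagree about.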
|
|
List,

I would like to thank everyone for their input on my k-means clustering data questions.

RG

Rodrigo A. Guerrero | Director Of Marketing Research and Analysis | The Scooter Store | 830.627.4317 |
