Solutions are not unique. There are many different distance/similarity coefficients, and many agglomeration (putting together) methods. Assignments of cases to clusters will differ according to the coefficient and method chosen. K-Means (Quick Cluster) clearly can show different clusters with different orders of the cases. It is usually advisable to use several random orderings of the cases. TWOSTEP can also give different results depending on case order in the file.

In the mid-70s I started using a process that I called "core clustering". I used a few similarity measures and several agglomeration methods. Cases that fell into interpretable clusters together were treated as core clusters. Other cases were coded as ungrouped. I then iteratively used the classification phase of DFA to assign the ungrouped cases to the cores. After each iteration, cases whose highest probability of group membership was still low, or that had a small probability of belonging to that cluster based on their distance from the centroid, were treated as ungrouped in the next round. When most of the cases remained in the assigned core cluster and the profiles were meaningful, that cluster assignment was used in further analysis.

These days, I first get cores from a few runs of K-Means, and then cores from a few runs of TWOSTEP. I then find cores that agree across other methods and coefficients.

Clusters are not necessarily "pure". In some circumstances a case can "belong" to more than 1 cluster. For example, in the late 60s or early 70s Lorr found a group of cases that were sort of like paranoids and sort of like schizophrenics. This led to the DSM differentiating "paranoid schizophrenics" from other kinds of schizophrenics.

Sometimes some cases just won't fit in. Also, sometimes cases just stay as singletons.
For example, in work I did at Census on clusters of counties in western states, Los Angeles County stayed a singleton, and Yellowstone National Park County stayed a singleton.

Validation of the final clustering solution is done on the basis of interpretability and on the basis of the relation of the new nominal-level variable (the cluster assignments) to variables that were not included in the clustering. For example, clusters representing different kinds of classroom environments showed different outcome profiles. As another example, counties that grouped together on poverty and housing variables showed definite patterns when placed on a choropleth map.

Art Kendall
Social Research Consultants

On 5/12/2011 6:56 AM, SoS Statistical Services wrote:
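The agreement step described above (keep as "cores" only the cases that land together across several runs) can be sketched roughly as follows. This is a minimal illustration, not the procedure from the post: it uses a plain k-means written out by hand rather than SPSS QUICK CLUSTER or TWOSTEP, it leaves out the DFA reassignment rounds entirely, and the function names `kmeans_labels` and `core_clusters`, the `min_size` cutoff, and the toy points are all assumptions made up for the example.

```python
import random
from collections import defaultdict


def kmeans_labels(points, k, seed, max_iter=100):
    """Plain Lloyd's k-means; the seed decides which cases start as centers."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assign every case to its nearest center (squared Euclidean distance).
        new_labels = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster empties out
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def core_clusters(runs, min_size=2):
    """Group cases that were assigned together in every run.

    runs: list of label lists, one per clustering run, all the same length.
    Returns (cores, ungrouped): cores is a list of lists of case indices,
    ungrouped is a sorted list of the remaining case indices.
    """
    by_signature = defaultdict(list)
    for case in range(len(runs[0])):
        # Label numbers are arbitrary per run, but two cases share a signature
        # exactly when they were in the same cluster in every single run.
        signature = tuple(run[case] for run in runs)
        by_signature[signature].append(case)
    cores = [cases for cases in by_signature.values() if len(cases) >= min_size]
    ungrouped = sorted(
        case for cases in by_signature.values() if len(cases) < min_size for case in cases
    )
    return cores, ungrouped


if __name__ == "__main__":
    # Hypothetical toy data: two tight groups plus one case in between.
    points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3),
              (5.0, 5.1), (5.2, 4.9), (4.9, 5.3), (2.6, 2.5)]
    runs = [kmeans_labels(points, k=2, seed=s) for s in range(5)]
    cores, ungrouped = core_clusters(runs)
    print("cores:", cores)
    print("ungrouped:", ungrouped)
```

Because the grouping works on label lists, the same `core_clusters` call could in principle be fed saved cluster-membership variables from any mix of methods, one list per run.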
In reply to this post by SoS Statistical Services
Hi Evie,
I notice that you seem to have included some demographic variables in your cluster analysis. The danger of including a demographic variable in the cluster analysis is that the clusters might not be unique. An extreme example is placing gender in a cluster analysis and ending up with 2 clusters that just contain the male and the female respondents, which might not be useful for analysis.

In an example that I carried out recently for a client, a 2-step cluster was done on a set of choices the respondents made in the survey.

Dummy example: What do you do on Sunday? (multiple-answer question)
Swimming (1-Yes, 0-No)
Play Tennis (1-Yes, 0-No)
Jogging (1-Yes, 0-No)
Listen to Music at Home (1-Yes, 0-No)
Read Books at Home (1-Yes, 0-No)

So I end up with clusters of respondents doing a cocktail of activities. Based on the 2-step example:
1st Cluster: swimming, play tennis, jogging.
2nd Cluster: listen to music at home, read books at home.

To give each cluster a meaningful title, I would rename them, say:
Cluster 1: Sporty respondents
Cluster 2: Respondents who like to stay at home

After that, to learn more about these clusters, I ran a decision tree to profile them on the demographics.

I do understand that, lacking knowledge of your project objectives, it is hasty of me to advise you this way, but I hope that my advice can help you.

Warm Regards
Dorraj Oet

Date: Thu, 12 May 2011 11:56:52 +0100
From: [hidden email]
Subject: Re: cluster analysis methods
To: [hidden email]
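The naming and profiling steps in the dummy example can be sketched as below. This is only an illustration under stated assumptions: the cluster memberships are taken as already saved from an earlier clustering run, the respondent rows and the `age_band` variable are invented, the function names are hypothetical, and a simple cross-tabulation stands in for the decision tree mentioned in the post.

```python
from collections import defaultdict

ACTIVITIES = ["swimming", "tennis", "jogging", "music_at_home", "books_at_home"]


def endorsement_rates(rows, cluster_ids):
    """Share of 'yes' (1) answers per activity within each cluster."""
    counts = defaultdict(lambda: [0] * len(ACTIVITIES))
    sizes = defaultdict(int)
    for row, cid in zip(rows, cluster_ids):
        sizes[cid] += 1
        for i, answer in enumerate(row):
            counts[cid][i] += answer
    return {cid: [c / sizes[cid] for c in counts[cid]] for cid in sizes}


def describe(rates, threshold=0.5):
    """List the activities chosen by more than `threshold` of each cluster."""
    return {
        cid: [name for name, rate in zip(ACTIVITIES, cluster_rates) if rate > threshold]
        for cid, cluster_rates in rates.items()
    }


def crosstab(cluster_ids, demographic):
    """Count cases per (cluster, demographic category) pair."""
    table = defaultdict(lambda: defaultdict(int))
    for cid, category in zip(cluster_ids, demographic):
        table[cid][category] += 1
    return {cid: dict(categories) for cid, categories in table.items()}


# Hypothetical respondents: one 0/1 answer per activity, in ACTIVITIES order.
rows = [
    (1, 1, 1, 0, 0),
    (1, 0, 1, 0, 0),
    (1, 1, 0, 0, 1),
    (0, 0, 0, 1, 1),
    (0, 0, 1, 1, 1),
    (0, 0, 0, 1, 0),
]
cluster_ids = [1, 1, 1, 2, 2, 2]  # assumed output of a prior clustering run
age_band = ["18-34", "18-34", "35-54", "35-54", "55+", "55+"]  # made-up demographic

if __name__ == "__main__":
    print(describe(endorsement_rates(rows, cluster_ids)))
    print(crosstab(cluster_ids, age_band))
```

The demographic variable is used only after the clusters exist, which keeps it out of the clustering itself while still showing who falls into each group.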
If you are going to use categorical demographic variables in your cluster analysis, I believe you have to use the 2-step option, not hierarchical. And you have to use the log-likelihood distance measure within 2-step. This is my understanding.

Thanks
matt

Matthew Pirritano, Ph.D.
Research Analyst IV
Medical Services Initiative (MSI)
Orange County Health Care Agency
(714) 568-5648

From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of SoS Statistical Services