Dear all,
Does anyone know how to determine the optimal number of clusters in Statistics 17.0? I have been using the k-means and hierarchical methods for cluster analysis in Statistics 17.0. As far as I know, the k-means method can use both continuous and categorical variables, while the hierarchical method can only use continuous variables. In addition, the k-means method requires a predetermined value of k before the clustering process. I have results from several scenarios run in SPSS, but I have no basis for deciding on the optimal number of clusters.
Thank you for your attention; I would highly appreciate your help.
Regards,
Bernardus Wahyuputro
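[Editor's note: one common heuristic for choosing k in a k-means-style analysis is the "elbow": run the clustering for several values of k and look for the point where the within-cluster sum of squares (WSS) stops dropping sharply. A minimal sketch in Python rather than SPSS syntax, with made-up toy data and a plain Lloyd's-algorithm k-means:]

```python
import random

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's algorithm on 1-D data; returns (centroids, WSS)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda c: (p - centroids[c]) ** 2)
            clusters[i].append(p)
        # Move each centroid to the mean of its cluster (keep it if empty).
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    wss = sum(min((p - c) ** 2 for c in centroids) for p in points)
    return centroids, wss

# Toy data with two obvious groups, around 1 and around 10.
data = [1.0, 1.1, 0.9, 1.2, 10.0, 10.1, 9.9, 10.2]
for k in (1, 2, 3):
    _, wss = kmeans(data, k)
    # WSS drops sharply going from k=1 to k=2 (the "elbow"),
    # and only marginally after that.
    print(k, round(wss, 3))
```

In SPSS itself the analogous (manual) approach is to run k-means for several values of k and compare the within-cluster variation across solutions by hand.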
Dear Bernardus,
The only clustering method in SPSS that can handle continuous and categorical variables simultaneously is two-step clustering. Although I personally prefer Latent Class Cluster analysis, I believe two-step clustering provides goodness-of-fit indices (AIC and BIC) for comparison between models. However, I would not recommend relying on a statistical criterion alone. (How) can you describe your clusters? How many clusters do you propose, and how large are they? Does the overall picture make sense? Is the result actionable for you or your client? I find the answers to such questions more important than the AIC.

K-means clustering is based on the concept of 'distances' between respondents (usually squared Euclidean distances). Such 'distances' make sense only for metric variables. For ordered Likert scales, you must be willing to assume equal distances between answer categories if you want to use them for k-means clustering.

In hierarchical clustering, all variables must be binary, counts, or metric, but no mixture of these is allowed. It does, however, provide you with a 'dendrogram', which suggests a (statistical) optimum for the number of clusters.

All three clustering methods in SPSS depend on the (arbitrary) order of cases, so it is often recommended that you compare solutions obtained from several different random orderings of the cases.

HTH,
Ruben van den Berg
Consultant Models & Methods
TNS NIPO
Email: [hidden email]
Mobiel: +31 6 24641435
Telefoon: +31 20 522 5738
Internet: www.tns-nipo.com
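[Editor's note: Ruben's point about distances can be made concrete. Squared Euclidean distance is only meaningful if the numeric codes reflect real spacing between answer categories; change the assumed spacing and the 'distance' between two respondents changes. A small Python sketch with toy answer vectors and a hypothetical unequal spacing:]

```python
def sq_euclidean(a, b):
    """Squared Euclidean distance between two respondents' answer vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Two respondents on three 5-point Likert items.
resp_a = [1, 3, 5]
resp_b = [2, 4, 4]

# Treating codes 1..5 as equally spaced (the assumption k-means needs):
print(sq_euclidean(resp_a, resp_b))  # 1 + 1 + 1 = 3

# Under a different (hypothetical) spacing, e.g. putting "strongly agree"
# further from "agree", the same answers yield a different distance:
spacing = {1: 0.0, 2: 1.0, 3: 2.0, 4: 3.0, 5: 5.0}
a = [spacing[x] for x in resp_a]
b = [spacing[x] for x in resp_b]
print(sq_euclidean(a, b))  # 1 + 1 + 4 = 6
```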
In reply to this post by Bernardus Wahyuputro
Mbaye,
Maybe this, from Ruben van den Berg, will help? (I've forwarded his posting.)
It's a good idea to scan the archives... and, with enough search words, Google.
bw,
Martin
PS Analysing clusters can be done using Discriminant Analysis. When you're looking at only two clusters, I believe the results of Discriminant Analysis are mirrored by those of a logistic regression.
----- Forwarded Message ----
From: Ruben van den Berg <[hidden email]>
To: [hidden email]
Sent: Wednesday, 2 June, 2010 9:23:03
Subject: Re: Cluster analysis
[Ruben's reply, quoted in full above]
Hi Martin.
Thanks. I had already read Ruben's post, but I was looking for references (theoretical papers) to provide theoretical support for my analysis. I performed a hierarchical cluster analysis and found four clusters in my data. The four clusters make sense for the purpose of my research. I also ran a Discriminant Analysis to validate the clusters found in the hierarchical cluster analysis; the four clusters hold up well.
Mbaye
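[Editor's note: for readers wondering how a dendrogram "suggests" a number of clusters such as Mbaye's four: the agglomeration schedule records the distance at which each merge happens, and a large jump in those merge heights flags a natural cut point. A toy single-linkage sketch in Python on made-up 1-D data — not what SPSS runs internally, but the same idea:]

```python
def single_linkage_merges(points):
    """Agglomerative clustering with single linkage on 1-D data.
    Returns the sequence of merge heights (the dendrogram's y-axis)."""
    clusters = [[p] for p in points]
    heights = []
    while len(clusters) > 1:
        best = None
        # Find the pair of clusters with the smallest single-linkage distance.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        heights.append(d)
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return heights

# Four well-separated toy groups.
data = [1.0, 1.2, 5.0, 5.3, 9.0, 9.1, 13.0, 13.4]
h = single_linkage_merges(data)
# Four small merges, then a big jump: the jump suggests cutting
# the dendrogram at four clusters.
print([round(x, 2) for x in h])  # → [0.1, 0.2, 0.3, 0.4, 3.7, 3.8, 3.9]
```

In SPSS output, the same information appears in the agglomeration schedule of the hierarchical CLUSTER procedure: look for the stage where the coefficient jumps.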
In reply to this post by Bernardus Wahyuputro
Martin Bland to the rescue, again! The page below cites about ten articles, most available with a double-click (more on the design side):
http://www-users.york.ac.uk/~mb55/clust/clustud.htm
HTH,
Martin Holt
In reply to this post by Mbaye Fall Diallo
Dear Mbaye,
I think the 'bible' on cluster analysis is 'Cluster Analysis' by Brian Everitt (at least, that's what I was told at university). However, I suspect it won't be helpful here, since it basically just explains how the techniques work (in readable language). You may be better off with more managerial literature (e.g. Philip Kotler).

Best regards,
Ruben van den Berg
Consultant Models & Methods
TNS NIPO
Email: [hidden email]
Mobiel: +31 6 24641435
Telefoon: +31 20 522 5738
Internet: www.tns-nipo.com
In reply to this post by Martin Holt
Just curious: does anyone know whether SPSS has plans to develop the capacity to correct fit and test statistics in order to accommodate multilevel Cox regression models?
Thanks, John
