Hi Aaron,
In answer to your first question: yes, the matrix represents the mean within-cluster values of each cluster variate.

The reason for running on a small sample is time and memory. Hierarchical cluster analysis works from the full case-by-case distance matrix, so anything over about 1000 cases will take a long time to run. Before the TwoStep cluster analysis procedure was available, the technique you're trying to use was a way to get potentially better solutions to clustering problems. Remember, as Hector said yesterday: "K-means uses only Euclidean distances, whereas Hierarchical Clustering uses a full array of distance measures. A solution that seems adequate with some fancy distance function may lead to nonsense, or at least to some surprising results, when applied to K-means with Euclidean distances." So if your dataset has, say, 100,000 cases, run the hierarchical analysis on 1000 of them to form the cluster centres, and then use these as the starting point for the quicker k-means method.

For your second point, there is no need to run the analysis twice.

Remember: "These steps allow you to do your clustering analysis, although any results that you generate are for your interpretation, use them at your own risk!"

HTH

Mike

-----Original Message-----
From: Aaron Eakman [mailto:[hidden email]]
Sent: 09 August 2006 08:37
To: [hidden email]; Michael Pearmain
Cc: Aaron Eakman
Subject: Re: Cluster Analysis - Seeds needed for K-Means

Mike,

Would my cluster .sav file have as row labels 1, 2, 3, given that I had identified a three-cluster solution in the hierarchical approach, and would the column labels be cluster_, var1, var2, ... varX (with "varX" representing my cluster variates)? If so, would the values in the matrix that I submit to the K-means approach be the mean (average) within-cluster values of the varX cluster variates derived from my hierarchical approach? As an FYI, my cluster variates are all on the same ratio scale.
Finally, (1) why would I run the hierarchical approach on a small sample of my total sample rather than on the total sample? And (2) why would I need to run the K-means twice rather than just once?

Thanks much for your reply,
Aaron

On Tue, 8 Aug 2006 09:21:17 +0100, Michael Pearmain <[hidden email]> wrote:

>Morning Aaron,
>
>Try the following steps
>
>* Steps:
>1. Run a Hierarchical Cluster analysis on a small sample
>2. Choose a solution
>3. Aggregate the variables used in the Cluster Analysis according
>to the cluster variable
>
>**Give the variables in the aggregated file the same names as the
>original variables
>
>4. Name the first variable 'cluster_' in the aggregated file
>5. The aggregated file will be used as the initial centres in the
>K-Means procedure
>6. Use the aggregated file as the centres when running K-means on
>the whole data set
>* Clustering new cases using a previous cluster analysis
>o Save the final centre points.
>o Use them as centres for the new file
>o Choose as method: classify only
>HTH
>
>Mike
>
>
>-----Original Message-----
>From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf
>Of Aaron Eakman
>Sent: 07 August 2006 18:50
>To: [hidden email]
>Subject: Cluster Analysis - Seeds needed for K-Means
>
>I am using SPSS 12 for my clustering procedures. I started with
>hierarchical clustering using Ward's method with squared Euclidean
>distance. I have identified a three-cluster solution as the best
>option from a possible range of 2-4 that I established a priori.
>
>Here is my problem: I want to next run a K-means clustering procedure.
>More specifically, I want to use the centroids of the three clusters
>from my hierarchical procedure as "seed" or starting values for the
>K-means clustering procedure. Unfortunately, SPSS does not generate
>this output from the hierarchical procedure.
>And I do not know 1) how
>to generate cluster centroids from the cluster assignment information
>provided by the SPSS hierarchical procedure, and 2) even if I did, I do not
>know how to generate an SPSS .sav file with that information for use by
>the K-means approach. A further problem: I am a point-and-clicker and
>not savvy with command syntax; I AM WILLING TO LEARN IF IT CAN GET ME
>OUT OF MY MESS!!
>
>Anyone who is savvy with SPSS cluster analysis, or who knows others
>who might lend a hand, would be met with gratitude for any assistance.
>
>Take care,
>
>Aaron Eakman
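For readers who want to see the arithmetic behind Mike's recipe outside SPSS, here is a minimal Python sketch (standard library only) of the same idea: Ward clustering with squared Euclidean distance on a small random subsample (steps 1-2), the within-cluster mean of each variate as the seed matrix, one row per cluster_ value (steps 3-5), k-means on the whole data set started from those seeds (step 6), and a "classify only" pass for new cases. This is not SPSS's implementation. All function names are hypothetical, the data are made up, the Ward step is a naive roughly O(n^3) loop that is only practical for small samples (which is exactly why the subsample is used), and the k-means is plain batch (Lloyd) updating, which may not match QUICK CLUSTER's update rules exactly.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two cases."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_of(points):
    """Column means of a list of cases (one 'aggregate' row)."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def ward_clusters(data, k):
    """Naive Ward agglomeration: repeatedly merge the pair of clusters whose
    merge adds the least within-cluster sum of squares, until k remain.
    Returns a 1-based cluster label per case. Roughly O(n^3): small samples only."""
    clusters = [[i] for i in range(len(data))]
    centres = [list(p) for p in data]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                na, nb = len(clusters[a]), len(clusters[b])
                # Increase in within-cluster SS if clusters a and b are merged.
                cost = na * nb / (na + nb) * sq_dist(centres[a], centres[b])
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        members = clusters[a] + clusters[b]
        del clusters[b], centres[b]  # b > a, so index a is unaffected
        clusters[a] = members
        centres[a] = mean_of([data[i] for i in members])
    labels = [0] * len(data)
    for c, members in enumerate(clusters, start=1):
        for i in members:
            labels[i] = c
    return labels


def aggregate_centres(data, labels, k):
    """Within-cluster mean of every variate; row c-1 corresponds to cluster_ = c."""
    return [mean_of([p for p, lab in zip(data, labels) if lab == c])
            for c in range(1, k + 1)]


def nearest(point, centres):
    """1-based label of the closest centre by Euclidean distance."""
    return min(range(len(centres)), key=lambda j: sq_dist(point, centres[j])) + 1


def kmeans(data, centres, max_iter=100):
    """Batch (Lloyd) k-means starting from the supplied centres."""
    centres = [list(c) for c in centres]
    labels = None
    for _ in range(max_iter):
        new_labels = [nearest(p, centres) for p in data]
        if new_labels == labels:  # no case changed cluster: converged
            break
        labels = new_labels
        for j in range(len(centres)):
            members = [p for p, lab in zip(data, labels) if lab == j + 1]
            if members:  # keep the previous centre if a cluster empties
                centres[j] = mean_of(members)
    return labels, centres


def classify_only(new_data, final_centres):
    """Assign new cases to saved centres without updating the centres."""
    return [nearest(p, final_centres) for p in new_data]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical data: three well-separated groups on two ratio-scaled variates.
    full = [[rng.gauss(mx, 1.0), rng.gauss(my, 1.0)]
            for mx, my in [(0, 0), (10, 0), (0, 10)] for _ in range(300)]
    sample = rng.sample(full, 60)                        # step 1: small subsample
    sample_labels = ward_clusters(sample, 3)             # step 2: 3-cluster solution
    seeds = aggregate_centres(sample, sample_labels, 3)  # steps 3-5: centres file
    labels, final_centres = kmeans(full, seeds)          # step 6: whole data set
    print("final centres:", final_centres)
    print("new case ->", classify_only([[0.5, 9.5]], final_centres))
```

Note that k-means is run once here: the Ward centroids are only its starting point, and the single seeded run refines them over the full data.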