I'm trying TwoStep Cluster in version 13. The documentation states that it will select "between 1 and the maximum" number of clusters using the Information Criterion. I assume it is looking for a maximum AIC/BIC, so I find this output (apologies that column headings don't line up with values) inconsistent with it reporting two clusters for this dataset. This seems so obvious that I'm probably overlooking the obvious.
Auto-Clustering

Number of   Akaike's Information   AIC          Ratio of AIC   Ratio of Distance
Clusters    Criterion (AIC)        Change(a)    Changes(b)     Measures(c)
 1          42679.177
 2          29736.437              -12942.740   1.000          3.076
 3          25584.939               -4151.498    .321          1.444
 4          22736.637               -2848.302    .220          1.751
 5          21145.674               -1590.963    .123          1.008
 6          19567.475               -1578.199    .122          1.515
 7          18554.435               -1013.040    .078          1.257
 8          17765.615                -788.819    .061          1.034
 9          17005.491                -760.125    .059          1.151
10          16356.395                -649.095    .050          1.150
11          15802.853                -553.542    .043          1.305
12          15398.294                -404.559    .031          1.167
13          15063.487                -334.807    .026          1.095
14          14765.135                -298.352    .023          1.084
15          14496.451                -268.684    .021          1.225

a The changes are from the previous number of clusters in the table.
b The ratios of changes are relative to the change for the two cluster solution.
c The ratios of distance measures are based on the current number of clusters against the previous number of clusters.

If there is in fact only one cluster, this may explain why the next output lists all the variables in their name order, though the command description states they "will be sorted by the importance rank of each variable."

The documentation for this command is very sparse, so I'd appreciate feedback from other users. The menu generates syntax for "twostep cluster" followed by "aim" but I am unable to trace any documentation for AIM.

Allan
In general, when you use a criterion such as AIC or BIC, you select the model that MINIMIZES the criterion. In your reported data, AIC gets smaller and smaller as the number of clusters increases, up to the maximum of 15 clusters.

TwoStep Cluster also calculates two other columns that you report: the "ratio of AIC changes" and the "ratio of distance measures." The ratio of AIC changes sets the AIC change from K=1 to K=2 as 1, and then scales the other AIC changes relative to this one. The ratio of distance measures is the ratio of the distance measure in a given step to the distance measure in the previous step.

TwoStep looks at these measures to choose K. It is argued that a jump in either measure between two consecutive Ks suggests a tentative number of clusters. While this is given as a rationale for the choice of K, some research shows that this model selection approach doesn't always work, in particular when there is a mix of continuous and categorical basis variables, and when there is no underlying cluster structure.

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Allan Reese (Cefas)
Sent: Thursday, July 27, 2006 10:14 AM
To: [hidden email]
Subject: Two-step cluster
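For anyone who wants to check the arithmetic behind the "AIC Change" and "Ratio of AIC Changes" columns, here is a minimal Python sketch that recomputes them from the AIC values posted in the question. The variable names are my own, and the last two lines only illustrate the two readings discussed in this thread (smallest AIC versus largest jump in the ratio of distance measures); this is not claimed to be the exact rule TwoStep applies internally.

```python
# AIC for K = 1..15, copied from the Auto-Clustering table in the question.
aic = [42679.177, 29736.437, 25584.939, 22736.637, 21145.674,
       19567.475, 18554.435, 17765.615, 17005.491, 16356.395,
       15802.853, 15398.294, 15063.487, 14765.135, 14496.451]

# Ratio of distance measures for K = 2..15, copied from the same table.
# (The underlying distances are not in the output, so these cannot be recomputed.)
dist_ratio = [3.076, 1.444, 1.751, 1.008, 1.515, 1.257, 1.034,
              1.151, 1.150, 1.305, 1.167, 1.095, 1.084, 1.225]

# Footnote a: change in AIC from the previous number of clusters.
aic_change = {k: aic[k - 1] - aic[k - 2] for k in range(2, len(aic) + 1)}

# Footnote b: each change relative to the change for the two-cluster solution.
aic_change_ratio = {k: aic_change[k] / aic_change[2] for k in aic_change}

for k in sorted(aic_change):
    print(f"{k:2d}  {aic[k - 1]:10.3f}  {aic_change[k]:11.3f}  "
          f"{aic_change_ratio[k]:5.3f}  {dist_ratio[k - 2]:5.3f}")

# Reading 1: pick the K with the smallest AIC.
k_min_aic = min(range(1, len(aic) + 1), key=lambda k: aic[k - 1])

# Reading 2: pick the K with the largest jump in the ratio of distance measures.
k_largest_jump = max(range(2, len(aic) + 1), key=lambda k: dist_ratio[k - 2])

print("K with smallest AIC:", k_min_aic)
print("K with largest distance-measure ratio:", k_largest_jump)
```

On these numbers the smallest AIC is at the maximum K, while the largest distance-measure ratio is at K=2, which at least agrees with the two-cluster solution that was reported.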