Two-step cluster

Allan Reese (Cefas)
I'm trying TwoStep Cluster in version 13. The documentation states that it will select "between 1 and the maximum" number of clusters using the information criterion. I assume it is looking for a maximum AIC/BIC, so I find this output (apologies that the column headings don't line up with the values) inconsistent with its reporting two clusters for this dataset. This seems so obvious that I'm probably overlooking something simple.

Auto-Clustering
Number of   Akaike's Information   AIC         Ratio of AIC   Ratio of Distance
Clusters    Criterion (AIC)        Change(a)   Changes(b)     Measures(c)
1           42679.177
2           29736.437              -12942.740  1.000          3.076
3           25584.939              -4151.498   .321           1.444
4           22736.637              -2848.302   .220           1.751
5           21145.674              -1590.963   .123           1.008
6           19567.475              -1578.199   .122           1.515
7           18554.435              -1013.040   .078           1.257
8           17765.615              -788.819    .061           1.034
9           17005.491              -760.125    .059           1.151
10          16356.395              -649.095    .050           1.150
11          15802.853              -553.542    .043           1.305
12          15398.294              -404.559    .031           1.167
13          15063.487              -334.807    .026           1.095
14          14765.135              -298.352    .023           1.084
15          14496.451              -268.684    .021           1.225
a  The changes are from the previous number of clusters in the table.
b  The ratios of changes are relative to the change for the two-cluster solution.
c  The ratios of distance measures are based on the current number of clusters against the previous number of clusters.
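The footnoted columns (a) and (b) can be reproduced from the AIC column alone, which is a quick way to check how they are defined. This is a minimal sketch that simply re-does the arithmetic described in the footnotes; the variable names are illustrative, and column (c) is not recomputed because the underlying distance measures are not shown in the output.

```python
# AIC for K = 1..15, copied from the Auto-Clustering table.
aic = [42679.177, 29736.437, 25584.939, 22736.637, 21145.674,
       19567.475, 18554.435, 17765.615, 17005.491, 16356.395,
       15802.853, 15398.294, 15063.487, 14765.135, 14496.451]

# Footnote a: change in AIC from the previous number of clusters (K = 2..15).
aic_change = [aic[i] - aic[i - 1] for i in range(1, len(aic))]

# Footnote b: each change divided by the change for the two-cluster solution.
ratio_of_changes = [c / aic_change[0] for c in aic_change]

for k, (change, ratio) in enumerate(zip(aic_change, ratio_of_changes), start=2):
    print(f"K={k:2d}  AIC change={change:11.3f}  ratio={ratio:.3f}")
```

Rounded to three decimals, the results match the AIC Change and Ratio of AIC Changes columns above (for example -4151.498 and .321 at K=3).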

If there is in fact only one cluster, this may explain why the next output lists all the variables in their name order, though the command description states they "will be sorted by the importance rank of each variable."

The documentation for this command is very sparse, so I'd appreciate feedback from other users.  The menu generates syntax for "twostep cluster" followed by "aim" but I am unable to trace any documentation for AIM.

Allan



***********************************************************************************
This email and any attachments are intended for the named recipient only.  Its unauthorised use, distribution, disclosure, storage or copying is not permitted.  If you have received it in error, please destroy all copies and notify the sender.  In messages of a non-business nature, the views and opinions expressed are the author's own and do not necessarily reflect those of the organisation from which it is sent.  All emails may be subject to monitoring.
***********************************************************************************
Re: Two-step cluster

Anthony Babinec
In general, when you use a criterion such as AIC or BIC, you select the
model that MINIMIZES the criterion. In your reported data, AIC gets smaller
and smaller as the number of clusters increases, up to the maximum of 15
clusters, so the minimum-AIC rule by itself would point to 15 clusters, not 2.
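The minimum-criterion rule can be applied directly to the AIC column from Allan's table. This is a minimal sketch of plain information-criterion model selection, not of SPSS's internal auto-clustering procedure; the dictionary simply copies the reported values.

```python
# AIC by number of clusters K, copied from the Auto-Clustering table.
aic = {1: 42679.177, 2: 29736.437, 3: 25584.939, 4: 22736.637, 5: 21145.674,
       6: 19567.475, 7: 18554.435, 8: 17765.615, 9: 17005.491, 10: 16356.395,
       11: 15802.853, 12: 15398.294, 13: 15063.487, 14: 14765.135, 15: 14496.451}

# AIC falls at every step, so the minimum sits at the largest K tried.
strictly_decreasing = all(aic[k] < aic[k - 1] for k in range(2, 16))

# Model selection by an information criterion: pick the K that MINIMIZES it.
best_k = min(aic, key=aic.get)
print(strictly_decreasing, best_k)  # prints: True 15
```

Picking the maximum instead, as assumed in the original question, would select K=1, which is also not what TwoStep reported.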

TwoStep Cluster also calculates the two other columns that you report: the
"ratio of AIC changes" and the "ratio of distance measures." The ratio of AIC
changes sets the AIC change from K=1 to K=2 as 1, and then scales the other
AIC changes relative to this one. The ratio of distance measures is the ratio
of the distance measure in a given step to the distance measure in the
previous step. TwoStep looks at these measures to choose K: it is argued that
a jump in either measure between two consecutive Ks suggests a tentative
number of clusters. While this is given as a rationale for the choice of K,
some research shows that this model selection approach doesn't always work,
in particular when there is a mix of continuous and categorical basis
variables, and when there is no underlying cluster structure.
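To see why two clusters might come out of this table, one can rank the K values by the size of the jump in the ratio of distance measures. This is a simplified sketch assuming a "largest jump wins" rule; it is not SPSS's full selection algorithm, which the reply does not spell out and which also involves the AIC/BIC changes.

```python
# Ratio of distance measures (footnote c) by K, copied from the table.
dist_ratio = {2: 3.076, 3: 1.444, 4: 1.751, 5: 1.008, 6: 1.515,
              7: 1.257, 8: 1.034, 9: 1.151, 10: 1.150, 11: 1.305,
              12: 1.167, 13: 1.095, 14: 1.084, 15: 1.225}

# Simplified "jump" heuristic (an assumption, not SPSS's exact rule):
# rank candidate K values by how large the ratio is at that step.
ranked = sorted(dist_ratio, key=dist_ratio.get, reverse=True)
k_jump = ranked[0]
print(k_jump, ranked[:3])  # prints: 2 [2, 4, 6]
```

The largest jump (3.076) falls at K=2, consistent with the two clusters TwoStep reported, while 4 and 6 show up as weaker tentative alternatives.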

-----Original Message-----
From: SPSSX(r) Discussion, on behalf of Allan Reese (Cefas)
Sent: Thursday, July 27, 2006 10:14 AM
Subject: Two-step cluster
