selecting the best cluster solution

selecting the best cluster solution

Juanito Talili
Hi everyone,
 
The data, with 400 cases and 9 yes/no variables, were subjected to cluster analysis.  Using hierarchical cluster analysis in SPSS, a range of solutions (2 to 6) was saved.  What should I look at to select the best solution?
 
Thank you. 


=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Re: selecting the best cluster solution

Keith McCormick
Hello Juanito,

It is difficult to answer this question out of context, but there are some
things to think about:

1) There are three analyses that you need to do after every tentative
solution: a) look at the size of each cluster; b) look at how the 9 yes/no
variables relate to the solution; c) look at how other variables relate to
the clusters (usually done with crosstabs).

2) If any of the solutions have clusters with tiny sample sizes (maybe even
N=1) then you probably have either too many clusters or you should consider
another distance method.

3) If you can come up with descriptive "nicknames" for each of the clusters,
you probably have the right number. If two clusters look very much the same,
you might have too many; if any cluster is both large and looks too much
like the general population, you might have too few. Each cluster
should have its own "personality".

4) If other variables (other than the 9) relate strongly to the clusters you
probably have a solution that has merit.
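
As a rough illustration of checks 1) to 4), here is a minimal Python sketch rather than SPSS syntax. Everything in it is an assumption made for the example: the data are synthetic stand-ins for the 400 cases and 9 yes/no items, the extra variable `other` is made up, and the Hamming distance, average linkage and the `MIN_SIZE` threshold are arbitrary choices, not a recommendation.

```python
import numpy as np
import pandas as pd
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist
from scipy.stats import chi2_contingency

rng = np.random.default_rng(0)

# Synthetic stand-in for the real data: 400 cases, 9 yes/no items (1 = yes).
# Three made-up groups with different "yes" probabilities per item.
yes_prob = np.array([
    [0.9, 0.9, 0.9, 0.1, 0.1, 0.1, 0.5, 0.5, 0.5],
    [0.1, 0.1, 0.1, 0.9, 0.9, 0.9, 0.5, 0.5, 0.5],
    [0.5, 0.5, 0.5, 0.5, 0.5, 0.5, 0.9, 0.9, 0.9],
])
group = rng.integers(0, 3, size=400)
items = [f"q{i}" for i in range(1, 10)]
data = pd.DataFrame((rng.random((400, 9)) < yes_prob[group]).astype(int),
                    columns=items)

# Hypothetical extra yes/no variable that was NOT used to form the clusters.
other = pd.Series((rng.random(400) < np.array([0.8, 0.2, 0.5])[group]).astype(int),
                  name="other")

# Build one tree and cut it at 2..6 clusters. Hamming (simple matching)
# distance and average linkage are illustrative choices only.
tree = linkage(pdist(data.to_numpy(), metric="hamming"), method="average")

MIN_SIZE = 20  # arbitrary threshold for calling a cluster "tiny"


def review(k):
    """Collect the checks from points 1) to 4) for the k-cluster cut."""
    labels = pd.Series(fcluster(tree, t=k, criterion="maxclust"),
                       index=data.index, name="cluster")
    sizes = labels.value_counts().sort_index()       # 1a) cluster sizes
    profiles = data.groupby(labels).mean()           # 1b) share of "yes" per item
    table = pd.crosstab(labels, other)               # 1c) crosstab with another variable
    chi2, p_value, dof, _ = chi2_contingency(table)  # 4) strength of that relation
    # 3) largest gap between a cluster's profile and the overall "yes" rates;
    #    a large cluster with a small gap resembles the general population.
    gap = profiles.sub(data.mean(), axis="columns").abs().max(axis=1)
    return {
        "sizes": sizes,
        "tiny": sizes[sizes < MIN_SIZE].index.tolist(),  # 2) very small clusters
        "profiles": profiles,
        "gap_from_overall": gap,
        "chi2": chi2,
        "p_value": p_value,
    }


results = {k: review(k) for k in range(2, 7)}

for k, r in results.items():
    print(f"k={k} sizes={r['sizes'].tolist()} tiny={r['tiny']} "
          f"chi2={r['chi2']:.1f} p={r['p_value']:.3g}")
```

Reading `results[k]["profiles"]` side by side for each k is where the "nicknames" idea comes in: the numbers only show which items separate the clusters, and the naming is still a judgment call.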

I hope that helps.

Keith
www.keithmccormick.com
On Wed, Oct 8, 2008 at 4:24 AM, Juanito Talili <[hidden email]> wrote:

> Hi everyone,
>
> The data with 400 cases and 9 yes/no variables were subjected to cluster
> analysis.  Using hierarchical cluster analysis in spss, a range of solutions
> (2 to 6) were save.  What to look at to select the better solution.
>
> Thank you.


Re: selecting the best cluster solution

Keith McCormick
Hi Juanito,

Regarding the context, I simply meant I didn't know what the 9 questions
were about, nor did I know for sure if you had any variables beyond the 9.

Solutions found when using exploratory techniques like cluster analysis are
best judged by how well they help you achieve a purpose. For instance, the
purpose of the clusters could be to create separate marketing campaigns for
each cluster. One might conclude that 6 was too high (too expensive), even
if it seemed to fit the data a little better than other solutions. That
conclusion, while reasonable, might draw as much on costs as on the cluster
analysis itself.

Although hypothesis testing involves somewhat subjective decisions (like
setting alpha), it doesn't compare to cluster analysis in this regard.
Cluster analysis involves a great deal more subjectivity. There is no
optimal cluster analysis, per se. Experts could likely agree that a
particular solution was poor, but it would be much more difficult to get a
room full of analysts/data miners to agree that a particular clustering
solution was optimal - even when the specific purpose of the segmentation
is known.

Nonetheless, since you know the purpose of the analysis, your knowledge of
the data should allow you to pick a good solution that will accomplish your
purpose. General principles, like the ones I posted, really only make the
search more efficient.

In any case, I hope my suggestions, while general, were helpful.

Keith
www.keithmccormick.com

On Fri, Oct 17, 2008 at 10:32 PM, Juanito Talili <[hidden email]> wrote:

> Juanito wrote:
> > The data with 400 cases and 9 yes/no variables were subjected to cluster
> > analysis.  Using hierarchical cluster analysis in spss, a range of solutions
> > (2 to 6) were save.  What to look at to select the better solution.
>
> Keith McCormick wrote:
> > It is difficult to answer this question out of context, but there are some
> > things to think about:
>
> Why out of context?
>
> --- On *Tue, 10/14/08, Keith McCormick <[hidden email]>* wrote:
>
> From: Keith McCormick <[hidden email]>
> Subject: Re: selecting the best cluster solution
> To: [hidden email]
> Date: Tuesday, 14 October, 2008, 12:59 PM
