SPSSX Discussion - Re: Weighted Cluster Analysis in SPSS

Re: Weighted Cluster Analysis in SPSS

Posted by Hector Maletta on
URL: http://spssx-discussion.165.s1.nabble.com/Weighted-Cluster-Analysis-in-SPSS-tp1073624p1073626.html

Two comments on Paul's reply:
1. Indeed, latent class analysis does group cases into clusters while taking the correlation of variables into account, all in one pass. But Catharine is asking how to do it in two steps: first giving differential weights to the variables, then applying ordinary cluster analysis.

2. Using regression coefficients as weights makes sense in a way. In fact, not using any weights is equivalent to using unit weights, which is a form of weighting after all. So one cannot avoid weighting the clustering variables anyway. The idea of using a criterion variable to give more weight to variables more closely related to it (or having more weight in a predictive equation) makes sense to me at first glance. The finer point, which I have not yet thought through, is whether regression assumptions (such as homogeneity of variance, etc.) have any influence on the results, especially for cases well away from the mean of the predictor variables. My intuitive fear is that coefficients for predictors with lower significance will have wider confidence intervals, so small errors in the independent variables may entail large errors in the allocation of cases to clusters, and these would be magnified for cases that are not close to the mean.

3. I should refer listers to my previous message to this thread.
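
To make points 1 and 2 concrete, here is a minimal sketch (not SPSS syntax, and not anyone's established procedure) of why the two-step approach works for a Euclidean-distance method: multiplying each variable by the square root of its weight and then computing an ordinary distance gives the same result as a weighted distance, and unit weights reproduce the unweighted distance. The scores and weights are made-up illustrative values, and taking non-negative weights (for example, absolute values of standardized coefficients) is an assumption of the sketch.

```python
import math

def weighted_sq_distance(a, b, weights):
    """Squared Euclidean distance with one weight per variable."""
    return sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights))

def scale_by_sqrt_weight(row, weights):
    """Rescale each variable by sqrt(weight); weights must be non-negative."""
    return [math.sqrt(w) * x for x, w in zip(row, weights)]

# Hypothetical standardized scores for two cases on three clustering variables.
case_a = [0.5, -1.2, 2.0]
case_b = [1.5, 0.3, -0.4]

# Hypothetical weights (assumed here: absolute standardized coefficients).
weights = [0.6, 0.3, 0.1]
unit_weights = [1.0, 1.0, 1.0]

# Weighted distance computed directly.
direct = weighted_sq_distance(case_a, case_b, weights)

# Two steps: rescale the variables first, then use an ordinary distance.
scaled_a = scale_by_sqrt_weight(case_a, weights)
scaled_b = scale_by_sqrt_weight(case_b, weights)
two_step = weighted_sq_distance(scaled_a, scaled_b, unit_weights)

# "No weights" is the same thing as unit weights.
unweighted = weighted_sq_distance(case_a, case_b, unit_weights)
plain = sum((x - y) ** 2 for x, y in zip(case_a, case_b))

print(direct, two_step, unweighted, plain)
```

Under these assumptions, any distance-based clustering routine fed the rescaled variables would behave as if it were using the weighted distance; in SPSS the rescaling step could presumably be done with COMPUTE before running the cluster procedure.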
Hector

----- Original message -----
From: Paul Dickson <[hidden email]>
Date: Tuesday, February 6, 2007 6:06 am
Subject: Re: Weighted Cluster Analysis in SPSS

> Hi Catharine,
>
> My first concern with weighting in the way you propose would be
> that importance weights from regression have little or nothing to
> do with variables and their importance that 'drive' a cluster
> analysis. If you use a regression model (OLS, the standard
> regression in SPSS), one of your assumptions is that the group is
> homogeneous (similar), and that the importance of drivers is
> uniform for the entire group. To weight your cluster analysis
> (and drive your clusters) using variables weighted by their
> regression weights on this basis is therefore totally counter-
> intuitive to me.
>
> The second issue is that the algorithm for cluster analysis (quick
> cluster) is distance based, and is based on Euclidean distance. I
> think (I use a different package) an important variable in SPSS is
> one that either minimises the Euclidean distance within groups
> (generates groups that are similar) or maximises the distance
> between groups (makes the clusters different), and you can get a
> proxy (not a great one) using ANOVA for the relative importance of
> the variables that drive the segments.
>
> I think that what you might need is latent class regression from a
> package called 'latent gold'. This program segments groups
> (recovers heterogeneity) at the same time as computing importance
> weights for each of the different segments.
>
> HTH Paul
>
>
>
> > Catharine Liddicoat <[hidden email]> wrote:
> >
> > We are performing a cluster analysis where we want to weight the
> > clustering variables.
> >
> > 1. Can this be done directly in SPSS? If so, how?
> >
> > 2. Has anyone had experience weighting a cluster analysis by the
> > standardized regression coefficients from a multiple regression model
> > used to identify the clustering variables? What are the advantages or
> > disadvantages of weighting the clustering variables this way?
> >
> > Thank you for any information provided.
> >
> > C. Liddicoat
> > California Community Colleges
> > [hidden email]
>