SPSSX Discussion

Multinomial Regression Summary Parameter Estimates

SoS Statistical Services

Re: cluster analysis methods

Thank you Paul, Arthur, Brian for your replies. Any other advice
from someone who has used cluster analysis would be appreciated.

Evie

--- On Fri, 6/5/11, SoS Statistical Services <[hidden email]> wrote:

From: SoS Statistical Services <[hidden email]>
Subject: cluster analysis methods
To: [hidden email]
Date: Friday, 6 May, 2011, 13:45

I'm doing a cluster analysis for sole parents on variables such as
employment, tenure, socioeconomic status, education, age. The
variables are a mixture of nominal and ordinal although age is
continuous. I've been using hierarchical clustering (is this
correct?) but I'm not sure which method to use - SPSS has several, e.g.
between-groups, nearest neighbour, Ward's etc. I would appreciate any
advice, thanks.

Evie

Art Kendall

Re: cluster analysis methods

Cluster analysis is an exploratory method. Its goal is to create a new nominal-level variable that will be useful.

Solutions are not unique. There are many different distance/similarity coefficients, and many agglomeration (putting together) methods. Assignments of cases to clusters will be different according to the coefficient and method chosen. K-Means (Quick Cluster) clearly can show different clusters with different orders of the cases. It is usually advisable to use several random orderings of the cases. TWOSTEP can also give different results depending on case order in the file.
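The order dependence described above can be illustrated with a minimal sketch in Python. It assumes plain Lloyd's k-means with the first k cases in the file used as initial centres; that is a simplification for illustration, not the actual initial-centre selection of QUICK CLUSTER. The function name `kmeans_partition` and the four-case data set are hypothetical.

```python
def kmeans_partition(cases, order, k, max_iter=100):
    """Run plain Lloyd's k-means on the cases in the given file order.

    cases: dict mapping case id -> tuple of coordinates.
    order: list of case ids, i.e. the order of the cases in the file.
    Assumption: the first k cases in the file serve as initial centres.
    Returns the partition as a set of frozensets of case ids.
    """
    points = [cases[i] for i in order]
    centres = [list(p) for p in points[:k]]
    labels = []
    for _ in range(max_iter):
        # Assign each case to its nearest centre (squared Euclidean distance).
        labels = [
            min(range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centres[j])))
            for p in points
        ]
        # Recompute each centre as the mean of its current members.
        new_centres = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centres.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centres.append(centres[j])  # keep the centre of an empty cluster
        if new_centres == centres:
            break  # converged
        centres = new_centres
    return {
        frozenset(i for i, lab in zip(order, labels) if lab == j)
        for j in range(k)
    }


# Hypothetical data: four cases at the corners of a wide rectangle.
cases = {1: (0, 0), 2: (0, 1), 3: (4, 0), 4: (4, 1)}
first = kmeans_partition(cases, [1, 2, 3, 4], k=2)
second = kmeans_partition(cases, [1, 3, 2, 4], k=2)
print(first == second)  # the two case orders converge to different partitions
```

Running the same data through several random orderings and comparing the partitions gives a rough sense of how stable a solution is.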

In the mid-70s I started using a process that I called "core clustering". I used a few similarity measures and several agglomeration methods. Cases that fell into interpretable clusters together were treated as core clusters. Other cases were coded as ungrouped. I then iteratively used the classification phase of discriminant function analysis (DFA) to assign the ungrouped cases to the cores. After each iteration, cases whose highest group-membership probability was still low, or that had a small probability of belonging to that cluster based on their distance from its centroid, were treated as ungrouped in the next round. When most of the cases remained in the assigned core cluster and the profiles were meaningful, that cluster assignment was used in further analysis.

These days, I first get cores from a few runs of K-Means, and then cores from a few runs of TWOSTEP. I then find cores that agree across other methods and coefficients.
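As a rough sketch of the "cores that agree across runs" idea, the Python below groups cases that received the same combination of cluster labels in every run. It is only an approximation of the process described above: it leaves out the interpretability judgement and the iterative DFA reassignment of ungrouped cases. The function name `core_clusters`, the `min_size` threshold, and the example runs are assumptions made for illustration.

```python
from collections import defaultdict


def core_clusters(runs, min_size=2):
    """Find groups of cases that always cluster together.

    runs: list of dicts, each mapping case id -> cluster label for one run
          (any method or coefficient; labels need not match across runs).
    min_size: smallest group that counts as a core (an assumed threshold).
    Returns (cores, ungrouped), both sorted.
    """
    groups = defaultdict(list)
    for case in runs[0]:
        # Cases with an identical label signature co-cluster in every run.
        signature = tuple(run[case] for run in runs)
        groups[signature].append(case)

    cores, ungrouped = [], []
    for members in groups.values():
        if len(members) >= min_size:
            cores.append(sorted(members))
        else:
            ungrouped.extend(members)
    return sorted(cores), sorted(ungrouped)


# Hypothetical assignments from two runs on six cases.
run_a = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}
run_b = {1: 1, 2: 1, 3: 2, 4: 2, 5: 2, 6: 2}
cores, ungrouped = core_clusters([run_a, run_b])
print(cores, ungrouped)
```

Here case 3 moves between clusters across the two runs, so it is left ungrouped, while the remaining cases form two cores.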

Clusters are not necessarily "pure". In some circumstances a case can "belong" to more than one cluster. For example, in the late 60s or early 70s Lorr found a group of cases that were sort of like paranoids and sort of like schizophrenics. This led to the DSM differentiating "paranoid schizophrenics" from other kinds of schizophrenics.

Sometimes some cases just won't fit in. Also, sometimes cases just stay as singletons. For example, in work I did at Census on clusters of counties in western states, Los Angeles County stayed a singleton, and Yellowstone National Park County stayed a singleton.

Validation of the final clustering solution is done on the basis of interpretability and of the relation of the new nominal-level variable (the cluster assignments) to variables that were not included in the clustering. For example, clusters representing different kinds of classroom environments showed different outcome profiles. As another example, counties that grouped together on poverty and housing variables showed definite patterns when placed on a choropleth map.
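One simple way to look at the relation between cluster assignments and a variable left out of the clustering is a cross-tabulation with a Pearson chi-square statistic. The Python sketch below shows the arithmetic only; the data are made up, and the names `crosstab` and `chi_square` are hypothetical helpers rather than part of any SPSS procedure.

```python
from collections import Counter


def crosstab(rows, cols):
    """Cross-tabulate two equally long lists of category values."""
    counts = Counter(zip(rows, cols))
    row_levels = sorted(set(rows))
    col_levels = sorted(set(cols))
    table = [[counts[(r, c)] for c in col_levels] for r in row_levels]
    return row_levels, col_levels, table


def chi_square(table):
    """Pearson chi-square statistic for a table of observed counts."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


# Made-up example: 20 cases, two clusters, one external outcome.
cluster = [1] * 10 + [2] * 10
outcome = ["low"] * 8 + ["normal"] * 2 + ["low"] * 2 + ["normal"] * 8
row_levels, col_levels, table = crosstab(cluster, outcome)
stat = chi_square(table)
df = (len(row_levels) - 1) * (len(col_levels) - 1)
print(table, stat, df)
```

A large statistic relative to its degrees of freedom suggests the clusters differ on the external variable, which is one piece of evidence that the solution is useful; it does not replace the interpretability check.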

Art Kendall
Social Research Consultants

Jarrod Teo-2

Re: cluster analysis methods

In reply to this post by SoS Statistical Services

Hi Evie,

I notice that you seem to have included some demographic variables in your cluster analysis. The danger of including a demographic variable in the cluster analysis is that the clusters might not be unique. An extreme example is to have gender placed in a cluster analysis, which turns out 2 clusters that have male and female respondents in them, which might not be useful for analysis.

In an example that I had carried out recently for my client, a 2-step cluster was done on a set of choices the respondents chose in the survey.

Dummy Example: What do you do on Sunday? (Multiple Answers Question)
Swimming (1-Yes, 0-No)
Play Tennis (1-Yes, 0-No)
Jogging (1-Yes, 0-No)
Listen to Music at Home (1-Yes, 0-No)
Read Books at Home (1-Yes, 0-No)

So I ended up with clusters of respondents doing a cocktail of activities, based on the 2-step example:

1st Cluster: swimming, play tennis, jogging.
2nd Cluster: Listen to music at home, Read Books at Home.

To give a meaningful title to each cluster, I would rename them, say:
cluster 1: Sporty respondents
cluster 2: Respondents who like to stay at home

Of course, after that, to know more about these clusters, I performed a decision tree to profile them based on the demographics.

I do understand that, lacking knowledge of your project objectives, it is hasty of me to advise you this way, but I hope that my advice helps.

Warm Regards
Dorraj Oet

SoS Statistical Services

Re: cluster analysis methods

Dorraj

Thank you very much for your reply. Your example was useful. If you did a crosstabulation of the 2 clusters by the 5 activity variables are there some people in e.g. cluster 1 that listen to music and read books i.e. are the clusters 'pure'? Or are some folk left without being in a cluster?
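A crosstabulation of this kind can be sketched in Python as the proportion of each cluster's members who chose each activity. The respondents, variable names, and the helper `cluster_profiles` below are all hypothetical; a cluster is "pure" on an activity only when its proportion is exactly 0 or 1.

```python
def cluster_profiles(cases, activities):
    """Proportion of each cluster's members who chose each activity.

    cases: list of dicts with a 'cluster' key and 0/1 activity indicators.
    activities: names of the 0/1 indicator variables to profile.
    """
    profiles = {}
    for cluster in sorted({case["cluster"] for case in cases}):
        members = [case for case in cases if case["cluster"] == cluster]
        profiles[cluster] = {
            activity: sum(m[activity] for m in members) / len(members)
            for activity in activities
        }
    return profiles


# Hypothetical respondents: cluster 1 is mostly sporty, cluster 2 mostly at home.
respondents = [
    {"cluster": 1, "swim": 1, "tennis": 1, "jog": 1, "music": 0, "books": 0},
    {"cluster": 1, "swim": 1, "tennis": 0, "jog": 1, "music": 1, "books": 0},
    {"cluster": 2, "swim": 0, "tennis": 0, "jog": 0, "music": 1, "books": 1},
    {"cluster": 2, "swim": 0, "tennis": 0, "jog": 1, "music": 1, "books": 1},
]
activities = ["swim", "tennis", "jog", "music", "books"]
profiles = cluster_profiles(respondents, activities)
for cluster, profile in profiles.items():
    print(cluster, profile)
```

In this made-up example half of cluster 1 also listens to music and half of cluster 2 also jogs, so neither cluster is pure on every activity.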

Which method did you use? I have used k-means but also dichotomised my data and used Ward's. I am finding that my clusters are not 'pure' as described above. My aim is to compare low birth weight between clusters, so I was hoping to have a cluster with e.g. mothers under 19 years, unemployed, GCSE level, no father figure, living with grandparents etc. who may be at risk of low birth weight.

I can understand that you would then use demographics to make comparisons between your 2 groups but the demographics are the variables that I was hoping to include in the cluster analysis. Does this make sense?

I appreciate your advice.

Evie

mpirritano

Re: cluster analysis methods

If you are going to use categorical demographic variables in your cluster analysis, I believe you have to use the 2-step option, not hierarchical. And you have to use the log-likelihood distance measure within 2-step. This is my understanding.

Thanks

matt

Matthew Pirritano, Ph.D.

Research Analyst IV

Medical Services Initiative (MSI)

Orange County Health Care Agency

(714) 568-5648
