Cluster Analysis Option in SPSS

ljrhurley
Does anyone know if SPSS can perform ISODATA (iterative self-organizing data analysis technique) clustering, or some other variation of a data-driven partitioning cluster analysis?

I see that it can do k-means clustering, but that isn't quite what I want, since it is a subset of a user-defined partition, not a data-driven one.

landon

Running ANCOVA with categorical covariates in SPSS

Firas Asad
Hi everyone,

I am planning on running ANCOVA to investigate whether there are any significant mean differences among groups on the dependent variable (DV). I have one continuous DV, one categorical IV, and eight other variables whose potential influence on the DV I need to control for by treating them as covariates (CVs).

Basically, I am going to use IBM SPSS v20: Analyze > General Linear Model > Univariate.

The issue is that 4 of these covariates have already been measured on a nominal scale: 1 dichotomous and 3 polytomous.

That said, my question is how to run ANCOVA when there are categorical covariates. For instance, is it legitimate to transform these categorical variables into a series of dummy variables?

All answers and suggestions are very welcome.

Many thanks in advance,
 
Firas
PG student
 

Re: Cluster Analysis Option in SPSS

Art Kendall
In reply to this post by ljrhurley
Please explain why you think TWOSTEP, CLUSTER, and QUICK CLUSTER are not data-driven techniques. In data-mining terms, they are unsupervised.

DISCRIMINANT has a user-defined (a priori) partition. There is a way to have QUICK CLUSTER use specified profiles, but that is not the common way to run it. In data-mining terms, these would be supervised learning.
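For contrast, the "specified profiles" use of QUICK CLUSTER amounts to assigning each case to the nearest of a set of profiles fixed in advance. The Python sketch below illustrates only that idea; it is not SPSS syntax, and `classify_to_profiles` and the profile values are hypothetical. Because the profiles are never updated, the partition is defined a priori rather than found in the data.

```python
import math


def classify_to_profiles(cases, profiles):
    """Assign each case to the index of the nearest user-specified profile
    (Euclidean distance). The profiles are fixed in advance and never
    updated, so the partition is a priori rather than data-driven."""
    labels = []
    for case in cases:
        distances = [math.dist(case, p) for p in profiles]
        labels.append(distances.index(min(distances)))
    return labels


# Hypothetical a priori profiles on two variables.
profiles = [(1.0, 1.0), (5.0, 5.0)]
cases = [(0.8, 1.2), (4.9, 5.3), (2.0, 1.5), (6.0, 4.0)]
print(classify_to_profiles(cases, profiles))  # [0, 1, 0, 1]
```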

Are you looking to create a single nominal-level variable that designates groups of cases with similar profiles, or are you trying to find a tree?
Do you have a reason to pre-specify the number of clusters to find?
If you describe your study in more detail, list members will be able to make suggestions.

ISODATA! There is a blast from the past; I do not recall hearing about it since about 1978. I bought it on old-fashioned punch cards, but they were out of order and I never followed up on it.
Art Kendall
Social Research Consultants

On 6/11/2012 1:42 AM, ljrhurley wrote:
Does anyone know if SPSS can perform an ISOdata (iterative self organizing
data analysis technique), or some other variation of a data driven partition
cluster analysis?

I see that it can do k-means clustering, but that isn't quite what I want,
since it is a subset of a user defined partition, not a data driven one.

landon

--
View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Cluster-Analysis-Option-in-SPSS-tp5713624.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD


Re: Cluster Analysis Option in SPSS

Art Kendall
In reply to this post by ljrhurley
PS.
TWOSTEP, CLUSTER, and QUICK CLUSTER are pattern _detection_ techniques; DISCRIMINANT and TREES are pattern _recognition_ techniques. QUICK CLUSTER has an option that lets it be used as a pattern recognition technique.
Art Kendall
Social Research Consultants


Re: Running ANCOVA with categorical covariates in SPSS

Bruce Weaver
Administrator
In reply to this post by Firas Asad
In SPSS lingo (and in the dialogs for UNIANOVA, etc.), "covariate" = scaled (continuous) predictor variable and "fixed factor" = categorical predictor variable. In other contexts, "covariate" is often used more broadly to include both continuous and categorical variables that one wishes to control for.

Creating k-1 indicator variables for a categorical predictor with k categories and including those indicators as "covariates" (with UNIANOVA) is equivalent to simply including the original variable as a "fixed factor". So there's not much point in computing the indicators (unless you're doing it as an exercise to demonstrate the equivalence of the two approaches).
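As an illustration of what "k-1 indicator variables" means, here is a minimal Python sketch (not SPSS syntax). It assumes the last category in sorted order is the reference, which is meant to mirror how GLM parameterizes fixed factors; `make_indicators` and the example values are hypothetical.

```python
def make_indicators(values):
    """Return (reference, names, rows) for k-1 indicator coding of a
    categorical variable with k observed categories.

    The last category in sorted order is the reference; a case in the
    reference category gets 0 on every indicator."""
    levels = sorted(set(values))
    reference, coded = levels[-1], levels[:-1]
    names = [f"is_{level}" for level in coded]
    rows = [[1 if v == level else 0 for level in coded] for v in values]
    return reference, names, rows


# Hypothetical 3-category covariate (k = 3, so 2 indicators).
reference, names, rows = make_indicators(["urban", "rural", "suburb", "urban"])
print(reference, names)  # urban ['is_rural', 'is_suburb']
print(rows)              # [[0, 0], [1, 0], [0, 1], [0, 0]]
```

Entering these two indicators as covariates, or the original variable as a fixed factor, describes the same model; the fixed-factor route simply skips the recoding step.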

I suggest you stop thinking in terms of ANCOVA, and start thinking in terms of "general linear model", with some (at least approximately) interval scaled predictors and some categorical predictors.  

What is your sample size, by the way?  Given the number of variables in your model (and the number of df they eat up), you'll need a fairly large sample size if you are to avoid over-fitting.  See Mike Babyak's nice article for some discussion of this.
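To see how quickly the df add up in a model like Firas's, here is a small Python sketch that counts 1 df per continuous predictor and k - 1 df per categorical predictor with k levels. The level counts are hypothetical because the post does not give them, and the 10 to 15 cases per df in the last line is a commonly quoted rule of thumb, not a threshold taken from the article.

```python
def predictor_df(n_continuous, category_counts):
    """Degrees of freedom used by the predictors: 1 per continuous
    predictor plus (k - 1) per categorical predictor with k levels."""
    return n_continuous + sum(k - 1 for k in category_counts)


# Hypothetical level counts: IV with 3 groups, 1 dichotomous covariate,
# and 3 polytomous covariates with 3, 4 and 5 categories; the other
# 4 covariates are assumed continuous.
df_used = predictor_df(n_continuous=4, category_counts=[3, 2, 3, 4, 5])
print(df_used)  # 16

# Rule of thumb (assumption): roughly 10 to 15 cases per predictor df.
print(df_used * 10, df_used * 15)  # 160 240
```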

   http://os1.amc.nl/mediawiki/images/Babyak_-_overfitting.pdf

HTH.


--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING: 
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).

Re: Cluster Analysis Option in SPSS

ljrhurley
In reply to this post by Art Kendall
Grubmeier and Rudolph (2002) defined the classifications that I learned, in something like their Figure 6.

K-means also assumes a priori that the number of clusters is known.

The ISODATA algorithm is similar to the k-means algorithm, with the distinct difference that ISODATA allows the number of clusters to vary, while k-means assumes that the number of clusters is known a priori. [1]

If you want, I can also cite recently published ISODATA studies.

1. http://www.yale.edu/ceo/Projects/swap/landcover/Unsupervised_classification.htm

-------- Original Message --------
From: Art Kendall <[hidden email]>
Sent: Mon Jun 11 07:47:05 EDT 2012
To: ljrhurley <[hidden email]>
Cc: [hidden email]
Subject: Re: [SPSSX-L] Cluster Analysis Option in SPSS

--
Violence is the last refuge of incompetence.


Re: Cluster Analysis Option in SPSS

Art Kendall
Yes, one needs to run k-means, k-medians, etc. with a guess as to the number of clusters, but not necessarily as to what the profiles of the clusters are.
TWOSTEP and CLUSTER allow one to save cluster memberships at different numbers of clusters. In TWOSTEP, depending on whether the variables are continuous, categorical, or a mix of the two, AIC or BIC can be used to choose the number of clusters to retain based on fit to the data. In CLUSTER, depending on which agglomeration method is used, there is often some measure of fit to eyeball.
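The idea of saving memberships at several numbers of clusters, and eyeballing the agglomeration coefficients, can be sketched in Python with a naive average-linkage agglomerative clustering of one variable. This illustrates the general technique only, not SPSS's CLUSTER implementation; `agglomerate`, the "largest jump" stopping rule, and the data are hypothetical.

```python
import itertools


def agglomerate(points):
    """Naive average-linkage agglomerative clustering of 1-D points.

    Returns (schedule, memberships): schedule holds the merge distance
    (agglomeration coefficient) at each step, and memberships[k] gives
    a cluster label per case for every number of clusters k."""
    clusters = [[i] for i in range(len(points))]
    schedule, memberships = [], {}

    def linkage(a, b):
        # Average absolute distance over all pairs of cases across a and b.
        return sum(abs(points[i] - points[j]) for i in a for j in b) / (len(a) * len(b))

    def record():
        # Save the current membership of every case, keyed by cluster count.
        labels = [0] * len(points)
        for label, members in enumerate(clusters):
            for i in members:
                labels[i] = label
        memberships[len(clusters)] = labels

    record()
    while len(clusters) > 1:
        # Merge the closest pair of clusters (a < b, so deleting b is safe).
        a, b = min(itertools.combinations(range(len(clusters)), 2),
                   key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]))
        schedule.append(linkage(clusters[a], clusters[b]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
        record()
    return schedule, memberships


points = [0, 1, 3, 20, 22, 40, 43]  # hypothetical one-variable data
schedule, memberships = agglomerate(points)

# Eyeball rule: stop just before the largest jump in the coefficients.
jumps = [later - earlier for earlier, later in zip(schedule, schedule[1:])]
suggested_k = len(points) - (jumps.index(max(jumps)) + 1)
print(suggested_k, memberships[suggested_k])  # 3 [0, 0, 0, 1, 1, 2, 2]
```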

IIRC, TWOSTEP has been around since before 2000. I have seldom used k-means since TWOSTEP became available.

Also, if the data are double-centered (columns, then rows), one can use a scree test and a parallel analysis to estimate the number of Q factors (clusters) to retain.
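Double-centering in the order described (columns first, then rows) is easy to sketch in Python. After both steps every row and every column sums to zero, so what remains is each case's profile shape rather than its overall level. The function name and data are hypothetical, and the scree test and parallel analysis on the result are not shown.

```python
import math


def double_center(matrix):
    """Double-center a cases-by-variables matrix (list of rows):
    first subtract each column's mean, then subtract each row's mean
    from the column-centered values."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    col_means = [sum(row[j] for row in matrix) / n_rows for j in range(n_cols)]
    centered = [[row[j] - col_means[j] for j in range(n_cols)] for row in matrix]
    result = []
    for row in centered:
        row_mean = sum(row) / n_cols
        result.append([x - row_mean for x in row])
    return result


data = [[1, 2, 3],  # hypothetical scores: 3 cases by 3 variables
        [4, 6, 8],
        [2, 2, 5]]
dc = double_center(data)
# Every row and every column now sums to (numerically) zero.
print(all(math.isclose(sum(row), 0.0, abs_tol=1e-12) for row in dc))  # True
print(all(math.isclose(sum(col), 0.0, abs_tol=1e-12) for col in zip(*dc)))  # True
```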

The final decision on the number of clusters to retain uses measures of fit as well as how meaningful the set of profiles is.

In actual work, I tend to use the fit measures to ballpark the number of clusters to consider. Then I identify "core" clusters based on agreement among several of the heuristic approaches. Finally, I refine cluster assignments using the classification phase of a discriminant function analysis.
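One way to sketch the "core cluster" idea in Python: take two cluster solutions for the same cases, match their labels, treat the cases on which they agree as the core, and then reassign every case to the nearest core centroid. Here a nearest-centroid rule stands in for the classification phase of a discriminant analysis (it is what linear discriminant classification reduces to with equal priors and an identity covariance matrix); the function names and data are hypothetical.

```python
import itertools
import math


def match_labels(a, b, k):
    """Relabel solution b to agree with solution a on as many cases as
    possible, trying every permutation of the k labels (fine for small k)."""
    best = max(itertools.permutations(range(k)),
               key=lambda perm: sum(x == perm[y] for x, y in zip(a, b)))
    return [best[y] for y in b]


def refine_from_core(cases, a, b, k):
    """Core cases are those on which solutions a and b agree; every case is
    then reassigned to the nearest core centroid (Euclidean distance).
    Assumes each of the k clusters has at least one core case."""
    b = match_labels(a, b, k)
    core = [i for i in range(len(cases)) if a[i] == b[i]]
    centroids = []
    for label in range(k):
        members = [cases[i] for i in core if a[i] == label]
        centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
    return [min(range(k), key=lambda c: math.dist(case, centroids[c]))
            for case in cases]


cases = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5), (3, 3)]
a = [0, 0, 0, 1, 1, 1, 0]  # hypothetical solution from one heuristic
b = [1, 1, 1, 0, 0, 0, 0]  # hypothetical solution from another (labels swapped)
print(refine_from_core(cases, a, b, 2))  # [0, 0, 0, 1, 1, 1, 1]
```

The borderline case (3, 3), on which the two solutions disagree, is left out of the core and then assigned to whichever core centroid it is closer to.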
Art Kendall
Social Research Consultants

On 6/11/2012 11:19 AM, Landon Hurley wrote:

Re: Cluster Analysis Option in SPSS

Art Kendall
In reply to this post by ljrhurley
Yes, one needs to run k-means, k-medians, etc. with a guess as to the number of clusters, but not necessarily as to what the profiles of the clusters are. In practice, k-means approaches involve trying a few different numbers of clusters.
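Trying a few numbers of clusters can be sketched in Python: run k-means for several k and compare the within-cluster sum of squares (SSE), looking for the point where adding clusters stops helping much. The deterministic farthest-point start is an assumption made for reproducibility, not a description of how QUICK CLUSTER picks its initial centers; the function name and data are hypothetical.

```python
def kmeans_sse(data, k, iterations=50):
    """Run 1-D k-means with a farthest-point start and return the
    within-cluster sum of squares of the final solution."""
    centers = [data[0]]
    while len(centers) < k:
        # Next start: the case farthest from its nearest current center.
        centers.append(max(data, key=lambda x: min(abs(x - c) for c in centers)))
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for x in data:
            nearest = min(range(len(centers)), key=lambda i: abs(x - centers[i]))
            groups[nearest].append(x)
        centers = [sum(g) / len(g) for g in groups if g]  # drop empty clusters
    return sum(min((x - c) ** 2 for c in centers) for x in data)


data = [0, 1, 2, 100, 101, 102, 200, 201, 202, 203]  # hypothetical values
for k in range(1, 5):
    print(k, round(kmeans_sse(data, k), 1))
```

With these values the SSE falls sharply up to k = 3 and only slightly after that, which is the kind of bend one looks for before judging how meaningful the profiles are.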

TWOSTEP and CLUSTER allow one to save cluster memberships at different numbers of clusters. In TWOSTEP, depending on whether the variables are continuous, categorical, or a mix of the two, AIC or BIC can be used to choose the number of clusters to retain based on fit to the data. In CLUSTER, depending on which agglomeration method is used, there is often some measure of fit to eyeball.

IIRC, TWOSTEP has been around since before 2000. I have seldom used k-means since TWOSTEP became available.

Also, if the data are double-centered (columns, then rows), one can use a scree test and a parallel analysis to estimate the number of Q factors (clusters) to retain.

The final decision on the number of clusters to retain uses measures of fit as well as how meaningful the set of profiles is.

In actual work, I tend to use the fit measures to ballpark the number of clusters to consider. Then I identify "core" clusters based on agreement among several of the heuristic approaches. Finally, I refine cluster assignments using the classification phase of a discriminant function analysis.
Art Kendall
Social Research Consultants


Re: Running ANCOVA with categorical covariates in SPSS

webster
In reply to this post by Bruce Weaver
I want to do something similar to Firas.

"In SPSS lingo (and dialogs for UNIANOVA etc), "covariate" = scaled predictor variable and "fixed factor" = categorical predictor variable.  In other contexts, "covariate" is often used more broadly to include both continuous and categorical variables that one wishes to control for."

"I suggest you stop thinking in terms of ANCOVA, and start thinking in terms of "general linear model", with some (at least approximately) interval scaled predictors and some categorical predictors."

So, in SPSS GLM Univariate, do you enter categorical variables that you want to control for (in my case they are actually dichotomous) as "fixed factors" rather than as "covariates"?

Thanks for any advice.

Re: Running ANCOVA with categorical covariates in SPSS

Bruce Weaver
Administrator
Dichotomous predictors can be treated as fixed factors OR as covariates; it makes no difference, except to some formatting of the output. If your main interest is in the table of coefficients, you might find it more convenient to treat dichotomous predictors as covariates. This will give you one row per dichotomous variable in the table of coefficients; treating them as fixed factors gives you two rows, one of which (the reference category) has nothing in it anyway, because its coefficient is fixed at zero.
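The equivalence can be checked with a one-predictor least-squares fit, sketched in Python below (the data and `simple_ols` are hypothetical). As a covariate, a 0/1 variable is used as-is; as a fixed factor with the last category as reference (assumed here to mirror how GLM parameterizes factors), its only non-redundant indicator is 1 - x. Both codings give identical fitted values; only the intercept and the sign of the slope change.

```python
def simple_ols(x, y):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sxy / sxx
    return my - b1 * mx, b1


x = [0, 0, 0, 1, 1, 1]              # hypothetical dichotomous predictor
y = [2.0, 3.0, 4.0, 6.0, 7.0, 8.0]  # hypothetical outcome

# "Covariate" coding: use the 0/1 values directly.
b0_cov, b1_cov = simple_ols(x, y)
# "Fixed factor" coding with the last category (1) as reference:
# the only non-redundant indicator is 1 for category 0.
factor_indicator = [1 - xi for xi in x]
b0_fac, b1_fac = simple_ols(factor_indicator, y)

fitted_cov = [b0_cov + b1_cov * xi for xi in x]
fitted_fac = [b0_fac + b1_fac * zi for zi in factor_indicator]
print(b1_cov, b1_fac)            # 4.0 -4.0
print(fitted_cov == fitted_fac)  # True
```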

On the other hand, if your main interest is in estimated marginal means, you might prefer treating the dichotomous predictors as fixed factors.

HTH.

