SPSSX Discussion

Cluster Analysis Option in SPSS

ljrhurley

Jun 11, 2012; 5:42am

Cluster Analysis Option in SPSS

2 posts

Does anyone know whether SPSS can perform ISODATA (Iterative Self-Organizing Data Analysis Technique) clustering, or some other variant of data-driven partitioning cluster analysis?

I see that it can do k-means clustering, but that isn't quite what I want, since it falls under user-defined partitioning, not data-driven partitioning.

landon

Firas Asad

Jun 11, 2012; 9:33am

Running ANCOVA with categorical covariates in SPSS

3 posts

Hi everyone,

I am planning to run an ANCOVA to investigate whether there are any significant mean differences among groups on the dependent variable (DV). I have one continuous DV, one categorical IV, and eight other variables whose potential influence on the DV I need to control for by treating them as covariates (CVs).

Basically, I am going to use IBM SPSS v20: Analyze > General Linear Model > Univariate.

The issue is that 4 of these covariates were measured on a nominal scale: 1 dichotomous and 3 polytomous.

Given that, my question is how to run an ANCOVA when there are categorical covariates. Is it legitimate, for instance, to transform these categorical variables into a series of dummy variables?

All answers and suggestions are very welcome.

Many thanks in advance,

Firas

PG student

Art Kendall

Jun 11, 2012; 11:47am

Re: Cluster Analysis Option in SPSS

2500 posts

In reply to this post by ljrhurley

Please explain why you think TWOSTEP, CLUSTER, and QUICK CLUSTER are not data-driven techniques. In data mining speak they are unsupervised.

DISCRIMINANT has a user-defined (a priori) partition. There is a way to have QUICK CLUSTER use specified profiles, but that is not the common way to run it. In data mining speak these would be supervised learning.

Are you looking to create a single nominal-level variable that designates groups of cases that have similar profiles, or are you trying to find a tree?
Do you have a reason to pre-specify the number of clusters to find?
If you describe your study in more detail, list members will be able to make suggestions.

ISODATA, there is a blast from the past, I do not recall hearing about that since about 1978. I bought it on old-fashioned punch cards, but they were out of order and I never followed up on it.

Art Kendall
Social Research Consultants

--
View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Cluster-Analysis-Option-in-SPSS-tp5713624.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Art Kendall

Jun 11, 2012; 11:52am

Re: Cluster Analysis Option in SPSS

2500 posts

In reply to this post by ljrhurley

PS.
TWOSTEP, CLUSTER, and QUICK CLUSTER are pattern _detection_ techniques;
DISCRIMINANT and TREES are pattern _recognition_ techniques. QUICK CLUSTER has an option that lets it be used as a pattern recognition technique.

Art Kendall
Social Research Consultants

Bruce Weaver

Jun 11, 2012; 12:38pm

Re: Running ANCOVA with categorical covariates in SPSS

Administrator

3512 posts

In reply to this post by Firas Asad

In SPSS lingo (and dialogs for UNIANOVA etc), "covariate" = scaled predictor variable and "fixed factor" = categorical predictor variable. In other contexts, "covariate" is often used more broadly to include both continuous and categorical variables that one wishes to control for.

Creating k-1 indicator variables for a categorical predictor with k categories and including those indicators as "covariates" (with UNIANOVA) is equivalent to simply including the original variable as a "fixed factor". So there's not much point in computing the indicators (if you're not doing it as an exercise to demonstrate the equivalence of the two approaches).
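
The equivalence described above can be checked outside SPSS with a small sketch. The Python below is only an illustration, not SPSS's own computation: the data, the group labels, and the helper names `solve` and `ols` are made up, and the last category is taken as the reference as an assumption. It fits a least-squares model with an intercept and k-1 indicators and shows that the fitted values are just the group means, which is what a model with the original variable as a single categorical factor would fit.

```python
from fractions import Fraction

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Find a row with a non-zero entry in this column and move it up.
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        pivot_value = m[col][col]
        m[col] = [v / pivot_value for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [row[n] for row in m]

def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical data: one categorical predictor with k = 3 categories.
groups = ["a", "a", "b", "b", "c", "c"]
y = [Fraction(v) for v in (1, 3, 4, 6, 10, 12)]

# k-1 = 2 indicators; category "c" is the reference (an assumption here).
X = [[Fraction(1), Fraction(int(g == "a")), Fraction(int(g == "b"))]
     for g in groups]

coefs = ols(X, y)
fitted = [sum(b * x for b, x in zip(coefs, row)) for row in X]

# Group means, i.e. what a one-factor model fits for each case.
group_means = {g: sum(v for v, gg in zip(y, groups) if gg == g)
                  / groups.count(g) for g in set(groups)}

print("coefficients:", [str(c) for c in coefs])
print("fitted:", [str(f) for f in fitted])
print("group means:", {g: str(m) for g, m in sorted(group_means.items())})
```

The intercept is the mean of the reference category and each indicator coefficient is the difference between that category's mean and the reference mean, so the two codings describe the same model.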

I suggest you stop thinking in terms of ANCOVA, and start thinking in terms of "general linear model", with some (at least approximately) interval scaled predictors and some categorical predictors.

What is your sample size, by the way? Given the number of variables in your model (and the number of df they eat up), you'll need a fairly large sample size if you are to avoid over-fitting. See Mike Babyak's nice article for some discussion of this.

http://os1.amc.nl/mediawiki/images/Babyak_-_overfitting.pdf

HTH.

--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING:
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).

ljrhurley

Jun 11, 2012; 3:19pm

Re: Cluster Analysis Option in SPSS

2 posts

In reply to this post by Art Kendall

Grubmeier and Rudolph (2002) defined the classifications that I learned; see figure 6 or thereabouts.

K-means also carries the a priori assumption that the number of clusters is known.

The ISODATA algorithm is similar to the k-means algorithm with the distinct difference that the ISODATA algorithm allows for different number of clusters while the k-means assumes that the number of clusters is known a priori. [1]

If you want, I can also cite recently published ISODATA studies.

1. http://www.yale.edu/ceo/Projects/swap/landcover/Unsupervised_classification.htm
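
To make the contrast with k-means concrete, here is a deliberately simplified, one-dimensional ISODATA-style loop in Python. It is a sketch under stated assumptions, not the full ISODATA algorithm and not anything SPSS provides: the function names, the thresholds `max_sd` and `min_gap`, and the data are all hypothetical. Each round does a k-means-style assignment and update, then splits any cluster whose standard deviation exceeds `max_sd` and merges centers that are closer than `min_gap`, so the number of clusters is allowed to change.

```python
from statistics import mean, pstdev

def assign(points, centers):
    """Assign each point to its nearest center; drop clusters left empty."""
    clusters = [[] for _ in centers]
    for p in points:
        nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
        clusters[nearest].append(p)
    return [c for c in clusters if c]

def isodata_1d(points, centers, max_sd, min_gap, rounds=10):
    """Simplified ISODATA-style clustering of 1-D data (illustrative only)."""
    for _ in range(rounds):
        clusters = assign(points, centers)
        new_centers = []
        for c in clusters:
            m, sd = mean(c), pstdev(c)
            if sd > max_sd and len(c) > 1:
                new_centers += [m - sd, m + sd]   # split a spread-out cluster
            else:
                new_centers.append(m)             # ordinary k-means update
        new_centers.sort()
        merged = [new_centers[0]]
        for c in new_centers[1:]:
            if c - merged[-1] < min_gap:
                merged[-1] = (merged[-1] + c) / 2  # merge two close centers
            else:
                merged.append(c)
        if merged == centers:                      # nothing changed: stop
            break
        centers = merged
    return centers

# Hypothetical data with three well-separated groups.
points = [0.8, 1.0, 1.2, 4.8, 5.0, 5.2, 9.8, 10.0, 10.2]

# Start from a single center; the split step grows the number of clusters.
centers = isodata_1d(points, [5.0], max_sd=1.0, min_gap=1.0)
print([round(c, 6) for c in centers])
```

Note that the analyst still chooses the split and merge thresholds, so the number of clusters is data-driven only relative to those settings.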

--
Violence is the last refuge of incompetence.

Art Kendall

Jun 11, 2012; 5:04pm

Re: Cluster Analysis Option in SPSS

2500 posts

Yes, one needs to run k-means, k-medians, etc. with a guess as to the number of clusters, but not necessarily what the profiles of the clusters are.
TWOSTEP and CLUSTER allow one to save memberships at different numbers of clusters. In TWOSTEP, depending on whether the variables are continuous, categorical, or a mix of those, AIC or BIC can be used to choose a number of clusters to retain based on fit to the data. In CLUSTER, depending on which agglomeration method is used, there is often some measure of fit to eyeball.

IIRC TWOSTEP has been around since before 2000. I have seldom used k-means since TWOSTEP has been around.

Also, if the data are double centered (columns, then rows), one can use a scree test and a parallel analysis to estimate the number of Q factors (clusters) to retain.

The final decision on the number of clusters to retain uses measures of fit as well as how meaningful the set of profiles is.

In actual work I tend to use the fit measures to ballpark the number of clusters to consider. Then I develop "core" clusters based on agreement among several of the heuristic approaches. Finally, I refine cluster assignments using the classification phase of a discriminant function analysis.

Art Kendall
Social Research Consultants

Art Kendall

Jun 11, 2012; 5:06pm

Re: Cluster Analysis Option in SPSS

2500 posts

In reply to this post by ljrhurley

Yes, one needs to run k-means, k-medians, etc. with a guess as to the number of clusters, but not necessarily what the profiles of the clusters are. K-means approaches in practice involve trying a few numbers of clusters.

TWOSTEP and CLUSTER allow one to save memberships at different numbers of clusters. In TWOSTEP, depending on whether the variables are continuous, categorical, or a mix of those, AIC or BIC can be used to choose a number of clusters to retain based on fit to the data. In CLUSTER, depending on which agglomeration method is used, there is often some measure of fit to eyeball.

IIRC TWOSTEP has been around since before 2000. I have seldom used k-means since TWOSTEP has been around.

Also, if the data are double centered (columns, then rows), one can use a scree test and a parallel analysis to estimate the number of Q factors (clusters) to retain.
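
For readers unfamiliar with double centering, a minimal Python sketch follows. It assumes cases are rows and variables are columns, and it uses a made-up 3 x 3 data matrix; the function name `double_center` is hypothetical. Column means are removed first, then row means, in the order mentioned above. The scree test and parallel analysis themselves are not shown.

```python
from statistics import mean

def double_center(rows):
    """Subtract column means, then row means, from a rectangular data matrix."""
    col_means = [mean(col) for col in zip(*rows)]
    centered = [[v - cm for v, cm in zip(row, col_means)] for row in rows]
    result = []
    for row in centered:
        row_mean = mean(row)
        result.append([v - row_mean for v in row])
    return result

# Hypothetical data: 3 cases (rows) by 3 variables (columns).
data = [[1, 2, 3],
        [4, 6, 8],
        [0, 1, 5]]

dc = double_center(data)
for row in dc:
    print([round(v, 3) for v in row])
```

After both steps every row and every column of the result sums to zero (up to floating-point error), so what remains is each case's profile shape rather than its overall level.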

The final decision on the number of clusters to retain uses measures of fit as well as how meaningful the set of profiles is.

In actual work I tend to use the fit measures to ballpark the number of clusters to consider. Then I develop "core" clusters based on agreement among several of the heuristic approaches. Finally, I refine cluster assignments using the classification phase of a discriminant function analysis.
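
The idea of using a fit measure to ballpark the number of clusters can be illustrated with a small Python sketch. This is not TWOSTEP's AIC/BIC computation; it uses the within-cluster sum of squares (WSS) as a stand-in fit measure on made-up one-dimensional data, and the function names are hypothetical. In one dimension the best k-means partition consists of contiguous runs of the sorted values, so the sketch simply tries every way of cutting the sorted data into k runs.

```python
from itertools import combinations
from statistics import mean

def wss(cluster):
    """Within-cluster sum of squared deviations from the cluster mean."""
    m = mean(cluster)
    return sum((x - m) ** 2 for x in cluster)

def best_wss(points, k):
    """Smallest total WSS over all cuts of the sorted points into k runs."""
    pts = sorted(points)
    best = None
    for cuts in combinations(range(1, len(pts)), k - 1):
        bounds = (0, *cuts, len(pts))
        total = sum(wss(pts[a:b]) for a, b in zip(bounds, bounds[1:]))
        if best is None or total < best:
            best = total
    return best

# Hypothetical data with three well-separated groups.
points = [0.8, 1.0, 1.2, 4.8, 5.0, 5.2, 9.8, 10.0, 10.2]

fit = {k: best_wss(points, k) for k in range(1, 5)}
for k, value in fit.items():
    print(k, round(value, 2))
```

For these data the WSS falls sharply up to three clusters and barely improves with a fourth, which is the kind of "elbow" one would eyeball before judging how meaningful the resulting profiles are.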

Art Kendall
Social Research Consultants

webster

Jul 13, 2012; 9:17pm

Re: Running ANCOVA with categorical covariates in SPSS

1 post

In reply to this post by Bruce Weaver

I want to do something similar to what Firas described.

"In SPSS lingo (and dialogs for UNIANOVA etc), "covariate" = scaled predictor variable and "fixed factor" = categorical predictor variable. In other contexts, "covariate" is often used more broadly to include both continuous and categorical variables that one wishes to control for."

"I suggest you stop thinking in terms of ANCOVA, and start thinking in terms of "general linear model", with some (at least approximately) interval scaled predictors and some categorical predictors."

So, in SPSS GLM Univariate, should categorical variables that you want to control for (in my case they are actually dichotomous) be entered as "fixed factors" rather than as "covariates"?

Thanks for any advice.

Bruce Weaver

Jul 13, 2012; 9:59pm

Re: Running ANCOVA with categorical covariates in SPSS

Administrator

3512 posts

Dichotomous predictors can be treated as fixed factors OR as covariates--it makes no difference, except for some formatting of the output. If your main interest is in the table of coefficients, you might find it more convenient to treat dichotomous predictors as covariates. This will give you one row per dichotomous variable in the table of coefficients; treating them as fixed factors gives you two rows, one of which (the reference category) has nothing in it anyway.

On the other hand, if your main interest is in estimated marginal means, you might prefer treating the dichotomous predictors as fixed factors.
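
Why the choice makes no difference to the fit can be seen in a small Python sketch. It is an illustration with made-up data and a hypothetical helper `simple_ols`, not SPSS output: when a 0/1 variable is entered as a single numeric predictor, its least-squares slope equals the difference between the two group means, which is the same quantity a two-level factor reports relative to its reference category (possibly with the opposite sign, depending on which category is the reference).

```python
from statistics import mean

def simple_ols(x, y):
    """Intercept and slope of the least-squares line of y on x."""
    xbar, ybar = mean(x), mean(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sxy / sxx
    return ybar - slope * xbar, slope

# Hypothetical data: a dichotomous predictor coded 0/1 and an outcome.
x = [0, 0, 0, 1, 1]
y = [2, 4, 6, 9, 11]

intercept, slope = simple_ols(x, y)
mean0 = mean(yi for xi, yi in zip(x, y) if xi == 0)
mean1 = mean(yi for xi, yi in zip(x, y) if xi == 1)

print("intercept:", round(intercept, 6), "mean of group 0:", mean0)
print("slope:", round(slope, 6), "difference in means:", mean1 - mean0)
```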

HTH.
