K means cluster analysis with Likert type items

K means cluster analysis with Likert type items

G.S.
I would be grateful if you could help me.
  I have a sample of 300 respondents to whom I put a question of 20
items, each on a 5-point Likert-type scale, and I want to perform some
type of cluster analysis. Which of the following would you suggest as the
most reliable?
1. K-means cluster analysis (traditional), treating the items as
numerical.
2. Hierarchical cluster analysis using counts, with chi-square measures
between sets of frequencies. (In that case, should I prepare the counts
myself, or does the SPSS procedure produce them automatically? If I need
to prepare them myself, how do I do it, and how should they appear in the
data file?)
3. K-means cluster analysis of the transformed items saved by CATPCA.

Thanks
George S.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Re: K means cluster analysis with Likert type items

Art Kendall
First forget about k-means.

See what you get with ordinary PAF (principal axis factoring) with
varimax rotation.  Then create scales by summing the items that load
cleanly on a single factor.  Then use CATPCA to see what you get for
items that you can sum into scales.

On each of the sets of scores, run TWOSTEP treating the scores as
continuous, to see what clusters you get.

Then, although it would be somewhat farfetched to pretend the variables
are reasonably independent, run TWOSTEP treating the items as categorical.

Compare and contrast the solutions.

Art Kendall
Social Research Consultants


Re: K means cluster analysis with Likert type items

Nancy Darling-2
In reply to this post by G.S.
What do the distributions of your variables look like?  If they look
roughly normal, you can treat them like linear numerical items pretty
safely.  If they don't, you need to use other techniques.
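
A minimal sketch of one way to look at an item's distribution, assuming
the responses are coded 1 to 5. The function name is hypothetical, and
skewness and excess kurtosis are only one rough screen for "roughly
normal"; a histogram of each item is at least as informative.

```python
def skew_kurtosis(values):
    """Return (skewness, excess kurtosis) using population moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    if m2 == 0:
        raise ValueError("item has no variance")
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    # Both statistics are 0 for a normal distribution.
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


# Made-up responses to one item, piled up at the "agree" end.
item = [5, 5, 5, 5, 4, 4, 3, 1]
skew, kurt = skew_kurtosis(item)
print(f"skewness={skew:.2f}, excess kurtosis={kurt:.2f}")
```

Values far from zero suggest the item is not behaving like a roughly
normal variable; how far is too far is a judgment call.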




Re: K means cluster analysis with Likert type items

Johnny Amora
In reply to this post by Art Kendall
Hi George,

What is your goal? Why do you want to conduct a cluster analysis?

Regards,
Johnny


Re: K means cluster analysis with Likert type items

Swank, Paul R
In reply to this post by G.S.
I agree with Art that K-means is probably not the best way to go here.
However, if you really have two or more distinct subpopulations within
your sample and the subpopulations they represent have different factor
structures, then the factor analysis might not be accurate. You may want
to check into latent class analysis. There is software that will do this
and at the same time handle the ordered categorical nature of the data
(Mplus, for example).

Paul R. Swank, Ph.D
Professor and Director of Research
Children's Learning Institute
University of Texas Health Science Center
Houston, TX 77038



Re: K means cluster analysis with Likert type items

G.S.
In reply to this post by G.S.
Dear Dr. Kendall,

Thank you very much for your help and your detailed instructions.
However, I would like to raise a few more questions. Why do you suggest
that I start with factor analysis, since it is not considered a proper
technique for handling ordinal variables such as Likert-type items? Also,
why should I forget about K-means cluster analysis in favor of two-step
cluster analysis? Does one technique perform better than the other?
Given that I intend to use the quantified variables transformed via
CATPCA, is it a drawback to proceed to K-means cluster analysis? Finally,
what are the pros and cons of using factor scores instead of transformed
variables? In conclusion, do you reject all the options I mentioned in my
initial letter, and if so, why?

Sincerely yours


Re: K means cluster analysis with Likert type items

Art Kendall
Overall it never hurts to try the different approaches and
compare/contrast the results.
Of course it makes a difference how many constructs you selected your
set of items to measure, or whether you inherited them, or what.

See interspersed responses all the way through, including the original post.

George Siardos wrote:
> Dear Dr. Kendall,
>
> Thank you very much for your help in giving me detailed instructions.
> However, I would like to raise a few more questions. Why do you suggest me
> to start with Factor analysis since it is not considered a proper technique
> for handling ordinal variables as Likert type items are?
There was a lot of work in the late 60's and early 70's showing that
Likert items (sd-d-n-a-sa) are frequently *not severely discrepant* from
*interval* level of measurement.
I was suggesting that you start with the customary/conventional approach
and then try the newer approach (CATPCA).  CATPCA will show you how items
group together under different assumptions about the level of measurement.


> still, why should
> I forget about K means cluster analysis in favor of two step cluster
> analysis? Is any benefit in performance of one technique over the other?
>
"Forget" maybe was a little too strong.   START with TWOSTEP.
TWOSTEP has options for treating the data as nominal or as continuous
in different runs.  It also gives you "goodness" measures for a range
of numbers of clusters.
It is much less subject to the way cases are sorted, so you don't need to
do runs for different numbers of clusters in different sort orders to
find a number of clusters to retain.
Once you have determined an approximate number of clusters to retain, you
can do a much smaller number of k-means runs to see how they compare in
case assignments.  Since you have only a small number of cases, you
could then run some of the hierarchical methods and inspect how those
cluster assignments compare to what you get from TWOSTEP and K-means.


> Provided that I intend to make use of quantified variables transformed via
> CATPCA, is it a drawback to proceed to K means cluster analysis?



> Finally,
> what are the pros and cons in using factor scores instead of transformed
> variables?
Using ordinary scores (summing those items that load cleanly on a factor
above some cut such as abs(.4)) provides a scoring key that can be
used in other studies.  Using factor scores would be considered by
some to be a fallacy of precision when you are starting with test
items.
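
A minimal sketch of that scoring-key idea, assuming a loading matrix with
one row per item and one column per factor. The loadings, the function
names, and the reverse-scoring of negatively loading items on a 1-5 scale
are illustrative assumptions, not part of the original advice.

```python
def scoring_key(loadings, cut=0.4):
    """Map factor index -> [(item index, sign)] for cleanly loading items."""
    key = {}
    for item, row in enumerate(loadings):
        salient = [f for f, value in enumerate(row) if abs(value) >= cut]
        if len(salient) == 1:  # "clean": salient on exactly one factor
            f = salient[0]
            sign = 1 if row[f] > 0 else -1
            key.setdefault(f, []).append((item, sign))
    return key


def scale_scores(responses, key, low=1, high=5):
    """Sum one respondent's items per factor, reversing negative loaders."""
    scores = {}
    for f, items in key.items():
        scores[f] = sum(
            responses[i] if sign > 0 else low + high - responses[i]
            for i, sign in items
        )
    return scores


# Hypothetical rotated loadings for six items on two factors.
loadings = [
    [0.72, 0.10],
    [0.65, -0.05],
    [0.45, 0.48],   # cross-loads, so it is left out of both scales
    [0.08, 0.81],
    [0.12, -0.66],  # loads negatively, so it is reverse-scored
    [0.20, 0.15],   # loads on nothing, so it is left out
]
key = scoring_key(loadings)
print(scale_scores([4, 5, 3, 2, 4, 1], key))
```

Because the key is just a list of items and signs, the same key can be
applied unchanged to data from another study.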

> In conclusion, do you reject all the options I am referring in
> my initial letter and what for?
> G.S. wrote:
> I would be grateful if you could help me.
>   I have a sample of 300 respondents to whose I addressed a question
> of 20
> items of 5-point (response alternatives) Likert type scale and I want to
> perform a type of cluster analysis procedure. What of the following do
> you
> suggest me to attempt as most reliable?
> 1. K-means cluster analysis (traditional), by considering the items as
> being numerical.

See comments above.
> 2. Hierarchical cluster analysis by using counts through chi-square
> measures between sets of frequencies (in that case should I prepare
> counts
> by myself or are they achieved automatically through SPSS procedure? In
> case I need to prepare counts by myself, how can I do and in what way are
> they appeared in data file?).
Assignments to clusters are provided by all of the clustering procedures
in SPSS.
Crosstabulation would tell you which cases were put together by the
different methods.  The chi-square stats would not be applicable.
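
A minimal sketch of such a crosstabulation, assuming each method's saved
cluster membership is available as a list with one label per case. The
variable names and labels are made up. Cluster numbers are arbitrary, so
cluster 1 in one solution need not correspond to cluster 1 in the other.

```python
from collections import Counter


def crosstab(labels_a, labels_b):
    """Count cases for every (label in A, label in B) combination."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both solutions must cover the same cases")
    return Counter(zip(labels_a, labels_b))


# Hypothetical memberships for six cases from two procedures.
twostep = [1, 1, 2, 2, 3, 3]
kmeans = [2, 2, 1, 1, 1, 3]
table = crosstab(twostep, kmeans)
for (a, b), count in sorted(table.items()):
    print(f"TWOSTEP {a} x K-means {b}: {count} cases")
```

Rows whose cases fall almost entirely into one column indicate clusters
that both methods recovered; rows spread over several columns indicate
disagreement.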

You could find how far each case is from the centroid of its group
by using /SAVE in DISCRIMINANT.
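
A minimal sketch of the underlying idea, computed by hand. It assumes
plain Euclidean distance on the clustering variables, which is not
necessarily the distance an SPSS procedure reports; the function names
and data are hypothetical.

```python
from collections import defaultdict
from math import dist  # Python 3.8+


def cluster_centroids(cases, labels):
    """Mean profile of the cases in each cluster."""
    groups = defaultdict(list)
    for case, label in zip(cases, labels):
        groups[label].append(case)
    return {
        label: [sum(col) / len(members) for col in zip(*members)]
        for label, members in groups.items()
    }


def distance_to_own_centroid(cases, labels):
    """Euclidean distance from each case to its own cluster's centroid."""
    centers = cluster_centroids(cases, labels)
    return [dist(case, centers[label]) for case, label in zip(cases, labels)]


# Three made-up cases on two scale scores, in two clusters.
cases = [[1, 1], [3, 3], [5, 4]]
labels = [1, 1, 2]
print(distance_to_own_centroid(cases, labels))
```

Cases with unusually large distances are the ones that fit their cluster
least well and are worth inspecting.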
> 3. K-means cluster analysis of tranformed items having been saved through
> CATPCA.
Variables created via transformations using a key derived from CATPCA
(or PAF) runs are one of the inputs you could use in any of the clusterings.

> Thanks
> George S.

Art Kendall
Social Research Consultants

Re: K means cluster analysis with Likert type items

Nancy Darling-2
I have been puzzled by the responses too.   A cluster analysis provides
a fundamentally different approach to data analysis than a factor
analysis (case-centered rather than variable-centered, if you use that
terminology).  Like the writer, I couldn't figure out from the
recommendation for factor analysis if the issue was that you wanted to
factor first, then cluster, because there were too many variables to get
a good solution, whether you thought factor analysis provided a better
approach to data summary, or some other reason.

I agree with an earlier poster that latent class analysis is most
conceptually similar to cluster analysis.  LCA also groups individuals
or cases by similarity of responses, rather than creating latent
variables based on similarity of responses across variables.

The original poster also seems very concerned with the fact that they
have Likert items rather than strictly ratio items.  This doesn't bother
me in particular, if they are normally distributed or can be transformed
to be and act like linear variables.  Factored or clustered, that is
important.  For LCA, on the other hand, dichotomous variables are
easiest, although you can do categorical or linear.  To my knowledge, an
LCA module has not yet been introduced in SPSS.  You can run the
standalone program (download from the Penn State Methodology website -
it's a bear to make work) or use the new LCA module in SAS.



Re: K means cluster analysis with Likert type items

Zetu, Dan
I will go ahead and agree with Nancy here. I've done countless cluster
analyses using Likert data treated as numerical variables with very good
practical results. I may not be 100% correct in this statement, but my
understanding is that correlation between variables is of less
importance in clustering of cases, since clustering is done based on
relationship between cases rather than variables (there is also the case
when one wants to cluster variables rather than cases, but this is more
equivalent to factor analysis and it is beyond the scope of this
discussion).

On the other hand, I would recommend starting with hierarchical
clustering, establishing the optimal number of clusters and calculating
seeds, and then feeding this information to a k-means algorithm. I am not
sure whether 2-step clustering in SPSS does exactly that, as I tend to
favor SAS over SPSS for cluster analysis and I am not too familiar
with SPSS clustering capabilities.
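
A minimal sketch of the second half of that workflow: a plain k-means
(Lloyd's algorithm) that starts from supplied seeds instead of random
ones. The seeds are assumed to be the cluster means from a hierarchical
solution, which is not computed here; the function name and data are
hypothetical, and this is not the SAS or SPSS implementation.

```python
from math import dist  # Python 3.8+


def kmeans_from_seeds(cases, seeds, max_iter=100):
    """Run k-means starting from the given seed centres."""
    centers = [list(seed) for seed in seeds]
    labels = None
    for _ in range(max_iter):
        # Assign every case to its nearest centre.
        new_labels = [
            min(range(len(centers)), key=lambda k: dist(case, centers[k]))
            for case in cases
        ]
        if new_labels == labels:  # assignments stable, so stop
            break
        labels = new_labels
        # Move each centre to the mean of its current members.
        for k in range(len(centers)):
            members = [c for c, lab in zip(cases, labels) if lab == k]
            if members:  # an empty cluster keeps its previous centre
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Six made-up cases and two seeds taken as given.
cases = [[1, 1], [1, 2], [2, 1], [5, 5], [5, 4], [4, 5]]
labels, centers = kmeans_from_seeds(cases, seeds=[[1, 1], [5, 5]])
print(labels, centers)
```

Fixing the seeds removes the dependence on random starts, so the k-means
result is reproducible and tied to the hierarchical solution.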

-------------------------------
Dan Zetu
Analytical Consultant
R. L. Polk & Co.
248-728-7278
[hidden email]


Re: K means cluster analysis with Likert type items

Art Kendall
In reply to this post by Nancy Darling-2
Without further information on the nature of the variables and the
nature of the set of cases, I made the assumption that the purpose was
to find sets of cases that made sense in terms of the constructs
underlying the selection of Likert items and the set of cases.

Most clustering algorithms assume that the variables in the profile are
not too highly correlated.  Some implementations of LCA have been
extended to deal with highly similar items.  Factoring to find a set of
variables that is meaningful and then finding clusters based on those
variables has been *one* of the most prominent approaches since I
started using clustering in the early 70's.

A set of profiles with 20 variables would be a bear to try to interpret,
especially if the OP decided to retain many clusters.  It would be even
more difficult if the variables were similar in meaning. The whole
Likert approach was developed because we have more faith that the sum of
items is a more reliable and valid measure of a construct than any
single item is. I was also considering that in most instances Likert
items are sufficiently close to interval level to treat their sums as
continuous for practical purposes. When you have Likert items, the FA is
used to figure out how to combine them into scales (or whether they cohere
in defining a construct).
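
As a rough illustration of the scoring-key idea, here is a minimal Python
sketch. The item names, the key, and the responses are all made up for
illustration; a real key would list the items that load cleanly on a
single factor in your own PAF or CATPCA run.

```python
# Hypothetical scoring key: scale name -> items that loaded cleanly on one factor.
# Item names and groupings are invented for illustration only.
SCORING_KEY = {
    "scale_a": ["item01", "item02", "item03"],
    "scale_b": ["item04", "item05"],
}


def score_case(responses, key=SCORING_KEY):
    """Sum one respondent's 1-5 Likert responses into scale scores."""
    return {scale: sum(responses[item] for item in items)
            for scale, items in key.items()}


# One made-up respondent (1 = strongly disagree ... 5 = strongly agree).
respondent = {"item01": 4, "item02": 5, "item03": 3, "item04": 2, "item05": 1}
scores = score_case(respondent)
print(scores)
```

The resulting scale scores, rather than the 20 separate items, would then
be the profile variables handed to the clustering procedure.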

Although SPSS has no procedure that is exactly LCA, there are
many coefficients available in PROXIMITIES for dichotomous data.
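
To give a feel for what such coefficients compute, here is a small Python
sketch of two common ones, simple matching and Jaccard. The cut point used
to dichotomize the Likert responses (4 or 5 counts as "agree") and the two
response vectors are assumptions made up for this example, and the helper
names are mine, not SPSS keywords.

```python
def dichotomize(responses, cut=4):
    """Recode 1-5 Likert responses to 1 (at or above the cut) or 0."""
    return [1 if r >= cut else 0 for r in responses]


def binary_counts(x, y):
    """Return (a, b, c, d): both 1, only x is 1, only y is 1, both 0."""
    a = sum(1 for i, j in zip(x, y) if i == 1 and j == 1)
    b = sum(1 for i, j in zip(x, y) if i == 1 and j == 0)
    c = sum(1 for i, j in zip(x, y) if i == 0 and j == 1)
    d = sum(1 for i, j in zip(x, y) if i == 0 and j == 0)
    return a, b, c, d


def simple_matching(x, y):
    """Share of items on which the two cases agree (joint absences count)."""
    a, b, c, d = binary_counts(x, y)
    return (a + d) / (a + b + c + d)


def jaccard(x, y):
    """Like simple matching, but joint absences are ignored."""
    a, b, c, _ = binary_counts(x, y)
    # Convention chosen here: return 0.0 when neither case has any 1s.
    return a / (a + b + c) if (a + b + c) else 0.0


case_1 = dichotomize([5, 4, 2, 3, 4])  # -> [1, 1, 0, 0, 1]
case_2 = dichotomize([4, 3, 1, 5, 5])  # -> [1, 0, 0, 1, 1]
print(simple_matching(case_1, case_2), jaccard(case_1, case_2))
```

Whether joint absences should count as agreement is a substantive choice
that depends on what a "0" means for your items.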

Art Kendall
Social Research Consultants


Nancy Darling wrote:

> I have been puzzled by the responses too.   A cluster analysis
> provides a fundamentally different approach to data analysis than a
> factor analysis (case-centered rather than variable-centered, if you
> use that terminology).  Like the writer, I couldn't figure out from
> the recommendation for factor analysis if the issue was that you
> wanted to factor first, then cluster, because there were too many
> variables to get a good solution, whether you thought factor analysis
> provided a better approach to data summary, or some other reason.
>
> I agree with an earlier poster that latent class analysis is most
> conceptually similar to cluster analysis.  LCA also groups individuals
> or cases by similarity of responses rather than creating latent
> variables based on similarity of responses across variables.
>
> The original poster also seems very concerned with the fact that they
> have Likert items rather than strictly ratio items.  This doesn't
> bother me in particular, if they are normally distributed or can be
> transformed to be and act like linear variables.  Factored or
> clustered, that is important.  For LCA, on the other hand, dichotomous
> variables are easiest, although you can do categorical or linear.  To
> my knowledge, an LCA module has not yet been introduced in SPSS.  You
> can run the standalone program (download from the Penn State
> Methodology website - it's a bear to make work) or use the new LCA
> module in SAS.

>
>
> Art Kendall wrote:
>> Overall, it never hurts to try the different approaches and
>> compare and contrast the results.
>> Of course, it makes a difference how many constructs you selected your
>> set of items to measure, or whether you inherited them, or what.
>>
>> See the interspersed responses all the way through, including the
>> original post.
>>
>> George Siardos wrote:
>>> Dear Dr. Kendall,
>>>
>>> Thank you very much for your help in giving me detailed instructions.
>>> However, I would like to raise a few more questions. Why do you
>>> suggest that I start with factor analysis, since it is not considered
>>> a proper technique for handling ordinal variables such as Likert-type
>>> items?
>> There was a lot of work in the late 60's and early 70's showing that
>> Likert items (sd-d-n-a-sa) are frequently *not severely discrepant* from
>> *interval* level of measurement.
>> I was suggesting that you start with the customary/conventional approach
>> and then the newer approach (CATPCA).  CATPCA will show you how items
>> group together under different assumptions about the level of measurement.
>>
>>
>>> Still, why should
>>> I forget about K-means cluster analysis in favor of two-step cluster
>>> analysis? Is there any benefit in performance of one technique over
>>> the other?
>>>
>> "Forget" maybe was a little too strong.   START with TWOSTEP.
>> TWOSTEP has options for treating the data as nominal or as continuous
>> on different runs.  It also gives you "goodness" measures for a range
>> of numbers of clusters.
>> It is much less subject to the way cases are sorted, so you don't need to
>> do runs for different numbers of clusters in different sort orders to
>> find a number of clusters to retain.
>> Once you have determined an approximate number of clusters to retain, you
>> can do a much smaller number of k-means runs to see how they compare in
>> case assignments.  Since you have only a small number of cases, you
>> could then run some of the hierarchical methods and inspect how those
>> cluster assignments compare to what you get from TWOSTEP and K-means.
>>
>>
>>> Provided that I intend to make use of quantified variables
>>> transformed via CATPCA, is it a drawback to proceed to K-means
>>> cluster analysis?
>>
>>> Finally, what are the pros and cons of using factor scores instead
>>> of transformed variables?
>> Using ordinary scores (summing those items that load cleanly on a factor
>> above some cut such as abs(.4)) provides a scoring key that can be
>> used in other studies.    Using factor scores would be considered by
>> some as committing the fallacy of precision when you are starting with
>> test items.
>>
>>> In conclusion, do you reject all the options I referred to in
>>> my initial letter, and why?
>>>
>>> Sincerely yours
>>>


Re: K means cluster analysis with Likert type items

Art Kendall
In reply to this post by Zetu, Dan
TWOSTEP does a hierarchical clustering as the first step and goes back
and refines it.
In SPSS it is the quickest way to see how each number of clusters in a
range works out, e.g., if one used 2 to 15 clusters.
The AIC and/or BIC gives a good ballpark for the number of clusters to
inspect to determine how many to retain.
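
For readers who have not worked with these criteria, the sketch below
shows the general textbook definitions of AIC and BIC and how one might
scan a range of cluster counts for the smallest value. It is not meant to
reproduce TWOSTEP's internal computation. The log-likelihoods, the number
of parameters per cluster, and the sample size are invented inputs.

```python
import math


def aic(loglik, n_params):
    """Akaike information criterion: -2 log L + 2 m."""
    return -2.0 * loglik + 2.0 * n_params


def bic(loglik, n_params, n_cases):
    """Bayesian information criterion: -2 log L + m ln(N)."""
    return -2.0 * loglik + n_params * math.log(n_cases)


# Invented log-likelihoods for solutions with 2 to 5 clusters.
fits = {2: -1450.0, 3: -1380.0, 4: -1368.0, 5: -1362.0}
PARAMS_PER_CLUSTER = 4  # assumed, e.g. a mean and a variance on two scales
N_CASES = 300

bic_table = {k: bic(ll, PARAMS_PER_CLUSTER * k, N_CASES)
             for k, ll in fits.items()}
best_k = min(bic_table, key=bic_table.get)

for k in sorted(bic_table):
    print(k, round(bic_table[k], 2))
print("smallest BIC at k =", best_k)
```

With these made-up numbers the 3- and 4-cluster solutions come out almost
tied, which is the usual situation: the criterion narrows the range to
inspect, and interpretability decides among the candidates.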

TWOSTEP is not as sensitive to the order of cases within the file as
k-means. It is often useful to compare the membership assignments
from TWOSTEP with the memberships assigned by a few runs of QUICK
CLUSTER (k-means) with the cases randomly sorted on different random
number variables.
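
The random-sort comparison can be sketched outside SPSS as well. The
Python below is a bare-bones k-means, not QUICK CLUSTER itself: as a
simplifying assumption it takes the first k cases in file order as the
starting centers, which is what makes it order-dependent. The data are
made-up scale scores, and the function names are mine.

```python
import random
from collections import Counter


def nearest(case, centers):
    """Index of the center with the smallest squared Euclidean distance."""
    return min(range(len(centers)),
               key=lambda j: sum((x - m) ** 2 for x, m in zip(case, centers[j])))


def kmeans(cases, k, max_iter=50):
    """Plain k-means; the first k cases in file order seed the centers."""
    centers = [list(c) for c in cases[:k]]
    labels = None
    for _ in range(max_iter):
        new_labels = [nearest(c, centers) for c in cases]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for j in range(k):
            members = [c for c, lab in zip(cases, labels) if lab == j]
            if members:  # keep the old center if a cluster empties out
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def run_in_random_order(cases, k, seed):
    """Shuffle the cases, cluster, and return labels in original case order."""
    order = list(range(len(cases)))
    random.Random(seed).shuffle(order)
    shuffled_labels = kmeans([cases[i] for i in order], k)
    labels = [None] * len(cases)
    for pos, i in enumerate(order):
        labels[i] = shuffled_labels[pos]
    return labels


def crosstab(labels_a, labels_b):
    """Counts of cases for each (cluster in run A, cluster in run B) pair."""
    return Counter(zip(labels_a, labels_b))


# Made-up scores on two scales for 12 cases in three loose groups.
cases = [(3, 4), (4, 3), (3, 3), (4, 4),
         (12, 13), (13, 12), (12, 12), (13, 13),
         (3, 13), (4, 12), (3, 12), (4, 13)]

reference = run_in_random_order(cases, k=3, seed=1)
for seed in (2, 3, 4):
    other = run_in_random_order(cases, k=3, seed=seed)
    print(seed, sorted(crosstab(reference, other).items()))
```

Cluster numbers are arbitrary from run to run, so agreement shows up as
one large cell per row of the crosstab rather than as a filled diagonal.
Rows that split across several columns flag cases whose assignment
depends on the sort order.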

Art Kendall
Social Research Consultants

Zetu, Dan wrote:

> I will go ahead and agree with Nancy here. I've done countless cluster
> analyses using Likert data treated as numerical variables with very good
> practical results. I may not be 100% correct in this statement, but my
> understanding is that correlation between variables is of less
> importance in clustering of cases, since clustering is done based on
> relationships between cases rather than variables (there is also the case
> when one wants to cluster variables rather than cases, but this is more
> nearly equivalent to factor analysis and is beyond the scope of this
> discussion).
>
> On the other hand, I would recommend starting with hierarchical
> clustering, establishing the optimal number of clusters and calculating
> seeds, and then feeding this information to a k-means algorithm. I am not
> sure whether 2-step clustering in SPSS does exactly that, as I tend to
> favor SAS over SPSS for cluster analysis and am not too familiar
> with SPSS clustering capabilities.
>
> -------------------------------
> Dan Zetu
> Analytical Consultant
> R. L. Polk & Co.
> 248-728-7278
> [hidden email]
>
> -----Original Message-----
> From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
> Nancy Darling
> Sent: Tuesday, December 16, 2008 10:07 AM
> To: [hidden email]
> Subject: Re: K means cluster analysis with Likert type items
>
> I have been puzzled by the responses too.   A cluster analysis provides
> a fundamentally different approach to data analysis than a factor
> analysis (case-centered rather than variable-centered, if you use that
> terminology).  Like the writer, I couldn't figure out from the
> recommendation for factor analysis if the issue was that you wanted to
> factor first, then cluster, because there were too many variables to get
> a good solution, whether you thought factor analysis provided a better
> approach to data summary, or some other reason.
>
> I agree with an earlier poster that latent class analysis is most
> conceptually similar to cluster analysis.  LCA also groups individuals
> or cases by similarity of responses rather than creating latent variables
> based on similarity of responses across variables.
>
> The original poster also seems very concerned with the fact that they
> have Likert items rather than strictly ratio items.  This doesn't bother
> me in particular, if they are normally distributed or can be transformed
> to be and act like linear variables.  Factored or clustered, that is
> important.  For LCA, on the other hand, dichotomous variables are
> easiest, although you can do categorical or linear.  To my knowledge, an
> LCA module has not yet been introduced in SPSS.  You can run the
> standalone program (download from the Penn State Methodology website -
> it's a bear to make work) or use the new LCA module in SAS.


Re: K means cluster analysis with Likert type items

John Fiedler
Two years ago, at the Sawtooth Software Conference, I proposed a measure of
"cluster integrity."
It begins on page 25 of the proceedings.
http://www.sawtoothsoftware.com/download/techpap/2006Proceedings.pdf
I would be interested in any reactions or comments.
JOHN


----- Original Message -----
From: "Art Kendall" <[hidden email]>
To: <[hidden email]>
Sent: Tuesday, December 16, 2008 10:01 AM
Subject: Re: K means cluster analysis with Likert type items


> TWOSTEP does a hierarchical clustering as the first step and goes back
> and refines it.
> In SPSS it is the quickest way to see how each number of clusters in a
> range works out, e.g., if one used 2 to 15 clusters.
> The AIC and/or BIC gives a good ballpark for the number of clusters to
> inspect to determine how many to retain.
>
> TWOSTEP is not as sensitive to the order of cases within the file as
> k-means. It is often useful to compare the membership assignments
> from TWOSTEP with the memberships assigned by a few runs of QUICK
> CLUSTER (k-means) with the cases randomly sorted on different random
> number variables.
>
> Art Kendall
> Social Research Consultants
>
