k-means cluster analysis

Pirritano, Matthew
Can someone please tell me why I get different results every time I do a k-means cluster analysis? Is there some logic to this? It seems to be related to how my data is sorted. Any help would be much appreciated.

Thanks,

Matthew Pirritano, Ph.D.
Assistant Professor of Psychology
Smith Hall 116C
Chapman University
Department of Psychology
One University Drive
Orange, CA 92866
Telephone (714)744-7940
FAX (714)997-6780

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Re: k-means cluster analysis

Spousta Jan
Matthew,

It really depends on the initial sorting (or, more precisely, on the "seeds", which are usually selected from the first few cases of the data file). This is a property of the algorithm, not anything specific to SPSS. (See http://en.wikipedia.org/wiki/K-means if you want to see how it works.)
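To make this concrete, here is a minimal sketch that assumes the simplest possible seeding rule: the first k cases in file order become the initial centers, followed by ordinary Lloyd-style k-means updates. As I understand it, SPSS's own rule for choosing initial centers is more involved than this. Treat `kmeans_first_k` (a hypothetical name) as an illustration of the principle, not a reimplementation of QUICK CLUSTER. In the example, the same six 1-D values are fed in two different orders and converge to two different solutions:

```python
def nearest(x, centers):
    """Index of the center closest to x (ties go to the lower index)."""
    return min(range(len(centers)), key=lambda j: abs(x - centers[j]))


def kmeans_first_k(points, k, max_iter=100):
    """Lloyd-style k-means on 1-D data, seeded with the first k cases in file order."""
    centers = [float(x) for x in points[:k]]  # the seeds depend on how the file is sorted
    for _ in range(max_iter):
        labels = [nearest(x, centers) for x in points]
        new_centers = []
        for j in range(k):
            members = [x for x, lab in zip(points, labels) if lab == j]
            new_centers.append(sum(members) / len(members) if members else centers[j])
        if new_centers == centers:  # no center moved: converged
            break
        centers = new_centers
    labels = [nearest(x, centers) for x in points]
    sse = sum((x - centers[lab]) ** 2 for x, lab in zip(points, labels))
    return sorted(centers), sse


sorted_file = [0, 1, 10, 11, 20, 21]       # the same six cases ...
interleaved_file = [0, 10, 20, 1, 11, 21]  # ... in a different order
print(kmeans_first_k(sorted_file, 3))       # ([0.0, 1.0, 15.5], 101.0)
print(kmeans_first_k(interleaved_file, 3))  # ([0.5, 10.5, 20.5], 1.5)
```

When the file is sorted by value, the three seeds are 0, 1 and 10. Two of the three clusters therefore start at the low end, and the run stops in a poor local optimum with a within-cluster sum of squares of 101. The interleaved order seeds one case in each natural group and finds the obvious three-group solution, with a sum of squares of 1.5.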

See also this comment from Paul Dickson: http://listserv.uga.edu/cgi-bin/wa?A2=ind0612&L=spssx-l&P=R2534

HTH

Jan



_____

This message and any attached files are confidential and intended solely for the addressee(s). Any publication, transmission or other use of the information by a person or entity other than the intended addressee is prohibited. If you receive this in error please contact the sender and delete the message as well as all attached documents. The sender does not accept liability for any errors or omissions as a result of the transmission.


Re: k-means cluster analysis

Pirritano, Matthew
Jan,

Thanks for the help. Of course, this makes the situation all the more confusing. How do you go through multiple versions of the clusters and visually assess their similarity, especially since in this case I have 13 variables?

How about this: whenever I run Ward's method I get three very clear clusters. Is there some way to get SPSS to start the k-means with the cases ordered as they are in the Ward's clusters? Would it be sufficient to start k-means by sorting the cases on the saved cluster membership from Ward's?

I know that's a lot, but I'm obsessive (and compulsive) and am supposed to have this done today.

Thanks,
Matt


Re: k-means cluster analysis

Anthony Babinec
Read the QUICK CLUSTER documentation in Help. You can specify initial
cluster centers with the /INITIAL subcommand (see the syntax below). These
can be the final cluster centers (the cluster means) from the Ward's solution.

QUICK CLUSTER {varlist}
              {ALL    }
 [/MISSING=[{LISTWISE**}] [INCLUDE]]
            {PAIRWISE  }
            {DEFAULT   }
 [/FILE='savfile'|'dataset']
 [/INITIAL=(value list)]
 [/CRITERIA=[CLUSTER({2**})][NOINITIAL][MXITER({10**})] [CONVERGE({0**})]]
                     {n  }                     {n   }             {n  }
 [/METHOD=[{KMEANS[(NOUPDATE)]**}]]
           {KMEANS(UPDATE)      }
           {CLASSIFY            }
 [/PRINT=[INITIAL**] [CLUSTER] [ID(varname)] [DISTANCE] [ANOVA] [NONE]]
 [/OUTFILE='savfile'|'dataset']
 [/SAVE=[CLUSTER[(varname)]] [DISTANCE[(varname)]]]

 ** Default if the subcommand or keyword is omitted.
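As a rough illustration of Anthony's suggestion, the sketch below computes the Ward's cluster means from a saved membership variable and prints them as a value list for /INITIAL. It assumes, based on my reading of the QUICK CLUSTER reference, that /INITIAL takes the values cluster by cluster: all variables for cluster 1, then cluster 2, and so on, each in the order of the QUICK CLUSTER variable list. Check that against the documentation for your version. The function names and toy data are hypothetical. In SPSS itself, AGGREGATE with the membership variable as the break variable gives the same means.

```python
from collections import defaultdict


def centers_from_membership(cases, labels):
    """Mean of every variable within each cluster, keyed by cluster number.

    cases  -- one tuple of variable values per case (same order as the varlist)
    labels -- the saved cluster membership for each case (e.g., from Ward's)
    """
    n_vars = len(cases[0])
    sums = defaultdict(lambda: [0.0] * n_vars)
    counts = defaultdict(int)
    for row, lab in zip(cases, labels):
        counts[lab] += 1
        for i, value in enumerate(row):
            sums[lab][i] += value
    return {lab: [s / counts[lab] for s in sums[lab]] for lab in sorted(counts)}


def initial_subcommand(centers):
    """Format the centers cluster by cluster (assumed /INITIAL ordering)."""
    values = [f"{v:.10g}" for lab in sorted(centers) for v in centers[lab]]
    return "/INITIAL=(" + " ".join(values) + ")"


# Two variables, four cases, two Ward's clusters (toy values for illustration).
cases = [(1.0, 2.0), (3.0, 4.0), (10.0, 10.0), (12.0, 14.0)]
ward_membership = [1, 1, 2, 2]
centers = centers_from_membership(cases, ward_membership)
print(initial_subcommand(centers))  # /INITIAL=(2 3 11 12)
```

With Matt's 13 variables and three Ward's clusters, the value list would hold 39 numbers.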

Anthony Babinec
[hidden email]


Re: k-means cluster analysis

Matthew Pirritano
So now I'm using Ward's method to create my starting cluster centers and k-means to calculate my final cluster centers. Can anyone point me to any references that explain why this makes sense, if it does? Why not just report the Ward's centers?

Thanks,
Matt
 

Re: k-means cluster analysis

David Hitchin

Others have explained why this is the case, but it is a useful reminder
that many clustering algorithms do not produce unique results from the
same data set, and that even the same algorithm starting from different
random numbers will get different answers. Small changes to the data
(e.g., adding or removing a few cases) may make a lot of difference to
the final results.
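One standard way to live with this is to run k-means from several random starting configurations and compare the results. This is a common practice, not something prescribed in this message. The sketch below assumes 1-D data and plain Lloyd updates. For each run it picks k distinct cases at random as seeds. It then reports the best run together with every distinct within-cluster sum of squares it saw. If more than one value appears, the runs ended in different solutions. `lloyd` and `best_of_random_starts` are hypothetical helper names.

```python
import random


def lloyd(points, centers, max_iter=100):
    """Plain k-means (Lloyd) on 1-D data from the given starting centers."""
    centers = [float(c) for c in centers]
    for _ in range(max_iter):
        groups = [[] for _ in centers]
        for x in points:
            # Assign each case to its nearest current center.
            j = min(range(len(centers)), key=lambda i: abs(x - centers[i]))
            groups[j].append(x)
        # Move each center to the mean of its members (keep it if the cluster emptied).
        new_centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
        if new_centers == centers:
            break
        centers = new_centers
    sse = sum(min((x - c) ** 2 for c in centers) for x in points)
    return sorted(centers), sse


def best_of_random_starts(points, k, n_starts=20, seed=0):
    """Run k-means from n_starts random sets of k distinct cases."""
    rng = random.Random(seed)  # fixed seed so the experiment itself is repeatable
    runs = [lloyd(points, rng.sample(points, k)) for _ in range(n_starts)]
    best = min(runs, key=lambda run: run[1])
    distinct_sse = sorted({round(sse, 6) for _, sse in runs})
    return best, distinct_sse


best, distinct_sse = best_of_random_starts([0, 1, 10, 11, 20, 21], k=3)
print("best centers and SSE:", best)
print("distinct SSE values across starts:", distinct_sse)
```

Keeping the lowest sum of squares picks the best of the solutions found. It does not guarantee the global optimum, and it does nothing about the sensitivity to small changes in the data described above.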

Clustering methods often provide useful ideas about how the data values
MIGHT be grouped, but it is unwise to assume that they will always give
unique, clear, objective answers to a question about how the data points
are related.

David Hitchin


Re: k-means cluster analysis

Steve Peck
Hi Matt

The following is a great book for anyone trying to come to terms with
thinking in terms of 'patterns of values across variables for each unit,
e.g., person' (as opposed to 'patterns of values across units, e.g.,
people, for each variable'):

Bergman, L. R., Magnusson, D., & El-Khouri, B. M. (2003). /Studying
individual development in an interindividual context: A person-oriented
approach./ Mahwah, NJ: Lawrence Erlbaum Associates. [Paths through life,
volume 4, D. Magnusson (Ed.)]

In short, Ward's method is great for determining the range and nature of
viable profile solutions within a sample (e.g., the 4-cluster solution
vs. the 9-cluster solution). However, because it's hierarchical (once two
clusters are merged, their members are never separated again),
individuals often get sorted into a profile that later (so to speak)
turns out not to match their individual profile as well as some other
profile does. Using Ward's, then, we can select the best set of
profiles. Using k-means with start values derived from our selected
Ward's solution, we can then be sure that each individual is placed
into the profile group that best matches their individual profile,
i.e., the group whose center is closest.
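The Ward's-then-k-means strategy described here can be sketched in a few dozen lines. This is a minimal illustration under stated assumptions, not the code behind SPSS's CLUSTER or QUICK CLUSTER. A naive Ward's agglomeration repeatedly merges the pair of clusters whose union raises the within-cluster sum of squares least. It stops at k clusters, and their means then seed plain k-means. Because k-means cluster j starts from Ward's cluster j, the labels line up, so the cases that k-means reassigns can be listed directly. All function names and the toy data are hypothetical.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(rows):
    return tuple(sum(col) / len(rows) for col in zip(*rows))


def ward(points, k):
    """Naive agglomerative Ward clustering down to k clusters; returns labels 0..k-1."""
    clusters = [[i] for i in range(len(points))]  # each case starts alone
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                na, nb = len(clusters[a]), len(clusters[b])
                ma = mean([points[i] for i in clusters[a]])
                mb = mean([points[i] for i in clusters[b]])
                # Increase in within-cluster sum of squares caused by merging a and b.
                cost = na * nb / (na + nb) * sq_dist(ma, mb)
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])  # merged cases are never separated again
        del clusters[b]                  # b > a, so index a is unaffected
    labels = [0] * len(points)
    for j, members in enumerate(clusters):
        for i in members:
            labels[i] = j
    return labels


def kmeans(points, centers, max_iter=100):
    """Plain k-means from the given starting centers; cluster j starts at centers[j]."""
    def assign(cs):
        return [min(range(len(cs)), key=lambda j: sq_dist(p, cs[j])) for p in points]

    for _ in range(max_iter):
        labels = assign(centers)
        new = [mean([p for p, lab in zip(points, labels) if lab == j]) if j in labels
               else centers[j] for j in range(len(centers))]
        if new == centers:
            break
        centers = new
    return assign(centers)


def sse(points, labels):
    """Within-cluster sum of squared Euclidean distances to the cluster means."""
    total = 0.0
    for j in set(labels):
        rows = [p for p, lab in zip(points, labels) if lab == j]
        c = mean(rows)
        total += sum(sq_dist(p, c) for p in rows)
    return total


points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5), (2.5, 2.5)]
ward_labels = ward(points, 2)
seeds = [mean([p for p, lab in zip(points, ward_labels) if lab == j]) for j in range(2)]
km_labels = kmeans(points, seeds)
moved = [i for i, (w, m) in enumerate(zip(ward_labels, km_labels)) if w != m]
print("Ward's SSE:", sse(points, ward_labels), "k-means SSE:", sse(points, km_labels))
print("cases reassigned by k-means:", moved)
```

The k-means step starts from the Ward's partition and each of its steps can only lower (or keep) the within-cluster sum of squares. The refined solution is therefore never worse than the Ward's solution on that criterion, which is one way to read the argument above.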

Steve

--
Stephen C. Peck
Research Investigator
Achievement Research Lab
Research Center for Group Dynamics
Institute for Social Research
University of Michigan
426 Thompson Street, # 5136
Ann Arbor, MI  48106-1248
(734) 647-3683; fax (734) 936-7370
http://www.rcgd.isr.umich.edu/garp/
[hidden email]


Re: k-means cluster analysis

Art Kendall
I don't have any references at hand.

> Why not just report the Ward's centers?

Because any cluster solution is just a good guess. One approach is to
treat the consensus of several methods as more reasonable.

Since the early '70s, I have used a few proximity measures and a few
agglomeration methods and saved the membership variables from each
solution. Using CROSSTABS, I then found sets of cases that were
clustered together by several methods. I called these "core clusters".
Of course, that meant some cases were left ungrouped. I then iteratively
used discriminant function analysis (DFA), with the probabilities of
membership from its classification phase. At the end of each iteration,
cases that were far from their cluster centroid, and cases that were not
clearly assigned to one cluster, were treated as ungrouped in the next DFA.
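A simplified version of the first step described here, cross-tabulating saved membership variables to find cases that several methods keep together, is sketched below. The simplification is deliberate. A case joins a core cluster only if every method agrees, meaning its full pattern of memberships falls in a crosstab cell holding at least `min_size` cases. The later discriminant-analysis iterations are not shown. `core_clusters` and the membership vectors are hypothetical. Keying on the whole pattern of labels gets around the fact that different methods number the same cluster differently.

```python
from collections import Counter


def core_clusters(memberships, min_size=2):
    """memberships: one list of saved cluster labels per method, same cases in each.

    Returns a core-cluster number for each case, or None if the case is left ungrouped.
    """
    patterns = list(zip(*memberships))  # each case's labels across all methods
    cell_counts = Counter(patterns)     # the cells of the multi-way crosstab
    core_ids = {}
    result = []
    for pattern in patterns:
        if cell_counts[pattern] >= min_size:
            if pattern not in core_ids:
                core_ids[pattern] = len(core_ids) + 1
            result.append(core_ids[pattern])
        else:
            result.append(None)
    return result


# Hypothetical memberships for seven cases from three methods.
ward = [1, 1, 1, 2, 2, 2, 3]
average_linkage = [3, 3, 3, 1, 1, 2, 2]
kmeans_labels = [2, 2, 1, 3, 3, 3, 1]
print(core_clusters([ward, average_linkage, kmeans_labels]))
# [1, 1, None, 2, 2, None, None]
```

In SPSS itself, CROSSTABS on the saved membership variables shows the same cell counts. Relaxing "every method agrees" to "most methods agree" would need a different grouping rule than the exact-pattern match used here.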

These days I would try using the cluster assignments as nominal-level
variables as input to TWOSTEP to find the core clusters.

In the '50s, Lorr's method assumed that there were some "pure types",
but that some cases might not fit into the typology and some cases would
be a mix of "types".

Of course, a lot depends on the nature of your data.

Art Kendall
Social

Matthew Pirritano wrote:

> So now I'm using Ward's to create my starting cluster centers and k-means to calculate my final cluster centers. Can anyone point me to any references that explain why this make sense? If it does. Why not just report the Wards centers?
>
> Thanks,
> Matt
>
> Matthew Pirritano, Ph.D.
> Assistant Professor of Psychology
> Smith Hall 116C
> Chapman University
> Department of Psychology
> One University Drive
> Orange, CA 92866
> Telephone (714)744-7940
> FAX (714)997-6780
>
> ----- Original Message ----
> From: Anthony Babinec <[hidden email]>
> To: [hidden email]
> Sent: Monday, November 19, 2007 4:15:37 AM
> Subject: Re: k-means cluster analysis
>
> Read the Quick Cluster documentation in Help. You can specify initial
> cluster centers using the /INITIAL subcommand—see the syntax below.
>  These
> can be the final cluster center from the Ward method.
>
> QUICK CLUSTER {varlist}
> {ALL }
> [/MISSING=[{LISTWISE**}] [INCLUDE]]
> {PAIRWISE }
> {DEFAULT }
> [/FILE='savfile'|'dataset']
> [/INITIAL=(value list)]
> [/CRITERIA=[CLUSTER({2**})][NOINITIAL][MXITER({10**})]
>  [CONVERGE({0**})]]
> {n } {n } {n }
> [/METHOD=[{KMEANS[(NOUPDATE)]**}]
> {KMEANS(UPDATE)} }
> {CLASSIFY }
> [/PRINT=[INITIAL**] [CLUSTER] [ID(varname)] [DISTANCE] [ANOVA] [NONE]]
> [/OUTFILE='savfile'|'dataset']
> [/SAVE=[CLUSTER[(varname)]] [DISTANCE[(varname)]]]
>
> Anthony Babinec
> [hidden email]
>
> -----Original Message-----
> From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf
>  Of
> Pirritano, Matthew
> Sent: Monday, November 19, 2007 5:01 AM
> To: [hidden email]
> Subject: Re: k-means cluster analysis
>
> Jan,
>
> Thanks for the help. Of course this makes the situation just that more
> confusing. How do you go through multiple versions of the clusters and
> visually assess they similarity? Especially because in this case I have
>  13
> variables.
>
> How about this. Whenever I run Ward's method I get three very clear
> clusters. Is there some way to get SPSS to start the k-means with the
>  cases
> ordered as they are in the Ward's clusters? Would it be sufficient to
>  start
> k-means out by having the cases sorted on the saved cluster membership
>  from
> Ward's.
>
> Know that's a lot, but I'm obsessive (and compulsive) and am supposed
>  to
> have this done today.
>
> Thanks,
> Matt
Art Kendall
Social Research Consultants

Rescaling/Transforming a Single Ordinal Variable

John Fiedler
In reply to this post by Steve Peck
I have more than a million observations on a single ordinal variable. Given
the categories, it is not even close to being an interval variable. I know that
CATPCA can transform several such variables. Can it be used for a single
variable? I have yet to figure out how.

Thanks,

John A Fiedler
Oreon Inc.
195 Wilderness Way
Boise ID 83816-3383


Re: Rescaling/Transforming a Single Ordinal Variable

Hector Maletta
John,

CATPCA assigns interval-scale values to ordinal and nominal variables as a
by-product of a factor analysis of several inter-correlated variables. The
values assigned are designed to optimize the estimation of the underlying
factors. Thus I do not think you can do it for a single variable.

Hector

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
John Fiedler
Sent: 19 November 2007 16:49
To: [hidden email]
Subject: Rescaling/Transforming a Single Ordinal Variable

I have more than a million observations on a single ordinal variable. Given
the categories, it is not even close to being an interval variable. I know that
CATPCA can transform several such variables. Can it be used for a single
variable? I have yet to figure out how.

Thanks,

John A Fiedler
Oreon Inc.
195 Wilderness Way
Boise ID 83816-3383


Re: k-means cluster analysis

Wim Beyers
In reply to this post by Matthew Pirritano
Dear Matt,

Good references explaining why Ward's final cluster centers are used as
starting values for k-means clustering are:

Gore, P. A., Jr. (2000). Cluster analysis. In H. E. A. Tinsley & S. D. Brown
(Eds.), Handbook of applied multivariate statistics and mathematical
modeling (pp. 297-321). San Diego, CA: Academic Press.

Hair, J. F., Anderson, R. E., Tatham, R. L., & Black, W. C. (1998).
Multivariate data analysis. Upper Saddle River, NJ: Prentice Hall.

Tan, P. N., Steinbach, M., & Kumar, V. (2006). Introduction to data mining.
Boston, MA: Addison-Wesley. (Chapter 8 of that book, "Cluster Analysis:
Basic Concepts and Algorithms", is freely available at
http://www-users.cs.umn.edu/~kumar/dmbook/index.php).

In sum, Ward's clustering (a hierarchical clustering algorithm) has the
advantage of being a robust method, but the disadvantage of being inflexible:
once a case is in a cluster, it stays in that cluster up to the last fusion.
K-means clustering has the disadvantage of needing good starting values (as
you noted yourself, a different ordering gives different starting values and
different solutions), but the advantage of being very flexible: because it is
an iterative method, it allows cases to move to other clusters at later
iterations as the cluster centers change. So the idea behind combining Ward's
(to get starting values) and k-means (to use those centers to optimize the
Ward's solution) is simply combining the best of both worlds.
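
To make the order dependence and the Ward-then-k-means idea concrete, here is a minimal sketch in Python. Python is an assumption, since the thread itself works in SPSS. It seeds plain Lloyd's k-means with the first k cases, which is the simple rule Jan describes (SPSS's own default seed selection may refine it). It shows two orderings of the same six one-dimensional cases ending in different solutions. It then runs a naive agglomerative Ward step, which repeatedly merges the pair of clusters whose fusion adds the least within-cluster sum of squares, and uses the resulting cluster means as the k-means starting centers. All function names are hypothetical, and this is not the code behind SPSS's CLUSTER or QUICK CLUSTER.

```python
# Sketch only: pure-Python k-means and Ward, not SPSS's implementation.


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Component-wise mean of a non-empty list of tuples."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def kmeans(points, centers, max_iter=100):
    """Lloyd's k-means from the given initial centers; returns (labels, centers)."""
    centers = [tuple(c) for c in centers]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each case goes to its nearest current center.
        new_labels = [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:
            break  # no case changed cluster, so the solution has converged
        labels = new_labels
        # Update step: move each center to the mean of its member cases.
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster ends up empty
                centers[j] = mean(members)
    return labels, centers


def within_ss(points, labels, centers):
    """Total within-cluster sum of squares of a partition."""
    return sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))


def ward(points, k):
    """Naive agglomerative Ward clustering down to k clusters; returns labels."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                ca = mean([points[i] for i in clusters[a]])
                cb = mean([points[i] for i in clusters[b]])
                na, nb = len(clusters[a]), len(clusters[b])
                # Increase in within-cluster SS if clusters a and b are fused.
                cost = na * nb / (na + nb) * sq_dist(ca, cb)
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        clusters[a] += clusters.pop(b)  # a < b, so index a is unaffected
    labels = [0] * len(points)
    for j, members in enumerate(clusters):
        for i in members:
            labels[i] = j
    return labels


def centers_from_labels(points, labels, k):
    """Cluster means of a saved membership, usable as starting centers."""
    return [mean([p for p, lab in zip(points, labels) if lab == j]) for j in range(k)]


data = [(0.0,), (1.0,), (10.0,), (11.0,), (20.0,), (21.0,)]
k = 3

# The same six cases in two file orders; the seeds are the first k cases.
for order in ([0, 1, 2, 3, 4, 5], [0, 2, 4, 1, 3, 5]):
    pts = [data[i] for i in order]
    labels, centers = kmeans(pts, pts[:k])
    print(order, "within-cluster SS:", within_ss(pts, labels, centers))

# Hybrid: Ward's memberships -> cluster means -> k-means starting centers.
ward_labels = ward(data, k)
start = centers_from_labels(data, ward_labels, k)
labels, centers = kmeans(data, start)
print("Ward SS:", within_ss(data, ward_labels, start),
      "hybrid SS:", within_ss(data, labels, centers))
```

With these data the first ordering settles at a within-cluster sum of squares of 101.0, because two seeds split one tight pair. The second ordering reaches 1.5, and the Ward-seeded run also reaches 1.5. Each k-means step can only lower the within-cluster sum of squares, so starting from Ward's means can never end worse than Ward's own partition on that criterion. That is one way to read the "best of both worlds" argument.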



Best regards,

Wim Beyers
---
Dr. Wim Beyers
Dept. of Developmental, Personality and Social Psychology
Ghent University
Henri Dunantlaan 2
9000 Gent
Belgium
http://users.ugent.be/~wbeyers/



----- Original Message -----
From: "Matthew Pirritano" <[hidden email]>
To: <[hidden email]>
Sent: Monday, November 19, 2007 3:13 PM
Subject: Re: k-means cluster analysis


So now I'm using Ward's to create my starting cluster centers and k-means to
calculate my final cluster centers. Can anyone point me to any references
that explain why this makes sense, if it does? Why not just report the Ward's
centers?

Thanks,
Matt

----- Original Message ----
From: Anthony Babinec <[hidden email]>
To: [hidden email]
Sent: Monday, November 19, 2007 4:15:37 AM
Subject: Re: k-means cluster analysis

Read the QUICK CLUSTER documentation in Help. You can specify initial
cluster centers with the /INITIAL subcommand (see the syntax below). These
can be the final cluster centers from the Ward method.

QUICK CLUSTER {varlist}
              {ALL    }

 [/MISSING=[{LISTWISE**}] [INCLUDE]]
            {PAIRWISE  }
            {DEFAULT   }

 [/FILE='savfile'|'dataset']

 [/INITIAL=(value list)]

 [/CRITERIA=[CLUSTER({2**})][NOINITIAL][MXITER({10**})] [CONVERGE({0**})]]
                     {n  }                     {n   }             {n  }

 [/METHOD=[{KMEANS[(NOUPDATE)]**}]
           {KMEANS(UPDATE)       }
           {CLASSIFY             }

 [/PRINT=[INITIAL**] [CLUSTER] [ID(varname)] [DISTANCE] [ANOVA] [NONE]]

 [/OUTFILE='savfile'|'dataset']

 [/SAVE=[CLUSTER[(varname)]] [DISTANCE[(varname)]]]

Anthony Babinec
[hidden email]
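
The value list Anthony mentions can also be prepared outside SPSS. The sketch below, in Python, averages each clustering variable within each saved Ward cluster. It then writes the means cluster by cluster, in variable-list order, which is my understanding of the order /INITIAL expects. Check the QUICK CLUSTER documentation before relying on it, and keep CRITERIA=CLUSTER(k) equal to the number of Ward clusters. The variable name ward_cl and the helpers initial_centers and initial_subcommand are hypothetical.

```python
from collections import defaultdict


def initial_centers(rows, cluster_var, variables):
    """Mean of each variable within each saved cluster, in ascending cluster order."""
    sums = defaultdict(lambda: [0.0] * len(variables))
    counts = defaultdict(int)
    for row in rows:
        c = row[cluster_var]
        counts[c] += 1
        for i, v in enumerate(variables):
            sums[c][i] += row[v]
    return [[s / counts[c] for s in sums[c]] for c in sorted(counts)]


def initial_subcommand(centers, decimals=4):
    """Flatten centers cluster by cluster into an /INITIAL=(...) value list."""
    values = [f"{x:.{decimals}f}" for center in centers for x in center]
    return "/INITIAL=(" + " ".join(values) + ")"


# Hypothetical cases: 'ward_cl' stands in for a saved Ward membership variable.
rows = [
    {"ward_cl": 1, "x": 1.0, "y": 2.0},
    {"ward_cl": 1, "x": 3.0, "y": 4.0},
    {"ward_cl": 2, "x": 10.0, "y": 10.0},
]
centers = initial_centers(rows, "ward_cl", ["x", "y"])
print(initial_subcommand(centers, decimals=1))  # /INITIAL=(2.0 3.0 10.0 10.0)
```

The printed string would follow a QUICK CLUSTER command that names the same variables in the same order (here x and y). The syntax chart above also lists /FILE, which reads starting centers from a file rather than an inline list.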

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
Pirritano, Matthew
Sent: Monday, November 19, 2007 5:01 AM
To: [hidden email]
Subject: Re: k-means cluster analysis

Jan,

Thanks for the help. Of course this makes the situation all the more
confusing. How do you go through multiple versions of the clusters and
visually assess their similarity, especially since in this case I have 13
variables?

How about this: whenever I run Ward's method I get three very clear
clusters. Is there some way to get SPSS to start the k-means with the cases
ordered as they are in the Ward's clusters? Would it be sufficient to start
k-means out by having the cases sorted on the saved cluster membership from
Ward's?

I know that's a lot, but I'm obsessive (and compulsive) and am supposed to
have this done today.

Thanks,
Matt

-----Original Message-----
From: Spousta Jan [mailto:[hidden email]]
Sent: Mon 11/19/2007 2:31 AM
To: Pirritano, Matthew; [hidden email]
Subject: RE:      k-means cluster analysis

Matthew,

It really depends on the initial sorting (or better, on the "seeds",
usually selected as the first few cases of the data file). This is a
property of the algorithm, not something specific to SPSS. (Look at
http://en.wikipedia.org/wiki/K-means if you wish to see how it works.)

See also this comment from Paul Dickson:
http://listserv.uga.edu/cgi-bin/wa?A2=ind0612&L=spssx-l&P=R2534

HTH

Jan

Query: outputing tables in SPSS with more decimal places

Jamie Burnett-3
In reply to this post by Art Kendall
Hi,

I am trying to increase the number of decimal places shown in the tables in
my SPSS output. Does anyone know how to do this without having to change
every cell of every table manually?

Many Thanks

Jamie


============================
This e-mail and all attachments it may contain is confidential and intended solely for the use of the individual to whom it is addressed. Any views or opinions presented are solely those of the author and do not necessarily represent those of Ipsos MORI and its associated companies. If you are not the intended recipient, be advised that you have received this e-mail in error and that any use, dissemination, printing, forwarding or copying of this e-mail is strictly prohibited. Please contact the sender if you have received this e-mail in error.

Market & Opinion Research International Ltd , Registered in England and Wales No. 948470 , 79-81 Borough Road , London SE1 1FY, United Kingdom, Email: [hidden email]
============================


Re: Query: outputing tables in SPSS with more decimal places

ViAnn Beadle
There are ways of increasing the decimal places for some statistics without
specifying cell formats, but you need to be specific about which tables and
which statistics.

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
Jamie Burnett
Sent: Tuesday, November 20, 2007 7:38 AM
To: [hidden email]
Subject: Query: outputing tables in SPSS with more decimal places

Hi,

I am trying to increase the number of decimal places shown in the tables in
my SPSS output. Does anyone know how to do this without having to change
every cell of every table manually?

Many Thanks

Jamie

