|
Can someone please tell me why I get different results every time I do a k-means cluster analysis? Is there some logic to this? It seems to be related to how my data is sorted. Any help would be much appreciated.
Thanks, Matthew Pirritano, Ph.D. Assistant Professor of Psychology Smith Hall 116C Chapman University Department of Psychology One University Drive Orange, CA 92866 Telephone (714)744-7940 FAX (714)997-6780 ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
|
Matthew,
It really depends on the initial sorting (or, better, on the "seeds", usually selected as the first few cases of the data file). This is a property of the algorithm, not something specific to SPSS. (Look at http://en.wikipedia.org/wiki/K-means if you wish to see how it works.)
See also this comment from Paul Dickson: http://listserv.uga.edu/cgi-bin/wa?A2=ind0612&L=spssx-l&P=R2534
HTH
Jan
-----Original Message----- From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Pirritano, Matthew Sent: Monday, November 19, 2007 8:05 AM To: [hidden email] Subject: k-means cluster analysis |
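To see the order dependence outside SPSS, here is a minimal Python sketch. It assumes, as described above, that the seeds are simply the first k cases; SPSS's own seed selection may be more elaborate. The function name and the four data points are made up for illustration.

```python
import math

def kmeans_first_k_seeds(points, k, max_iter=100):
    """Plain k-means (Lloyd's algorithm), seeded with the first k points."""
    centers = [tuple(p) for p in points[:k]]
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centers[j]))
            groups[nearest].append(p)
        # Update step: each center moves to the mean of its group
        # (an empty group keeps its old center).
        new_centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[j]
            for j, g in enumerate(groups)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return sorted(centers)

# The same four cases (corners of a 4 x 1 rectangle), in two different sort orders.
sorted_one_way = [(0, 0), (0, 1), (4, 0), (4, 1)]
sorted_another_way = [(0, 0), (4, 0), (0, 1), (4, 1)]

result_a = kmeans_first_k_seeds(sorted_one_way, k=2)
result_b = kmeans_first_k_seeds(sorted_another_way, k=2)
print(result_a)  # [(2.0, 0.0), (2.0, 1.0)]  splits top from bottom
print(result_b)  # [(0.0, 0.5), (4.0, 0.5)]  splits left from right
```

Both runs converge, but to different solutions, and nothing changed except the order of the cases.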
|
Jan,
Thanks for the help. Of course, this makes the situation just that much more confusing. How do you go through multiple versions of the clusters and visually assess their similarity? Especially because in this case I have 13 variables.
How about this: whenever I run Ward's method I get three very clear clusters. Is there some way to get SPSS to start the k-means with the cases ordered as they are in the Ward's clusters? Would it be sufficient to start k-means out by having the cases sorted on the saved cluster membership from Ward's?
I know that's a lot, but I'm obsessive (and compulsive) and am supposed to have this done today.
Thanks, Matt
-----Original Message----- From: Spousta Jan [mailto:[hidden email]] Sent: Mon 11/19/2007 2:31 AM To: Pirritano, Matthew; [hidden email] Subject: RE: k-means cluster analysis |
|
Read the Quick Cluster documentation in Help. You can specify initial cluster centers using the /INITIAL subcommand; see the syntax below. These can be the final cluster centers from the Ward method.

QUICK CLUSTER {varlist}
              {ALL    }
  [/MISSING=[{LISTWISE**}] [INCLUDE]]
             {PAIRWISE  }
             {DEFAULT   }
  [/FILE='savfile'|'dataset']
  [/INITIAL=(value list)]
  [/CRITERIA=[CLUSTER({2**})][NOINITIAL][MXITER({10**})] [CONVERGE({0**})]]
                      {n  }                     {n   }             {n  }
  [/METHOD=[{KMEANS[(NOUPDATE)]**}]
            {KMEANS(UPDATE)}      }
            {CLASSIFY            }
  [/PRINT=[INITIAL**] [CLUSTER] [ID(varname)] [DISTANCE] [ANOVA] [NONE]]
  [/OUTFILE='savfile'|'dataset']
  [/SAVE=[CLUSTER[(varname)]] [DISTANCE[(varname)]]]

Anthony Babinec [hidden email]
-----Original Message----- From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Pirritano, Matthew Sent: Monday, November 19, 2007 5:01 AM To: [hidden email] Subject: Re: k-means cluster analysis |
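The centers to hand to /INITIAL are just the variable means within each saved Ward cluster. As a rough illustration of that arithmetic, here is a small Python sketch; the function name, the six cases and their memberships are all made up, and the order in which /INITIAL expects the values should be checked in the Help.

```python
from collections import defaultdict

def cluster_centers(cases, membership):
    """Mean of every variable within each cluster, keyed by cluster label."""
    grouped = defaultdict(list)
    for case, label in zip(cases, membership):
        grouped[label].append(case)
    return {
        label: [sum(col) / len(rows) for col in zip(*rows)]
        for label, rows in sorted(grouped.items())
    }

# Hypothetical data: 6 cases on 2 variables, with invented Ward memberships.
cases = [(1.0, 2.0), (3.0, 4.0), (10.0, 10.0), (12.0, 14.0), (20.0, 1.0), (22.0, 3.0)]
ward_membership = [1, 1, 2, 2, 3, 3]

centers = cluster_centers(cases, ward_membership)
for label, center in centers.items():
    print(label, center)
# 1 [2.0, 3.0]
# 2 [11.0, 12.0]
# 3 [21.0, 2.0]
```

Within SPSS itself, any procedure that reports means by group on the saved membership variable should give the same numbers.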
|
In reply to this post by Pirritano, Matthew
So now I'm using Ward's to create my starting cluster centers and k-means to calculate my final cluster centers. Can anyone point me to any references that explain why this makes sense? If it does. Why not just report the Ward's centers?
Thanks, Matt
----- Original Message ---- From: Anthony Babinec <[hidden email]> To: [hidden email] Sent: Monday, November 19, 2007 4:15:37 AM Subject: Re: k-means cluster analysis |
|
In reply to this post by Pirritano, Matthew
Quoting "Pirritano, Matthew" <[hidden email]>:
> Can someone please tell me why I get different results every time I
> do a k-means cluster analysis? Is there some logic to this, it seems
> to be related to how my data is sorted. Any help would be much
> appreciated.

Others have explained why this is the case, but it is a useful reminder that many clustering algorithms do not produce unique results from the same data set, and that even the same algorithm starting from different random numbers will get different answers. Small changes to the data, e.g. adding or removing a few cases, may make a lot of difference to the final results.
Clustering methods often provide useful ideas about how the data values MIGHT be grouped, but it is unwise to assume that they will always give unique, clear, objective answers to a question about how the data points are related.
David Hitchin |
|
In reply to this post by Matthew Pirritano
Hi Matt
the following is a great book for anyone trying to come to terms with thinking in terms of 'patterns of values across variables for each unit, e.g. person' (as opposed to 'patterns of values across units, e.g. people, for each variable'):
Bergman, L. R., Magnusson, D., & El-Khouri, B. M. (2003). /Studying individual development in an interindividual context: A person-oriented approach./ Mahwah, NJ: Lawrence Erlbaum Associates. [Paths through life – volume 4, D. Magnusson (Ed.)]
In short, Ward's method is great for determining the range and nature of viable profile solutions within a sample (e.g., the 4-cluster solution v. the 9-cluster solution) but, because it's hierarchical, individuals often get sorted into a profile that later (so to speak) turns out not to match their individual profile as well as some other profile. Using Ward's, then, we can select the best set of profiles; using k-means on the start values derived from our selected Ward's solution, we can then be sure that each individual is placed into the profile group that best matches their individual profile.
Steve
-- Stephen C. Peck Research Investigator Achievement Research Lab Research Center for Group Dynamics Institute for Social Research University of Michigan 426 Thompson Street, # 5136 Ann Arbor, MI 48106-1248 (734) 647-3683; fax (734) 936-7370 http://www.rcgd.isr.umich.edu/garp/ [hidden email] |
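The relocation idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the seven one-variable cases and their "hierarchical" memberships are invented (they are not the output of a real Ward's run), and the function names are hypothetical. It performs a single k-means pass: compute each cluster's centroid from the memberships, then move every case to the nearest centroid.

```python
import math

def centroid(rows):
    """Mean of each variable over a list of cases."""
    return tuple(sum(col) / len(rows) for col in zip(*rows))

def relocate_once(cases, membership):
    """One k-means pass: recompute centroids, then reassign each case to the nearest one."""
    labels = sorted(set(membership))
    centers = {
        lab: centroid([c for c, m in zip(cases, membership) if m == lab])
        for lab in labels
    }
    return [min(labels, key=lambda lab: math.dist(c, centers[lab])) for c in cases]

# Invented example: the case at 6.0 was merged into cluster 1 early on,
# but it sits closer to the centroid of cluster 2 (9.0) than to its own (2.25).
cases = [(0.0,), (1.0,), (2.0,), (6.0,), (8.0,), (9.0,), (10.0,)]
hierarchical = [1, 1, 1, 1, 2, 2, 2]

relocated = relocate_once(cases, hierarchical)
moved = [c for c, old, new in zip(cases, hierarchical, relocated) if old != new]
print(relocated)  # [1, 1, 1, 2, 2, 2, 2]
print(moved)      # [(6.0,)]
```

A full k-means run would repeat this pass until no case moves.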
|
In reply to this post by Matthew Pirritano
I don't have any references at hand.
> Why not just report the Wards centers?

Because any cluster solution is just a good guess. One approach is to consider the consensus of several methods to be more reasonable.
Since the early '70s, I have used a few proximity measures and a few agglomeration methods and saved the membership variables from each solution. Using crosstabs, I then found sets of cases that were clustered together by several methods. I called these "core clusters". Of course, that meant some cases were left ungrouped. I then iteratively used discriminant function analysis (DFA) with its classification phase's probabilities of membership. At the end of each iteration, I assigned cases that were far from the centroid and those that were not clearly assigned to one cluster as ungrouped in the next DFA.
These days I would try using the assignments to clusters as nominal-level variables as input to TWOSTEP to find the core clusters.
In the '50s, Lorr's method used the assumption that there were some "pure types" but that some cases might not fit into the typology and some cases would be a mix of "types".
Of course, a lot depends on the nature of your data.
Art Kendall
Social Research Consultants |
|
In reply to this post by Steve Peck
I have more than a million observations on a single ordinal variable. Given the categories, it is not even close to being an interval variable. I know that CATPCA can transform several such variables. Can it be used for a single variable? I have yet to figure out how.

Thanks,
John A Fiedler
Oreon Inc.
195 Wilderness Way
Boise ID 83816-3383 |
|
John,
CATPCA assigns interval-scale values to ordinal and nominal variables as a by-product of a factor analysis of several inter-correlated variables. The assigned values are chosen to optimize the estimation of the underlying factors, so they depend on the relationships among the variables. Thus I do not think you can do it for a single variable.

Hector |
|
In reply to this post by Matthew Pirritano
Dear Matt,
Good references on why it makes sense to use Ward's final centers as starting values for a k-means clustering are:

Gore, P. A., Jr. (2000). Cluster analysis. In H. E. A. Tinsley & S. D. Brown (Eds.), Handbook of applied multivariate statistics and mathematical modeling (pp. 297-321). San Diego, CA: Academic Press.

Hair, J. F., Anderson, R. E., Tatham, R. L., & Black, W. C. (1998). Multivariate data analysis. Upper Saddle River, NJ: Prentice Hall.

Tan, P. N., Steinbach, M., & Kumar, V. (2006). Introduction to data mining. Boston, MA: Addison-Wesley. (Chapter 8 in that book, "Cluster Analysis: Basic Concepts and Algorithms", is freely available at http://www-users.cs.umn.edu/~kumar/dmbook/index.php.)

In sum, Ward's clustering (a hierarchical clustering algorithm) has the advantage of being a robust method, but the disadvantage of being inflexible: once a case is in a cluster, it stays in that cluster up to the last fusion. K-means clustering has the disadvantage that it needs good starting values (as you have noticed yourself, a different ordering gives different starting values and different solutions), but the advantage of being very flexible: because it is an iterative method, cases can be moved to other clusters at later iterations, after the cluster centers have changed. So the idea behind combining Ward's method (to get starting values) with k-means clustering (using those centers to optimize the Ward's solution) is simply to combine the best of both worlds.

Best regards,
Wim Beyers

---
Dr. Wim Beyers
Dept. of Developmental, Personality and Social Psychology
Ghent University
Henri Dunantlaan 2
9000 Gent
Belgium
http://users.ugent.be/~wbeyers/

----- Original Message -----
From: "Matthew Pirritano" <[hidden email]>
Sent: Monday, November 19, 2007 3:13 PM
Subject: Re: k-means cluster analysis

So now I'm using Ward's to create my starting cluster centers and k-means to calculate my final cluster centers. Can anyone point me to any references that explain why this makes sense, if it does? Why not just report the Ward's centers?

Thanks, Matt

----- Original Message -----
From: Anthony Babinec <[hidden email]>
Sent: Monday, November 19, 2007 4:15:37 AM
Subject: Re: k-means cluster analysis

Read the Quick Cluster documentation in Help. You can specify initial cluster centers using the /INITIAL subcommand; see the syntax below. These can be the final cluster centers from the Ward method.

    QUICK CLUSTER {varlist}
                  {ALL    }
      [/MISSING=[{LISTWISE**}] [INCLUDE]]
                 {PAIRWISE  }
                 {DEFAULT   }
      [/FILE='savfile'|'dataset']
      [/INITIAL=(value list)]
      [/CRITERIA=[CLUSTER({2**})][NOINITIAL][MXITER({10**})] [CONVERGE({0**})]]
                          {n   }                    {n    }            {n  }
      [/METHOD=[{KMEANS[(NOUPDATE)]**}]
                {KMEANS(UPDATE)}      }
                {CLASSIFY      }
      [/PRINT=[INITIAL**] [CLUSTER] [ID(varname)] [DISTANCE] [ANOVA] [NONE]]
      [/OUTFILE='savfile'|'dataset']
      [/SAVE=[CLUSTER[(varname)]] [DISTANCE[(varname)]]]

Anthony Babinec

-----Original Message-----
From: Pirritano, Matthew
Sent: Monday, November 19, 2007 5:01 AM
Subject: Re: k-means cluster analysis

Jan,

Thanks for the help. Of course this makes the situation just that much more confusing. How do you go through multiple versions of the clusters and visually assess their similarity? Especially because in this case I have 13 variables.

How about this: whenever I run Ward's method I get three very clear clusters. Is there some way to get SPSS to start the k-means with the cases ordered as they are in the Ward's clusters? Would it be sufficient to start k-means out by having the cases sorted on the saved cluster membership from Ward's?

I know that's a lot, but I'm obsessive (and compulsive) and am supposed to have this done today.

Thanks, Matt |
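For readers who want to see the effect rather than take it on faith, here is a minimal sketch of the idea discussed in this post, written in plain Python rather than SPSS syntax. It makes several assumptions that are not from the thread: squared Euclidean distance, a naive Ward agglomeration, a toy two-variable data set, and a k-means that simply takes the first k cases as seeds (a simplification used only to show order dependence, not a description of SPSS's default seed selection). The function names ward_centers and kmeans are made up for illustration.

```python
from itertools import combinations

def mean(points):
    """Centroid of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))

def sqdist(a, b):
    """Squared Euclidean distance between two tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def ward_centers(points, k):
    """Naive Ward agglomeration: repeatedly fuse the two clusters whose
    merger increases the within-cluster sum of squares the least,
    then return the centroids of the k clusters that remain."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        def cost(pair):
            a, b = clusters[pair[0]], clusters[pair[1]]
            return len(a) * len(b) / (len(a) + len(b)) * sqdist(mean(a), mean(b))
        i, j = min(combinations(range(len(clusters)), 2), key=cost)
        clusters[i] = clusters[i] + clusters[j]  # i < j, so deleting j is safe
        del clusters[j]
    return [mean(c) for c in clusters]

def kmeans(points, centers, max_iter=100):
    """k-means from the given starting centers; returns the final centers,
    rounded and sorted so that runs can be compared directly."""
    centers = list(centers)
    for _ in range(max_iter):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda c: sqdist(p, centers[c]))
            groups[nearest].append(p)
        new = [mean(g) if g else centers[i] for i, g in enumerate(groups)]
        if new == centers:
            break
        centers = new
    return sorted(tuple(round(x, 6) for x in c) for c in centers)

# Hypothetical data: three well-separated groups of three cases each.
by_cluster = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0),
              (8.0, 8.0), (8.5, 9.0), (9.0, 8.0),
              (1.0, 8.0), (1.5, 9.0), (2.0, 8.0)]
# The same cases in a different sort order.
mixed = [by_cluster[i] for i in (0, 3, 6, 1, 4, 7, 2, 5, 8)]

# Seeds = first k cases: the outcome depends on how the file is sorted.
first_k_sorted = kmeans(by_cluster, by_cluster[:3])
first_k_mixed = kmeans(mixed, mixed[:3])

# Seeds = Ward's centers: the outcome no longer depends on the sort order.
ward_sorted = kmeans(by_cluster, ward_centers(by_cluster, 3))
ward_mixed = kmeans(mixed, ward_centers(mixed, 3))

print(first_k_sorted == first_k_mixed)
print(ward_sorted == ward_mixed)
```

With these made-up data, the two runs seeded from the first three cases end with different final centers, while the two runs seeded from the Ward centers agree. In SPSS the same starting centers would be supplied through the /INITIAL or /FILE subcommand shown in the syntax chart above.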
|
In reply to this post by Art Kendall
Hi,
I am trying to increase the number of decimal places shown in the tables in my SPSS output. Does anyone know how I can do this without having to change every cell in every table manually?

Many thanks,
Jamie |
|
There are ways of increasing the number of decimal places for some statistics without specifying cell formats, but you need to be specific about which tables and which statistics you mean. |
