In K-Means it's possible to save the distance from the cluster centre as a variable.
Is this possible in any of the hierarchical methods offered in SPSS? They offer a proximity matrix, which I see as different, as this shows distances between individual respondents, NOT the classification mean. Am I missing something?

Regards
Mark
I do not think this is possible using K-means due to the algorithm used, but I may be wrong. One way round it might be to work out which variables are contributing to the cluster solution. Then formulate an algorithm based on CHAID, CART, Discriminant or Logistic, and assign each case a score using the algorithm (coding rules). You could then compute scores from the algorithm and, for all cases that are assigned to a cluster with a high (say 80%) level of probability, generate means and standard deviations. Treat the mean scores as cluster centres and the standard deviations as your index of dispersion (i.e. distance from cluster centre).

Cheers
Paul
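The last step of Paul's suggestion could be sketched as follows. This is a Python illustration rather than SPSS syntax, and it assumes each case already carries a predicted cluster, an assignment probability, and a single score from whatever classification model was fitted. The function name, the tuple layout, and the data are hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev


def confident_cluster_stats(cases, min_prob=0.8):
    """Mean and SD of scores per cluster, using only confidently assigned cases.

    cases: iterable of (cluster, probability, score) tuples (hypothetical layout).
    """
    scores = defaultdict(list)
    for cluster, prob, score in cases:
        if prob >= min_prob:  # keep only high-probability assignments
            scores[cluster].append(score)
    # The sample SD needs at least two values; report None otherwise.
    return {
        cluster: (mean(vals), stdev(vals) if len(vals) > 1 else None)
        for cluster, vals in scores.items()
    }


# Made-up illustration data: (cluster, assignment probability, score).
cases = [(1, 0.95, 2.0), (1, 0.90, 4.0), (1, 0.55, 10.0), (2, 0.85, 7.0), (2, 0.99, 9.0)]
stats = confident_cluster_stats(cases)
print(stats)
```

The case with probability 0.55 is left out of the cluster 1 statistics, so it does not pull the centre or inflate the dispersion.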
Mark
Apologies. My post should have said: I do not think this is possible using hierarchical clustering due to the algorithm used, but I may be wrong.

Regards
Paul
Hi Mark,
While K-Means operates in a metric Euclidean space or something similar, and can therefore easily define the centroids (and uses them during the computation), the hierarchical algorithm can be used in more general topological spaces where there are no well-defined centroids. Imagine clustering species; take a cluster {baboon, human, chimpanzee} - what is the centroid here? Michael Jackson? Really hard to say. And that is perhaps the reason why SPSS does not prompt you to save the centroid-derived statistics.

Otherwise, if you think that they really do make sense, you can compute the centroid coordinates easily using Aggregate and add them to the file. Then you can compute the case-to-centroid distance using the familiar formula for the Euclidean distance.

Unfortunately, my SPSS 14 is broken now, so I will draft the example syntax in SPSS 12, which is more cumbersome because of the lack of ADDVARIABLES mode in Aggregate.

GET FILE='C:\Program Files\SPSS\Cars.sav'.
*Keep complete cases only and draw a random sample of about 20 percent.
SELECT IF nmiss(mpg to cylinder)=0 and uniform(1) < 0.2.
*Standardize the variables (saved as Zmpg to Zaccel) and save a 5-cluster solution.
DESCRIPTIVES mpg to accel /SAVE.
CLUSTER Zmpg to Zaccel /SAVE CLUSTER(5).

*Save the coordinates of the centroids.
AGGREGATE /OUTFILE='C:\Program Files\SPSS\aggr.sav' /BREAK=CLU5_1
 /Cmpg Cengine Chorse Cweight Caccel = MEAN(Zmpg Zengine Zhorse Zweight Zaccel).

*Add them to the file.
SORT CASES BY CLU5_1 (A).
MATCH FILES /FILE=* /TABLE='C:\Program Files\SPSS\aggr.sav' /BY CLU5_1.
EXECUTE.

*Compute the Euclidean distance case-centroid.
COMPUTE distance = 0.
DO REPEAT centr = Cmpg to Caccel /case = Zmpg to Zaccel.
- COMPUTE distance = distance + (centr-case)**2.
END REPEAT.
COMPUTE distance = sqrt(distance).
VARIABLE LABELS distance "Distance case-centroid".
EXECUTE.

*End of the example.

Greetings
Jan

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Mark Webb
Sent: Monday, July 31, 2006 7:43 AM
To: [hidden email]
Subject: Distance from cluster centre query.
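For readers without SPSS at hand, the same two steps (per-cluster means as centroids, then the Euclidean distance of each case to its own centroid) can be sketched in plain Python. This is an illustration on made-up data, not a translation of the syntax above; the function name and the toy rows are assumptions.

```python
from collections import defaultdict
from math import sqrt


def centroid_distances(rows, labels):
    """Return (centroids, distances) for cases that already have cluster labels.

    rows:   list of equal-length numeric lists (ideally standardized variables)
    labels: cluster membership of each row, e.g. saved from a hierarchical run
    """
    members = defaultdict(list)
    for row, label in zip(rows, labels):
        members[label].append(row)
    # Centroid = column-wise mean of the cases in the cluster.
    centroids = {
        label: [sum(col) / len(group) for col in zip(*group)]
        for label, group in members.items()
    }
    # Euclidean distance of every case to the centroid of its own cluster.
    distances = [
        sqrt(sum((x - c) ** 2 for x, c in zip(row, centroids[label])))
        for row, label in zip(rows, labels)
    ]
    return centroids, distances


# Toy data: two clusters in two dimensions.
rows = [[0.0, 0.0], [2.0, 0.0], [10.0, 10.0], [10.0, 14.0]]
labels = [1, 1, 2, 2]
centroids, distances = centroid_distances(rows, labels)
print(centroids)
print(distances)
```

As in the SPSS example, the distances only mean something when the variables are on comparable scales, which is why the syntax clusters on the standardized Z-variables.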
Thanks for this Jan.
I may well use your suggestion & compute the centroids BUT would like to discuss the idea of a cluster centroid in the context of what I'm trying to do. I'm finding that discriminant analysis [DA] based on the clusters [dep var] & the statements used to make the clusters [indep vars] is not working well in practice. I would like to remove "weakly" associated respondents from each cluster and put them into an additional cluster representing "unclassifiable". I was hoping to define these weak respondents by using the distance from centroid idea, but I use hierarchical methods [Ward's] most often - hence my initial query. Do you think what I'm suggesting is feasible? I would then run DA on the original clusters plus 1.

Regards
Mark

----- Original Message -----
From: "Spousta Jan" <[hidden email]>
To: "Mark Webb" <[hidden email]>; <[hidden email]>
Sent: Monday, July 31, 2006 12:55 PM
Subject: RE: Distance from cluster centre query.
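The reassignment Mark describes could look roughly like the Python sketch below, given cluster labels and case-to-centroid distances that have already been computed. The cutoff rule (cluster mean distance plus k standard deviations), the value of k, and the label 0 for "unclassifiable" are all assumptions made for illustration; the thread does not specify how "weak" should be defined.

```python
from collections import defaultdict
from statistics import mean, pstdev

UNCLASSIFIABLE = 0  # hypothetical label for the extra cluster


def reassign_weak_cases(labels, distances, k=1.5):
    """Move cases that lie far from their own centroid into an extra cluster.

    A case counts as weak when its distance exceeds the mean distance in its
    cluster by more than k standard deviations (an assumed rule, not SPSS's).
    """
    by_cluster = defaultdict(list)
    for label, dist in zip(labels, distances):
        by_cluster[label].append(dist)
    cutoffs = {
        label: mean(dists) + k * pstdev(dists)
        for label, dists in by_cluster.items()
    }
    return [
        UNCLASSIFIABLE if dist > cutoffs[label] else label
        for label, dist in zip(labels, distances)
    ]


# Made-up labels and distances: one case in cluster 1 sits far from the centre.
labels = [1, 1, 1, 1, 1, 2, 2, 2]
distances = [1.0, 1.0, 1.0, 1.0, 6.0, 2.0, 2.0, 2.0]
new_labels = reassign_weak_cases(labels, distances)
print(new_labels)
```

A per-cluster cutoff is used here so that a naturally diffuse cluster is not emptied by a threshold tuned to a compact one; a single global cutoff would be an equally defensible choice.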
Hi Mark,
A slightly better idea would be to drop the unclassifiable cluster from the analysis. These unclassifiable cases are hardly separable and will destroy your DA. Clusters with a small number of cases can create similar problems. I suspect that your problems with DA are caused by such a fragmented CA solution. Try to find a good, stable CA solution first, eliminate the outliers (small clusters, plus any unusual cases found with the standard diagnostics), and DA will probably work better.

Jan

-----Original Message-----
From: Mark Webb [mailto:[hidden email]]
Sent: Monday, July 31, 2006 1:27 PM
To: Spousta Jan
Cc: [hidden email]
Subject: Re: Distance from cluster centre query.
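The small-cluster part of this advice could be sketched in Python as below: count the cases per cluster and keep only the cases whose cluster reaches a minimum size before running DA. The function name and the threshold are hypothetical; what counts as "small" depends on the data.

```python
from collections import Counter


def keep_cases_in_large_clusters(labels, min_size=5):
    """Return the positions of cases whose cluster has at least min_size members."""
    sizes = Counter(labels)
    return [i for i, label in enumerate(labels) if sizes[label] >= min_size]


# Made-up cluster memberships: cluster 2 has a single case.
labels = [1, 1, 1, 2, 3, 3, 3]
kept = keep_cases_in_large_clusters(labels, min_size=3)
print(kept)
```

The returned positions can then be used to filter the data before the discriminant analysis, so that the dropped cases are excluded rather than relabelled.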
A methodological framework designed to handle these (and many other related) issues can be found here: http://www.psychology.su.se/sleipner/ (e.g., you can remove multivariate outliers prior to clustering and work directly with the centroids after clustering).