|
Am I doing something wrong, or do k-means cluster results differ depending on how the data are sorted? I keep running the same k-means analysis and getting different cluster centers each time.

Also, can someone tell me exactly what the cluster centers are? If I use raw scores, are they just the mean of each variable within the particular cluster?

Thanks,
Matt

=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command.
To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD |
|
Paul,
Sounds good. I used Ward's method to identify the number of clusters and to create the initial cluster centers for the k-means procedure. From what I'm hearing from you, what I've done is kosher, yes?

Thanks from a novice clusterer.
Matt

Matthew Pirritano, Ph.D.
Assistant Professor of Psychology
Smith Hall 116C
Chapman University
Department of Psychology
One University Drive
Orange, CA 92866
Telephone (714)744-7940
FAX (714)997-6780

----- Original Message ----
From: "Swank, Paul R" <[hidden email]>
To: Matt <[hidden email]>
Sent: Monday, November 19, 2007 8:31:10 AM
Subject: RE: k-means clustering

Typically, k-means clustering starts by randomly selecting cases as seeds for the clusters, so if you re-sort the data, the seeds are different. This may indicate a problem with the clustering: if the clusters are inherent in the data, it shouldn't matter where the seeds start. However, I usually start with a hierarchical method to identify the number of clusters and use its cluster means as the seeds for the k-means method.

Paul R. Swank, Ph.D.
Professor and Director of Research
Children's Learning Institute
University of Texas Health Science Center - Houston

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Matt
Sent: Sunday, November 18, 2007 8:28 PM
To: [hidden email]
Subject: k-means clustering

Am I just doing something wrong or do the k-means cluster method results differ depending on how the data are sorted? I keep running the same k-means analysis and getting different cluster centers each time. Also, can someone tell me what the cluster centers are exactly. If I use raw scores are they just the mean on that variable within the particular cluster. Are they just means?

Thanks,
Matt |
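Paul's point about seeds, and Matt's question about what the centers are, can be illustrated with a small sketch. It is plain Python, not SPSS syntax, and it does not reproduce SPSS's own seeding routine: taking the first k cases as seeds is only a stand-in for any seeding that depends on case order. The four data points and the `ward_like_seeds` values are made up for illustration; in practice the seeds would be the cluster means from a hierarchical run such as Ward's method.

```python
def kmeans(cases, seeds, max_iter=100):
    """Plain Lloyd-style k-means. Returns (centers, labels)."""
    centers = [tuple(s) for s in seeds]
    labels = None
    for _ in range(max_iter):
        # Assign each case to the nearest center (squared Euclidean distance).
        new_labels = [
            min(range(len(centers)),
                key=lambda j: sum((x - c) ** 2 for x, c in zip(case, centers[j])))
            for case in cases
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Each center becomes the per-variable mean of the cases assigned to it.
        for j in range(len(centers)):
            members = [case for case, lab in zip(cases, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return centers, labels


def within_ss(cases, centers, labels):
    """Total within-cluster sum of squared distances to the assigned center."""
    return sum(
        sum((x - c) ** 2 for x, c in zip(case, centers[lab]))
        for case, lab in zip(cases, labels)
    )


# Hypothetical data: the same four cases in two different sort orders.
order_a = [(0, 0), (4, 0), (0, 1), (4, 1)]
order_b = [(0, 0), (0, 1), (4, 0), (4, 1)]

# Order-dependent seeding: the first two cases are the seeds.
centers_a, labels_a = kmeans(order_a, order_a[:2])
centers_b, labels_b = kmeans(order_b, order_b[:2])
print(sorted(centers_a), within_ss(order_a, centers_a, labels_a))
print(sorted(centers_b), within_ss(order_b, centers_b, labels_b))

# Fixed seeds (hypothetical stand-ins for Ward's cluster means):
# the result no longer depends on the sort order.
ward_like_seeds = [(0, 0.5), (4, 0.5)]
fixed_a, _ = kmeans(order_a, ward_like_seeds)
fixed_b, _ = kmeans(order_b, ward_like_seeds)
print(sorted(fixed_a) == sorted(fixed_b))
```

With order-dependent seeds the two sort orders settle on different centers, and the second solution has a much larger within-cluster sum of squares. With the fixed seeds both orders give the same centers. In every case the final centers are simply the means of each variable over the cases in the cluster.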
|
As far as I'm concerned, yes. But if that's what you did, why did you get different clusters from different sorts?

Paul R. Swank, Ph.D.
Professor and Director of Research
Children's Learning Institute
University of Texas Health Science Center - Houston

From: Matthew Pirritano [mailto:[hidden email]]
Sent: Monday, November 19, 2007 10:37 AM
To: Swank, Paul R; Matt; [hidden email]
Subject: Re: k-means clustering

Paul,

Sounds good. I used Ward's to identify the number of clusters and to create the initial cluster centers for the K-means procedure. From what I'm hearing from you what I've done is kosher, yes?

Thanks from a novice clusterer.
Matt |
|
Getting different clusters for different sorts is what happened before I used the Ward's centers as starting seeds. I was lost and now I'm found.

Thanks a bunch,
Matt

Matthew Pirritano, Ph.D.

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Swank, Paul R
Sent: Monday, November 19, 2007 9:45 AM
To: [hidden email]
Subject: Re: k-means clustering

As far as I'm concerned, but if that's what you did, why did you get different clusters from different sorts?

Paul R. Swank, Ph.D. |
|
In reply to this post by Matthew Pirritano
Recent contributions to this list have stated the conditions under which a Durbin-Watson test can be used, but haven't explained exactly what it tests for.

The quality of a regression, in terms of standard errors and significance tests, depends partly on the number of observations, and the more observations the better, PROVIDED that they are independent. Simply making another copy of the data to double the sample size doesn't help, because there is no new information. One of the assumptions of multiple regression is that the observations have a certain kind of independence.

Consider an experiment in which seedlings are grown under a number of different conditions (the X variables indicate fertiliser, watering, temperature, etc., which, according to the curious fiction that we adopt with regression, are assumed to be measured with complete accuracy). Seedlings grown under the same X values do not all achieve the same height Y; they vary about the predicted value of Y, and these deviations are known as the errors or residuals. They are assumed to have a zero mean and to be independent of each other. If the seedlings are grown by different people in different laboratories, there is a reasonable chance that the errors are independent. If they are all grown in the same pot, then the errors are likely to be correlated: they are NOT independent.

When the observations come from a time sequence, a new problem arises. You might have a nice equation that predicts the output of a factory in terms of season, number of employees, cost of materials, etc., which make up the predictable part of the regression. There are also unpredictable factors which cause the "errors", such as late deliveries of materials, machine breakdowns, and strikes. Over a long period these might average out, but if observations are taken on a short-term basis, perhaps every week, then the unpredictable factors hang over for several time periods, and the errors for observations close in time are not independent.

When observations are independent, knowing the error of one observation tells you nothing about the likely error of the next one. When there is a lack of independence, the error in one observation is likely to affect the next one. This is known as autocorrelation or serial correlation.

The Durbin-Watson test examines the errors of neighbouring observations to see if there is a pattern. If the result is significant, you have less information than you would if the observations were independent: the estimated standard errors are too small, and the regression results are less significant than they appear.

David Hitchin |
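As a rough illustration of what the test looks at, the Durbin-Watson statistic is the sum of squared differences between successive residuals divided by the sum of squared residuals. The sketch below is plain Python rather than SPSS output, and both residual series are made up. It computes only the statistic; judging significance still needs the tabulated Durbin-Watson bounds for the given sample size and number of predictors.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals listed in time order.

    Sum of squared successive differences divided by the sum of
    squared residuals. The value lies between 0 and 4.
    """
    if len(residuals) < 2:
        raise ValueError("need at least two residuals")
    numerator = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    denominator = sum(e ** 2 for e in residuals)
    if denominator == 0:
        raise ValueError("residuals are all zero")
    return numerator / denominator


# Hypothetical residuals: errors that "hang over" from one period to the next.
persistent = [1, 1, 1, -1, -1, -1]
# Hypothetical residuals: errors that flip sign every period.
alternating = [1, -1, 1, -1, 1, -1]

d_persistent = durbin_watson(persistent)
d_alternating = durbin_watson(alternating)
print(d_persistent, d_alternating)
```

A value near 2 is what uncorrelated errors tend to produce. Values well below 2, as for the persistent series, point to positive serial correlation, the case described above in which standard errors come out too small. Values well above 2, as for the alternating series, point to negative serial correlation.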
