|
Dear all,
Could anyone please tell me how the initial cluster centers in QUICK CLUSTER are chosen? I tried to decipher the explanation under Algorithms -> Quick Cluster but I just can't figure it out. Is the end result of the initial cluster center selection the K observations that have the largest average Euclidean distance among them? Is a proximity matrix among all observations not needed in order to detect those K observations? Are all distances between the first case and all the others computed before proceeding to case #2, case #3, etcetera? Could anyone please shed some more light on this? TIA and a nice weekend to all! Ruben van den Berg
|
I usually use a hierarchical agglomerative technique to get the initial cluster centroids and then plug those in. Another method is to try several random starts and check that they arrive at similar solutions.
Dr. Paul R. Swank, Professor and Director of Research
Children's Learning Institute
University of Texas Health Science Center-Houston
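A minimal sketch of the approach Paul describes, in plain Python rather than SPSS syntax: run an agglomerative clustering down to k clusters, take its centroids, and use them as the starting centers for k-means. Centroid linkage is an assumption here (Paul doesn't say which linkage he uses; Ward's method is another common choice), and `agglomerative_centers`, `kmeans` and `nearest` are hypothetical helper names. The merge loop compares every pair of clusters at each step, so it is only practical on a modest sample; in SPSS the resulting centroids would instead be plugged into QUICK CLUSTER as fixed initial centers.

```python
import math


def centroid(points):
    """Coordinate-wise mean of a list of equal-length tuples."""
    return tuple(sum(col) / len(points) for col in zip(*points))


def agglomerative_centers(data, k):
    """Centroid-linkage agglomerative clustering, stopped at k clusters.

    Every case starts as its own cluster; the two clusters whose centroids
    are closest are merged until only k remain. Roughly O(n^3), so this is
    meant for small samples only.
    """
    clusters = [[tuple(x)] for x in data]
    while len(clusters) > k:
        cents = [centroid(c) for c in clusters]
        _, i, j = min(
            (math.dist(cents[a], cents[b]), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        clusters[i].extend(clusters[j])
        del clusters[j]  # j > i, so position i is unaffected
    return [centroid(c) for c in clusters]


def nearest(x, centers):
    """Index of the center closest to case x (Euclidean distance)."""
    return min(range(len(centers)), key=lambda j: math.dist(x, centers[j]))


def kmeans(data, centers, max_iter=100):
    """Plain k-means (Lloyd iterations) starting from the given centers."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        labels = [nearest(x, centers) for x in data]
        new_centers = []
        for j, old in enumerate(centers):
            members = [x for x, lab in zip(data, labels) if lab == j]
            # Keep the old center if a cluster happens to end up empty.
            new_centers.append(centroid(members) if members else old)
        if new_centers == centers:
            break
        centers = new_centers
    labels = [nearest(x, centers) for x in data]  # labels for the final centers
    return centers, labels


if __name__ == "__main__":
    cases = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
    seeds = agglomerative_centers(cases, 2)
    centers, labels = kmeans(cases, seeds)
    print("seeds:", seeds)
    print("final centers:", centers, "labels:", labels)
```

Because the hierarchical seeds already sit near the group centroids, k-means typically needs only a pass or two to settle; the point of the seeding is to avoid depending on whichever cases happen to come first in the file.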
|
In reply to this post by Ruben Geert van den Berg
Ruben, the procedure is explained in the SPSS Algorithms manual. In short, it works as follows. The first k cases are taken as tentative centers, and each remaining case is then checked against them in a single pass through the file. If the distance between xk (a specific case in the file) and its closest cluster mean is greater than the distance between the means of the two closest clusters, xk replaces whichever of those two means is closer to it. The result of these operations, performed at the first pass, is the set of initial cluster centers. If desired, the keyword NOINITIAL simply takes the first k cases as the initial cluster centers. This allows you to place suitable cases at the beginning of the data file, for instance k cases that clearly belong to different clusters.
Hector
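To make the single pass concrete, here is a sketch in Python. It follows the two replacement tests given for QUICK CLUSTER in the SPSS Algorithms manual as I understand them: the one quoted above, plus a second test in which a case replaces its closest center when it lies farther from its second-closest center than that closest center lies from any other center. Treat these details as an assumption to check against the manual for your version; missing values are not handled, and `spss_initial_centers` is a hypothetical name, not an SPSS function.

```python
import math


def spss_initial_centers(data, k, noinitial=False):
    """Choose k initial centers in one pass over the cases.

    Sketch of the QUICK CLUSTER rules as described in the SPSS Algorithms
    manual (assumed, not verified against any particular version).
    data: list of equal-length numeric tuples, no missing values.
    noinitial=True mimics NOINITIAL: the first k cases are used as they are.
    """
    if k < 2 or len(data) < k:
        raise ValueError("need k >= 2 and at least k cases")
    centers = [tuple(x) for x in data[:k]]
    if noinitial:
        return centers
    for x in data[k:]:
        x = tuple(x)
        # The two centers closest to each other (m, n) and their distance.
        d_mn, m, n = min(
            (math.dist(centers[a], centers[b]), a, b)
            for a in range(k)
            for b in range(a + 1, k)
        )
        # Centers ranked by distance from x: q closest, p second closest.
        q, p = sorted(range(k), key=lambda j: math.dist(x, centers[j]))[:2]
        if math.dist(x, centers[q]) > d_mn:
            # Test 1: x is farther from every center than the two closest
            # centers are from each other; it replaces whichever of m, n
            # is nearer to it.
            if math.dist(x, centers[m]) <= math.dist(x, centers[n]):
                centers[m] = x
            else:
                centers[n] = x
        elif math.dist(x, centers[p]) > min(
            math.dist(centers[q], centers[j]) for j in range(k) if j != q
        ):
            # Test 2: x is farther from its second-closest center than its
            # closest center is from any other center, so x replaces q.
            centers[q] = x
    return centers
```

Each step needs only the distances from the current case to the k centers and among the k centers, so no n × n proximity matrix is ever built. The flip side is that the result depends on the order of the cases in the file, which is also why NOINITIAL with hand-placed cases can be useful.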
|
In reply to this post by Ruben Geert van den Berg
What to recommend depends a lot on how many cases you have and on the capabilities of your system. I would not rely on a solution from a single run; I retain solutions based on what is found in common by several clustering methods and distance measures. It is often a good idea to use some form of factor analysis (PCA, PAF, CATPCA) to obtain variables that are fairly independent of each other. If you use variables that you feel are already independent, standardizing will remove the influence of differences in scaling.

I would suggest that you look at TWOSTEP, which starts with a hierarchical clustering of a subset of the data. Its advantage is that it provides AIC or BIC for different numbers of clusters, which helps in deciding how many clusters to keep. If you have the machine resources, try different hierarchical methods and distance measures on all of the data. If not, try using samples as large as your machine can handle.

Specific replies are interspersed below.

Art Kendall
Social Research Consultants

Ruben van den Berg wrote:
> Dear all, [...]

Since TWOSTEP has been available I have not been using QUICK CLUSTER much. If memory serves, the initial clusters are single cases first chosen from the first k cases. As the file is passed, cases are assigned to clusters so that the within-cluster variance is minimized.

> Is a proximity matrix among all observations not needed in order to detect those K observations?

That is correct. This is an advantage only in terms of machine resources.

> Are all distances computed between the first case and all the others, in order to proceed to case #2, case #3 etcetera? Could anyone please shed some more light on this?

If memory serves, the distance of each case from each of the k cluster centers is used.
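Art's advice to standardize and not to trust a single run can be sketched as follows: convert each variable to z-scores, run k-means from several random starts (k distinct cases drawn at random as seeds), and count how many starts reach the partition with the smallest within-cluster sum of squares. This is a plain-Python illustration under those assumptions, not what TWOSTEP or QUICK CLUSTER do internally; the helper names are hypothetical, and the z-score step assumes no variable is constant.

```python
import math
import random
import statistics


def standardize(data):
    """Rescale each variable (column) to z-scores so that no variable
    dominates the Euclidean distance just because of its units.
    Assumes no column is constant (its standard deviation would be 0)."""
    cols = list(zip(*data))
    means = [statistics.fmean(c) for c in cols]
    sds = [statistics.stdev(c) for c in cols]
    return [tuple((v - m) / s for v, m, s in zip(row, means, sds)) for row in data]


def nearest(x, centers):
    """Index of the center closest to case x (Euclidean distance)."""
    return min(range(len(centers)), key=lambda j: math.dist(x, centers[j]))


def kmeans(data, centers, max_iter=100):
    """Lloyd iterations from the given seeds; returns (labels, within-cluster SS)."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        labels = [nearest(x, centers) for x in data]
        new_centers = []
        for j, old in enumerate(centers):
            members = [x for x, lab in zip(data, labels) if lab == j]
            # Keep the old center if a cluster happens to end up empty.
            new_centers.append(
                tuple(statistics.fmean(col) for col in zip(*members)) if members else old
            )
        if new_centers == centers:
            break
        centers = new_centers
    labels = [nearest(x, centers) for x in data]  # labels for the final centers
    wss = sum(math.dist(x, centers[lab]) ** 2 for x, lab in zip(data, labels))
    return labels, wss


def canonical(labels):
    """Renumber clusters in order of first appearance, so two runs that find
    the same partition give identical label tuples."""
    mapping = {}
    return tuple(mapping.setdefault(lab, len(mapping)) for lab in labels)


def compare_random_starts(data, k, n_starts=10, seed=0):
    """Run k-means from n_starts random seeds (k distinct cases each) and
    report the best WSS, its partition, and how many starts reached it."""
    rng = random.Random(seed)
    runs = []
    for _ in range(n_starts):
        labels, wss = kmeans(data, rng.sample(data, k))
        runs.append((wss, canonical(labels)))
    best_wss, best_partition = min(runs)
    agree = sum(1 for _, part in runs if part == best_partition)
    return best_wss, best_partition, agree
```

If most starts agree on the best partition, that is some reassurance; if they scatter across different solutions, no single run should be trusted, which is the point both Paul and Art make.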
|
In reply to this post by Swank, Paul R
Is it possible (through syntax) for SPSS to access data stored at a web location such as Google Docs?
William N. Dudley, PhD
Associate Dean for Research
The School of Health and Human Performance
Office of Research
The University of North Carolina at Greensboro
126 HHP Building, PO Box 26170
Greensboro, NC 27402-6170
VOICE 336.2562475 FAX 336.334.3238
|
Yes, if you use the extension command SPSSINC GETURI DATA, which can be downloaded from SPSS Developer Central (www.spss.com/devcentral). It requires at least version 17 and the Python programmability plug-in, but no Python knowledge is needed to use it. After installation, it also provides a dialog box on the File menu.
HTH,
Jon Peck
|
