clustering variables (binary scale)

Gekko
Hi,

Does anybody know the best method and the ideal measure for clustering variables (not respondents!) in SPSS?

A method like: linkage, Ward, centroid, neighbour, median...
A measure like: Euclidean distance, phi 4-point correlation, lambda, Jaccard, Rogers and Tanimoto...

The variables are binary (0, 1).
Some are diseases (no, yes) out of a set of possible diseases, and some are about nutrition, such as "tries to eat healthily" (no, yes) or "eats the same every day" (no, yes)...


thanks
stefan

Re: clustering variables (binary scale)

Melissa Ives
The ideal method and measure really depend on what you are trying to do
and on your data.
For example:

Ward's (1963) minimum variance method is a hierarchical method that
groups cases to maximize between-group differences and minimize
within-group differences (i.e., it optimizes an F-statistic).  It keeps
merging the most similar pair of cases/clusters until there is just one
cluster.
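
As a rough illustration of the merging described above, here is a minimal pure-Python sketch. It assumes the usual formulation of Ward's criterion, where the cost of merging two clusters is the increase in the within-cluster sum of squares. The function names and the toy data are made up for this example and this is not how SPSS implements it.

```python
from itertools import combinations

def centroid(cluster):
    """Coordinate-wise mean of the points in a cluster."""
    n = len(cluster)
    return [sum(p[k] for p in cluster) / n for k in range(len(cluster[0]))]

def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    ca, cb = centroid(a), centroid(b)
    sq_dist = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * sq_dist

def ward_merges(points):
    """Merge the cheapest pair repeatedly until one cluster is left."""
    clusters = [[p] for p in points]
    merges = []
    while len(clusters) > 1:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: ward_cost(clusters[ij[0]], clusters[ij[1]]))
        merges.append((clusters[i], clusters[j],
                       ward_cost(clusters[i], clusters[j])))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges

# Hypothetical toy cases with two numeric attributes each.
cases = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
merges = ward_merges(cases)
for left, right, cost in merges:
    print(left, "+", right, "cost", round(cost, 2))
```

The list of merges is what a dendrogram would display: five cases give four merges, and the cheapest pairs are joined first.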

Ward's method is usually paired with the squared Euclidean distance
between cases/cluster centers.  This measure places greater weight on
cases that are further apart and serves to isolate high-risk groups
faster.
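
For the binary variables in the original question, the squared Euclidean distance has a simple reading: it equals the number of respondents on which two variables disagree. A small sketch to show this, with variable names and values invented for illustration:

```python
def squared_euclidean(x, y):
    """Sum of squared differences between two equally long vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))

# Hypothetical 0/1 answers of six respondents on two variables.
diabetes     = [1, 0, 1, 1, 0, 0]
hypertension = [1, 0, 0, 1, 1, 0]

distance = squared_euclidean(diabetes, hypertension)
mismatches = sum(a != b for a, b in zip(diabetes, hypertension))
print(distance, mismatches)
```

Because each difference is 0 or 1, squaring changes nothing for a single pair of binary values. The extra weight on distant cases only shows up with non-binary data or with cluster centers.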

However, some methods work in the opposite direction: they start with
all records in one big cluster and then split records off based on the
measure chosen.

Try looking at:
Aldenderfer, M. S., & Blashfield, R. K. (1984). Cluster analysis. In M.
S. Lewis-Beck (Ed.), Quantitative applications in the social sciences
(Sage University Paper No. 44). Newbury Park, CA: Sage.

Rapkin, B. D., & Luke, D. A. (1993). Cluster analysis in community
research: Epistemology and practice. American Journal of Community
Psychology, 21, 247-277.

Melissa

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
Gekko
Sent: Wednesday, January 30, 2008 3:38 AM
To: [hidden email]
Subject: [SPSSX-L] clustering variables (binary scale)

--
View this message in context:
http://www.nabble.com/clustering-variables-%28binary-scale%29-tp15177238
p15177238.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command SIGNOFF SPSSX-L For a list
of commands to manage subscriptions, send the command INFO REFCARD


Re: clustering variables (binary scale)

Zetu, Dan
With binary data (cases or variables), one should use hierarchical
clustering with the Jaccard coefficient of dissimilarity. SPSS has this
measure built in.

-------------------------------
Dan Zetu
Analytical Consultant
R. L. Polk & Co.
248-728-7278
[hidden email]
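
To make the Jaccard suggestion concrete, here is a minimal pure-Python sketch that clusters variables (not respondents) by Jaccard dissimilarity. Jaccard ignores respondents who answered "no" on both variables, which often suits disease indicators. The data, the variable names, the choice of average linkage, and the 0.0 returned for two all-zero variables are all assumptions made for this example. It is not SPSS output or SPSS syntax.

```python
from itertools import combinations

def jaccard_dissimilarity(x, y):
    """1 - (both yes) / (at least one yes); joint 'no' answers are ignored."""
    both = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    either = sum(1 for a, b in zip(x, y) if a == 1 or b == 1)
    # Assumption: two variables with no 'yes' at all count as identical.
    return 1.0 - both / either if either else 0.0

def average_linkage(a, b, d):
    """Mean pairwise dissimilarity between two clusters of variable names."""
    return sum(d[frozenset((x, y))] for x in a for y in b) / (len(a) * len(b))

def cluster_variables(data):
    """Agglomerative clustering of variables; returns the merge history."""
    d = {frozenset(p): jaccard_dissimilarity(data[p[0]], data[p[1]])
         for p in combinations(data, 2)}
    clusters = [(name,) for name in data]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2),
                   key=lambda p: average_linkage(p[0], p[1], d))
        merges.append((a, b, average_linkage(a, b, d)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges

# Hypothetical answers of eight respondents; each list is one variable.
data = {
    "diabetes":       [1, 1, 0, 0, 1, 0, 0, 1],
    "hypertension":   [1, 1, 0, 0, 1, 0, 1, 1],
    "eats_healthy":   [0, 0, 1, 1, 0, 1, 1, 0],
    "same_every_day": [0, 0, 1, 0, 0, 1, 1, 0],
}
merges = cluster_variables(data)
for a, b, dissimilarity in merges:
    print(a, "+", b, round(dissimilarity, 3))
```

In this toy data the two disease variables merge first and the two nutrition variables second, which is the kind of grouping the original question is after.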

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
Melissa Ives
Sent: Wednesday, January 30, 2008 3:46 PM
To: [hidden email]
Subject: Re: clustering variables (binary scale)
