Replicating the Population data Clusters on Validating data


Richard AK
Hi Team,
 
I am facing a problem replicating the clusters from one dataset on another dataset.
 
I have two datasets named "population" and "validation". I ran Two Step Cluster (TSC) on the population data, and it produced 4 clusters. My aim is to replicate the same cluster groups on the validation data.
 
Approach used to replicate 2 clusters:
 
We ran a discriminant analysis, built equations from the "Canonical Discriminant Function Coefficients", and used them to assign the cases in the new validation data to the same cluster groups, with the solution restricted to 2 clusters. I was able to replicate the groups with 92% accuracy and with nearly identical central tendency measures. This is good enough for our further estimations.
 
Problem:
 
However, when we try to do the same with the 4 cluster groups created by TSC, the accuracy and the central tendency measures come out different from the original.
 
Please suggest how to approach this.
 
Thanks in advance,
Richard
 
 
 
 
Re: Replicating the Population data Clusters on Validating data

Art Kendall
Try fitting the ungrouped cases into the 4 clusters more directly, with something like this untested syntax:

* Stack the two files and flag which file each case came from.
ADD FILES
  /FILE = population /IN = inpop
  /FILE = validation /IN = inval.
* TSC stands for the variable holding the saved TwoStep cluster membership.
* Validation cases get code 5, which is outside the 1-4 range of groups.
DO IF inpop.
COMPUTE groupvar = TSC.
ELSE IF inval.
COMPUTE groupvar = 5.
END IF.
DISCRIMINANT GROUPS = groupvar(1,4) /VARIABLES = ...
...
  /CLASSIFY = UNCLASSIFIED
  /SAVE CLASS = newgroup ...
CROSSTABS TABLES = groupvar BY newgroup inpop.
MEANS TABLES = varlist BY newgroup / varlist BY newgroup inpop.
..
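
For readers without SPSS at hand, the idea behind the syntax above can be sketched in plain Python: estimate linear classification functions from the cases that already have a cluster, then assign the ungrouped cases to the best-scoring cluster. This is only an illustration, not what DISCRIMINANT does internally step for step. It assumes equal prior probabilities and a pooled within-group covariance matrix, and the function names and the toy data are hypothetical, not taken from the thread.

```python
# Sketch only: linear discriminant classification in plain Python.
# Assumptions: equal priors, pooled within-group covariance, toy data.

def mean_vector(rows):
    """Column means of a list of equal-length tuples."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def pooled_covariance(groups, means):
    """Pooled within-group covariance (divisor: N minus number of groups)."""
    p = len(next(iter(means.values())))
    total = sum(len(rows) for rows in groups.values())
    cov = [[0.0] * p for _ in range(p)]
    for g, rows in groups.items():
        for x in rows:
            d = [xi - mi for xi, mi in zip(x, means[g])]
            for i in range(p):
                for j in range(p):
                    cov[i][j] += d[i] * d[j]
    dof = total - len(groups)
    return [[v / dof for v in row] for row in cov]

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit(groups):
    """Return {group: (weights, constant)} for the classification functions."""
    means = {g: mean_vector(rows) for g, rows in groups.items()}
    cov = pooled_covariance(groups, means)
    funcs = {}
    for g, m in means.items():
        w = solve(cov, m)  # inverse covariance times the group mean
        const = -0.5 * sum(wi * mi for wi, mi in zip(w, m))  # equal priors
        funcs[g] = (w, const)
    return funcs

def classify(funcs, x):
    """Assign x to the group with the highest classification score."""
    def score(g):
        w, const = funcs[g]
        return sum(wi * xi for wi, xi in zip(w, x)) + const
    return max(funcs, key=score)

# Hypothetical "population" cases, keyed by their existing cluster (1-4).
population = {
    1: [(1.0, 1.0), (1.5, 0.8), (0.8, 1.4), (1.2, 1.3)],
    2: [(6.0, 1.0), (6.4, 1.3), (5.7, 0.6), (6.1, 1.2)],
    3: [(1.0, 6.0), (0.7, 6.3), (1.4, 5.8), (1.1, 6.4)],
    4: [(6.0, 6.0), (6.3, 5.6), (5.8, 6.4), (6.2, 6.1)],
}
# Hypothetical "validation" cases with no cluster yet.
validation = [(1.1, 0.9), (5.9, 1.4), (0.9, 5.7), (6.4, 6.2)]

funcs = fit(population)
newgroup = [classify(funcs, x) for x in validation]

# Share of population cases assigned back to their own cluster.
hits = sum(classify(funcs, x) == g for g, rows in population.items() for x in rows)
resub_accuracy = hits / sum(len(rows) for rows in population.values())

print(newgroup, resub_accuracy)
```

The resubstitution check at the end plays the role of the CROSSTABS of groupvar by newgroup: if the clusters cannot be recovered well even on the cases they were built from, the validation assignments will not match either.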


I do not know what you mean by

Art Kendall
Social Research Consultants

On 5/16/2011 2:59 AM, Richard AK wrote:
[original message snipped; see the first post above]