Two-step cluster in SPSS 19

bay.worsoe
Hi, we have searched but cannot find any information about how to decide which variables to include or exclude in the Two-step cluster procedure in SPSS 19.

We have binary, categorical, and continuous variables.

The first results are:
predictor-importance.pdf
clusters.pdf
cluster-comparison.pdf

Our concern is that the two clusters are not significantly different with regard to our dependent variable when tested with an ANOVA, so we need to come up with a different cluster solution.
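The ANOVA check described above can be sketched outside SPSS as a one-way F statistic computed by hand. This is only a minimal illustration: the function name and the conversion-rate values are hypothetical, and it assumes the cluster memberships have already been saved from the cluster run.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    k = len(groups)          # number of clusters
    n = len(all_values)      # total number of cases
    # Variation of the cluster means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the cases around their own cluster mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = k - 1
    df_within = n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical conversion rates (percent) of webshops in two clusters.
cluster_1 = [2.0, 3.0, 4.0]
cluster_2 = [4.0, 5.0, 6.0]

f_stat, df_b, df_w = one_way_anova_f([cluster_1, cluster_2])
print(f_stat, df_b, df_w)
```

The F value still has to be compared against the F distribution with (df_between, df_within) degrees of freedom to get a p-value, which SPSS reports directly.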

How would we go about deleting variables? Maybe just delete the ones that show no difference between the two clusters? Or the ones with the lowest predictor importance?

Another idea is to use a correlation matrix to identify which variables correlate with our dependent variable and only use those to form the clusters. We tried this, and ended up using only 5 of the 27 variables and 6 clusters. This solution gave us significant results using the ANOVA, but we are concerned that we might lose too much data using only 5 of the 27 variables to form the clusters.
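The correlation screening described above could look roughly like the following. It is a sketch under stated assumptions: the variable names, the data, and the 0.3 cut-off are all hypothetical, and Pearson correlation is used only for illustration (it is not necessarily the coefficient the original analysis used).

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant variable: correlation is undefined, treated as 0 here
    return cov / sqrt(var_x * var_y)

def screen_variables(data, dependent, threshold):
    """Keep the names of variables whose |r| with the dependent variable >= threshold."""
    return [name for name, values in data.items()
            if abs(pearson_r(values, dependent)) >= threshold]

# Hypothetical data for five webshops.
conversion_rate = [1.0, 2.0, 3.0, 4.0, 5.0]
design_vars = {
    "has_search": [0, 0, 1, 1, 1],
    "n_payment_options": [1, 2, 3, 4, 5],
    "has_newsletter": [1, 0, 1, 0, 1],
}

kept = screen_variables(design_vars, conversion_rate, threshold=0.3)
print(kept)
```

Only the variables in `kept` would then be passed to the clustering step.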

The project is exploratory, so we have no prior expectation of how the clusters should be formed.

Best regards
Jacob & Tina

Re: Two-step cluster in SPSS 19

Art Kendall
A lot depends on the substantive nature of your work.
What constitutes a case in your data?
How many cases do you have?

How did you choose your variables?
Is there a logic to which variables you chose to include?

What is your dependent variable?

How did you decide how many clusters to use?


Art Kendall
Social Research Consultants
On 7/27/2012 10:45 AM, bay.worsoe wrote:
http://spssx-discussion.1045642.n5.nabble.com/file/n5714503/predictor-importance.pdf
predictor-importance.pdf
http://spssx-discussion.1045642.n5.nabble.com/file/n5714503/clusters.pdf
clusters.pdf
http://spssx-discussion.1045642.n5.nabble.com/file/n5714503/cluster-comparison.pdf
cluster-comparison.pdf




--
View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Two-step-cluster-in-SPSS-19-tp5714503.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD



Automatic reply: Two-step cluster in SPSS 19

Spousta Jan
Thank you for your e-mail. I will not be able to read it until 1 August 2012 at the earliest. My mail is not being forwarded. If necessary, please contact Ján Kašprišin, [hidden email]. Kind regards, Jan Spousta


Re: Two-step cluster in SPSS 19

bay.worsoe
In reply to this post by Art Kendall
Hi Art

Thanks for your response.

The cases in our data set are webshops and we have around 400 cases.

The dependent variable is the performance (conversion rate) of the webshop. All the variables were chosen because earlier work had shown them to correlate with the dependent variable (but in our data only some of them do). All the variables are related to the design and configuration of the webshops.

The number of clusters was determined automatically by the two-step algorithm.

We have around 30 variables, and obviously some of them do not vary much between clusters. Many variables are binary (0 or 1): the most common value is the same in every cluster, but the percentage of cases with that value differs between the clusters. Is it safe to remove these?
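To make the binary-variable situation concrete, here is a small sketch that computes the share of cases coded 1 within each cluster. The variable name and the data are hypothetical; the point is only that the majority value (1) is the same in both clusters while the percentages differ.

```python
from collections import defaultdict

def share_of_ones_by_cluster(cluster_labels, binary_values):
    """Return {cluster: proportion of cases coded 1} for a 0/1 variable."""
    counts = defaultdict(lambda: [0, 0])  # cluster -> [number of ones, number of cases]
    for label, value in zip(cluster_labels, binary_values):
        counts[label][0] += value
        counts[label][1] += 1
    return {label: ones / total for label, (ones, total) in counts.items()}

# Hypothetical cluster memberships and a hypothetical 0/1 design feature.
clusters      = [1, 1, 1, 1, 1, 2, 2, 2, 2, 2]
has_live_chat = [1, 1, 1, 1, 0, 1, 1, 1, 0, 0]

shares = share_of_ones_by_cluster(clusters, has_live_chat)
print(shares)
```

Whether a difference such as 80% versus 60% is large enough to keep the variable is a substantive judgment that this sketch does not settle.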

Another concern: Is it nonsense to include variables that do not individually correlate with the dependent variable in the first place? Our thinking is that the variables may have a combined effect on the dependent variable even though each one on its own does not correlate with it.
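A toy example shows that such a combined effect is at least possible. In the hypothetical data below, neither binary feature correlates with the outcome by itself, yet the outcome is fully determined by whether the two features differ. The feature names and values are invented for illustration and say nothing about the actual webshop data.

```python
from math import sqrt
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation of two equal-length numeric lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical design features and conversion rates for four webshops.
feature_a  = [0, 0, 1, 1]
feature_b  = [0, 1, 0, 1]
conversion = [1.0, 3.0, 3.0, 1.0]  # high only when exactly one feature is present

r_a = pearson_r(feature_a, conversion)
r_b = pearson_r(feature_b, conversion)

# Mean conversion when the two features differ versus when they are equal.
mean_differ = mean(c for a, b, c in zip(feature_a, feature_b, conversion) if a != b)
mean_same = mean(c for a, b, c in zip(feature_a, feature_b, conversion) if a == b)

print(r_a, r_b, mean_differ, mean_same)
```

Both individual correlations are zero here, while the two combinations differ in mean conversion, so screening variables one at a time would discard both features.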

Best Regards
Jacob & Tina

Re: Two-step cluster in SPSS 19

bay.worsoe
We should mention that the purpose of our analysis is to identify the combined effect of the variables on the dependent variable. To do that, we grouped the cases into clusters of fairly similar webshops and then tested those clusters against the dependent variable in an ANOVA, in order to find a superior cluster with a certain set of variable values.

But which variables do we remove from the clustering process when we find no difference between the clusters?

The ones with no individual correlation with the dependent variable or the ones that show no difference between the clusters?