Re: Help Creating Clusters

Art Kendall
When you have 2 nominal-level variables, you can display a continuous variable on a heat map. See <help>.
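To make concrete what such a heat map displays, here is a minimal Python sketch (not SPSS syntax) that computes the mean of a continuous variable in each cell defined by two nominal variables. The names `cell_means`, `segment`, `region` and `spend` are hypothetical and the data are made up for illustration; a heat map would simply colour each cell by these means.

```python
from collections import defaultdict


def cell_means(a_values, b_values, y_values):
    """Mean of y within each (a, b) cell; these are the values a heat map shades."""
    totals = defaultdict(lambda: [0.0, 0])  # (a, b) -> [running sum, count]
    for a, b, y in zip(a_values, b_values, y_values):
        cell = totals[(a, b)]
        cell[0] += y
        cell[1] += 1
    return {key: s / n for key, (s, n) in totals.items()}


# Hypothetical illustration data, not taken from the thread.
segment = ["A", "A", "B", "B"]
region = ["N", "S", "N", "N"]
spend = [10, 20, 30, 50]

means = cell_means(segment, region, spend)
for (seg, reg), m in sorted(means.items()):
    print(seg, reg, m)
```

Cells with no cases are simply absent from the result, which is worth keeping visible in the plot rather than treating as zero.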

However, collapsing categories can increase noise.

A lot depends on the overall goal of your analysis.
Do you have a reason why every case has to be put into one of the categories?

See if using DISCRIMINANT would help with what you are trying to do.
AUTORECODE the variable ALL5 into sequential integers.
Use the 8 large groups as the group variable and consider the other groups as "ungrouped" for the classification phase of the DISCRIMINANT. That way the unclassified cases will be assigned to whichever of the large groups has the nearest centroid.
The classification table will give you info on how distinct the 8 groups are.
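The idea of assigning ungrouped cases to the nearest large-group centroid can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses plain Euclidean distance on the raw variables, whereas DISCRIMINANT classifies on discriminant function scores and takes prior probabilities into account, so its assignments can differ. The function names and the data are hypothetical.

```python
import math


def centroids(rows, labels):
    """Mean vector of each labelled group; rows whose label is None are skipped."""
    sums, counts = {}, {}
    for row, lab in zip(rows, labels):
        if lab is None:
            continue
        acc = sums.setdefault(lab, [0.0] * len(row))
        for i, v in enumerate(row):
            acc[i] += v
        counts[lab] = counts.get(lab, 0) + 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}


def assign_ungrouped(rows, labels):
    """Keep existing labels; give each None-labelled row the nearest centroid's label."""
    cents = centroids(rows, labels)
    out = []
    for row, lab in zip(rows, labels):
        if lab is None:
            lab = min(cents, key=lambda g: math.dist(row, cents[g]))
        out.append(lab)
    return out


# Hypothetical data: two large groups and two ungrouped cases (label None).
rows = [[0, 0], [0, 2], [10, 10], [10, 12], [1, 1], [9, 11]]
labels = ["A", "A", "B", "B", None, None]
print(assign_ungrouped(rows, labels))
```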

Alternatively, cluster only on the other variables and crosstab the membership variable with ALL5.
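As a rough picture of what that crosstab contains, the following Python sketch counts how many cases fall into each combination of cluster membership and ALL5 pattern. The function name `crosstab` and the example values are assumptions made for illustration; in SPSS the CROSSTABS procedure would produce the table.

```python
from collections import Counter


def crosstab(row_values, col_values):
    """Return sorted row labels, sorted column labels, and the table of counts."""
    cells = Counter(zip(row_values, col_values))
    rows = sorted(set(row_values))
    cols = sorted(set(col_values))
    # Counter returns 0 for combinations that never occur.
    table = [[cells[(r, c)] for c in cols] for r in rows]
    return rows, cols, table


# Hypothetical cluster memberships and ALL5 patterns for five cases.
cluster = [1, 1, 2, 2, 2]
all5 = ["10110", "10110", "00001", "00001", "10110"]

row_labels, col_labels, table = crosstab(cluster, all5)
print("cluster", *col_labels)
for r, counts in zip(row_labels, table):
    print(r, *counts)
```

If the clusters line up with the patterns, most of each row's cases sit in one or two columns.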

Art Kendall
Social Research Consultants

On 9/5/2010 9:54 AM, Amal Daher wrote:
Dear Mr. Kendall,
Many thanks for the reply!!
Very good suggestion regarding the computation. I first had to translate the two cases into 1s and 0s using "Recode into Different Variables", but it worked: I now have them split into the 16 different segments (I am only using 4 variables now, not 5).
To reduce to 4 to 6 segments, I will do two things:
- First, merge the segments that are too small with the closest other segment, which already reduces the number of segments significantly. Of the 16, eight account for 97% of the sample, so I just combined each of the other eight with one of the first eight segments.
- As you pointed out, I have other variables in the data set and need to identify patterns that might allow me to merge segments
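One possible way to make the first step reproducible is sketched below in Python. It assumes that "closest" means differing on the fewest of the original dichotomies (Hamming distance between the 0/1 patterns), with ties broken in favour of the more frequent segment; the thread does not define closeness, so that rule is an assumption. The function names and counts are hypothetical.

```python
from collections import Counter


def hamming(a, b):
    """Number of positions at which two equal-length patterns differ."""
    return sum(x != y for x, y in zip(a, b))


def merge_small_segments(patterns, keep):
    """Map every pattern to one of the `keep` most frequent patterns."""
    counts = Counter(patterns)
    large = [p for p, _ in counts.most_common(keep)]
    mapping = {}
    for p in counts:
        if p in large:
            mapping[p] = p
        else:
            # Nearest large pattern; ties go to the more frequent one.
            mapping[p] = min(large, key=lambda g: (hamming(p, g), -counts[g]))
    return mapping


# Hypothetical sample: two large segments and two small ones.
patterns = ["1100"] * 50 + ["0011"] * 40 + ["1101"] * 3 + ["0111"] * 2
mapping = merge_small_segments(patterns, keep=2)
for small, target in sorted(mapping.items()):
    print(small, "->", target)
```

A rule based on the other variables in the data set, as in the second step, could replace `hamming` without changing the rest of the sketch.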
For the second point, I will do frequency graphs of each segment against the other variables I'll be looking at. Do you know of a more effective way to go about that? Are there heat maps in SPSS I can use to clearly visualise patterns?

Would appreciate your advice, and thanks so much again for the help!
Amal.

On Sun, Sep 5, 2010 at 2:28 PM, Art Kendall <[hidden email]> wrote:
You can get a variable that represents the 5 dichotomies by something like this.

numeric Pattern5(n5).
compute Pattern5= (voice*10**4)+ (data*10**3) + (content*10**2) + (x*10**1) + (y*10**0).
* Note how the exponents decrease. The exponents of 1 and 0 are made explicit for clarity.
* Given the order in which operations are executed, the parentheses are also technically unnecessary.
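The arithmetic behind Pattern5 can be checked outside SPSS with a short Python sketch. It assumes each of the five variables is already coded 0 or 1; the function names are hypothetical. Each flag occupies one decimal digit, so the zero-padded code reads directly as the pattern, which is what the N5 format displays.

```python
def encode_pattern(flags):
    """Combine 0/1 flags into one decimal-digit code, first flag in the leftmost digit."""
    if any(f not in (0, 1) for f in flags):
        raise ValueError("each flag must be 0 or 1")
    n = len(flags)
    return sum(f * 10 ** (n - 1 - i) for i, f in enumerate(flags))


def decode_pattern(code, n):
    """Recover the n flags from a code by reading its zero-padded digits."""
    return [int(d) for d in str(code).zfill(n)]


# voice=1, data=0, content=1, x=1, y=0
code = encode_pattern([1, 0, 1, 1, 0])
print(str(code).zfill(5))       # 10110
print(decode_pattern(code, 5))  # [1, 0, 1, 1, 0]
```

Because every combination of flags yields a distinct code, five dichotomies give exactly 32 possible values without any IF statements.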

How do you know there should be only 4 to 6 within the 32?

How do you plan to collapse categories on Pattern5?
Do you have other external variables that you want to use so that you can collapse segments that are similar on those other variables?
Do you want to collapse segments where one pattern occurs frequently and another is relatively rare but differs only on one of the original 5 dichotomies? Or what?

What do you mean by a "large" sample? Millions of cases? Hundreds of thousands?

On what basis do you want to assign weights to variables in the collapsing?

Art Kendall
Social Research Consultants

On 9/5/2010 5:12 AM, Amal 1 wrote:
This is the first time I have used SPSS.

I need to make an analysis on specific segments.

I have a very large set of data (results of a sample), and I want to
define segments based on 5 variables (voice usage, data usage, Content
usage, etc.)
So I will have about 32 different combinations (~ 2 cases per
variable). Is there any way I can define those 32 combinations that
is faster than having tons of “if statements”?

What I want to do is define those 32 segments, and then try to see
similar patterns between them to see which groups I can combine to
shrink the number of segments to about 4 to 6 maximum.

Let me know what tools I should be looking at in SPSS cause I am
really stuck!

I checked the “Direct Marketing -> Segment my contacts into clusters”
functionality, but it doesn’t allow me to define the segments the way
I want or even put weights for each of the variables I want to use.

Any help is appreciated!!

Thanks!!
