Setting Cluster Centre - Information


Mark Webb-3

I aim to repeat a clustering assignment, and the client wants to ensure some degree of consistency. They have suggested setting the initial cluster centres.

Where can I get more information on this? I tend to think it applies only to K-means, but I'm not sure.

Can anyone recommend SPSS-based material on this?

Regards,

Mark Webb


Bizarre regression results - predictors still added when Rsquared is already 1.0000

Ruben Geert van den Berg
Dear all,
 
I've been running a stepwise regression and the results are very weird. When the third predictor is added to the model, the R-squared is perfect, yet 6 additional predictors are still added. Also, after step 2, t-values are no longer calculated. There are 9 predictors and 184 cases, so the ratio of cases to predictors shouldn't be the source of the problem. Does anyone have an idea what's going wrong here? Is the problem in the correlation matrix?
 
TIA
 
 

Model Summary

Model  R         R Square  Adjusted R Square  Std. Error of the Estimate
1      ,719(a)   ,517      ,514               ,537
2      ,811(b)   ,658      ,654               ,453
3      1,000(c)  1,000     1,000              ,000
4      1,000(d)  1,000     1,000              ,000
5      1,000(e)  1,000     1,000              ,000
6      1,000(f)  1,000     1,000              ,000
7      1,000(g)  1,000     1,000              ,000
8      1,000(h)  1,000     1,000              ,000
9      1,000(i)  1,000     1,000              ,000

a Predictors: (Constant), V410_9
b Predictors: (Constant), V410_9, V410_3
c Predictors: (Constant), V410_9, V410_3, V410_6
d Predictors: (Constant), V410_9, V410_3, V410_6, V410_2
e Predictors: (Constant), V410_9, V410_3, V410_6, V410_2, V410_8
f Predictors: (Constant), V410_9, V410_3, V410_6, V410_2, V410_8, V410_1
g Predictors: (Constant), V410_9, V410_3, V410_6, V410_2, V410_8, V410_1, V410_4
h Predictors: (Constant), V410_9, V410_3, V410_6, V410_2, V410_8, V410_1, V410_4, V410_7
i Predictors: (Constant), V410_9, V410_3, V410_6, V410_2, V410_8, V410_1, V410_4, V410_7, V410_5

Re: Bizarre regression results - predictors still added when Rsquared is already 1.0000

Swank, Paul R

Perhaps yet another reason not to use the stepwise method!

 

Paul R. Swank, Ph.D
Professor and Director of Research
Children's Learning Institute
University of Texas Health Science Center
Houston, TX 77038

 


Re: Bizarre regression results - predictors still added when Rsquared is already 1.0000

Ornelas, Fermin-2
In reply to this post by Ruben Geert van den Berg

I do not think it is wise to use stepwise in isolation. It seems likely that your predictors contribute the same information to the regression function. You should run some model diagnostics on each of your runs to get clues as to what is causing this problem. Descriptive statistics could also hint at possible shortcomings in your predictors (% missing values, binary variables, values out of range for a variable, etc.).
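
A minimal sketch of such diagnostics in SPSS syntax, for anyone following along. The name depvar is a placeholder, since the thread never names the dependent variable, and V410_1 TO V410_9 assumes the nine predictors sit next to each other in the file.

```
* Valid N, range and spread for each predictor.
DESCRIPTIVES VARIABLES=V410_1 TO V410_9
  /STATISTICS=MEAN STDDEV MIN MAX.

* Forced entry with tolerance, VIF and collinearity diagnostics.
* depvar is a placeholder for the unnamed dependent variable.
REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF R ANOVA TOL COLLIN
  /DEPENDENT depvar
  /METHOD=ENTER V410_1 TO V410_9.
```

Tolerances close to zero, or very large VIF values, would suggest that some predictors carry the same information.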

 

F Ornelas

 



Re: Setting Cluster Centre - Information

Luca Meyer-3
In reply to this post by Mark Webb-3
Hi Mark,

I did something similar some time ago.

In my case I ran hierarchical clustering on some random samples of the database I was provided with and used the outcome to set the centers for a k-means clustering procedure. I repeated that procedure several times with different initial random samples to evaluate how robust my results were to random variability.

This is an extract of the syntax I used:
 
* !TEMP and !DATI are not defined in this extract; presumably macros that expand to folder paths.
* CLUSTER(2,5) is used because I believed the data could hold between 2 and 5 natural groups.
CLUSTER  S12_1_D56A TO S12_1_D56M
  /METHOD BAVERAGE
  /MEASURE= SEUCLID
  /PRINT SCHEDULE
  /PLOT VICICLE
  /SAVE CLUSTER(2,5).
SAVE OUTFILE !TEMP+"TEMP.SAV".
GET FILE !TEMP+"TEMP.SAV".
* Use the 2-cluster solution (CLU2_1) as the grouping variable.
SELECT IF NOT SYSMIS(CLU2_1).
COMPUTE CLUSTER_=CLU2_1.
SORT CASES BY CLUSTER_.
* The cluster means become the initial centers for the k-means run.
AGGREGATE OUTFILE=*
        /PRESORTED
        /BREAK=CLUSTER_
        /S12_1_D56A = MEAN (S12_1_D56A)
        /S12_1_D56B = MEAN (S12_1_D56B)
        /S12_1_D56C = MEAN (S12_1_D56C)
        /S12_1_D56D = MEAN (S12_1_D56D)
        /S12_1_D56E = MEAN (S12_1_D56E)
        /S12_1_D56F = MEAN (S12_1_D56F)
        /S12_1_D56G = MEAN (S12_1_D56G)
        /S12_1_D56H = MEAN (S12_1_D56H)
        /S12_1_D56J = MEAN (S12_1_D56J)
        /S12_1_D56K = MEAN (S12_1_D56K)
        /S12_1_D56I = MEAN (S12_1_D56I)
        /S12_1_D56L = MEAN (S12_1_D56L)
        /S12_1_D56M = MEAN (S12_1_D56M).
SELECT IF NOT SYSMIS(CLUSTER_).
SAVE OUTFILE !TEMP+"CLUSTER.SAV".
* K-means on the full file, starting from the centers saved above.
GET FILE !DATI+"S12_CLUSTERING.SAV".
QUICK CLUSTER S12_1_D56A TO S12_1_D56M
  /MISSING=LISTWISE
  /CRITERIA= CLUSTER(2) MXITER(100) CONVERGE(0)
  /METHOD=KMEANS(UPDATE)
  /SAVE CLUSTER
  /PRINT INITIAL ANOVA
  /FILE=!TEMP+"CLUSTER.SAV".
HTH,
 
Luca



Re: Bizarre regression results - predictors still added when Rsquared is already 1.0000

Art Kendall
In reply to this post by Ornelas, Fermin-2
Stepwise is usually unwise.

To check an off-the-wall guess: go to your listing, open the table of results, and expand all of the R-squareds to as many decimals as you can. Are there changes in the very low-order digits?
Then open the table of changes in R-squared and expand those values in the same way. Are there changes in the very low-order digits?

Did you change some parameter for inclusion?
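
One way to check the inclusion settings is to spell them out and request the R-squared change statistics. This is only a sketch: depvar is a placeholder for the unnamed dependent variable, V410_1 TO V410_9 assumes the predictors are adjacent in the file, and the PIN, POUT and TOLERANCE values shown are the documented REGRESSION defaults, not necessarily what was used in the run above.

```
* Stepwise run with the entry and removal criteria written out explicitly.
* CHANGE adds the R-squared change and its F test for every step.
REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS R ANOVA CHANGE
  /CRITERIA=PIN(.05) POUT(.10) TOLERANCE(.0001)
  /DEPENDENT depvar
  /METHOD=STEPWISE V410_1 TO V410_9.
```

If the original job used a looser PIN or TOLERANCE than these, that could help explain predictors entering after the fit is already perfect.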

Art Kendall
Social Research Consultants


Re: Setting Cluster Centre - Information

Art Kendall
In reply to this post by Mark Webb-3
Please tell us more about what you want to accomplish.
Twostep can take an xml model from one set and apply it to new data.
K-means can take centroids as initial cluster centers.
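
A rough sketch of the k-means route, assuming two hypothetical data files (wave1.sav and wave2.sav) that share five hypothetical clustering variables v1 TO v5, and a four-cluster solution. None of these names or numbers come from Mark's project.

```
* Run 1: cluster the original data and save the final centres.
GET FILE='wave1.sav'.
QUICK CLUSTER v1 TO v5
  /MISSING=LISTWISE
  /CRITERIA=CLUSTER(4) MXITER(100) CONVERGE(0)
  /METHOD=KMEANS(NOUPDATE)
  /SAVE CLUSTER
  /PRINT INITIAL
  /OUTFILE='centres_wave1.sav'.

* Run 2: start the repeat study from those saved centres.
GET FILE='wave2.sav'.
QUICK CLUSTER v1 TO v5
  /MISSING=LISTWISE
  /CRITERIA=CLUSTER(4) MXITER(100) CONVERGE(0)
  /METHOD=KMEANS(NOUPDATE)
  /SAVE CLUSTER
  /PRINT INITIAL
  /FILE='centres_wave1.sav'.
```

Replacing KMEANS(NOUPDATE) with CLASSIFY in the second run should assign the new cases to the saved centres without moving them. The centres can also be typed in directly with the /INITIAL subcommand instead of being read from a file.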

If your previous clustering was non-hierarchical, it may be that you want to do a DISCRIMINANT.
Copy the cluster memberships into a new variable to use as the groups. Add the new cases to the original file with a value on the grouping variable that is one higher than the number of original clusters: if you have 4 clusters, call the new cases group 5. Do the DFA on groups 1 to 4. In the classification phase, use the option to include ungrouped cases.

You will get a table of assigned group by original group and, for each case, its assigned group, its probability of membership in each of the 4 groups, and something that tells how close the case is to the centroid of the assigned group. (I do not recall whether it is a distance or the probability that a member of that group would be that far out.)
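
The steps above might look roughly like this in syntax. All names are hypothetical: clus_grp holds the original cluster memberships, v1 TO v5 are the clustering variables, and the two file names stand in for the original and the new data. To my understanding, cases whose group value falls outside the GROUPS range are left out of the analysis phase but are still classified.

```
* Stack the new cases under the original, already clustered cases.
ADD FILES /FILE='original_with_clusters.sav'
  /FILE='new_cases.sav'.
* New cases have no cluster membership yet, so put them in group 5.
IF (SYSMIS(clus_grp)) clus_grp = 5.
EXECUTE.

* Estimate the functions on groups 1 to 4 and classify every case.
* SAVE keeps the predicted group and the membership probabilities.
* PLOT=CASES requests casewise results.
DISCRIMINANT
  /GROUPS=clus_grp(1 4)
  /VARIABLES=v1 TO v5
  /ANALYSIS ALL
  /PRIORS EQUAL
  /SAVE=CLASS PROBS
  /STATISTICS=TABLE
  /PLOT=CASES
  /CLASSIFY=NONMISSING POOLED.
```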

More specific suggestions would need more detail about your work.

Art Kendall
Social Research Consultants
