Grabbing the list of elements from CLUSTER DENDROGRAM

Tim Graettinger
Hi,

I'd like to capture the ordering of the elements from a dendrogram produced by the CLUSTER procedure.  It appears to come out only in chart form (no table), and I can't see a way to grab the list of elements with OMS.  Here's the code I'm using in SPSS 25 to run the hierarchical clustering on the unordered list of data elements, say eScale=[AnnualTax, BldgSqFt, LotSqFt, SalePrice, TotalAssessment], to produce the dendrogram.

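BEGIN PROGRAM PYTHON.
import spss

# Variables to cluster, in their original (unordered) sequence.
eScale = ['AnnualTax', 'BldgSqFt', 'LotSqFt', 'SalePrice', 'TotalAssessment']

# Write the between-variable correlations to a matrix dataset, then cluster on it.
spss.Submit('''
DATASET DECLARE dProximities.
PROXIMITIES %s /MATRIX OUT(dProximities) /VIEW=VARIABLE /MEASURE=CORRELATION /STANDARDIZE=VARIABLE NONE /PRINT NONE.
CLUSTER /MATRIX IN(dProximities) /METHOD BAVERAGE /PLOT DENDROGRAM /PRINT NONE.
DATASET CLOSE dProximities.
''' % ' '.join(eScale))
END PROGRAM.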

Here's the dendrogram that results (sorry to resort to old-school line-printer style for this post, but you get the idea):

AnnualTax        1 -|--|
TotalAssessment  5 -|  |---|
BldgSqFt         2 ----|   |---------|
SalePrice        4 --------|         |
LotSqFt          3 ------------------|


I'd like to pull out the elements in the order shown in the dendrogram, that is, [AnnualTax, TotalAssessment, BldgSqFt, SalePrice, LotSqFt], preferably as a Python list.  I find the ordering very useful for reordering the rows and columns of a correlation matrix so that the most similar elements are close together.  Thanks for the help.

-Tim

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Re: Grabbing the list of elements from CLUSTER DENDROGRAM

Jon Peck
I think you could figure this out from the Agglomeration Schedule table.  Alternatively, capture the chart as XML using OMS and extract the data you want from the XML. I haven't worked with this XML, but what you want is probably within the embeddedSource element, which might look like this:
<embeddedSource id="ClusterList">
  <names>CaseNumber;Distance;EndID;StartID;Y</names>
  <types>string;string;string;string;string</types>
  <row>;;;;</row>
  <row>1;0;2;1;4</row>
  <row>;1;;2;3</row>
  <row>2;0;4;3;3</row>
  <row>;1;;4;4</row>
  <row>2;0;6;5;3</row>
  <row>;1;;6;2</row>
  <row>;25;8;7;3</row>
  <row>;1;;8;4</row>
  <row>;25;10;9;3</row>
  <row>;1;;10;2</row>
  <row>;1;12;11;3</row>
  <row>;25;;12;1</row>
  <row>3;0;14;13;2</row>
  <row>;1;;14;3</row>
  <row>4;0;16;15;1</row>
  <row>;25;;16;3</row>
</embeddedSource>
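
If the chart XML does contain an element shaped like the one above, a rough way to get at its rows from Python might look like the following. This is only a sketch: the sample string is an abbreviated copy of the fragment above, parse_embedded_source is a made-up helper name, and a real OMS export may wrap the element in namespaces that this code ignores. Which column actually carries the dendrogram order is not confirmed here.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the fragment shown above (illustration only).
SAMPLE = """<embeddedSource id="ClusterList">
  <names>CaseNumber;Distance;EndID;StartID;Y</names>
  <types>string;string;string;string;string</types>
  <row>;;;;</row>
  <row>1;0;2;1;4</row>
  <row>;1;;2;3</row>
  <row>2;0;4;3;3</row>
</embeddedSource>"""


def parse_embedded_source(xml_text):
    """Return each <row> as a dict keyed by the column names in <names>."""
    root = ET.fromstring(xml_text)
    names = root.findtext("names").split(";")
    records = []
    for row in root.findall("row"):
        values = (row.text or "").split(";")
        records.append(dict(zip(names, values)))
    return records


records = parse_embedded_source(SAMPLE)

# Assumption: rows with a non-empty CaseNumber correspond to labelled leaves.
labelled = [r["CaseNumber"] for r in records if r["CaseNumber"]]
print(labelled)
```

From there you would still need to inspect a real export to decide how CaseNumber and Y relate to the plotted order.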

Good luck :-)

On Fri, Mar 23, 2018 at 6:32 AM, Tim Graettinger <[hidden email]> wrote:
> [...]



--
Jon K Peck
[hidden email]

Re: Grabbing the list of elements from CLUSTER DENDROGRAM

Kirill Orlov
In reply to this post by Tim Graettinger
Tim,
My macro !DENDRO (find it in the "Clustering" collection at http://www.spsstools.net/en/KO-spssmacros) draws a dendrogram. The input to the macro is the agglomeration schedule, which you can capture with OMS from the CLUSTER results or save directly from my !HIECLU macro.

Now, if you disable the next-to-last command in the !DENDRO body, NEW FILE, then running the macro will leave you with an unnamed dataset containing the variables xfrom#$ xto#$ yfrom#$ yto#$ lab. The last variable, LAB, is what you want: it contains the (numeric) case numbers or codes from the agglomeration schedule, in the order in which they appear on the dendrogram.
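
For anyone who wants to see how a leaf order can follow from the agglomeration schedule alone, here is a minimal Python sketch. It is not the !DENDRO algorithm. It assumes that each stage names two clusters by the number of one of their members and that the merged cluster keeps the first number from then on. The schedule below is hypothetical, read off the dendrogram in the original post rather than taken from real CLUSTER output. Note also that /PRINT NONE in the original syntax would suppress the schedule table, so /PRINT SCHEDULE is presumably needed to capture it.

```python
def leaf_order(schedule):
    """Derive a left-to-right leaf order from (cluster1, cluster2) merge stages.

    Assumes the merged cluster is referred to by the cluster1 number in
    later stages.
    """
    members = {}  # cluster id -> ordered list of the items it contains
    for first, second in schedule:
        left = members.pop(first, [first])     # unseen id = single item
        right = members.pop(second, [second])
        members[first] = left + right          # append the joining cluster
    if len(members) != 1:
        raise ValueError("schedule does not merge everything into one cluster")
    return next(iter(members.values()))


# Hypothetical stages: 1+5, then 2 joins, then 4, then 3.
schedule = [(1, 5), (1, 2), (1, 4), (1, 3)]
print(leaf_order(schedule))  # [1, 5, 2, 4, 3]
```

The order this produces is one valid dendrogram order; whether it matches the plotted one in every case depends on how the plot arranges the two branches at each merge.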

Re: Grabbing the list of elements from CLUSTER DENDROGRAM

Kirill Orlov
P.S. I've just made a slight modification to the !DENDRO macro. It now leaves the objects in a new dataset, in the order in which they appear on the dendrogram. You don't have to change anything in the macro body. Help yourself.
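
Once the codes are available in dendrogram order, turning them into a Python list of names and reordering a correlation matrix is straightforward. The sketch below assumes the codes are 1-based positions in the original variable list, as in the dendrogram posted earlier. The reorder_matrix helper and the correlation values are made up for illustration.

```python
def reorder_matrix(matrix, names, order):
    """Permute rows and columns of a square matrix to follow `order`.

    `order` holds 1-based positions into `names`.
    """
    idx = [code - 1 for code in order]
    new_names = [names[i] for i in idx]
    new_matrix = [[matrix[i][j] for j in idx] for i in idx]
    return new_names, new_matrix


eScale = ["AnnualTax", "BldgSqFt", "LotSqFt", "SalePrice", "TotalAssessment"]
order = [1, 5, 2, 4, 3]  # codes as they appear on the dendrogram

# Made-up symmetric correlations, rows and columns in eScale order.
corr = [
    [1.0, 0.7, 0.2, 0.6, 0.9],
    [0.7, 1.0, 0.3, 0.5, 0.7],
    [0.2, 0.3, 1.0, 0.2, 0.2],
    [0.6, 0.5, 0.2, 1.0, 0.6],
    [0.9, 0.7, 0.2, 0.6, 1.0],
]

ordered_names, ordered_corr = reorder_matrix(corr, eScale, order)
print(ordered_names)
for name, row in zip(ordered_names, ordered_corr):
    print(name.ljust(16), row)
```

Plain nested lists are used so the sketch has no dependencies; with NumPy the same permutation could be applied by indexing rows and columns with the index list.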

Re: Grabbing the list of elements from CLUSTER DENDROGRAM

Tim Graettinger
In reply to this post by Tim Graettinger
Thanks, Kirill!!  I'll pick it up right away.  Can't wait to try it out.  I'm really excited about it.

Best regards,
-Tim
