Clustered boxplots with GGRAPH

Clustered boxplots with GGRAPH

la volta statistics
I have a dataset with one categorical variable (age_grp) and two scale
variables (r07_sys and Q07_sys).
I would like to produce a boxplot chart with the two scale variables
clustered within each category of the categorical variable.
With the classic EXAMINE syntax, that would look like this:
EXAMINE
  VARIABLES=r07_sys Q07_sys BY Age_grp
  /COMPARE VARIABLE
  /PLOT=BOXPLOT
  /STATISTICS=NONE
  /NOTOTAL
  /MISSING=LISTWISE.

How would I do that using the GGRAPH syntax?
Thanks, Christian
Re: Clustered boxplots with GGRAPH

Peck, Jon
It is not stunningly obvious how to do this, but here is an example of the solution.  There are two key differences from a simple boxplot: the COORD statement and the ELEMENT statement.  If you omit the COORD statement, you will get paneled boxplots with each cluster in a separate panel.  That looks pretty nice, too, depending on what you want to emphasize.

The real magic is in the ELEMENT statement phrase
position(bin.quantile.letter(("Horsepower"*horse+"Miles per Gallon"*mpg)*origin))

That is, each y variable is crossed (*) with a string constant that labels it, the two are blended (+) inside the binning function, and the blend is then crossed with origin to form the clusters.

GGRAPH
  /GRAPHDATASET NAME="graphdataset" VARIABLES=origin mpg horse
   MISSING=LISTWISE REPORTMISSING=NO
  /GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
  SOURCE: s=userSource(id("graphdataset"))
  DATA: origin=col(source(s), name("origin"), unit.category())
  DATA: mpg=col(source(s), name("mpg"))
  DATA: horse=col(source(s), name("horse"))
  DATA: id=col(source(s), name("$CASENUM"), unit.category())
  COORD: rect(dim(1,2), cluster(3))
  SCALE: linear(dim(2), include(0))
  GUIDE: axis(dim(3), label("Country of Origin"))

  ELEMENT: schema(position(bin.quantile.letter(("Horsepower"*horse+"Miles per Gallon"*mpg)*origin)), label(id), color("Horsepower"+"Miles per Gallon"))
END GPL.

If you leave out the color phrase, all the boxplots will be the same color.

If you change the COORD statement to
  COORD: rect(dim(1,2), cluster(3), transpose())
you will get horizontal boxes, which may work better if you have a lot of categories.
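
Applied to the variables in the original question, the same pattern would presumably look like the sketch below. It is untested: it only substitutes age_grp, r07_sys, and Q07_sys into the example above. The axis label "Age group" and the legend strings "r07_sys" and "Q07_sys" are placeholders chosen for illustration, and the names in the DATA statements should match the variable names exactly as they are defined in the dataset.

```spss
* Untested adaptation of the Cars.sav example to the poster's variables.
GGRAPH
  /GRAPHDATASET NAME="graphdataset" VARIABLES=age_grp r07_sys Q07_sys
   MISSING=LISTWISE REPORTMISSING=NO
  /GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
  SOURCE: s=userSource(id("graphdataset"))
  DATA: age_grp=col(source(s), name("age_grp"), unit.category())
  DATA: r07_sys=col(source(s), name("r07_sys"))
  DATA: Q07_sys=col(source(s), name("Q07_sys"))
  DATA: id=col(source(s), name("$CASENUM"), unit.category())
  COMMENT: cluster(3) puts the boxes for each age group side by side
  COORD: rect(dim(1,2), cluster(3))
  SCALE: linear(dim(2), include(0))
  COMMENT: "Age group" is a placeholder axis label
  GUIDE: axis(dim(3), label("Age group"))
  COMMENT: blend the two scale variables, then cross the blend with age_grp
  ELEMENT: schema(position(bin.quantile.letter(("r07_sys"*r07_sys+"Q07_sys"*Q07_sys)*age_grp)), label(id), color("r07_sys"+"Q07_sys"))
END GPL.
```

The SCALE statement is carried over unchanged from the example; drop include(0) if forcing the y axis to start at zero compresses the boxes too much.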

HTH,
Jon Peck (with help from Rick Oswald and ViAnn Beadle)


-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of la volta statistics
Sent: Sunday, January 14, 2007 12:14 PM
To: [hidden email]
Subject: [SPSSX-L] Clustered boxplots with GGRAPH

Re: Clustered boxplots with GGRAPH

la volta statistics
Thanks Jon, Rick, and ViAnn

It works when I omit the COORD statement or change the COORD
statement to:
   COORD: rect(dim(1,2), cluster(0))
resulting in paneled charts.
Unfortunately, I cannot get an unpaneled chart [using: COORD: rect(dim(1,2),
cluster(3))], even when I use your syntax and the Cars.sav file. The message
that appears is:
"The requested chart can not be drawn.
Can not have infinite or NaN for tick segment."

I am using SPSS Version 14.
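
For reference, the variant that does draw here is the posted example with the COORD statement left out, which gives one panel per country of origin. A sketch, assuming the Cars.sav variable names from the example and balanced parentheses in the ELEMENT statement:

```spss
* Paneled variant: same specification, no COORD statement.
GGRAPH
  /GRAPHDATASET NAME="graphdataset" VARIABLES=origin mpg horse
   MISSING=LISTWISE REPORTMISSING=NO
  /GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
  SOURCE: s=userSource(id("graphdataset"))
  DATA: origin=col(source(s), name("origin"), unit.category())
  DATA: mpg=col(source(s), name("mpg"))
  DATA: horse=col(source(s), name("horse"))
  DATA: id=col(source(s), name("$CASENUM"), unit.category())
  COMMENT: no COORD statement, so each origin category is drawn in its own panel
  SCALE: linear(dim(2), include(0))
  GUIDE: axis(dim(3), label("Country of Origin"))
  ELEMENT: schema(position(bin.quantile.letter(("Horsepower"*horse+"Miles per Gallon"*mpg)*origin)), label(id), color("Horsepower"+"Miles per Gallon"))
END GPL.
```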

Christian

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
Peck, Jon
Sent: Monday, January 15, 2007 14:11
To: [hidden email]
Subject: Re: Clustered boxplots with GGRAPH

