I have a dataset with one categorical variable (age_grp) and two scale variables (r07_sys and Q07_sys). I would like to produce a boxplot chart with the two scale variables clustered within each level of the categorical variable. With the classic graph syntax, that would look like this:

  EXAMINE VARIABLES=r07_sys Q07_sys BY Age_grp
    /COMPARE VARIABLE
    /PLOT=BOXPLOT
    /STATISTICS=NONE
    /NOTOTAL
    /MISSING=LISTWISE.

How would I do that using the GGRAPH syntax?

Thanks,
Christian
It is not stunningly obvious how to do this, but here is an example of the solution. There are two key differences from a simple boxplot: the COORD statement and the ELEMENT statement. If you omit the COORD statement, you will get paneled boxplots with each cluster in a separate panel. That looks pretty nice, too, depending on what you want to emphasize.
The real magic is in the ELEMENT statement phrase

  position(bin.quantile.letter(("Horsepower"*horse+"Miles per Gallon"*mpg)*origin))

That is, it blends the y variables inside the binning but keeps the two sets of plots separate.

  GGRAPH
    /GRAPHDATASET NAME="graphdataset" VARIABLES=origin mpg horse
      MISSING=LISTWISE REPORTMISSING=NO
    /GRAPHSPEC SOURCE=INLINE.
  BEGIN GPL
    SOURCE: s=userSource(id("graphdataset"))
    DATA: origin=col(source(s), name("origin"), unit.category())
    DATA: mpg=col(source(s), name("mpg"))
    DATA: horse=col(source(s), name("horse"))
    DATA: id=col(source(s), name("$CASENUM"), unit.category())
    COORD: rect(dim(1,2), cluster(3))
    SCALE: linear(dim(2), include(0))
    GUIDE: axis(dim(3), label("Country of Origin"))
    ELEMENT: schema(position(bin.quantile.letter(("Horsepower"*horse+"Miles per Gallon"*mpg)*origin)), label(id), color("Horsepower"+"Miles per Gallon"))
  END GPL.

If you leave out the color phrase, all the boxplots will be the same color. If you change the COORD statement to

  COORD: rect(dim(1,2), cluster(3), transpose())

you will get horizontal boxes, which may work better if you have a lot of categories.

HTH,
Jon Peck (with help from Rick Oswald and ViAnn Beadle)

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of la volta statistics
Sent: Sunday, January 14, 2007 12:14 PM
To: [hidden email]
Subject: [SPSSX-L] Clustered boxplots with GGRAPH
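Applied to the variables named in the original question, the example above might be adapted along the following lines. This is an untested sketch, not a verified solution: it assumes the variables are stored as age_grp, r07_sys and Q07_sys with exactly that spelling and case, and the axis label "Age group" and the two legend strings are placeholders chosen here for illustration.

```
GGRAPH
  /GRAPHDATASET NAME="graphdataset" VARIABLES=age_grp r07_sys Q07_sys
    MISSING=LISTWISE REPORTMISSING=NO
  /GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
  SOURCE: s=userSource(id("graphdataset"))
  COMMENT: age_grp is the clustering variable, so it is read as categorical
  DATA: age_grp=col(source(s), name("age_grp"), unit.category())
  DATA: r07_sys=col(source(s), name("r07_sys"))
  DATA: Q07_sys=col(source(s), name("Q07_sys"))
  DATA: id=col(source(s), name("$CASENUM"), unit.category())
  COMMENT: cluster(3) groups the boxes by the third dimension, age_grp
  COORD: rect(dim(1,2), cluster(3))
  COMMENT: "Age group" is a placeholder label, not taken from the data
  GUIDE: axis(dim(3), label("Age group"))
  COMMENT: the strings in position() and color() must match each other exactly
  ELEMENT: schema(position(bin.quantile.letter(("r07_sys"*r07_sys+"Q07_sys"*Q07_sys)*age_grp)), label(id), color("r07_sys"+"Q07_sys"))
END GPL.
```

The quoted strings "r07_sys" and "Q07_sys" only name the two box series and can be replaced with more descriptive text, as long as the same strings appear in both the position and color phrases. The SCALE statement from the cars example is left out here so that the y axis is not forced to include zero; add it back if that is wanted.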
Thanks Jon, Rick, and ViAnn
It works when I omit the COORD statement or change it to

  COORD: rect(dim(1,2), cluster(0))

either of which results in paneled charts. Unfortunately, I cannot get an unpaneled chart using

  COORD: rect(dim(1,2), cluster(3))

even when I use your syntax with the Cars.sav file. The message that appears is:

  "The requested chart can not be drawn. Can not have infinite or NaN for tick segment."

I am using SPSS Version 14.

Christian

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Peck, Jon
Sent: Monday, January 15, 2007 14:11
To: [hidden email]
Subject: Re: Clustered boxplots with GGRAPH