SPSSX Discussion

Box plot - arrange by medians

paul wilson-7

Box plot - arrange by medians

I am using a box plot to show variance over 20 different categories of my categorical (factor) variable.

I was wondering if there is an option to sort the categories on the box plot so that categories with similar medians appear next to one another (either descending or ascending order is fine), as opposed to the categories showing up in alphabetical order.

Thanks

ViAnn Beadle

Re: Box plot - arrange by medians

In GPL you can sort a categorical axis by a statistic with the sort.statistic function, which takes a statistic computed by GPL as its argument. The function goes in the SCALE statement for the dimension being sorted.

This example uses the Employee Data.sav sample data file:

GGRAPH
  /GRAPHDATASET NAME="graphdataset" VARIABLES=minority salary MISSING=LISTWISE REPORTMISSING=NO
  /GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
  SOURCE: s=userSource(id("graphdataset"))
  DATA: minority=col(source(s), name("minority"), unit.category())
  DATA: salary=col(source(s), name("salary"))
  DATA: id=col(source(s), name("$CASENUM"), unit.category())
  GUIDE: axis(dim(1), label("Minority Classification"))
  GUIDE: axis(dim(2), label("Current Salary"))
  SCALE: cat(dim(1), sort.statistic(summary.median(salary)))
  SCALE: linear(dim(2), include(0))
  ELEMENT: schema(position(bin.quantile.letter(minority*salary)), label(id))
END GPL.

Notes:

1. The boxplot is drawn using the schema element. The bin.quantile.letter function computes the values needed to draw the boxplot and to label the outliers.

2. dim(1) in the SCALE statement refers to the first dimension, which is the x axis. The id variable is created from the $CASENUM variable to label the outlier points; you could just as well use another identifier variable to label them.
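
For readers who want to see what "sort the axis by a statistic" amounts to, here is a minimal sketch in Python rather than SPSS syntax. It only illustrates the idea of ordering categories by their per-category median. The function name order_by_median and the data are made up for illustration, and the sketch makes no claim about how GPL implements the sort or which direction it uses by default.

```python
from statistics import median


def order_by_median(rows, descending=False):
    """Return category labels ordered by the median of their values.

    rows is an iterable of (category, value) pairs.
    order_by_median is a hypothetical helper, not part of SPSS or GPL.
    """
    # Collect the values belonging to each category.
    groups = {}
    for category, value in rows:
        groups.setdefault(category, []).append(value)

    # Compute one median per category, then sort the labels by it.
    medians = {cat: median(vals) for cat, vals in groups.items()}
    return sorted(medians, key=medians.get, reverse=descending)


# Made-up data; these values are not taken from Employee Data.sav.
rows = [
    ("A", 30), ("A", 50), ("A", 40),
    ("B", 10), ("B", 20),
    ("C", 60), ("C", 70), ("C", 65),
]

print(order_by_median(rows))
print(order_by_median(rows, descending=True))
```

The resulting list of labels is the order in which the boxes would be laid out along the categorical axis, so categories with similar medians end up next to one another.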

From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of paul wilson
Sent: Tuesday, September 08, 2009 12:51 PM
To: [hidden email]
Subject: Box plot - arrange by medians
