
Re: Jittering interval and ordinal variables in graphs. was "What does this graph means?"

Posted by Jon K Peck on Jan 30, 2013; 9:02pm
URL: http://spssx-discussion.165.s1.nabble.com/What-does-this-graph-means-tp5717745p5717837.html



Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
new phone: 720-342-5621




From:        Andy W <[hidden email]>
To:        [hidden email],
Date:        01/30/2013 01:55 PM
Subject:        Re: [SPSSX-L] Jittering interval and ordinal variables in graphs. was "What does this graph means?"
Sent by:        "SPSSX(r) Discussion" <[hidden email]>




Yes, I agree, Ian, that confidence ellipses would be very nice (ditto for
contour plots). It appears error bars (at least for the confidence interval
and for arbitrary numbers of standard errors/deviations) have been available
for a long time via the legacy graphs. Producing prediction intervals takes
a bit more math (but not much), as I demonstrate below.


>>>Error bars of various types are available for quite a few graphics elements in the Chart Builder as well.


Anyway, the GGRAPH language gives you the tools to make your own error
bars and plot them! (This is true to a certain extent with the confidence
ellipses as well, but the pain is much greater.)


>>>There is a custom dialog, Graphs > Scatterplot with Data Ellipse, available from the SPSS Community website.

Below is an example with Bruce's same data. NOTE: you can wrangle the
regression procedures into giving prediction intervals by fitting an empty
model with just an intercept, which is what I do here. This is also a
convenient way to get "non-normal" prediction intervals for other types of
data that aren't normally distributed (e.g., you could get intervals for
count data using generalized linear models, or for proportion data using
logit, etc.). The logic extends to multiple groups as well (just omit the
intercept and include dummy variables for each of the groups).
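For readers outside SPSS, the intercept-only trick boils down to a per-group interval of mean ± multiplier*SD*SQRT(1 + 1/n). Here is a minimal Python sketch of that logic, using the 1.96 normal approximation in place of the exact t quantile, and hypothetical toy group data (not from this thread):

```python
import math
import statistics

def prediction_interval(values, z=1.96):
    """Normal-approximation prediction interval for a single new
    observation from an intercept-only model: mean +/- z*s*sqrt(1 + 1/n)."""
    n = len(values)
    m = statistics.mean(values)
    s = statistics.stdev(values)  # sample SD (n - 1 in the denominator)
    half = z * s * math.sqrt(1 + 1 / n)
    return m - half, m + half

# "Dummy variables per group" amounts to doing this separately per group.
groups = {  # hypothetical toy data for illustration only
    "A": [5.5, 5.2, 5.8, 5.6, 4.6, 5.9],
    "B": [4.7, 5.0, 5.7, 5.2, 5.6, 5.2],
}
for g, vals in groups.items():
    lo, hi = prediction_interval(vals)
    print(g, round(lo, 3), round(hi, 3))
```

The same function covers the single-group case below; GLM-based intervals for counts or proportions would replace the mean/SD with the appropriate link-scale fit.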

**********************************************************************************************.
new file.
dataset close all.
data list free / X (f5.1).
begin data
5.5 5.2 5.2 5.8 5.6 4.6 5.6 5.9 4.7 5 5.7 5.2
end data.

compute cons_break = 1.

REGRESSION
 /MISSING LISTWISE
 /STATISTICS COEFF OUTS R ANOVA
 /CRITERIA=PIN(.05) POUT(.10) CIN(95)
 /ORIGIN
 /DEPENDENT X
 /METHOD=ENTER cons_break
 /SAVE ICIN (CI_REG).
*This produces two variables: "LCI_REG" for the lower bound and "UCI_REG" for the upper.
*The default is 95% prediction intervals.

*Now, once you have the data, you can add it to a chart.
formats X (F3.1).

* Chart Builder.
GGRAPH
 /GRAPHDATASET NAME="graphdataset" VARIABLES=X LCI_REG UCI_REG
  MISSING= LISTWISE REPORTMISSING=NO
 /GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
SOURCE: s=userSource(id("graphdataset"))
DATA: X=col(source(s), name("X"))
DATA: LCI_REG=col(source(s), name("LCI_REG"))
DATA: UCI_REG=col(source(s), name("UCI_REG"))
COORD: rect(dim(1))
GUIDE: axis(dim(1), label("X"))
ELEMENT: point.dodge.symmetric(position(bin.dot(X, binStart(4), binWidth(0.1))),
         color.interior(color.grey), size(size."12"))
ELEMENT: interval(position(region.spread.range(LCI_REG + UCI_REG)),
         shape(shape.ibeam))
ELEMENT: point(position(summary.mean(X)), shape(shape.square),
         color.interior(color.black), size(size."15"))
END GPL.

*Or just solve it yourself and use legacy graphs.
*You want STDDEV = t-value*SQRT(1 + 1/n).
*In this example the t-value is approximately 1.96 (the normal approximation;
the exact t with n - 1 = 11 df is about 2.20) and n = 12,
so STDDEV = 1.96*SQRT(1 + 1/12) = 2.04.

GRAPH
 /ERRORBAR( STDDEV 2.04 )=X BY cons_break .
**********************************************************************************************.
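As a quick sanity check, the 2.04 multiplier and the resulting bounds can be reproduced from the same data in a few lines of Python (a sketch using the 1.96 normal approximation from the comment above and the standard library `statistics` module):

```python
import math
import statistics

# Bruce's data from the DATA LIST above.
x = [5.5, 5.2, 5.2, 5.8, 5.6, 4.6, 5.6, 5.9, 4.7, 5.0, 5.7, 5.2]
n = len(x)
m = statistics.mean(x)
s = statistics.stdev(x)  # sample standard deviation

mult = 1.96 * math.sqrt(1 + 1 / n)  # the STDDEV value fed to GRAPH /ERRORBAR
lo, hi = m - mult * s, m + mult * s
print(round(mult, 2), round(lo, 2), round(hi, 2))  # → 2.04 4.48 6.19
```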

I will update later with examples using multiple groups and error bars (as
the change in the graph algebra necessary may not be totally intuitive), and
with examples using many more points (where jittering is more useful than
dodging the points). Note for prediction intervals the multiplier is
t-value*SQRT(1 + 1/n); as n gets larger, SQRT(1 + 1/n) gets closer to 1, so
prediction intervals end up being essentially 2 standard deviations away
from the mean with 20+ observations.
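That convergence is quick to check numerically. A small Python sketch of the multiplier 1.96*SQRT(1 + 1/n), again using the normal approximation in place of the exact t quantile:

```python
import math

for n in (5, 12, 20, 50, 100):
    mult = 1.96 * math.sqrt(1 + 1 / n)  # SDs from the mean for a 95% PI
    print(n, round(mult, 3))  # shrinks toward 1.96 as n grows
```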

Thanks, Mike, for sharing the follow-up articles to Drummond. It seems
reasonable to me except for the part disagreeing about the use of dynamite
plots:

"Dynamite-plunger plots are never an appropriate way to plot the data. We
agree that bar graphs are inappropriate to display means of repeated
measurements, but they are fine to convey magnitudes of differences in means
in different groups, provided the error bars are standard deviations."

The ignorance of this astounds me. We know from the work of Cleveland that
people evaluate positions along a common axis more accurately than they do
the lengths of bars; we know that we want to convey the uncertainty in the
data via error bars; and we know that using bars obfuscates half of the
error interval (heaven forbid one display an asymmetrical interval). Tukey's
EDA book has a great example of the absurdity of this comment (using bars to
display differences in means).

I can understand that it is not always appropriate to display the individual
points, but to suggest that "dynamite" bar plots are more appropriate than
graphs displaying error intervals is ridiculous.



-----
Andy W
[hidden email]
http://andrewpwheeler.wordpress.com/
--
View this message in context:
http://spssx-discussion.1045642.n5.nabble.com/What-does-this-graph-means-tp5717745p5717836.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD