Posted by Bruce Weaver on Jan 30, 2013; 9:54pm
URL: http://spssx-discussion.165.s1.nabble.com/What-does-this-graph-means-tp5717745p5717841.html
Aha. I was thinking of tackling it that way, but did not realize one could get REGRESSION to run an intercept-only model by including a constant as the lone predictor. Thanks for enlightening me, Andy. That will be useful.
And of course, if you add PRED to the /SAVE line, you can save the mean to your file; and MCIN will give you the usual CI for the mean.
new file.
dataset close all.
data list free / X (f5.1).
begin data
5.5 5.2 5.2 5.8 5.6 4.6 5.6 5.9 4.7 5 5.7 5.2
end data.
compute cons_break = 1.
REGRESSION
/MISSING LISTWISE
/STATISTICS COEFF OUTS R ANOVA
/CRITERIA=PIN(.05) POUT(.10) CIN(95)
/ORIGIN
/DEPENDENT X
/METHOD=ENTER cons_break
/SAVE PRED (Xbar) ICIN (IndPI) MCIN (CImean).
temp.
select if $casenum EQ 1.
list Xbar to UIndPI.
OUTPUT:
Xbar LCImean UCImean LIndPI UIndPI
5.33333 5.06605 5.60062 4.36962 6.29705
* CImean = CI for the population mean.
* IndPI = individual prediction interval, aka the standard reference range.
* L = lower limit of interval; U = upper limit.
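As a quick cross-check, CImean is just the textbook Xbar ± t(n-1)*SD/SQRT(n) interval for the mean, so EXAMINE should report the same 95% CI directly:
EXAMINE VARIABLES=X
 /PLOT NONE
 /STATISTICS DESCRIPTIVES
 /CINTERVAL 95.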
Andy W wrote:
Yes I agree Ian that confidence ellipses would be very nice (ditto for contour plots). It appears error bars (at least for the confidence interval and arbitrary numbers of standard errors/deviations) have been available for a long time via the legacy graphs. Producing prediction intervals takes some more math (but not much), as I demonstrate below.
Anyway, the GGRAPH language gives you the tools to make your own error bars and plot them! (This is true to a certain extent for confidence ellipses too, but the pain is much greater.)
Below is an example with Bruce's same data. NOTE: you can wrangle the regression procedures into giving prediction intervals by fitting an empty model with just an intercept, which is what I do here. This is also a convenient way to get prediction intervals for other types of data that aren't normally distributed (e.g., intervals for count data using generalized linear models, or for proportion data using logit models). The logic extends to multiple groups as well: just omit the intercept and enter a dummy variable for each group (a sketch appears further below).
**********************************************************************************************.
new file.
dataset close all.
data list free / X (f5.1).
begin data
5.5 5.2 5.2 5.8 5.6 4.6 5.6 5.9 4.7 5 5.7 5.2
end data.
compute cons_break = 1.
REGRESSION
/MISSING LISTWISE
/STATISTICS COEFF OUTS R ANOVA
/CRITERIA=PIN(.05) POUT(.10) CIN(95)
/ORIGIN
/DEPENDENT X
/METHOD=ENTER cons_break
/SAVE ICIN (CI_REG).
*This produces two variables: "LCI_REG" for the lower bound and "UCI_REG" for the upper.
*The default is 95% prediction intervals.
*Once you have the data, you can add it to a chart.
formats X (F3.1).
* Chart Builder.
GGRAPH
/GRAPHDATASET NAME="graphdataset" VARIABLES=X LCI_REG UCI_REG
MISSING= LISTWISE REPORTMISSING=NO
/GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
SOURCE: s=userSource(id("graphdataset"))
DATA: X=col(source(s), name("X"))
DATA: LCI_REG=col(source(s), name("LCI_REG"))
DATA: UCI_REG =col(source(s), name("UCI_REG"))
COORD: rect(dim(1))
GUIDE: axis(dim(1), label("X"))
ELEMENT: point.dodge.symmetric(position(bin.dot(X, binStart(4), binWidth(0.1))),
color.interior(color.grey), size(size."12"))
ELEMENT: interval(position(region.spread.range(LCI_REG + UCI_REG )),
shape(shape.ibeam))
ELEMENT: point(position(summary.mean(X)), shape(shape.square),
color.interior(color.black), size(size."15"))
END GPL.
*Or just solve it yourself and use the legacy graphs.
*You will want STDDEV = t-value*SQRT(1 + 1/n).
*Here n = 12, so the 95% t-value with 11 df is 2.201 (1.96 is the large-sample normal approximation).
*So STDDEV = 2.201*SQRT(1 + 1/12) = 2.29, which reproduces the ICIN limits above.
GRAPH
 /ERRORBAR( STDDEV 2.29 )=X BY cons_break .
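* Side note: rather than hand-computing the multiplier, you can have SPSS
  do it with the inverse t distribution function IDF.T(prob, df).
compute tmult = IDF.T(0.975, 11) * SQRT(1 + 1/12).
execute.
* tmult works out to about 2.29 for n = 12, matching the STDDEV value above.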
**********************************************************************************************.
I will update later with examples using multiple groups and error bars (as the change in graph algebra necessary may not be totally intuitive), and with examples using many more points (where jittering is more useful than dodging the points). Note that for prediction intervals the multiplier is t-value*SQRT(1 + 1/n); as n grows, SQRT(1 + 1/n) approaches 1 and the t-value approaches 1.96, so with 20+ observations prediction intervals end up being roughly 2 standard deviations away from the mean.
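In the meantime, here is a minimal sketch of the multiple-groups idea (the two-group split of the same data is made up purely for illustration): enter one dummy per group with /ORIGIN, so each coefficient is that group's mean and ICIN yields per-group prediction intervals.
new file.
dataset close all.
data list free / X (f5.1) grp (f1.0).
begin data
5.5 1 5.2 1 5.2 1 5.8 1 5.6 1 4.6 1
5.6 2 5.9 2 4.7 2 5.0 2 5.7 2 5.2 2
end data.
compute g1 = (grp EQ 1).
compute g2 = (grp EQ 2).
REGRESSION
 /MISSING LISTWISE
 /ORIGIN
 /DEPENDENT X
 /METHOD=ENTER g1 g2
 /SAVE PRED (GrpMean) ICIN (GrpPI).
* GrpMean holds each case's group mean; LGrpPI and UGrpPI hold the
  per-group prediction interval limits.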
Thanks Mike for sharing the follow-up articles to Drummond. It seems reasonable to me, except for the part disagreeing about the use of dynamite plots:
"Dynamite-plunger plots are never an appropriate way to plot the data. We agree that bar graphs are inappropriate to display means of repeated measurements, but they are fine to convey magnitudes of differences in means in different groups, provided the error bars are standard deviations."
The ignorance of this astounds me. We know from Cleveland's work that people judge positions along a common axis more accurately than they judge the lengths of bars; we know that we want to convey the uncertainty in the data via error bars; and we know that using bars obfuscates half of the error interval (heaven forbid one display an asymmetrical interval). Tukey's EDA book has a great example of the absurdity of this position (using bars to display differences in means).
I can understand that it is not always appropriate to display the individual points, but to suggest that "dynamite" bar plots are more appropriate than graphs displaying error intervals is ridiculous.
--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/
"When all else fails, RTFM."
PLEASE NOTE THE FOLLOWING:
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).