
Re: Jittering interval and ordinal variables in graphs. was "What does this graph means?"

Posted by Jon K Peck on Jan 30, 2013; 9:02pm
URL: http://spssx-discussion.165.s1.nabble.com/What-does-this-graph-means-tp5717745p5717837.html



Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
new phone: 720-342-5621




From:        Andy W <[hidden email]>
To:        [hidden email],
Date:        01/30/2013 01:55 PM
Subject:        Re: [SPSSX-L] Jittering interval and ordinal variables in graphs. was "What does this graph means?"
Sent by:        "SPSSX(r) Discussion" <[hidden email]>




Yes, I agree, Ian, that confidence ellipses would be very nice (ditto for
contour plots). It appears error bars (at least for the confidence interval
and for arbitrary numbers of standard errors/deviations) have been available
for a long time via the legacy graphs. Producing prediction intervals takes
a bit more math (but not much), as I demonstrate below.


>>>Error bars of various types are available for quite a few graphics elements in the Chart Builder as well.


Anyway, the GGRAPH language gives you the tools to make your own error
bars and plot them! (This is true to a certain extent with the confidence
ellipses as well, but the pain is much greater.)


>>>There is a custom dialog, Graphs > Scatterplot with Data Ellipse, available from the SPSS Community website.

Below is an example with Bruce's same data. NOTE: you can wrangle the
regression procedures into giving prediction intervals by fitting an empty
model with just an intercept, which is what I do here. This is also a
convenient way to get "non-normal" prediction intervals for other types of
data that aren't normally distributed (e.g., you could get intervals for
count data using generalized linear models, or for proportion data using
logit, etc.). The logic extends to multiple groups as well (just omit the
intercept and include dummy variables for each of the groups).
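For readers outside SPSS, the intercept-only trick boils down to a per-group interval of mean ± multiplier*SD*SQRT(1 + 1/n). Here is a minimal Python sketch of that logic, using the 1.96 normal approximation in place of the exact t quantile, and hypothetical toy group data (not from this thread):

```python
import math
import statistics

def prediction_interval(values, z=1.96):
    """Normal-approximation prediction interval for a single new
    observation from an intercept-only model: mean +/- z*s*sqrt(1 + 1/n)."""
    n = len(values)
    m = statistics.mean(values)
    s = statistics.stdev(values)  # sample SD (n - 1 in the denominator)
    half = z * s * math.sqrt(1 + 1 / n)
    return m - half, m + half

# "Dummy variables per group" amounts to doing this separately per group.
groups = {  # hypothetical toy data for illustration only
    "A": [5.5, 5.2, 5.8, 5.6, 4.6, 5.9],
    "B": [4.7, 5.0, 5.7, 5.2, 5.6, 5.2],
}
for g, vals in groups.items():
    lo, hi = prediction_interval(vals)
    print(g, round(lo, 3), round(hi, 3))
```

The same function covers the single-group case below; GLM-based intervals for counts or proportions would replace the mean/SD with the appropriate link-scale fit.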

**********************************************************************************************.
new file.
dataset close all.
data list free / X (f5.1).
begin data
5.5 5.2 5.2 5.8 5.6 4.6 5.6 5.9 4.7 5 5.7 5.2
end data.

compute cons_break = 1.

REGRESSION
 /MISSING LISTWISE
 /STATISTICS COEFF OUTS R ANOVA
 /CRITERIA=PIN(.05) POUT(.10) CIN(95)
 /ORIGIN
 /DEPENDENT X
 /METHOD=ENTER cons_break
 /SAVE ICIN (CI_REG).
*This produces two variables: "LCI_REG" for the lower bound and "UCI_REG" for the upper.
*The default is 95% prediction intervals.

*Now, once you have the data, you can add it to a chart.
formats X (F3.1).

* Chart Builder.
GGRAPH
 /GRAPHDATASET NAME="graphdataset" VARIABLES=X LCI_REG UCI_REG
  MISSING= LISTWISE REPORTMISSING=NO
 /GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
SOURCE: s=userSource(id("graphdataset"))
DATA: X=col(source(s), name("X"))
DATA: LCI_REG=col(source(s), name("LCI_REG"))
DATA: UCI_REG=col(source(s), name("UCI_REG"))
COORD: rect(dim(1))
GUIDE: axis(dim(1), label("X"))
ELEMENT: point.dodge.symmetric(position(bin.dot(X, binStart(4), binWidth(0.1))),
         color.interior(color.grey), size(size."12"))
ELEMENT: interval(position(region.spread.range(LCI_REG + UCI_REG)),
         shape(shape.ibeam))
ELEMENT: point(position(summary.mean(X)), shape(shape.square),
         color.interior(color.black), size(size."15"))
END GPL.

*Or just solve it yourself and use legacy graphs.
*You want STDDEV = t-value*SQRT(1 + 1/n).
*In this example the t-value is approximately 1.96 (the normal approximation;
the exact t with n - 1 = 11 df is about 2.20) and n = 12,
so STDDEV = 1.96*SQRT(1 + 1/12) = 2.04.

GRAPH
 /ERRORBAR( STDDEV 2.04 )=X BY cons_break .
**********************************************************************************************.
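As a quick sanity check, the 2.04 multiplier and the resulting bounds can be reproduced from the same data in a few lines of Python (a sketch using the 1.96 normal approximation from the comment above and the standard library `statistics` module):

```python
import math
import statistics

# Bruce's data from the DATA LIST above.
x = [5.5, 5.2, 5.2, 5.8, 5.6, 4.6, 5.6, 5.9, 4.7, 5.0, 5.7, 5.2]
n = len(x)
m = statistics.mean(x)
s = statistics.stdev(x)  # sample standard deviation

mult = 1.96 * math.sqrt(1 + 1 / n)  # the STDDEV value fed to GRAPH /ERRORBAR
lo, hi = m - mult * s, m + mult * s
print(round(mult, 2), round(lo, 2), round(hi, 2))  # → 2.04 4.48 6.19
```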

I will update later with examples using multiple groups and error bars (as
the change in the graph algebra necessary may not be totally intuitive), and
with examples using many more points (where jittering is more useful than
dodging the points). Note for prediction intervals the multiplier is
t-value*SQRT(1 + 1/n); as n gets larger, SQRT(1 + 1/n) gets closer to 1, so
prediction intervals end up being essentially 2 standard deviations away
from the mean with 20+ observations.
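That convergence is quick to check numerically. A small Python sketch of the multiplier 1.96*SQRT(1 + 1/n), again using the normal approximation in place of the exact t quantile:

```python
import math

for n in (5, 12, 20, 50, 100):
    mult = 1.96 * math.sqrt(1 + 1 / n)  # SDs from the mean for a 95% PI
    print(n, round(mult, 3))  # shrinks toward 1.96 as n grows
```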

Thanks, Mike, for sharing the follow-up articles to Drummond. It seems
reasonable to me except for the part disagreeing about the use of dynamite
plots:

"Dynamite-plunger plots are never an appropriate way to plot the data. We
agree that bar graphs are inappropriate to display means of repeated
measurements, but they are fine to convey magnitudes of differences in means
in different groups, provided the error bars are standard deviations."

The ignorance of this astounds me. We know from the work of Cleveland that
people evaluate positions along a common axis more accurately than they do
the lengths of bars; we know that we want to convey the uncertainty in the
data via error bars; and we know that using bars obfuscates half of the
error interval (heaven forbid one display an asymmetrical interval). Tukey's
EDA book has a great example of the absurdity of this comment (using bars to
display differences in means).

I can understand that it is not always appropriate to display the individual
points, but to suggest that "dynamite" bar plots are more appropriate than
graphs displaying error intervals is ridiculous.



-----
Andy W
[hidden email]
http://andrewpwheeler.wordpress.com/
--
View this message in context:
http://spssx-discussion.1045642.n5.nabble.com/What-does-this-graph-means-tp5717745p5717836.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD