I would only add that one should probably also look at the following:
An article by Hopkins et al. (2011) that identifies some problems with the Drummond et al. (2011) articles: http://jp.physoc.org/content/589/21/5327.full

And Drummond et al.'s response: http://jp.physoc.org/content/589/21/5331.full

-Mike Palij
New York University
[hidden email]

----- Original Message -----
From: "Bruce Weaver" <[hidden email]>
To: <[hidden email]>
Sent: Wednesday, January 30, 2013 10:26 AM
Subject: Re: Jittering interval and ordinal variables in graphs. was "What does this graph means?"

> Art, this should help in chasing down the other articles in the series:
>
> http://jp.physoc.org/cgi/collection/stats_reporting?page=1
> http://jp.physoc.org/cgi/collection/stats_reporting?page=2
>
> Art Kendall wrote
>> I really like the idea of jittering for interval (e.g., Likert) and
>> ordinal variables, especially since they are often operationalizations
>> of constructs that are continuous.
>>
>> The article says that it is part of a series. I'll try to chase them
>> down when I get some time.
>>
>> Art Kendall
>> Social Research Consultants
>>
>> On 1/29/2013 8:47 PM, Andy W wrote:
>>
>> Ahh! This is the point I was making by advocating the jittered
>> scatterplot: how exactly do you make the box plots for the categories
>> with only 1 and 3 observations? Box plots are certainly reasonable
>> with more data, but with this few points jittered scatterplots work
>> quite well.
>>
>> To each his own whether or not people feel the "artefact" in the
>> jittered plot is unreasonable. IMO it is only as unreasonable as you
>> consider it reasonable to think of the ordinal scores as specific to
>> the particular arbitrary numeric values you assign them to begin with.
>> And it is certainly no more complicated a topic to discuss how to
>> interpret the jittered plot than any correlation coefficient!
>>
>> For reference on synonymous situations, please read:
>>
>> Drummond, Gordon B. & Sarah L. Vowler. 2011. Show the data, don't
>> conceal them <http://dx.doi.org/10.1113/jphysiol.2011.205062>. The
>> Journal of Physiology 589(8): 1861-1863. PDF available from publisher.
>>
>> -----
>> Andy W
>> apwheele@...
>> http://andrewpwheeler.wordpress.com/
>
> -----
> --
> Bruce Weaver
> [hidden email]
> http://sites.google.com/a/lakeheadu.ca/bweaver/
>
> "When all else fails, RTFM."
>
> NOTE: My Hotmail account is not monitored regularly.
> To send me an e-mail, please use the address shown above.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
In reply to this post by Andy W
I find it interesting, and a bit ironic, that in 1995 SYSTAT could jitter overlapping points, draw CIs for the data or for the mean at various percentage levels, and draw bivariate jittered-point confidence ellipses for both the mean and the data.
And then SPSS bought SYSTAT and failed to incorporate all these useful features, despite suggestions to do so.

cheers,
Ian

Ian D. Martin, Ph.D.
Tsuji Laboratory
University of Waterloo
Dept. of Environment & Resource Studies

On Jan 30, 2013, at 10:57 AM, Andy W wrote:

> Unfortunately the authors don't disclose how the plots were made. Given
> the odd arc-like regular displacement, I strongly suspect they used the
> beeswarm R package to create the plots (see
> http://www.cbs.dtu.dk/~eklund/beeswarm/).
>
> This particular displacement of the points is not easily accomplished in
> SPSS, but a suitable alternative is to use symmetric dodging. My example
> could be simplified, but the nuts and bolts of the logic are in a blog
> post titled "Avoid Dynamite Plots! Visualizing dot plots with
> super-imposed confidence intervals in SPSS and R"
> (http://andrewpwheeler.wordpress.com/2012/02/20/avoid-dynamite-plots-visualizing-dot-plots-with-super-imposed-confidence-intervals-in-spss-and-r/).
>
> Thanks for giving the reference to "reference ranges"! It is good to
> know the difference in lingo between fields.
>
> -----
> Andy W
> [hidden email]
> http://andrewpwheeler.wordpress.com/
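Since the jittering idea itself keeps coming up, here is a minimal sketch of it in plain Python (no plotting library; the function name and the 0.2 half-width are my own arbitrary choices, not from beeswarm or any SPSS dialog): tied ordinal scores each get a small uniform random displacement so they no longer overplot.

```python
import random

def jitter(values, halfwidth=0.2, seed=42):
    """Displace each value by uniform noise in [-halfwidth, +halfwidth],
    so tied ordinal scores (e.g. Likert 1-5) no longer overplot."""
    rng = random.Random(seed)
    return [v + rng.uniform(-halfwidth, halfwidth) for v in values]

likert = [3, 3, 3, 4, 4, 5, 1, 2, 2, 3]
jittered = jitter(likert)
# Every jittered point stays within halfwidth of its original category,
# so category membership is still readable from the plot.
print(all(abs(j - v) <= 0.2 for j, v in zip(jittered, likert)))  # True
```

Keeping the half-width well under half the distance between adjacent categories is what preserves the "artefact" Andy mentions as interpretable rather than misleading.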
Yes, I agree, Ian — confidence ellipses would be very nice (ditto for contour plots). It appears error bars (at least for the confidence interval and arbitrary numbers of standard errors/deviations) have been available for a long time via the legacy graphs. Producing prediction intervals would have taken some more math (but not much), as I demonstrate below.
Anyway, the GGRAPH language gives you the tools to make your own error bars and plot them! (This is true to a certain extent with confidence ellipses too, but the pains are much greater.) Below is an example with Bruce's same data.

NOTE: you can wrangle the regression procedures to give prediction intervals using an empty model with just an intercept, which is what I do here. This is a convenient way to get "non-normal" prediction intervals for other types of data that aren't normally distributed (e.g. you could get intervals for count data using generalized linear models, or proportion data using logit, etc.). The logic extends to multiple groups as well (just omit the intercept and have dummy variables for each of the groups).

**********************************************************************************************.
new file.
dataset close all.
data list free / X (f5.1).
begin data
5.5 5.2 5.2 5.8 5.6 4.6 5.6 5.9 4.7 5 5.7 5.2
end data.
compute cons_break = 1.

REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS R ANOVA
  /CRITERIA=PIN(.05) POUT(.10) CIN(95)
  /ORIGIN
  /DEPENDENT X
  /METHOD=ENTER cons_break
  /SAVE ICIN (CI_REG).

*This produces two variables: "LCI_REG" for the lower bound and "UCI_REG" for the upper.
*Default is 95% prediction intervals.
*Now once you have the data you can add it to a chart.
formats X (F2.1).

* Chart Builder.
GGRAPH
  /GRAPHDATASET NAME="graphdataset" VARIABLES=X LCI_REG UCI_REG
    MISSING=LISTWISE REPORTMISSING=NO
  /GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
 SOURCE: s=userSource(id("graphdataset"))
 DATA: X=col(source(s), name("X"))
 DATA: LCI_REG=col(source(s), name("LCI_REG"))
 DATA: UCI_REG=col(source(s), name("UCI_REG"))
 COORD: rect(dim(1))
 GUIDE: axis(dim(1), label("X"))
 ELEMENT: point.dodge.symmetric(position(bin.dot(X, binStart(4), binWidth(0.1))),
   color.interior(color.grey), size(size."12"))
 ELEMENT: interval(position(region.spread.range(LCI_REG + UCI_REG)), shape(shape.ibeam))
 ELEMENT: point(position(summary.mean(X)), shape(shape.square),
   color.interior(color.black), size(size."15"))
END GPL.

*Or just solve it yourself and use legacy graphs.
*You will want STDEV = t-value*SQRT(1 + 1/n).
*So in this example t-value = 1.96, n = 12, so STDEV = 1.96*SQRT(1 + 1/12) = 2.04.
GRAPH
  /ERRORBAR( STDDEV 2.04 )=X BY cons_break.
**********************************************************************************************.

I will update later with examples using multiple groups and error bars (as the change in graph algebra necessary may not be totally intuitive), and with examples with many more points (where jittering is more useful than dodging the points). Note for prediction intervals the multiplier is t-value*SQRT(1 + 1/n); as n gets higher, SQRT(1 + 1/n) gets closer to 1, so prediction intervals essentially end up being about two standard deviations away from the mean with 20+ observations.

Thanks, Mike, for sharing the follow-up articles to Drummond. It seems reasonable to me except the part disagreeing about the use of dynamite plots:

"Dynamite-plunger plots are never an appropriate way to plot the data. We agree that bar graphs are inappropriate to display means of repeated measurements, but they are fine to convey magnitudes of differences in means in different groups, provided the error bars are standard deviations."

The ignorance of this astounds me.
So we know from the work of Cleveland that people evaluate points along a common axis more accurately than they do the lengths of bars; we know that we want to convey the uncertainty in the data via error bars; and we know that using bars obfuscates half of the error interval (heaven forbid one display an asymmetrical interval). Tukey's EDA book has a great example of the absurdity of this comment (using bars to display differences in means). I can understand that it is not always appropriate to display the individual points, but to suggest that "dynamite" bar plots are more appropriate than graphs displaying error intervals is ridiculous.
Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
new phone: 720-342-5621

From: Andy W <[hidden email]>
Date: 01/30/2013 01:55 PM
Subject: Re: [SPSSX-L] Jittering interval and ordinal variables in graphs. was "What does this graph means?"

> It appears error bars (at least for the confidence interval and
> arbitrary numbers of standard errors/deviations) have been available
> for a long time via the legacy graphs.

>>> Error bars of various types are available for quite a few graphics
elements in the Chart Builder as well.

> Anyway the GGRAPH language gives you the tools to make your own error
> bars and plot them! (This is true to a certain extent with the
> confidence ellipses, but the pains are much greater).

>>> There is a custom dialog, Graphs > Scatterplot with Data Ellipse,
available from the SPSS Community website.
Ahh! I made a mistake in my error bar intervals — I assumed the t-critical value was 1.96, but it is not with so few cases; it is actually 2.201 (as Bruce's post shows earlier), and so in my example you should go out 2.201*SQRT(1 + 1/12) = 2.29 standard deviations. Still, the logic is the same: you could just figure out the multiplicative factor yourself and then use that in the legacy dialogs or the GGRAPH chart builder.
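The corrected multiplicative factor is easy to verify by hand. A quick sketch (Python here just for the arithmetic; the 2.201 critical value is t with 11 df, taken from Bruce's earlier post rather than computed — a stats library's t quantile function would supply it):

```python
import math

# 95% prediction-interval multiplier for n = 12:
# t-critical with n - 1 = 11 df, times sqrt(1 + 1/n).
n = 12
t_crit = 2.201  # from the t distribution table, not 1.96
multiplier = t_crit * math.sqrt(1 + 1 / n)
print(round(multiplier, 2))  # 2.29
```

This is the value to plug into the legacy ERRORBAR dialog in place of the 2.04 used earlier.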
Re: the confidence ellipse R plug-in — yes, I have used it and it is nice. I would like a native SPSS application though, with data you can plot in an actual SPSS chart (I'm not asking you to do this, Jon; it is simply on my long academic bucket list that, who knows, I may one day have time for!). A very cool recent article demonstrating the utility of confidence ellipses (especially for mixed models) can be found in Friendly et al. (2013). I've dug into the source for the ellipse function in R, and I will tell you, it is a sight of complexity to behold. Here are a few simpler examples I think I can wrap my head around and mimic in SPSS: http://stats.stackexchange.com/a/9900/1036. This seems doable with the current constraints (submit data to a matrix procedure, get out a series of x,y points to draw in GGRAPH). Contour plots, though, would be much more difficult given the current constraints of GGRAPH as far as I can tell.
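For anyone who wants to try the matrix-procedure route, the simpler approach in the linked Stack Exchange answer can be sketched without any matrix library at all, since a 2x2 eigendecomposition has a closed form. This is my own illustration in Python (the 5.991 constant is the 95% chi-square critical value with 2 df, hard-coded rather than computed; function and variable names are mine); the same loop should translate to SPSS MATRIX, with the x,y pairs fed to GGRAPH as a path element.

```python
import math

def ellipse_points(mean, cov, chi2_crit=5.991, n_points=100):
    """Points on the 95% data ellipse of a bivariate normal.
    cov is a 2x2 symmetric matrix [[sxx, sxy], [sxy, syy]];
    chi2_crit is the chi-square(2 df) critical value (5.991 for 95%)."""
    (sxx, sxy), (_, syy) = cov
    # Eigenvalues of a symmetric 2x2 matrix, in closed form.
    tr = sxx + syy
    gap = math.sqrt(max((sxx - syy) ** 2 + 4 * sxy * sxy, 0.0))
    l1, l2 = (tr + gap) / 2, (tr - gap) / 2
    # Rotation angle of the major axis (eigenvector of l1).
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    # Semi-axis lengths, scaled by the chi-square critical value.
    a, b = math.sqrt(chi2_crit * l1), math.sqrt(chi2_crit * l2)
    pts = []
    for i in range(n_points):
        t = 2 * math.pi * i / n_points
        x, y = a * math.cos(t), b * math.sin(t)
        # Rotate by theta and shift to the mean.
        pts.append((mean[0] + x * math.cos(theta) - y * math.sin(theta),
                    mean[1] + x * math.sin(theta) + y * math.cos(theta)))
    return pts

mean, cov = (1.0, -2.0), [[2.0, 0.8], [0.8, 1.0]]
pts = ellipse_points(mean, cov)
print(len(pts))  # 100
```

Every returned point sits at constant Mahalanobis distance chi2_crit from the mean, which is exactly the defining property of the data ellipse.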
In reply to this post by Andy W
Aha. I was thinking of tackling it that way, but did not realize one could get REGRESSION to run an intercept-only model by including a constant as the lone predictor. Thanks for enlightening me, Andy. That will be useful.
And of course, if you add PRED to the /SAVE line, you can save the mean to your file; and MCIN will give you the usual CI for the mean.

new file.
dataset close all.
data list free / X (f5.1).
begin data
5.5 5.2 5.2 5.8 5.6 4.6 5.6 5.9 4.7 5 5.7 5.2
end data.
compute cons_break = 1.

REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS R ANOVA
  /CRITERIA=PIN(.05) POUT(.10) CIN(95)
  /ORIGIN
  /DEPENDENT X
  /METHOD=ENTER cons_break
  /SAVE PRED (Xbar) ICIN (IndPI) MCIN (CImean).
temp.
select if $casenum EQ 1.
list Xbar to UIndPI.

OUTPUT:
    Xbar  LCImean  UCImean   LIndPI   UIndPI
 5.33333  5.06605  5.60062  4.36962  6.29705

* CImean = CI for the population mean.
* IndPI  = individual prediction interval, aka the standard reference range.
* L = lower limit of interval; U = upper limit.
--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING:
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).
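For anyone wanting to check Bruce's output outside SPSS, the same intervals fall out of the textbook formulas. A sketch in Python (the 2.200985 critical value is t with 11 df at 95% two-sided, hard-coded here; a stats library's t quantile function would supply it):

```python
import math
import statistics

# Bruce's data, n = 12.
x = [5.5, 5.2, 5.2, 5.8, 5.6, 4.6, 5.6, 5.9, 4.7, 5.0, 5.7, 5.2]
n = len(x)
xbar = statistics.mean(x)       # should match Xbar = 5.33333
s = statistics.stdev(x)         # sample SD
t_crit = 2.200985               # t with n - 1 = 11 df, 95% two-sided

# CI for the mean: xbar +/- t * s / sqrt(n)
# (should match LCImean/UCImean = 5.06605 / 5.60062).
ci = t_crit * s / math.sqrt(n)
# Individual prediction interval: xbar +/- t * s * sqrt(1 + 1/n)
# (should match LIndPI/UIndPI = 4.36962 / 6.29705).
pi = t_crit * s * math.sqrt(1 + 1 / n)

print(xbar, xbar - ci, xbar + ci, xbar - pi, xbar + pi)
```

The only difference between the two intervals is the sqrt(1/n) versus sqrt(1 + 1/n) factor, which is why the prediction interval is so much wider with small n.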