Dear all,
I would like to find the correlation between age and forest conservation satisfaction score. According to the guideline, I need to do a preliminary check, which is a scatterplot. However, when I did so, my graph turned out as in the attached image. I need to know what this graph means, because it doesn't look like any of the graphs shown in the guideline. Thank you very much in advance for the help. |
Presumably your forest conservation scores are the discrete values 1-5, while your ages are in years. If so, this plot is perfectly reasonable for the data provided. You would get something looking more like a conventional scatter diagram by compressing the y axis to perhaps a third or a fifth of its height, but it would still be essentially the same thing. Scatter diagrams are more often used with two continuous scale variables. Had you scored forest conservation on ten or twenty values, the scatter diagram would have looked more conventional, though I wouldn't recommend it, as people have difficulty properly categorising into that many values.
|
In reply to this post by makan
In addition to Robert's fine advice, a simple way to improve the readability of the scatter plot is to use jittering, example below with similar data (I even believe newer SPSS versions have the ability to do this jittering in the post-hoc chart editor).
*******************************.
set seed = 10.
/* making data that I think looks like yours */.
input program.
loop #i = 1 to 100.
compute cat = TRUNC(RV.UNIFORM(1,6)).
compute cont = RV.NORM(50,15).
end case.
end loop.
end file.
end input program.
dataset name sim.
execute.
GGRAPH
  /GRAPHDATASET NAME="graphdataset" VARIABLES= cat cont MISSING=LISTWISE REPORTMISSING=NO
  /GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
 SOURCE: s=userSource(id("graphdataset"))
 DATA: cont=col(source(s), name("cont"))
 DATA: cat=col(source(s), name("cat"), unit.category())
 GUIDE: axis(dim(2), label("Continuous Variable"))
 GUIDE: axis(dim(1), label("Categorical Variable"))
 ELEMENT: point.jitter.conditional(position(cat*cont), size(size."5"))
END GPL.
*******************************.
This is a perfect example for jittering too. Summaries by group would potentially hide the fact that only 1 and 3 observations exist for the 1 and 5 categories of forest conservation. There is also some evidence of bimodality in the 2 value of forest conservation that wouldn't be captured in typical summaries (though with such small numbers you certainly want to take that with a grain of salt). Overall there appears to be little difference between the groups (all have a central tendency between 40 and 60), and no obvious linear relationship between forest conservation and exact age (assuming forest conservation is an ordinal variable). |
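For readers unfamiliar with the idea, jittering simply adds a small random offset to each discrete score before plotting, so that overlapping points separate. The sketch below shows this in Python rather than SPSS syntax; the function name `jitter`, the `width` of 0.2, and the example scores are assumptions made for illustration, and the algorithm behind SPSS's `point.jitter.conditional` may well differ.

```python
import random


def jitter(categories, width=0.2, seed=10):
    """Shift each category code by uniform noise in [-width, +width].

    Hypothetical helper for illustration only; it is not the SPSS algorithm.
    """
    rng = random.Random(seed)  # fixed seed so the plot is reproducible
    return [c + rng.uniform(-width, width) for c in categories]


# Made-up scores on the 1-5 scale (not the original poster's data).
scores = [1, 2, 2, 3, 5]
jittered = jitter(scores)

for raw, shown in zip(scores, jittered):
    print(raw, round(shown, 3))
```

Keeping `width` well below 0.5 means a jittered point can never be mistaken for a neighbouring category, which is the main thing to explain to an audience seeing such a plot for the first time.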
Thanks Andy for the advice. I need to ask again: since you said that there is no linear relationship between the two variables, does that mean I can't proceed with the correlation analysis? One of the assumptions to check before proceeding is a linear relationship, besides a random sample, independent observations and a bivariate normal distribution. Since the linearity assumption is not met, does it mean I need to use Spearman's rho (rank correlation)?
P.S. By the way, I can't see the graph that you sent me. |
In reply to this post by Robert Jones
Thanks Robert. So, is it okay for me to present the graph to my audience? I am sure they will ask me why my graph looks that way. Or should I just remove it from my slides?
|
In reply to this post by makan
I provided example code to demonstrate jittering; I have no saved graph because you are supposed to apply it to your own data. If you want to see the example graph it produces, simply copy and paste the syntax provided and run it (it should work in any version with GGRAPH, which I believe is 15+).
Assuming forest conservation is ordinal, it would have been reasonable at the outset to consider only rank-order correlations, without even looking at the scatterplot (I typically use Tau-b, but I'm sure others on here could give more advice about that aspect). When I said it does not look linear, I perhaps should have said flat (i.e. it doesn't even look monotonic). This doesn't preclude you from estimating a correlation, though: plots can be deceiving, and small effects are typically not noticeable. |
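To make the rank-order suggestion concrete, here is a small sketch of Kendall's Tau-b computed directly from pairwise comparisons, with the usual correction for ties in either variable. It is written in Python rather than SPSS syntax; the function name `kendall_tau_b` and the ages and scores are made up for illustration and are not the poster's data.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's Tau-b: (concordant - discordant) pairs, corrected for ties."""
    concordant = discordant = tied_x = tied_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0:
            tied_x += 1  # pair tied on the first variable
        if dy == 0:
            tied_y += 1  # pair tied on the second variable
        if dx * dy > 0:
            concordant += 1
        elif dx * dy < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    denom = sqrt((n_pairs - tied_x) * (n_pairs - tied_y))
    if denom == 0:
        raise ValueError("Tau-b is undefined when a variable is constant")
    return (concordant - discordant) / denom


# Hypothetical ages (years) and ordinal satisfaction scores (1-5).
ages = [23, 31, 38, 44, 52, 60, 67]
scores = [3, 2, 4, 3, 2, 4, 3]
print(round(kendall_tau_b(ages, scores), 3))
```

With only five score categories there will be many tied pairs, which is exactly the situation the tie correction in Tau-b is meant for.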
In reply to this post by makan
It's really up to you, if you think the information is of relevance to your audience. I would be inclined to present it as it is, with Andy's comments, though I wouldn't do the jittering, as it would just be a confusing artefact in this particular case.
|
From inspection of the supplied plot, it seems to me that a very informative presentation would be a parallel boxplot of AGE grouped by Forest Conservation Score.
... Mark Miller
On Tue, Jan 29, 2013 at 8:57 AM, Robert Jones <[hidden email]> wrote:
> It's really up to you, if you think the information is of relevance to your [...] |
Ahh! This is the point I was making by advocating the jittered scatterplot: how exactly do you make the box-plots for the categories with only 1 and 3 observations? Box-plots are certainly reasonable with more data, but with this few points jittered scatterplots work quite well.
To each his own whether or not people feel the "artefact" in the jittered plot is unreasonable. IMO it is only as unreasonable as it is reasonable to think of the ordinal scores as tied to the particular arbitrary numeric values you assigned them to begin with. And interpreting the jittered plot is certainly no more complicated a topic than interpreting any correlation coefficient! For a reference on similar situations, please read Drummond, Gordon B. & Sarah L. Vowler. 2011. Show the data, don't conceal them <http://dx.doi.org/10.1113/jphysiol.2011.205062>. The Journal of Physiology 589(8): 1861-1863. PDF available from publisher. |
I really like the idea of jittering for interval (e.g., Likert) and ordinal variables, especially since they are often operationalizations of constructs that are continuous.
The article says that it is part of a series. I'll try to chase the others down when I get some time.
On 1/29/2013 8:47 PM, Andy W wrote:
> [...] Drummond, Gordon B. & Sarah L. Vowler. 2011. Show the data, don't conceal them <http://dx.doi.org/10.1113/jphysiol.2011.205062>. The Journal of Physiology 589(8): 1861-1863. PDF available from publisher.
> -----
> Andy W
> [hidden email]
> http://andrewpwheeler.wordpress.com/
> --
> View this message in context: http://spssx-discussion.1045642.n5.nabble.com/What-does-this-graph-means-tp5717745p5717798.html
> Sent from the SPSSX Discussion mailing list archive at Nabble.com.
=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L. For a list of commands to manage subscriptions, send the command INFO REFCARD.
Art Kendall
Social Research Consultants |
Administrator
|
That's a nice little article. It took me a while to work out what the authors meant by "95% CI of population" in Fig 2. In the text, they say:
"To define reference ranges a very large sample of normal values is needed. Then the 95% CI is chosen as the reference range. This implies, by definition, that those 5% of ‘normal values’ that lie outside the range would be considered abnormal. We can then say that a new value drawn from this population would be likely, 95% of the time, to have a value within these confidence limits." This sounds to me like an "individual prediction interval" in SPSS lingo, whereas the CI for the mean would be a "mean prediction interval".
--
Bruce Weaver bweaver@lakeheadu.ca http://sites.google.com/a/lakeheadu.ca/bweaver/ "When all else fails, RTFM." PLEASE NOTE THE FOLLOWING: 1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above. 2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/). |
Yeah, that is a bit of a loaded statement. I've heard "prediction intervals" and "tolerance intervals" used more frequently to distinguish between the two concepts (see Wikipedia for tolerance intervals), and those two are themselves different things (Bruce mentions a prediction interval).
I would probably just avoid using the word "confidence" there to avoid potential confusion. But the main point of the article is simply to be clear about which error bars you are using (a point certainly applicable to this situation as well). I would guess the article is using prediction intervals, but, going against their own advice, it is not so clear! |
Administrator
|
In reply to this post by Bruce Weaver
What those authors call the "95% CI of the population" is also known as the "standard reference range".
http://en.wikipedia.org/wiki/Reference_range
Notice that the SQRT((n+1)/n) in the formula for the limits of the reference range could also be written as SQRT(1 + 1/n). Here is syntax to compute the limits of the standard reference range for the example given on the Wikipedia page. It also computes the 95% CI for the mean, illustrating that the only difference in the formulae for the two is the presence/absence of that 1 under the square root sign.
new file.
dataset close all.
data list free / X (f5.1).
begin data
5.5 5.2 5.2 5.8 5.6 4.6 5.6 5.9 4.7 5 5.7 5.2
end data.
descriptives X.
AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /BREAK=
  /meanX=MEAN(X)
  /sdX=SD(X)
  /nX=N.
compute alpha = .05.
compute tcrit = idf.t(1-alpha/2,nX-1).
compute LL_CImean = meanX - tcrit*SQRT(1/nX)*sdX.
compute UL_CImean = meanX + tcrit*SQRT(1/nX)*sdX.
compute LL_IndPI = meanX - tcrit*SQRT(1+1/nX)*sdX.
compute UL_IndPI = meanX + tcrit*SQRT(1+1/nX)*sdX.
formats alpha to UL_IndPI (f5.3).
temporary.
select if $casenum EQ 1.
list alpha to UL_IndPI.
* Use EXAMINE to check that 95% CI for the mean is correct.
EXAMINE VARIABLES=X
  /PLOT NONE
  /STATISTICS DESCRIPTIVES
  /CINTERVAL 95
  /MISSING LISTWISE
  /NOTOTAL.
OUTPUT from LIST:
alpha  tcrit  LL_CImean  UL_CImean  LL_IndPI  UL_IndPI
 .050  2.201      5.066      5.601     4.370     6.297
Wikipedia reported the reference range limits as 4.4 and 6.3. HTH.
--
Bruce Weaver |
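As a cross-check of the arithmetic in the reference-range post above, the same two intervals can be reproduced outside SPSS. The sketch below uses Python's standard library, which has no t quantile function, so the critical value 2.201 (11 degrees of freedom) is copied from the LIST output rather than computed; the helper `limits` and the variable names are assumptions made for illustration.

```python
from math import sqrt
from statistics import mean, stdev

# The twelve values from the Wikipedia reference-range example.
x = [5.5, 5.2, 5.2, 5.8, 5.6, 4.6, 5.6, 5.9, 4.7, 5.0, 5.7, 5.2]
n = len(x)
m, s = mean(x), stdev(x)  # stdev() is the sample SD (n - 1 denominator)

# Two-sided 95% critical t for n - 1 = 11 df, taken from the LIST output.
t_crit = 2.201


def limits(center, half_width):
    """Return (lower, upper) limits around a centre."""
    return center - half_width, center + half_width


# CI for the mean: SD scaled by sqrt(1/n).
ci_mean = limits(m, t_crit * s * sqrt(1 / n))
# Individual prediction interval (reference range): sqrt(1 + 1/n).
ind_pi = limits(m, t_crit * s * sqrt(1 + 1 / n))

print("CI for mean:", [round(v, 3) for v in ci_mean])
print("Individual PI:", [round(v, 3) for v in ind_pi])
```

The only line that differs between the two intervals is the term under the square root, which is the point made in the post.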
Administrator
|
In reply to this post by Andy W
I had not seen this message when I made my post about the standard reference range. Will have to look at that page on tolerance intervals.
Any thoughts on how to make SPSS produce plots like those in Fig 2 of the Drummond & Vowler article? Cheers, Bruce
--
Bruce Weaver |
Administrator
|
In reply to this post by Art Kendall
Art, this should help in chasing down the other articles in the series:
http://jp.physoc.org/cgi/collection/stats_reporting?page=1
http://jp.physoc.org/cgi/collection/stats_reporting?page=2
--
Bruce Weaver |
These look to be very interesting. I have saved those links.
I often find it worthwhile to see how basics are presented in fields I have never dealt with before.
On 1/30/2013 10:26 AM, Bruce Weaver wrote:
> Art, this should help in chasing down the other articles in the series: [...]
Art Kendall
Social Research Consultants |
In reply to this post by Bruce Weaver
Unfortunately the authors don't disclose how the plots were made. Given the odd arc-like regular displacement I highly suspect they used the beeswarm R package to create the plots (see http://www.cbs.dtu.dk/~eklund/beeswarm/).
This particular displacement of the points is not easily accomplished in SPSS, but a suitable alternative is to use symmetric dodging. My example could be simplified, but I present the nuts and bolts of the logic in a blog post titled "Avoid Dynamite Plots! Visualizing dot plots with super-imposed confidence intervals in SPSS and R" (http://andrewpwheeler.wordpress.com/2012/02/20/avoid-dynamite-plots-visualizing-dot-plots-with-super-imposed-confidence-intervals-in-spss-and-r/). Thanks for giving the reference to "reference ranges"! It is good to know the difference in lingo between fields. |
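One way to read "symmetric dodging" is that points falling in the same bin within a category are spread sideways in equal steps centred on the category position, instead of at random as with jittering. The sketch below illustrates that general idea in Python; it is not the logic from the blog post or SPSS's own implementation, and the names `symmetric_offsets`, `dodge`, `bin_width` and `step` are hypothetical.

```python
from collections import defaultdict


def symmetric_offsets(count, step=0.1):
    """Offsets centred on zero for `count` points sharing a bin."""
    return [(i - (count - 1) / 2) * step for i in range(count)]


def dodge(points, bin_width=1.0, step=0.1):
    """Map (category, value) pairs to (x, value) with symmetric dodging.

    Points in the same category whose values fall in the same bin are
    placed side by side around the category's x position.
    """
    stacks = defaultdict(list)
    for idx, (cat, val) in enumerate(points):
        stacks[(cat, round(val / bin_width))].append(idx)

    xs = [0.0] * len(points)
    for (cat, _), members in stacks.items():
        for idx, off in zip(members, symmetric_offsets(len(members), step)):
            xs[idx] = cat + off
    return [(x, val) for x, (_, val) in zip(xs, points)]


# Made-up (score, age) pairs; two ages in category 1 share a bin.
points = [(1, 40.2), (1, 40.4), (1, 55.0), (2, 40.0)]
for x, val in dodge(points):
    print(round(x, 2), val)
```

Unlike jittering, the result is deterministic, so the same data always produce the same picture; the trade-off is that the binning step slightly groups values that are close but not equal.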
In reply to this post by Bruce Weaver
Hi Bruce
On 30/01/2013 16:05, Bruce Weaver wrote:
> I had not seen this message when I made my post about the standard reference
> range. Will have to look at that page on tolerance intervals.
>
> Any thoughts on how to make SPSS produce plots like those in Fig 2 of the
> Drummond & Vowler article?
Would you like an Excel-sheet based solution? (Caaaaarefully coded by me; don't start throwing stones, please, at least not yet. Wait until you take a look at it.)
Best regards,
Marta GG |
Administrator
|
Hola Marta. I hope you are well. Yes please, I'd like to see your caaaaarefully coded Excel solution. Please use my Lakehead U e-mail (in sig file below) if sending it via e-mail.
Cheers, Bruce
--
Bruce Weaver bweaver@lakeheadu.ca http://sites.google.com/a/lakeheadu.ca/bweaver/ "When all else fails, RTFM." PLEASE NOTE THE FOLLOWING: 1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above. 2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/). |