Hello,

I discovered some useful GPL commands today in the SPSS help menu tutorial (search term 'Jittered Categorical Scatterplot (GPL)') that enable me to construct a categorical scatterplot in SPSS using jittering. While it was useful to see the original labels for the categories appearing on the axes in the correct order, I did not find that the jittering effect enhanced the scatterplot. One reason for this is that all I really need is a little vertical jittering to highlight where, for any one category on the horizontal axis, there are multiple data points which would otherwise fully overlap.

Can anyone kindly suggest how the GPL syntax

ELEMENT: point.jitter(position(jobcat*gender))

could be adapted to ensure that jittering is only vertical? (Here "jobcat" and "gender" are the names of the variables represented by the horizontal and vertical axes, respectively, of the scatterplot.) I would be keen to preserve as much of the original syntax as possible from the example provided under the above search term, while ideally finding a suitable alternative to the above line of syntax for jittering.

Thanks in advance.

Best wishes,
Margaret
If you only want to jitter up and down, you simply need to use "point.jitter.conditional" in place of "point.jitter" in the ELEMENT line, e.g.

ELEMENT: point.jitter.conditional(position(jobcat*gender))

If you want to jitter in the X axis direction it is a much bigger pain in the butt: you need to write the chart algebra the opposite way round from how you want it and then transpose the axes. Example below of each.

*****************************************************.
SET SEED 10.
INPUT PROGRAM.
LOOP #i = 1 TO 120.
COMPUTE x = TRUNC((#i-1)/6).
COMPUTE y = x*0.9 + RV.NORMAL(0,SQRT(0.1)).
END CASE.
END LOOP.
END FILE.
END INPUT PROGRAM.
DATASET NAME JittCat.
RANK VARIABLES = y /TIES = CONDENSE /NTILES(5) INTO YCat.

*Only on Y axis.
GGRAPH
  /GRAPHDATASET NAME="graphdataset" VARIABLES=x YCat MISSING=LISTWISE REPORTMISSING=NO
  /GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
  SOURCE: s=userSource(id("graphdataset"))
  DATA: x=col(source(s), name("x"))
  DATA: YCat=col(source(s), name("YCat"), unit.category())
  GUIDE: axis(dim(1), label("X Continuous"))
  GUIDE: axis(dim(2), label("Y Category"))
  ELEMENT: point.jitter.conditional(position(x*YCat))
END GPL.

*This does not work like you want.
GGRAPH
  /GRAPHDATASET NAME="graphdataset" VARIABLES=x YCat MISSING=LISTWISE REPORTMISSING=NO
  /GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
  SOURCE: s=userSource(id("graphdataset"))
  DATA: x=col(source(s), name("x"))
  DATA: YCat=col(source(s), name("YCat"), unit.category())
  GUIDE: axis(dim(2), label("X Continuous"))
  GUIDE: axis(dim(1), label("Y Category"))
  ELEMENT: point.jitter.conditional(position(YCat*x))
END GPL.

*So you need to transpose the coordinate system.
GGRAPH
  /GRAPHDATASET NAME="graphdataset" VARIABLES=x YCat MISSING=LISTWISE REPORTMISSING=NO
  /GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
  SOURCE: s=userSource(id("graphdataset"))
  DATA: x=col(source(s), name("x"))
  DATA: YCat=col(source(s), name("YCat"), unit.category())
  COORD: rect(dim(1,2), transpose())
  GUIDE: axis(dim(1), label("X Continuous"))
  GUIDE: axis(dim(2), label("Y Category"))
  ELEMENT: point.jitter.conditional(position(x*YCat))
END GPL.
*****************************************************.
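Applied to the variables named in the original question, the vertical-only variant might look roughly like the sketch below. It assumes that jobcat and gender are both in the active dataset and are both treated as categorical. The axis labels are placeholders, and the GGRAPH lines around the ELEMENT statement are modeled on the examples above rather than copied from the help tutorial.

```spss
*Sketch: vertical-only jitter for the variables named in the question.
*Assumes jobcat and gender exist in the active dataset (labels are placeholders).
GGRAPH
  /GRAPHDATASET NAME="graphdataset" VARIABLES=jobcat gender MISSING=LISTWISE REPORTMISSING=NO
  /GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
  SOURCE: s=userSource(id("graphdataset"))
  DATA: jobcat=col(source(s), name("jobcat"), unit.category())
  DATA: gender=col(source(s), name("gender"), unit.category())
  GUIDE: axis(dim(1), label("Job category"))
  GUIDE: axis(dim(2), label("Gender"))
  ELEMENT: point.jitter.conditional(position(jobcat*gender))
END GPL.
```

Only the ELEMENT line differs from a plain point.jitter chart, so the rest of an existing specification can stay as it is.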
Thank you, Andy.

You have kindly provided the correct coding for the ELEMENT line in reply to my request. However, I can see that the resulting graph is still not clear for the reader to interpret. One problem is that points are offset from their vertical-axis categories even where they have no duplicates, which leads to confusion. This problem is compounded by the fact that, by its very nature, jittering generates random (and hence inconsistent) distances between the points, so that some distances are too large for the reader to decide which category on the y-axis a jittered point belongs to.

Can you kindly suggest an alternative approach which offsets points vertically only when it is called for (precisely where points are identical), but removes the inconsistency in distances between points that is characteristic of jittering? Again, I would like to preserve as much of the original syntax as possible and therefore, if possible, re-edit the line of syntax I quoted in my original request. Having duplicate points slightly overlapping is, from my perspective, more helpful than viewing them too far apart to recognize that they are intended to represent the same point, as is the case with my current version of the graph.

Best wishes,
Margaret

On Monday, 14 March 2016, 19:38, Andy W <[hidden email]> wrote:
I think the jittering is OK for some graphs, as long as you are clear in the notes of the graph. I like it when both axes are categorical, or for very dense point clouds with a lot of overlap.
But another way is to dodge the points when they perfectly overlap, which works OK for smaller datasets. To do this, though, you need to bin your continuous axis at some level to define the stacking. E.g. two points could have different x values, say 1.001 and 1.002, but their points in the actual graph will still overlap. See here for an example, https://andrewpwheeler.wordpress.com/2013/03/06/some-random-spss-graph-tips-shading-areas-under-curves-and-using-dodging-in-binned-dot-plots/, and below is this dodging applied to the fake data example I just gave.

********************************************************.
*dodging instead of jittering - need to bin the X axis at some level.
GGRAPH
  /GRAPHDATASET NAME="graphdataset" VARIABLES=x YCat MISSING=LISTWISE REPORTMISSING=NO
  /GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
  SOURCE: s=userSource(id("graphdataset"))
  DATA: x=col(source(s), name("x"))
  DATA: YCat=col(source(s), name("YCat"), unit.category())
  GUIDE: axis(dim(1), label("X Continuous"))
  GUIDE: axis(dim(2), label("Y Category"))
  ELEMENT: point.dodge.symmetric(position(bin.rect(x*YCat, dim(2), binWidth(0.05))), size(size."6"))
END GPL.
********************************************************.

There are a few other ways to deal with overlapping or nearly overlapping points. One is to use transparency judiciously, so points that repeatedly overlap will be darker. You can't estimate exact numbers, but it works well for large N, see https://andrewpwheeler.wordpress.com/2012/06/17/visualization-techniques-for-large-n-scatterplots-in-spss/. Here it is with the same example data.

********************************************************.
*Transparency.
GGRAPH
  /GRAPHDATASET NAME="graphdataset" VARIABLES=x YCat MISSING=LISTWISE REPORTMISSING=NO
  /GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
  SOURCE: s=userSource(id("graphdataset"))
  DATA: x=col(source(s), name("x"))
  DATA: YCat=col(source(s), name("YCat"), unit.category())
  GUIDE: axis(dim(1), label("X Continuous"))
  GUIDE: axis(dim(2), label("Y Category"))
  ELEMENT: point(position(x*YCat), color.interior(color.black), transparency.interior(transparency."0.8"), transparency.exterior(transparency."0.8"), size(size."9"))
END GPL.
********************************************************.

Other ways are to aggregate the points that overlap exactly and then visualize them with different glyphs. Sunflower plots, http://sas-and-r.blogspot.com/2011/03/example-832-histdata-package-sunflower.html?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+SASandR+%28SAS+and+R%29, are one example. Another approach, which I prefer, is making the size of the point bigger if it represents a larger number of points, see https://andrewpwheeler.wordpress.com/2013/04/25/fluctuation-diagrams-in-spss/ for an example.
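A minimal sketch of that last idea (point size proportional to the number of coincident points), using the same fake data, could look like the following. The dataset name AggPts, the variable name NPoints, the legend label and the pixel sizes are all made up for illustration; it is one way to do the aggregation, not the code from the linked post.

```spss
*Sketch: count exactly overlapping points, then map the count to point size.
*AggPts and NPoints are hypothetical names, the size range is an arbitrary choice.
DATASET ACTIVATE JittCat.
DATASET DECLARE AggPts.
AGGREGATE
  /OUTFILE='AggPts'
  /BREAK=x YCat
  /NPoints=N.
DATASET ACTIVATE AggPts.
GGRAPH
  /GRAPHDATASET NAME="graphdataset" VARIABLES=x YCat NPoints MISSING=LISTWISE REPORTMISSING=NO
  /GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
  SOURCE: s=userSource(id("graphdataset"))
  DATA: x=col(source(s), name("x"))
  DATA: YCat=col(source(s), name("YCat"), unit.category())
  DATA: NPoints=col(source(s), name("NPoints"))
  GUIDE: axis(dim(1), label("X Continuous"))
  GUIDE: axis(dim(2), label("Y Category"))
  GUIDE: legend(aesthetic(aesthetic.size), label("Number of points"))
  SCALE: linear(aesthetic(aesthetic.size), aestheticMinimum(size."6px"), aestheticMaximum(size."18px"))
  ELEMENT: point(position(x*YCat), size(NPoints))
END GPL.
```

Each distinct x and YCat combination becomes a single row, so every point stays exactly on its category line and the legend tells the reader how many cases it stands for.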
Many thanks, Andy, for your most helpful reply.

I have slightly adjusted the code you provided for dodging to accommodate the fact that my x-axis variable is categorical. I have in turn found that the resultant graph appears to meet the needs of the reader. Being able to count the number of identical points at any one location often makes sense where this number and the total number of points plotted are modest.

Best wishes,
Margaret

On Wednesday, 16 March 2016, 12:24, Andy W <[hidden email]> wrote: