Data-dependent Reference line in GPL

Kirill Orlov
How can we use GPL to draw a reference line at a data-dependent statistic, say, at the mean of the plotted data values?

Re: Data-dependent Reference line in GPL

Bruce Weaver
Administrator
Hello Kirill.  In what kind of graph?  Does this help at all?  

https://www.ibm.com/support/pages/how-add-line-grand-mean-profile-plot-group-means


--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING: 
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).

Re: Data-dependent Reference line in GPL

Andy W
I don't use the mailing list (my email summaries said there was some back and forth there), so I'll just answer on Nabble.

Here is how I typically do this.

*****************************************.
DATA LIST FREE / X Y.
BEGIN DATA
1 1
2 0
3 1
4 0
END DATA.
DATASET NAME Sim.
EXECUTE.

AGGREGATE OUTFILE=* MODE=ADDVARIABLES
   /BREAK
   /MX = MEAN(X)
   /MY = MEAN(Y).
EXECUTE.

*Example vertical line red.
*Example horizontal line blue.
GGRAPH
  /GRAPHDATASET NAME="graphdataset" VARIABLES=X Y MX MY
  /GRAPHSPEC SOURCE=INLINE
  /FITLINE TOTAL=NO SUBGROUP=NO.
BEGIN GPL
  SOURCE: s=userSource(id("graphdataset"))
  DATA: X=col(source(s), name("X"))
  DATA: Y=col(source(s), name("Y"))
  DATA: MX=col(source(s), name("MX"))
  DATA: MY=col(source(s), name("MY"))
  GUIDE: axis(dim(1), label("X"))
  GUIDE: axis(dim(2), label("Y"))
  ELEMENT: point(position(X*Y), size(size."12"))
  ELEMENT: edge(position(region.spread.range(MX*Y)),color(color.red))
  ELEMENT: line(position(X*MY),color(color.blue))
END GPL.
*****************************************.

This differs somewhat from using GUIDE (e.g. form.line), which extends the line across 100% of the graph area. If you want this approach to behave like GUIDE, you need to set the SCALE min/max explicitly and then have the line's data extend beyond those limits. (But often a line that spans just the range of the data is OK for my graphs.)
Andy W
apwheele@gmail.com
http://andrewpwheeler.wordpress.com/

Re: Data-dependent Reference line in GPL

Andy W
Also perhaps of interest, if you want to draw whole polygons instead of reference lines (and of course you can draw the polygons as lines as well): https://andrewpwheeler.com/2013/04/03/some-notes-on-single-line-charts-in-spss/

Re: Data-dependent Reference line in GPL

Kirill Orlov
In reply to this post by Andy W
Andy, thank you for the suggestion. It works. But could we do it without resorting to AGGREGATE?
What I don't understand is why requesting the MEAN() summary directly in GGRAPH does not work the same way. Example:

DATA LIST FREE / X Y.
BEGIN DATA
1 1
2 0
3 1
4 0
1 2
2 0
1 1
2 0
3 2
8 3
END DATA.
DATASET NAME Sim.
EXECUTE.
descr X Y.

*Your recipe. Works.
AGGREGATE OUTFILE=* MODE=ADDVARIABLES
   /MX = MEAN(X)
   /MY = MEAN(Y).
GGRAPH
  /GRAPHDATASET NAME="graphdataset" VARIABLES= X Y MX MY
  /GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
  SOURCE: s=userSource(id("graphdataset"))
  DATA: X=col(source(s), name("X"))
  DATA: Y=col(source(s), name("Y"))
  DATA: MX=col(source(s), name("MX"))
  DATA: MY=col(source(s), name("MY"))
  GUIDE: axis(dim(1), label("X"))
  GUIDE: axis(dim(2), label("Y"))
  ELEMENT: point(position(X*Y), size(size."12"))
  ELEMENT: line(position(X*MY),color(color.blue))
  ELEMENT: line(position(MX*Y),color(color.red))
END GPL.

*Seemingly the same (?) without AGGREGATE. But does not work.
GGRAPH
  /GRAPHDATASET NAME="graphdataset" VARIABLES= X Y MEAN(X)[name="MX"] MEAN(Y)[name="MY"]
  /GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
  SOURCE: s=userSource(id("graphdataset"))
  DATA: X=col(source(s), name("X"))
  DATA: Y=col(source(s), name("Y"))
  DATA: MX=col(source(s), name("MX"))
  DATA: MY=col(source(s), name("MY"))
  GUIDE: axis(dim(1), label("X"))
  GUIDE: axis(dim(2), label("Y"))
  ELEMENT: point(position(X*Y), size(size."12"))
  ELEMENT: line(position(X*MY),color(color.blue))
  ELEMENT: line(position(MX*Y),color(color.red))
END GPL.

Can you (or Jon, Bruce, or anybody) comment on this?

Re: Data-dependent Reference line in GPL

Andy W
When you use the aggregation functions on GRAPHDATASET, it treats all of the non-aggregated variables as break variables (so it turns them into categories). If you want to do it that way, you can generate a second graphdataset. Note that when you do this, you cannot mix variables from the two datasets (e.g. you cannot use (X*MeanY) in the graph algebra).

***************************************************.
GGRAPH
  /GRAPHDATASET NAME="g1" VARIABLES= X Y
  /GRAPHDATASET NAME="g2" VARIABLES= MEAN(X)[name="MeanX"] MEAN(Y)[name="MeanY"]
    MINIMUM(Y)[name="MinY"] MAXIMUM(Y)[name="MaxY"]
    MINIMUM(X)[name="MinX"] MAXIMUM(X)[name="MaxX"]
  /GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
  SOURCE: g1=userSource(id("g1"))
  DATA: X=col(source(g1), name("X"))
  DATA: Y=col(source(g1), name("Y"))
  SOURCE: g2=userSource(id("g2"))
  DATA: MeanX=col(source(g2), name("MeanX"))
  DATA: MeanY=col(source(g2), name("MeanY"))
  DATA: MinY=col(source(g2), name("MinY"))
  DATA: MinX=col(source(g2), name("MinX"))
  DATA: MaxY=col(source(g2), name("MaxY"))
  DATA: MaxX=col(source(g2), name("MaxX"))
  GUIDE: axis(dim(1), label("X"))
  GUIDE: axis(dim(2), label("Y"))
  ELEMENT: point(position(X*Y), size(size."12"))
  ELEMENT: edge(position(region.spread.range(MeanX*(MinY + MaxY))),color(color.red))
  ELEMENT: line(position((MinX + MaxX)*MeanY),color(color.blue))
END GPL.
***************************************************.

I prefer just adding data to the dataset, since, as you can see, this becomes a bit more verbose. But horses for courses. (I know Jon showed a way to use smooth.mean to generate the horizontal line in some email somewhere, but most of the time I want to generate vertical lines, and I am not sure how to do that using the summary. functions.)
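
To make the break-variable behaviour described at the top of this post concrete, here is a minimal Python sketch (plain Python, not SPSS code). It assumes that the aggregation acts like a group-by on the non-aggregated variables X and Y, as described above; the function names are hypothetical and the data are the ten cases from Kirill's example. Under that assumption, the mean of X within each (X, Y) cell is just X itself, so the aggregated column cannot serve as a single reference value.

```python
from collections import defaultdict
from statistics import mean

# The ten (X, Y) cases from Kirill's example earlier in the thread.
cases = [(1, 1), (2, 0), (3, 1), (4, 0), (1, 2),
         (2, 0), (1, 1), (2, 0), (3, 2), (8, 3)]


def grand_mean_x(rows):
    """Mean of X over all cases: the value a reference line needs."""
    return mean(x for x, _ in rows)


def mean_x_by_break(rows):
    """Mean of X within each (X, Y) cell.

    Assumption: this mimics an aggregation in which X and Y act as
    break variables, so each cell holds cases with identical X.
    """
    groups = defaultdict(list)
    for x, y in rows:
        groups[(x, y)].append(x)
    return {key: mean(xs) for key, xs in groups.items()}


if __name__ == "__main__":
    print("grand mean of X:", grand_mean_x(cases))
    for key, value in sorted(mean_x_by_break(cases).items()):
        print("cell", key, "-> mean of X in cell:", value)
```

In this sketch every per-cell mean equals the cell's own X, which is consistent with the second GGRAPH in Kirill's post not producing a line at the overall mean.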

Re: Data-dependent Reference line in GPL

Kirill Orlov
Ah, I see, Andy. That is very instructive. Thank you for the explanation.

Re: Data-dependent Reference line in GPL

Kirill Orlov
In reply to this post by Bruce Weaver
Bruce, thank you. The case at that link is not quite my case. But on closer inspection, it follows the same style of solution that Andy initially proposed in this thread.

Re: Data-dependent Reference line in GPL

Kirill Orlov
In reply to this post by Kirill Orlov
A solution found by Jon Peck (via the SPSSX-L mailing list):
Jon's comment: "This seems incredibly arcane, but it seems to give the right answer.  Isn't GPL wonderful?"

GGRAPH
  /GRAPHDATASET NAME="graphdataset" VARIABLES= X Y
  /GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
  SOURCE: s=userSource(id("graphdataset"))
  DATA: X=col(source(s), name("X"))
  DATA: Y=col(source(s), name("Y"))
  GUIDE: axis(dim(1), label("X"))
  GUIDE: axis(dim(2), label("Y"))
  ELEMENT: point(position(X*Y))
  ELEMENT: line(position(smooth.mean.uniform(X*Y)))
END GPL.

This solution is very concise and thus wonderful, but it seems less general than the "add data" approach shown above by Andy. In particular, it is so far unclear (to me) whether this solution can be adapted to produce a vertical reference line too. (Andy's remark:
"I prefer just adding data to the dataset, as you can see this becomes
a bit more verbose. But horses for courses. (I know Jon showed a way to use smooth.mean
to generate the horizontal line, but most of the time I want to generate vertical lines,
not sure how to do that using "summary." functions.)")

------------------

So,
JON, ANDY - THANK YOU very much for your answers!
Bruce - thanks for the link, too.

It would be nice if SPSS improved its reference line options in GPL, for example by making it possible to use a summary function (returning a scalar value) directly in the GUIDE: form.line(position(?, ?)) statement.

Re: Data-dependent Reference line in GPL

jkpeck
I have posted a small Python function, with a usage example, for a chart with a vertical or horizontal reference line at the mean. It is on my OneDrive site here:
https://1drv.ms/u/s!AoWcE61g_FAdisVPvH2Kdqmm77vNEg?e=8npaBg

Usage is explained in the comments.  You run the first block of code once in a session.  Then the usage is via a small BEGIN PROGRAM python3 block that provides the specific GGRAPH/GPL code and the name of the variable whose mean is needed.  You include a GUIDE form.line statement like this in the GPL

GUIDE: form.line(position({mean}, *))

and call the function like this

ggraph("V1", cmd)

specifying the reference variable name in quotes and the GPL command.
Let me know of any issues.
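
The posted function itself is at the link above and is not reproduced here. The sketch below only illustrates the placeholder idea from the usage description: compute a mean, then substitute it for {mean} in the GPL text before the syntax is run. The helper name fill_mean_template and its arguments are hypothetical, and it works on a plain list of values rather than reading an SPSS variable; inside SPSS the resulting text would presumably be passed to spss.Submit.

```python
from statistics import mean


def fill_mean_template(values, cmd):
    """Return cmd with each {mean} placeholder replaced by the mean of values.

    Hypothetical helper for illustration only; it is not the function
    from the post. str.replace is used instead of str.format so that
    any other braces in the syntax are left untouched.
    """
    return cmd.replace("{mean}", str(float(mean(values))))


if __name__ == "__main__":
    guide = "GUIDE: form.line(position({mean}, *))"
    # Prints the GUIDE statement with a literal number in place of {mean}.
    print(fill_mean_template([1, 2, 3, 4], guide))
```

With the placeholder in the first position, as here, the constant is on the X dimension, which gives a vertical line; placing it in the second position, position(*, {mean}), would give a horizontal one.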