SPSSX Discussion

Data-dependent Reference line in GPL

Kirill Orlov

Oct 16, 2021; 1:39pm

Data-dependent Reference line in GPL

How do we plot via GPL a reference line at some data-dependent statistic value, say, at the mean of the data values plotted?

Bruce Weaver

Oct 16, 2021; 8:46pm

Re: Data-dependent Reference line in GPL

Administrator

Hello Kirill. In what kind of graph? Does this help at all?

https://www.ibm.com/support/pages/how-add-line-grand-mean-profile-plot-group-means

Kirill Orlov wrote

How do we plot via GPL a reference line at some data-dependent statistic value, say, at the mean of the data values plotted?

--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING:
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).

Andy W

Oct 17, 2021; 1:41pm

Re: Data-dependent Reference line in GPL

I don't use the mailing list (my email summaries said there was some back and forth there), so I'll just answer on Nabble.

Here is how I do this typically.

*****************************************.
DATA LIST FREE / X Y.
BEGIN DATA
1 1
2 0
3 1
4 0
END DATA.
DATASET NAME Sim.
EXECUTE.

AGGREGATE OUTFILE=* MODE=ADDVARIABLES
/BREAK
/MX = MEAN(X)
/MY = MEAN(Y).
EXECUTE.

*Example vertical line red.
*Example horizontal line blue.
GGRAPH
/GRAPHDATASET NAME="graphdataset" VARIABLES=X Y MX MY
/GRAPHSPEC SOURCE=INLINE
/FITLINE TOTAL=NO SUBGROUP=NO.
BEGIN GPL
SOURCE: s=userSource(id("graphdataset"))
DATA: X=col(source(s), name("X"))
DATA: Y=col(source(s), name("Y"))
DATA: MX=col(source(s), name("MX"))
DATA: MY=col(source(s), name("MY"))
GUIDE: axis(dim(1), label("X"))
GUIDE: axis(dim(2), label("Y"))
ELEMENT: point(position(X*Y), size(size."12"))
ELEMENT: edge(position(region.spread.range(MX*Y)),color(color.red))
ELEMENT: line(position(X*MY),color(color.blue))
END GPL.
*****************************************.

This is somewhat different from using GUIDE, which extends the line across 100% of the graph area. If you want this to behave like GUIDE, you need to set the SCALE min/max explicitly and then have the reference-line data extend beyond those limits. (But a line that only spans the range of the data is often OK for my graphs.)
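
To make the "data length" point concrete, here is a minimal sketch (not part of the original post) that works out, in plain Python rather than SPSS, the endpoints the two reference elements above end up drawing for the four example cases. The variable names `horizontal` and `vertical` are just illustrative.

```python
from statistics import mean

# The four cases from the DATA LIST example above.
X = [1, 2, 3, 4]
Y = [1, 0, 1, 0]

# What AGGREGATE adds to every case as MX and MY.
mx, my = mean(X), mean(Y)

# line(position(X*MY)) joins the points (x, MY), so it only runs
# from the smallest to the largest observed X.
horizontal = ((min(X), my), (max(X), my))

# edge(position(region.spread.range(MX*Y))) spans the observed range of Y at x = MX.
vertical = ((mx, min(Y)), (mx, max(Y)))

print(horizontal)  # ((1, 0.5), (4, 0.5))
print(vertical)    # ((2.5, 0), (2.5, 1))
```

Both segments stop at the data extremes, which is why they do not reach the edges of the frame the way a GUIDE line does.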

Andy W
apwheele@gmail.com
http://andrewpwheeler.wordpress.com/

Andy W

Oct 17, 2021; 1:44pm

Re: Data-dependent Reference line in GPL

Also perhaps of interest if you want to draw whole polygons instead of reference lines (and of course you can do the polygons as a line as well), https://andrewpwheeler.com/2013/04/03/some-notes-on-single-line-charts-in-spss/

Andy W
apwheele@gmail.com
http://andrewpwheeler.wordpress.com/

Kirill Orlov

Oct 18, 2021; 12:08pm

Re: Data-dependent Reference line in GPL

In reply to this post by Andy W

Andy, thank you for the suggestion. It works. But could we do it without resorting to AGGREGATE?
What I don't understand is why requesting the MEAN() summary directly in GGRAPH does not work the same way. Example:

DATA LIST FREE / X Y.
BEGIN DATA
1 1
2 0
3 1
4 0
1 2
2 0
1 1
2 0
3 2
8 3
END DATA.
DATASET NAME Sim.
EXECUTE.
DESCRIPTIVES VARIABLES=X Y.

*Your recipe. Works.
AGGREGATE OUTFILE=* MODE=ADDVARIABLES
/MX = MEAN(X)
/MY = MEAN(Y).
GGRAPH
/GRAPHDATASET NAME="graphdataset" VARIABLES= X Y MX MY
/GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
SOURCE: s=userSource(id("graphdataset"))
DATA: X=col(source(s), name("X"))
DATA: Y=col(source(s), name("Y"))
DATA: MX=col(source(s), name("MX"))
DATA: MY=col(source(s), name("MY"))
GUIDE: axis(dim(1), label("X"))
GUIDE: axis(dim(2), label("Y"))
ELEMENT: point(position(X*Y), size(size."12"))
ELEMENT: line(position(X*MY),color(color.blue))
ELEMENT: line(position(MX*Y),color(color.red))
END GPL.

*Seemingly the same (?) without AGGREGATE. But does not work.
GGRAPH
/GRAPHDATASET NAME="graphdataset" VARIABLES= X Y MEAN(X)[name="MX"] MEAN(Y)[name="MY"]
/GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
SOURCE: s=userSource(id("graphdataset"))
DATA: X=col(source(s), name("X"))
DATA: Y=col(source(s), name("Y"))
DATA: MX=col(source(s), name("MX"))
DATA: MY=col(source(s), name("MY"))
GUIDE: axis(dim(1), label("X"))
GUIDE: axis(dim(2), label("Y"))
ELEMENT: point(position(X*Y), size(size."12"))
ELEMENT: line(position(X*MY),color(color.blue))
ELEMENT: line(position(MX*Y),color(color.red))
END GPL.

Can you (or Jon, Bruce, or anybody) comment on this?

Andy W

Oct 18, 2021; 12:28pm

Re: Data-dependent Reference line in GPL

When you use the aggregation functions on GRAPHDATASET, it treats all of the non-aggregated variables as break variables (so it turns them into categories). If you want to do it that way, you can generate a second graph dataset. Note that when you do this, you cannot mix variables from the two datasets (e.g. you cannot use (X*MeanY) in the graph algebra).

***************************************************.
GGRAPH
/GRAPHDATASET NAME="g1" VARIABLES= X Y
/GRAPHDATASET NAME="g2" VARIABLES= MEAN(X)[name="MeanX"] MEAN(Y)[name="MeanY"]
MINIMUM(Y)[name="MinY"] MAXIMUM(Y)[name="MaxY"]
MINIMUM(X)[name="MinX"] MAXIMUM(X)[name="MaxX"]
/GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
SOURCE: g1=userSource(id("g1"))
DATA: X=col(source(g1), name("X"))
DATA: Y=col(source(g1), name("Y"))
SOURCE: g2=userSource(id("g2"))
DATA: MeanX=col(source(g2), name("MeanX"))
DATA: MeanY=col(source(g2), name("MeanY"))
DATA: MinY=col(source(g2), name("MinY"))
DATA: MinX=col(source(g2), name("MinX"))
DATA: MaxY=col(source(g2), name("MaxY"))
DATA: MaxX=col(source(g2), name("MaxX"))
GUIDE: axis(dim(1), label("X"))
GUIDE: axis(dim(2), label("Y"))
ELEMENT: point(position(X*Y), size(size."12"))
ELEMENT: edge(position(region.spread.range(MeanX*(MinY + MaxY))),color(color.red))
ELEMENT: line(position((MinX + MaxX)*MeanY),color(color.blue))
END GPL.
***************************************************.

I prefer just adding data to the dataset, as you can see this becomes a bit more verbose. But horses for courses. (I know Jon showed a way to use smooth.mean to generate the horizontal line in some email somewhere, but most of the time I want to generate vertical lines, not sure how to do that using summary. functions.)
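
The break-variable behaviour described at the top of this post can be mimicked outside SPSS. The following is only a sketch in plain Python, assuming that GRAPHDATASET groups by every non-aggregated variable as stated above; `aggregate_mean_x` is a hypothetical helper, not an SPSS function.

```python
from collections import defaultdict
from statistics import mean

# The ten cases from Kirill's example.
rows = [(1, 1), (2, 0), (3, 1), (4, 0), (1, 2),
        (2, 0), (1, 1), (2, 0), (3, 2), (8, 3)]

def aggregate_mean_x(rows, break_on):
    """Mean of X within each group defined by the key function break_on."""
    groups = defaultdict(list)
    for x, y in rows:
        groups[break_on(x, y)].append(x)
    return {key: mean(vals) for key, vals in groups.items()}

# No break variables (like AGGREGATE with an empty BREAK): one grand mean.
grand = aggregate_mean_x(rows, lambda x, y: ())

# X and Y acting as break variables: one "mean" per distinct (X, Y) pair,
# and within such a group the mean of X is simply X itself.
per_pair = aggregate_mean_x(rows, lambda x, y: (x, y))

print(grand)          # {(): 2.7}
print(len(per_pair))  # 7
```

With X and Y as break variables, no single grand mean is ever produced, which is consistent with the single-dataset GGRAPH attempt not drawing the expected reference lines.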

Andy W
apwheele@gmail.com
http://andrewpwheeler.wordpress.com/

Kirill Orlov

Oct 18, 2021; 12:42pm

Re: Data-dependent Reference line in GPL

Ah, I see, Andy. That is very instructive. Thank you for the explanation.

Kirill Orlov

Oct 18, 2021; 1:12pm

Re: Data-dependent Reference line in GPL

In reply to this post by Bruce Weaver

Bruce, thank you. The case under the link is not quite my case. But on closer inspection, it follows the same style of solution that Andy initially proposed in this thread.

Kirill Orlov

Oct 18, 2021; 2:09pm

Re: Data-dependent Reference line in GPL

In reply to this post by Kirill Orlov

A solution found by Jon Peck (via the SPSSX-L mailing list):
Jon's comment: "This seems incredibly arcane, but it seems to give the right answer. Isn't GPL wonderful?"

GGRAPH
/GRAPHDATASET NAME="graphdataset" VARIABLES= X Y
/GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
SOURCE: s=userSource(id("graphdataset"))
DATA: X=col(source(s), name("X"))
DATA: Y=col(source(s), name("Y"))
GUIDE: axis(dim(1), label("X"))
GUIDE: axis(dim(2), label("Y"))
ELEMENT: point(position(X*Y))
ELEMENT: line(position(smooth.mean.uniform(X*Y)))
END GPL.

This solution is very concise and thus wonderful, but it seems less general than the "add data" approach shown above by Andy. In particular, it is so far unclear (to me) whether this solution can be adapted to produce a vertical reference line too. (Andy's remark: "I prefer just adding data to the dataset, as you can see this becomes a bit more verbose. But horses for courses. (I know Jon showed a way to use smooth.mean to generate the horizontal line, but most of the time I want to generate vertical lines, not sure how to do that using summary. functions.)")

------------------

So,
JON, ANDY - THANK YOU very much for your answers!
Bruce - thanks for the link, too.

It would be nice if SPSS improved its reference line options in GPL, for example by making it possible to use a summary function (returning a scalar value) right in the GUIDE: form.line(position(?, ?)) statement.

jkpeck

Oct 18, 2021; 5:22pm

Re: Data-dependent Reference line in GPL

This post was updated on Oct 19, 2021; 10:00pm.

I have posted a small Python function, with a usage example, for a chart with a vertical or horizontal reference line at the mean. It is on my OneDrive site here:
https://1drv.ms/u/s!AoWcE61g_FAdisVPvH2Kdqmm77vNEg?e=8npaBg

Usage is explained in the comments. You run the first block of code once in a session. Then the usage is via a small BEGIN PROGRAM python3 block that provides the specific GGRAPH/GPL code and the name of the variable whose mean is needed. You include a GUIDE form.line statement like this in the GPL
GUIDE: form.line(position({mean}, *))
and call the function like this
ggraph("V1", cmd)
specifying the reference variable name in quotes and the GPL command.
Let me know of any issues.
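
For readers who cannot open the link, the placeholder idea can be sketched as follows. This is not the posted function, only a guess at the general technique: `fill_mean` is a hypothetical helper, and the values are passed in as a plain list so the sketch runs outside SPSS.

```python
from statistics import mean

def fill_mean(gpl_template, values):
    """Return gpl_template with the literal text {mean} replaced by the mean of values."""
    # str.replace is used rather than str.format so that any other
    # braces in the GGRAPH/GPL text are left untouched.
    return gpl_template.replace("{mean}", repr(float(mean(values))))

# Hypothetical usage with the X values from Kirill's example earlier in the thread.
x_values = [1, 2, 3, 4, 1, 2, 1, 2, 3, 8]
guide = "GUIDE: form.line(position({mean}, *))"
print(fill_mean(guide, x_values))  # GUIDE: form.line(position(2.7, *))
```

Inside a BEGIN PROGRAM block, the values would presumably be read from the active dataset and the completed command passed to something like spss.Submit; that part is left out here because it depends on the SPSS Python plug-in.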