Lines on scatterplot

John F Hall
This is not for analysis, but for a revised version of my SPSS tutorial 4.5.1 Graphic teaching aid for regression and correlation [http://surveyresearch.weebly.com/uploads/2/9/9/8/2998485/4.5.1_graphic_teaching_aid_for_regression_and_correlation.pdf]. The tutorial was written using artificial data, but I now want to produce similar charts on real data, particularly the regression lines of X on Y and Y on X, plus an overlay of both regression lines to show that Pearson's r is the cosine of the angle between the two.

With these data from 21 countries in Europe, plus UK and USA (N=23), V1 = Homicide rate; V2 = Gini coefficient:

V1    V2
1.0   34.94
0.7   30.25
1.7   28.53
1.6   33.68
1.0   26.63
0.8   29.02
2.2   27.74
1.3   33.78
1.0   31.14
1.6   34.48
1.1   32.30
2.0   42.78
0.9   34.41
0.9   28.73
0.6   25.86
1.1   33.25
1.2   35.84
1.6   27.32
0.9   35.79
1.0   26.81
0.7   32.72
1.2   34.81
4.8   40.46

and this syntax:

DATASET ACTIVATE DataSet0.
STATS REGRESS PLOT YVARS=homicide XVARS=Gini
  /OPTIONS CATEGORICAL=BARS GROUP=1 BOXPLOTS INDENT=15 YSCALE=75
  /FITLINES APPLYTO=TOTAL.

I can produce the following chart:
<http://spssx-discussion.1045642.n5.nabble.com/file/t27438/Scatterplot.jpg>

I can then copy the chart to Word, Right-click >> Wrap text >> Behind, and manually add:
1) a horizontal line through mean y, using repeated underscores (____)
2) a vertical line through mean x, using repeated rows of "¦"

Can SPSS produce a chart with
a) a vertical line through mean x
b) a horizontal line through mean y
(preferably both together)?

After that, can I get a chart with:
a) horizontal lines from each data point to mean x and
b) vertical lines from each data point to mean y?

Sent from the SPSSX Discussion mailing list archive at Nabble.com.
===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD

Re: Lines on scatterplot

PRogman
It can, but you have to supply the mean of each variable yourself (at least, I
cannot calculate them inside GPL). Because the means are passed in as
variables, each mean line only spans the minimum to the maximum of the data.
Otherwise the lines have to be hardcoded in GPL (with GUIDE: form.line(...)),
or added manually (possibly via a template). To adapt the code you only have
to change the variable names in the GGRAPH VARIABLES section. The variables
are renamed there to minimize changes in the GPL code if other variables are
used. The only changes needed in the GPL code itself are the axis titles,
found in the GUIDE: commands in the Main Graph section.
HTH, PR


DATASET CLOSE ALL.
PRESERVE.
SET DECIMAL=DOT.

DATA LIST free /
Homicide (F8.1) Gini (F8.2).
BEGIN DATA
1.0 34.94 0.7 30.25 1.7 28.53 1.6 33.68 1.0 26.63
0.8 29.02 2.2 27.74 1.3 33.78 1.0 31.14 1.6 34.48
1.1 32.30 2.0 42.78 0.9 34.41 0.9 28.73 0.6 25.86
1.1 33.25 1.2 35.84 1.6 27.32 0.9 35.79 1.0 26.81
0.7 32.72 1.2 34.81 4.8 40.46
END DATA.
RESTORE.

DATASET NAME JFH.
*Get mean values.
AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES OVERWRITE=YES
  /BREAK=
  /homicide_mean=MEAN(homicide)
  /Gini_mean=MEAN(Gini).

*Labels are set in Main Graph GUIDE: commands.
GGRAPH
  /GRAPHDATASET
   NAME          = "graphdataset"
   VARIABLES     = Gini           [NAME = 'x'   ]
                   Gini_mean      [NAME = 'x_mn']
                   homicide       [NAME = 'y'   ]
                   homicide_mean  [NAME = 'y_mn']
   MISSING       = LISTWISE
   REPORTMISSING = NO
  /GRAPHSPEC
   SOURCE        = INLINE.
BEGIN GPL
  SOURCE:  s=userSource(id("graphdataset"))
  DATA:    x   =col(source(s), name("x"))
  DATA:    x_mn=col(source(s), name("x_mn"))
  DATA:    y   =col(source(s), name("y"))
  DATA:    y_mn=col(source(s), name("y_mn"))

  COMMENT:  MAIN GRAPH
  GRAPH:     begin(origin(10%, 10%), scale(70%, 70%))
    GUIDE:   axis(dim(1), label("Gini"))
    GUIDE:   axis(dim(2), label("Homicide"))

    COMMENT: vertical x-mean line
    ELEMENT: line(position(x_mn*y)
                 ,size(size."0.5")
                 ,color(color.gray)
                 )
    COMMENT: horizontal y-mean line
    ELEMENT: line(position(x*y_mn)
                 ,size(size."0.5")
                 ,color(color.gray)
                 )
    COMMENT: vertical lines
    ELEMENT: edge(position(link.join(x*(y_mn+y)))
                 ,shape(shape.half_dash)
                 ,size(size."0.25")
                 )
    COMMENT: horizontal lines
    ELEMENT: edge(position(link.join((x_mn+x)*y))
                 ,shape(shape.half_dash)
                 ,size(size."0.25")
                 )
    ELEMENT: point(position(x*y))
  GRAPH: end()

  COMMENT: TOP LETTERBOX
  GRAPH:     begin(origin(10%, 0%), scale(70%, 10%))
    COORD:   rect(dim(1))
    GUIDE:   axis(dim(1), ticks(null()))
    ELEMENT: schema(position(bin.quantile.letter(x))
                   ,size(size."80%")
                   ,color.interior(color.lightgray)
                   )
  GRAPH: end()

  COMMENT: RIGHT LETTERBOX
  GRAPH:     begin(origin(80%, 10%), scale(10%, 70%))
    COORD:   transpose(rect(dim(1)))
    GUIDE:   axis(dim(1), ticks(null()))
    ELEMENT: schema(position(bin.quantile.letter(y))
                   ,size(size."80%")
                   ,color.interior(color.lightgray)
                   )
  GRAPH: end()
END GPL.
<http://spssx-discussion.1045642.n5.nabble.com/file/t339873/JFHGraph.png>
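
For the hardcoded route mentioned in the reply, here is a minimal sketch (not part of the posted syntax) of a plain scatterplot with the two mean lines drawn by GUIDE: form.line. The coordinates 32.23 and 1.34 are the means of Gini and Homicide for these 23 cases, rounded here by hand, so treat them as assumptions and check them against the AGGREGATE output. The * should stand for the full extent of the other axis, so these reference lines would not be limited to the range of the data. The sketch assumes the JFH dataset is active.

```spss
* Sketch only: scatterplot with hardcoded reference lines at the means.
GGRAPH
  /GRAPHDATASET NAME="graphdataset" VARIABLES=Gini homicide
  /GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
  SOURCE:  s=userSource(id("graphdataset"))
  DATA:    Gini=col(source(s), name("Gini"))
  DATA:    homicide=col(source(s), name("homicide"))
  GUIDE:   axis(dim(1), label("Gini"))
  GUIDE:   axis(dim(2), label("Homicide"))
  COMMENT: vertical line at mean Gini (hardcoded, rounded)
  GUIDE:   form.line(position(32.23, *), color(color.gray))
  COMMENT: horizontal line at mean Homicide (hardcoded, rounded)
  GUIDE:   form.line(position(*, 1.34), color(color.gray))
  ELEMENT: point(position(Gini*homicide))
END GPL.
```

The drawback is that the two numbers have to be edited by hand whenever the data change, which is what the AGGREGATE approach avoids.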

Re: Lines on scatterplot

John F Hall

After a couple of off-list exchanges (too complex for me to post via Nabble), Dr Peder Rogman ("PRogman") supplied the following syntax, which has produced exactly what I need.

 

* Encoding: UTF-8.
DATASET CLOSE ALL.
PRESERVE.
SET DECIMAL=DOT.

DATA LIST free /
Homicide (F8.1) Gini (F8.2).
BEGIN DATA
1.0 34.94 0.7 30.25 1.7 28.53 1.6 33.68 1.0 26.63
0.8 29.02 2.2 27.74 1.3 33.78 1.0 31.14 1.6 34.48
1.1 32.30 2.0 42.78 0.9 34.41 0.9 28.73 0.6 25.86
1.1 33.25 1.2 35.84 1.6 27.32 0.9 35.79 1.0 26.81
0.7 32.72 1.2 34.81 4.8 40.46
END DATA.
RESTORE.

DATASET NAME JFH.
*Get mean values.
AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES OVERWRITE=YES
  /BREAK=
  /homicide_mean=MEAN(homicide)
  /Gini_mean=MEAN(Gini).

*Updated GGRAPH.
GGRAPH
  /GRAPHDATASET
   NAME          = "graphdataset"
   VARIABLES     = Gini           [NAME = 'x'   ]
                   Gini_mean      [NAME = 'x_mn']
                   homicide       [NAME = 'y'   ]
                   homicide_mean  [NAME = 'y_mn']
   MISSING       = LISTWISE
   REPORTMISSING = NO
  /GRAPHSPEC
   DEFAULTTEMPLATE = yes
   SOURCE        = INLINE.
BEGIN GPL
  SOURCE:  s=userSource(id("graphdataset"))
  DATA:    x   =col(source(s), name("x"))
  DATA:    x_mn=col(source(s), name("x_mn"))
  DATA:    y   =col(source(s), name("y"))
  DATA:    y_mn=col(source(s), name("y_mn"))

  COMMENT: fitted values from hardcoded regression coefficients
  TRANS:   yr  =eval(-1.591 + 0.091 * x)
  TRANS:   xr  =eval(29.075 + 2.348 * y)

  COMMENT:  MAIN GRAPH
  GRAPH:     begin(origin(10%, 10%), scale(70%, 70%))
    GUIDE:   axis(dim(1), label("Gini"))
    GUIDE:   axis(dim(2), label("Homicide"))

    COMMENT: vertical x-mean line
    ELEMENT: line(position(x_mn*y)
                 ,size(size."1")
                 ,color(color.red)
                 )
    COMMENT: horizontal y-mean line
    ELEMENT: line(position(x*y_mn)
                 ,size(size."1")
                 ,color(color.blue)
                 )
    COMMENT: vertical lines
    ELEMENT: edge(position(link.join(x*(y_mn+y)))
                 ,shape(shape.half_dash)
                 ,size(size."0.50")
                 ,color(color.blue)
                 )
    COMMENT: horizontal lines
    ELEMENT: edge(position(link.join((x_mn+x)*y))
                 ,shape(shape.half_dash)
                 ,size(size."0.50")
                 ,color(color.red)
                 )
    ELEMENT: point(position(x*y)
                  )

    COMMENT: regression line of y on x
    ELEMENT: line(position(x*yr)
                 ,color(color.green)
                 ,size(size."2")
                 )

    COMMENT: regression line of x on y
    ELEMENT: line(position(xr*y)
                 ,color(color.purple)
                 ,size(size."2")
                 )
  GRAPH: end()

  COMMENT: TOP LETTERBOX
  GRAPH:     begin(origin(10%, 0%), scale(70%, 10%))
    COORD:   rect(dim(1))
    GUIDE:   axis(dim(1), ticks(null()))
    ELEMENT: schema(position(bin.quantile.letter(x))
                   ,size(size."80%")
                   ,color.interior(color.lightblue)
                   )
  GRAPH: end()

  COMMENT: RIGHT LETTERBOX
  GRAPH:     begin(origin(80%, 10%), scale(10%, 70%))
    COORD:   transpose(rect(dim(1)))
    GUIDE:   axis(dim(1), ticks(null()))
    ELEMENT: schema(position(bin.quantile.letter(y))
                   ,size(size."80%")
                   ,color.interior(color.lightblue)
                   )
  GRAPH: end()
END GPL.

 

[NB: The output needs a note to explain that the boxplots show median and IQR, not mean and sd.] 
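
The constants in the two TRANS lines are typed in by hand, and the thread does not show where they came from. A minimal sketch of one way to obtain them, assuming the JFH dataset is active: run each regression once and read the unstandardized coefficients from the Coefficients table. On these 23 cases the output should come close to the hardcoded values (constant about -1.591 and slope about 0.091 for homicide on Gini; constant about 29.075 and slope about 2.348 for Gini on homicide), with Pearson's r roughly 0.46.

```spss
* Sketch: coefficients for the yr line (Y on X, homicide predicted from Gini).
REGRESSION
  /STATISTICS COEFF R
  /DEPENDENT homicide
  /METHOD=ENTER Gini.

* Sketch: coefficients for the xr line (X on Y, Gini predicted from homicide).
REGRESSION
  /STATISTICS COEFF R
  /DEPENDENT Gini
  /METHOD=ENTER homicide.

* Pearson's r, for comparison with the two slopes.
CORRELATIONS
  /VARIABLES=homicide Gini.
```

If the data change, the two TRANS lines need to be updated by hand from the new output.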

 

The exercise will form part of a new tutorial in which PRogman's contribution will be handsomely acknowledged.  He's a star!

 

The data, extracted from World Bank World Development Indicators, are in Table 8.2: Income inequality and homicide rates in a selection of economically developed countries (page 175) of an impressive new textbook:

 

Robert de Vries (author profile: https://www.kent.ac.uk/sspssr/staff/academic/c-d/devries-robert.html)
Critical Statistics: Seeing Beyond the Headlines (publisher: https://www.macmillanihe.com/page/detail/Critical-Statistics/?K=9781137609809)
(Macmillan, Red Globe Press, 2018)

The companion site (https://www.macmillanihe.com/companion/De-Vries-Critical-Statistics/) has links to all the URLs featured in the book, and my initial comments can be seen at https://surveyresearch.weebly.com/de-vries-2018.html

 

It's a long time since I did any programming (in Algol on a KDF9, 1964-68; on a PDP11, 1968-70: input and output on 8-hole paper tape), but I'll see if I can parse the syntax and play with the elements to see what I get. Things to try:

No boxplots outside
No equations on the chart
Labels indicating country (but might be too cluttered)
Build up chart one step at a time (but can be done by reverse editing the spv chart)

 

Next step would be an arc showing the angle between the lines and its cosine (Pearson's r) and an animated applet to get the mean lines to rotate and stabilise when the (elastic) tension has levelled out.  Nice little project for a graphics student?
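
A sketch of the standard relations behind the arc idea, in notation that is not from the thread: $b_{yx}$ and $b_{xy}$ are the slopes of Y on X and of X on Y, $s_x$ and $s_y$ the standard deviations, and $\theta$ the angle between the two fitted lines when both variables are plotted as z-scores (an assumption; on raw axes the drawn angle depends on the axis scaling). One point worth checking before building the animation: the exact cosine identity for Pearson's r concerns the angle $\varphi$ between the two mean-centred data vectors, while the angle between the two regression lines follows a different formula.

```latex
\begin{align*}
b_{yx} &= r\,\frac{s_y}{s_x}, \qquad b_{xy} = r\,\frac{s_x}{s_y}
  \quad\Longrightarrow\quad b_{yx}\,b_{xy} = r^2 \\
r &= \frac{\sum_i (x_i-\bar{x})(y_i-\bar{y})}
          {\sqrt{\sum_i (x_i-\bar{x})^2}\,\sqrt{\sum_i (y_i-\bar{y})^2}}
   = \cos\varphi \\
\tan\theta &= \frac{1/r - r}{1 + (1/r)\,r} = \frac{1-r^2}{2r},
  \qquad \cos\theta = \frac{2r}{1+r^2} \quad (0 < r \le 1)
\end{align*}
```

With the slopes used in the syntax above, 0.091 × 2.348 ≈ 0.214, so r ≈ 0.46 and, on standardized axes, cos θ ≈ 0.76.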

 

John F Hall  MA (Cantab) Dip Ed (Dunelm)

[Retired academic survey researcher]

 

Email:          [hidden email]

Website:     Journeys in Survey Research

Course:       Survey Analysis Workshop (SPSS)

Research:   Subjective Social Indicators (Quality of Life)

 
