Scatter plot with proportions?

Robert L
Many of my clients have categorical data which should be expressed as "profiles" for different categories. One of the graphs I often suggest they look at is the stacked bar chart with proportions summing to 100% for each category. There is a similar type of chart that I can't see how to produce:

*The data consists of at least one binary variable indicating the status for each individual.

*For each year, there are a number of individuals measured.

*Calculating a crosstab with proportions for each year of the two categories is simple.

*I would like to set up a line plot or scatter plot with the proportion for one of the categories on the y-axis and the year on the x-axis.

The reason it seems to be a good graphical description is that many of the variables people work with here are binary, so the question is really also how to represent *several* (binary) variables in the same line chart. There should be a simple solution, shouldn't there? Some syntax is enclosed to show what kind of data I am referring to. Any suggestions are most welcome.

Robert
***************.
DATA LIST FREE /yr (F2.0) type (F1.0).
BEGIN DATA
10 1
10 2
10 1
10 1
11 1
11 1
11 1
11 2
11 2
12 1
12 1
12 2
12 1
12 1
12 1
13 2
13 2
13 2
13 2
13 1
13 1
END DATA.

VARIABLE LEVEL type (NOMINAL) yr (ORDINAL).

CROSSTABS yr BY type.

CROSSTABS yr BY type
/CELLS=ROW.

*Stacked bar chart.
GGRAPH
  /GRAPHDATASET NAME="graphdataset" VARIABLES=yr COUNT()[name="COUNT"] type MISSING=LISTWISE
    REPORTMISSING=NO
  /GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
  SOURCE: s=userSource(id("graphdataset"))
  DATA: yr=col(source(s), name("yr"), unit.category())
  DATA: COUNT=col(source(s), name("COUNT"))
  DATA: type=col(source(s), name("type"), unit.category())
  GUIDE: axis(dim(1), label("yr"))
  GUIDE: axis(dim(2), label("Percent"))
  GUIDE: legend(aesthetic(aesthetic.color.interior), label("type"))
  SCALE: linear(dim(2), include(0))
  ELEMENT: interval.stack(position(summary.percent(yr*COUNT, base.coordinate(dim(1)))),
    color.interior(type), shape.interior(shape.square))
END GPL.

*Line plot, but with both categories where only one is needed.
GGRAPH
  /GRAPHDATASET NAME="graphdataset" VARIABLES=yr COUNT()[name="COUNT"] type MISSING=LISTWISE
    REPORTMISSING=NO
  /GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
  SOURCE: s=userSource(id("graphdataset"))
  DATA: yr=col(source(s), name("yr"), unit.category())
  DATA: COUNT=col(source(s), name("COUNT"))
  DATA: type=col(source(s), name("type"), unit.category())
  GUIDE: axis(dim(1), label("yr"))
  GUIDE: axis(dim(2), label("Percent"))
  GUIDE: legend(aesthetic(aesthetic.color.interior), label("type"))
  SCALE: linear(dim(2), include(0))
  ELEMENT: line(position(summary.percent(yr*COUNT, base.coordinate(dim(1)))),
    color.interior(type))
END GPL.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Robert Lundqvist

Re: Scatter plot with proportions?

Bruce Weaver
Administrator
How about this?

DATA LIST FREE /yr (F2.0) type (F1.0).
BEGIN DATA
10 1  10 2  10 1  10 1
11 1  11 1  11 1  11 2  11 2
12 1  12 1  12 2  12 1  12 1  12 1
13 2  13 2  13 2  13 2  13 1  13 1
END DATA.

DATASET DECLARE AggData.
AGGREGATE
  /OUTFILE='AggData'
  /BREAK=yr
  /PercentType1 "Percent Type 1" = PLT(type 2).

DATASET ACTIVATE AggData.
VARIABLE LEVEL yr (ORDINAL).
VARIABLE LABELS yr "Year".

* Chart Builder.
GGRAPH
  /GRAPHDATASET NAME="graphdataset" VARIABLES=yr PercentType1 MISSING=LISTWISE REPORTMISSING=NO
  /GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
  SOURCE: s=userSource(id("graphdataset"))
  DATA: yr=col(source(s), name("yr"), unit.category())
  DATA: PercentType1=col(source(s), name("PercentType1"))
  GUIDE: axis(dim(1), label("Year"))
  GUIDE: axis(dim(2), label("Percent Type 1"))
  SCALE: linear(dim(2), include(0))
  ELEMENT: line(position(yr*PercentType1), missing.wings())
END GPL.
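
A quick way to check the aggregated values before charting, as a minimal sketch: PLT(type 2) returns, for each year, the percentage of cases whose type is below 2, which with the 1/2 coding used here is the percentage of type 1. The figures in the comments are worked by hand from the sample data above; LIST is used only to print AggData.

```spss
* Hand-computed percentages of type 1 from the sample data.
* yr 10: 3 of 4 cases are type 1, so 75.0 percent.
* yr 11: 3 of 5 cases are type 1, so 60.0 percent.
* yr 12: 5 of 6 cases are type 1, so about 83.3 percent.
* yr 13: 2 of 6 cases are type 1, so about 33.3 percent.
DATASET ACTIVATE AggData.
LIST VARIABLES=yr PercentType1.
```

If the listed values match, the line chart is plotting the intended category.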



--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING: 
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).

Re: Scatter plot with proportions?

PRogman
In reply to this post by Robert L
Robert,
It seems to me that your example only includes one (1) binary variable. You want to plot year*(var1,var2,...varN) in a clustered stacked bar graph. It is difficult to predict how many colors/patterns you will need (well, actually one per value per variable).
If your data is dichotomous with a common outcome (no/yes, binary 0|1), you can reduce the colors to one per variable and plot just one value in a grouped bar, with the percentage of that category. Another solution is to plot each variable in a column or row panel after a little data restructuring.
I am not proficient enough in GPL to know if a grouped stacked bar plot is possible.

/PR

NEW FILE .
DATA LIST FREE / year (F4.0) alfa bravo (2F1.0).
BEGIN DATA
2010 1 0    2010 0 1    2010 1 1    2010 1 1
2011 1 0    2011 1 0    2011 1 0    2011 0 0    2011 0 1    
2012 1 0    2012 1 0    2012 0 1    2012 1 0    2012 1 1    2012 1 0
2013 0 0    2013 0 1    2013 0 0    2013 0 1    2013 1 1    2013 1 1
END DATA.
DATASET NAME GroupStack .

VALUE LABELS
  alfa
  bravo  0 'No'  1 'Yes'
.
VARIABLE LEVEL
 alfa bravo (NOMINAL)
 year       (ORDINAL)
.
AGGREGATE
 /OUTFILE=*
  MODE=ADDVARIABLES
 /BREAK=year
 /alfa_sum=SUM(alfa)
 /bravo_sum=SUM(bravo)
 /ncases=N
.
COMPUTE alfa_prop  = alfa_sum/ncases .
COMPUTE bravo_prop = bravo_sum/ncases .
EXECUTE.

FORMATS
  alfa_sum bravo_sum (F8.0)
 /alfa_prop bravo_prop (F8.2)
.

* Line graph of proportions *.
GGRAPH
 /GRAPHDATASET
  NAME="graphdataset"
  VARIABLES=year
            MEAN(alfa_prop)
            MEAN(bravo_prop)
  MISSING=LISTWISE
  REPORTMISSING=NO
  TRANSFORM=VARSTOCASES(SUMMARY="#SUMMARY" INDEX="#INDEX")
 /GRAPHSPEC
  SOURCE=INLINE.
BEGIN GPL
  SOURCE: s=userSource(id("graphdataset"))
  DATA:   year=col(source(s), name("year"), unit.category())
  DATA:   SUMMARY=col(source(s), name("#SUMMARY"))
  DATA:   INDEX=col(source(s), name("#INDEX"), unit.category())
  GUIDE:  axis(dim(1), label("Year"))
  GUIDE:  axis(dim(2), delta(0.1), label("Proportion"))
  GUIDE:  legend(aesthetic(aesthetic.color.interior), label(""))
  SCALE:  linear(dim(2), min(0), max(1))
  SCALE:  cat(aesthetic(aesthetic.color.interior), include("0", "1"))
  ELEMENT: line(position(year*SUMMARY), color.interior(INDEX), missing.wings())
END GPL.

* Grouped proportions .
GGRAPH
 /GRAPHDATASET
  NAME="graphdataset"
  VARIABLES=year
            MEAN(alfa_prop)
            MEAN(bravo_prop)
  MISSING=LISTWISE
  REPORTMISSING=NO
  TRANSFORM=VARSTOCASES(SUMMARY="#SUMMARY" INDEX="#INDEX")
 /GRAPHSPEC
  SOURCE=INLINE.
BEGIN GPL
  SOURCE: s=userSource(id("graphdataset"))
  DATA: year=col(source(s), name("year"), unit.category())
  DATA: SUMMARY=col(source(s), name("#SUMMARY"))
  DATA: INDEX=col(source(s), name("#INDEX"), unit.category())
  COORD: rect(dim(1,2), cluster(3,0))
  GUIDE: axis(dim(3), label("year"))
  GUIDE: axis(dim(2), label("Percent"))
  GUIDE: legend(aesthetic(aesthetic.color.interior), label("Variables"))
  SCALE: linear(dim(2), include(0))
  SCALE: cat(aesthetic(aesthetic.color.interior), include("0", "1"))
  SCALE: cat(dim(1), include("0", "1"))
  ELEMENT: interval(position(INDEX*SUMMARY*year), color.interior(INDEX), shape.interior(shape.square))
END GPL.

* I assume you want a combination of the following two graphs: a grouped stacked bar graph.
* Stacked variable alfa *.
GGRAPH
 /GRAPHDATASET
  NAME="graphdataset"
  VARIABLES=year
            COUNT()[name="COUNT"]
            alfa
  MISSING=LISTWISE
  REPORTMISSING=NO
 /GRAPHSPEC
  SOURCE=INLINE.
BEGIN GPL
  SOURCE: s=userSource(id("graphdataset"))
  DATA: year=col(source(s), name("year"),   unit.category())
  DATA: COUNT=col(source(s), name("COUNT"))
  DATA: alfa=col(source(s), name("alfa"),   unit.category())
  GUIDE: axis(dim(1), label("year"))
  GUIDE: axis(dim(2), label("Percent"))
  GUIDE: legend(aesthetic(aesthetic.color.interior), label("alfa"))
  SCALE: linear(dim(2), include(0))
  SCALE: cat(aesthetic(aesthetic.color.interior), include("0", "1"))
  ELEMENT: interval.stack(position(summary.percent(year*COUNT, base.coordinate(dim(1)))), color.interior(alfa),  shape.interior(shape.square))
END GPL.

* Stacked variable bravo *.
GGRAPH
 /GRAPHDATASET
  NAME="graphdataset"
  VARIABLES=year
            COUNT()[name="COUNT"]
            bravo
  MISSING=LISTWISE
  REPORTMISSING=NO
 /GRAPHSPEC
  SOURCE=INLINE.
BEGIN GPL
  SOURCE: s=userSource(id("graphdataset"))
  DATA: year=col(source(s), name("year"),   unit.category())
  DATA: COUNT=col(source(s), name("COUNT"))
  DATA: bravo=col(source(s), name("bravo"),   unit.category())
  GUIDE: axis(dim(1), label("year"))
  GUIDE: axis(dim(2), label("Percent"))
  GUIDE: legend(aesthetic(aesthetic.color.interior), label("bravo"))
  SCALE: linear(dim(2), include(0))
  SCALE: cat(aesthetic(aesthetic.color.interior), include("0", "1"))
  ELEMENT: interval.stack(position(summary.percent(year*COUNT, base.coordinate(dim(1)))), color.interior(bravo),  shape.interior(shape.square))
END GPL.

*Possible solution: column panel charts *.
VARSTOCASES
 /MAKE Outcome FROM alfa bravo
 /INDEX=Index(2)
 /KEEP=year alfa_sum bravo_sum ncases alfa_prop bravo_prop
 /NULL=KEEP.

VALUE LABELS
  Index 1 'alfa'  2 'bravo'
.
GGRAPH
 /GRAPHDATASET
  NAME="graphdataset"
  VARIABLES=year
            COUNT()[name="COUNT"]
            Outcome
            Index
  MISSING=LISTWISE
  REPORTMISSING=NO
 /GRAPHSPEC
  SOURCE=INLINE.
BEGIN GPL
  SOURCE: s=userSource(id("graphdataset"))
  DATA: year=col(source(s), name("year"), unit.category())
  DATA: COUNT=col(source(s), name("COUNT"))
  DATA: Outcome=col(source(s), name("Outcome"), unit.category())
  DATA: Index=col(source(s), name("Index"), unit.category())
  GUIDE: axis(dim(1), label("year"))
  GUIDE: axis(dim(2), label("Percent"))
  GUIDE: axis(dim(3), label("Index"), opposite())
  GUIDE: legend(aesthetic(aesthetic.color.interior), label("Outcome"))
  SCALE: linear(dim(2), include(0))
  SCALE: cat(aesthetic(aesthetic.color.interior), include("0", "1"))
  ELEMENT: interval.stack(position(summary.percent(year*COUNT*Index, base.coordinate(dim(1)))),
    color.interior(Outcome), shape.interior(shape.square))
END GPL.
*==========================================*.

Re: Scatter plot with proportions?

ViAnn Beadle
In reply to this post by Robert L
Use the PGT summary function in GGRAPH to get your percentage, and use a separate ELEMENT statement to create a line for each profile variable. Here's a simple example, assuming that you have a typeA and a typeB variable.

GGRAPH
  /GRAPHDATASET NAME="graphdataset" VARIABLES=yr[LEVEL=ORDINAL] PGT(typeA, 1)[name="PGT_typeA_1"
    LEVEL=SCALE]
    PGT(typeB, 1)[name="PGT_typeB_1"
    LEVEL=SCALE]
    MISSING=LISTWISE REPORTMISSING=NO
  /GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
  SOURCE: s=userSource(id("graphdataset"))
  DATA: yr=col(source(s), name("yr"), unit.category())
  DATA: PGT_typeA_1=col(source(s), name("PGT_typeA_1"))
  DATA: PGT_typeB_1=col(source(s), name("PGT_typeB_1"))
  GUIDE: axis(dim(1), label("yr"))
  GUIDE: axis(dim(2), label("% > 1 typeA"))
  SCALE: linear(dim(2), include(0))
  ELEMENT: line(position(yr*PGT_typeA_1), color(color.blue),  missing.wings())
  ELEMENT: line(position(yr*PGT_typeB_1), color(color.red), missing.wings())
END GPL.
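
The example assumes two variables, typeA and typeB, coded 1/2 like type in the original post. To try it, a small made-up dataset such as the following could be read in before running the GGRAPH command. The values are purely illustrative and do not come from anyone's real data.

```spss
* Hypothetical test data: yr plus two binary variables coded 1 or 2.
DATA LIST FREE /yr (F2.0) typeA typeB (2F1.0).
BEGIN DATA
10 1 2  10 2 1  10 1 1  10 1 2
11 1 1  11 1 2  11 2 2  11 2 1  11 1 1
12 1 2  12 1 1  12 2 2  12 1 1  12 1 2  12 1 1
13 2 2  13 2 1  13 2 2  13 2 1  13 1 2  13 1 1
END DATA.
VARIABLE LEVEL typeA typeB (NOMINAL) yr (ORDINAL).
```

With this coding, PGT(typeA, 1) is the percentage of cases per year where typeA equals 2.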

Re: Scatter plot with proportions?

ViAnn Beadle
In reply to this post by Robert L
Oh, a few more aesthetic things: use a label function for each element to label the line, and provide a more appropriate y-axis title:
GGRAPH
  /GRAPHDATASET NAME="graphdataset" VARIABLES=yr[LEVEL=ORDINAL] PGT(typeA, 1)[name="PGT_typeA_1"
    LEVEL=SCALE]
    PGT(typeB, 1)[name="PGT_typeB_1"
    LEVEL=SCALE]
    MISSING=LISTWISE REPORTMISSING=NO
  /GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
  SOURCE: s=userSource(id("graphdataset"))
  DATA: yr=col(source(s), name("yr"), unit.category())
  DATA: PGT_typeA_1=col(source(s), name("PGT_typeA_1"))
  DATA: PGT_typeB_1=col(source(s), name("PGT_typeB_1"))
  GUIDE: axis(dim(1), label("yr"))
  GUIDE: axis(dim(2), label("% of Type variables with a value of 2"))
  SCALE: linear(dim(2), include(0))
  ELEMENT: line(position(yr*PGT_typeA_1), color(color.blue),  label("Type A"), missing.wings())
  ELEMENT: line(position(yr*PGT_typeB_1), color(color.red), label("Type B"), missing.wings())
END GPL.
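
The same approach should also work for the single type variable in Robert's original data. The following is a sketch only, reusing the PGT form shown above; the name PGT_type_1 and the axis labels are chosen here for illustration. By hand, the percentage of type 2 in the original sample data is 25 for year 10, 40 for year 11, about 16.7 for year 12, and about 66.7 for year 13.

```spss
* Sketch: one line showing the percentage of cases with type = 2 per year.
* Assumes the yr and type variables from the original post are active.
GGRAPH
  /GRAPHDATASET NAME="graphdataset" VARIABLES=yr[LEVEL=ORDINAL]
    PGT(type, 1)[name="PGT_type_1" LEVEL=SCALE]
    MISSING=LISTWISE REPORTMISSING=NO
  /GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
  SOURCE: s=userSource(id("graphdataset"))
  DATA: yr=col(source(s), name("yr"), unit.category())
  DATA: PGT_type_1=col(source(s), name("PGT_type_1"))
  GUIDE: axis(dim(1), label("Year"))
  GUIDE: axis(dim(2), label("% of cases with type = 2"))
  SCALE: linear(dim(2), include(0))
  ELEMENT: line(position(yr*PGT_type_1), missing.wings())
END GPL.
```

Because the percentage is computed inside GGRAPH, no separate AGGREGATE step or extra dataset is needed for this variant.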

Re: Scatter plot with proportions?

Paul Cook
Hi Robert,

Alternatively, there is a profiling add-on which may be of use to you. It was designed for marketers doing customer profiling. The output is not as you specify, but it may meet your needs. Information and downloads are available at http://dangerousenterprises.com/profile.html

Kind regards,


