Many of my clients have categorical data which should be expressed as "profiles" for different categories. One of the graphs I often suggest they look at is the stacked bar chart with proportions summing to 100% for each category. There is a similar chart type that I can't see how to make:
* The data consists of at least one binary variable indicating the status for each individual.
* For each year, there are a number of individuals measured.
* Calculating a crosstab with proportions for each year of the two categories is simple.
* I would like to set up a line plot or scatter plot with the proportion for one of the categories on the y-axis and the year on the x-axis.

The reason it seems to be a good graphical description is that many of the variables people work with here are binary, so the question is actually also how to represent *several* (binary) variables in the same line chart. There should be a simple solution, shouldn't there? Some syntax is enclosed to describe the kind of data I am referring to. Any suggestions are most welcome.

Robert

***************.
DATA LIST FREE /yr (F2.0) type (F1.0).
BEGIN DATA
10 1 10 2 10 1 10 1
11 1 11 1 11 1 11 2 11 2
12 1 12 1 12 2 12 1 12 1 12 1
13 2 13 2 13 2 13 2 13 1 13 1
END DATA.
VARIABLE LEVEL type (NOMINAL) yr (ORDINAL).
CROSSTABS yr BY type.
CROSSTABS yr BY type /CELLS=ROW.

*Stacked bar chart.
GGRAPH
  /GRAPHDATASET NAME="graphdataset" VARIABLES=yr COUNT()[name="COUNT"] type MISSING=LISTWISE REPORTMISSING=NO
  /GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
  SOURCE: s=userSource(id("graphdataset"))
  DATA: yr=col(source(s), name("yr"), unit.category())
  DATA: COUNT=col(source(s), name("COUNT"))
  DATA: type=col(source(s), name("type"), unit.category())
  GUIDE: axis(dim(1), label("yr"))
  GUIDE: axis(dim(2), label("Percent"))
  GUIDE: legend(aesthetic(aesthetic.color.interior), label("type"))
  SCALE: linear(dim(2), include(0))
  ELEMENT: interval.stack(position(summary.percent(yr*COUNT, base.coordinate(dim(1)))),
    color.interior(type), shape.interior(shape.square))
END GPL.

*Line plot, but with both categories where only one is needed.
GGRAPH
  /GRAPHDATASET NAME="graphdataset" VARIABLES=yr COUNT()[name="COUNT"] type MISSING=LISTWISE REPORTMISSING=NO
  /GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
  SOURCE: s=userSource(id("graphdataset"))
  DATA: yr=col(source(s), name("yr"), unit.category())
  DATA: COUNT=col(source(s), name("COUNT"))
  DATA: type=col(source(s), name("type"), unit.category())
  GUIDE: axis(dim(1), label("yr"))
  GUIDE: axis(dim(2), label("Percent"))
  GUIDE: legend(aesthetic(aesthetic.color.interior), label("type"))
  SCALE: linear(dim(2), include(0))
  ELEMENT: line(position(summary.percent(yr*COUNT, base.coordinate(dim(1)))),
    color.interior(type))
END GPL.
Robert Lundqvist
|
Administrator
|
How about this?
DATA LIST FREE /yr (F2.0) type (F1.0).
BEGIN DATA
10 1 10 2 10 1 10 1
11 1 11 1 11 1 11 2 11 2
12 1 12 1 12 2 12 1 12 1 12 1
13 2 13 2 13 2 13 2 13 1 13 1
END DATA.

DATASET DECLARE AggData.
AGGREGATE
  /OUTFILE='AggData'
  /BREAK=yr
  /PercentType1 "Percent Type 1" = PLT(type 2).
DATASET ACTIVATE AggData.
VARIABLE LEVEL yr (ORDINAL).
VARIABLE LABELS yr "Year".

* Chart Builder.
GGRAPH
  /GRAPHDATASET NAME="graphdataset" VARIABLES=yr PercentType1 MISSING=LISTWISE REPORTMISSING=NO
  /GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
  SOURCE: s=userSource(id("graphdataset"))
  DATA: yr=col(source(s), name("yr"), unit.category())
  DATA: PercentType1=col(source(s), name("PercentType1"))
  GUIDE: axis(dim(1), label("Year"))
  GUIDE: axis(dim(2), label("Percent Type 1"))
  SCALE: linear(dim(2), include(0))
  ELEMENT: line(position(yr*PercentType1), missing.wings())
END GPL.
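As a cross-check outside SPSS, the per-year value that PLT(type 2) produces (the percentage of cases in each year with type strictly below 2, i.e. type 1) can be reproduced by hand. The sketch below is only an illustration in Python, not part of the SPSS solution; the function name percent_below is made up for this example, and the rows are copied from the BEGIN DATA block above.

```python
from collections import defaultdict

# (yr, type) pairs copied from the BEGIN DATA block in the post above.
rows = [
    (10, 1), (10, 2), (10, 1), (10, 1),
    (11, 1), (11, 1), (11, 1), (11, 2), (11, 2),
    (12, 1), (12, 1), (12, 2), (12, 1), (12, 1), (12, 1),
    (13, 2), (13, 2), (13, 2), (13, 2), (13, 1), (13, 1),
]


def percent_below(rows, cutoff):
    """Hypothetical helper: percentage of cases per year with value < cutoff."""
    total = defaultdict(int)
    below = defaultdict(int)
    for yr, value in rows:
        total[yr] += 1
        if value < cutoff:
            below[yr] += 1
    return {yr: 100.0 * below[yr] / total[yr] for yr in sorted(total)}


# Same cutoff as PLT(type 2): strictly less than 2 means type 1.
pct_type1 = percent_below(rows, 2)
for yr, pct in pct_type1.items():
    print(yr, round(pct, 1))
# 10 75.0
# 11 60.0
# 12 83.3
# 13 33.3
```

These four values are what the line element should trace across the years, which gives a quick way to confirm the aggregated dataset before charting it.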
--
Bruce Weaver bweaver@lakeheadu.ca http://sites.google.com/a/lakeheadu.ca/bweaver/ "When all else fails, RTFM." PLEASE NOTE THE FOLLOWING: 1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above. 2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/). |
In reply to this post by Robert L
Robert,
It seems to me that your example only includes one (1) binary variable. You want to plot year*(var1,var2,...varN) in a clustered stacked bar graph. It is difficult to predict how many colors/patterns you will need (well, actually one per value per variable). If your data are dichotomous with a common outcome coding (no/yes, binary 0/1), you can reduce the colors to one per variable and plot just one value in a grouped bar, as the percentage of the category. Another solution is to plot each variable in a column or row panel after a little data restructuring. I am not proficient enough in GPL to know if a grouped stacked bar plot is possible.

/PR

NEW FILE .
DATA LIST FREE / year (F4.0) alfa bravo (2F1.0).
BEGIN DATA
2010 1 0 2010 0 1 2010 1 1 2010 1 1
2011 1 0 2011 1 0 2011 1 0 2011 0 0 2011 0 1
2012 1 0 2012 1 0 2012 0 1 2012 1 0 2012 1 1 2012 1 0
2013 0 0 2013 0 1 2013 0 0 2013 0 1 2013 1 1 2013 1 1
END DATA.
DATASET NAME GroupStack .
VALUE LABELS alfa bravo 0 'No' 1 'Yes' .
VARIABLE LEVEL alfa bravo (NOMINAL) year (ORDINAL) .
AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /BREAK=year
  /alfa_sum=SUM(alfa)
  /bravo_sum=SUM(bravo)
  /ncases=N .
COMPUTE alfa_prop = alfa_sum/ncases .
COMPUTE bravo_prop = bravo_sum/ncases .
EXECUTE.
FORMATS alfa_sum bravo_sum (F8.0) /alfa_prop bravo_prop (F8.2) .

* Line graph of proportions *.
GGRAPH
  /GRAPHDATASET NAME="graphdataset" VARIABLES=year MEAN(alfa_prop) MEAN(bravo_prop)
    MISSING=LISTWISE REPORTMISSING=NO
    TRANSFORM=VARSTOCASES(SUMMARY="#SUMMARY" INDEX="#INDEX")
  /GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
  SOURCE: s=userSource(id("graphdataset"))
  DATA: year=col(source(s), name("year"), unit.category())
  DATA: SUMMARY=col(source(s), name("#SUMMARY"))
  DATA: INDEX=col(source(s), name("#INDEX"), unit.category())
  GUIDE: axis(dim(1), label("Year"))
  GUIDE: axis(dim(2), delta(0.1), label("Proportion"))
  GUIDE: legend(aesthetic(aesthetic.color.interior), label(""))
  SCALE: linear(dim(2), min(0), max(1))
  SCALE: cat(aesthetic(aesthetic.color.interior), include("0", "1"))
  ELEMENT: line(position(year*SUMMARY), color.interior(INDEX), missing.wings())
END GPL.

* Grouped proportions .
GGRAPH
  /GRAPHDATASET NAME="graphdataset" VARIABLES=year MEAN(alfa_prop) MEAN(bravo_prop)
    MISSING=LISTWISE REPORTMISSING=NO
    TRANSFORM=VARSTOCASES(SUMMARY="#SUMMARY" INDEX="#INDEX")
  /GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
  SOURCE: s=userSource(id("graphdataset"))
  DATA: year=col(source(s), name("year"), unit.category())
  DATA: SUMMARY=col(source(s), name("#SUMMARY"))
  DATA: INDEX=col(source(s), name("#INDEX"), unit.category())
  COORD: rect(dim(1,2), cluster(3,0))
  GUIDE: axis(dim(3), label("year"))
  GUIDE: axis(dim(2), label("Percent"))
  GUIDE: legend(aesthetic(aesthetic.color.interior), label("Variables"))
  SCALE: linear(dim(2), include(0))
  SCALE: cat(aesthetic(aesthetic.color.interior), include("0", "1"))
  SCALE: cat(dim(1), include("0", "1"))
  ELEMENT: interval(position(INDEX*SUMMARY*year), color.interior(INDEX),
    shape.interior(shape.square))
END GPL.

* I assume you want a combination of the following two graphs: a grouped stacked bar graph.
* Stacked variable alfa *.
GGRAPH
  /GRAPHDATASET NAME="graphdataset" VARIABLES=year COUNT()[name="COUNT"] alfa MISSING=LISTWISE REPORTMISSING=NO
  /GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
  SOURCE: s=userSource(id("graphdataset"))
  DATA: year=col(source(s), name("year"), unit.category())
  DATA: COUNT=col(source(s), name("COUNT"))
  DATA: alfa=col(source(s), name("alfa"), unit.category())
  GUIDE: axis(dim(1), label("year"))
  GUIDE: axis(dim(2), label("Percent"))
  GUIDE: legend(aesthetic(aesthetic.color.interior), label("alfa"))
  SCALE: linear(dim(2), include(0))
  SCALE: cat(aesthetic(aesthetic.color.interior), include("0", "1"))
  ELEMENT: interval.stack(position(summary.percent(year*COUNT, base.coordinate(dim(1)))),
    color.interior(alfa), shape.interior(shape.square))
END GPL.

* Stacked variable bravo *.
GGRAPH
  /GRAPHDATASET NAME="graphdataset" VARIABLES=year COUNT()[name="COUNT"] bravo MISSING=LISTWISE REPORTMISSING=NO
  /GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
  SOURCE: s=userSource(id("graphdataset"))
  DATA: year=col(source(s), name("year"), unit.category())
  DATA: COUNT=col(source(s), name("COUNT"))
  DATA: bravo=col(source(s), name("bravo"), unit.category())
  GUIDE: axis(dim(1), label("year"))
  GUIDE: axis(dim(2), label("Percent"))
  GUIDE: legend(aesthetic(aesthetic.color.interior), label("bravo"))
  SCALE: linear(dim(2), include(0))
  SCALE: cat(aesthetic(aesthetic.color.interior), include("0", "1"))
  ELEMENT: interval.stack(position(summary.percent(year*COUNT, base.coordinate(dim(1)))),
    color.interior(bravo), shape.interior(shape.square))
END GPL.

*Possible solution: column panel charts *.
VARSTOCASES
  /MAKE Outcome FROM alfa bravo
  /INDEX=Index(2)
  /KEEP=year alfa_sum bravo_sum ncases alfa_prop bravo_prop
  /NULL=KEEP.
VALUE LABELS Index 1 'alfa' 2 'bravo' .
GGRAPH
  /GRAPHDATASET NAME="graphdataset" VARIABLES=year COUNT()[name="COUNT"] Outcome Index MISSING=LISTWISE REPORTMISSING=NO
  /GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
  SOURCE: s=userSource(id("graphdataset"))
  DATA: year=col(source(s), name("year"), unit.category())
  DATA: COUNT=col(source(s), name("COUNT"))
  DATA: Outcome=col(source(s), name("Outcome"), unit.category())
  DATA: Index=col(source(s), name("Index"), unit.category())
  GUIDE: axis(dim(1), label("year"))
  GUIDE: axis(dim(2), label("Percent"))
  GUIDE: axis(dim(3), label("Index"), opposite())
  GUIDE: legend(aesthetic(aesthetic.color.interior), label("Outcome"))
  SCALE: linear(dim(2), include(0))
  SCALE: cat(aesthetic(aesthetic.color.interior), include("0", "1"))
  ELEMENT: interval.stack(position(summary.percent(year*COUNT*Index, base.coordinate(dim(1)))),
    color.interior(Outcome), shape.interior(shape.square))
END GPL.
*==========================================*.
|
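The AGGREGATE and COMPUTE steps in the syntax above amount to dividing the yearly sum of each 0/1 variable by the yearly number of cases, which is the same as taking the yearly mean of the indicator. A minimal sketch of that arithmetic in Python follows, purely as a check outside SPSS; the helper name yearly_proportions is hypothetical and the rows are copied from the alfa/bravo data block.

```python
from collections import defaultdict

# (year, alfa, bravo) rows copied from the BEGIN DATA block in the post above.
rows = [
    (2010, 1, 0), (2010, 0, 1), (2010, 1, 1), (2010, 1, 1),
    (2011, 1, 0), (2011, 1, 0), (2011, 1, 0), (2011, 0, 0), (2011, 0, 1),
    (2012, 1, 0), (2012, 1, 0), (2012, 0, 1), (2012, 1, 0), (2012, 1, 1),
    (2012, 1, 0),
    (2013, 0, 0), (2013, 0, 1), (2013, 0, 0), (2013, 0, 1), (2013, 1, 1),
    (2013, 1, 1),
]
NAMES = ("alfa", "bravo")


def yearly_proportions(rows, names=NAMES):
    """Hypothetical helper: per year, sum of each 0/1 variable divided by N."""
    sums = defaultdict(lambda: [0] * len(names))
    ncases = defaultdict(int)
    for year, *flags in rows:
        ncases[year] += 1
        for i, flag in enumerate(flags):
            sums[year][i] += flag
    return {
        year: {name: sums[year][i] / ncases[year] for i, name in enumerate(names)}
        for year in sorted(ncases)
    }


props = yearly_proportions(rows)
for year, by_var in props.items():
    # One proportion per variable and year: one point per line in the chart.
    print(year, {name: round(p, 2) for name, p in by_var.items()})
```

Each variable contributes one value per year, so adding a third binary variable only adds another series to the line chart rather than another pair of categories.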
In reply to this post by Robert L
Use the PGT summary function in GGRAPH to get your percentage, and use a separate ELEMENT statement to create a line for each profile variable. Here's a simple example assuming that you have a type A and a type B variable.
GGRAPH
  /GRAPHDATASET NAME="graphdataset" VARIABLES=yr[LEVEL=ORDINAL]
    PGT(typeA, 1)[name="PGT_typeA_1" LEVEL=SCALE]
    PGT(typeB, 1)[name="PGT_typeB_1" LEVEL=SCALE]
    MISSING=LISTWISE REPORTMISSING=NO
  /GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
  SOURCE: s=userSource(id("graphdataset"))
  DATA: yr=col(source(s), name("yr"), unit.category())
  DATA: PGT_typeA_1=col(source(s), name("PGT_typeA_1"))
  DATA: PGT_typeB_1=col(source(s), name("PGT_typeB_1"))
  GUIDE: axis(dim(1), label("yr"))
  GUIDE: axis(dim(2), label("% > 1 typeA"))
  SCALE: linear(dim(2), include(0))
  ELEMENT: line(position(yr*PGT_typeA_1), color(color.blue), missing.wings())
  ELEMENT: line(position(yr*PGT_typeB_1), color(color.red), missing.wings())
END GPL.
|
In reply to this post by Robert L
Oh, a few more aesthetic things: use a label function for each element to label the line, and provide a more appropriate y-axis title:
GGRAPH
  /GRAPHDATASET NAME="graphdataset" VARIABLES=yr[LEVEL=ORDINAL]
    PGT(typeA, 1)[name="PGT_typeA_1" LEVEL=SCALE]
    PGT(typeB, 1)[name="PGT_typeB_1" LEVEL=SCALE]
    MISSING=LISTWISE REPORTMISSING=NO
  /GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
  SOURCE: s=userSource(id("graphdataset"))
  DATA: yr=col(source(s), name("yr"), unit.category())
  DATA: PGT_typeA_1=col(source(s), name("PGT_typeA_1"))
  DATA: PGT_typeB_1=col(source(s), name("PGT_typeB_1"))
  GUIDE: axis(dim(1), label("yr"))
  GUIDE: axis(dim(2), label("% of Type variables with a value of 2"))
  SCALE: linear(dim(2), include(0))
  ELEMENT: line(position(yr*PGT_typeA_1), color(color.blue), label("Type A"), missing.wings())
  ELEMENT: line(position(yr*PGT_typeB_1), color(color.red), label("Type B"), missing.wings())
END GPL.
|
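Note that PGT(x, 1) counts values strictly greater than 1, so for a variable coded 1/2 it gives the share of 2s, the complement of the PLT(type 2) figure used earlier in the thread. The sketch below illustrates this in Python on Robert's single type variable, since no typeA/typeB data are given in the thread; the function names are hypothetical and the code is only an arithmetic check, not SPSS syntax.

```python
# Robert's type codes grouped by year (from the first post); the typeA and
# typeB variables in the GGRAPH example are not given in the thread.
type_by_year = {
    10: [1, 2, 1, 1],
    11: [1, 1, 1, 2, 2],
    12: [1, 1, 2, 1, 1, 1],
    13: [2, 2, 2, 2, 1, 1],
}


def pct_greater_than(values, cutoff):
    """Hypothetical helper: percentage of values strictly above cutoff."""
    return 100.0 * sum(v > cutoff for v in values) / len(values)


def pct_less_than(values, cutoff):
    """Hypothetical helper: percentage of values strictly below cutoff."""
    return 100.0 * sum(v < cutoff for v in values) / len(values)


results = {}
for yr, values in type_by_year.items():
    above = pct_greater_than(values, 1)  # share coded 2
    below = pct_less_than(values, 2)     # share coded 1
    results[yr] = (above, below)
    print(yr, round(above, 1), round(below, 1))
# 10 25.0 75.0
# 11 40.0 60.0
# 12 16.7 83.3
# 13 66.7 33.3
```

Which of the two percentages to plot depends only on which category is of interest; with 1/2 coding and no missing values they always sum to 100 within a year.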
Hi Robert,

Alternatively, there is a profiling add-on which may be of use to you. It was designed for marketers doing customer profiling. The output is not as you specify, but it may meet your needs. Information and downloads are available at http://dangerousenterprises.com/profile.html

Kind regards,

On 6 October 2013 06:29, ViAnn Beadle <[hidden email]> wrote:
> Oh, a few more aesthetic things, use a label function for each element to
|