Data Display Question


David Thompson
Colleagues:

I have a set of 17 customer satisfaction ratings. I would like to calculate the mean for each of the 17 categories, and then display the means in a histogram (same page, 17 bars).

What is the easiest way to accomplish this in SPSS?

David


David W. Thompson, PhD, ABPP
Diplomate in Forensic Psychology
American Board of Professional Psychology

Deputy Director
Walworth Co. Dept. of Health and Human Services
262-741-3232 (voice) 262-741-3217 (fax)


Re: Data Display Question

Andy W
My guess is below. For future reference, it is easier to help if you give an example of what your data currently look like. The code boils down to this: AGGREGATE to get the means, then reshape so that all 17 means are in the same column. After that, plotting is easy.

*******************************.
set seed = 10. /* making data that I think looks like yours */.
input program.
loop #i = 1 to 100.
vector cat(17).
do repeat x = cat1 to cat17.
    compute x = RV.NORMAL(0,1).
end repeat.
end case.
end loop.
end file.
end input program.
dataset name sim.
execute.

*aggregate to means, reshape varstocases and then plot.
compute break = 1.
DATASET DECLARE agg_cat.
AGGREGATE
  /OUTFILE='agg_cat'
  /BREAK=break
  /cat1 to cat17 = MEAN(cat1 to cat17).
dataset activate agg_cat.
varstocases
/make cat from cat1 to cat17
/index id
/drop break.
GRAPH
  /BAR(SIMPLE)=VALUE( cat ) BY id.
*******************************.

You could also reshape the original data and then use the chart's summary options to plot it, which is useful if you want to see the distribution within each of the 17 categories and not just the summary means. Also, FYI, I would not call this a histogram. I understand how the two are similar, but IMO calling it one invites confusion: a histogram bins the values of a variable, whereas this chart plots actual values (the means).
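
The reshape-then-summarize alternative mentioned above could be sketched as follows. This is only a sketch: it assumes the simulated dataset sim from the code above is still open, and the names long, score and id are made up for illustration.

*******************************.
*work on a copy so the wide data in sim stay untouched.
dataset activate sim.
dataset copy long.
dataset activate long.
*stack the 17 columns into one and let id record the source column.
varstocases
 /make score from cat1 to cat17
 /index = id.
*the chart computes the means itself from the stacked data.
GRAPH
  /BAR(SIMPLE)=MEAN(score) BY id.
*a boxplot per category shows the distributions behind the means.
EXAMINE VARIABLES=score BY id
  /PLOT=BOXPLOT
  /STATISTICS=NONE
  /NOTOTAL.
*******************************.

Because the long file keeps every original value, the same data can feed a bar chart of means, a boxplot, or a table without another reshape.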
Andy W
apwheele@gmail.com
http://andrewpwheeler.wordpress.com/

Re: Data Display Question

David Marso
Administrator
You can also get the graph directly with something like:
GRAPH
  /BAR(SIMPLE)= MEAN(cat1) MEAN(cat2) MEAN(cat3) MEAN(cat4) MEAN(cat5)
  MEAN(cat6) MEAN(cat7) MEAN(cat8) MEAN(cat9) MEAN(cat10) MEAN(cat11)
  MEAN(cat12) MEAN(cat13) MEAN(cat14) MEAN(cat15) MEAN(cat16) MEAN(cat17)
  /MISSING=LISTWISE .
--
IIRC, modern versions permit AGGREGATE without specifying a BREAK, so that step can be simplified.
Also, VARSTOCASES allows creating the index from the variable names, which might give a more interpretable graph.
---
VARSTOCASES  /MAKE cat FROM cat1 TO cat17 /INDEX = VarName(cat).
GRAPH  /BAR(SIMPLE)=VALUE( cat ) BY Varname.
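
Put together with the earlier AGGREGATE code, those two suggestions might look like the sketch below. It assumes a version of SPSS in which AGGREGATE runs without a BREAK subcommand (as recalled above, not verified for every release) and that the simulated dataset sim from earlier in the thread is still open. The dataset name agg_cat is the same illustrative name used before.

*******************************.
dataset activate sim.
DATASET DECLARE agg_cat.
*no BREAK subcommand, so the result is a single case holding the 17 means.
AGGREGATE
  /OUTFILE='agg_cat'
  /cat1 to cat17 = MEAN(cat1 to cat17).
dataset activate agg_cat.
*VarName is a string index that stores the name of each source variable.
VARSTOCASES  /MAKE cat FROM cat1 TO cat17 /INDEX = VarName(cat).
GRAPH  /BAR(SIMPLE)=VALUE( cat ) BY VarName.
*******************************.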



Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
"Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"

Re: Data Display Question

Andy W
Good point! I forgot that the legacy GRAPH command accepts multiple variables. Some experimentation suggests you can use the TO keyword within it as well, so the code below works and is much simpler to write. (NOTE TO OP: the variables need to be contiguous in the dataset for TO to work like this.) It is easy, and if you give the variables nice labels they will show up in the chart.

*******************************.
variable label Cat1 'Item 1'.
GRAPH
  /BAR(SIMPLE)= MEAN(cat1 TO cat17)
  /MISSING=LISTWISE .
*******************************.

I fiddled around a bit. Below is what I thought would be the equivalent GGRAPH code using VARSTOCASES, but the TO keyword does not work inside a summary function (at least in V15, which I'm on at the moment).

*******************************.
*To within mean doesn't seem to work.
GGRAPH
  /GRAPHDATASET NAME="graphdataset" VARIABLES= MEAN(cat1 TO cat17) MISSING=LISTWISE
  REPORTMISSING=NO  TRANSFORM = VARSTOCASES(SUMMARY="MeanV" INDEX="Cat")
  /GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
 SOURCE: s=userSource(id("graphdataset"))
 DATA: MeanV=col(source(s), name("MeanV"))
 DATA: Cat=col(source(s), name("Cat"))
 GUIDE: axis(dim(1), label("cat2"))
 GUIDE: axis(dim(2), label("cat1"))
 ELEMENT: interval(position(Cat*MeanV))
END GPL.
*This gives an error that MEAN has unexpected number of arguments.
*******************************.

But you can work around this by passing the original variables and then specifying a summary statistic within the ELEMENT statement.

*******************************.
*But this works!.
GGRAPH
  /GRAPHDATASET NAME="graphdataset" VARIABLES= cat1 TO cat17 MISSING=LISTWISE
  REPORTMISSING=NO  TRANSFORM = VARSTOCASES(SUMMARY="MeanV" INDEX="Cat")
  /GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
 SOURCE: s=userSource(id("graphdataset"))
 DATA: MeanV=col(source(s), name("MeanV"))
 DATA: Cat=col(source(s), name("Cat"), unit.category())
 GUIDE: axis(dim(1), label("Category"))
 GUIDE: axis(dim(2), label("Mean Value"))
 ELEMENT: interval(position(summary.mean(Cat*MeanV)))
END GPL.
*******************************.

It is still good to know how to aggregate the data and plot the actual values, though. I frequently want tables as well, and sometimes it is easier to make the graph you want with the actual values in hand.

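While we are at it, I would suggest plotting error bars or showing the actual data values. See my discussion at Avoid Dynamite Plots! (http://andrewpwheeler.wordpress.com/2012/02/20/avoid-dynamite-plots-visualizing-dot-plots-with-super-imposed-confidence-intervals-in-spss-and-r/) and the examples below: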

*******************************.
GRAPH
  /ERRORBAR( CI 95 )=cat1 to cat17
  /MISSING=LISTWISE .

*Or actual data values.
GGRAPH
  /GRAPHDATASET NAME="graphdataset" VARIABLES= cat1 TO cat17 MISSING=LISTWISE
  REPORTMISSING=NO  TRANSFORM = VARSTOCASES(SUMMARY="MeanV" INDEX="Cat")
  /GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
 SOURCE: s=userSource(id("graphdataset"))
 DATA: MeanV=col(source(s), name("MeanV"))
 DATA: Cat=col(source(s), name("Cat"), unit.category())
 COORD: rect(dim(1,2), transpose())
 GUIDE: axis(dim(1), label("Value"))
 GUIDE: axis(dim(2), label("Category"))
 ELEMENT: point.jitter.conditional(position(MeanV*Cat), size(size."2"))
END GPL.
*******************************.

To superimpose standard error bars over the jittered scatterplot takes more work (so not shown here) but can be done.

Andy

Re: Data Display Question

David Thompson
Nice! Now, can I add color to the graph?






From:        Andy W <[hidden email]>
To:        [hidden email]
Date:        01/25/2013 08:52 AM
Subject:        Re: Data Display Question
Sent by:        "SPSSX(r) Discussion" <[hidden email]>





Re: Data Display Question

Andy W
What exactly do you want colored and why do you want color?

Re: Data Display Question

Andy W
In reply to this post by David Thompson
I am replying here to an email sent to me directly. In the future, please post to the list.

----------------------------------------------------
I want to color the individual bars in the graph. I want to do so because my boss wants them colored for a presentation to our board. Each bar in the graph represents a different variable mean.

David

-----------------------------------------------------

You can do this with GGRAPH (but not with the legacy graph commands). Within the ELEMENT statement, add a function that colors the interior of the bars according to "Cat"; there is an example at the end of this post. A few things to note: many people would frown on this and (reasonably) call it chart junk, because at best the color does not encode any information and at worst it is visually annoying. I would agree with that sentiment for the example below, but you can make it more palatable by coloring the bars with a purpose. For example, if the categories fall into groups of similar items (e.g. categories 1 to 4 are similar, and categories 5 to 8 are similar), you can color the categories within each group in varying shades of one color (e.g. categories 1 to 4 in shades of blue and categories 5 to 8 in shades of green).

Another reason to do this is that it is nearly impossible to choose 17 different nominal colors and still have a nice-looking chart. Using a smaller set of colors and varying the shade within each will make a nicer-looking chart overall. I leave choosing the colors as an exercise for the reader, but here is an example of mapping a color aesthetic to particular categories: http://spssx-discussion.1045642.n5.nabble.com/Manually-adding-legend-in-GPL-td5714324.html. You can also edit the chart post hoc to the same end. I would suggest the online ColorBrewer app (http://colorbrewer2.org/) for choosing the set of shades and/or nominal colors for the charts.


*******************************.
set seed = 10. /* making data that I think looks like yours */.
input program.
loop #i = 1 to 100.
vector cat(17).
do repeat x = cat1 to cat17.
    compute x = RV.NORMAL(0,1).
end repeat.
end case.
end loop.
end file.
end input program.
dataset name sim.
execute.

GGRAPH
  /GRAPHDATASET NAME="graphdataset" VARIABLES= cat1 TO cat17 MISSING=LISTWISE
  REPORTMISSING=NO  TRANSFORM = VARSTOCASES(SUMMARY="MeanV" INDEX="Cat")
  /GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
 SOURCE: s=userSource(id("graphdataset"))
 DATA: MeanV=col(source(s), name("MeanV"))
 DATA: Cat=col(source(s), name("Cat"), unit.category())
 GUIDE: axis(dim(1), label("Category"))
 GUIDE: axis(dim(2), label("Mean Value"))
 ELEMENT: interval(position(summary.mean(Cat*MeanV)), color.interior(Cat))
END GPL.
*******************************.
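
Coloring with a purpose, as described above, could be sketched like this. Everything specific here is an assumption for illustration: the three groups (categories 1 to 4, 5 to 8, and 9 to 17), the names long, score, item and grp, and the three colors. The sketch is also simpler than the shading idea, since it uses one color per group and does not vary the shade within a group. It reshapes a copy of the sim data first so that the group can be stored as an ordinary variable.

*******************************.
dataset activate sim.
dataset copy long.
dataset activate long.
varstocases
 /make score from cat1 to cat17
 /index = item.
*hypothetical grouping of the 17 categories into three sets.
recode item (1 thru 4 = 1) (5 thru 8 = 2) (else = 3) into grp.
value labels grp 1 'Categories 1-4' 2 'Categories 5-8' 3 'Categories 9-17'.
GGRAPH
  /GRAPHDATASET NAME="graphdataset" VARIABLES= item grp score
  /GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
 SOURCE: s=userSource(id("graphdataset"))
 DATA: item=col(source(s), name("item"), unit.category())
 DATA: grp=col(source(s), name("grp"), unit.category())
 DATA: score=col(source(s), name("score"))
 GUIDE: axis(dim(1), label("Category"))
 GUIDE: axis(dim(2), label("Mean Value"))
 GUIDE: legend(aesthetic(aesthetic.color.interior), label("Group"))
 SCALE: cat(aesthetic(aesthetic.color.interior), map(("1", color.blue), ("2", color.green), ("3", color.orange)))
 ELEMENT: interval(position(summary.mean(item*score)), color.interior(grp))
END GPL.
*******************************.

The SCALE statement is where the colors are assigned, so swapping in a ColorBrewer palette means changing only that one line.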