SPSSX Discussion

Plotting Graphs & Visualizations when data is in "long form"


joseph.williams

Plotting Graphs & Visualizations when data is in "long form"

Hi Everyone,

I almost always analyze data in "wide" form (one participant per line), and am very familiar with plotting Bar Graphs with standard errors using the GUI "Legacy Dialogs".

But I now have a very large "long form" data set where one subject's data spans 1000+ lines.

Could anyone point me to the graphing menus or functions whose documentation I should read up on?

Right now if I just use Legacy Dialogs for Bar Graphs as usual, it calculates statistics like the mean as if every line were a participant, rather than averaging within each participant.

For example I've found the "Chart Builder" quite confusing, but maybe there's an option in there?

Thanks a lot,

Joseph

Kylie

Re: Plotting Graphs & Visualizations when data is in "long form"

Hi Joseph,

If you want to work with the averages within each participant, use AGGREGATE to create a new dataset of the participant averages (break by participant ID, making new variables with the MEAN function – this will give you one row per participant), and then use Chart Builder to get the graph that you want.
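A minimal sketch of that approach in syntax, assuming hypothetical variable names that do not come from Joseph's data: id (participant), condition (a between-subjects factor that is constant within each participant), and score (the outcome). Including condition in the BREAK list carries it over to the one-row-per-participant dataset, so the legacy bar chart then computes its means and standard errors across participants rather than across rows.

```spss
* Assumed variable names: id, condition, score (illustrative only).
* Step 1: collapse the long file to one row per participant.
DATASET DECLARE persmeans.
AGGREGATE
  /OUTFILE='persmeans'
  /BREAK=id condition
  /score_mean=MEAN(score).

* Step 2: chart the participant means, with error bars of +/- 2 SE.
DATASET ACTIVATE persmeans.
GRAPH
  /BAR(SIMPLE)=MEAN(score_mean) BY condition
  /INTERVAL SE(2.0).
```

For a within-subject factor (for example a wave or trial-type variable), it would be added to the BREAK list as well, giving one row per participant per level.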

Hope this helps.

Cheers,

Kylie.


Frans Marcelissen-5

Re: Plotting Graphs & Visualizations when data is in "long form"

In reply to this post by joseph.williams


--

-------------------
dr F.H.G. (Frans) Marcelissen
DigiPsy (www.DigiPsy.nl)
Pomperschans 26
5595 AV Leende
tel: 040 2065030/06 2325 06 53
skype adres: frans.marcelissen
email: [hidden email]

Kornbrot, Diana

Re: Plotting Graphs & Visualizations when data is in "long form"

In reply to this post by joseph.williams

I have this problem too; it must be getting more common, as long form is needed for mixed and generalized linear models.
I would also like a correlation matrix, which is simple in wide form but less so in long form. Many procedures for long-form data do offer corr or cov matrices as an output option, but there is no dedicated procedure.

I do not think graphs are possible at present. A request for future versions?
We can't be the only ones.

Best

Diana


Professor Diana Kornbrot
email: d.e.kornbrot@...
web:    http://dianakornbrot.wordpress.com/
            http://go.herts.ac.uk/diana_kornbrot
Work
Department of Psychology
School of Life and Medical Sciences
University of Hertfordshire
College Lane, Hatfield, Hertfordshire AL10 9AB, UK
voice:   +44 (0) 170 728 4626
Home
19 Elmhurst Avenue
London N2 0LT, UK
voice:   +44 (0) 208 444 2081
mobile: +44 (0) 740 318 1612

Kornbrot, Diana

Re: Plotting Graphs & Visualizations when data is in "long form"

In reply to this post by Kylie

This will work, but not having to create a new data set would be good!
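
If the goal is to avoid a second dataset, one possible sketch is to add each participant's mean to every row of the active dataset with AGGREGATE's MODE=ADDVARIABLES, and then chart only the first row of each participant. The variable names (id, condition, score) are hypothetical and not taken from anyone's data in this thread; condition is assumed to be constant within a participant.

```spss
* Assumed variable names: id, condition, score (illustrative only).
* Add each participant's mean to every one of that participant's rows.
AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /BREAK=id
  /score_mean=MEAN(score).

* Flag the first row of each participant.
SORT CASES BY id.
COMPUTE first_row = ($CASENUM = 1 OR id <> LAG(id)).

* Chart one row per participant without changing the working file.
TEMPORARY.
SELECT IF (first_row = 1).
GRAPH
  /BAR(SIMPLE)=MEAN(score_mean) BY condition
  /INTERVAL SE(2.0).
```

Because the selection is TEMPORARY, it applies only to the GRAPH command, and the full long-form file remains available afterwards.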
Best
diana


David Marso

Re: Plotting Graphs & Visualizations when data is in "long form"

Administrator

In reply to this post by joseph.williams

I suspect any answer really depends on what you want to graph.
"But I now have a "long form" very large data set where one subject's data spans a 1000+ lines."
"Very large" is a relative concept. 1 million rows of data is modest by some standards; some people work with SPSS on over 100,000,000 rows. In your case you will likely want to AGGREGATE the data. The specifics of the aggregation depend entirely on your goals.


Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"

Maguin, Eugene

Re: Plotting Graphs & Visualizations when data is in "long form"

In reply to this post by joseph.williams


Jon K Peck

Re: Plotting Graphs & Visualizations when data is in "long form"

The GPL syntax reference is installed with other Help materials. It's in the Reference section of the help along with the CSR and other materials. In Statistics 20, the GPL reference is listed with the CSR and option help.

Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
phone: 720-342-5621

From: "Maguin, Eugene" <[hidden email]>
To: [hidden email],
Date: 11/26/2013 06:56 AM
Subject: Re: [SPSSX-L] Plotting Graphs & Visualizations when data is in "long form"
Sent by: "SPSSX(r) Discussion" <[hidden email]>

In addition to what others have said about AGGREGATE, which sounds like the immediate, practical solution, I urge you to look at the GPL syntax reference. It is another FM, although not quite as fine as the command syntax reference, and it should have been installed along with everything else. There are numerous examples with syntax. However, GPL involves a conceptual and linguistic shift that takes time to get used to.

Gene Maguin


Andy W

Re: Plotting Graphs & Visualizations when data is in "long form"

In reply to this post by joseph.williams

It helps to be more specific (both you and Diana) about what you want. Below are a bunch of examples of different "caterpillar" charts of means and standard errors for a set of fake data with 100 people, 10 waves, with half in an experimental condition and a random categorical grouping variable.

The advice about having SPSS generate the necessary dataset of your specific summaries (through aggregate or split file and regression or OMS or whatever) generically extends to any situation. If you are more specific about what you want the end result to be, it is easier to give advice about how to go about it.

************************************************************.
set seed 10.
input program.
loop #j = 1 to 100.
compute #cate = TRUNC(RV.UNIFORM(1,11)).
loop #i = 1 to 10.
compute pers = #j.
compute wave = #i.
compute exp = (#j > 50).
compute cate = #cate.
end case.
end loop.
end loop.
end file.
end input program.
dataset name long.
*outcome variable.
compute out = RV.NORMAL(pers/10,wave) + RV.UNIFORM(-5,5) + 5*exp*(wave-6)*RV.NORMAL(0,1) - cate.
variable levels wave (ordinal)
exp (nominal)
pers (nominal)
cate (nominal)
out (scale).
formats pers wave exp cate (F2.0) out (F2.0).
value label exp
0 'Control'
1 'Treatment'.
exe.

*Standard Error Graph of out by person - generated this through GUI.
GGRAPH
/GRAPHDATASET NAME="graphdataset" VARIABLES=pers MEANSE(out, 2)[name=
"MEANSE_out_2" LOW="MEANSE_out_2_LOW" HIGH="MEANSE_out_2_HIGH"] MISSING=
LISTWISE REPORTMISSING=NO
/GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
SOURCE: s=userSource(id("graphdataset"))
DATA: pers=col(source(s), name("pers"), unit.category())
DATA: MEAN_out=col(source(s), name("MEANSE_out_2"))
DATA: LOW=col(source(s), name("MEANSE_out_2_LOW"))
DATA: HIGH=col(source(s), name("MEANSE_out_2_HIGH"))
GUIDE: axis(dim(1), label("pers"))
GUIDE: axis(dim(2), label("Mean out"))
GUIDE: text.footnote(label("Error Bars: +/- 2 SE"))
SCALE: cat(dim(1))
SCALE: linear(dim(2), include(0))
ELEMENT: point(position(pers*MEAN_out))
ELEMENT: interval(position(region.spread.range(pers*(LOW+HIGH))),
shape.interior(shape.ibeam))
END GPL.

*Standard Error Graph of out by wave - generated through GUI.
GGRAPH
/GRAPHDATASET NAME="graphdataset" VARIABLES=wave MEANSE(out, 2)[name=
"MEANSE_out_2" LOW="MEANSE_out_2_LOW" HIGH="MEANSE_out_2_HIGH"] MISSING=
LISTWISE REPORTMISSING=NO
/GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
SOURCE: s=userSource(id("graphdataset"))
DATA: wave=col(source(s), name("wave"), unit.category())
DATA: MEAN_out=col(source(s), name("MEANSE_out_2"))
DATA: LOW=col(source(s), name("MEANSE_out_2_LOW"))
DATA: HIGH=col(source(s), name("MEANSE_out_2_HIGH"))
GUIDE: axis(dim(1), label("wave"))
GUIDE: axis(dim(2), label("Mean out"))
GUIDE: text.footnote(label("Error Bars: +/- 2 SE"))
SCALE: cat(dim(1))
SCALE: linear(dim(2), include(0))
ELEMENT: point(position(wave*MEAN_out))
ELEMENT: interval(position(region.spread.range(wave*(LOW+HIGH))),
shape.interior(shape.ibeam))
END GPL.

*SE graph of out by person with categorical group colored and experimental groups in different panels.
*Added false person variable for the graphs (location on the x axis is arbitrary) and then manually added in color statement for element.
*Took out point because bar is symmetric - also took out x axis label since it is arbitrary.
compute pers2 = pers - (50*exp).
GGRAPH
/GRAPHDATASET NAME="graphdataset" VARIABLES=pers2[LEVEL=NOMINAL] MEANSE(
out, 2)[name="MEANSE_out_2" LOW="MEANSE_out_2_LOW" HIGH=
"MEANSE_out_2_HIGH"] exp cate
/GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
SOURCE: s=userSource(id("graphdataset"))
DATA: pers2=col(source(s), name("pers2"), unit.category())
DATA: MEAN_out=col(source(s), name("MEANSE_out_2"))
DATA: LOW=col(source(s), name("MEANSE_out_2_LOW"))
DATA: HIGH=col(source(s), name("MEANSE_out_2_HIGH"))
DATA: exp=col(source(s), name("exp"), unit.category())
DATA: cate=col(source(s), name("cate"), unit.category())
GUIDE: axis(dim(1), null())
GUIDE: axis(dim(2), label("Mean out"))
GUIDE: axis(dim(4), label("exp"), opposite())
GUIDE: text.footnote(label("Error Bars: +/- 2 SE"))
SCALE: cat(dim(1))
SCALE: linear(dim(2), include(0))
SCALE: cat(dim(4), include("0", "1"))
ELEMENT: interval(position(region.spread.range(pers2*(LOW+HIGH)*1*exp)),
shape.interior(shape.ibeam), color(cate))
END GPL.

*Lets do some more panels by group and by treatment.
*Again making a false person variable for location on the x axis.
sort cases by cate exp pers wave.
DO IF $casenum = 1 or cate <> lag(cate) or exp <> lag(exp).
COMPUTE pers3 = 1.
ELSE IF cate = LAG(cate) AND exp = LAG(exp) AND pers = LAG(pers).
COMPUTE pers3 = lag(pers3).
ELSE.
COMPUTE pers3 = lag(pers3) + 1.
END IF.

*Only change to prior graph is added cate to faceting structure in the element statement and used pers3 instead of pers2 & took out axis label for outcome.
GGRAPH
/GRAPHDATASET NAME="graphdataset" VARIABLES=pers3[LEVEL=NOMINAL] MEANSE(
out, 2)[name="MEANSE_out_2" LOW="MEANSE_out_2_LOW" HIGH=
"MEANSE_out_2_HIGH"] exp cate
/GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
SOURCE: s=userSource(id("graphdataset"))
DATA: pers3=col(source(s), name("pers3"), unit.category())
DATA: MEAN_out=col(source(s), name("MEANSE_out_2"))
DATA: LOW=col(source(s), name("MEANSE_out_2_LOW"))
DATA: HIGH=col(source(s), name("MEANSE_out_2_HIGH"))
DATA: exp=col(source(s), name("exp"), unit.category())
DATA: cate=col(source(s), name("cate"), unit.category())
GUIDE: axis(dim(1), null(), label("Person"))
GUIDE: axis(dim(2))
GUIDE: axis(dim(3), opposite())
GUIDE: axis(dim(4), opposite())
GUIDE: text.footnote(label("Error Bars: +/- 2 SE for persons"))
SCALE: cat(dim(1))
SCALE: linear(dim(2))
ELEMENT: interval(position(region.spread.range(pers3*(LOW+HIGH)*exp*cate)),
shape.interior(shape.ibeam))
END GPL.
*This shows group 3 is confounded - no individuals in the control panel.
************************************************************.

Andy W
apwheele@gmail.com
http://andrewpwheeler.wordpress.com/

Jon K Peck

Re: Plotting Graphs & Visualizations when data is in "long form"

In reply to this post by Frans Marcelissen-5

I'm not sure that that is the plot anyone would want, but it is simple to do it in native Statistics code.
compute means = mean(V1 to V10).
GRAPH /BAR(SIMPLE)=VALUE(means).

Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
phone: 720-342-5621

From: Frans Marcelissen <[hidden email]>
To: [hidden email],
Date: 11/26/2013 07:20 AM
Subject: Re: [SPSSX-L] Plotting Graphs & Visualizations when data is in "long form"
Sent by: "SPSSX(r) Discussion" <[hidden email]>

Hi Joseph,
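This can very easily be done using the R plugin. R is much better at data manipulation and graphing than basic SPSS:

begin program r.
# Read the active dataset into an R data frame.
vars <- spssdata.GetDataFromSPSS()
# One horizontal bar per row: the row mean of every column except the first,
# labelled with the values of the first column.
barplot(
   rowMeans(vars[-1], na.rm = TRUE),
   names.arg = vars[[1]],
   horiz = TRUE)
end program.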
