Cumulative Probability Plots


Cumulative Probability Plots

ClintR
Dear SPSSX Members, I was wondering if you would recommend SPSSX as a tool for cumulative probability plots? I'm after a software package that will let me plot the cumulative probability percentage of a given variable from a large data set against the value of the same variable. This plot lets me rapidly visualise distinct populations within large sets of geochemical data. I'm hoping to be able to tag each data point with an id so that I can cross-reference the point in the plot with the database. Does SPSSX have the flexibility to do this work, or am I stuck with Excel? Kind regards, Clint

Re: Cumulative Probability Plots

Maguin, Eugene
Clint,
 
Yes, SPSS (now at version 19, and no longer 'SPSS-X') will do cumulative probability plots. One option is the Examine command, although this command does not support tagging, so far as I know. The other option is constructing graphs with the GPL facility. I have used it, but not for your type of plot. GPL is a graphing language and seems to be extremely flexible and powerful. The downside is that a graph is constructed by writing a set of commands. I'm sure others on the list who are more familiar with GPL can give a fuller comment.
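
For reference, a normal probability plot from Examine can be requested with syntax along these lines. This is only a minimal sketch: the variable name salary is a placeholder for your own variable, and the plot it produces carries no per-case labels.

```spss
* Normal probability plot (plus the detrended version) for one variable.
* "salary" is a placeholder name, not a variable from the original question.
EXAMINE VARIABLES=salary
  /PLOT NPPLOT
  /STATISTICS NONE
  /MISSING LISTWISE.
```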
 
At the same time, there are other programs on the market that seem to be primarily graphing tools. Two I've seen promotional material for are JMP and DataDesk. But given what you are working with, I would guess there are also more content-focused graphing/data programs on the market.
 
Gene Maguin


From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of ClintR
Sent: Thursday, June 23, 2011 12:47 AM
To: [hidden email]
Subject: Cumulative Probability Plots

Sent from the SPSSX Discussion mailing list archive at Nabble.com.

Re: Cumulative Probability Plots

Jon K Peck
In reply to this post by ClintR
Cumulative distributions are often most usefully viewed as a Q-Q plot against either a theoretical distribution or another variable or dataset. Q-Q plots against theoretical distributions are built into SPSS Statistics, and Q-Q plots against empirical distributions can be obtained via the SPSSINC QQPLOT2 extension command, available from the SPSS Community website (www.ibm.com/developerworks/spssdevcentral).
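
For the built-in case, a Q-Q plot against a theoretical distribution can be requested with the PPLOT command. The following is a minimal sketch that assumes a normal reference distribution and uses salary, the example variable from the GGRAPH code in this thread; the FRACTION and TIES settings are common choices rather than requirements, so adjust them and the distribution to suit your data.

```spss
* Q-Q plot of one variable against a normal distribution.
* DIST=NORMAL is an assumption made for illustration.
PPLOT
  /VARIABLES=salary
  /NOLOG
  /NOSTANDARDIZE
  /TYPE=Q-Q
  /FRACTION=BLOM
  /TIES=MEAN
  /DIST=NORMAL.
```

Changing TYPE=Q-Q to TYPE=P-P plots cumulative proportions instead of quantiles.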

If you want a cumulative histogram, you can use GGRAPH.  The easiest way to do this is to paste a histogram specification from the Chart Builder and then slightly modify the generated GPL code.  Here is an example.

GGRAPH
  /GRAPHDATASET NAME="graphdataset" VARIABLES=salary MISSING=LISTWISE REPORTMISSING=NO
  /GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
  SOURCE: s=userSource(id("graphdataset"))
  DATA: salary=col(source(s), name("salary"))
  GUIDE: axis(dim(1), label("Current Salary"))
  GUIDE: axis(dim(2), label("Cumulative Frequency"))
  ELEMENT: area(position(summary.count.cumulative(bin.rect(salary, binWidth(100)))), missing.wings())
END GPL.

The changes from the pasted code are to change summary.count to summary.count.cumulative and to add the binWidth function to the bin.rect specification.  I set the bin width to 100 here, but setting the width is optional.

HTH,

Jon Peck
Senior Software Engineer, IBM
[hidden email]
new phone: 720-342-5621





Re: Cumulative Probability Plots

Jon K Peck
In reply to this post by ClintR
Or, another variation without any aggregation.

GGRAPH
  /GRAPHDATASET NAME="graphdataset" VARIABLES=salary id MISSING=LISTWISE REPORTMISSING=NO
  /GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
  SOURCE: s=userSource(id("graphdataset"))
  DATA: salary=col(source(s), name("salary"))
  DATA: id=col(source(s), name("id"))
  GUIDE: axis(dim(1), label("Current Salary"))
  GUIDE: axis(dim(2), label("Cumulative Frequency"))
  ELEMENT: point(position(summary.count.cumulative(salary)))
END GPL.

Jon Peck
Senior Software Engineer, IBM
[hidden email]
new phone: 720-342-5621





Re: Cumulative Probability Plots

Jon K Peck
In reply to this post by ClintR
One more version, this time with a case id label.

GGRAPH
  /GRAPHDATASET NAME="graphdataset" VARIABLES=salary id MISSING=LISTWISE REPORTMISSING=NO
  /GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
  SOURCE: s=userSource(id("graphdataset"))
  DATA: salary=col(source(s), name("salary"))
  DATA: id=col(source(s), name("id"))
  GUIDE: axis(dim(1), label("Current Salary"))
  GUIDE: axis(dim(2), label("Cumulative Frequency"))
  ELEMENT: point(position(summary.count.cumulative(salary)), label(summary.first(id)))
END GPL.
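
If the vertical axis should show cumulative percent rather than a cumulative count, one possible approach is to compute the percentage with RANK first and then draw a labelled scatterplot of the raw cases. This is a sketch under assumptions, not a definitive recipe: cumpct is a made-up variable name, and salary and id are the same example variables used above.

```spss
* Cumulative percent for each case: rank / valid N * 100.
* TIES=HIGH gives tied values the highest rank, so cumpct is the
* percentage of cases at or below that value.
* "cumpct" is a hypothetical name chosen for this example.
RANK VARIABLES=salary (A)
  /PERCENT INTO cumpct
  /TIES=HIGH
  /PRINT=NO.

* Scatterplot of value against cumulative percent, one point per case,
* with each point labelled by its id.
GGRAPH
  /GRAPHDATASET NAME="graphdataset" VARIABLES=salary cumpct id MISSING=LISTWISE REPORTMISSING=NO
  /GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
  SOURCE: s=userSource(id("graphdataset"))
  DATA: salary=col(source(s), name("salary"))
  DATA: cumpct=col(source(s), name("cumpct"))
  DATA: id=col(source(s), name("id"))
  GUIDE: axis(dim(1), label("Current Salary"))
  GUIDE: axis(dim(2), label("Cumulative Percent"))
  ELEMENT: point(position(salary*cumpct), label(id))
END GPL.
```

With a large data set the labels will overlap heavily, so it may be more practical to label only a filtered subset of cases.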

Jon Peck
Senior Software Engineer, IBM
[hidden email]
new phone: 720-342-5621




From:        Jon K Peck/Chicago/IBM
To:        ClintR <[hidden email]>
Cc:        [hidden email]
Date:        06/23/2011 09:55 AM
Subject:        Re: [SPSSX-L] Cumulative Probability Plots


