need help creating violin plots


Cleland, Patricia (EDU)

I have data on the % of clients who met a criterion (format F8.2) for approximately 5000 clinics. The clinics are nested in about 100 Agencies, and the Agencies are nested in 7 Regions. Each Clinic belongs to only 1 Agency and each Agency is a member of only 1 Region.

Currently the Clinic and Agency IDs are strings, but they could easily be changed to numeric values if that makes any difference.

To show graphically the variation in % of clients who meet a criterion both within and among Agencies, I want to produce violin plots separately for each Region: 7 separate charts, each showing the data for every Agency in that Region.

I have some syntax from a colleague for producing violin plots, but it is based on using R as a stand-alone product rather than as an extension of SPSS. Since I'm a newbie at R, I don't know what to modify in the syntax for use as an SPSS extension.

Any help would be appreciated. Any suggestions for learning R, especially as an SPSS extension, would also be appreciated.

Thanks.

Pat


Re: need help creating violin plots

ViAnn Beadle

What is a violin plot? Any examples online?

 


Re: need help creating violin plots

Cleland, Patricia (EDU)

A violin plot is a combination of a boxplot and a kernel density plot. Violin plots are essentially box plots in which the width at each value is set by the local density of the data. For skewed distributions, the shapes look a bit like violins, hence the name.
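To make that description concrete, here is a minimal base-R sketch (no add-on packages) of how a violin can be built: estimate the kernel density, mirror it around a centre line, and overlay the quartiles. The helper name draw_violin and the sample scores are made up for illustration; dedicated packages handle scaling, grouping, and bandwidth more carefully.

```r
# Minimal sketch: build one violin from a kernel density estimate (base R only).
# draw_violin() is a hypothetical helper, not part of any package.
draw_violin <- function(x, main = "Violin plot (sketch)") {
  x <- x[!is.na(x)]
  d <- density(x)                     # kernel density estimate
  half <- d$y / max(d$y) * 0.4        # half-width, scaled to at most 0.4
  plot(NA, xlim = c(0.5, 1.5), ylim = range(d$x),
       xaxt = "n", xlab = "", ylab = "Value", main = main)
  # Mirror the density around x = 1 to get the violin outline.
  polygon(c(1 - half, rev(1 + half)), c(d$x, rev(d$x)), col = "grey85")
  q <- quantile(x, c(0.25, 0.5, 0.75))
  segments(1, q[1], 1, q[3], lwd = 4)       # interquartile range (the "box")
  points(1, q[2], pch = 21, bg = "white")   # median
  invisible(d)
}

scores <- c(45, 56, 59, 59, 67, 74, 78, 78, 80, 81, 82, 82, 83)  # made-up data
dens <- draw_violin(scores)
```

The wide parts of the outline mark values where many observations fall; the thick bar and the dot carry the usual boxplot information.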

 

Here are links to some examples:

http://en.wikipedia.org/wiki/Violin_plot
http://www.statmethods.net/graphs/boxplot.html
http://www.r-bloggers.com/example-8-11-violin-plots/
http://www2.warwick.ac.uk/fac/sci/moac/degrees/modules/ch923/r_introduction/boxplot/

Pat

--------------------------------
Patricia Cleland, OCT
Senior Statistical and Research Analyst
Learning Environment Branch
Ministry of Education

15th Floor, Mowat Block
900 Bay Street
Toronto, Ontario
M7A 1L2

phone: 416-325-2697
fax:     416-325-4344

email: [hidden email]



Re: need help creating violin plots

Thomas MacFarland
In reply to this post by Cleland, Patricia (EDU)

Everyone:

Attached is an R script showing how to generate a violin plot.

Violin plots are certainly interesting, but too often they are not understood by the typical reader, at least not in higher education.

Let me know if you wish to receive the graphical images generated by this script. I'll shy away from including them in this message, but I'll be glad to send them separately if you do not use R.

Best wishes.

Tom

 

Exam_Score <- c(100, 98, 97, 56, 78, 86, 45,
                 93, 59, 74, 82, 96, 91, 86,
                 59, 67, 83, 85, 81, 80, 78,
                 82, 95, 88, 95)

summary(Exam_Score)

par(ask=TRUE)      # Freeze the screen.
boxplot(Exam_Score,
   horizontal=TRUE)

# You need to use the external violinmplot package and then
# the violinmplot() function found in this external package.

install.packages("violinmplot")
library(violinmplot)

# Note:  It is good R practice to use
# package_name:::function_name() syntax
# when using a function from an external
# package, for future documentation
# purposes.  This example is a bit different
# since the function name is violinmplot
# and this function shares the same name
# used for the package.

# However, an oddity of the violinmplot
# package is that it does not use a
# namespace so you only key the function
# name and not the package name.

par(ask=TRUE)      # Freeze the screen.
violinmplot(Exam_Score,
   main="Violin Plot of Exam Scores")

# Dr. Thomas W. MacFarland
# [hidden email]
# Feb-09-11

-----

Thomas W. MacFarland, Ed.D.
Senior Research Associate; Institutional Effectiveness and Associate Professor
Nova Southeastern University
Voice 954-262-5395  Fax 954-262-3970  [hidden email]

 



Re: need help creating violin plots

Jon K Peck
In reply to this post by Cleland, Patricia (EDU)
It is very easy to run an R program inside Statistics, with the output appearing automatically in the Viewer. Here is an example, assuming that you have installed the R violinmplot package. I've assumed that you use the regular SPSS techniques to select the cases in a region. criterion is the percentage variable and agency is, well, the agency. Be careful to match the case of the actual variable names and of the functions and parameters below, since everything in R is case sensitive.

begin program r.
library(violinmplot)
dta= spssdata.GetDataFromSPSS("criterion agency", missingValueToNA=TRUE)
violinmplot(criterion~agency, data=dta, horizontal=FALSE)
end program.
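
If you would rather not select each region by hand, one option is to loop over the regions in R. The following is only a sketch under stated assumptions: it swaps in lattice's bwplot() with panel.violin (lattice ships with R) instead of violinmplot, it assumes a third variable named region, and plot_region_violins is a made-up helper name. Simulated data stands in for the SPSS dataset so the sketch runs anywhere.

```r
# Sketch: one violin chart per region, one violin per agency.
# Uses lattice rather than violinmplot; the column names
# criterion, agency and region are assumptions.
library(lattice)

plot_region_violins <- function(dta) {
  dta <- dta[complete.cases(dta[, c("criterion", "agency", "region")]), ]
  dta$agency <- trimws(as.character(dta$agency))  # SPSS strings may be padded
  dta$region <- trimws(as.character(dta$region))
  regions <- sort(unique(dta$region))
  for (r in regions) {
    sub <- dta[dta$region == r, ]
    sub$agency <- factor(sub$agency)   # keep only this region's agencies
    p <- bwplot(criterion ~ agency, data = sub, panel = panel.violin,
                main = paste("Region", r), ylab = "% meeting criterion",
                scales = list(x = list(rot = 90)))
    print(p)   # lattice plots must be printed explicitly inside a loop
  }
  invisible(regions)
}

# Inside BEGIN PROGRAM R the data would come from SPSS, for example:
#   dta <- spssdata.GetDataFromSPSS("criterion agency region",
#                                   missingValueToNA = TRUE)
# Simulated stand-in data (hypothetical values):
set.seed(42)
dta <- data.frame(
  region    = rep(c("North", "South"), each = 60),
  agency    = rep(c("A01", "A02", "A03", "B01", "B02", "B03"), each = 20),
  criterion = round(runif(120, 0, 100), 2)
)
done <- plot_region_violins(dta)
```

Each pass through the loop produces one chart, so seven regions give seven charts, each with one violin per agency in that region.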

There are some chapters on using R in Statistics in the Programming and Data Management book, downloadable from the SPSS Community (www.ibm.com/developerworks/spssdevcentral), that could help you get started.

HTH,

Jon Peck
Senior Software Engineer, IBM
[hidden email]
312-651-3435





Re: need help creating violin plots

Albert-Jan Roskam
Hi Patricia,

I can recommend "R for SAS and SPSS Users" by Robert A. Muenchen (Springer). You can look up SPSS keywords and see the associated R script. www.statmethods.net is also a good site. There are many, many more PDFs about learning R out there. Be careful not to drown in all this information. Much of it is very, VERY esoteric.

Also, if you're going to use external libraries, be aware that many operations can be done in at least 5 ways. I'd try to stick with just one, even if it takes a couple of milliseconds longer to calculate the solution. I mainly use the following packages: Hmisc, foreign, ggplot2, RODBC, stringr, R.utils, plyr, reshape.

Cheers!!
Albert-Jan


~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
All right, but apart from the sanitation, the medicine, education, wine, public order, irrigation, roads, a fresh water system, and public health, what have the Romans ever done for us?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~




Re: need help creating violin plots

Albert-Jan Roskam
In reply to this post by Thomas MacFarland
Hello,

Re: your remark about package_name:::function_name()

I always use double, not triple colons. This also works in your example:
violinmplot::violinmplot(Exam_Score, main="Violin Plot of Exam Scores")

Triple colons are used to access objects that a package does not export, that is, internals that are not designed to be accessed directly. See also:
http://stat.ethz.ch/R-manual/R-devel/library/base/html/ns-dblcolon.html
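
A small sketch of the difference, using only the stats package that ships with R. The choice of median and t.test.default as examples is mine, not something from the posts above.

```r
# :: reaches objects a package exports; ::: also reaches internal ones.
exported <- getNamespaceExports("stats")

"median" %in% exported           # median is part of the public interface
"t.test.default" %in% exported   # the S3 method is registered, not exported

stats::median(c(3, 1, 2))               # recommended form for public functions
internal <- stats:::t.test.default      # works, but relies on package internals
is.function(internal)
```

Code that depends on ::: can break when a package changes its internals, which is why the double colon is the safer habit.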
 
Cheers!!
Albert-Jan
