|
Does anyone know of any SPSS syntax that will generate a specified number of data pairs possessing a specified correlation coefficient? In other words, I want to be able to generate n cases comprised of two variables, X and Y, in which the X and Y pairs have a Pearson product moment correlation coefficient equal to a specified value, r. I want to be able to change the specified n and r values to generate samples of different numbers of pairs having different correlation coefficients. Any suggestions will be greatly appreciated. |
|
Hi Jim
> Does anyone know of any SPSS syntax that will generate a specified number
> of data pairs possessing a specified correlation coefficient?
>
> In other words, I want to be able to generate n number of cases
> comprised of two variables, X and Y, in which the X and Y pairs have a
> Pearson product moment correlation coefficient equal to a specified
> value, r.
>
> I want to be able to change the specified n and r values to generate
> samples of different numbers of pairs having different correlation
> coefficients.

SPSS Knowledgebase has a solution for that:
http://support.spss.com/Tech/Troubleshooting/ressearchdetail.asp?ID=24615

(*Resolution number*: 24615 *Created on:* Apr 5 2002 *Last Reviewed on:* Mar 26 2007
*Problem Subject*: Generating 2 variables with specified correlation
*Problem Description*: I want to generate 2 normal random variables, Y1 and Y2, that have a correlation of exactly .75. How can I do this in SPSS?....)

Regards,
Marta |
|
In reply to this post by Jim Moffitt
Hi Jim:
This is a complete solution to your question:

"I want to be able to generate n number of cases comprised of two variables, X and Y, in which the X and Y pairs have a Pearson product moment correlation coefficient equal to a specified value, r. I want to be able to change the specified n and r values to generate samples of different numbers of pairs having different correlation coefficients."

DEFINE TWOCORRDATA(r=!TOKENS(1) /n=!TOKENS(1)).
INPUT PROGRAM.
LOOP id = 1 to !n.
. COMPUTE r1 = NORMAL(1).
. COMPUTE r2 = NORMAL(1).
. END CASE.
END LOOP.
END FILE.
END INPUT PROGRAM.
PRESERVE.
SET ERRORS=NONE RESULTS=NONE.
FACTOR
 /VARIABLES r1 r2
 /CRITERIA FACTORS(2)
 /EXTRACTION PC
 /ROTATION NOROTATE
 /SAVE REG(ALL orth) .
RESTORE.
COMPUTE y1 = orth1.
COMPUTE y2 = !r*orth1 + SQRT(1-!r**2)*orth2.
EXE.
DELETE VARIABLES id r1 r2 orth1 orth2.
!ENDDEFINE.

* MACRO call (with n=10 & r=0.7) *.
TWOCORRDATA r=0.7 n=10.

Regards,
Marta |
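A short check of why the last two COMPUTE statements should give a sample correlation of exactly r. It assumes that the saved scores orth1 and orth2 have sample mean 0, sample variance 1 and sample covariance 0, which is what regression-method scores from a principal components extraction that keeps all components are expected to provide. The symbols o_1 and o_2 are shorthand introduced here for orth1 and orth2; all variances and covariances are sample quantities.

```latex
\begin{aligned}
y_1 &= o_1, \qquad y_2 = r\,o_1 + \sqrt{1-r^2}\,o_2,\\
\operatorname{Var}(y_1) &= \operatorname{Var}(o_1) = 1,\\
\operatorname{Var}(y_2) &= r^2\operatorname{Var}(o_1) + (1-r^2)\operatorname{Var}(o_2)
      + 2r\sqrt{1-r^2}\,\operatorname{Cov}(o_1,o_2) = r^2 + (1-r^2) = 1,\\
\operatorname{Cov}(y_1,y_2) &= r\operatorname{Var}(o_1) + \sqrt{1-r^2}\,\operatorname{Cov}(o_1,o_2) = r,\\
\operatorname{Corr}(y_1,y_2) &= \frac{\operatorname{Cov}(y_1,y_2)}{\sqrt{\operatorname{Var}(y_1)\operatorname{Var}(y_2)}} = r .
\end{aligned}
```

The cross term vanishes only because the covariance of o_1 and o_2 is exactly zero in the sample, which is the reason for the FACTOR step.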
|
Marta,
What is the purpose of putting r1 and r2 through the factor procedure and then using the factor scores to compute the correlation? I would have expected that you'd do the correlation operation as part of the data creation step.

Gene Maguin |
|
Hi
Read resolution 24615 for technical details (use Guest as both user name and password to log in, BTW).
http://support.spss.com/Tech/Troubleshooting/ressearchdetail.asp?ID=24615

> What is the purpose of putting r1 and r2 through the factor procedure and
> then using the factor scores to compute the correlation? I would have
> expected that you'd do the correlation operation as part of the data
> creation step.

Regards,
Marta |
|
In reply to this post by Maguin, Eugene
In order to wind up with a correlation that is exactly equal to the one desired, one has to start out with variables that correlate at exactly zero. The factor analysis necessarily produces orthogonal factors, so one wants to use the factor scores instead of the two variables of random values (which will correlate at nearly, but not exactly, zero).
- James Cantor

> -----Original Message-----
> From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Gene Maguin
> Sent: Thursday, September 13, 2007 12:34 PM
> To: [hidden email]
> Subject: Re: Generating Samples of Variables Having a Specified r Value.
>
> Marta,
>
> What is the purpose of putting r1 and r2 through the factor procedure and
> then using the factor scores to compute the correlation? I would have
> expected that you'd do the correlation operation as part of the data
> creation step.
>
> Gene Maguin |
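To make the point about exact orthogonality concrete, here is a minimal sketch in Python (not SPSS syntax) of the same logic. It is an illustration under stated assumptions, not a port of the macro: the function names are made up for this example, and the FACTOR step is replaced by the closed form that applies to exactly two standardized variables, whose principal components are proportional to their sum and their difference. It needs n of at least 3.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def standardize(v):
    """Rescale to sample mean 0 and sample variance 1 (n - 1 denominator)."""
    n = len(v)
    m = sum(v) / n
    sd = math.sqrt(sum((a - m) ** 2 for a in v) / (n - 1))
    return [(a - m) / sd for a in v]


def two_corr_data(r, n, seed=None):
    """Return (y1, y2) whose sample correlation equals r up to rounding error."""
    rng = random.Random(seed)
    z1 = standardize([rng.gauss(0, 1) for _ in range(n)])
    z2 = standardize([rng.gauss(0, 1) for _ in range(n)])
    # Assumption for this sketch: with two standardized variables, the sum
    # and the difference are exactly uncorrelated in the sample, so they
    # stand in for the two unrotated principal component scores.
    o1 = standardize([a + b for a, b in zip(z1, z2)])
    o2 = standardize([a - b for a, b in zip(z1, z2)])
    k = math.sqrt(1 - r ** 2)
    y1 = o1
    y2 = [r * a + k * b for a, b in zip(o1, o2)]
    return y1, y2


y1, y2 = two_corr_data(r=0.7, n=10, seed=1)
print(pearson(y1, y2))
```

Skipping the orthogonalizing step and building y2 directly from z1 and z2 would give a sample correlation near the target but not equal to it, which is the distinction drawn above.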
|
In reply to this post by Marta Garcia-Granero
Marta:
Thank you very much. This is exactly what I need.

Jim |
|
In reply to this post by Jim Moffitt
A side comment:
At 11:43 AM 9/13/2007, Jim Moffitt wrote:

>>Does anyone know of any SPSS syntax that will generate a specified
>>number of data pairs possessing a specified correlation coefficient?

At 12:15 PM 9/13/2007, Marta Garcia-Granero responded, giving a complete solution that I won't quote here. Briefly, its logic is to calculate two (it could be more) probabilistically independent random variables; convert them using FACTOR to get two random variables uncorrelated *in the sample*; and then transform those to get the desired correlation.

(I wouldn't have thought of FACTOR. You can do the same thing with REGRESSION, since the regression residuals are uncorrelated with all independent variables; but to get n uncorrelated variables, REGRESSION must be run n-1 times.)

So, that's one excellent approach. Here's another approach that may be preferable for some simulations: transform the original random variables without first reducing them to zero correlation in the sample, by FACTOR or any other method. That gives two variables with the desired correlation *in the population*, with the observed correlation in the sample a random variable around this value. As I say, this may be more representative of real-world instances.

Oh, talking correlations *in the population*, when k pairs have been generated and there's no other comparable data? Well, I advocate, and take, the view that the population is all instances that *could* be generated with the same underlying method. (This makes the population infinite. I've no objection to that.) |
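The second approach can be sketched as follows, again in Python rather than SPSS syntax and only as an illustration. The function names and the seed are invented for this example. The same linear transformation is applied straight to the independent normal draws, so r is the correlation in the population and each sample's observed correlation scatters around it.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def population_corr_data(r, n, rng):
    """Draw n pairs from a bivariate normal with population correlation r."""
    k = math.sqrt(1 - r ** 2)
    xs, ys = [], []
    for _ in range(n):
        e1, e2 = rng.gauss(0, 1), rng.gauss(0, 1)
        xs.append(e1)
        # No orthogonalizing step: e1 and e2 are independent in the
        # population but only approximately uncorrelated in any one sample.
        ys.append(r * e1 + k * e2)
    return xs, ys


rng = random.Random(2007)

# Small samples: the observed r differs from sample to sample.
small = [pearson(*population_corr_data(0.7, 10, rng)) for _ in range(5)]
print([round(v, 3) for v in small])

# A very large sample should land close to the population value.
big = pearson(*population_corr_data(0.7, 200_000, rng))
print(round(big, 3))
```

In terms of the macro posted earlier in the thread, this corresponds to dropping the FACTOR step and computing y1 and y2 from r1 and r2 directly.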
