SPSSX Discussion

Distribution function question

Classic

List

Threaded

3 messages Options

Maguin, Eugene

Distribution function question

All,

I'm doing a little data simulation project and have discovered that spss now
has a bivariate normal function having a specifiable correlation. My
question is about what is returned from a call to this function. I'm
thinking that a single call should return two random variates whose
population correlation is the specified value. I think that because if two
variates were returned then the variables containing them could be
correlated. But the documentation doesn't say anything which leads me to
think that one variate is returned. If only one variate is returned per
call, then it seems to me that I couldn't create two variables having the
specified correlation.

Am I missing something? If so, what? Either way, it seems to me that the
documentation needs to be fixed for this function.

Gene Maguin

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

SPSS Support

Re: Distribution function question

Hi Gene,

The functions available for the BVNOR (bivariate normal) distribution are the CDF.BVNOR (cumulative distribution function) and PDF.BVNOR (probability density function).The RV.<dist> function, which generates a random variable with the specified distribution, is not available for the bivariate normal distribution.

CDF.BVNOR. CDF.BVNOR(quant1, quant2, corr). Numeric. Returns the cumulative probability that a value from the standard bivariate normal distribution, with the given correlation parameter, will be less than quant1 and quant2.

PDF.BVNOR. PDF.BVNOR(quant1, quant2, corr). Numeric. Returns the probability density of the standard bivariate normal distribution, with the given correlation parameter, at quant1, quant2.

Which documentation item were you consulting?

I've pasted a pair of resolutions below that show you how to generate bivariate normal and multivariate normal distributions with specified correlations or covariances.

David Matheson
SPSS Statistical Support

*****************************
Resolution number: 24615 Created on: Apr 5 2002 Last Reviewed on: Mar 26 2007

Problem Subject: Generating 2 variables with specified correlation

Problem Description: I want to generate 2 normal random variables, Y1 and Y2, that have a correlation of exactly .75. How can I do this in SPSS?

Resolution Description:
You can start by generating 2 standard normal variables that are perfectly orthogonal, i.e. have means of 0, variances of 1 and a correlation of 0. Suppose these variables are called ORTH1 and ORTH2. Y1 can be a linear transformation of ORTH1, i.e. multiply ORTH1 by the desired standard deviation (SD) for Y1 and then add the desired mean for Y1. Y2 is computed by adding weighted copies of ORTH1 and ORTH2 so that the variance of this sum equals 1. The variance of the sum of 2 variables, W and X, equals:

VARIANCE(W+X) = VARIANCE(W) + VARIANCE(X) + 2*COVARIANCE(W,X)

IF W = .75*ORTH1, then
VARIANCE(W) = VARIANCE(ORTH1)*(.75**2)= .75**2
** is the power operator, so .75**2 is .75 to the power of 2, or .75 squared.
Likewise, if X = SQRT(1-.75**2)*ORTH2, then
VARIANCE(X) = VARIANCE(ORTH2)*(1-.75**2) = (1-.75**2)

If ORTH1 and ORTH2 are uncorrelated, so are W and X. Therefore,
VARIANCE(W+X) = .75**2 + (1-.75**2) + 0 = 1
and .75**2 of the variance of W+X is accounted for by W. The correlation of W with W+X is therefore .75, as are the correlations of ORTH1 with W+X and Y1 with W+X. Y2 is therefore computed as a linear transformation of W+X to give the desired mean and SD.

The above method is implemented in SPSS commands as follows:
1. Generate 2 random variables, R1 and R2, from standard normal distributions with COMPUTE commands. Although they are drawn from Normal(0,1) distributions, R1 and R2 as generated in your sample are not guaranteed to have means of 0, SDs of 1.0, or a correlation of exactly 0. These sample statistics will approach those population values as the sample size increases.
2. To arrive at 2 orthogonal random variables with means and SDs of exactly 0 and 1.0, respectively, perform a principal components analysis of R1 and R2, saving two principal components as ORTH1 and ORTH2.
3. Compute Y1 as a linear transformation of ORTH1 and Y2 as a linear transformation of a weighted sum of ORTH1 and ORTH2, as described above.
The SPSS commands to implement these steps are as follows. The following commands assume that there is already an active data file, i.e. data have been read from a file or entered into the Data Editor. Y1 and Y2 are added as new variables in this data set.

COMPUTE r1 = NORMAL(1).
COMPUTE r2 = NORMAL(1).
FACTOR
/VARIABLES r1 r2
/CRITERIA FACTORS(2)
/EXTRACTION PC
/ROTATION NOROTATE
/SAVE REG(ALL orth) .
* Placing 'orth' after ALL in the FACTOR /SAVE subcommand
* requests that 'orth' be the stem of the component names.

* Y1 will have a mean of 50 and SD of 10.
* Y2 will have a mean of 100 and SD of 15 .
COMPUTE y1 = 10*orth1 + 50.
COMPUTE y2 = 15*(.75*orth1 + SQRT(1 - .75**2)*orth2) + 100.
* To check the means, SDs, correlation for Y1 and Y2 .
CORRELATIONS
/VARIABLES=y1 y2
/STATISTICS DESCRIPTIVES .

You can perform all of the above steps in the graphic user interface (GUI), i.e. the menus and dialog boxes. The COMPUTE facility is available from the Data Editor under Transform->Compute. The Factor procedure is available from any SPSS window under Analyze->Data Reduction->Factor. Principal components analysis is the default extraction method. To specify 2 components, click the Extraction button in the main Factor Analysis dialog box. Then click the radio button beside "Number of Factors" and enter 2 in the box to its right. To request that component scores be saved, click the Score button in the main Factor Analysis dialog. Then check the "Save as variables" box in the Factor Scores dialog. To control the name of the component variables, you will need to paste the Factor syntax and add the name stem to the /SAVE subcommand, as in
/SAVE REG(ALL orth)
above. If FACTOR is run from the GUI, the component scores will have names like FAC1_1 and FAC2_1. You could rename these in the Data Editor or use those default names in the final COMPUTE expressions for Y1 and Y2.
The CORRELATIONS procedure is available from any SPSS window from
Analyze->Correlate->Bivariate.

If there is no data file to which these variables are to be added, you can create the cases and initial variables in an INPUT PROGRAM to be run in a syntax window. The remaining steps could be run from either the syntax window or GUI. For example, the following commands generate 300 cases and variables Y1 and Y2 with the means, SDs, and correlation as described above. See the Syntax Reference Guide for your SPSS version for a discussion of Input Programs.

INPUT PROGRAM.
LOOP id = 1 to 300.
COMPUTE r1 = NORMAL(1).
COMPUTE r2 = NORMAL(1).
END CASE.
END LOOP.
END FILE.
END INPUT PROGRAM.
EXECUTE.

FACTOR
/VARIABLES r1 r2
/CRITERIA FACTORS(2) ITERATE(25)
/EXTRACTION PC
/ROTATION NOROTATE
/SAVE REG(ALL orth) .
* Placing 'orth' after ALL in the FACTOR /SAVE subcommand
* requests that 'orth' be the stem of the component names.

* Y1 will have a mean of 50 and SD of 10.
* Y2 will have a mean of 100 and SD of 15 .
COMPUTE y1 = 10*orth1 + 50.
COMPUTE y2 = 15*(.75*orth1 + SQRT(1 - .75**2)*orth2) + 100.
* To check the means, SDs, correlation for Y1 and Y2 .
CORRELATIONS
/VARIABLES=y1 y2
/STATISTICS DESCRIPTIVES .

See resolution 19178 ("Generating MVN data with a specific covariance matrix"), for help in generating 3 or more variables with a user-specified covariance matrix.

***************************
Resolution number: 19178 Created on: Dec 9 1996 Last Reviewed on: May 26 2009

Problem Subject: Generating MVN data with a specific covariance matrix

Problem Description: How can I generate data which are multivariate normal and have a covariance or correlation matrix that I specify?

Resolution Subject: Using Matrix Language to generate the variables with specified covariances/correlations

Resolution Description:
This job can be performed in SPSS (versions 4.0 and above) with the MATRIX command language of SPSS, which is part of the SPSS syntax command language. The general algorithm is described in

Rubinstein, R. (1981), "Simulation and the Monte Carlo Method", New York: Wiley.

The basic strategy is:

1. Before entering MATRIX, generate the required number of normally-distributed variables by using compute commands. These variables should be multivariate normal (MVN) with correlations approaching 0, i.e. approaching independence.

2. To compute a set of variables which are multivariate normal and strictly uncorrelated, run the FACTOR procedure with the variables created in Step 1. Use the default extraction method of principal components, extracting as many components as you included in the analysis. Save these components to a new set of variables with the FACTOR /SAVE subcommand. These new variables will be MVN and uncorrelated, with means of 0 and variances of 1.

3. Enter the MATRIX procedure and read the set of standard normal variables from Step 2 as a matrix, Z for example. Then define and enter the target covariance matrix, S.

4. Calculate the Cholesky factor for the target covariance matrix. The Cholesky factor is an upper triangular matrix which is the "square root" of the covariance matrix. If V is the Cholesky factor of matrix S, and V' is the transpose of V, then V'*V=S , where * is the matrix multiplication operator.

5. Post-multiply the uncorrelated standard normals, Z, by the Cholesky factor, V, to give a new data matrix, X. If the target matrix was a correlation matrix and you want the new variables to have a standard deviation other than 1, then multiply X by the desired SD. If you want the new variables to have a nonzero mean, add the desired mean to X (but not before multiplication by SD).

If the target covariance matrix is not a correlation matrix, then the new variables will have variances equal to their respective diagonal elements in the target matrix. Their means will be 0. The desired mean can be added to the variables with a compute statement, either before or after leaving MATRIX.

6. Save the X matrix to variables in the SPSS active file and leave MATRIX.

An example of the implementation of this algorithm is shown below. The target matrix is a correlation matrix for the five variables to be generated. I added some statements to print the determinant and the
condition number of the input covariance matrix to ensure that it was not singular or ill-conditioned. I also printed the product of V'*V to ensure that the target matrix was reproduced. All the new variables were scaled to have means of 100 and standard deviations of 15 before leaving the MATRIX language.

input program.
+ loop #i = 1 to 10000.
+ do repeat response=r1 to r5 .
+ compute response = normal(1) .
+ end repeat.
+ end case.
+ end loop.
+ end file.
end input program.

correlations r1 to r5 / statistics=descriptives.

* Factor procedure computes pr1 to pr5, which are standard MVN .
factor variables = r1 to r5 / print = default det
/criteria = factors(5) /save=reg (all,pr).

* use matrix to set corr matrix.
* x is a 10,000 by 5 matrix of uncorrelated standard normals .
* cor is the target covariance matrix.
* cho is the Cholesky factor of cor .
* newx is the 10,000 by 5 data matrix which has target covariance matrix .

matrix.
get x / variables=pr1 to pr5.
compute cor={1, 0.4, 0.3, 0.2, 0.1 ;
0.4, 1, 0.4, 0.3, 0.2 ;
0.3, 0.4, 1, 0.4, 0.3 ;
0.2, 0.3, 0.4, 1, 0.4 ;
0.1, 0.2, 0.3, 0.4, 1 }.
compute deter=det(cor).
print deter / title "determinant of corr matrix" / format=f10.7 .
print sval(cor) / title "singular value decomposition of corr".
print eval(cor) / title "eigenvalues of input corr".

* In a symmetric matrix sval and eigenvalues are identical - choose 1 .

compute condnum=mmax(sval(cor))/mmin(sval(cor)).
print condnum / title "condition number of corr matrix" / format=f10.2 .
compute cho=chol(cor).
print cho / title "cholesky factor of corr matrix" .
compute chochek=t(cho)*cho.
print chochek / title "chol factor premult by its transpose " /format=f10.2 .
compute newx=x*cho.
compute newx=newx*15 + 100.
save newx /outfile=* /variables= nr1 to nr5.
end matrix.

correlations nr1 to nr5 / statistics=descriptives.

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Gene Maguin
Sent: Thursday, June 11, 2009 8:48 AM
To: [hidden email]
Subject: Distribution function question

All,

I'm doing a little data simulation project and have discovered that spss now
has a bivariate normal function having a specifiable correlation. My
question is about what is returned from a call to this function. I'm
thinking that a single call should return two random variates whose
population correlation is the specified value. I think that because if two
variates were returned then the variables containing them could be
correlated. But the documentation doesn't say anything which leads me to
think that one variate is returned. If only one variate is returned per
call, then it seems to me that I couldn't create two variables having the
specified correlation.

Am I missing something? If so, what? Either way, it seems to me that the
documentation needs to be fixed for this function.

Gene Maguin

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Maguin, Eugene

Re: Distribution function question

David,

Thank you for your quick reply. And, thank you for the addition information
you sent.

I'm (was) interested in using the RV.BVNOR function, which I actually saw by
accident as I was looking for something else. I was looking at page 72 of
the v16 CSR.
I'm creating data on a case-by-case basis, so I normally use rv.normal and
compute correlations as regression equations. I thought that BVNOR would be
a short-cut. I was wrong. I saw and now see the qualification in the
Function paragraph. Issue is done.

Gene Maguin

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD