SPSSX Discussion

Bivariate Normal Distribution

drfg2008

Bivariate Normal Distribution

SPSS 20, 64bit

We have two bivariate N~ distributed (discrete) random variables, where you have variable_X and variable_Y . Both variables are not correlated (but could be correlated in some cases).
How can I determine the probability for all cases where variable_X is (one point or more) greater than variable_Y? (lower triangular matrix)
We are trying to develop a Python script with the function CDF.BVNOR(quant,quant,corr), summing up the lower triangular matrix. However, the solution is complicated and time-consuming. Isn’t there a straightforward solution?

Our data looks like this

input program.
loop a =1 to 10**6 by 1.
end case.
end loop.
end file.
end input program.
EXECUTE.

COMPUTE variable_X=RND(RV.NORMAL(100,10)).
COMPUTE variable_Y=RND(RV.NORMAL(100,10)).
EXECUTE.

FORMATS variable_X variable_Y (F8.0).
DELETE VARIABLES a.

DO IF variable_X > variable_Y.
COMPUTE result = 1.
ELSE.
COMPUTE result = 0.
END IF.
EXECUTE.

FREQUENCIES result.

Dr. Frank Gaeth

David Marso

Re: Bivariate Normal Distribution

Administrator

Consider this (from the FM):
"Two variables with a bivariate normal(ρ) distribution with correlation ρ have
marginal normal distributions with a mean of 0 and a standard deviation of 1."
I am not sure that your RV.NORMAL(100,10) has any appropriate place here.
Also, I am not terribly sure what you are really trying to achieve with this.

Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"

David Marso

Re: Bivariate Normal Distribution

Administrator

In reply to this post by drfg2008

Since you complain that this code is time-consuming, why not program it more efficiently?
EXECUTE is a total waste of resources.
---
input program.
loop #a =1 to 10**6 .
COMPUTE result = RND(RV.NORMAL(100,10) ) - RND(RV.NORMAL(100,10) ) GT 1.
end case.
end loop.
end file.
end input program.
FREQUENCIES result.

David Marso

Re: Bivariate Normal Distribution

Administrator

*AND* generating 1000000 cases is also a waste of resources!
consider:
input program.
+ loop x=1 to 1.
+ end case.
+ end loop.
+ end file.
end input program.
compute gt1=0.
loop #=1 to 1000000.
+ compute gt1=SUM(GT1,((RV.NORMAL(100,10)-RV.NORMAL(100,10)) GT 1)).
end loop.
compute pgt1=gt1/1000000.
print /pgt1 (F10.6).
exe.

Bruce Weaver

Re: Bivariate Normal Distribution

Administrator

We are not worthy! ;-)

--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING:
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).

Richard Ristow

Re: Bivariate Normal Distribution

In reply to this post by drfg2008

At 10:37 AM 10/4/2012, drfg2008 wrote:

>We have two bivariate N~ distributed (discrete) random variables,
>where you have variable_X and variable_Y . Both variables are not
>correlated (but could be correlated in some cases).

First, terminology. Your code has

>COMPUTE variable_X=RND(RV.NORMAL(100,10)).
>COMPUTE variable_Y=RND(RV.NORMAL(100,10)).

Those are actually univariate variables; the two together can be
taken as a bivariate random variable. The two are independent, and
uncorrelated, in the (conceptually infinite) population they're drawn
from, although any sample will show a non-zero correlation because of
random variation. And they're not normally distributed, because of
the RND function; they have a discrete distribution that is similar
to, but is not, the normal distribution.

>How can I determine the probability for all cases where variable_X
>is (one point or more) greater than variable_Y?

Can't that be done in closed form, a priori? If X and Y are
independent, normally distributed, with the same mean and standard
deviation, then X-Y is normally distributed, with mean 0 and standard
deviation SQRT(2) times their common standard deviation: in your
case, 10*SQRT(2). The tables for the normal distribution can give you
the probability that that's greater than 1.

Or, is that not what you were looking for?
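
The closed-form argument above can be checked numerically outside SPSS. The following is a minimal Python sketch, assuming the unrounded normal variables (the RND step is ignored) and using only the standard library; the function names are illustrative and do not come from the thread.

```python
import math


def normal_cdf(z):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_x_exceeds_y_by(threshold, sd):
    """P(X - Y > threshold) for independent X, Y ~ N(mean, sd) with a common mean.

    X - Y is then N(0, sd * sqrt(2)); the common mean cancels out.
    """
    sd_diff = sd * math.sqrt(2.0)
    return 1.0 - normal_cdf(threshold / sd_diff)


# The case from the thread: both variables N(100, 10), threshold 1.
p = prob_x_exceeds_y_by(1.0, 10.0)
print(round(p, 4))
```

For the values in the thread this comes out at roughly 0.47. That is the quantity a simulation comparing the unrounded difference with 1 would estimate.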

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Art Kendall

Re: Bivariate Normal Distribution

In reply to this post by David Marso

Wonderful!!!

Art Kendall
Social Research Consultants

Art Kendall

Re: Bivariate Normal Distribution

In reply to this post by Richard Ristow

Seeing David's and Richard's posts reminds me of the value of presenting/teaching the same idea in two ways.

Richard's post presents the theoretical perspective.

David's post shows the value of simulation in reinforcing ideas.

Those who teach might want to put both in their pedagogical toolboxes.

Art Kendall
Social Research Consultants

drfg2008

Re: Bivariate Normal Distribution

In reply to this post by Art Kendall

Thank you.

Yes, a N~ distributed RV can't be discrete, etc. etc. Yes, nice video.

No, this is not what I meant.

Let's put it this way: two handball clubs play against each other. We know that the points (goals) of team1 and team2 each follow a distribution that reminds some statisticians of a N~distribution (KSO: p>0.05). So they take the N~distr. as a good approximation, although the distribution is discrete, of course. The points (goals) team1 and team2 achieve are not correlated (not sig.). Therefore r=0 is assumed. We know the expected number of points (goals) for team1 and team2 separately (when they play against each other). Plus, we know the stdv. of the two distributions (bivariate normal distribution as an approximation).

(Other -discrete- distributions might be better suited. Any suggestions are welcome, especially on how to compute this with SPSS. Bivariate Poisson with/without intercorrelation and over/underdispersion has not yet been implemented, I guess.)

Now the problem is how to compute the probability of the "1-goal handicap" for team2. That is, team1 not only wins against team2, but wins by more (!) than one goal (i.e. 0:[2 or higher], 1:[3 or higher], 2:[4 or higher]).

Thanks

Dr. Frank Gaeth

Richard Ristow

Re: Bivariate Normal Distribution

At 01:58 PM 10/5/2012, drfg2008 wrote:

>Let's put it that way: Two handball clubs play against each other.
>The points (goals) team1 and team2 achieve are not correlated (not sig.).

You know your problem best, so I'll just have to accept this.
However, I suspect that there are correlations in a lot of sports:
how well a team's offense is doing affects how well its defense is
doing, if only by requiring the defense to be active for a greater or
lesser part of the game.

>We know that the points (goals) of team1 and team2 are of a
>distribution (each team) that reminds some statisticians of a
>N~distribution (KSO: p>0.05). So they take the N~distr. as a good
>approximation, although the distribution is discrete, of course.

Have you thought about approximating by a binomial distribution,
instead? A binomial distribution has, at least, the advantages of
being inherently discrete, and of never taking on negative values.
And binomial distributions (for highish values of n) look pretty much
like the normal distribution, which may have influenced your statisticians.

Anyway, you say that for each team you have an estimate of the
expected number of points, and the standard deviation. If team 1 has
E1 expected goals with S1 standard deviation (so V1 variance, where
V1=S1**2), the corresponding binomial distribution has parameters
B(n1,p1), where p1=1-V1/E1, and n1=E1/p1 (round to an integer), because
a binomial has mean n*p and variance n*p*(1-p). This only works if
V1<E1. And the same for team 2.

Now you have two independent binomially distributed random variables,
B1(n1,p1) and B2(n2,p2). You want the probability that

B2 >= B1+1.

I don't see a closed-form solution for this, but I don't think it
needs to be simulated, either. I think the quantity you want can be
found in an SPSS loop:

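* Accumulate P(B1 = i) * P(B2 > i) over all possible scores i of team 1.
* The SPSS argument order is (quant, n, prob).
COMPUTE Prob1up = 0.
LOOP #i = 0 TO n1.
.  COMPUTE Prob1up = Prob1up
      + PDF.BINOM(#i,n1,p1) * (1 - CDF.BINOM(#i,n2,p2)).
END LOOP.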

I'm afraid I'm tossing this off without testing; apologies, for any errors.
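
The same sum can be cross-checked outside SPSS. The following is a minimal Python sketch, assuming two independent binomial scores whose parameters are matched to a mean and variance via mean = n*p and variance = n*p*(1-p), so that p = 1 - variance/mean. The function names and the numbers in the example (mean 25, variance 12.5) are hypothetical and not taken from the thread.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(B = k) for B ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1.0 - p) ** (n - k)


def binom_from_moments(mean, var):
    """Match Binomial(n, p) to a mean and variance (needs var < mean)."""
    if not 0.0 < var < mean:
        raise ValueError("a binomial needs 0 < variance < mean")
    p = 1.0 - var / mean
    n = round(mean / p)  # n must be an integer, so the match is approximate
    return n, p


def prob_ahead_by(n1, p1, n2, p2, margin=1):
    """P(B2 >= B1 + margin) for independent B1 ~ B(n1, p1), B2 ~ B(n2, p2)."""
    if margin < 0:
        raise ValueError("margin must be non-negative")
    pmf2 = [binom_pmf(k, n2, p2) for k in range(n2 + 1)]
    total = 0.0
    for i in range(n1 + 1):
        # P(B2 >= i + margin); the slice is empty (sum 0) once i + margin > n2
        upper_tail = sum(pmf2[i + margin:])
        total += binom_pmf(i, n1, p1) * upper_tail
    return total


# Hypothetical inputs: both teams with mean 25 goals and variance 12.5.
n, p = binom_from_moments(25.0, 12.5)
print(n, p, prob_ahead_by(n, p, n, p, margin=1))
```

With `margin=2` the same function gives the probability of winning by more than one goal.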

drfg2008

Re: Bivariate Normal Distribution

Thanks!

(Yes, you're right. In some cases there is a correlation (and an underdispersion -> Poisson). That's why I wanted to use the biv. N~ distribution implemented in SPSS. It includes a correlation parameter and can handle underdispersion.)

Dr. Frank Gaeth

Richard Ristow

Re: Bivariate Normal Distribution

In reply to this post by drfg2008

At 03:50 AM 10/6/2012, drfg2008 wrote:

>(Yes, you're right. In some cases there is a correlation (and an
>underdispersion -> Poisson). That's why I wanted to use the biv.
>N~distribution implemented in SPSS. It includes a correlation
>parameter and can handle underdispersion.)

Fair enough. I still don't think you need to do simulation; it should
be a fairly simple numerical-integration exercise to get what
proportion of the bivariate distribution is in the range Y>X+1.

You write, here, "In some cases there is a correlation" (between
points scored by the two teams). I'll mention that at 01:58 PM
10/5/2012, you wrote:

>Two handball clubs play against each other. The points (goals) team1
>and team2 achieve are not correlated (not sig.).

I hope that's not a confusion.

drfg2008

Re: Bivariate Normal Distribution

>Fair enough. I still don't think you need to do simulation; it should
>be a fairly simple numerical-integration exercise to get what
>proportion of the bivariate distribution is in the range Y>X+1.

Yes, it is a fairly simple numerical-integration exercise (e.g. with Mathematica). But how do we do that with SPSS? That's our problem.

[As an example of a bivariate N~distr.: Y ~ N(µ=15.73, s=7.24), X ~ N(µ=14.17, s=8.19), r(X,Y) = -0.047.

What proportion of the bivariate distribution is in the range Y>X+1?]
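
For the continuous bivariate normal, this proportion does not need a two-dimensional integration. Any linear combination of jointly normal variables is itself normal, so Y - X has mean µY - µX and variance sY**2 + sX**2 - 2*r*sX*sY. The following is a minimal Python sketch of that reduction using the example numbers above. The function names are illustrative, and the second call, with a continuity-corrected threshold of 1.5 for integer goals, is an assumption about how one might treat the discreteness, not something proposed in the thread.

```python
import math


def normal_cdf(z):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_y_exceeds_x_by(threshold, mean_x, sd_x, mean_y, sd_y, corr):
    """P(Y - X > threshold) for jointly normal (X, Y) with correlation corr."""
    mean_diff = mean_y - mean_x
    var_diff = sd_y**2 + sd_x**2 - 2.0 * corr * sd_x * sd_y
    return 1.0 - normal_cdf((threshold - mean_diff) / math.sqrt(var_diff))


# Example from the post: P(Y > X + 1) for the continuous distribution.
p_continuous = prob_y_exceeds_x_by(1.0, 14.17, 8.19, 15.73, 7.24, -0.047)

# Assumption: with integer goals, "more than one goal ahead" means a margin
# of at least 2, approximated here with a continuity-corrected threshold of 1.5.
p_corrected = prob_y_exceeds_x_by(1.5, 14.17, 8.19, 15.73, 7.24, -0.047)

print(round(p_continuous, 3), round(p_corrected, 3))
```

In SPSS the same reduction should only need the univariate CDF.NORMAL(quant, mean, stddev) applied to the difference, rather than CDF.BVNOR.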

Frank

Dr. Frank Gaeth

little_mo

Re: Bivariate Normal Distribution

Hi, I am working on handball too and I came across this post. I want to build a live (in-play) handball pricing model, and I am trying to find the best distribution for my calculations.

The problem is that the number of goals x a team is still expected to score tends to 0 as the match approaches its end. Suppose that someone uses the normal distribution. What happens to the standard deviation? What does it converge to? And what about negative values? Let's say the home team is expected to score 27.5 goals: then negative values are not significant. But 5 minutes before the end, with an expectancy of 2.5 goals remaining, a large "amount of probability" apparently comes from negative values, which doesn't make sense.

On the other hand, let's assume a binomial distribution, and that the probability of scoring is about 50% for each team (1 goal every 2 offences). Since the binomial distribution takes (positive) integers, if I have a goal expectancy of 3.3 or 3.7 at some point, I cannot feed the formula with this. So I would have to round up or down, and that is not convenient either. I tried a continuity correction to convert the binomial to a normal, but at some point the rule np>5, nq>5 no longer holds.

So, does anyone have any idea? Thanks.
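
One discrete candidate already mentioned earlier in the thread is the Poisson distribution, which accepts a non-integer mean (such as 3.3 or 2.5 remaining goals) and never produces negative values. The following is a minimal Python sketch under strong assumptions: the two remaining scores are independent Poisson variables, and a plain Poisson is an adequate fit. Whether that holds for handball is an empirical question, since the thread reports underdispersion in some cases, which a plain Poisson cannot represent. The function names and the truncation point `max_goals` are hypothetical.

```python
import math


def poisson_pmf(k, mean):
    """P(N = k) for N ~ Poisson(mean); mean may be any non-negative real."""
    return math.exp(-mean) * mean**k / math.factorial(k)


def prob_margin_at_least(mean_a, mean_b, margin, max_goals=100):
    """P(A - B >= margin) for independent Poisson A and B.

    The infinite sums are truncated at max_goals, which is ample for the
    small remaining-goal expectancies discussed here.
    """
    pmf_a = [poisson_pmf(k, mean_a) for k in range(max_goals + 1)]
    pmf_b = [poisson_pmf(k, mean_b) for k in range(max_goals + 1)]
    total = 0.0
    for a in range(max_goals + 1):
        top = min(max_goals, a - margin)  # largest b with a - b >= margin
        for b in range(0, top + 1):       # empty when top < 0
            total += pmf_a[a] * pmf_b[b]
    return total


# Hypothetical in-play situation: 3.7 and 3.3 goals still expected.
print(prob_margin_at_least(3.7, 3.3, margin=1))
```

As the remaining expectancy shrinks toward 0, the Poisson variance shrinks with it, because the variance of a Poisson equals its mean.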