Bivariate Normal Distribution

drfg2008
SPSS 20, 64bit

We have two bivariate N~ distributed (discrete) random variables, variable_X and variable_Y. The two variables are not correlated (but could be correlated in some cases).
How can I determine the probability of all cases where variable_X is greater than variable_Y by one point or more? (That is, the lower triangular matrix.)
We are trying to develop a Python script with the function CDF.BVNOR(quant,quant,corr), summing up the lower triangular matrix. However, that solution is complicated and time-consuming. Isn't there a straightforward solution?

Our data looks like this

input program.
loop a =1 to 10**6 by 1.
end case.
end loop.
end file.
end input program.
EXECUTE.

COMPUTE variable_X=RND(RV.NORMAL(100,10)).
COMPUTE variable_Y=RND(RV.NORMAL(100,10)).
EXECUTE.

FORMATS variable_X variable_Y (F8.0).
DELETE VARIABLES a.

DO IF variable_X > variable_Y.
COMPUTE result = 1.
ELSE.
COMPUTE result = 0.
END IF.
EXECUTE.

FREQUENCIES result.
Dr. Frank Gaeth

Re: Bivariate Normal Distribution

David Marso
Administrator
Consider this (from the FM):
"Two variables with a bivariate normal(ρ) distribution with correlation ρ have
marginal normal distributions with a mean of 0 and a standard deviation of 1."
I am not sure that your RV.NORMAL(100,10) has any appropriate place here.
I am also not terribly sure what you are really attempting to achieve with this.

Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"

Re: Bivariate Normal Distribution

David Marso
Administrator
In reply to this post by drfg2008
Since you complain that this code is time-consuming, why not program it more efficiently?
EXECUTE is a total waste of resources.
---
input program.
loop #a =1 to 10**6 .
COMPUTE result = (RND(RV.NORMAL(100,10)) - RND(RV.NORMAL(100,10))) GT 1.
end case.
end loop.
end file.
end input program.
FREQUENCIES result.

Re: Bivariate Normal Distribution

David Marso
Administrator
*AND* generating 1000000 cases is also a waste of resources!
consider:
input program.
+  loop x=1 to 1.
+    end case.
+  end loop.
+  end file.
end input program.
compute gt1=0.
loop #=1 to 1000000.
+  compute gt1=SUM(GT1,((RV.NORMAL(100,10)-RV.NORMAL(100,10)) GT 1)).
end loop.
compute pgt1=gt1/1000000.
print /pgt1 (F10.6).
exe.

Re: Bivariate Normal Distribution

Bruce Weaver
Administrator
We are not worthy!  ;-)



--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING: 
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).

Re: Bivariate Normal Distribution

Richard Ristow
In reply to this post by drfg2008
At 10:37 AM 10/4/2012, drfg2008 wrote:

>We have two  bivariate N~ distributed (discrete) random variables,
>where you have variable_X and variable_Y . Both variables are not
>correlated (but could be correlated in some cases).

First, terminology. Your code has

>COMPUTE variable_X=RND(RV.NORMAL(100,10)).
>COMPUTE variable_Y=RND(RV.NORMAL(100,10)).

Those are actually univariate variables; the two together can be
taken as a bivariate random variable. The two are independent, and
uncorrelated, in the (conceptually infinite) population they're drawn
from, although any sample will show a non-zero correlation because of
random variation. And they're not normally distributed, because of
the RND function; they have a discrete distribution that is similar
to, but is not, the normal distribution.

>How can I determine the probability for all cases where  variable_X
>is (one point or more) greater than  variable_Y?

Can't that be done in closed form, a priori? If X and Y are
independent, normally distributed, with the same mean and standard
deviation, then X-Y is normally distributed, with mean 0 and standard
deviation SQRT(2) times their common standard deviation: in your
case, 10*SQRT(2). The tables for the normal distribution can give you
the probability that that's greater than 1.

Or, is that not what you were looking for?
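
The closed-form argument above can be sketched in a few lines of Python (standard library only). This is a minimal sketch that assumes X and Y are independent and exactly normal, so it ignores the rounding done by RND; the helper names `normal_cdf` and `prob_diff_exceeds` are made up for this illustration.

```python
import math

def normal_cdf(x, mean=0.0, sd=1.0):
    """Cumulative distribution function of a normal with the given mean and sd."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def prob_diff_exceeds(threshold, mean_x, sd_x, mean_y, sd_y):
    """P(X - Y > threshold) for independent normal X and Y.

    X - Y is normal with mean mean_x - mean_y and
    standard deviation sqrt(sd_x**2 + sd_y**2).
    """
    mean_d = mean_x - mean_y
    sd_d = math.sqrt(sd_x ** 2 + sd_y ** 2)
    return 1.0 - normal_cdf(threshold, mean_d, sd_d)

# Both variables N(100, 10), as in the posted syntax: sd of X - Y is 10*sqrt(2).
p = prob_diff_exceeds(1.0, 100.0, 10.0, 100.0, 10.0)
print(round(p, 4))
```

For these inputs the result is roughly 0.47. In SPSS the same quantity should be obtainable as 1 - CDF.NORMAL(1, 0, 10*SQRT(2)).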

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Re: Bivariate Normal Distribution

Art Kendall
In reply to this post by David Marso
Wonderful!!!

Art Kendall
Social Research Consultants

Re: Bivariate Normal Distribution

Art Kendall
In reply to this post by Richard Ristow
Seeing David's and Richard's posts reminds me of the value of presenting/teaching the same idea in two ways.

Richard's post presents the theoretical perspective.

David's post shows the value of simulation in reinforcing ideas.

Those who teach might want to put both in their pedagogical toolboxes.

Art Kendall
Social Research Consultants

Re: Bivariate Normal Distribution

drfg2008
In reply to this post by Art Kendall
Thank you.

Yes, an N~ distributed RV can't be discrete, etc. etc. Yes, nice video.

No, this is not what I meant.

Let's put it that way: Two handball clubs play against each other. We know that the points (goals) of team1 and team2 are of a distribution (each team) that reminds some statisticians of a N~distribution (KSO: p>0.05). So they take the N~distr. as a good approximation, although the distribution is discrete, of course. The points (goals) team1 and team2 achieve are not correlated (not sig.). Therefore r=0 is assumed. We know the expected value for the average number of points (goals) for team1 and team2 separately (when they play against each other). Plus, we know the standard deviation of each of the two distributions (bivariate normal distribution as an approximation).

(Other discrete distributions might be better suited. Any suggestions are welcome, especially on how to compute them with SPSS. Bivariate Poisson with/without intercorrelation and over/underdispersion has not yet been implemented, I guess.)

Now the problem is how to compute the probability of the "1-goal handicap" for team2. That is, the case where team1 not only wins against team2, but wins by more (!) than one goal (i.e. 0:[2 or higher], 1:[3 or higher], 2:[4 or higher]).


Thanks
Dr. Frank Gaeth

Re: Bivariate Normal Distribution

Richard Ristow
At 01:58 PM 10/5/2012, drfg2008 wrote:

>Let's put it that way: Two handball clubs play against each other.
>The points (goals) team1 and team2 achieve are not correlated (not sig.).

You know your problem best, so I'll just have to accept this.
However, I suspect that there are correlations in a lot of sports:
how well a team's offense is doing affects how well its defense is
doing, if only by requiring the defense to be active for a greater or
lesser part of the game.

>We know that the points (goals) of team1 and team2 are of a
>distribution (each team) that reminds some statisticians of a
>N~distribution (KSO: p>0.05). So they take the N~distr. as a good
>approximation, although the distribution is discrete, of course.

Have you thought about approximating by a binomial distribution,
instead? A binomial distribution has, at least, the advantages of
being inherently discrete, and of never taking on negative values.
And binomial distributions (for highish values of n) look pretty much
like the normal distribution, which may have influenced your statisticians.

Anyway, you say that for each team you have an estimate of the
expected number of points, and the standard deviation. If team 1 has
E1 expected goals with S1 standard deviation (so V1 variance, where
V1=S1**2), the corresponding binomial distribution has parameters
B(n1,p1), where p1=1-V1/E1, and n1=E1/p1 (round to an integer). This
follows from E1=n1*p1 and V1=n1*p1*(1-p1), and it requires V1<E1. And
the same for team 2.

Now you have two independent binomially distributed random variables,
B1(n1,p1) and B2(n2,p2). You want the probability that

B2 >= B1+1.

I don't see a closed-form solution for this, but I don't think it
needs to be simulated, either. I think the quantity you want can be
found in an SPSS loop:

* Accumulates P(B1 = i) * P(B2 > i) over i; n1, p1, n2, p2 must exist as variables.
COMPUTE Prob1up = 0.
LOOP #i = 0 TO n1.
.  COMPUTE Prob1up = Prob1up
                   + PDF.BINOM(#i,n1,p1) * (1 - CDF.BINOM(#i,n2,p2)).
END LOOP.

I'm afraid I'm tossing this off without testing; apologies, for any errors.
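
For anyone who wants to check the idea outside SPSS, here is a minimal Python sketch of the same approach: match a binomial to each team's mean and standard deviation, then sum over the scores of team 1. It assumes the two scores are independent, and it uses the relations mean = n*p and variance = n*p*(1-p), which only give a valid p when the variance is below the mean. The function names and the example numbers are hypothetical, not taken from the thread.

```python
import math

def binom_params(mean, sd):
    """Match B(n, p) to a mean and sd via p = 1 - variance/mean, n = mean/p."""
    var = sd ** 2
    if not 0.0 < var < mean:
        raise ValueError("binomial matching needs 0 < variance < mean")
    p = 1.0 - var / mean
    n = max(1, round(mean / p))  # n must be a positive integer
    return n, p

def binom_pmf(k, n, p):
    """P(B = k) for B ~ B(n, p); zero outside 0..n."""
    if k < 0 or k > n:
        return 0.0
    return math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)

def prob_margin_at_least(n1, p1, n2, p2, margin=1):
    """P(B2 >= B1 + margin) for independent B1 ~ B(n1, p1), B2 ~ B(n2, p2)."""
    total = 0.0
    for i in range(n1 + 1):
        # Upper tail of B2 starting at i + margin.
        tail = sum(binom_pmf(j, n2, p2) for j in range(i + margin, n2 + 1))
        total += binom_pmf(i, n1, p1) * tail
    return total

# Hypothetical inputs: team 1 averages 24 goals (sd 3), team 2 averages 27 (sd 3.2).
n1, p1 = binom_params(24.0, 3.0)
n2, p2 = binom_params(27.0, 3.2)
print(n1, p1, n2, round(p2, 4))
print(prob_margin_at_least(n1, p1, n2, p2, margin=1))  # team 2 wins by 1 or more
print(prob_margin_at_least(n1, p1, n2, p2, margin=2))  # team 2 wins by 2 or more
```

Passing margin=2 gives the "more than one goal" case. If the standard deviation is large relative to the mean (variance above the mean), `binom_params` raises an error, because no binomial has those moments.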

Re: Bivariate Normal Distribution

drfg2008
Thanks!

(Yes, you're right. In some cases there is a correlation (and an underdispersion -> Poisson). That's why I wanted to use the biv. N~ distribution implemented in SPSS. It includes a correlation parameter and can handle underdispersion.)

Dr. Frank Gaeth

Re: Bivariate Normal Distribution

Richard Ristow
In reply to this post by drfg2008
At 03:50 AM 10/6/2012, drfg2008 wrote:

>(Yes, you're right. In some cases there is a correlation (and an
>underdispersion -> Poisson). That's why I wanted to use the biv.
>N~distribution implemented in SPSS. It includes a correlation
>parameter and can handle underdispersion.)

Fair enough. I still don't think you need to do simulation; it should
be a fairly simple numerical-integration exercise to get what
proportion of the bivariate distribution is in the range Y>X+1.

You write, here, "In some cases there is a correlation" (between
points scored by the two teams). I'll mention that at 01:58 PM
10/5/2012, you wrote:

>Two handball clubs play against each other. The points (goals) team1
>and team2 achieve are not correlated (not sig.).

I hope that's not a confusion.

Re: Bivariate Normal Distribution

drfg2008
>Fair enough. I still don't think you need to do simulation; it should
>be a fairly simple numerical-integration exercise to get what
>proportion of the bivariate distribution is in the range Y>X+1.


Yes, it is a fairly simple numerical-integration exercise (e.g. with Mathematica). But how do we do that with SPSS? That's our problem.

[As an example of a bivariate N~ distribution: Y ~ N(µ=15.73, s=7.24), X ~ N(µ=14.17, s=8.19), r(X,Y) = -0.047.
What proportion of the bivariate distribution is in the range Y>X+1?]

Frank

Dr. Frank Gaeth
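
For the example numbers in the post above, a rough Python sketch (standard library only) can compute the proportion in two ways. It treats (X, Y) as an exactly bivariate normal pair with |r| < 1, so it ignores that goals are non-negative integers. The first function uses the fact that Y - X is itself normal, with variance sY**2 + sX**2 - 2*r*sX*sY; the second integrates numerically over x as a cross-check. All function names are invented for this illustration.

```python
import math

def normal_cdf(x, mean=0.0, sd=1.0):
    """Cumulative distribution function of a normal with the given mean and sd."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def normal_pdf(x, mean, sd):
    """Density of a normal with the given mean and sd."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def prob_closed_form(mu_x, sd_x, mu_y, sd_y, rho, margin=1.0):
    """P(Y > X + margin) from the normal distribution of D = Y - X."""
    mean_d = mu_y - mu_x
    sd_d = math.sqrt(sd_x ** 2 + sd_y ** 2 - 2.0 * rho * sd_x * sd_y)
    return 1.0 - normal_cdf(margin, mean_d, sd_d)

def prob_numeric(mu_x, sd_x, mu_y, sd_y, rho, margin=1.0, steps=4000):
    """Same probability by trapezoid integration over x.

    Given X = x, Y is normal with mean mu_y + rho*sd_y/sd_x*(x - mu_x)
    and standard deviation sd_y*sqrt(1 - rho**2).
    """
    lo, hi = mu_x - 8.0 * sd_x, mu_x + 8.0 * sd_x
    h = (hi - lo) / steps
    cond_sd = sd_y * math.sqrt(1.0 - rho ** 2)
    total = 0.0
    for k in range(steps + 1):
        x = lo + k * h
        cond_mean = mu_y + rho * sd_y / sd_x * (x - mu_x)
        tail = 1.0 - normal_cdf(x + margin, cond_mean, cond_sd)
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * normal_pdf(x, mu_x, sd_x) * tail
    return total * h

# X ~ N(14.17, 8.19), Y ~ N(15.73, 7.24), r = -0.047, as in the post.
p_exact = prob_closed_form(14.17, 8.19, 15.73, 7.24, -0.047)
p_num = prob_numeric(14.17, 8.19, 15.73, 7.24, -0.047)
print(round(p_exact, 4), round(p_num, 4))
```

Both routes give about 0.52 for these inputs. Since the closed form needs only a univariate normal CDF, CDF.NORMAL in SPSS should be enough for it; no bivariate integration is required under these assumptions.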

Re: Bivariate Normal Distribution

little_mo
Hi, I am working on handball too and I came across this post. I have the following problem: I want to build a live (in-play) pricing model for handball, and I am trying to find the best distribution for my calculations. The problem is that if a team is expected to score x more goals, this x tends to 0 as the match approaches its end.

Suppose that someone uses the normal distribution. What happens to the standard deviation? What does it converge to? And what about negative values? Let's say the home team is expected to score 27.5 goals: negative values are then not significant. But 5 minutes before the end, with an expectancy of 2.5 goals remaining, a large "amount of probability" apparently comes from negative values, which don't make sense.

On the other hand, let's assume a binomial distribution, with a scoring probability of about 50% for each team (1 goal every 2 attacks). Since the binomial distribution works with (positive) integers, if I have a goal expectancy of 3.3 or 3.7 at some point, I cannot feed that into the formula. I would have to round up or down, and that's not convenient either. I tried a continuity correction to convert the binomial to a normal, but at some point the rule np>5, nq>5 no longer holds.

So, does anyone have any idea? Thanks.