|
SPSS 20, 64bit
We have two bivariate N~ distributed (discrete) random variables, variable_X and variable_Y. The variables are not correlated (but could be correlated in some cases). How can I determine the probability for all cases where variable_X is (one point or more) greater than variable_Y (the lower triangular matrix)? We are trying to develop a Python script with the function CDF.BVNOR(quant,quant,corr), summing up the lower triangular matrix. However, the solution is complicated and time-consuming. Isn't there a straightforward solution? Our data looks like this:

input program.
loop a =1 to 10**6 by 1.
end case.
end loop.
end file.
end input program.
EXECUTE.
COMPUTE variable_X=RND(RV.NORMAL(100,10)).
COMPUTE variable_Y=RND(RV.NORMAL(100,10)).
EXECUTE.
FORMATS variable_X variable_Y (F8.0).
DELETE VARIABLES a.
DO IF variable_X > variable_Y.
COMPUTE result = 1.
ELSE.
COMPUTE result = 0.
END IF.
EXECUTE.
FREQUENCIES result.
Dr. Frank Gaeth
|
|
Administrator
|
Considering (from the FM):
"Two variables with a bivariate normal(ρ) distribution with correlation ρ have marginal normal distributions with a mean of 0 and a standard deviation of 1." I am not sure that your RV.NORMAL(100,10) has any appropriate place here. Also, I am not terribly sure what you are really attempting to achieve with this. --
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
|
Administrator
|
In reply to this post by drfg2008
Since you complain that this code is time consuming, why not program it more efficiently?
EXECUTE is a total waste of resources.
---
input program.
loop #a =1 to 10**6 .
COMPUTE result = RND(RV.NORMAL(100,10) ) - RND(RV.NORMAL(100,10) ) GT 1.
end case.
end loop.
end file.
end input program.
FREQUENCIES result.
|
|
Administrator
|
*AND* generating 1000000 cases is also a waste of resources!
consider:
input program.
+ loop x=1 to 1.
+ end case.
+ end loop.
+ end file.
end input program.
compute gt1=0.
loop #=1 to 1000000.
+ compute gt1=SUM(GT1,((RV.NORMAL(100,10)-RV.NORMAL(100,10)) GT 1)).
end loop.
compute pgt1=gt1/1000000.
print /pgt1 (F10.6).
exe.
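For readers who want to try the same Monte Carlo idea outside SPSS, here is a minimal Python sketch. It mirrors the single-case loop above (unrounded draws, difference greater than 1). The function name, the fixed seed, and the number of draws are illustrative assumptions, not part of the original post.

```python
import random


def simulate_margin_probability(n_draws=200_000, mean=100.0, sd=10.0,
                                margin=1.0, seed=12345):
    """Estimate P(X - Y > margin) for independent X, Y ~ N(mean, sd).

    Assumption: unrounded normal draws, as in the SPSS loop above.
    """
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    hits = 0
    for _ in range(n_draws):
        x = rng.gauss(mean, sd)
        y = rng.gauss(mean, sd)
        if x - y > margin:
            hits += 1
    return hits / n_draws


estimate = simulate_margin_probability()
print(f"estimated P(X - Y > 1) = {estimate:.4f}")
```

As in the SPSS version, only a running count is kept, so no million-case dataset is ever materialized.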
|
|
Administrator
|
We are not worthy! ;-)
--
Bruce Weaver bweaver@lakeheadu.ca http://sites.google.com/a/lakeheadu.ca/bweaver/ "When all else fails, RTFM." PLEASE NOTE THE FOLLOWING: 1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above. 2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/). |
|
In reply to this post by drfg2008
At 10:37 AM 10/4/2012, drfg2008 wrote:

>We have two bivariate N~ distributed (discrete) random variables,
>where you have variable_X and variable_Y . Both variables are not
>correlated (but could be correlated in some cases).

First, terminology. Your code has

>COMPUTE variable_X=RND(RV.NORMAL(100,10)).
>COMPUTE variable_Y=RND(RV.NORMAL(100,10)).

Those are actually univariate variables; the two together can be taken as a bivariate random variable. The two are independent, and uncorrelated, in the (conceptually infinite) population they're drawn from, although any sample will show a non-zero correlation because of random variation. And they're not normally distributed, because of the RND function; they have a discrete distribution that is similar to, but is not, the normal distribution.

>How can I determine the probability for all cases where variable_X
>is (one point or more) greater than variable_Y?

Can't that be done in closed form, a priori? If X and Y are independent, normally distributed, with the same mean and standard deviation, then X-Y is normally distributed, with mean 0 and standard deviation SQRT(2) times their common standard deviation: in your case, 10*SQRT(2). The tables for the normal distribution can give you the probability that that's greater than 1.

Or, is that not what you were looking for?

=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
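The closed-form argument above can be checked with a few lines of Python. This is a sketch under the stated assumptions (independent, unrounded normal variables); the function name is made up for illustration, and `statistics.NormalDist` from the standard library stands in for the normal tables.

```python
from math import sqrt
from statistics import NormalDist


def prob_x_exceeds_y_by(margin, mean_x, sd_x, mean_y, sd_y):
    """P(X - Y > margin) for independent normal X and Y.

    X - Y is normal with mean (mean_x - mean_y) and
    standard deviation sqrt(sd_x**2 + sd_y**2).
    """
    diff = NormalDist(mu=mean_x - mean_y, sigma=sqrt(sd_x**2 + sd_y**2))
    return 1.0 - diff.cdf(margin)


# Values from the thread: both variables N(100, 10), so X - Y ~ N(0, 10*sqrt(2)).
p = prob_x_exceeds_y_by(1.0, 100.0, 10.0, 100.0, 10.0)
print(f"P(X - Y > 1) = {p:.4f}")
```

Because the variables in the original code are rounded, a simulation on the rounded values will not match this figure exactly.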
|
In reply to this post by David Marso
Wonderful!!!
On 10/4/2012 4:42 PM, David Marso wrote:
> Since you complain that this code is time consuming why not program it more efficiently? EXECUTE is a total waste of resources.
Art Kendall
Social Research Consultants |
|
In reply to this post by Richard Ristow
Seeing David's and Richard's posts reminds me of the value of presenting/teaching the same idea in two ways.
Richard's post presents the theoretical perspective. David's post shows the value of simulation in reinforcing ideas. Those who teach might want to put both in their pedagogical toolboxes.
On 10/5/2012 10:13 AM, Richard Ristow wrote:
> At 10:37 AM 10/4/2012, drfg2008 wrote:
Art Kendall
Social Research Consultants |
|
In reply to this post by Art Kendall
Thank you.
Yes, a N~ distributed RV can't be discrete, etc. etc. Yes, nice video. No, this is not what I meant.

Let's put it that way: Two handball clubs play against each other. We know that the points (goals) of team1 and team2 each follow a distribution that reminds some statisticians of a N~distribution (KSO: p>0.05). So they take the N~distr. as a good approximation, although the distribution is discrete, of course. The points (goals) team1 and team2 achieve are not correlated (not sig.). Therefore r=0 is assumed.

We know the expected value for the average number of points (goals) for team1 and team2 separately (when they play against each other). Plus, we know the stdv. for the two distributions (bivariate normal distribution as an approximation). (Other discrete distributions might be better suited. Any suggestions are welcome, especially on how to compute this with SPSS. Bivariate Poisson with/without intercorrelation and over/underdispersion has not yet been implemented, I guess.)

Now the problem is how to compute the probability of the "1-Goal-Handicap" for team2. That is, when team1 does not only win against team2, but wins by more (!) than one goal (i.e. 0:[2 or higher], 1:[3 or higher], 2:[4 or higher]).

Thanks
Dr. Frank Gaeth
|
|
At 01:58 PM 10/5/2012, drfg2008 wrote:
>Let's put it that way: Two handball clubs play against each other.
>The points (goals) team1 and team2 achieve are not correlated (not sig.).

You know your problem best, so I'll just have to accept this. However, I suspect that there are correlations in a lot of sports: how well a team's offense is doing affects how well its defense is doing, if only by requiring the defense to be active for a greater or lesser part of the game.

>We know that the points (goals) of team1 and team2 are of a
>distribution (each team) that reminds some statisticians of a
>N~distribution (KSO: p>0.05). So they take the N~distr. as a good
>approximation, although the distribution is discrete, of course.

Have you thought about approximating by a binomial distribution, instead? A binomial distribution has, at least, the advantages of being inherently discrete, and of never taking on negative values. And binomial distributions (for highish values of n) look pretty much like the normal distribution, which may have influenced your statisticians.

Anyway, you say that for each team you have an estimate of the expected number of points, and the standard deviation. If team 1 has E1 expected goals with S1 standard deviation (so V1 variance, where V1=S1**2), the corresponding binomial distribution has parameters B(n1,p1), where p1=1-V1/E1, and n1=E1/p1 (round to an integer). And the same for team 2.

Now you have two independent binomially distributed random variables, B1(n1,p1) and B2(n2,p2). You want the probability that B2 >= B1+1. I don't see a closed-form solution for this, but I don't think it needs to be simulated, either. I think the quantity you want can be found in an SPSS loop:

COMPUTE Prob1up = 0.
LOOP #i = 0 TO n1.
. COMPUTE Prob1up = Prob1up + PDF.BINOM(#i,n1,p1) * (1-CDF.BINOM(#i,n2,p2)).
END LOOP.

I'm afraid I'm tossing this off without testing; apologies for any errors.
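The binomial suggestion can be sketched in Python as follows. Moment matching uses mean = n*p and variance = n*p*(1-p), so p = 1 - V/E and n = E/p, which only works when the variance is smaller than the mean. The function names and the example means and standard deviations below are hypothetical, chosen only so that this condition holds; they are not data from the thread.

```python
from math import comb


def binomial_params(mean, sd):
    """Moment-match a binomial B(n, p) to a given mean and standard deviation."""
    variance = sd ** 2
    if not 0 < variance < mean:
        raise ValueError("binomial matching needs 0 < variance < mean")
    p = 1.0 - variance / mean
    n = max(1, round(mean / p))  # n must be a positive integer
    return n, p


def binom_pmf(k, n, p):
    """P(B = k) for B ~ B(n, p)."""
    return comb(n, k) * p**k * (1.0 - p) ** (n - k)


def prob_second_beats_first(n1, p1, n2, p2, margin=1):
    """P(B2 >= B1 + margin) for independent B1 ~ B(n1, p1), B2 ~ B(n2, p2)."""
    pmf2 = [binom_pmf(k, n2, p2) for k in range(n2 + 1)]
    total = 0.0
    for i in range(n1 + 1):
        start = max(0, i + margin)
        upper_tail = sum(pmf2[start:])  # P(B2 >= i + margin); 0 if start > n2
        total += binom_pmf(i, n1, p1) * upper_tail
    return total


# Hypothetical inputs: team 1 mean 12, sd 2.4; team 2 mean 14, sd 2.5.
n1, p1 = binomial_params(12.0, 2.4)
n2, p2 = binomial_params(14.0, 2.5)
print(f"B1({n1}, {p1:.3f}), B2({n2}, {p2:.3f})")
print(f"P(B2 >= B1 + 1) = {prob_second_beats_first(n1, p1, n2, p2, 1):.4f}")
print(f"P(B2 >= B1 + 2) = {prob_second_beats_first(n1, p1, n2, p2, 2):.4f}")
```

Setting `margin=2` corresponds to winning by more than one goal, which is the handicap case described earlier in the thread.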
|
|
Thanks!
(Yes, you're right. In some cases there is a correlation (and an underdispersion -> Poisson). That's why I wanted to use the biv. N~ distribution implemented in SPSS. It includes a correlation parameter and can handle underdispersion.)
Dr. Frank Gaeth
|
|
New Mexico state offices are closed Monday October 8. Please expect a reply on Tuesday. Regards. |
|
In reply to this post by drfg2008
At 03:50 AM 10/6/2012, drfg2008 wrote:
>(Yes, you're right. In some cases there is a correlation (and an
>underdispersion -> Poisson). That's why I wanted to use the biv.
>N~distribution implemented in SPSS. It includes a correlation
>parameter and can handle underdispersion.)

Fair enough. I still don't think you need to do simulation; it should be a fairly simple numerical-integration exercise to get what proportion of the bivariate distribution is in the range Y>X+1.

You write, here, "In some cases there is a correlation" (between points scored by the two teams). I'll mention that at 01:58 PM 10/5/2012, you wrote:

>Two handball clubs play against each other. The points (goals) team1
>and team2 achieve are not correlated (not sig.).

I hope that's not a confusion. |
|
> Fair enough. I still don't think you need to do simulation; it should
> be a fairly simple numerical-integration exercise to get what proportion of the bivariate distribution is in the range Y>X+1.

Yes, it is a fairly simple numerical-integration exercise (e.g. with Mathematica). But how to do that with SPSS? That's our problem.

[As an example for a bivariate N~distr.: Y ~ N(µ=15.73, s=7.24), X ~ N(µ=14.17, s=8.19), r(X,Y) = -0.047. What proportion of the bivariate distribution is in the range Y>X+1?]

Frank
Dr. Frank Gaeth
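The integration asked about in this post can be sketched in Python, assuming (X, Y) really is bivariate normal and the scores are treated as continuous. Under that assumption Y - X is itself normal, so a closed form exists; the midpoint-rule loop is included only as a cross-check and because it could be transcribed into an SPSS LOOP using CDF.NORMAL and PDF.NORMAL. The function names, integration range, and step count are illustrative choices. For the example values in the post, both routes give roughly 0.52.

```python
from math import sqrt
from statistics import NormalDist


def prob_closed_form(margin, mean_x, sd_x, mean_y, sd_y, rho):
    """P(Y > X + margin) for bivariate normal (X, Y) with correlation rho."""
    sd_diff = sqrt(sd_x**2 + sd_y**2 - 2.0 * rho * sd_x * sd_y)
    return 1.0 - NormalDist(mean_y - mean_x, sd_diff).cdf(margin)


def prob_numeric(margin, mean_x, sd_x, mean_y, sd_y, rho, steps=4000):
    """Same probability by midpoint-rule integration over x (needs |rho| < 1)."""
    std = NormalDist()                    # standard normal
    x_dist = NormalDist(mean_x, sd_x)
    cond_sd = sd_y * sqrt(1.0 - rho**2)   # sd of Y given X = x
    lo, hi = mean_x - 8.0 * sd_x, mean_x + 8.0 * sd_x
    width = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * width
        cond_mean = mean_y + rho * sd_y / sd_x * (x - mean_x)
        tail = 1.0 - std.cdf((x + margin - cond_mean) / cond_sd)  # P(Y > x + margin | X = x)
        total += x_dist.pdf(x) * tail * width
    return total


# Example values from the post above.
closed = prob_closed_form(1.0, 14.17, 8.19, 15.73, 7.24, -0.047)
numeric = prob_numeric(1.0, 14.17, 8.19, 15.73, 7.24, -0.047)
print(f"closed form: {closed:.4f}, numerical integration: {numeric:.4f}")
```

For integer goal counts, the choice of margin (for example 1 versus 1.5 as a continuity correction) is a modelling decision this sketch does not settle.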
|
|
Hi, I am working on handball too and I came across this post.
I have the following problem: I want to make a handball live (in-play) pricing model, and I am trying to find the best distribution for my calculations.
The problem is that if a team is expected to score x more goals, this x tends to 0 as the game approaches its end.
Suppose that someone uses the normal distribution. What happens with the standard deviation? Where does it converge to? And what about negative values? Let's say the home team is expected to score 27.5 goals; then negative values are not significant. But 5 minutes before the end, with an expectancy of 2.5 goals remaining, a large "amount of probability" apparently comes from negative values, which doesn't make sense.
On the other hand, let's assume a binomial distribution, and that the probability of scoring is about 50% for each team (1 goal every 2 offences). Since the binomial distribution uses (positive) integers, if I have a goal expectancy of 3.3 or 3.7 at some point, I cannot feed the formula with this. So I would have to round up or down, and that's not convenient either.
I tried a continuity correction, to convert binomial to normal, but at some point the rule np>5, nq>5 does not hold. So, does anyone have any idea??
thanks
|
