WinVista / SPSS20
The following problem: I would like to approximate a Poisson distribution with a negative binomial distribution. For example:
Lambda (mean) = 3 (variance = 3, i.e. standard deviation ≈ 1.73)
x = 5
cumulative = yes
The cumulative p is 0.91608206.
In Excel: =POISSON(x;lambda;TRUE)
In SPSS:
COMPUTE p_poisson=CDF.POISSON(x,lambda).
EXECUTE.
But how do I compute (approximate) it with a negative binomial function in SPSS?
Dr. Frank Gaeth
|
Is there not a CDF.NEGBIN function in SPSS?
What are you really attempting to do, Frank? You have posed some variant of this question several times in recent weeks.
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
The problem is that the mean and variance of the Poisson distribution are the same. We found (and the literature says) that the variance is slightly higher than the mean ("overdispersion"), so the literature suggests using negbin instead.
But I have no clue how to use negbin instead of Poisson in SPSS. The data are simple:
Lambda (mean) = 3 (variance = 3 or a bit higher)
x = 5
cumulative = yes
Does that make sense to you?
Frank
Dr. Frank Gaeth
|
In reply to this post by David Marso
Yes. In my copy of the fine manual (v19), it is found under Universals - Transformation Expressions - Random variable and distribution functions.
--
Bruce Weaver bweaver@lakeheadu.ca http://sites.google.com/a/lakeheadu.ca/bweaver/ "When all else fails, RTFM." PLEASE NOTE THE FOLLOWING: 1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above. 2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/). |
My post was more of a Socratic prodding ;-)
It appears in my 11.5 FM as well:
"NEGBIN Negative binomial distribution. The negative binomial distribution takes one threshold parameter, a, and one success probability parameter, b. Parameter a must be an integer and parameter b must be greater than 0 and less than or equal to 1. The CDF, PDF, and RV functions are available, where q is the number of trials needed (including the last trial) before a successes are observed. If a=1, it is a geometric distribution."
I have *NO* idea what parameterization will yield a Poisson approximation.
--
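On the question of which parameterization approaches the Poisson: a standard result is that a negative binomial with mean mu and prob = size/(size+mu) tends to Poisson(mu) as size grows. The following is only a numerical sketch of that limit in Python (not SPSS syntax); the function names are made up for illustration, and the negative binomial is taken in its "number of failures" form with a possibly non-integer size.

```python
import math

def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam)."""
    return sum(math.exp(-lam) * lam**x / math.factorial(x) for x in range(k + 1))

def negbin_pmf(x, size, prob):
    """P(X = x) where X counts failures before `size` successes (0 < prob < 1)."""
    log_pmf = (math.lgamma(x + size) - math.lgamma(size) - math.lgamma(x + 1)
               + size * math.log(prob) + x * math.log1p(-prob))
    return math.exp(log_pmf)

def negbin_cdf(k, size, prob):
    """P(X <= k), obtained by summing the probability mass function."""
    return sum(negbin_pmf(x, size, prob) for x in range(k + 1))

mu, k = 3.0, 5
print("Poisson:", poisson_cdf(k, mu))  # close to the 0.91608206 quoted in the thread
for size in (1, 10, 100, 10000):
    prob = size / (size + mu)          # keeps the mean fixed at mu
    print("negbin, size =", size, ":", negbin_cdf(k, size, prob))
```

Small values of size give a clearly heavier tail than the Poisson; large values are numerically almost indistinguishable from it.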
|
Thanks David.
If I use "Distribution Fit" (Extension Bundle - R) and fit the data with the negbin distribution, R computes an estimate for 1) size and 2) mu:
-------------------------------------------------
Variable: Score_Sum
Distribution: negative binomial
              size       mu
Estimate      100,003    2,775
Std. Error    114,112    ,078
Number of valid cases: 472, out of 479 total cases
------------------------------------------------
So, my question would be how to estimate these two parameters. It should be possible.
Frank
(Merry Christmas, by the way)
Dr. Frank Gaeth
|
... as a supplement to the above message, a description from Wikipedia:
"The negative binomial distribution, especially in its alternative parameterization described above, can be used as an alternative to the Poisson distribution. It is especially useful for discrete data over an unbounded positive range whose sample variance exceeds the sample mean. In such cases, the observations are overdispersed with respect to a Poisson distribution, for which the mean is equal to the variance. Hence a Poisson distribution is not an appropriate model. Since the negative binomial distribution has one more parameter than the Poisson, the second parameter can be used to adjust the variance independently of the mean."
http://en.wikipedia.org/wiki/Negative_binomial_distribution
The mean of negbin equals lambda of Poisson. But how do I estimate the variance (size) with SPSS?
Frank
Dr. Frank Gaeth
|
In reply to this post by drfg2008
The negative binomial distribution has two equivalent parameterizations. What you are getting from STATS DISTFIT is mean mu, and size, the dispersion parameter, where prob = size/(size+mu). The variance is mu + mu^2/size in this parameterization. So it's easy to go from mu and size to the mean (= mu) and variance, or to solve for size from estimated moments.
The other parameterization, which is what the Statistics functions use, is with prob = p and size = n. The density is
Γ(x+n)/(Γ(n) x!) p^n (1-p)^x
HTH,
Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
new phone: 720-342-5621

From: drfg2008
Date: 12/24/2011 12:30 PM
Subject: [SPSSX-L] Re: Poisson – negative Binomial
View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Poisson-negative-Binomial-tp5085330p5099580.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.
=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L. For a list of commands to manage subscriptions, send the command INFO REFCARD.
|
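The conversions described above (mu and size to prob and variance, and size back from estimated moments) can be sketched as follows. This is an illustrative Python sketch, not SPSS syntax; the function names are invented here, and the numbers are the DISTFIT estimates from the thread with the decimal commas read as decimal points.

```python
def prob_from_mu_size(mu, size):
    """Success probability in the (size, prob) parameterization."""
    return size / (size + mu)

def negbin_variance(mu, size):
    """Variance in the (mu, size) parameterization: mu + mu^2/size."""
    return mu + mu**2 / size

def size_from_moments(mean, variance):
    """Method-of-moments size; only defined when the data are overdispersed."""
    if variance <= mean:
        raise ValueError("variance must exceed the mean for a negative binomial")
    return mean**2 / (variance - mean)

mu, size = 2.775, 100.003            # DISTFIT estimates quoted in the thread
prob = prob_from_mu_size(mu, size)
var = negbin_variance(mu, size)
print("prob =", prob)
print("variance =", var)
print("size recovered from moments =", size_from_moments(mu, var))
# Same variance via the first parameterization, n (1-p) / p^2:
print("n(1-p)/p^2 =", size * (1 - prob) / prob**2)
```

With a size this large the variance is only slightly above the mean, which is consistent with the large standard error DISTFIT reports for size.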
That's exactly what I need. Great.
However, I checked the statistical textbooks and searched the internet, but couldn't find the parameterization of the negbin density function (mean mu and size) that is used by STATS DISTFIT. Could you give me a hint where to find ...
a) the density function
b) the cumulative distribution function
that is used by STATS DISTFIT, and
c) does SPSS have a random variable function (COMPUTE) that is based on this parameterization (mean mu and size)?
I only know this version:
COMPUTE negbin_var=RV.NEGBIN(threshold,prob).
(hope you understand what I mean)
Frank
Dr. Frank Gaeth
|
From the underlying package doc...
The negative binomial distribution with size = n and prob = p has density
Γ(x+n)/(Γ(n) x!) p^n (1-p)^x
for x = 0, 1, 2, …, n > 0 and 0 < p ≤ 1.
This represents the number of failures which occur in a sequence of Bernoulli trials before a target number of successes is reached.
A negative binomial distribution can arise as a mixture of Poisson distributions with mean distributed as a gamma distribution (see pgamma) with scale parameter (1 - prob)/prob and shape parameter size. (This definition allows non-integer values of size.) In this model prob = 1/(1+scale), and the mean is size * (1 - prob)/prob.
The alternative parametrization (often used in ecology) is by the mean mu, and size, the dispersion parameter, where prob = size/(size+mu). The variance is mu + mu^2/size in this parametrization or n (1-p)/p^2 in the first one.
Wikipedia has full documentation on this IIRC.
In Statistics, see the Rv.Negbin, Cdf.Negbin, and Pdf.Negbin functions. These do not use the parameterization used by DISTFIT, so you would have to convert the parameters according to the given formula.
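To make the density and cumulative distribution in the (mu, size) form concrete, here is a small Python sketch (not SPSS syntax). The helper `spss_negbin_args` is hypothetical: it assumes, based only on the manual text quoted earlier in the thread, that the first argument of CDF.NEGBIN counts trials including the last success, so that k failures correspond to k + thresh trials. That manual text also says the threshold must be an integer, so a fitted non-integer size may need rounding; treat the mapping as an assumption to verify against your SPSS version.

```python
import math

def negbin_pmf_mu(x, mu, size):
    """P(X = x) failures, parameterized by mean mu and dispersion size."""
    prob = size / (size + mu)
    log_pmf = (math.lgamma(x + size) - math.lgamma(size) - math.lgamma(x + 1)
               + size * math.log(prob) + x * math.log1p(-prob))
    return math.exp(log_pmf)

def negbin_cdf_mu(k, mu, size):
    """P(X <= k): the cumulative distribution is the running sum of the density."""
    return sum(negbin_pmf_mu(x, mu, size) for x in range(k + 1))

def spss_negbin_args(k, mu, size):
    """Hypothetical (quant, thresh, prob) triple, assuming quant counts trials."""
    thresh = round(size)               # assumption: thresh must be an integer
    prob = thresh / (thresh + mu)      # keeps the mean at mu after rounding
    return k + thresh, thresh, prob

mu, size, k = 2.775, 100.003, 5        # DISTFIT estimates from the thread
print("P(X <= 5) =", negbin_cdf_mu(k, mu, size))
print("assumed CDF.NEGBIN arguments:", spss_negbin_args(k, mu, size))
```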
|
Thank you.
Am I right that the estimate of mu equals the estimate of lambda (I guess from DISTFIT), i.e. mu = lambda = arithmetic mean? That would mean the alternative parameterization of negbin and the Poisson distribution have the same location parameter and differ only in dispersion (parameter: size). So, if you have lambda (Poisson), you have mu (negbin).
Frank
Dr. Frank Gaeth
|
In reply to this post by Jon K Peck
I knew it would be somewhat complicated.
The problem comes from sports betting. A bookmaker wants to determine how high to set the odds in ice hockey. The odds are calculated as 1 / p. The following is known:
1. The expected mean number of goals in a game (μ, or lambda if the distribution is Poisson - thanks, David, for the iteration, which works fine)
2. The threshold k (k = 5 goals)
3. The probability of up to 5 goals in a game: P(X <= k)
Also, the results (goals) of the past 2,000 games are known.
The estimate for point 3 (probability of up to 5 goals) comes from the market. The market is considered efficient and is therefore the best estimate. Hence the slightly roundabout approach.
Now the task is to calculate the probability that no more than 4 goals (alternatively: no more than 6 goals) are scored.
With the Poisson distribution this is easy to calculate, because only lambda and the threshold k must be known (both are known). The SPSS functions would simply be:
CDF.POISSON(k-1, LAMBDA_F) /* up to 4 goals
CDF.POISSON(k+1, LAMBDA_F) /* up to 6 goals
However, the distribution of goals in ice hockey is not exactly Poisson. It is assumed that the scatter is slightly too large (overdispersion). Therefore, a negative binomial distribution (or any other suitable distribution) should be used.
The problem with negbin is that the SPSS function requires three arguments:
CDF.NEGBIN(QUANT, THRESH, PROB)
Although size could be estimated iteratively from prob = size / (size + mu), there would still remain the problem of calculating THRESH.
- lost in translation
Dr. Frank Gaeth
|
Frank,
Do you have SAS?
Ryan
|
I could get a licence; however, since all the programming is done with SPSS, a solution in SPSS would be wonderful.
Frank
Dr. Frank Gaeth
|
Thank you, Matthew Zack!
However, the point is to get all estimates from the market (not from the database), because the market is the best estimate (market efficiency). That's why I tried to get lambda from the "5 point line", that is, the odds for up to 5 goals in one game.
For example: a bookmaker (the market) offers 1.85 euros if Team1 and Team2 together score no more than 5 goals in a certain game. With Poisson it is no problem to get the equivalent lambda from k = 5 (goals) and the given odds (in this example 1.85 -> cumulative p = 1/1.85 ~ 54%). From lambda and k I can compute all other probabilities, for example for "not more than 4 goals" or "not more than 6 goals".
The only problem with Poisson is that the data are slightly overdispersed relative to it, so using negbin would be more appropriate. So how can I estimate the negbin parameters from the market, where I only know the odds (cumulative p) and k?
---------------------------------------------
Under the negative binomial distribution, the probability of zero goals, P(0), equals P(0) = prob**size (using your terminology below). Since you have the results of 2000 games, you know P(0). Thus, ln P(0) = size*ln(prob) -> size = ln P(0) / ln(prob).
You can then use the following recursive formula to calculate P(1), P(2), etc.: P(x+1) = P(x)*(1 - prob)*(x+size)/(x+1). For example, P(1) = P(0)*(1 - prob)*size, P(2) = P(1)*(1 - prob)*(1+size)/2, etc. You can then calculate the cumulative distribution function values by summing up these successive values.
Matthew Zack
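Under the density quoted earlier in the thread, Γ(x+n)/(Γ(n) x!) p^n (1-p)^x, the ratio of successive probabilities is P(x+1)/P(x) = (1 - prob)(x + size)/(x + 1), starting from P(0) = prob**size. The Python sketch below builds the probabilities and their running sums this way and cross-checks them against the closed form; the function names and the example values of size and prob are illustrative assumptions, not figures from the thread.

```python
import math

def negbin_pmf_recursive(max_x, size, prob):
    """List [P(0), ..., P(max_x)] built from the ratio of successive terms."""
    probs = [prob**size]
    for x in range(max_x):
        probs.append(probs[-1] * (1 - prob) * (x + size) / (x + 1))
    return probs

def cumulative(probs):
    """Running sums, i.e. the cumulative distribution at 0, 1, 2, ..."""
    total, out = 0.0, []
    for p in probs:
        total += p
        out.append(total)
    return out

def negbin_pmf_closed(x, size, prob):
    """Closed-form density, used here only as a cross-check."""
    return math.exp(math.lgamma(x + size) - math.lgamma(size) - math.lgamma(x + 1)
                    + size * math.log(prob) + x * math.log1p(-prob))

size, prob = 20.0, 0.8                 # illustrative values; mean = size*(1-prob)/prob = 5
probs = negbin_pmf_recursive(6, size, prob)
cdf = cumulative(probs)
print("P(up to 4 goals):", cdf[4])
print("P(up to 5 goals):", cdf[5])
print("P(up to 6 goals):", cdf[6])
```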
Dr. Frank Gaeth
|
I was suggesting you consider using SAS because it is capable (via the NLMIXED procedure) of doing exactly what Matthew was suggesting off-list; that is, summing the estimated probabilities.
Ryan
|
Thanks everybody,
I found a solution at last -> using the gamma distribution (as an approximation of the negative binomial distribution) and computing scale and shape from µ and the variance:
scale = µ / variance
shape = µ * scale
(With these formulas the mean is shape / scale, so "scale" here acts as a rate, i.e. an inverse scale.)
---------------------------------------
See: Schlittgen, Rainer: Statistik (5th edition), Oldenbourg 1995, p. 225.
I never imagined I would find the solution in my stats books from my college days.
Frank
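The moment matching behind these two formulas can be checked in a few lines. This is a Python sketch with invented names; the mean of 3 and variance of 3.5 are illustrative values for mild overdispersion, not figures from the thread. Note that the quantity µ / variance is a rate (inverse scale), so it is named `rate` here.

```python
def gamma_from_moments(mean, variance):
    """Gamma shape and rate whose mean and variance match the given moments."""
    rate = mean / variance             # called "scale" in the post above
    shape = mean * rate                # equals mean^2 / variance
    return shape, rate

shape, rate = gamma_from_moments(3.0, 3.5)
print("shape =", shape, "rate =", rate)
print("implied mean =", shape / rate)         # shape / rate
print("implied variance =", shape / rate**2)  # shape / rate^2
```

Whether a given software function expects a rate or a true scale (1 / rate) as its second gamma parameter varies between packages, so the documentation of the function in use should be checked.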
Dr. Frank Gaeth
|
This thread led me to consider how one might generate data from a negative binomial regression equation using SPSS. I did some research online and then came up with the simulation code provided below my name in this post. Consider this an initial attempt that might require modifications.
It should be noted that the fixed effects estimates and dispersion parameter estimate obtained from the GENLIN procedure are very close to the values used in the simulation code. Hope this is of interest to others.
Ryan
--
*Generate Data.
set seed 89765432.
new file.
input program.
loop ID = 1 to 100000.
compute b0 = 1.2.
compute b1 = -1.8.
compute x = rv.normal(0,1).
* Conditional mean on the log link.
compute lambda = exp(b0 + b1*x).
* Gamma mixing distribution with mean lambda and dispersion 1/shape.
compute shape = 0.8.
compute dispersion = 1 / shape.
compute scale = lambda / shape.
compute mean = rv.gamma(shape, 1/scale).
* A Poisson draw around the gamma-distributed mean gives a negative binomial y.
compute y = rv.poisson(mean).
end case.
end loop.
end file.
end input program.
execute.
delete variables lambda shape scale mean.

*Fit Model.
GENLIN y WITH x
 /MODEL x INTERCEPT=YES DISTRIBUTION=NEGBIN(MLE) LINK=LOG
 /PRINT CPS DESCRIPTIVES MODELINFO FIT SUMMARY SOLUTION (EXPONENTIATED).
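The same gamma-Poisson mixing idea can be sketched outside SPSS. The Python sketch below is a simplified illustration, not a port of the code above: it uses a fixed mean mu instead of a regression on x, a hand-written Poisson sampler (Knuth's multiplication method, suitable only for small to moderate means), and invented function names. It checks that the simulated variance is near mu + mu^2/size.

```python
import math
import random

def rpoisson(lam, rng):
    """Poisson draw via Knuth's multiplication method (small/moderate lam only)."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def rnegbin(mu, size, rng):
    """Negative binomial draw as a Poisson with a gamma-distributed mean."""
    lam = rng.gammavariate(size, mu / size)   # shape = size, scale = mu/size, mean = mu
    return rpoisson(lam, rng)

def simulate(n, mu, size, seed=12345):
    """Sample mean and variance of n simulated negative binomial counts."""
    rng = random.Random(seed)
    draws = [rnegbin(mu, size, rng) for _ in range(n)]
    mean = sum(draws) / n
    var = sum((d - mean) ** 2 for d in draws) / (n - 1)
    return mean, var

mu, size = 3.0, 0.8                    # size plays the role of "shape" above
mean, var = simulate(20000, mu, size)
print("sample mean:", mean, "theoretical:", mu)
print("sample variance:", var, "theoretical:", mu + mu**2 / size)
```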