Poisson – negative Binomial

Poisson – negative Binomial

drfg2008
WinVista / SPSS20

The following problem: I would like to approximate a Poisson distribution with a negative binomial distribution.

For example:

Lambda (mean) = 3
(Variance = 3)
x = 5
cumulative = yes

The cumulative probability P(X <= 5) is 0,91608206


In Excel: =POISSON(x;lambda;TRUE)

In SPSS:

COMPUTE p_poisson=CDF.POISSON(x,lambda).
EXECUTE.
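
The cumulative value quoted above can be cross-checked outside SPSS. The following is a minimal sketch in plain Python (the function names are illustrative, not part of any SPSS or Excel API). It also prints the upper tail P(X > 5), which is what subtracting the CDF from 1 would give.

```python
import math

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson variable with mean lam."""
    return math.exp(-lam) * lam ** x / math.factorial(x)

def poisson_cdf(x, lam):
    """P(X <= x): sum the probability mass from 0 up to and including x."""
    return sum(poisson_pmf(i, lam) for i in range(x + 1))

p_cum = poisson_cdf(5, 3.0)   # lambda = 3, x = 5, as in the example
p_upper = 1.0 - p_cum         # P(X > 5), the upper tail
print(round(p_cum, 8), round(p_upper, 8))
```
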

But how can I compute (approximate) it with a negative binomial function in SPSS?
Dr. Frank Gaeth


Re: Poisson – negative Binomial

David Marso
Administrator
Is there not a CDF.NEGBIN function in SPSS?
What are you really attempting to do, Frank?
You have posed some variant of this question several times in recent weeks.
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"

Re: Poisson – negative Binomial

drfg2008
The problem is that the mean and variance of the Poisson distribution are the same. We found (and the literature says) that the variance is slightly higher than the mean ("overdispersion"), so the literature suggests using negbin instead.

But I have no clue how to use negbin instead of Poisson in SPSS.

The data is simple:

Lambda (mean) = 3
(Variance = 3 or a bit higher)
x = 5
cumulative = yes


Does it make sense to you?

Frank
Dr. Frank Gaeth


Re: Poisson – negative Binomial

Bruce Weaver
Administrator
In reply to this post by David Marso
David Marso wrote
Is there not a CDF.NEGBIN function in SPSS?
Yes.  In my copy of the fine manual (v19), it is found under Universals - Transformation Expressions - Random variable and distribution functions.

--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING: 
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).

Re: Poisson – negative Binomial

David Marso
Administrator
My post was more of a Socratic prodding ;-)
It appears in my 11.5 FM as well:
"NEGBIN Negative binomial distribution. The negative binomial distribution takes
one threshold parameter, a, and one success probability parameter, b.
Parameter a must be an integer and parameter b must be greater than 0
and less than or equal to 1. The CDF, PDF, and RV functions are available,
where q is the number of trials needed (including the last trial) before a
successes are observed. If a=1, it is a geometric distribution."

I have *NO* idea what parameterization will yield a Poisson approximation.
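
One way to look at this question numerically: if the threshold a is allowed to grow while the success probability is set to a/(a + lambda), the mean stays at lambda and the negative binomial approaches the Poisson. The sketch below assumes the failures-counting form of the density (number of failures before a successes) and uses illustrative function names; it is not SPSS code.

```python
import math

def negbin_pmf(x, size, prob):
    """P(X = x failures before `size` successes); log-gamma keeps large sizes stable."""
    log_p = (math.lgamma(x + size) - math.lgamma(size) - math.lgamma(x + 1)
             + size * math.log(prob) + x * math.log(1.0 - prob))
    return math.exp(log_p)

def negbin_cdf(x, size, prob):
    return sum(negbin_pmf(i, size, prob) for i in range(x + 1))

def poisson_cdf(x, lam):
    return sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(x + 1))

lam, x = 3.0, 5
target = poisson_cdf(x, lam)
gaps = {}
for size in (1, 10, 100, 1000):
    prob = size / (size + lam)      # keeps the negative binomial mean at lam
    gaps[size] = abs(negbin_cdf(x, size, prob) - target)
    print(size, round(prob, 4), round(gaps[size], 6))
```

The printed gap between the two cumulative probabilities shrinks as the threshold grows, which is the sense in which a large threshold gives a Poisson approximation.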

Re: Poisson – negative Binomial

drfg2008
Thanks David.

If I use "Distribution Fit" (Extension Bundle - R) and fit the data with the negbin distribution, R computes an estimate for

1) size and 2) mu

-------------------------------------------------
Variable: Score_Sum   Distribution: negative binomial
              size       mu
Estimate      100,003    2,775
Std. Error    114,112    ,078
Number of valid cases: 472, out of 479 total cases
-------------------------------------------------

So, my question would be how to estimate these two parameters.
It should be possible.


Frank

(merry christmas, by the way)
Dr. Frank Gaeth


Re: Poisson – negative Binomial

drfg2008

... as a supplement to the above message, a description from Wikipedia:


The negative binomial distribution, especially in its alternative parameterization described above, can be used as an alternative to the Poisson distribution. It is especially useful for discrete data over an unbounded positive range whose sample variance exceeds the sample mean. In such cases, the observations are overdispersed with respect to a Poisson distribution, for which the mean is equal to the variance. Hence a Poisson distribution is not an appropriate model. Since the negative binomial distribution has one more parameter than the Poisson, the second parameter can be used to adjust the variance independently of the mean.
http://en.wikipedia.org/wiki/Negative_binomial_distribution

The mean of the negbin equals the lambda of the Poisson. But how can the dispersion (size) be estimated with SPSS?

Frank
Dr. Frank Gaeth


Re: [SPSSX-L] Re: Poisson – negative Binomial

Jon K Peck
In reply to this post by drfg2008
The negative binomial distribution has two equivalent parameterizations.  What you are getting from STATS DISTFIT is
the mean, mu, and size, the dispersion parameter, where prob = size/(size+mu). The variance is mu + mu^2/size in this parameterization.

So it's easy to go from mu and size to the mean (= mu) and variance, or to solve for size from estimated moments: size = mu^2/(variance - mu), which requires variance > mu.

The other parameterization, which is what the Statistics functions use, is with prob = p and size = n.  The density is
Γ(x+n)/(Γ(n) x!) p^n (1-p)^x
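
The conversion between the two parameterizations can be sketched in a few lines of Python. The function names are illustrative only, and the input values are the DISTFIT estimates reported earlier in the thread (mu = 2.775, size = 100.003, read with decimal commas).

```python
def negbin_from_mu_size(mu, size):
    """Convert (mu, size) to prob and the implied variance."""
    prob = size / (size + mu)
    variance = mu + mu ** 2 / size
    return prob, variance

def size_from_moments(mean, variance):
    """Method-of-moments size; only defined when the data are overdispersed."""
    if variance <= mean:
        raise ValueError("no overdispersion: variance must exceed the mean")
    return mean ** 2 / (variance - mean)

mu, size = 2.775, 100.003
prob, variance = negbin_from_mu_size(mu, size)
size_back = size_from_moments(mu, variance)
print(round(prob, 4), round(variance, 4), round(size_back, 3))
```

With a size this large the variance is only slightly above the mean, so these particular estimates describe a distribution that is very close to Poisson.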

HTH,

Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
new phone: 720-342-5621




(Quoted message from drfg2008 (Dr. Frank Gaeth, FU-Berlin), sent 12/24/2011 12:30 PM, omitted; it appears in full earlier in this thread.)

--
View this message in context:
http://spssx-discussion.1045642.n5.nabble.com/Poisson-negative-Binomial-tp5085330p5099580.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD



Re: [SPSSX-L] Re: Poisson – negative Binomial

drfg2008


That's exactly what I need. Great.

However, I checked statistical textbooks and searched the internet, but couldn't find the parameterization of the negbin density function (mean mu and size) that is used by STATS DISTFIT.

Could you give me a hint where to find ...

a) the density function
b) the cumulative distribution function

that is used by STATS DISTFIT, and

c) does SPSS have a random variable function (for COMPUTE) that is based on this parameterization (mean mu and size)?
    I only know this version: COMPUTE negbin_var=RV.NEGBIN(threshold,prob).


(hope you understand what I mean)

Frank
Dr. Frank Gaeth


Re: [SPSSX-L] Re: [SPSSX-L] Re: Poisson – negative Binomial

Jon K Peck
From the underlying package doc...

The negative binomial distribution with size = n and prob = p has density

Γ(x+n)/(Γ(n) x!) p^n (1-p)^x

for x = 0, 1, 2, …, n > 0 and 0 < p ≤ 1.

This represents the number of failures which occur in a sequence of Bernoulli trials before a target number of successes is reached.

A negative binomial distribution can arise as a mixture of Poisson distributions with mean distributed as a gamma distribution (see pgamma) with scale parameter (1 - prob)/prob and shape parameter size. (This definition allows non-integer values of size.) In this model prob = 1/(1+scale), and the mean is size * (1 - prob)/prob.

The alternative parametrization (often used in ecology) is by the mean mu, and size, the dispersion parameter, where prob = size/(size+mu). The variance is mu + mu^2/size in this parametrization or n (1-p)/p^2 in the first one.

Wikipedia has full documentation on this IIRC.

In Statistics, see the RV.NEGBIN, CDF.NEGBIN, and PDF.NEGBIN functions.  These do not use the parameterization used by DISTFIT, so you would have to convert the parameters according to the formula given above.
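
A second point to watch when converting: the manual text quoted earlier in this thread says the SPSS quantile q counts trials (including the last one), whereas the density above counts failures x. If that description applies, x failures correspond to q = x + threshold. The Python sketch below checks this shift numerically with illustrative function names and an integer threshold; whether a given SPSS version accepts a non-integer threshold is not settled in this thread.

```python
import math

def cdf_failures(x, size, prob):
    """P(at most x failures before `size` successes), using the density quoted above."""
    total = 0.0
    for i in range(x + 1):
        log_p = (math.lgamma(i + size) - math.lgamma(size) - math.lgamma(i + 1)
                 + size * math.log(prob) + i * math.log(1.0 - prob))
        total += math.exp(log_p)
    return total

def cdf_trials(q, thresh, prob):
    """P(at most q trials are needed to reach `thresh` successes); thresh is an integer."""
    return sum(math.comb(t - 1, thresh - 1) * prob ** thresh * (1.0 - prob) ** (t - thresh)
               for t in range(thresh, q + 1))

mu, size, x = 3.0, 10, 5
prob = size / (size + mu)                      # conversion from the (mu, size) form
by_failures = cdf_failures(x, size, prob)
by_trials = cdf_trials(x + size, size, prob)   # shift the quantile by the threshold
print(round(by_failures, 6), round(by_trials, 6))
```

Both numbers agree, so under this reading the SPSS call for "at most x goals" would take x + threshold as its first argument, with threshold playing the role of size.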

Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
new phone: 720-342-5621




(Quoted message from drfg2008, sent 12/27/2011 06:18 AM, omitted; it appears in full earlier in this thread.)


Re: [SPSSX-L] Re: [SPSSX-L] Re: Poisson – negative Binomial

drfg2008
Thank you.

Am I right that the estimate of mu equals the estimate of lambda (I take this from DISTFIT)?

mu = lambda = arithmetic mean.

That means the alternative parameterization of the negbin and the Poisson distribution have the same location parameter and differ only in dispersion [parameter: size]. So, if you have lambda (Poisson), you have mu (negbin).

Frank
Dr. Frank Gaeth


Re: [SPSSX-L] Re: [SPSSX-L] Re: Poisson – negative Binomial

drfg2008
In reply to this post by Jon K Peck
I knew it would be somewhat complicated.

The problem comes from sports betting. A bookmaker wants to determine how high to set the odds in ice hockey. The odds are calculated as 1 / p. The following is known:

1. The expected mean number of goals in a game (μ, or lambda if the distribution is Poisson; thanks to David for the iteration, which works fine)

2. The threshold k (k = 5 goals)

3. The probability of up to 5 goals in a game: P(X <= k).

Also, the results (goals) of the past 2,000 games are known.

The estimate for point 3 (probability of up to 5 goals) comes from the market. The market is considered efficient, and therefore the best estimate. Hence the slightly roundabout approach.

The task now is to calculate the probability that no more than 4 goals (alternatively: no more than 6 goals) are scored.

With the Poisson distribution this is easily calculated, because only lambda and the threshold k must be known (both are known). With the Poisson distribution, the SPSS functions would simply be:

CDF.POISSON (k-1, LAMBDA_F) /* up to 4 goals
CDF.POISSON (k +1, LAMBDA_F) /* up to 6 goals

However, the distribution of goals in ice hockey is not exactly Poisson. It is assumed that the dispersion is slightly too large (overdispersion). Therefore, a negative binomial distribution is to be used (or any other suitable distribution).

The problem with negbin is that the SPSS function requires three arguments here:

CDF.NEGBIN (QUANT, THRESH, PROB)

Although size could be estimated iteratively from prob = size / (size + mu), there would still remain the problem of calculating THRESH.

- lost in translation
Dr. Frank Gaeth


Re: [SPSSX-L] Re: [SPSSX-L] Re: Poisson – negative Binomial

Ryan
Frank,

Do you have SAS?

Ryan

(Quoted message from drfg2008, sent Dec 28, 2011, 5:26 AM, omitted; it appears in full earlier in this thread.)

Re: [SPSSX-L] Re: [SPSSX-L] Re: Poisson – negative Binomial

drfg2008
I could get a licence, however since all the programming is done with SPSS, a solution in SPSS would be wonderful.

Frank
Dr. Frank Gaeth


Re: [SPSSX-L] Re: [SPSSX-L] Re: Poisson – negative Binomial

drfg2008
Thank you, Matthew Zack!

However, the point is to get all estimates from the market (not from the database), because the market is the best estimate (efficient-market hypothesis). That's why I tried to get lambda from the "5 point line", that is, the odds for up to 5 goals in one game.

For example: A bookmaker (the market) offers 1,85 Euro if Team1 and Team2 together score not more than 5 goals in a certain game.

With Poisson it is no problem to get the equivalent lambda from k = 5 (goals) and the given odds (in this example 1,85 -> p(cumulative) = 1/1,85 ~ 54%). From lambda and k I can compute all other probabilities, for example for "not more than 4 goals" or "not more than 6 goals".
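
The step from odds to lambda described here can be sketched as a one-dimensional root search. The Python below is only an illustration with hypothetical function names (it is not the iteration David supplied, which is not shown in this thread), and like the post it takes 1/odds as the probability without adjusting for any bookmaker margin.

```python
import math

def poisson_cdf(k, lam):
    return sum(math.exp(-lam) * lam ** i / math.factorial(i) for i in range(k + 1))

def implied_lambda(k, p_target, lo=1e-9, hi=50.0, tol=1e-12):
    """Find lam with poisson_cdf(k, lam) = p_target by bisection.
    The CDF falls as lam rises, so the root stays bracketed by [lo, hi]."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if poisson_cdf(k, mid) > p_target:
            lo = mid      # CDF still too high: lambda must be larger
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

odds, k = 1.85, 5
p_market = 1.0 / odds                 # about 54%, as in the example
lam = implied_lambda(k, p_market)
print(round(lam, 4))
print(round(poisson_cdf(k - 1, lam), 4), round(poisson_cdf(k + 1, lam), 4))
```

The last line prints the probabilities for "not more than 4 goals" and "not more than 6 goals" under the implied lambda.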

The only problem with Poisson is that the data are slightly "overdispersed", so using negbin would be more appropriate. So how can I estimate the negbin parameters from the market, from which I only know the odds (cumulative p) and k?
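
One difficulty is that a single market probability gives one equation, while the negative binomial has two parameters. A possible workaround, offered here only as an assumption and not as something proposed in the thread, is to fix size from another source (for example the historical games) and solve for mu alone. In the sketch below `size_assumed = 20.0` is a purely hypothetical value and the function names are illustrative.

```python
import math

def negbin_cdf(k, size, mu):
    """P(X <= k) in the (mu, size) parameterization, with prob = size / (size + mu)."""
    prob = size / (size + mu)
    total = 0.0
    for x in range(k + 1):
        log_p = (math.lgamma(x + size) - math.lgamma(size) - math.lgamma(x + 1)
                 + size * math.log(prob) + x * math.log(1.0 - prob))
        total += math.exp(log_p)
    return total

def implied_mu(k, p_target, size, lo=1e-6, hi=50.0):
    """Bisection on mu with size held fixed; the CDF decreases as mu increases."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if negbin_cdf(k, size, mid) > p_target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

size_assumed = 20.0                   # hypothetical dispersion, NOT taken from the thread
k, p_market = 5, 1.0 / 1.85
mu = implied_mu(k, p_market, size_assumed)
print(round(mu, 4))
print(round(negbin_cdf(k - 1, size_assumed, mu), 4),
      round(negbin_cdf(k + 1, size_assumed, mu), 4))
```

If odds for a second line (say, up to 4 goals) were available from the market as well, both parameters could in principle be solved for, but that goes beyond what this sketch does.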


---------------------------------------------


Matthew Zack wrote (off-list):

Under the negative binomial distribution, the probability of zero goals, P(0),
equals

    P(0) = prob**size (using your terminology below).

Since you have the results of 2000 games, you know P(0).  Thus,

    ln P(0) = size*ln(prob) -> size = ln P(0) / ln(prob).

You can then use the following recursive formula to calculate P(1), P(2), etc.:

    P(x+1) = P(x)*(1 - prob)*(x+size)/(x+1).

For example,

    P(1) = P(0)*(1 - prob)*size,

    P(2) = P(1)*(1 - prob)*(1+size)/2, etc.

You can then calculate the cumulative distribution function values by summing
up these successive values.

Matthew Zack
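
The recursive calculation can be sketched in Python as follows. The sketch uses the ratio P(x+1)/P(x) = (1 - prob)(x + size)/(x + 1), which follows from the density Γ(x+n)/(Γ(n) x!) p^n (1-p)^x quoted earlier in the thread, and checks the result against direct evaluation of that density. The parameter values and function names are illustrative assumptions.

```python
import math

def negbin_probs_recursive(size, prob, x_max):
    """P(0), P(1), ..., P(x_max), built up from P(0) = prob**size."""
    probs = [prob ** size]
    for x in range(x_max):
        probs.append(probs[-1] * (1.0 - prob) * (x + size) / (x + 1))
    return probs

def negbin_pmf_direct(x, size, prob):
    log_p = (math.lgamma(x + size) - math.lgamma(size) - math.lgamma(x + 1)
             + size * math.log(prob) + x * math.log(1.0 - prob))
    return math.exp(log_p)

size, prob = 10.0, 10.0 / 13.0            # illustrative values with mean 3
probs = negbin_probs_recursive(size, prob, 6)
cdf_5 = sum(probs[:6])                     # P(X <= 5): sum of P(0) through P(5)
size_back = math.log(probs[0]) / math.log(prob)   # size recovered from P(0)
print(round(cdf_5, 6), round(size_back, 6))
```
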
Dr. Frank Gaeth


Re: [SPSSX-L] Re: [SPSSX-L] Re: Poisson – negative Binomial

Ryan
I was suggesting you consider using SAS because it is capable (via the NLMIXED procedure) of doing exactly what Matthew was suggesting off-list; that is, summing the estimated probabilities. 

Ryan

(Quoted message from drfg2008, sent Wed, Dec 28, 2011 at 11:12 AM, omitted; it appears in full earlier in this thread.)

Re: [SPSSX-L] Re: [SPSSX-L] Re: Poisson – negative Binomial

drfg2008
Thanks everybody,

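I found a solution at last.

-> using the gamma distribution (as an approximation of the negative binomial distribution), compute scale and shape from µ and the variance

scale = µ / variance
shape = µ * scale

With these definitions the gamma mean is shape/scale and the variance is shape/scale^2, so "scale" acts as a rate (inverse scale) parameter here.

---------------------------------------
see:
Schlittgen, Rainer: Statistik (5th edition), Oldenbourg, 1995, p. 225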
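
The moment matching can be checked with a few lines of Python. This is a sketch with illustrative names; the variance of 3.5 is an assumed example of "a bit higher than 3" and does not come from the thread. The quantity the post calls scale is named `rate` in the code because the gamma mean works out to shape divided by it.

```python
def gamma_from_moments(mu, variance):
    """Match a gamma distribution to a given mean and variance."""
    rate = mu / variance      # the post's "scale"
    shape = mu * rate         # equals mu**2 / variance
    return shape, rate

mu, variance = 3.0, 3.5       # assumed example values
shape, rate = gamma_from_moments(mu, variance)
mean_back = shape / rate          # gamma mean
var_back = shape / rate ** 2      # gamma variance
print(round(shape, 4), round(rate, 4), mean_back, var_back)
```
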

I never imagined I would find the solution in my stats books from my college days.

Frank
Dr. Frank Gaeth


Re: [SPSSX-L] Re: [SPSSX-L] Re: Poisson – negative Binomial

Ryan
This thread led me to consider how one might generate data from a negative binomial regression equation using SPSS. I did some research online, and then came up with the simulation code provided below my name in this post. Consider this an initial attempt that might require modifications.

It should be noted that the fixed effects estimates and dispersion parameter estimate obtained from the GENLIN procedure are very close to the values used in the simulation code. 

Hope this is of interest to others.

Ryan
--

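*Generate data: negative binomial counts as a Poisson-gamma mixture.
set seed 89765432.
new file.
input program.

loop ID = 1 to 100000.

   * True regression coefficients and a standard normal predictor.
   compute b0 = 1.2.
   compute b1 = -1.8.
   compute x = rv.normal(0,1).
   * Conditional mean on the log link.
   compute lambda = exp(b0 + b1*x).
   * Gamma shape; the negative binomial dispersion is its reciprocal.
   compute shape = 0.8.
   compute dispersion = 1 / shape.
   compute scale = lambda / shape.
   * Gamma-distributed mean with expectation lambda, then a Poisson draw.
   compute mean  = rv.gamma(shape, 1/scale).
   compute y = rv.poisson(mean).

   end case.
end loop.
end file.
end input program.
execute.

delete variables lambda shape scale mean.

*Fit model: estimates should be close to b0, b1 and dispersion above.
GENLIN y WITH x
  /MODEL x INTERCEPT=YES
  DISTRIBUTION=NEGBIN(MLE) LINK=LOG
  /PRINT CPS DESCRIPTIVES MODELINFO FIT SUMMARY SOLUTION (EXPONENTIATED).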
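
For readers without SPSS, the same Poisson-gamma mixture idea can be sketched in plain Python. This is a simplified illustration, not a port of the code above: it holds lambda fixed at an assumed value of 3 instead of using a covariate, fits no model, and uses a small hand-written Poisson sampler because the standard library has none. Note that `random.gammavariate` takes a shape and a scale, so the gamma mean is shape times scale.

```python
import math
import random

def poisson_draw(rng, mean):
    """Knuth's multiplication method; adequate for the moderate means used here."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count

def negbin_draw(rng, lam, shape):
    # scale = lam / shape gives the gamma-distributed mean an expectation of lam
    mixed_mean = rng.gammavariate(shape, lam / shape)
    return poisson_draw(rng, mixed_mean)

rng = random.Random(89765432)
lam, shape, n = 3.0, 0.8, 20000
ys = [negbin_draw(rng, lam, shape) for _ in range(n)]
mean_y = sum(ys) / n
var_y = sum((y - mean_y) ** 2 for y in ys) / (n - 1)
# Theoretical variance of the mixture: lam + lam**2 / shape
print(round(mean_y, 3), round(var_y, 3), lam + lam ** 2 / shape)
```

The sample mean should land near lam and the sample variance near lam + lam^2/shape, which is the overdispersion that the dispersion parameter 1/shape describes.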

(Quoted message from drfg2008, sent Thu, Dec 29, 2011 at 11:41 AM, omitted; it appears in full earlier in this thread.)