Negative Binomial Regression

Kathryn Gardner

Dear List,
I am using negative binomial regression and would appreciate some input on how to run and interpret the analysis. Excuse my ignorance, but I am struggling to find information:

Running the analysis:
1) Under "Type of Model", what is the difference between selecting the "negative binomial with log link" model, where the parameter is fixed at 1, and a custom model where the parameter is estimated? Which is the correct option, and under what circumstances?
2) I tried both of the options above, and in both cases the deviance/df ratio was below 1, suggesting underdispersion, I believe. I read something in the context of Poisson regression suggesting that this can be resolved by calculating the scale as the inverse of the deviance/df (Compute pscale=1/2.2033) and then refitting the model using pscale as the "Scale Weight Variable" under the Response tab. Can this also be used in negative binomial regression? And if so, what exactly does it do?
3) I wondered whether a zero-inflated negative binomial regression might be more appropriate, but I'm not sure. My response is an eating disorder variable: 198 out of 235 participants scored 0, indicating that they do not engage in purging behaviours such as vomiting, and the remainder of the sample scored between 1 and 21 (Mean = .71, SD = 2.51). If a zero-inflated model is more appropriate, how do I run this in SPSS?
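
As a rough check on the figures in item 3, the sketch below compares the observed share of zeros with the share implied by a Poisson distribution and by a negative binomial (NB2) distribution that merely match the reported mean and SD. It is a back-of-the-envelope illustration under stated assumptions, not a fitted model: it ignores covariates, uses a method-of-moments value for the dispersion parameter (called `alpha` here, a name not used in the post), and all variable names are illustrative.

```python
import math

# Summary statistics reported in the post above.
n_total = 235
n_zero = 198
mean = 0.71
sd = 2.51

variance = sd ** 2
observed_zero_share = n_zero / n_total

# Poisson: the variance equals the mean and P(Y = 0) = exp(-mean).
poisson_zero_share = math.exp(-mean)

# NB2 (assumed parameterisation): variance = mean + alpha * mean^2.
# A method-of-moments value for alpha follows from the reported mean and SD,
# and P(Y = 0) = (1 + alpha * mean) ** (-1 / alpha).
alpha = (variance - mean) / mean ** 2
negbin_zero_share = (1 + alpha * mean) ** (-1 / alpha)

print(f"variance / mean          = {variance / mean:.2f}")
print(f"observed share of zeros  = {observed_zero_share:.3f}")
print(f"Poisson share of zeros   = {poisson_zero_share:.3f}")
print(f"NB2 share of zeros       = {negbin_zero_share:.3f}")
```

Under these assumptions the Poisson share of zeros comes out near 0.49 and the moment-matched negative binomial share near 0.82, against roughly 0.84 observed. A large number of zeros therefore does not by itself show that a zero-inflated model is required, although a marginal check like this cannot replace comparing fitted models.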

Interpreting the output:
1) Is there any guidance on how small deviance should be to indicate good model fit?
2) The regression coefficients in the output appear to be unstandardised. Is there a way to produce standardised estimates other than standardising the variables beforehand?


Many thanks.
Kathryn

Re: Negative Binomial Regression

Ryan
Kathryn,

Lots could be said, but before we get too far down this road: exactly what is your response variable (e.g., number of times or days purged), and is there an absolute upper limit? Also, what does the distribution look like for the non-zero values? Another question: could it be argued that everybody in your sample is "at risk" of scoring 1 or higher?

Ryan

In reply to Kathryn Gardner's message of Nov 10, 2011, 5:24 AM.

Re: Negative Binomial Regression

Ryan
Kathryn,

If you take the mean of 3 questions, wouldn't it almost certainly be possible to end up with non-integer values? I do not see how a count regression model, Poisson or negative binomial, would be appropriate here.

By the way, although you stated that "the distributions are shown below," I do not see any illustration. Anyway, let's assume for the moment that you do in fact have count data for your dependent variable that could range from 0 to positive infinity.

As I'm sure you are aware, Poisson regression assumes that the variance is equal to the expected value. Often this assumption is not tenable; typically the variance is greater than the expected value (overdispersion), although underdispersion is possible as well. Determining the cause of overdispersion or underdispersion is not always straightforward, and there are several ways to account for it. One is to fit a model that relaxes the assumption that the mean equals the variance (e.g., the negative binomial model, which accommodates overdispersion).
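
The mean-variance assumptions mentioned above can be written out as follows. This is a sketch in commonly used notation rather than anything taken from the thread: $\mu$ is the expected count, $\phi$ a scale parameter and $\alpha$ a negative binomial dispersion parameter, and the negative binomial is assumed to be in its usual "NB2" form.

```latex
\begin{aligned}
\text{Poisson:} \quad & \operatorname{Var}(Y) = \mu \\
\text{Overdispersed (quasi-)Poisson:} \quad & \operatorname{Var}(Y) = \phi\,\mu \\
\text{Negative binomial (NB2):} \quad & \operatorname{Var}(Y) = \mu + \alpha\,\mu^{2}, \qquad \alpha \ge 0
\end{aligned}
```

Because $\alpha$ cannot be negative, the negative binomial variance is never smaller than the Poisson variance, so in this form it can absorb overdispersion but not underdispersion.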

One could fit a standard Poisson regression with the scale parameter fixed at 1.0 via GENLIN, and then fit a negative binomial model in which the dispersion (ancillary) parameter is freely estimated rather than fixed. In addition to examining the estimated dispersion parameter, since the Poisson regression is nested in the negative binomial regression (it is the limiting case as the dispersion parameter approaches zero), one could also construct a likelihood ratio test. There are other options in SPSS that I'll skip over for the moment (e.g., fitting an overdispersed Poisson regression model by specifying a Poisson regression in GENLIN but allowing the scale parameter to be estimated).
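
The likelihood ratio comparison described above can be sketched outside SPSS in a few lines. The example below is a minimal illustration under several assumptions that are not from the thread: the counts are made up, both models are intercept-only, the negative binomial uses the NB2 form (variance = mu + alpha * mu^2), and the dispersion parameter is found by a crude grid search rather than a proper optimiser. All function and variable names are hypothetical.

```python
import math

def poisson_loglik(counts, mu):
    """Log-likelihood of an intercept-only Poisson model with mean mu."""
    return sum(y * math.log(mu) - mu - math.lgamma(y + 1) for y in counts)

def negbin_loglik(counts, mu, alpha):
    """Log-likelihood of an intercept-only NB2 model (variance = mu + alpha * mu^2)."""
    r = 1.0 / alpha
    return sum(
        math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
        + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu))
        for y in counts
    )

# Hypothetical counts (NOT the poster's data): many zeros and a few large values.
counts = [0] * 40 + [1, 1, 2, 3, 5, 8, 12, 21]

# In both intercept-only models the ML estimate of the mean is the sample mean.
mu_hat = sum(counts) / len(counts)

# Crude grid search for the dispersion parameter over 0.01, 0.02, ..., 50.
alphas = [0.01 * k for k in range(1, 5001)]
alpha_hat = max(alphas, key=lambda a: negbin_loglik(counts, mu_hat, a))

ll_poisson = poisson_loglik(counts, mu_hat)
ll_negbin = negbin_loglik(counts, mu_hat, alpha_hat)
lr_stat = 2 * (ll_negbin - ll_poisson)

# Upper tail of chi-square with 1 df is erfc(sqrt(x / 2)). The p-value is halved
# because the Poisson model sits on the boundary (alpha = 0) of the NB2 model.
p_value = 0.5 * math.erfc(math.sqrt(lr_stat / 2))

print(f"alpha_hat = {alpha_hat:.2f}, LR = {lr_stat:.2f}, p = {p_value:.3g}")
```

A small p-value here favours the negative binomial over the Poisson model. With real data and covariates the two log-likelihoods would come from the fitted regression output instead of these helper functions.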

There are still many questions that remain unanswered. First and foremost, do you actually have count data? Second, what do those distributions look like? If you have a spike at zero for both, with right skew for the positive values, then you might consider a zero-inflated model. Unfortunately, as far as I'm aware, SPSS is not capable of fitting zero-inflated Poisson or negative binomial models. Third, it sounds like there is an absolute upper limit for your dependent variables. This is not consistent with the Poisson or NB distributions, which assign a non-zero probability to values greater than the actual maximum. Fourth, from your description, I wouldn't be surprised if the two dependent variables are correlated. Fitting a single model which allows for correlation between the dependent variables might be considered.
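
To make the idea of a zero-inflated model concrete, here is a minimal sketch of a zero-inflated Poisson probability function. It assumes the usual mixture formulation, which the thread does not spell out: with probability `pi_zero` an observation is a "structural" zero (someone not at risk of the behaviour at all), and otherwise the count follows an ordinary Poisson distribution. The parameter values and names are hypothetical.

```python
import math

def poisson_pmf(y, mu):
    """Poisson probability of observing the count y when the mean is mu."""
    return math.exp(y * math.log(mu) - mu - math.lgamma(y + 1))

def zip_pmf(y, mu, pi_zero):
    """Zero-inflated Poisson: structural zero with probability pi_zero,
    otherwise a draw from Poisson(mu)."""
    base = (1 - pi_zero) * poisson_pmf(y, mu)
    return pi_zero + base if y == 0 else base

# Hypothetical parameter values, chosen only for illustration.
mu = 4.0
pi_zero = 0.8

p_zero = zip_pmf(0, mu, pi_zero)
total = sum(zip_pmf(y, mu, pi_zero) for y in range(200))
overall_mean = (1 - pi_zero) * mu

print(f"P(Y = 0) = {p_zero:.4f}")
print(f"sum of probabilities over 0..199 = {total:.6f}")
print(f"overall mean = {overall_mean:.2f}")
```

The zero probability combines both sources of zeros, which is why the question of whether everyone in the sample is "at risk" matters when choosing such a model. A zero-inflated negative binomial has the same structure with the Poisson part replaced by a negative binomial.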

Ryan 

On Mon, Nov 21, 2011 at 12:33 PM, Kathryn Gardner <[hidden email]> wrote:
Thank you for your response, Ryan, and apologies for the delay in getting back to you. I have 2 response variables I want to run an NBR on: 1) purging, measured using the mean of 3 questions that ask about the number of times purged on average during the past few months, from 0-14, and 2) binge eating, measured using the mean of 3 questions, two of which use a dichotomous yes (1) / no (0) response, while the third uses a 0-14 scale as purging does.

The distributions are shown below. If I remove the zero values it confirms the picture below, which is moderate positive skewness and kurtosis for both, although the values are below 3.

In answer to your question, "could it be argued that everybody in your sample is 'at risk' of scoring a 1 or higher?", the answer is no. They would need to score higher than this to be considered at risk of eating problems.

Many thanks! Hope you can shed some light on these issues for me.
Kathryn





In reply to Ryan's message of Fri, 11 Nov 2011 00:05:19 -0500.


Re: Negative Binomial Regression

Kathryn Gardner
Sorry, I meant the sum of the items, not the mean, so both variables include only integers. Both distributions have a spike at zero with right skew for the positive values, which is why I thought I needed a zero-inflated model, but you have confirmed my suspicion that this is not available in SPSS. That said, there probably is an absolute upper limit for the dependent variable given the time frame of the past 3 months (there is probably a maximum number of times a person could engage in these behaviours, even at the extreme end). In that case, it seems NBR isn't the right model. I hadn't considered this. I did want to run OLS regression, but the variables were so skewed and overdispersed, with lots of zeros, that I had to look into other procedures. I did try transforming both variables, but the transformations did not correct the problem.
Kathryn


In reply to Ryan's message of Fri, 25 Nov 2011 07:58:55 -0500.