Fitting negative binomial regression to continuous data

Natalia
Hello,

I am struggling to fit a model to overdispersed (positively skewed) data in SPSS and would like to ask for your opinion.

I measured for how long a specific behavior (B in [s]) lasted in tested subjects during a fixed time of observation.
There are two independent variables/predictors, i.e., subject's sex (S: male or female) and genotype (G: 1 or 2).

My research question is whether the subject's sex or genotype affects the duration of behavior B and whether the genotype modulates sex's effect.

EXP: B ~ S + G + S*G

My data do not follow the assumptions of the general linear model, so I decided to go with generalized linear models
(as far as I know, regular, non-parametric tests cannot estimate the factors' interaction, in which I am interested).

I cannot use GLMs with gamma distribution since behavior B did not appear for many subjects (B = 0 s), yet these cases are relevant for my experiment.

I tried Poisson GLMs and then ZIP (zero-inflated Poisson), but they do not fit the data appropriately. Negative binomial regression gave the best fit, and here is my question:

My data for B is a continuous variable (time measured in [s]). For the sake of my experiment, I can use integer values (e.g., I can substitute 30.35 s --> 31 s),
but is this the only available approach for me to use NBR, and is it legitimate in your opinion?

Do you have any other ideas on how I can handle this design and these data to estimate the S*G interaction?

I will genuinely appreciate your feedback.

Best regards,


Natalia

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Re: Fitting negative binomial regression to continuous data

David Greenberg
Natalia, to me your plan sounds wacky. Why not do an event history
analysis, restricting your analysis to those cases that exhibited the
phenomenon whose duration you want to study? It makes no sense to include
cases that did not exhibit the phenomenon in a study of how long the
phenomenon lasted. I do not know what SPSS offers for event history
analysis, but I would expect that, as a minimum, it would have Cox
regression. Stata also has a number of different parametric event
history models. David Greenberg, Sociology Dept., NYU

(In reply to Natalia's message of Fri, Mar 12, 2021 at 7:10 PM.)


Re: Fitting negative binomial regression to continuous data

Bruce Weaver
Administrator
Hi David.  Your suggestion of Cox regression makes me wonder if you read
Natalia's post the same way I did.  I understood that she was measuring the
~duration~ of some behaviour, not time to onset of the behaviour (i.e., time
to event).  Cox regression would be appropriate for the latter, but I don't
know how one would use it for the former.  

Your comments did make me think that one approach would be something similar
to a hurdle model, as follows:

1. Binary logistic regression model with Y = occurrence of the
behaviour/phenomenon.

2. Some kind of model with Y = duration in seconds using only those
observations for which the phenomenon occurred.  

When the stage 2 model includes only the cases with the phenomenon of
interest, it may be that an OLS model is fine.  But some other type of model
could be used if OLS is not reasonable and defensible.  Personally, I might
be inclined to use quantile regression.  Another possibility, given that the
length of the period of observation is fixed, would be to treat Y as a
proportion of the total time.  In Stata, one could use -betareg- or
-fracreg- for that type of outcome.  I think one could achieve something
similar with GENLIN in SPSS, perhaps using the events-of-trials
specification for the outcome, but "events" = the duration of the behaviour,
and "trials" being the total time of observation.  
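
A minimal sketch of the two-part idea described above, in Python with only the standard library. The records are made up for illustration and are not Natalia's data. It relies on one property of a saturated 2x2 design with indicator coding: the S*G interaction coefficient of a logistic model equals the difference of differences of the cell log-odds, and the interaction coefficient of an OLS model equals the difference of differences of the cell means. It gives point estimates only; standard errors and tests would still have to come from a fitted model.

```python
import math
from collections import defaultdict
from statistics import mean

# Hypothetical records: (sex, genotype, duration in seconds). Not real data.
# Every cell contains at least one zero and one positive duration, so the
# log-odds in stage 1 are finite.
records = [
    ("F", 1, 0.0), ("F", 1, 12.4), ("F", 1, 30.35), ("F", 1, 0.0),
    ("F", 2, 0.0), ("F", 2, 0.0), ("F", 2, 8.1), ("F", 2, 0.0),
    ("M", 1, 45.0), ("M", 1, 0.0), ("M", 1, 22.7), ("M", 1, 61.2),
    ("M", 2, 5.5), ("M", 2, 0.0), ("M", 2, 0.0), ("M", 2, 17.9),
]


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def identity(x):
    return x


def two_part_summary(rows):
    """Per-cell occurrence rate and mean of the positive durations."""
    cells = defaultdict(list)
    for sex, genotype, duration in rows:
        cells[(sex, genotype)].append(duration)
    summary = {}
    for key, durations in cells.items():
        positive = [d for d in durations if d > 0]
        summary[key] = {
            "n": len(durations),
            "p_occurred": len(positive) / len(durations),
            "mean_positive": mean(positive),
        }
    return summary


def interaction_contrast(summary, field, transform=identity):
    """Difference of differences: (M2 - M1) - (F2 - F1) on the chosen scale."""
    def value(sex, genotype):
        return transform(summary[(sex, genotype)][field])
    return (value("M", 2) - value("M", 1)) - (value("F", 2) - value("F", 1))


summary = two_part_summary(records)
# Stage 1: did the behaviour occur at all? Interaction on the log-odds scale.
stage1_interaction = interaction_contrast(summary, "p_occurred", logit)
# Stage 2: how long did it last, given that it occurred? Interaction in seconds.
stage2_interaction = interaction_contrast(summary, "mean_positive")

print(f"stage 1 interaction (log-odds): {stage1_interaction:.4f}")
print(f"stage 2 interaction (seconds):  {stage2_interaction:.4f}")
```

The per-cell "n" and the count of positive durations in each cell are also the stage 1 and stage 2 sample sizes.
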

By the way, Natalia, what would the sample sizes be for the two stages I
described above?  

Cheers,
Bruce


--
Sent from: http://spssx-discussion.1045642.n5.nabble.com/

--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING: 
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).

Re: Fitting negative binomial regression to continuous data

David Greenberg
Bruce, I understand Natalia's post the same way you do, but would
stand by my suggestion. If she has a start time and an end time, she
can use an event history approach to study the factors that influence
the duration of the phenomenon. David Greenberg

(In reply to Bruce Weaver's message of Fri, Mar 12, 2021 at 8:42 PM.)


Re: Fitting negative binomial regression to continuous data

Jon Peck
In reply to this post by Bruce Weaver
Another approach to consider might be Heckman regression, which is available in SPSS Statistics as an extension command.

(In reply to Bruce Weaver's message of Fri, Mar 12, 2021 at 6:42 PM.)

--
Jon K Peck
[hidden email]


Re: Fitting negative binomial regression to continuous data

Rich Ulrich
In reply to this post by Natalia
The notion of a GLM is that there is /some/ equation with linear terms
that describe the phenomenon ... with the existence of equal intervals
in the predictor equation having an "equal effect" when accounted for
by some transformed metric and specific computations of error of fit.

One straightforward approach to seeing what is there, at all, is to make it
two questions -- 0 vs. other, and predicting quantity among "other."  This
is a good thing to look at because, what you get from any other approach
that has a single answer will be some weighted composite of those two
answers.  You have only two predictors, so it is not so obvious as it would be
with five or ten, that you may have solutions that are silly to combine into
one equation.

Where duration is not zero, what does its distribution look like? This is,
you say, an event duration within a fixed interval. What jumps out at me is the
prospect that your "event length" might deserve treatment as logistic, bounded
at zero and max. What can you say about event length, for or against that?
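
To illustrate the "bounded at zero and max" point, here is a small Python sketch that expresses a duration as a proportion of the observation window and puts it on the logit scale. The window length of 300 s is a placeholder, since the thread does not state the actual length, and the durations are made up. The transform is undefined when the behaviour never occurred or filled the whole window, so those cases would need separate handling.

```python
import math

# Hypothetical window length; the thread does not state the real value.
OBSERVATION_SECONDS = 300.0


def logit_of_proportion(duration_s, total_s=OBSERVATION_SECONDS):
    """Return log(p / (1 - p)) for p = duration / total, or None at the bounds."""
    p = duration_s / total_s
    if p <= 0.0 or p >= 1.0:
        # The logit is undefined at exactly 0 and exactly 1.
        return None
    return math.log(p / (1.0 - p))


# Made-up durations in seconds, including both boundary cases.
durations = [0.0, 30.35, 150.0, 299.0, 300.0]
transformed = [logit_of_proportion(d) for d in durations]

for d, t in zip(durations, transformed):
    label = "undefined" if t is None else f"{t:.3f}"
    print(f"{d:7.2f} s -> logit of proportion: {label}")
```
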

DO YOU expect that the same things (in general) that predict length should
predict the event? 

Are you looking at something that does have some strong and obvious effects
which you are trying to fit to a model, or are you just scrambling?

--
Rich Ulrich

(In reply to Natalia's message sent Friday, March 12, 2021 2:17 AM.)

Re: Fitting negative binomial regression to continuous data

Jon Peck
Heckman censored regression is a generalization of tobit regression. Censoring is modeled using probit analysis, and the observed outcomes are modeled with regression.
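
One common way to estimate a model of this kind is Heckman's two-step procedure: a probit model for whether the outcome is observed, followed by a regression on the observed cases that includes the inverse Mills ratio as an extra predictor. Whether the SPSS extension uses this estimator or full maximum likelihood is not stated in this thread, so treat the following only as a sketch of the correction term. The probit index values are hypothetical.

```python
from statistics import NormalDist

# Standard normal distribution (mean 0, standard deviation 1).
_STD_NORMAL = NormalDist()


def inverse_mills_ratio(probit_index):
    """phi(z) / Phi(z) for a probit linear predictor z."""
    return _STD_NORMAL.pdf(probit_index) / _STD_NORMAL.cdf(probit_index)


# Hypothetical probit linear predictors for three subjects whose
# behaviour was observed (duration > 0).
probit_indices = [-1.0, 0.0, 1.5]
correction_terms = [inverse_mills_ratio(z) for z in probit_indices]

for z, imr in zip(probit_indices, correction_terms):
    print(f"z = {z:+.1f}  inverse Mills ratio = {imr:.4f}")
```

The ratio shrinks as the probability of observing the behaviour grows, so subjects who were very likely to show the behaviour receive only a small correction.
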

(In reply to Rich Ulrich's message of Fri, Mar 12, 2021 at 11:07 PM.)

--
Jon K Peck
[hidden email]
