SPSSX Discussion

Poisson assumptions, hessian matrix

lken

Poisson assumptions, hessian matrix

Hello everyone,

I am stuck on a number of things with Poisson regression, and I'm hoping I can get some help.

I am fitting a Poisson regression to continuous data: the amount of rot in trees in three different areas. I want to find the differences between the areas.

My data had many zeros, so I added a constant (1) and log-transformed the values, then fit a Poisson regression. My chi-squared statistic is 0.678, which I have read is acceptable.

Here are my main questions:

Can I proceed with a Poisson model if I get the warning "Hessian Matrix is singular, some convergence criteria are not met"?

How else do I check whether my data can be used for a Poisson regression?

And secondly, given that I am trying to find differences between groups, how would I report my results? Can I ask for pairwise comparisons in the GLM and just report those with standard errors?

Poes, Matthew Joseph-2

Re: Poisson assumptions, hessian matrix

If you log-transformed it, isn't it no longer Poisson distributed? Didn't the log transformation convert it to a normal distribution? My only work with Poisson data is with counts of something, often money. In that case, if I take the log transformation, then I don't do a Poisson regression, as the data are normally distributed (something you should test and verify). If I am working only with data that are Poisson distributed, then I may do a Poisson regression. I'll say I'm no expert on this particular issue; my experience with data like this has been limited, so I only recently (maybe in the last year or two) learned of ways to deal with it, and largely that meant learning ways of making it work with linear models.

Matthew J Poes
Research Data Specialist
Center for Prevention Research and Development
University of Illinois
510 Devonshire Dr.
Champaign, IL 61820
Phone: 217-265-4576
email: [hidden email]

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of lken
Sent: Monday, April 02, 2012 1:19 PM
To: [hidden email]
Subject: Poisson assumptions, hessian matrix

--
View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Poisson-assumptions-hessian-matrix-tp5613256p5613256.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.

=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD

lken

Re: Poisson assumptions, hessian matrix

Thanks for the quick reply.

The log transformation did not make my data normal because my data basically range from 1-6, with a few larger values (10-20). The log transformation tends to work only when the data span a wider range than that.

Additionally, about 45% of my response data are zeros. When I log transformed my data (after I had added a constant of 1 to all my points), the zeros became 0 again.

Log10(1)=0
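The point about the zeros can be checked with a few lines of Python. This is only a minimal sketch: the `rot_cm` values are made up for illustration and are not the poster's data.

```python
import math

# Hypothetical rot measurements in cm; half are zero (illustrative only).
rot_cm = [0, 0, 0, 0, 1.5, 2, 3, 6, 0, 12]

# log10(x + 1) maps 0 to log10(1) = 0, so the spike of zeros is unchanged.
transformed = [math.log10(x + 1) for x in rot_cm]

zeros_before = sum(1 for x in rot_cm if x == 0)
zeros_after = sum(1 for t in transformed if t == 0)
print(zeros_before, zeros_after)
```

Both counts are the same, which is why the add-one-and-log step does nothing about the excess of zeros.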

Does anyone know of a better way I can find differences between groups when I have many zeros in a continuous dataset?

Poes, Matthew Joseph-2

Re: Poisson assumptions, hessian matrix

What about using a ranking method instead? Kruskal-Wallis or something like that?

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of lken
Sent: Monday, April 02, 2012 1:48 PM
To: [hidden email]
Subject: Re: Poisson assumptions, hessian matrix

Kornbrot, Diana

Re: Poisson assumptions, hessian matrix

In reply to this post by lken

Ordinal regression is much the best option for this kind of data.
Best,
Diana

On 2 Apr 2012, at 07:49 PM, "lken" <[hidden email]> wrote:

lken

Re: Poisson assumptions, hessian matrix

In reply to this post by Poes, Matthew Joseph-2

KS would work, but I also have covariates that I want to add to the model. I have just made another post that changes my analysis methods a bit. Maybe someone has an idea there.

Garry Gelade

Re: Poisson assumptions, hessian matrix

In reply to this post by Poes, Matthew Joseph-2

Why don't you explore the generalized linear model (GENLIN)? This can
handle a Poisson distribution without the need to transform the data, and if
you have an excess of zeros, you can use the Negative Binomial option
instead of the Poisson.

Garry Gelade

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
Poes, Matthew Joseph
Sent: 02 April 2012 19:27
To: [hidden email]
Subject: Re: Poisson assumptions, hessian matrix

lken

Re: Poisson assumptions, hessian matrix

My data do not fit a Poisson distribution because they are overdispersed. I would like to use a negative binomial distribution, but I believe that is only for count data, correct? My data are continuous.
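For readers who do have true counts, a rough screen for overdispersion is the ratio of the sample variance to the sample mean, which should be near 1 under a Poisson model. The sketch below is an assumption-laden illustration: the `counts` values and the helper name `dispersion_ratio` are hypothetical, and in a fitted model one would look at the Pearson chi-square or deviance divided by its degrees of freedom rather than this raw ratio.

```python
from statistics import mean, variance

def dispersion_ratio(counts):
    """Sample variance divided by sample mean; near 1 under a Poisson model."""
    return variance(counts) / mean(counts)

# Hypothetical integer counts with many zeros and a few large values.
counts = [0, 0, 0, 0, 0, 1, 2, 3, 6, 15]
ratio = dispersion_ratio(counts)
print(round(ratio, 2))  # a value well above 1 suggests overdispersion
```

This ignores group membership and covariates, so it is only a first look and not a formal test.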

My covariates are also pretty far from normally distributed (especially years since death, because it was recorded for a site, not for each particular tree). There doesn't seem to be a transformation that will change this. Does anyone know if a logistic regression is robust enough that I can still use year of death as a covariate?

Poes, Matthew Joseph-2

Re: Poisson assumptions, hessian matrix

I'm not sure I understand why that would be a problem? What is your concern?

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of lken
Sent: Tuesday, April 03, 2012 11:08 AM
To: [hidden email]
Subject: Re: Poisson assumptions, hessian matrix

lken

Re: Poisson assumptions, hessian matrix

Not quite sure which part you are referring to.

A suggestion was made that I could use a Poisson distribution or a negative binomial model. I was just saying that I can't, because my data do not fit a Poisson distribution, and as far as I'm aware I can't use a negative binomial model because I do not have count data; my data are scale data (cm of rot). Is this correct?

My second question was that my covariates do not meet the test for homogeneity of variance. Is it wise to just ignore this fact in a binary logistic model? Are there any studies that say a binary logistic model can be robust to this?

Poes, Matthew Joseph-2

Re: Poisson assumptions, hessian matrix

I see, sorry, I was referring to the logistic regression, and didn't understand what you were asking. Logistic regression doesn't assume homoscedasticity. It's not an OLS model.

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of lken
Sent: Tuesday, April 03, 2012 11:27 AM
To: [hidden email]
Subject: Re: Poisson assumptions, hessian matrix

Rich Ulrich

Re: Poisson assumptions, hessian matrix

In reply to this post by lken

You did not mention before now, that you were not working
with counts. Poisson distributions are based on counts, so
you must not have a chance for a Poisson model, either.

Maximum likelihood equations do not formally require
normality or homogeneity of variance for the tests to be
valid. But any time that you are creating a linear prediction
equation, your ability to "make sense" or have a well-behaved
prediction is going to rely on having equal intervals on "scales"
in use, either as predictors or as criterion. Inevitably, that
affects the testing, too, whenever the *residuals* are not
well-behaved.

If you have "cm" of rot, *perhaps* taking the log is
reasonable -- but probably not a log with any add-on.
So the log only applies to the non-zero values, and you
have a two-variable criterion, where the second variable
is "Zero vs. non-zero". One *strategy* where the zero is
an extreme, below the numeric range, is to create a composite
score: Composite = log(cm) + (0 for "zero cm"; K = constant
for non-zero cm of rot).

The ideal value for K is whatever value places "zero cm" at
an *appropriate* distance from the other scores. It will be
larger if the rot/not-rot distinction is especially important.
Or it will be larger if you elect to measure cm as mm or microns,
since the numbers in your model will depend on the units of measure.
It could be as small as 0 if you decided that no-rot was equivalent
to having the measured minimum amount of rot.
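
The composite score described above can be sketched in Python. This is one reading of the formula, not a definitive implementation: it assumes the score is 0 for trees with no rot and log(cm) + K otherwise, uses the natural log, and the function name `composite_score`, the sample values, and K = 1.0 are all hypothetical.

```python
import math

def composite_score(rot_cm, k):
    """Return 0 for no rot, and log(cm) + K for trees with measurable rot."""
    if rot_cm == 0:
        return 0.0
    return math.log(rot_cm) + k

# Hypothetical measurements in cm (illustrative only).
rot_cm = [0, 0, 1, 2, 6, 20]
scores = [composite_score(x, k=1.0) for x in rot_cm]
print(scores)
```

With K = 0 and a minimum measured value of 1 cm, a tree with 1 cm of rot scores log(1) + 0 = 0, the same as a tree with no rot, which matches the remark that K can be as small as 0.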

--
Rich Ulrich

lken

Re: Poisson assumptions, hessian matrix

Ok, this is starting to make some more sense. Thanks for the advice. I agree that adding a constant and taking the log is not really a good option; it's just something I saw on another blog.

I'm fairly new to statistics, but I think I understand the equation you are proposing. My issue is that by adding K after logging cm of rot, I would still have an overwhelming number of zeros (because I consider no rot to be the minimal rot measured).

I am going to bounce the idea of coding the trees 0 for not extensive rot and 1 for extensive rot, and fitting a binary logistic model, off my professor. Doing something like this is probably more at my level of statistics.

Swank, Paul R

Re: Poisson assumptions, hessian matrix

There is such a thing as a zero-inflated Poisson or negative binomial. The latter might be useful if you used mm and rounded to an integer.

Dr. Paul R. Swank,
Children's Learning Institute
Professor, Department of Pediatrics, Medical School
Adjunct Professor, School of Public Health
University of Texas Health Science Center-Houston

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of lken
Sent: Tuesday, April 03, 2012 1:46 PM
To: [hidden email]
Subject: Re: Poisson assumptions, hessian matrix

Bruce Weaver

Re: Poisson assumptions, hessian matrix

Administrator

In reply to this post by lken

Here's a page with some guidelines for "Regression Models for Discrete and Limited Dependent Variables".

http://division.aomonline.org/rm/1997_forum_regression_models.html

The "sample-selected" regression described there seems to fit your situation pretty closely.

"Sample selected outcomes refer to the situation where responses to a continuous variable (Y) are conditional on a dichotomous variable (Z). Consider the example where a researcher wants to study the work-related correlates of alcohol consumption. One could argue that an alcohol use variable that ranges from zero to some large positive value is confounding two distinct variables. The first variable (Z) is a dichotomy and represents the decision to drink alcohol (0 = nondrinker, 1 = drinker). The second variable (Y) is continuous and represents the amount of alcohol consumed among drinkers. "

The author then suggests a two-step approach:

1) Fit a model for the dichotomous variable (rot vs no rot in your case); and
2) Fit a linear regression for the cases where the dichotomous variable = 1 (i.e., Y = amount of rot in trees with rot > 0).

The author suggests probit regression for step 1, but I would use logistic regression instead.
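
The data preparation for the two steps above can be sketched in Python. This only builds the two outcomes and prints descriptive summaries per area; the actual logistic and linear regressions would still be fitted in a statistics package. The `trees` records and the helper `by_area` are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (area, rot in cm) records; illustrative only.
trees = [("A", 0), ("A", 2.0), ("A", 0),
         ("B", 5.0), ("B", 1.0), ("B", 0),
         ("C", 0), ("C", 0), ("C", 12.0)]

# Step 1 outcome: rot vs no rot, defined for every tree.
step1 = [(area, 1 if cm > 0 else 0) for area, cm in trees]
# Step 2 outcome: amount of rot, kept only for trees with rot > 0.
step2 = [(area, cm) for area, cm in trees if cm > 0]

def by_area(rows):
    """Average the second element of each row within each area."""
    groups = defaultdict(list)
    for area, value in rows:
        groups[area].append(value)
    return {area: mean(values) for area, values in sorted(groups.items())}

prop_with_rot = by_area(step1)       # what the step 1 model describes
mean_rot_given_rot = by_area(step2)  # what the step 2 model describes
print(prop_with_rot)
print(mean_rot_given_rot)
```

Note that step 2 uses fewer trees than step 1, since the zeros are dropped.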

HTH.

--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING:
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).

lken

Re: Poisson assumptions, hessian matrix

Thank you, this looks like it's a great solution for my problem. After I take the zeros out I don't have very many samples left, but they meet the assumptions for homogeneity.

Thanks to everyone who put their two cents in, I have been struggling with this for a long time now and really appreciate finally being on the right path.

Ryan

Re: Poisson assumptions, hessian matrix

In reply to this post by Bruce Weaver

Bruce is describing what is sometimes referred to as a two-part or "hurdle model." However, unlike Bruce's suggestion of fitting two models separately, one could fit both simultaneously. Indeed, some research has shown that fitting both simultaneously can be beneficial.

Typically the hurdle component is modeled by a logistic regression while the count component is simultaneously modeled by a zero-truncated Poisson or Negbin. However, I see no reason for the zero-truncated component to have to be based on a count distribution. Anyway, fitting a hurdle model (or the closely related zero-inflated model) requires maximization of a non-standard likelihood function. I know of a couple of procedures in SAS that are capable of fitting such models (e.g., NLMIXED, MCMC)--not sure about SPSS.

Whether the benefits of fitting a hurdle model outweigh the benefits of fitting the models separately depends on several factors, not the least of which is having access to software allowing one to specify the non-standard log-likelihood.
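
To make the "non-standard log-likelihood" concrete, here is a minimal Python sketch of a two-part log-likelihood for a continuous outcome with zeros. Everything specific in it is an assumption for illustration: the positive amounts are taken to be log-normal (a choice not made anywhere in this thread), there are no covariates, and the function name `hurdle_loglik` and the sample values are hypothetical. A real fit would maximize this over its parameters with an optimizer.

```python
import math

def hurdle_loglik(y, p, mu, sigma):
    """Two-part log-likelihood: P(y > 0) = p, and log(y) ~ Normal(mu, sigma) when y > 0."""
    total = 0.0
    for value in y:
        if value == 0:
            # Zero part: probability of not crossing the hurdle.
            total += math.log(1.0 - p)
        else:
            # Positive part: hurdle probability times the log-normal density.
            z = (math.log(value) - mu) / sigma
            total += math.log(p)
            total += -math.log(value * sigma * math.sqrt(2.0 * math.pi)) - 0.5 * z * z
    return total

# Hypothetical data in which half the observations are zero.
y = [0, 0, 1.0, 2.5]
ll_half = hurdle_loglik(y, p=0.5, mu=0.5, sigma=1.0)
ll_high = hurdle_loglik(y, p=0.9, mu=0.5, sigma=1.0)
print(ll_half > ll_high)  # p = 0.5 fits a half-zero sample better than p = 0.9
```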

One final point that's been bugging me--In general, there is no normality assumption about independent variables in regression. I don't know why that's been floating around in this discussion, but thought this should be dispelled. Perhaps I read that statement out of context.

Ryan

On Apr 3, 2012, at 3:19 PM, Bruce Weaver <[hidden email]> wrote:

Bruce Weaver

Re: Poisson assumptions, hessian matrix

Administrator

I don't disagree with Ryan's suggestion that there are probably benefits to fitting one model. As Jon P said to me off-list, tobit regression could be used -- it is also mentioned on the web-page I gave in my earlier post. From Jon:

Tobit is available in Statistics as the SPSSINC TOBIT REGR extension command or Analyze > Regression > Tobit Regression if the R Essentials are installed.

The reason I suggested the two-step approach was that lken said, "I'm fairly new to statistics, but I think I am understanding the equation you are proposing." With that in mind, I was thinking that the simpler two-step approach might be more "useful" than the more complicated approach of running a single model. Of course, I am thinking of George Box's famous statement:

"All models are wrong. Some are useful."

HTH.
Bruce

lken

Re: Poisson assumptions, hessian matrix

I think the normality idea was floating around because, after I did a regression with three categorical variables, I wanted post hoc tests to see where the differences were. I was told this would be like doing an ANCOVA and my variables would need to be properly distributed.

I am going with Bruce's suggestion simply because it is the one that makes the most sense to me. I will, however, bring up fitting a simultaneous model with my professor.

I have another question and I'm not sure how to phrase it, so I'm just going to tell everyone what I did and hope my explanation is not too scrambled.

I performed the analyses and am getting some interesting results for differences between groups. To take the analysis further, I went into one of the groups to test differences between its subgroups.

Essentially, I was looking at the subgroups in just one group at a time. The analysis became three subgroups as predictors, two continuous covariates, and one binary dependent variable. I performed a logistic regression and found that one of the covariates was significant and one was not, so I dropped the non-significant one and got some good results.

Then I went into another group with only two subgroups. In this group, the covariate that was significant before is no longer significant. This is my first question: should I drop it even though I know it was significant in the other group and I am essentially doing the same test?

My second question is that when I take the covariates out of the model, I run out of degrees of freedom in the chi-squared goodness-of-fit test. Can you not do a regression with only two groups? Is the only other option a non-parametric test?