Hello everyone. New member, hoping I will be able to contribute. I am decently versed in old-school SPSS syntax (more used to working with logreg, mixed models and Cox survival), but this is my first time looking at this particular type of analysis. I have been working on this for a while and I think I am making things more complicated than they need to be at this point. Any help you can give me in understanding what type of basic analysis this requires is very much appreciated.
I am conducting what should be a ‘simple’ analysis of count data of a rare condition. The data I have to work with is simply counts by year. I have been asked to identify whether or not this condition is increasing, or the shape of the curve. I have 12 observations, no covariates, no zeros. (Data format is one column year number, one column count.)
Here is what I have done thus far:
My first step was to take a look at the data graphically. I do have one outlier in the second year; however, having few data points and not being asked to make any specific predictions, I am hoping to leave it in for now. (It is in the same direction as the curve, just more extreme on the y axis.) I created two and three year moving averages for data smoothing, and am using the three year moving average (rounded).
Using SPSS I played with the curve estimation function with the original data, two year moving average, and three year moving average and possible curve shapes based on the graphs. I found that a quadratic function fit the data best, a linear function not at all (quadratic significant, R-square = .85). However, I understand the SPSS curve fitting analysis may not be appropriate due to being count data, autocorrelation (does this apply for independent health events? These would be different people each year), etc.
This took me on a chase through various high level trend analysis texts. I am likely overthinking things at this point, as what I need is just a very simple p-value to attach saying yes this is a quadratic relationship, that uses an appropriate (or at least acceptable to reviewers!) test.
Eventually I settled back down into Poisson and negative binomial models, both of which I am attempting to learn. I created a quadratic variable of the three year moving average count (count squared). Not being certain about the dispersion in the data (I have not tested this before, am working on it), I conducted a negative binomial model on the data.
The deviance over its df is .288 and the Pearson chi-square over its df is .161. If I have followed my information correctly this might mean a bit of under(?) dispersion, although I am not certain how large a difference needs to be before it is meaningful. This seems like a small difference. Getting turned around, so I hope I have it right.
While the simple curve fitting above was significant, the negative binomial quadratic model is only significant in the test effects (CI crosses 0), not in the omnibus test. Given the original model was a good fit, I think I am missing something here. Not sure what my next step should be at this point; I am having trouble finding information on this test that does not assume that I am already fairly familiar with Poisson and negative binomial models. I thought I should check in case I am barking up the wrong tree entirely.
Questions:
1) Am I correct in that the curve estimation is a good first step, but would not be appropriate as a ‘final answer’ for obtaining a p-value for a shape of a count curve?
2) Is negative binomial analysis what I am looking for here? Or is there something simpler that I am missing?
3) If negative binomial is correct, what would be my next step? (Checking assumptions/residuals I would assume, possibly trying a log of the count and a log of the year number?) Is there a good step by step source for SPSS for checking dispersion etc.? (Am willing to purchase material if it is useful!)
So many grateful thanks for any help you can give me, even if it is simply directing me to resources!
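For readers following along, here is a minimal sketch of the dispersion check and the linear-versus-quadratic comparison described above, written in plain Python rather than SPSS. Everything in it is an assumption for illustration: the counts are made up (not the data from this post), the model is a plain Poisson regression rather than negative binomial, and the quadratic term is taken to be the square of the centered year number. The helper names `solve`, `fit_poisson` and `deviance` are hypothetical.

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def fit_poisson(X, y, iters=50):
    """Poisson regression with log link, fitted by Newton-Raphson.
    The first column of X is assumed to be the intercept."""
    p = len(X[0])
    beta = [math.log(sum(y) / len(y))] + [0.0] * (p - 1)
    for _ in range(iters):
        mu = [math.exp(sum(b * x for b, x in zip(beta, row))) for row in X]
        grad = [sum((yi - mi) * row[j] for yi, mi, row in zip(y, mu, X))
                for j in range(p)]
        hess = [[sum(mi * row[j] * row[k] for mi, row in zip(mu, X))
                 for k in range(p)] for j in range(p)]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < 1e-10:
            break
    mu = [math.exp(sum(b * x for b, x in zip(beta, row))) for row in X]
    return beta, mu

def deviance(y, mu):
    """Poisson deviance: 2 * sum(y*log(y/mu) - (y - mu))."""
    return 2.0 * sum((yi * math.log(yi / mi) if yi > 0 else 0.0) - (yi - mi)
                     for yi, mi in zip(y, mu))

# Hypothetical yearly counts for years 1..12 (NOT the data from this thread).
counts = [14, 30, 12, 10, 9, 8, 9, 11, 13, 17, 22, 28]
t = [year - 6.5 for year in range(1, 13)]  # centered year number

X_lin = [[1.0, ti] for ti in t]
X_quad = [[1.0, ti, ti * ti] for ti in t]
_, mu_lin = fit_poisson(X_lin, counts)
beta_quad, mu_quad = fit_poisson(X_quad, counts)

# Dispersion statistics for the quadratic model: value / residual df.
df_quad = len(counts) - 3
dev_quad = deviance(counts, mu_quad)
pearson_quad = sum((y - m) ** 2 / m for y, m in zip(counts, mu_quad))

# Likelihood-ratio test of the quadratic term (chi-square, 1 df).
# For 1 df the upper tail probability equals erfc(sqrt(x / 2)).
lr = deviance(counts, mu_lin) - dev_quad
p_value = math.erfc(math.sqrt(max(lr, 0.0) / 2.0))

print("deviance/df:", dev_quad / df_quad)
print("Pearson chi-square/df:", pearson_quad / df_quad)
print("LR statistic for year squared:", lr, "p =", p_value)
```

Ratios near 1 are what a Poisson model expects; values well above 1 suggest overdispersion and values well below 1 suggest underdispersion. The likelihood-ratio p-value in this sketch is only trustworthy to the extent that the Poisson assumption holds, and with 12 observations any such test is approximate.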
So you have a dataset of 12 observations with one observation per year, which is the number of cases of that illness/condition/disease. And, you know nothing about population size each year in which the observed number of cases occurred? So you can't say, for instance, that in 2009, 210 cases occurred in a population of 523,000 in round numbers, let alone accurate numbers? If that is so, then my opinion is that all you can do is use Curvefit, which you've already done, or time series, if you have that module. Pick the best fitting line based on an acceptable criterion.
I'm a rank amateur on count type models, but if you knew the population each year you could then use either genlin or genlinmixed, because one or both allow a base number to be read. However, 12 cases is very small for maximum likelihood. The estimation might not converge or otherwise fail, and even if the estimation 'solved', I don't know how good the solution would be.
Gene Maguin
-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Bcteagirl
Sent: Tuesday, November 25, 2014 12:16 AM
To: [hidden email]
Subject: Appropriate test for identifying a quadratic curve of count data over time.
--
View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Appropriate-test-for-identifying-a-quadratic-curve-of-count-data-over-time-tp5728021.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.
=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L. For a list of commands to manage subscriptions, send the command INFO REFCARD.
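As an illustration of the "base number" idea in the reply above: in a Poisson model the yearly population usually enters as an offset, log(population), so that the model describes a rate rather than a raw count. The sketch below is plain Python, not genlin syntax, and every number in it is hypothetical (the thread has no population data). The function name `fit_poisson_trend` is made up for this example.

```python
import math

def fit_poisson_trend(years, counts, pops, iters=100):
    """Fit log(E[count]) = log(pop) + b0 + b1 * year by Newton-Raphson.
    log(pop) is the offset: its coefficient is fixed at 1."""
    b0 = math.log(sum(counts) / sum(pops))  # start at the overall rate
    b1 = 0.0
    for _ in range(iters):
        mu = [p * math.exp(b0 + b1 * t) for t, p in zip(years, pops)]
        # Score vector (gradient of the log-likelihood).
        g0 = sum(y - m for y, m in zip(counts, mu))
        g1 = sum((y - m) * t for y, m, t in zip(counts, mu, years))
        # Information matrix entries (2 x 2, symmetric).
        h00 = sum(mu)
        h01 = sum(m * t for m, t in zip(mu, years))
        h11 = sum(m * t * t for m, t in zip(mu, years))
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < 1e-12:
            break
    return b0, b1

# Hypothetical data: counts and populations for 12 years (made up).
years = [y - 6.5 for y in range(1, 13)]  # centered year number
counts = [14, 30, 12, 10, 9, 8, 9, 11, 13, 17, 22, 28]
pops = [500000 + 4000 * i for i in range(12)]

b0, b1 = fit_poisson_trend(years, counts, pops)
print("rate at mid-period per 100,000:", 1e5 * math.exp(b0))
print("rate ratio per year:", math.exp(b1))
```

With an offset, exp(b1) reads as the multiplicative change in the rate per year. Without any denominator, as in this thread, only the raw counts can be modeled.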
Thank you for your reply, it is good to have some idea that I am on the right track.
You are correct. These are counts at a hospital, and I don't have any denominator. Not the best data, but it is what I have been asked to work with. Today I have been working on treating the curvefit using time as a time series (I found a teaching resource suggesting that curvefit using time could be used for time series data). I have been testing for autocorrelation in the data. Runs tests around either the mean or the median of the quadratic number of cases are both non-significant (I assume I would use the mean). It is the same with the original number. There is also no evidence of autocorrelation based on looking for values outside the Bartlett two-standard-error bars. The significance test associated with the autocorrelation output was non-significant for all lags except lag 7, which is borderline (.044). There were no significant partial autocorrelations at any lag. Since the other tests are all negative, and my sample is small, I am hoping this will be alright. I believe my next step is to continue with diagnostics. I have saved residuals etc.
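The checks described above can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the series is made up (not the residuals from this thread), the runs test uses the large-sample normal approximation (rough with 12 points), values equal to the cut point are counted as "above", and the standard errors use Bartlett's approximation, which may differ from the option chosen in SPSS. The function names are hypothetical.

```python
import math

def acf(x, max_lag):
    """Sample autocorrelations r_1 .. r_max_lag."""
    n = len(x)
    m = sum(x) / n
    d = [v - m for v in x]
    denom = sum(v * v for v in d)
    return [sum(d[i] * d[i + k] for i in range(n - k)) / denom
            for k in range(1, max_lag + 1)]

def bartlett_se(r, n):
    """Bartlett's approximation: SE(r_k) = sqrt((1 + 2*sum_{j<k} r_j^2) / n)."""
    return [math.sqrt((1.0 + 2.0 * sum(rj * rj for rj in r[:k])) / n)
            for k in range(len(r))]

def runs_test(x, cut):
    """Wald-Wolfowitz runs test around a cut point (normal approximation).
    Returns (number of runs, z statistic, two-sided p-value)."""
    above = [v >= cut for v in x]
    n1 = sum(above)
    n2 = len(x) - n1
    if n1 == 0 or n2 == 0:
        raise ValueError("all values fall on one side of the cut point")
    runs = 1 + sum(1 for a, b in zip(above, above[1:]) if a != b)
    n = n1 + n2
    mean = 2.0 * n1 * n2 / n + 1.0
    var = 2.0 * n1 * n2 * (2.0 * n1 * n2 - n) / (n * n * (n - 1.0))
    z = (runs - mean) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return runs, z, p

# Hypothetical series of 12 yearly values (made up for illustration).
series = [1.2, -0.8, 0.5, -1.1, 0.3, 0.9, -0.4, -0.6, 1.0, -0.2, 0.7, -1.5]

r = acf(series, 7)
se = bartlett_se(r, len(series))
for lag, (rk, sk) in enumerate(zip(r, se), start=1):
    flag = "outside" if abs(rk) > 2.0 * sk else "inside"
    print(f"lag {lag}: r = {rk:+.3f}, 2*SE = {2.0 * sk:.3f} ({flag})")

mean_cut = sum(series) / len(series)
print("runs test around the mean:", runs_test(series, mean_cut))
```

With seven lags each tested at the .05 level, one borderline result can arise by chance even when there is no autocorrelation, which is worth keeping in mind when reading output like the lag 7 value above.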