Several months ago I asked about methods for analyzing proportions and got helpful replies, including references to beta distribution regression. Now I'd like advice on analyzing rates, such as the number of events per thousand person-days.

What I have is 36 months of data on the number of times per month restraints were used, and the number of care-days per month, for adolescents in 24 residential treatment units. Care-days is the sum of the number of kids nominally present each day across a month. Kids are admitted and discharged, so from month to month the individual kids under care change, but more or less slowly. My interest is in relating variation in the unit-level rate to unit characteristics.

I've framed this as a rate problem because the data don't seem to fit a Poisson process as I understand it: a defined, constant sample observed over a fixed interval. I can't see this as a beta distribution problem because there is no true upper bound to the rate. I've found that the 36-month restraint rate varies widely, with the highest rate being 500 times the smallest. Given all this, the best I can think of is to log-transform the unit-level rates to equalize the variance across levels of the IVs and use normal-theory regression. This strikes me as an inadequate solution. Are there better solutions?

Thanks, Gene
Gene,
Are you trying to estimate the monthly rates of restraints placed on adolescents in the 24 residential treatments across the 36 months of the study period? Can you find out the total number of adolescents present on each day of each month in each residential facility? Can you find out which of those adolescents present were restrained each day?
Ryan On Tue, Apr 22, 2014 at 5:29 PM, Maguin, Eugene <[hidden email]> wrote:
In reply to this post by Maguin, Eugene
A Poisson model where the exposure is the number of care days per month and the outcome is the count of restraints per month sounds fine from your description. GENLIN in SPSS allows the specification of an offset variable, which here you would just set equal to the logged number of care days.
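To make the offset idea concrete, here is a minimal sketch in Python rather than SPSS, using made-up counts and care-days (all numbers are hypothetical). With only an intercept and an offset of log(exposure), the Poisson MLE of the intercept is simply the log of the overall rate (total events divided by total exposure), which is what GENLIN's offset specification generalizes when predictors are added.

```python
import math

# Hypothetical restraint counts and care-days for five unit-months
counts = [4, 0, 7, 2, 12]
care_days = [342, 310, 400, 298, 510]

# Intercept-only Poisson with offset log(care_days): the MLE of the
# intercept is the log of the overall rate, log(sum(y) / sum(exposure))
b0 = math.log(sum(counts) / sum(care_days))

# Predicted count for each unit-month is exp(b0 + log(exposure)),
# i.e. the common fitted rate scaled by that month's exposure
predicted = [math.exp(b0 + math.log(e)) for e in care_days]

# Expressed per 1000 care-days, the fitted rate is the same everywhere
rate_per_1000 = 1000 * math.exp(b0)
```

A property worth noticing: the fitted counts sum to the observed counts, which is exactly what a Poisson GLM with a log link and an offset guarantees for the intercept.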
There are various ways to make the counts or rates more normally distributed and more appropriate for OLS. Examples I have seen are:

- take the log of the rate
- take the square or cube root of the count (useful if the denominator is the same for all units)
- weight the responses (more often for continuous outcomes aggregated up; the weight is sometimes the square root of the count used for the denominator in the rate)
- take the log of a rate ratio where there is some baseline rate from which to compare (popular in epidemiology)

I suspect these are based more on field conventions than on any claim to being superior in particular circumstances, so they are just generically mentioned as possibilities. The only general consensus I know of is that if the counts are small and contain zeroes, Poisson models tend to be the preferable choice. Any alternative that involves a log transformation depends on the outcome being greater than 0, and the mean of a Poisson variable needs to be around 10 or higher before it starts looking like a normal distribution. When the mean is lower, no amount of transformation will make it symmetric.

Relevant citations I would recommend are:

Kronmal, R. A. (1993). Spurious correlation and the fallacy of the ratio standard revisited. Journal of the Royal Statistical Society, Series A, 156(3), 379-392.

Osgood, D. W. (2000). Poisson-based regression analysis of aggregate crime rates. Journal of Quantitative Criminology, 16(1), 21-43.

Silva, J. M. C. S. and Tenreyro, S. (2006). The log of gravity. Review of Economics and Statistics, 88(4), 641-658.
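The point about zeroes can be illustrated with a short Python sketch (hypothetical numbers): the log of a rate is undefined whenever the count is zero, the usual "add a small constant" fudge gives answers that depend on the constant chosen, while the Poisson likelihood assigns zero counts a perfectly well-defined probability.

```python
import math

# Hypothetical unit-month counts and care-days, including zero counts
counts = [0, 3, 5, 0, 9]
care_days = [300, 340, 410, 290, 500]

rates = [c / d for c, d in zip(counts, care_days)]

# log(rate) is undefined for zero counts, so they must be dropped...
log_ok = [math.log(r) for r in rates if r > 0]

# ...or fudged with a constant, but the result depends on the constant
fudged_half = [math.log((c + 0.5) / d) for c, d in zip(counts, care_days)]
fudged_one = [math.log((c + 1.0) / d) for c, d in zip(counts, care_days)]

# The Poisson model needs no fudge: P(Y = 0) = exp(-mu) is well defined,
# e.g. for a rate of 0.01 events per care-day over 300 care-days
p_zero = math.exp(-0.01 * 300)
```

The two fudged versions disagree with each other, which is one reason the Poisson approach is usually preferred when zeroes are common.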
I should note in addition that there is nothing that precludes a Poisson model from having a rate greater than 1, if that is what your concern was about in "not being a true rate". The exposure just ends up being an offset that increases the expected value by a fixed amount for that observation; the possible predicted values still have no upper bound in theory.
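A tiny numeric illustration of that last point, with hypothetical values: with an offset of log(exposure), the expected count is exposure times exp(linear predictor), so whenever the linear predictor is positive the expected count exceeds the exposure, i.e. the rate per unit of exposure exceeds 1.

```python
import math

# Hypothetical intercept on the log-rate scale and exposure in care-days
b0 = 0.5
exposure = 30

# Expected count under a log link with offset log(exposure):
# exp(b0 + log(exposure)) = exposure * exp(b0), which here exceeds exposure
expected = exposure * math.exp(b0)
```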
In reply to this post by Ryan
Thanks for your reply, Ryan. I have no kid-level data; all of it is aggregated to the unit level. All I can say for a specific unit is that there were, for example, this many restraints in a given month and that the recorded number of care-days was 342. Certainly the agency would know that information, but I wouldn't expect that we could get it, even if it were de-identified. The main interest is to relate the restraint rate to unit-level characteristics, for instance, whether a certain type of program was used.

Gene
Gene,
Unfortunately, this means you cannot take into account within-adolescent correlation, which will introduce some bias into your standard errors. We also cannot think of N=342 as the "total number of trials." Obvious, but still worth stating explicitly.
I cannot tell if you have residential treatment unit-specific monthly data. If you do, then you have 36 months of data from the 24 treatment facility units. A count regression model [which takes into account the number of adolescents per month per treatment facility] is a good place to start, but I'd need to see the distribution of the data to help you arrive at the optimal model.
Dependencies in your data: you may have at least two sources that induce correlation:

1. Within-treatment-facility correlation
2. Residual correlation conditional on the treatment facility unit-specific effects (e.g., residual autoregressive temporal correlation)
If and how you model such correlation(s) depends on several factors. I will stop now as this post is well outside that which pertains to SPSS.
Ryan On Tue, Apr 22, 2014 at 9:05 PM, Maguin, Eugene <[hidden email]> wrote:
Good point, Ryan. Assuming the correlation within adolescents is positive (e.g., being restrained is "contagious"), it will cause overdispersion in the distribution, suggesting a negative binomial model would be appropriate.
A good description of the problem can be found in:

Berk, R. and MacDonald, J. (2008). Overdispersion and Poisson regression. Journal of Quantitative Criminology, 24(3), 269-284.

It can actually cause not just problems in the standard errors but biased estimates as well. For instance, let's say that some of the institutions in Gene's sample are more likely to have adolescents with mental health problems, and these individuals are more likely to act out (and need to be restrained) when they see others act out and be restrained. If no estimate of said mental health population is included in the model, it could cause other parameter estimates to be biased due to the omitted variable.
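One simple way to screen for the overdispersion being discussed is the Pearson dispersion statistic: the sum of squared Pearson residuals divided by the residual degrees of freedom, which should be near 1 under a correct Poisson model. A minimal Python sketch on hypothetical unit-month counts, assuming for simplicity a constant fitted rate (a real analysis would use the fitted means from the actual model):

```python
# Hypothetical counts with a few "contagion" spikes, constant exposure
counts = [0, 1, 0, 2, 15, 0, 1, 18, 0, 2]
care_days = [300] * 10

rate = sum(counts) / sum(care_days)      # overall events per care-day
mus = [rate * d for d in care_days]      # fitted means under this model

# Pearson statistic: sum of (observed - fitted)^2 / fitted
pearson = sum((y - m) ** 2 / m for y, m in zip(counts, mus))
dispersion = pearson / (len(counts) - 1) # divide by residual df

# dispersion well above 1 signals overdispersion, pointing toward a
# negative binomial (or similar) model rather than plain Poisson
```

On data spiked like this the statistic comes out far above 1, which is the pattern Berk and MacDonald describe.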
In reply to this post by Maguin, Eugene
I once looked at data on "restraints" in long-term psychiatric inpatient units, with daily data. What was impressive in my data was how few people accounted for almost all the events. Two events per month jumps to 20+, and it stays there until one belligerent patient gets transferred. Since you have "residential treatment units", it sounds like you probably face the same sort of problem. The only "unit characteristic" that has great relevance, IMHO, is who they have to admit and can't get rid of.

However, if your data are restricted to aggregate data, you are pretty much stuck, with no information you can develop about repeaters. So I will encourage you to be modest about the inferences that you attempt to draw.

I do wonder: how many of the 24 units have almost no "events"? Perhaps the main contrast should be "few" versus "many", or perhaps a regression is worthwhile if you restrict the sample to only those units with relatively few.

--
Rich Ulrich

Date: Tue, 22 Apr 2014 21:29:42 +0000
From: [hidden email]
Subject: analyses with true rates
To: [hidden email]