Several months ago I asked about methods for analyzing proportions and got helpful replies, including references to beta distribution regression. Now I'd like advice on analyzing rates, such as the number of events per thousand person-days.

What I have is 36 months of data on the number of times per month restraints were used, and the number of care-days per month, for adolescents in 24 residential treatment units. Care-days is the sum of the number of kids nominally present each day across a month. Kids are admitted and discharged, so from month to month the individual kids under care change, but more or less slowly. My interest is in relating variation in the unit-level rate to unit characteristics.

I've framed this as a rate problem because the data don't seem to fit a Poisson process as I understand it: a defined, constant sample observed over a fixed interval. I can't see this as a beta distribution problem because there is no true upper bound to the rate. I've found that the 36-month restraint rate varies widely, with the highest rate being 500 times the smallest. Given all this, the best I can think of is to log-transform the unit-level rates to equalize the variance across levels of the IVs and use normal-theory regression. This strikes me as an inadequate solution. Are there better solutions?

Thanks, Gene
Gene,
Are you trying to estimate the monthly rates of restraints placed on adolescents in the 24 residential treatments across the 36 months of the study period? Can you find out the total number of adolescents present on each day of each month in each residential facility? Can you find out which of those adolescents present were restrained each day?
Ryan On Tue, Apr 22, 2014 at 5:29 PM, Maguin, Eugene <[hidden email]> wrote:
In reply to this post by Maguin, Eugene
A Poisson model where the exposure is the number of care days per month and the outcome is the count of restraints per month sounds fine from your description. GENLIN in SPSS allows the specification of an offset variable, which here you would just set equal to the logged number of care days.
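To make the offset idea concrete, here is a minimal sketch in Python rather than SPSS, using made-up counts and care-days (all numbers are hypothetical). With only an intercept and an offset of log(exposure), the Poisson MLE of the intercept is simply the log of the overall rate (total events divided by total exposure), which is what GENLIN's offset specification generalizes when predictors are added.

```python
import math

# Hypothetical restraint counts and care-days for five unit-months
counts = [4, 0, 7, 2, 12]
care_days = [342, 310, 400, 298, 510]

# Intercept-only Poisson with offset log(care_days): the MLE of the
# intercept is the log of the overall rate, log(sum(y) / sum(exposure))
b0 = math.log(sum(counts) / sum(care_days))

# Predicted count for each unit-month is exp(b0 + log(exposure)),
# i.e. the common fitted rate scaled by that month's exposure
predicted = [math.exp(b0 + math.log(e)) for e in care_days]

# Expressed per 1000 care-days, the fitted rate is the same everywhere
rate_per_1000 = 1000 * math.exp(b0)
```

A property worth noticing: the fitted counts sum to the observed counts, which is exactly what a Poisson GLM with a log link and an offset guarantees for the intercept.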
There are various ways to make the counts or rates more normally distributed and more appropriate for OLS. Examples I have seen are:

- take the log of the rate
- take the square or cube root of the count (useful if the denominator is the same for all units)
- weight the responses (more often for continuous outcomes aggregated up; the weight is sometimes the square root of the count used for the denominator in the rate)
- take the log of a rate ratio where there is some baseline rate from which to compare (popular in epidemiology)

I suspect these are based more on field conventions than on any claim to being superior in particular circumstances, so they are just generically mentioned as possibilities. The only general consensus I know of is that if the counts are small and contain zeroes, Poisson models tend to be the preferable choice. Any alternative that involves a log transformation depends on the outcome being greater than 0, and the mean of a Poisson variable needs to be around 10 or higher before it starts looking like a normal distribution. When the mean is lower, no amount of transformation will make it symmetric.

Relevant citations I would recommend are:

Kronmal, R. A. (1993). Spurious correlation and the fallacy of the ratio standard revisited. Journal of the Royal Statistical Society, Series A, 156(3), 379-392.

Osgood, D. W. (2000). Poisson-based regression analysis of aggregate crime rates. Journal of Quantitative Criminology, 16(1), 21-43.

Silva, J. M. C. S. and Tenreyro, S. (2006). The log of gravity. Review of Economics and Statistics, 88(4), 641-658.
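The point about zeroes can be illustrated with a short Python sketch (hypothetical numbers): the log of a rate is undefined whenever the count is zero, the usual "add a small constant" fudge gives answers that depend on the constant chosen, while the Poisson likelihood assigns zero counts a perfectly well-defined probability.

```python
import math

# Hypothetical unit-month counts and care-days, including zero counts
counts = [0, 3, 5, 0, 9]
care_days = [300, 340, 410, 290, 500]

rates = [c / d for c, d in zip(counts, care_days)]

# log(rate) is undefined for zero counts, so they must be dropped...
log_ok = [math.log(r) for r in rates if r > 0]

# ...or fudged with a constant, but the result depends on the constant
fudged_half = [math.log((c + 0.5) / d) for c, d in zip(counts, care_days)]
fudged_one = [math.log((c + 1.0) / d) for c, d in zip(counts, care_days)]

# The Poisson model needs no fudge: P(Y = 0) = exp(-mu) is well defined,
# e.g. for a rate of 0.01 events per care-day over 300 care-days
p_zero = math.exp(-0.01 * 300)
```

The two fudged versions disagree with each other, which is one reason the Poisson approach is usually preferred when zeroes are common.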
I should note in addition that there is nothing that precludes a Poisson model from having a rate greater than 1, if that is what your concern was about in "not being a true rate". The exposure just ends up being an offset that increases the expected value by a fixed amount for that observation; the possible predicted values still have no upper bound in theory.
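A tiny numeric illustration of that last point, with hypothetical values: with an offset of log(exposure), the expected count is exposure times exp(linear predictor), so whenever the linear predictor is positive the expected count exceeds the exposure, i.e. the rate per unit of exposure exceeds 1.

```python
import math

# Hypothetical intercept on the log-rate scale and exposure in care-days
b0 = 0.5
exposure = 30

# Expected count under a log link with offset log(exposure):
# exp(b0 + log(exposure)) = exposure * exp(b0), which here exceeds exposure
expected = exposure * math.exp(b0)
```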
In reply to this post by Ryan
Thanks for your reply, Ryan. I have no kid-level data; all of it is aggregated to the unit level. All I can say for a specific unit is that there were, for example, this many restraints in a given month and that the recorded number of care-days was 342. Certainly the agency would know that information, but I wouldn't expect that we could get it, even if it were de-identified. The main interest is to relate the restraint rate to unit-level characteristics, for instance, whether a certain type of program was used.

Gene
Gene,
Unfortunately, this means you cannot take into account within-adolescent correlation, which will introduce some bias into your standard errors. We also cannot think of N=342 as the "total number of trials." Obvious, but still worth stating explicitly.
I cannot tell if you have residential treatment unit-specific monthly data. If you do, then you have 36 months of data from the 24 treatment facility units. A count regression model [which takes into account the number of adolescents per month per treatment facility] is a good place to start, but I'd need to see the distribution of the data to help you arrive at the optimal model.
Dependencies in your data: you may have at least two sources that induce correlation:

1. Within-treatment-facility correlation
2. Residual correlation conditional on the treatment facility unit-specific effects (e.g., residual autoregressive temporal correlation)
If and how you model such correlation(s) depends on several factors. I will stop now as this post is well outside that which pertains to SPSS.
Ryan On Tue, Apr 22, 2014 at 9:05 PM, Maguin, Eugene <[hidden email]> wrote:
Good point, Ryan. Assuming the correlation within adolescents is positive (e.g., being restrained is "contagious"), it will cause overdispersion in the distribution, suggesting a negative binomial model would be appropriate.
A good description of the problem can be found in:

Berk, R. and MacDonald, J. (2008). Overdispersion and Poisson regression. Journal of Quantitative Criminology, 24(3), 269-284.

It can actually cause not just problems in the standard errors but biased estimates as well. For instance, let's say that some of the institutions in Gene's sample are more likely to have adolescents with mental health problems, and these individuals are more likely to act out (and need to be restrained) when they see others act out and be restrained. If no estimate of said mental health population is included in the model, it could cause other parameter estimates to be biased due to the omitted variable.
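One simple way to screen for the overdispersion being discussed is the Pearson dispersion statistic: the sum of squared Pearson residuals divided by the residual degrees of freedom, which should be near 1 under a correct Poisson model. A minimal Python sketch on hypothetical unit-month counts, assuming for simplicity a constant fitted rate (a real analysis would use the fitted means from the actual model):

```python
# Hypothetical counts with a few "contagion" spikes, constant exposure
counts = [0, 1, 0, 2, 15, 0, 1, 18, 0, 2]
care_days = [300] * 10

rate = sum(counts) / sum(care_days)      # overall events per care-day
mus = [rate * d for d in care_days]      # fitted means under this model

# Pearson statistic: sum of (observed - fitted)^2 / fitted
pearson = sum((y - m) ** 2 / m for y, m in zip(counts, mus))
dispersion = pearson / (len(counts) - 1) # divide by residual df

# dispersion well above 1 signals overdispersion, pointing toward a
# negative binomial (or similar) model rather than plain Poisson
```

On data spiked like this the statistic comes out far above 1, which is the pattern Berk and MacDonald describe.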
In reply to this post by Maguin, Eugene
I once looked at data on "restraints" in long-term psychiatric inpatient units, with daily data. What was impressive in my data was how few people accounted for almost all the events. Two events per month jumps to 20+, and it stays there until one belligerent patient gets transferred. Since you have "residential treatment units", it sounds like you probably face the same sort of problem. The only "unit characteristic" that has great relevance, IMHO, is who they have to admit and can't get rid of.

However, if your data are restricted to aggregate data, you are pretty much stuck, with no information you can develop about repeaters. So I will encourage you to be modest about the inferences that you attempt to draw.

I do wonder: how many of the 24 units have almost no "events"? Perhaps the main contrast should be "few" versus "many", or perhaps a regression is worthwhile if you restrict the sample to only those units with relatively few.

--
Rich Ulrich

Date: Tue, 22 Apr 2014 21:29:42 +0000
From: [hidden email]
Subject: analyses with true rates
To: [hidden email]