Hello,
I have a question about using Cox regression with explanatory variables that vary monotonically with analysis time. I am using the National Longitudinal Survey of Youth to look at the effect of having a prison record on one's hazard of first marriage. If possible, I would like to conduct some analyses on a sample limited to respondents who are incarcerated at some point during the panel, but I'm getting some very strange results when I do so.

My independent variable of interest - call it "everjail" - is set to zero for person-years in which the respondent has not yet gone to jail and to one for person-years in which the respondent has already been incarcerated. Analysis time is measured in months of age.

When I analyze my entire sample, the coefficient on everjail is less than one (about .5) and statistically significant. This result persists when I use a number of different combinations of control variables, when I limit my sample to self-reported juvenile delinquents, and in a number of other settings.

I then limit my sample to persons who go to prison at some point during the panel, so that the comparison group for those with a prison record at any given point in time consists of people who have not yet gone to prison but will at some point in the future. Bear in mind that, in this sample, every person's everjail value switches from zero to one at some point during the panel and then remains one for the rest of the panel. This specification attenuates the estimated effect of everjail (the coefficient rises to about .7) and the parameter is no longer significant (t < 1). However, an analysis of Schoenfeld residuals from this model suggested that the effect of everjail is not proportional over time. After poking around a bit, I discovered that I got very different coefficients depending on the age of the respondents I was looking at.
If I split the panel roughly in half, there is a strongly negative (coeff = about .3) and statistically significant estimated effect of past incarceration for younger observations. BUT - and this is where I think something is wrong - there is a very large *positive* and statistically significant coefficient (coeff = about 9) for older observations. This suggests to me that something may not be working as I'd expect for this particular specification.

I encounter roughly similar problems if, rather than splitting up the sample, I use time-varying covariates, or if I include a "goes to prison at some point during the panel" control dummy variable rather than simply limiting the sample to this group. But the problem ONLY occurs if I limit the sample to those who go to prison at some point during the panel. If I look at results for older observations in the sample as a whole, the parameter estimate on everjail is correctly signed. It may be that having gone to prison has no effect on one's probability of marrying, but I find it hard to believe that it has a strongly *positive* effect for older people.

Here's my theory, which I'm hoping I can get some reaction to: both my dependent variable (the hazard of first marriage) and my key independent variable (everjail) are strongly positively correlated with analysis time (age). This is true for obvious reasons for the dependent variable, and it is true for everjail because the sample is limited to people who go to prison at some point during the panel - if you are in this subsample and haven't gone to jail today, you're likely to do so next year, and if you don't do so next year, you're certain to do so sometime after that, and so forth.
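
To make that timing pattern concrete, here is a minimal simulation sketch in Python. It uses made-up data, not the NLSY: the uniform age at first incarceration, the 16-40 age range, and the use of years rather than months are all assumptions for illustration, and exits from the risk set through marriage are ignored. It only shows how, in a subsample where everyone is eventually incarcerated, the share of people with everjail == 1 climbs toward one with age, so the older half of the panel contains far fewer everjail == 0 person-years to compare against.

```python
import random

MIN_AGE, MAX_AGE = 16, 40  # hypothetical panel age range, in years


def simulate_first_jail_ages(n_people=1000, seed=0):
    """Draw a hypothetical age at first incarceration for each person.

    Everyone in this simulated subsample is incarcerated at some age,
    mirroring the 'goes to prison at some point' restriction.
    """
    rng = random.Random(seed)
    return [rng.randint(MIN_AGE, MAX_AGE) for _ in range(n_people)]


def everjail_share(first_jail_ages, age):
    """Share of the subsample with everjail == 1 at a given age."""
    return sum(a <= age for a in first_jail_ages) / len(first_jail_ages)


def zero_person_years(first_jail_ages, ages):
    """Count person-years with everjail == 0 over a range of ages."""
    return sum(sum(a > age for a in first_jail_ages) for age in ages)


jail_ages = simulate_first_jail_ages()
shares = {age: everjail_share(jail_ages, age)
          for age in range(MIN_AGE, MAX_AGE + 1)}

# Split the panel roughly in half by age, as in the split-sample analysis.
split = (MIN_AGE + MAX_AGE) // 2
zero_years_young = zero_person_years(jail_ages, range(MIN_AGE, split))
zero_years_old = zero_person_years(jail_ages, range(split, MAX_AGE + 1))

for age in (16, 22, 28, 34, 40):
    print(f"age {age}: share with everjail == 1 = {shares[age]:.2f}")
print("everjail == 0 person-years, younger half:", zero_years_young)
print("everjail == 0 person-years, older half:  ", zero_years_old)
```

The sketch says nothing about marriage hazards; it only illustrates that the comparison group thins out with age by construction in this subsample.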
So, since both variables are positively correlated with analysis time, and since the "older" sample is limited to people who are guaranteed to have some years in which everjail == 1 (members of the younger sample may not go to prison until they are older) and a certain percentage of whom are also going to marry, is age confounding the relationship between everjail and the hazard of marrying? It doesn't seem like this should be possible, since age - my measure of analysis time - is explicitly being controlled for in the baseline hazard.

So, my basic question is this: if I'm using a Cox model and have an independent variable that - like the dependent variable in a Cox analysis - varies monotonically with analysis time, does that introduce some sort of strange timing issue into the analysis? Should I expect to get odd parameter estimates in a situation like this, or am I doubting my results when in fact I shouldn't be?

I'm stumped, so any and all advice would be most welcome!!

Cheers,
Adam Thomas
John F. Kennedy School of Government, Harvard University