Cox Regression Using Explanators That Vary Monotonically With Analysis Time

Adam Thomas-4
Hello,

I've got a question regarding the use of Cox regression with explanators
that vary monotonically with analysis time.  I am using the National
Longitudinal Survey of Youth to look at the effect of having a prison record
on one's hazard of first marriage.  If possible, I would like to conduct some
analyses for a sample that is limited to observations that are incarcerated
at some point during the panel.  But I'm getting some very strange results
when I do so.  My independent variable of interest - call it "everjail" - is
set equal to zero for person-years who have not yet gone to jail and to one
for observations that have been incarcerated.  Analysis time is measured in
terms of months of age.  When I conduct an analysis for my entire sample,
the coefficient on everjail is less than one (about .5) and statistically
significant.  This result persists when I use a number of different
combinations of control variables, when I limit my sample to self-reported
juvenile delinquents, and in a number of other settings.
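
To make the coding of everjail concrete, here is a minimal sketch of the
person-month layout described above.  The names person_months,
first_jail_age, and age_months are hypothetical and are not NLSY variable
names, and counting the month of first incarceration itself as everjail == 1
is an assumption.

```python
def person_months(start_age, end_age, first_jail_age=None):
    """Return one record per month of age in [start_age, end_age).

    everjail is 0 before first_jail_age and 1 from that month onward.
    A person who is never incarcerated (first_jail_age=None) stays at 0.
    """
    rows = []
    for age in range(start_age, end_age):
        everjail = int(first_jail_age is not None and age >= first_jail_age)
        rows.append({"age_months": age, "everjail": everjail})
    return rows


# Hypothetical person observed from 216 to 221 months of age,
# first incarcerated at 219 months.
rows = person_months(216, 222, first_jail_age=219)
everjail_path = [r["everjail"] for r in rows]
print(everjail_path)  # [0, 0, 0, 1, 1, 1]
```

Once everjail switches to one it never returns to zero, which is the
monotonic relationship with analysis time that the question is about.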

I then limit my sample only to persons who go to prison at some point during
the panel, so the comparison group for those with a prison record at any
given point in time consists of a group that has not yet gone to prison but
will at some point in the future.  Bear in mind that, in this sample, all
persons' everjail values will switch from zero to one at some point during
the panel and will then remain one for the duration of the panel.  This
specification attenuates the estimated effect of everjail (it rises to about
.7), and the parameter is no longer significant (t < 1).  However, an
analysis of Schoenfeld residuals from this model suggests that the effect
of everjail is not constant over time, i.e., that the proportional-hazards
assumption does not hold.

After poking around a bit, I discovered that I got very different
coefficients depending on the age of the respondents I was looking at.  If
I split the panel roughly in half, there is a strongly negative (coeff =
about .3) and statistically significant estimated effect of past
incarceration for younger observations.  BUT - and this is where I think
something is wrong - there is a very large *positive* and statistically
significant coefficient (coeff = about 9) for older observations.

This suggests to me that something may not be working as I'd expect for this
particular specification.  I encounter roughly similar problems if, rather
than splitting up the sample, I use time-varying covariates, or if I include
a "goes to prison at some point during the panel" control dummy variable
rather than simply limiting the sample to this group.  But the problem ONLY
occurs if I limit the sample to those who go to prison at some point during
the panel.  If I look at results for older observations in the sample as a
whole, the parameter estimate on everjail is correctly signed.  It may be
that having gone to prison has no effect on one's probability of marrying,
but I find it hard to believe that it has a strongly *positive* effect for
older people.  Here's my theory, which I'm hoping I can get some reaction
to: both my dependent variable (the hazard of first marriage) and my key
independent variable (everjail) are strongly positively correlated with
analysis time (age).  This is true for obvious reasons for the dependent
variable, and it is true for everjail since the sample is limited to
observations who go to prison at some point during the panel - if you are in
this subsample and haven't gone to jail today, you're likely to do so next
year, and if you don't do so next year, you're certain to do so sometime after
that, and so forth.  So, since both variables are positively correlated with
analysis time, and since the "older" sample is limited to people who are
guaranteed to have some years in which everjail == 1 (members of the
younger sample may not go to prison until they are older), a certain
percentage of whom are also going to marry, is age confounding the
relationship between everjail and the hazard of marrying?  It doesn't seem
like this should be possible, since age - my measure of analysis time - is
explicitly being controlled for in the baseline hazard.
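
One way to probe this theory is to look at the composition of the risk set
at different ages.  In the Cox partial likelihood, each event time compares
the person who marries with everyone else still at risk at that age, so an
event time at which everyone at risk has the same value of everjail carries
no information about its coefficient.  The sketch below uses made-up ages,
not NLSY data, and the function name share_not_yet_jailed is hypothetical.
It shows how, in a sample limited to people who are eventually incarcerated,
the share of the risk set with everjail == 0 can shrink toward zero as age
rises.

```python
def share_not_yet_jailed(first_jail_ages, exit_ages, age):
    """Among people still at risk at `age`, return the share with everjail == 0.

    A person is at risk while age < exit age (marriage or censoring), and
    has everjail == 0 while age < first jail age.  Returns None if no one
    is at risk.
    """
    at_risk = [j for j, e in zip(first_jail_ages, exit_ages) if e > age]
    if not at_risk:
        return None
    return sum(1 for j in at_risk if age < j) / len(at_risk)


# Illustrative values in months of age (assumed, not real data).
# Everyone in this restricted sample is incarcerated at some point.
first_jail_ages = [220, 240, 260, 280, 300]
exit_ages = [400, 250, 400, 310, 400]  # month of marriage or censoring

shares = {
    age: share_not_yet_jailed(first_jail_ages, exit_ages, age)
    for age in (216, 250, 290, 320)
}
print(shares)  # {216: 1.0, 250: 0.75, 290: 0.25, 320: 0.0}
```

If the real data look like this, estimates for the older half of the panel
would rest on a very small number of everjail == 0 person-months, which may
be worth checking before interpreting the large coefficient.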

So, my basic question is this: if I'm using a Cox model and have an
independent variable that - like the dependent variable in a Cox analysis -
varies monotonically with analysis time, does that introduce some sort of
strange timing issue into the analysis?  Should I expect to get odd
parameter estimates in a situation like this, or am I doubting my results
when in fact I shouldn't be?  I'm stumped, so any and all advice would be
most welcome!!

Cheers,
Adam Thomas
John F. Kennedy School of Government,
Harvard University