Discrete time survival

Discrete time survival

Maguin, Eugene
I'm working on a discrete-time survival analysis (and following along with Singer and Willett's chapters) and I've run into a problem that may occur fairly often in discrete time. A bit of background: 42 out of 261 persons had the event over a span of about 1200 days, and the day is my time unit. I elected to work in discrete time rather than continuous time because I thought, maybe incorrectly, that discrete time would be easier to work with and present to an inexperienced audience. I'm aggregating time to eliminate time chunks with no events. Using a three-month aggregation, I have one time chunk with no events. The estimation gives a large B coefficient for that chunk, which is no surprise. My question is whether an acceptable remedy is to add a tiny value to that cell, changing the 0 to, say, .001, if you think in crosstab terms. Then, if it is an acceptable remedy, is the doing of it just a matter of changing the event status indicator from 0 to .001 for a randomly chosen case at that time point?
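For what it's worth, the situation described here is easy to reproduce. The following is a purely illustrative sketch (all numbers invented, not Gene's data) of binning day-level event times into three-month chunks and computing the per-chunk hazard:

```python
# Hypothetical illustration of aggregating day-level event times into
# ~3-month (91-day) chunks; all numbers are made up.
from collections import Counter

event_days = [15, 40, 200, 210, 500, 950]   # illustrative event times, in days
n_persons = 20                               # illustrative sample size
period_len = 91                              # roughly three months

events_per_period = Counter(day // period_len for day in event_days)
n_periods = 1200 // period_len + 1

hazards = []
at_risk = n_persons
for p in range(n_periods):
    d = events_per_period.get(p, 0)
    hazards.append(d / at_risk)
    at_risk -= d

# A chunk with zero events has an estimated hazard of 0; its logit is
# -infinity, so the MLE for that chunk's coefficient does not exist
# (separation), which is why the fitted B blows up.
```

The more usual fix for an empty chunk is to merge it with an adjacent chunk or to use a smooth specification for time, rather than perturbing a case's event indicator.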

Thanks, Gene Maguin

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Re: Discrete time survival

Hector Maletta
I do not think it is wise to correct reality to make it fit the model, but
(if anything) the other way around.
On the other hand, survival analysis is about probabilities, and it is quite
likely that at any particular period the number of events is larger or
smaller than predicted, including zero events as a limiting case. When the
total number of events is 42 over 1200 periods, zero-event periods are a
necessity.
Time chunks might be useful for some purposes, but (1) by lumping time
periods together you are deliberately forfeiting information; and (2) the
actual length of the chunks is arbitrary: your conclusions may vary if you
use 2 months, 3 months, or 17 or 43 days as your time-chunk convention. At
any rate one should try different ways of carving the chunks, aiming at
using the shortest chunk length that avoids unstable results (e.g. a
coefficient estimate that is extremely sensitive to the random occurrence of
one more or one less event in a given chunk).
You do not clarify what kind of survival analysis you are doing. If it is
Cox regression, recall that Cox regression uses only information on the
succession or time-order of events, and not on the exact length of time
elapsed.
Also, Cox regression generates a function for the hazard rate (or conversely
the survival probability) including EXP(BX), with B coefficients for each
predictor, and these B coefficients are one single set of coefficients for
the entire analysis, not one coefficient per chunk of time. Thus I do not
understand what you mean by "a large B coefficient for that chunk".

Hope this helps.

Hector


Re: Discrete time survival

Maguin, Eugene
Thanks, Hector. I'm not using Cox regression. I've constructed a so-called person-period data set and am using logistic regression to analyze it. I'm using 'time chunks' to mean periods. Maybe confusing nomenclature. In each period there is a probability of the event and an associated odds. Relative to the odds for the reference period, an odds ratio can be computed for each other period. The B that I referred to was the coefficient for one of the periods.

Using a longer aggregation period would eliminate the no-event periods, I understand.

I have read that using Cox regression with discrete time may give problems because of the multiple events in a period (ties).
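The per-period odds and odds ratios described above can be sketched like this (all counts invented, not Gene's data):

```python
# Per-period odds and odds ratios relative to a reference period, from
# invented event counts and risk-set sizes.
events  = {1: 5, 2: 3, 3: 4}        # events observed in each period
at_risk = {1: 261, 2: 250, 3: 240}  # persons still at risk entering each period

def odds(period):
    hazard = events[period] / at_risk[period]   # discrete-time hazard
    return hazard / (1 - hazard)

ref = odds(1)  # period 1 taken as the reference period
odds_ratios = {k: odds(k) / ref for k in events}
```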

Gene Maguin




Re: Discrete time survival

Hector Maletta
Gene,
I do not know the details of your research program; I thought you were
dealing with a SURVIVAL problem, i.e. trying to ascertain the probabilities
of surviving for a given span of time (after a certain "start" state) before
an event strikes. This is not the same as the probabilities of the event at
various periods in comparison with the probability of the event at some
reference period. An  example of the latter would be to ascertain the
probability of tornados in May, June, July, August, September etc relative
to their probability in the "reference" month, say January. An example of
the former is the probability that a given location would "survive" without
any tornado, from January until a variable date (January to May, January to
June, January to September etc). It is a completely different problem. In
the former you may use logistic regression (probability of an event for
cases described by category 1, 2, 3, 4, ...., k, relative to the reference
probability of the event for cases described by the reference category); for
instance, suppose you investigate the probability of tornados with one
single predictor, "Month of year"; this predictor has 12 categories, one per
month, and you use one of the months (say, May) as your reference category
(you may have data from multiple years for the same location, or from
multiple locations in the same year; multiple years for given location may
seem more logical as an example, given the geographical variability of
tornados, and their relatively stable recurrence over time). You may use
also some other concurrent predictor, say accumulated annual rainfall over
the 12 months to the start of the tornado season, for all years (or
locations) considered. From these data, you obtain the odds of tornados in
September relative to January (probability of tornados in September, divided
by probability of tornados in January), and also the odds of tornados for
all the other months. In total, you obtain eleven odds ratios (one per month, all
relative to January). Your logistic function for the probability of
tornadoes in a given month is p(k)=EXP(BX)/[1+EXP(BX)], where
BX=b0+b1X1+b2X2. The logarithm of the odds is BX, and the odds are EXP(BX). The
odds that a tornado happens in a given month k, say July, relative to the
base period (May) in years with rainfall x(t), equal the probability of a
tornado happening in month k, relative to the probability of a tornado
happening in May, for years with cumulative 12-month rainfall=x(t). (I use
May in this example as the probability of tornados in January is likely to
be zero).
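These identities can be checked numerically; the coefficients below are arbitrary placeholders, not estimates from any data:

```python
import math

# p = EXP(BX)/[1+EXP(BX)]; the log-odds equal BX and the odds equal EXP(BX).
b0, b1, b2 = -2.0, 0.8, 0.01      # arbitrary illustrative coefficients
x1, x2 = 1.0, 600.0               # e.g. a month dummy and a rainfall value

bx = b0 + b1 * x1 + b2 * x2
p = math.exp(bx) / (1 + math.exp(bx))
odds = p / (1 - p)
```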

In the survival case the situation is different. Suppose you use survival
analysis to predict the chances of surviving different lengths of time
without experiencing a tornado (say, at a given area, like Kansas), using a
"time variable" (month of year) and perhaps some predictor (say, rainfall
again). In this case, time is NOT a predictor: it is the one-directional
dimension along which events can occur. Your hazard function will have only
one predictor (rainfall). The cumulative hazard H(t) for a
year with rainfall X=x(t) will mean: "number of events expected to have
occurred from starting time, say January, to month t, in years with rainfall
x(t)". (I use January in this example as the reference time in survival
analysis should be at the start of the relevant period, not in the middle of
it; one may use also March or April, if the start of the tornado season is
always after March). The associated survival probability up to month k, for
a year with rainfall x(t), i.e. p(tk), gives you the probability of not
having a tornado until month k, for year t with rainfall x(t).

Logistic regression estimates the probability of a tornado in each different
month. Survival analysis estimates the probability of not having the tornado
up to each different month.

Cox regression is a particular kind of "proportional hazard" survival
analysis, where (ordinarily) the hazard rates stand to the reference or base
hazard rate in a constant proportion over time. If your chances of survival
are twice as large as mine, guys like you would be twice as likely to
survive as guys like me up to every month, no matter how distant the month
considered. No chance that your survival chances approach mine over time:
they remain twice as large. For example, if 800 mm/yr of accumulated
rainfall up to start of season afford a reduction of 20% in the incidence of
tornadoes (relative to the reference case which has, say, 500 mm/yr
rainfall), this reduction of 20% in the odds of tornadoes will hold for all
time intervals, i.e. for the relative chances of tornadoes up to all months.
Cox regression may, however, accommodate non-proportional hazards by
introducing time-dependent covariates (such as accumulated rainfall UP TO
EACH MONTH). Other models of survival analysis may account for more
sophisticated relationships between time, covariates and events.
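The proportionality just described can be illustrated with a toy baseline survival curve (numbers invented): under proportional hazards, S1(t) = S0(t)**HR at every time point, so the ratio of cumulative hazards is constant over time.

```python
import math

HR = 0.8                                 # hypothetical hazard ratio (20% reduction)
S0 = [0.95, 0.88, 0.80, 0.70, 0.55]      # invented baseline survival values
S1 = [s ** HR for s in S0]               # survival under proportional hazards

# -log(S) is the cumulative hazard; its ratio equals HR at every time point,
# so the advantaged group never "catches up" with the baseline group.
ratios = [math.log(s1) / math.log(s0) for s0, s1 in zip(S0, S1)]
```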

Hope this helps.

Hector


Re: Discrete time survival

Bruce Weaver
Hector, as Gene said in his first post, he is using "discrete time survival analysis", and following advice given in Singer & Willett's (2003) book.  In chapter 11 of that book, S&W show how one can fit a "discrete-time hazard model" using a logistic regression procedure.  To quote:  

"Our goal is to show that although the [discrete-time hazard] model may appear complex and unfamiliar, it can be fit using software that is familiar by applying standard logistic regression analysis in the person-period data set."  

A "person-period data set" has multiple rows per person, one row per period of observation.  There is also an event indicator variable (call it EVT).  For persons who experience the event, EVT=0 on all rows but the last, and EVT=1 on the last row (the period in which the event occurred).  For those who do not experience the event (i.e., censored cases), EVT=0 on all rows.  For more details, see Singer & Willett's book (particularly chapters 11 & 12).
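A minimal sketch (my own illustration, not code from Singer & Willett) of expanding person-level records into the person-period format described above:

```python
# Each person contributes one row per period observed; EVT flags the
# event period. The input records are invented.
people = [
    # (person id, last period observed, experienced the event?)
    (1, 3, True),    # event in period 3 -> three rows, EVT=1 on the last
    (2, 4, False),   # censored after period 4 -> four rows, all EVT=0
]

person_period = [
    (pid, t, int(had_event and t == last))
    for pid, last, had_event in people
    for t in range(1, last + 1)
]
```

A logistic regression of EVT on period indicators (plus covariates) in this data set then yields the discrete-time hazard model.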
 
HTH.


--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."


Re: Discrete time survival

Hector Maletta
Bruce,
I understand that, but to me it is not the same thing. I have not read
Singer & Willett's book, but I tend to think a person-period dataset does not
imply that survival probabilities decrease monotonically over time: if you
start being alive today, it may well happen (with such dataset) that your
predicted probability of surviving up to next November is lower than your
probability of still being alive next December, which would be a bit unreal.

Even a multilevel model with persons and periods cannot account for the
ordered nature of periods: they would be simply several "periods" equivalent
to several "observations", with no particular order.
Perhaps an ordering of periods could be achieved by using "time elapsed" as
one of the predictors, but it all looks to me as a convoluted way of
arriving at the same point.
Perhaps some people can more easily manage logistic regression software than
survival analysis software, but to me they look equally easy or equally
difficult.
And errors of interpretation can arise in either, no matter how familiar the
software (recall the frequent mistake of using the "classification table" in
logistic regression as a criterion of goodness of fit, which is wrong
because "fit" in a probabilistic prediction means fit of the predicted
probability, measured as a proportion computed over a group of cases, not the
actual occurrence of the event for every individual with p>0.50 and
non-occurrence for every individual with p<0.50).
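This point about the classification table can be illustrated with invented predictions: a model can classify modestly at the 0.50 cutoff yet still be assessed on how well predicted probabilities match observed proportions within groups.

```python
# Invented predicted probabilities and outcomes, for illustration only.
preds  = [0.1, 0.1, 0.1, 0.1, 0.1, 0.3, 0.3, 0.3, 0.3, 0.3]
actual = [0,   0,   0,   0,   1,   0,   0,   1,   1,   0]

# At a 0.50 cutoff every case is classified as a non-event ...
accuracy = sum((p > 0.5) == bool(y) for p, y in zip(preds, actual)) / len(preds)

# ... whereas calibration compares predicted to observed proportions
# within groups of cases sharing the same predicted probability.
groups = {}
for p, y in zip(preds, actual):
    groups.setdefault(p, []).append(y)
observed = {p: sum(ys) / len(ys) for p, ys in groups.items()}
```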

Hector

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Bruce
Weaver
Sent: Wednesday, July 11, 2012 16:44
To: [hidden email]
Subject: Re: Discrete time survival

Hector, as Gene said in his first post, he is using "discrete time survival
analysis", and following advice given in Singer & Willett's (2003) book.  In
chapter 11 of that book, S&W show how one can fit a "discrete-time hazard
model" using a logistic regression procedure.  To quote:

"Our goal is to show that although the [discrete-time hazard] model may
appear complex and unfamiliar, it can be fit using software that is familiar
by applying standard logistic regression analysis in the /person-period data
set/."

A "person-period data set" has multiple rows per person, one row per period
of observation.  There is also an Event indicator variable (call it EVT).
For persons who experience the event, EVT=0 on all rows but the last, where
EVT=1.  For those who do not experience the event, EVT=0 on all rows.  For
more details,
see Singer & Willett's book (particularly chapters 11 & 12).

HTH.
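To make Bruce's description concrete, here is a minimal sketch (my own hypothetical illustration, not code from S&W's book; the data and variable names are invented) of expanding one-record-per-person survival data into a person-period data set with the EVT indicator:

```python
# Build a person-period data set from one record per person.
# Each person has: id, the last period observed, and whether the
# event occurred in that last period (0 = censored).
people = [
    {"id": 1, "periods": 3, "event": 1},   # event in period 3
    {"id": 2, "periods": 4, "event": 0},   # censored after period 4
]

person_period = []
for p in people:
    for t in range(1, p["periods"] + 1):
        person_period.append({
            "id": p["id"],
            "period": t,
            # EVT = 1 only on the last row of a person who had the event
            "EVT": 1 if (p["event"] == 1 and t == p["periods"]) else 0,
        })

for row in person_period:
    print(row)
```

Each person contributes one row per period observed; a discrete-time hazard model is then just a logistic regression of EVT on period indicators (plus covariates) over these rows.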



Hector Maletta wrote

>
> Gene,
> I do not know the details of your research program; I thought you were
> dealing with a SURVIVAL problem, i.e. trying to ascertain the
> probabilities of surviving for a given span of time (after a certain
> "start" state) before an event strikes. This is not the same as the
> probabilities of the event at various periods in comparison with the
> probability of the event at some reference period. An  example of the
> latter would be to ascertain the probability of tornados in May, June,
> July, August, September etc relative to their probability in the
> "reference" month, say January. An example of the former is the
> probability that a given location would "survive"
> without
> any tornado, from January until a variable date (January to May,
> January to June, January to September etc). It is a completely
> different problem. In the former you may use logistic regression
> (probability of an event for cases described by category 1, 2, 3, 4,
> ...., k, relative to the reference probability of the event for cases
> described by the reference category); for instance, suppose you
> investigate the probability of tornados with one single predictor,
> "Month of year"; this predictor has 12 categories, one per month, and
> you use one of the months (say, May) as your reference category (you
> may have data from multiple years for the same location, or from
> multiple locations in the same year; multiple years for given location
> may seem more logical as an example, given the geographical
> variability of tornados, and their relatively stable recurrence over
> time). You may use also some other concurrent predictor, say
> accumulated annual rainfall over the 12 months to the start of the
> tornado season, for all years (or
> locations) considered. From these data, you obtain the odds of
> tornados in September relative to January (probability of tornados in
> September, divided by probability of tornados in January), and also
> the odds of tornados for all the other months. In total, you obtain
> eleven odds (one per month, all relative to January). Your logistic
> function for the probability of tornadoes in a given month is
> p(k)=EXP(BX)/[1+EXP(BX)] where BX=b0+b1X1+b2X2. The logarithm of the
> odds is BX, and the odds are EXP(BX). The odds that a tornado happens in a
> given month k, say July, relative to the base period (May) in years
> with rainfall x(t), equal the probability of a tornado happening in
> month k, relative to the probability of a tornado happening in May,
> for years with cumulative 12-month rainfall=x(t). (I use May in this
> example as the probability of tornados in January is likely to be
> zero).
>
> In the survival case the situation is different. Suppose you use
> survival analysis to predict the chances of surviving different
> lengths of time without experiencing a tornado (say, at a given area,
> like Kansas), using a "time variable" (month of year) and perhaps some
> predictor (say, rainfall again). In this case, time is NOT a
> predictor: it is the one-directional dimension along which events can
> occur. Your hazard function will have only one predictor (rainfall).
> The proportional cumulative hazard rate h(t) for a year with rainfall
> X=x(t) will mean: "number of events expected to have occurred from
> starting time, say January, to month t, in years with rainfall x(t)".
> (I use January in this example as the reference time in survival
> analysis should be at the start of the relevant period, not in the
> middle of it; one may use also March or April, if the start of the
> tornado season is always after March). The associated survival
> probability up to month k, for a year with rainfall x(t), i.e. p(tk),
> gives you the probability of not having a tornado until month k, for
> year t with rainfall x(t).
>
> Logistic regression estimates the probability of a tornado in each
> different month. Survival analysis estimates the probability of not
> having the tornado up to each different month.
>
> Cox regression is a particular kind of "proportional hazard" survival
> analysis, where (ordinarily) the hazard rates stand to the reference
> or base hazard rate in a constant proportion over time. If your
> chances of survival are twice as large as mine, guys like you would be
> twice more likely to survive than guys like me up to every month, no
> matter how distant the month considered. No chance that your survival
> chances approach mine over time:
> they remain twice as large. For example, if 800 mm/yr of accumulated
> rainfall up to start of season afford a reduction of 20% in the
> incidence of tornadoes (relative to the reference case which has, say,
> 500 mm/yr rainfall), this reduction of 20% in the odds of tornadoes
> will hold for all time intervals, i.e. for the relative chances of
> tornadoes up to all months.
> Cox regression  may, however, accommodate non proportional hazards by
> introducing time-dependent covariates (such as accumulated rainfall UP
> TO EACH MONTH). Other models of survival analysis may account for more
> sophisticated relationships between time, covariates and events.
>
> Hope this helps.
>
> Hector
>
> -----Original Message-----
> From: SPSSX(r) Discussion [mailto:SPSSX-L@.UGA] On Behalf Of Maguin,
> Eugene
> Sent: Wednesday, July 11, 2012 14:03
> To: SPSSX-L@.UGA
> Subject: Re: Discrete time survival
>
> Thanks, Hector. I'm not using Cox regression. I've constructed a
> so-called person-period data and am using logistic regression to
> analyze it. I'm using 'time chunks' to mean periods. Maybe confusing
> nomenclature. In each period there is a probability of the event and
> an associated odds. Relative to the odds for the reference period an
> odds ratio can be computed each other period. The B that referred to
> was the coefficient for one of the periods.
>
> Using a longer aggregation period will eliminate no event periods, I
> understand.
>
> I have read that using Cox regression with discrete time may give
> problems because of the multiple events in a period (ties).
>
> Gene Maguin
>
>
>
> -----Original Message-----
> From: Hector Maletta [mailto:hmaletta@.com]
> Sent: Wednesday, July 11, 2012 12:26 PM
> To: Maguin, Eugene; SPSSX-L@.UGA
> Subject: RE: Discrete time survival
>
> I do not think it is wise to correct reality to make it fit the model,
> but (if anything) the other way around.
> On the other hand, survival analysis is about probabilities, and it is
> quite likely that at any particular period the number of events is
> larger or smaller than predicted, including zero events as a limiting
> case. When the total number of events is 42 over 1200 periods,
> zero-event periods are a necessity.
> Time chunks might be useful for some purposes, but (1) by lumping time
> periods together you are deliberately forfeiting information; and (2)
> the actual length of the chunks is arbitrary: your conclusions may
> vary if you use 2 months, 3 months, or 17 or 43 days as your
> time-chunk convention. At any rate one should try different ways of
> carving the chunks, aiming at using the shortest chunk possible to
> avoid unstable results (e.g. a coefficient estimate that is extremely
> sensitive to the random occurrence of one more or one less event in a
> given chunk).
> You do not clarify what kind of survival analysis you are doing. If it
> is Cox regression, recall that Cox regression uses only information
> on the succession or time-order of events, and not on the exact length
> of time elapsed.
> Also, Cox regression generates a function for the hazard rate (or
> conversely the survival probability) including EXP(BX), with B
> coefficients for each predictor, and these B coefficients are one
> single set of coefficients for the entire analysis, not one
> coefficient per chunk of time. Thus I do not understand what you mean
> by "a large B coefficient for that chunk".
>
> Hope this helps.
>
> Hector
>
> -----Original Message-----
> From: SPSSX(r) Discussion [mailto:SPSSX-L@.UGA] On Behalf Of Maguin,
> Eugene
> Sent: Wednesday, July 11, 2012 12:54
> To: SPSSX-L@.UGA
> Subject: Discrete time survival
>
> I'm working on a discrete time survival analysis (and following along
> with Singer and Willett's chapters) and I've run into a problem that
> may occur fairly often in discrete time. A bit of background: 42 out
> of 261 persons had the event over a span of about 1200 days, which is
> my time unit. I elected to work in discrete time rather than
> continuous time because I thought, maybe incorrectly, that discrete
> time would be easier to work with and present to an inexperienced
> audience. I'm aggregating time to eliminate time chunks with no
> events. Using a three month aggregation, I have one time chunk with no
> events. The estimation gives a large B coefficient for that chunk,
> which is no surprise. My question is whether an acceptable remedy is
> to add a tiny value to that cell, changing the 0 to, say, .001, if you
> think in crosstab terms. Then, if it is an acceptable remedy, is the
> doing of it just a matter of changing the event status indicator from
> 0 to .001 for a randomly chosen case at that time point?
>
> Thanks, Gene Maguin
>



=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD


Re: Discrete time survival

Poes, Matthew Joseph
Hector, I have this book in front of me.  It seems to me that the censoring variables and previous-event variables would ensure that the hazard ratios were accurate, and from what I gather, the claim Singer and Willett are making is that they are simply using logistic regression to reproduce standard hazard modeling, without needing to familiarize yourself with new software.  My understanding of the claim is that the two become the same thing, and from what I'm seeing, that appears to be the case.

In the same way, while specialized software exists to perform propensity score matching, all of the methods used in those packages can be replicated manually in SPSS or SAS using separate steps.  The specialized software makes it easier to perform these functions, but it isn't introducing new math, so to speak.

The Hazard function is the probability that you experience an event at a given time, given that you didn't experience that event in the previous time.  Doesn't the censoring allow the model to calculate this such that it's not creating a probability for someone already dead?  I assume that was the point you were trying to make?
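Matthew's definition can be illustrated numerically (the counts below are hypothetical, plain Python): the discrete-time hazard for period t is the number of events in t divided by the number still at risk entering t, so no probability is computed for anyone already dead or already censored.

```python
# Discrete-time hazard: h(t) = events in period t / persons at risk
# entering period t. Censored persons leave the risk set without
# contributing an event.
n_at_risk = 100           # persons at risk entering period 1
events    = [5, 4, 3]     # events in periods 1..3 (hypothetical)
censored  = [10, 8, 6]    # censored during periods 1..3 (hypothetical)

hazard = []
for ev, cen in zip(events, censored):
    hazard.append(ev / n_at_risk)   # P(event in t | at risk entering t)
    n_at_risk -= ev + cen           # events and censored drop out of the risk set

print([round(h, 4) for h in hazard])
```

So the denominator shrinks each period: those who have had the event, and those censored, never appear in a later period's probability.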

If what you are saying is that someone's probability of staying alive can't increase with time, that isn't true either.  Cancer patients have an increased probability of staying in remission the longer they have been in remission.  Alcoholics have a higher probability of staying sober the longer they stay sober after treatment (i.e. they are more likely not to relapse once they have put together a great deal of recovery).

The ordering of time is up to you.  In any statistical analysis of time, interpreting periods in a specific order is still up to the researcher; the model can still make "wonky" time comparisons that make no sense.  Statistics doesn't understand that time flows from past to present; only we do.  It is up to you to use the variables indicating time and censoring to ensure that the analysis makes sense.

Am I misunderstanding your concern?

Matthew J Poes
Research Data Specialist
Center for Prevention Research and Development
University of Illinois
510 Devonshire Dr.
Champaign, IL 61820
Phone: 217-265-4576
email: [hidden email]




Re: Discrete time survival

Hector Maletta
Thanks, Matthew, for the clarifications.

Regarding your cancer patients:
1. The fact that the probability of staying alive (provided you are not dead
already) increases with the time already spent in remission does not counter
the fact that the survival function is monotonically non-increasing: at each
time t, the surviving proportion of the initial population can be no greater
than the surviving proportion at time t-1. This survival rate is not
computed on those surviving to that date, but on the original population of
patients.
2. If the relative hazard rate of a type of patient (relative to the
reference type of patient) changes over time, that is not a proportional
hazard case, and some remedy should be introduced to account for that, such
as time-dependent covariates. Otherwise, the surviving proportion of
patients of type A must stand in a constant proportion to the surviving
proportion of patients of type B, at all times, which is the basic
assumption of Cox regression without time-dependent covariates.
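Point 1 is easy to verify numerically: even with made-up, decreasing hazards (as in the remission example), the survival function S(t) = (1-h(1))(1-h(2))...(1-h(t)), computed on the original population, never increases:

```python
# Hypothetical per-period hazards that DECREASE over time,
# as in the remission example.
hazards = [0.30, 0.20, 0.10, 0.05]

# Survival function: probability of no event through period t,
# i.e. the running product of (1 - hazard).
survival = []
s = 1.0
for h in hazards:
    s *= 1.0 - h
    survival.append(s)

print([round(v, 4) for v in survival])
# Non-increasing even though the hazard itself falls each period.
assert all(a >= b for a, b in zip(survival, survival[1:]))
```

The conditional probability of surviving one more period rises over time, yet the unconditional surviving proportion of the original cohort can only stay flat or fall.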

Hector


-----Mensaje original-----
De: Poes, Matthew Joseph [mailto:[hidden email]]
Enviado el: Wednesday, July 11, 2012 18:35
Para: 'Hector Maletta'; '[hidden email]'
Asunto: RE: Discrete time survival

Hector, I have this book in front of me.  It seems to me that the censoring
variables and event previous variables would ensure that the hazard ratios
were accurate, and from what I gather, the claim Singer and Willet are
making is they are simply using logistic regression to equal standard hazard
modeling, without needing to familiarize yourself with new software.  My
understanding from the claim is that they become the same thing, and from
what I'm seeing, it appears to be the case.

In the same way, while specialized software exists to perform propensity
score matching, all of the methods used in those software's can be equated
manually in SPSS or SAS using separate steps.  The separate software makes
it easier to perform these functions, but they are introducing new math, so
to speak.

The Hazard function is the probability that you experience an event at a
given time, given that you didn't experience that event in the previous
time.  Doesn't the censoring allow the model to calculate this such that
it's not creating a probability for someone already dead?  I assume that was
the point you were trying to make?

If what you are saying is that someone's probability of being alive can't
increase with time, that isn't true either.  Cancer patients have an
increased probability of staying in remission from cancer the longer they go
after going into remission with the cancer.  Alcoholics have a higher
probability of staying sober the longer they stay sober after treatment
(i.e. they are more likely to no relapse once they have put together a great
deal of recovery).

The order aspect of time is up to you.  When it comes to any statistical
analysis of time, the interpretation of time in a specific order is still
researcher interpretation specific, the model can still do "wonky" time
comparisons that make no sense.  Statistics doesn't understand that time
flows from past to present, only we do.  It is up to you to utilize the
variable indicating time and censoring to ensure that the analysis makes
sense.

Am I misunderstanding your concern?

Matthew J Poes
Research Data Specialist
Center for Prevention Research and Development University of Illinois 510
Devonshire Dr.
Champaign, IL 61820
Phone: 217-265-4576
email: [hidden email]



-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
Hector Maletta
Sent: Wednesday, July 11, 2012 4:09 PM
To: [hidden email]
Subject: Re: Discrete time survival

Bruce,
I understand that, but to me it is not the same thing. I have not read
Singer & Willett's book, but I tend to think a person-period dataset does
not imply that survival probabilities decrease monotonically over time: if
you start being alive today, it may well happen (with such a dataset) that
your predicted probability of surviving up to next November is lower than
your probability of still being alive next December, which would be a bit
unreal.

Even a multilevel model with persons and periods cannot account for the
ordered nature of periods: they would simply be several "periods" equivalent
to several "observations", with no particular order.
Perhaps an ordering of periods could be achieved by using "time elapsed" as
one of the predictors, but it all looks to me as a convoluted way of
arriving at the same point.
Perhaps some people can more easily manage logistic regression software than
survival analysis software, but to me they look equally easy or equally
difficult.
And errors of interpretation can arise in either, no matter how familiar the
software.  (Recall the frequent mistake of using the "classification table"
in logistic regression as a criterion of goodness of fit.  That is wrong
because "fit" in a probabilistic prediction is a fit of the probability,
measured as a proportion computed over a group of cases, not the actual
occurrence of the event for every individual with p>0.50 and its
non-occurrence for every individual with p<0.50.)
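
A toy sketch of this point (pure Python, invented numbers): two models can produce identical classification tables at the 0.50 cut-off while fitting the probabilities very differently.

```python
# 10 cases, 2 observed events (20%).
outcomes = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
model_a = [0.20] * 10   # well calibrated: mean prediction matches the 20% rate
model_b = [0.45] * 10   # poorly calibrated, but classifies every case the same way

def classified_correct(probs, ys):
    """Count cases 'correctly classified' at the conventional 0.50 cut-off."""
    return sum((p > 0.5) == bool(y) for p, y in zip(probs, ys))

print(classified_correct(model_a, outcomes),
      classified_correct(model_b, outcomes))   # identical classification: 8 and 8
print(sum(model_a) / 10, sum(model_b) / 10)    # 0.20 vs 0.45 against 20% observed
```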

Hector

-----Mensaje original-----
De: SPSSX(r) Discussion [mailto:[hidden email]] En nombre de Bruce
Weaver Enviado el: Wednesday, July 11, 2012 16:44
Para: [hidden email]
Asunto: Re: Discrete time survival

Hector, as Gene said in his first post, he is using "discrete time survival
analysis", and following advice given in Singer & Willett's (2003) book.  In
chapter 11 of that book, S&W show how one can fit a "discrete-time hazard
model" using a logistic regression procedure.  To quote:

"Our goal is to show that although the [discrete-time hazard] model may
appear complex and unfamiliar, it can be fit using software that is familiar
by applying standard logistic regression analysis in the /person-period data
set/."

A "person-period data set" has multiple rows per person, one row per period
of observation.  There is also an event indicator variable (call it EVT).
For persons who experience the event, EVT=0 on all rows except the last,
where EVT=1.  For those who never experience the event, EVT=0 on all rows.
For more details, see Singer & Willett's book (particularly chapters 11 & 12).
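
As a sketch of that expansion (illustrative Python rather than S&W's SPSS syntax; the function and variable names are made up):

```python
# Expand one record per person into one row per person-period, with the
# event indicator EVT = 1 only on the final row of a person who had the event.
def person_period(person_id, last_period, had_event):
    rows = []
    for t in range(1, last_period + 1):
        evt = 1 if (had_event and t == last_period) else 0
        rows.append({"id": person_id, "period": t, "evt": evt})
    return rows

print(person_period("A", 3, True))   # periods 1..3 with evt 0, 0, 1
print(person_period("B", 2, False))  # censored: evt 0 on both rows
```

Fitting an ordinary logistic regression of EVT on period indicators in this stacked file reproduces the sample hazard in each period, which is S&W's point.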

HTH.



Hector Maletta wrote

>
> Gene,
> I do not know the details of your research program; I thought you were
> dealing with a SURVIVAL problem, i.e. trying to ascertain the
> probabilities of surviving for a given span of time (after a certain
> "start" state) before an event strikes. This is not the same as the
> probabilities of the event at various periods in comparison with the
> probability of the event at some reference period. An  example of the
> latter would be to ascertain the probability of tornados in May, June,
> July, August, September etc relative to their probability in the
> "reference" month, say January. An example of the former is the
> probability that a given location would "survive"
> without
> any tornado, from January until a variable date (January to May,
> January to June, January to September etc). It is a completely
> different problem. In the former you may use logistic regression
> (probability of an event for cases described by category 1, 2, 3, 4,
> ...., k, relative to the reference probability of the event for cases
> described by the reference category); for instance, suppose you
> investigate the probability of tornados with one single predictor,
> "Month of year"; this predictor has 12 categories, one per month, and
> you use one of the months (say, May) as your reference category (you
> may have data from multiple years for the same location, or from
> multiple locations in the same year; multiple years for given location
> may seem more logical as an example, given the geographical
> variability of tornados, and their relatively stable recurrence over
> time). You may use also some other concurrent predictor, say
> accumulated annual rainfall over the 12 months to the start of the
> tornado season, for all years (or
> locations) considered. From these data, you obtain the odds of
> tornados in September relative to May (probability of tornados in
> September, divided by probability of tornados in May), and also
> the odds of tornados for all the other months. In total, you obtain
> eleven odds (one per month, all relative to May). Your logistic
> function for the probability of tornadoes in a given month is
> p(k)=EXP(BX)/[1+EXP(BX)] where BX=b0+b1X1+b2X2. The logarithm of the
> odds is BX, and the odds are EXP(BX). The odds that a tornado happens in a
> given month k, say July, relative to the base period (May) in years
> with rainfall x(t), equal the probability of a tornado happening in
> month k, relative to the probability of a tornado happening in May,
> for years with cumulative 12-month rainfall=x(t). (I use May in this
> example as the probability of tornados in January is likely to be
> zero).
>
> In the survival case the situation is different. Suppose you use
> survival analysis to predict the chances of surviving different
> lengths of time without experiencing a tornado (say, at a given area,
> like Kansas), using a "time variable" (month of year) and perhaps some
> predictor (say, rainfall again). In this case, time is NOT a
> predictor: it is the one-directional dimension along which events can
> occur. Your hazard function will have only one predictor (rainfall).
> The proportional cumulative hazard rate h(t) for a year with rainfall
> X=x(t) will mean: "number of events expected to have occurred from
> starting time, say January, to month t, in years with rainfall x(t)".
> (I use January in this example as the reference time in survival
> analysis should be at the start of the relevant period, not in the
> middle of it; one may use also March or April, if the start of the
> tornado season is always after March). The associated survival
> probability up to month k, for a year with rainfall x(t), i.e. p(tk),
> gives you the probability of not having a tornado until month k, for
> year t with rainfall x(t).
>
> Logistic regression estimates the probability of a tornado in each
> different month. Survival analysis estimates the probability of not
> having the tornado up to each different month.
>
> Cox regression is a particular kind of "proportional hazard" survival
> analysis, where (ordinarily) the hazard rates stand to the reference
> or base hazard rate in a constant proportion over time. If your
> chances of survival are twice as large as mine, guys like you would be
> twice more likely to survive than guys like me up to every month, no
> matter how distant the month considered. No chance that your survival
> chances approach mine over time:
> they remain twice as large. For example, if 800 mm/yr of accumulated
> rainfall up to start of season afford a reduction of 20% in the
> incidence of tornadoes (relative to the reference case which has, say,
> 500 mm/yr rainfall), this reduction of 20% in the odds of tornadoes
> will hold for all time intervals, i.e. for the relative chances of
> tornadoes up to all months.
> Cox regression  may, however, accommodate non proportional hazards by
> introducing time-dependent covariates (such as accumulated rainfall UP
> TO EACH MONTH). Other models of survival analysis may account for more
> sophisticated relationships between time, covariates and events.
>
> Hope this helps.
>
> Hector
>
> -----Mensaje original-----
> De: SPSSX(r) Discussion [mailto:SPSSX-L@.UGA] En nombre de Maguin,
> Eugene Enviado el: Wednesday, July 11, 2012 14:03
> Para: SPSSX-L@.UGA
> Asunto: Re: Discrete time survival
>
> Thanks, Hector. I'm not using Cox regression. I've constructed a
> so-called person-period data set and am using logistic regression to
> analyze it. I'm using 'time chunks' to mean periods. Maybe confusing
> nomenclature. In each period there is a probability of the event and
> an associated odds. Relative to the odds for the reference period, an
> odds ratio can be computed for each other period. The B that I referred
> to was the coefficient for one of the periods.
>
> Using a longer aggregation period will eliminate no-event periods, I
> understand.
>
> I have read that using Cox regression with discrete time may give
> problems because of the multiple events in a period (ties).
>
> Gene Maguin
>
>
>
> -----Original Message-----
> From: Hector Maletta [mailto:hmaletta@.com]
> Sent: Wednesday, July 11, 2012 12:26 PM
> To: Maguin, Eugene; SPSSX-L@.UGA
> Subject: RE: Discrete time survival
>
> I do not think it is wise to correct reality to make it fit the model,
> but (if anything) the other way around.
> On the other hand, survival analysis is about probabilities, and it is
> quite likely that at any particular period the number of events is
> larger or smaller than predicted, including zero events as a limiting
> case. When the total number of events is 42 over 1200 periods,
> zero-event periods are a necessity.
> Time chunks might be useful for some purposes, but (1) by lumping time
> periods together you are deliberately forfeiting information; and (2)
> the actual length of the chunks is arbitrary: your conclusions may
> vary if you use 2 months, 3 months, or 17 or 43 days as your
> time-chunk convention. At any rate one should try different ways of
> carving the chunks, aiming at using the shortest chunk possible that
> avoids unstable results (e.g. a coefficient estimate that is extremely
> sensitive to the random occurrence of one more or one fewer event in a
> given chunk).
> You do not clarify what kind of survival analysis you are doing. If it
> is Cox regression, recall that Cox regression uses only information
> on the succession or time-order of events, and not on the exact length
> of time elapsed.
> Also, Cox regression generates a function for the hazard rate (or
> conversely the survival probability) including EXP(BX), with B
> coefficients for each predictor, and these B coefficients are one
> single set of coefficients for the entire analysis, not one
> coefficient per chunk of time. Thus I do not understand what you mean
> by "a large B coefficient for that chunk".
>
> Hope this helps.
>
> Hector
>
> -----Mensaje original-----
> De: SPSSX(r) Discussion [mailto:SPSSX-L@.UGA] En nombre de Maguin,
> Eugene Enviado el: Wednesday, July 11, 2012 12:54
> Para: SPSSX-L@.UGA
> Asunto: Discrete time survival
>
> I'm working on a discrete time survival analysis (and following along
> with Singer and Willett's chapters) and I've run into a problem that
> may occur fairly often in discrete time. A bit of background: 42 out
> of 261 persons had the event over a span of about 1200 days, which is
> my time unit. I elected to work in discrete time rather than
> continuous time because I thought, maybe incorrectly, that discrete
> time would be easier to work with and present to an inexperienced
> audience. I'm aggregating time to eliminate time chunks with no
> events. Using a three month aggregation, I have one time chunk with no
> events. The estimation gives a large B coefficient for that chunk,
> which is no surprise. My question is whether an acceptable remedy is
> to add a tiny value to that cell, changing the 0 to, say, .001, if you
> think in crosstab terms. Then, if it is an acceptable remedy, is the
> doing of it just a matter of changing the event status indicator from
> 0 to .001 for a randomly chosen case at that time point?
>
> Thanks, Gene Maguin
>
> =====================
> To manage your subscription to SPSSX-L, send a message to
> LISTSERV@.UGA (not to SPSSX-L), with no body text except the command.
> To leave the list, send the command SIGNOFF SPSSX-L For a list of
> commands to manage subscriptions, send the command INFO REFCARD
>
>
>


-----
--
Bruce Weaver
[hidden email]
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

NOTE: My Hotmail account is not monitored regularly.
To send me an e-mail, please use the address shown above.

--
View this message in context:
http://spssx-discussion.1045642.n5.nabble.com/Discrete-time-survival-tp57141
38p5714149.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.



Reply | Threaded
Open this post in threaded view
|

Re: Discrete time survival

Maguin, Eugene
In reply to this post by Hector Maletta
Hector, if you want to pursue it, another reference is Paul Allison's #46 Sage 'green book' "Event History Analysis" published in 1984.

Gene Maguin
