I'm working on a discrete time survival analysis (and following along with Singer and Willett's chapters) and I've run into a problem that may occur fairly often in discrete time.

A bit of background: 42 out of 261 persons had the event over a span of about 1200 days, which is my time unit. I elected to work in discrete time rather than continuous time because I thought, maybe incorrectly, that discrete time would be easier to work with and present to an inexperienced audience. I'm aggregating time to eliminate time chunks with no events. Using a three-month aggregation, I have one time chunk with no events. The estimation gives a large B coefficient for that chunk, which is no surprise. My question is whether an acceptable remedy is to add a tiny value to that cell, changing the 0 to, say, .001, if you think in crosstab terms. Then, if it is an acceptable remedy, is the doing of it just a matter of changing the event status indicator from 0 to .001 for a randomly chosen case at that time point?

Thanks, Gene Maguin

=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L. For a list of commands to manage subscriptions, send the command INFO REFCARD.
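To see concretely where the "large B" comes from, here is a minimal sketch (all numbers made up) that tabulates the discrete-time hazard per period from hypothetical person-level records. A period with zero events has an estimated hazard of 0, whose log-odds is minus infinity; the huge coefficient the logistic routine reports for that period's dummy is just a finite stand-in for this, so nudging a 0 to .001 masks the problem rather than fixing it.

```python
import math

# Hypothetical person-level records: (period of event or censoring, event flag).
# Period 3 deliberately has no events, mimicking the situation described above.
people = ([(1, 1)] * 4 + [(1, 0)] * 6 + [(2, 1)] * 3 + [(2, 0)] * 7
          + [(3, 0)] * 10 + [(4, 1)] * 5 + [(4, 0)] * 15)

n_periods = 4
for t in range(1, n_periods + 1):
    at_risk = sum(1 for p, e in people if p >= t)        # risk set entering period t
    events = sum(1 for p, e in people if p == t and e)   # events during period t
    h = events / at_risk                                 # estimated discrete-time hazard
    logit = math.log(h / (1 - h)) if 0 < h < 1 else float("-inf")
    print(t, at_risk, events, round(h, 3), logit)
```

Period 3 prints a hazard of 0.0 and a log-odds of -inf: the data simply carry no information about the hazard in that interval, which is why wider intervals or a smooth specification of time (rather than one dummy per period) are the usual remedies.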
I do not think it is wise to correct reality to make it fit the model, but (if anything) the other way around. On the other hand, survival analysis is about probabilities, and it is quite likely that at any particular period the number of events is larger or smaller than predicted, including zero events as a limiting case. When the total number of events is 42 over 1200 periods, zero-event periods are a necessity.

Time chunks might be useful for some purposes, but (1) by lumping time periods together you are deliberately forfeiting information; and (2) the actual length of the chunks is arbitrary: your conclusions may vary if you use 2 months, 3 months, or 17 or 43 days as your time-chunk convention. At any rate one should try different ways of carving the chunks, aiming at using the shortest chunk that is possible to avoid unstable results (e.g. a coefficient estimate that is extremely sensitive to the random occurrence of one more or one less event in a given chunk).

You do not clarify what kind of survival analysis you are doing. If it is Cox regression, recall that Cox regression uses only information on the succession or time-order of events, and not on the exact length of time elapsed. Also, Cox regression generates a function for the hazard rate (or, conversely, the survival probability) including EXP(BX), with B coefficients for each predictor, and these B coefficients are one single set of coefficients for the entire analysis, not one coefficient per chunk of time. Thus I do not understand what you mean by "a large B coefficient for that chunk".

Hope this helps.

Hector
Thanks, Hector. I'm not using Cox regression. I've constructed a so-called person-period data set and am using logistic regression to analyze it. I'm using 'time chunks' to mean periods. Maybe confusing nomenclature. In each period there is a probability of the event and an associated odds. Relative to the odds for the reference period, an odds ratio can be computed for each other period. The B I referred to was the coefficient for one of the periods.
Using a longer aggregation period will eliminate no-event periods, I understand.

I have read that using Cox regression with discrete time may give problems because of the multiple events in a period (ties).

Gene Maguin
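Gene's per-period odds ratios can be sketched with made-up hazards (hypothetical numbers; period 1 as the reference). With dummy coding against the reference period, the fitted B for period t is the log of that period's odds ratio, so a zero-hazard period implies B = minus infinity:

```python
import math

# Hypothetical per-period hazards (made-up numbers); period 1 is the reference.
hazard = {1: 0.08, 2: 0.075, 3: 0.0, 4: 0.25}

odds_ref = hazard[1] / (1 - hazard[1])
for t in sorted(hazard):
    odds = hazard[t] / (1 - hazard[t])
    odds_ratio = odds / odds_ref
    # With dummy coding against period 1, the fitted B for period t is log(odds ratio).
    b = math.log(odds_ratio) if odds_ratio > 0 else float("-inf")
    print(t, round(odds_ratio, 3), b)
```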
Gene,
I do not know the details of your research program; I thought you were dealing with a SURVIVAL problem, i.e. trying to ascertain the probabilities of surviving for a given span of time (after a certain "start" state) before an event strikes. This is not the same as the probabilities of the event at various periods in comparison with the probability of the event at some reference period. An example of the latter would be to ascertain the probability of tornadoes in May, June, July, August, September, etc. relative to their probability in some "reference" month. An example of the former is the probability that a given location would "survive" without any tornado, from January until a variable date (January to May, January to June, January to September, etc.). It is a completely different problem.

In the former you may use logistic regression (probability of an event for cases described by category 1, 2, 3, 4, ..., k, relative to the reference probability of the event for cases described by the reference category). For instance, suppose you investigate the probability of tornadoes with one single predictor, "month of year"; this predictor has 12 categories, one per month, and you use one of the months (say, May) as your reference category. (You may have data from multiple years for the same location, or from multiple locations in the same year; multiple years for a given location may seem more logical as an example, given the geographical variability of tornadoes and their relatively stable recurrence over time.) You may also use some other concurrent predictor, say accumulated annual rainfall over the 12 months to the start of the tornado season, for all years (or locations) considered. From these data, you obtain the odds of tornadoes in September relative to May (probability of tornadoes in September, divided by probability of tornadoes in May), and also the odds of tornadoes for all the other months.
In total, you obtain eleven odds (one per month, all relative to May). Your logistic function for the probability of tornadoes in a given month is p(k) = EXP(BX)/[1 + EXP(BX)], where BX = b0 + b1X1 + b2X2. The logarithm of the odds is BX, and the odds are EXP(BX). The odds that a tornado happens in a given month k, say July, relative to the base period (May), in years with rainfall x(t), equal the probability of a tornado happening in month k, relative to the probability of a tornado happening in May, for years with cumulative 12-month rainfall = x(t). (I use May in this example as the probability of tornadoes in January is likely to be zero.)

In the survival case the situation is different. Suppose you use survival analysis to predict the chances of surviving different lengths of time without experiencing a tornado (say, at a given area, like Kansas), using a "time variable" (month of year) and perhaps some predictor (say, rainfall again). In this case, time is NOT a predictor: it is the one-directional dimension along which events can occur. Your hazard function will have only one predictor (rainfall). The cumulative hazard rate h(t) for a year with rainfall X = x(t) will mean: "number of events expected to have occurred from starting time, say January, to month t, in years with rainfall x(t)". (I use January in this example as the reference time in survival analysis should be at the start of the relevant period, not in the middle of it; one may also use March or April, if the start of the tornado season is always after March.) The associated survival probability up to month k, for a year with rainfall x(t), i.e. p(t,k), gives you the probability of not having a tornado until month k, for year t with rainfall x(t).

Logistic regression estimates the probability of a tornado in each different month. Survival analysis estimates the probability of not having the tornado up to each different month.
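The contrast can be put numerically (hypothetical monthly hazards): logistic regression models the per-period event probability h(t), while the survival probability up to month k is the product of (1 - h(t)) over the months through k:

```python
# Hypothetical monthly hazards h(t); the third month has no events.
hazard = [0.10, 0.05, 0.00, 0.20]

# Survival up to month k: S(k) = (1 - h(1)) * (1 - h(2)) * ... * (1 - h(k)).
surv = 1.0
for k, h in enumerate(hazard, start=1):
    surv *= (1 - h)
    print(k, round(surv, 4))
```

Note that S(k) computed this way can never rise from one month to the next, whatever the pattern of the per-month hazards.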
Cox regression is a particular kind of "proportional hazard" survival analysis, where (ordinarily) the hazard rates stand to the reference or base hazard rate in a constant proportion over time. If your chances of survival are twice as large as mine, guys like you would be twice as likely to survive as guys like me up to every month, no matter how distant the month considered. No chance that your survival chances approach mine over time: they remain twice as large. For example, if 800 mm/yr of accumulated rainfall up to the start of the season affords a reduction of 20% in the incidence of tornadoes (relative to the reference case which has, say, 500 mm/yr rainfall), this reduction of 20% in the hazard of tornadoes will hold for all time intervals, i.e. for the relative chances of tornadoes up to all months.

Cox regression may, however, accommodate non-proportional hazards by introducing time-dependent covariates (such as accumulated rainfall UP TO EACH MONTH). Other models of survival analysis may account for more sophisticated relationships between time, covariates and events.

Hope this helps.

Hector
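The proportionality property can be sketched with made-up numbers: if one case's hazard is a constant fraction HR of another's at every instant, their survival curves satisfy S2(t) = S1(t) ** HR at every t, so the advantage never fades:

```python
# Under proportional hazards with a constant hazard ratio HR,
# the survival curves satisfy S2(t) = S1(t) ** HR at every time t.
HR = 0.8  # e.g. a 20% reduction in the hazard for the wetter years
s1 = [0.95, 0.85, 0.70, 0.50]   # hypothetical baseline survival, months 1..4
s2 = [s ** HR for s in s1]      # survival for the reduced-hazard case

for t, (a, b) in enumerate(zip(s1, s2), start=1):
    print(t, a, round(b, 4))
```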
Hector, as Gene said in his first post, he is using "discrete time survival analysis", and following advice given in Singer & Willett's (2003) book. In chapter 11 of that book, S&W show how one can fit a "discrete-time hazard model" using a logistic regression procedure. To quote:
"Our goal is to show that although the [discrete-time hazard] model may appear complex and unfamiliar, it can be fit using software that is familiar by applying standard logistic regression analysis in the person-period data set." A "person-period data set" has multiple rows per person, one row per period of observation. There is also an Event indicator variable (call it EVT). For persons who experience the event, EVT=0 on all rows but the last. For those who do not experience the event, EVT=0 on all rows. For more details, see Singer & Willett's book (particularly chapters 11 & 12). HTH.
--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/
"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING:
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).
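The person-period expansion Bruce describes can be sketched as follows (hypothetical records; person "A" has the event in period 3, person "B" is censored after period 4):

```python
# Hypothetical person-level input: (id, last period observed, event flag).
persons = [("A", 3, 1),   # experienced the event in period 3
           ("B", 4, 0)]   # censored after period 4

person_period = []
for pid, last, event in persons:
    for t in range(1, last + 1):
        evt = 1 if (event and t == last) else 0   # EVT = 1 only on the event row
        person_period.append((pid, t, evt))

for row in person_period:
    print(row)
```

Person "A" contributes three rows ending in an EVT=1 row; person "B" contributes four rows, all EVT=0.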
Bruce,
I understand that, but to me it is not the same thing. I have not read Singer & Willett's book, but I tend to think a person-period dataset does not imply that survival probabilities decrease monotonically over time: if you start being alive today, it may well happen (with such a dataset) that your predicted probability of surviving up to next November is lower than your probability of still being alive next December, which would be a bit unreal. Even a multilevel model with persons and periods cannot account for the ordered nature of periods: they would be simply several "periods" equivalent to several "observations", with no particular order. Perhaps an ordering of periods could be achieved by using "time elapsed" as one of the predictors, but it all looks to me like a convoluted way of arriving at the same point.

Perhaps some people can more easily manage logistic regression software than survival analysis software, but to me they look equally easy or equally difficult. And errors of interpretation can arise in either, no matter how familiar the software (recall the frequent mistake of using the "classification table" in logistic regression as a criterion of goodness of fit, which is wrong because "fit" in a probabilistic prediction is a fit of the probability, measured as a proportion computed over a group of cases, not the actual occurrence of the event for every individual having p > 0.50 and the non-occurrence where p < 0.50).

Hector
To quote: "Our goal is to show that although the [discrete-time hazard] model may appear complex and unfamiliar, it can be fit using software that is familiar by applying standard logistic regression analysis in the /person-period data set/." A "person-period data set" has multiple rows per person, one row per period of observation. There is also an Event indicator variable (call it EVT). For persons who experience the event, EVT=0 on all rows but the last. For those who do not experience the event, EVT=0 on all rows. For more details, see Singer & Willett's book (particularly chapters 11 & 12). HTH. Hector Maletta wrote > > Gene, > I do not know the details of your research program; I thought you were > dealing with a SURVIVAL problem, i.e. trying to ascertain the > probabilities of surviving for a given span of time (after a certain > "start" state) before an event strikes. This is not the same as the > probabilities of the event at various periods in comparison with the > probability of the event at some reference period. An example of the > latter would be to ascertain the probability of tornados in May, June, > July, August, September etc relative to their probability in the > "reference" month, say January. An example of the former is the > probability that a given location would "survive" > without > any tornado, from January until a variable date (January to May, > January to June, January to September etc). It is a completely > different problem. 
In the former you may use logistic regression > (probability of an event for cases described by category 1, 2, 3, 4, > ...., k, relative to the reference probability of the event for cases > described by the reference category); for instance, suppose you > investigate the probability of tornados with one single predictor, > "Month of year"; this predictor has 12 categories, one per month, and > you use one of the months (say, May) as your reference category (you > may have data from multiple years for the same location, or from > multiple locations in the same year; multiple years for given location > may seem more logical as an example, given the geographical > variability of tornados, and their relatively stable recurrence over > time). You may use also some other concurrent predictor, say > accumulated annual rainfall over the 12 months to the start of the > tornado season, for all years (or > locations) considered. From these data, you obtain the odds of > tornados in September relative to January (probability of tornados in > September, divided by probability of tornados in January), and also > the odds of tornados for all the other months. In total, you obtain > eleven odds (one per month, all relative to January). Your logistic > function for the probability of tornadoes in a given month is > p(k)=EXP(BX)/[1+EXP(BX)] where BX=b0+b1X1+b2X2. The logarithm of the > odds is BX, and the odds are BX. The odds that a tornado happens in a > given month k, say July, relative to the base period (May) in years > with rainfall x(t), equal the probability of a tornado happening in > month k, relative to the probability of a tornado happening in May, > for years with cumulative 12-month rainfall=x(t). (I use May in this > example as the probability of tornados in January is likely to be > zero). > > In the survival case the situation is different. 
Suppose you use > survival analysis to predict the chances of surviving different > lengths of time without experiencing a tornado (say, at a given area, > like Kansas), using a "time variable" (month of year) and perhaps some > predictor (say, rainfall again). In this case, time is NOT a > predictor: it is the one-directional dimension along which events can > occur. Your hazard function will have only one predictor (rainfall). > The proportional cumulative hazard rate h(t) for a year with rainfall > X=x(t) will mean: "number of events expected to have occurred from > starting time, say January, to month t, in years with rainfall x(t)". > (I use January in this example as the reference time in survival > analysis should be at the start of the relevant period, not in the > middle of it; one may use also March or April, if the start of the > tornado season is always after March). The associated survival > probability up to month k, for a year with rainfall x(t), i.e. p(tk), > gives you the probability of not having a tornado until month k, for > year t with rainfall x(t). > > Logistic regression estimates the probability of a tornado in each > different month. Survival analysis estimates the probability of not > having the tornado up to each different month. > > Cox regression is a particular kind of "proportional hazard" survival > analysis, where (ordinarily) the hazard rates stand to the reference > or base hazard rate in a constant proportion over time. If your > chances of survival are twice as large as mine, guys like you would be > twice more likely to survive than guys like me up to every month, no > matter how distant the month considered. No chance that your survival > chances approach mine over time: > they remain twice as large. 
For example, if 800 mm/yr of accumulated > rainfall up to start of season afford a reduction of 20% in the > incidence of tornadoes (relative to the reference case which has, say, > 500 mm/yr rainfall), this reduction of 20% in the odds of tornadoes > will hold for all time intervals, i.e. for the relative chances of > tornadoes up to all months. > Cox regression may, however, accommodate non proportional hazards by > introducing time-dependent covariates (such as accumulated rainfall UP > TO EACH MONTH). Other models of survival analysis may account for more > sophisticated relationships between time, covariates and events. > > Hope this helps. > > Hector > > -----Mensaje original----- > De: SPSSX(r) Discussion [mailto:SPSSX-L@.UGA] En nombre de Maguin, > Eugene Enviado el: Wednesday, July 11, 2012 14:03 > Para: SPSSX-L@.UGA > Asunto: Re: Discrete time survival > > Thanks, Hector. I'm not using Cox regression. I've constructed a > so-called person-period data and am using logistic regression to > analyze it. I'm using 'time chunks' to mean periods. Maybe confusing > nomenclature. In each period there is a probability of the event and > an associated odds. Relative to the odds for the reference period an > odds ratio can be computed each other period. The B that referred to > was the coefficient for one of the periods. > > Using a longer aggregation period will eliminate no event periods, I > understand. > > I have read that using Cox regression with discrete time may give > problems because of the multiple events in a period (ties). > > Gene Maguin > > > > -----Original Message----- > From: Hector Maletta [mailto:hmaletta@.com] > Sent: Wednesday, July 11, 2012 12:26 PM > To: Maguin, Eugene; SPSSX-L@.UGA > Subject: RE: Discrete time survival > > I do not think it is wise to correct reality to make it fit the model, > but (if anything) the other way around. 
> On the other hand, survival analysis is about probabilities, and it is > quite likely that at any particular period the number of events is > larger or smaller than predicted, including zero events as a limiting > case. When the total number of events is 42 over 1200 periods, > zero-event periods are a necessity. > Time chunks might be useful for some purposes, but (1) by lumping time > periods together you are deliberately forfeiting information; and (2) > the actual length of the chunks is arbitrary: your conclusions may > vary if you use 2 months, 3 months, or 17 or 43 days as your > time-chunk convention. At any rate one should try different ways of > carving the chunks, aiming at using the shortest chunk that is possible to > a > coefficient estimate that is extremely sensible to the random > occurrence of one more or one less event in a given chunk). > You do not clarify what kind of survival analysis you are doing. If it > is Cox regression, recall that call regression uses only information > on the succession or time-order of events, and not on the exact length > of time elapsed. > Also, Cox regression generates a function for the hazard rate (or > conversely the survival probability) including EXP(BX), with B > coefficients for each predictor, and these B coefficients are one > single set of coefficients for the entire analysis, not one > coefficient per chunk of time. Thus I do not understand what you mean > by "a large B coefficient for that chunk". > > Hope this helps. > > Hector > > -----Mensaje original----- > De: SPSSX(r) Discussion [mailto:SPSSX-L@.UGA] En nombre de Maguin, > Eugene Enviado el: Wednesday, July 11, 2012 12:54 > Para: SPSSX-L@.UGA > Asunto: Discrete time survival > > I'm working on a discrete time survival analysis (and following along > with Singer and Willett's chapters) and I've run into a problem that > may occur fairly often in discrete time. 
A bit of background: 42 out > of 261 persons had the event over a span of about 1200 days, which is > my time unit. I elected to work in discrete time rather than > continuous time because I thought, maybe incorrectly, that discrete > time would be easier to work with and present to an inexperienced > audience. I'm aggregating time to eliminate time chunks with no > events. Using a three month aggregation, I have one time chunk with no > events. The estimation gives a large B coefficient for that chunk, > which is no surprise. My question is whether an acceptable remedy is > to add a tiny value to that cell, changing the 0 to, say, .001, if you > think in crosstab terms. Then, if it an acceptable remedy, is the > doing of it just a matter of changing the event status indicator from > 0 to .001 for a randomly chosen case a! > t that time point? > > Thanks, Gene Maguin > > ===================== > To manage your subscription to SPSSX-L, send a message to > LISTSERV@.UGA (not to SPSSX-L), with no body text except the command. > To leave the list, send the command SIGNOFF SPSSX-L For a list of > commands to manage subscriptions, send the command INFO REFCARD > > ===================== > To manage your subscription to SPSSX-L, send a message to > LISTSERV@.UGA (not to SPSSX-L), with no body text except the command. > To leave the list, send the command SIGNOFF SPSSX-L For a list of > commands to manage subscriptions, send the command INFO REFCARD > > ===================== > To manage your subscription to SPSSX-L, send a message to > LISTSERV@.UGA (not to SPSSX-L), with no body text except the command. > To leave the list, send the command SIGNOFF SPSSX-L For a list of > commands to manage subscriptions, send the command INFO REFCARD > ----- -- Bruce Weaver [hidden email] http://sites.google.com/a/lakeheadu.ca/bweaver/ "When all else fails, RTFM." NOTE: My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above. 
--
View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Discrete-time-survival-tp5714138p5714149.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.
|
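The person-period setup Gene describes, and the reason a zero-event chunk yields a huge B, can be sketched in a few lines. This uses hypothetical toy data (not Gene's): with only period dummies as predictors, the fitted discrete-time hazard in each period is simply events divided by persons at risk, so a period with zero events has hazard 0, whose logit diverges to minus infinity.

```python
import numpy as np

# Hypothetical toy data (NOT Gene's data): each person's duration in
# periods, and whether the final period ended in the event (1) or in
# censoring (0).
durations = np.array([1, 2, 2, 3, 3, 3, 4])
events = np.array([1, 1, 0, 0, 1, 0, 0])

# Expand to the person-period format: one row per person per period at
# risk, with the event indicator EVT = 1 only in the event period.
rows = []
for dur, ev in zip(durations, events):
    for t in range(1, dur + 1):
        rows.append((t, int(ev == 1 and t == dur)))
pp = np.array(rows)

# With period dummies as the only predictors, logistic regression on
# this data set reproduces the empirical hazard: events / at-risk.
for t in range(1, 5):
    at_risk = (pp[:, 0] == t).sum()
    n_event = pp[pp[:, 0] == t, 1].sum()
    print(t, at_risk, n_event, n_event / at_risk)
# Period 4 has one person at risk and zero events: hazard 0, so the
# period-4 logit coefficient diverges (the "large B" in the question).
```

This is why adding .001 to the cell is a patch for a separation problem rather than a data correction; merging the empty period with a neighbor, or smoothing the baseline hazard, addresses the same symptom without altering a case's event status.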
Hector, I have this book in front of me. It seems to me that the censoring variables and previous-event variables would ensure that the hazard ratios were accurate, and from what I gather, the claim Singer and Willett are making is that they are simply using logistic regression to reproduce standard hazard modeling, without needing to familiarize yourself with new software. My understanding of the claim is that they become the same thing, and from what I'm seeing, that appears to be the case.
In the same way, while specialized software exists to perform propensity score matching, all of the methods used in those packages can be replicated manually in SPSS or SAS using separate steps. The separate software makes it easier to perform these functions, but it is not introducing new math, so to speak. The hazard function is the probability that you experience an event at a given time, given that you did not experience the event in any previous period. Doesn't the censoring allow the model to calculate this such that it's not creating a probability for someone already dead? I assume that was the point you were trying to make? If what you are saying is that someone's probability of being alive can't increase with time, that isn't true either. Cancer patients have an increased probability of staying in remission the longer they go after going into remission. Alcoholics have a higher probability of staying sober the longer they stay sober after treatment (i.e., they are more likely not to relapse once they have put together a great deal of recovery). The order aspect of time is up to you. When it comes to any statistical analysis of time, the interpretation of time in a specific order is still a matter of researcher interpretation; the model can still do "wonky" time comparisons that make no sense. Statistics doesn't understand that time flows from past to present; only we do. It is up to you to use the variables indicating time and censoring to ensure that the analysis makes sense. Am I misunderstanding your concern?

Matthew J Poes
Research Data Specialist
Center for Prevention Research and Development
University of Illinois
510 Devonshire Dr.
Champaign, IL 61820
Phone: 217-265-4576
email: [hidden email]

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Hector Maletta
Sent: Wednesday, July 11, 2012 4:09 PM
To: [hidden email]
Subject: Re: Discrete time survival

Bruce, I understand that, but to me it is not the same thing. I have not read Singer & Willett's book, but I tend to think a person-period dataset does not imply that survival probabilities decrease monotonically over time: if you start being alive today, it may well happen (with such a dataset) that your predicted probability of surviving up to next November is lower than your probability of still being alive next December, which would be a bit unreal. Even a multilevel model with persons and periods cannot account for the ordered nature of periods: they would be simply several "periods" equivalent to several "observations", with no particular order. Perhaps an ordering of periods could be achieved by using "time elapsed" as one of the predictors, but it all looks to me like a convoluted way of arriving at the same point. Perhaps some people can more easily manage logistic regression software than survival analysis software, but to me they look equally easy or equally difficult. And errors of interpretation can arise in either, no matter how familiar the software (recall the frequent mistake of using the "classification table" in logistic regression as a criterion of goodness of fit, which is wrong because "fit" in a probabilistic prediction is a fit of the probability, measured as a proportion computed over a group of cases, not the actual occurrence of the event for every individual having p>0.50 and the non-occurrence where p<0.50).
Hector

-----Mensaje original-----
De: SPSSX(r) Discussion [mailto:[hidden email]] En nombre de Bruce Weaver
Enviado el: Wednesday, July 11, 2012 16:44
Para: [hidden email]
Asunto: Re: Discrete time survival

Hector, as Gene said in his first post, he is using "discrete time survival analysis", and following advice given in Singer & Willett's (2003) book. In chapter 11 of that book, S&W show how one can fit a "discrete-time hazard model" using a logistic regression procedure. To quote: "Our goal is to show that although the [discrete-time hazard] model may appear complex and unfamiliar, it can be fit using software that is familiar by applying standard logistic regression analysis in the /person-period data set/." A "person-period data set" has multiple rows per person, one row per period of observation. There is also an Event indicator variable (call it EVT). For persons who experience the event, EVT=0 on all rows but the last, where EVT=1. For those who do not experience the event, EVT=0 on all rows. For more details, see Singer & Willett's book (particularly chapters 11 & 12). HTH.

Hector Maletta wrote
>
> Gene,
> I do not know the details of your research program; I thought you were dealing with a SURVIVAL problem, i.e. trying to ascertain the probabilities of surviving for a given span of time (after a certain "start" state) before an event strikes. This is not the same as the probabilities of the event at various periods in comparison with the probability of the event at some reference period. An example of the latter would be to ascertain the probability of tornados in May, June, July, August, September etc. relative to their probability in the "reference" month, say January. An example of the former is the probability that a given location would "survive" without any tornado, from January until a variable date (January to May, January to June, January to September etc.). It is a completely different problem.
> In the former you may use logistic regression (probability of an event for cases described by category 1, 2, 3, 4, ..., k, relative to the reference probability of the event for cases described by the reference category); for instance, suppose you investigate the probability of tornados with one single predictor, "Month of year"; this predictor has 12 categories, one per month, and you use one of the months (say, May) as your reference category (you may have data from multiple years for the same location, or from multiple locations in the same year; multiple years for a given location may seem more logical as an example, given the geographical variability of tornados, and their relatively stable recurrence over time). You may also use some other concurrent predictor, say accumulated annual rainfall over the 12 months to the start of the tornado season, for all years (or locations) considered. From these data, you obtain the odds of tornados in September relative to May (probability of tornados in September, divided by probability of tornados in May), and also the odds of tornados for all the other months. In total, you obtain eleven odds (one per month, all relative to May). Your logistic function for the probability of tornadoes in a given month is p(k)=EXP(BX)/[1+EXP(BX)] where BX=b0+b1X1+b2X2. The logarithm of the odds is BX, and the odds are EXP(BX). The odds that a tornado happens in a given month k, say July, relative to the base period (May) in years with rainfall x(t), equal the probability of a tornado happening in month k, relative to the probability of a tornado happening in May, for years with cumulative 12-month rainfall=x(t). (I use May in this example as the probability of tornados in January is likely to be zero.)
>
> In the survival case the situation is different.
> Suppose you use survival analysis to predict the chances of surviving different lengths of time without experiencing a tornado (say, in a given area, like Kansas), using a "time variable" (month of year) and perhaps some predictor (say, rainfall again). In this case, time is NOT a predictor: it is the one-directional dimension along which events can occur. Your hazard function will have only one predictor (rainfall). The cumulative hazard rate H(t) for a year with rainfall X=x(t) will mean: "number of events expected to have occurred from the starting time, say January, to month t, in years with rainfall x(t)". (I use January in this example as the reference time in survival analysis should be at the start of the relevant period, not in the middle of it; one may also use March or April, if the start of the tornado season is always after March.) The associated survival probability up to month k, for a year with rainfall x(t), i.e. p(tk), gives you the probability of not having a tornado until month k, for year t with rainfall x(t).
>
> Logistic regression estimates the probability of a tornado in each different month. Survival analysis estimates the probability of not having a tornado up to each different month.
>
> Cox regression is a particular kind of "proportional hazard" survival analysis, where (ordinarily) the hazard rates stand to the reference or base hazard rate in a constant proportion over time. If your chances of survival are twice as large as mine, guys like you would be twice as likely to survive as guys like me up to every month, no matter how distant the month considered. No chance that your survival chances approach mine over time: they remain twice as large.
> For example, if 800 mm/yr of accumulated rainfall up to the start of the season afford a reduction of 20% in the incidence of tornadoes (relative to the reference case which has, say, 500 mm/yr rainfall), this reduction of 20% in the odds of tornadoes will hold for all time intervals, i.e. for the relative chances of tornadoes up to all months.
> Cox regression may, however, accommodate non-proportional hazards by introducing time-dependent covariates (such as accumulated rainfall UP TO EACH MONTH). Other models of survival analysis may account for more sophisticated relationships between time, covariates and events.
>
> Hope this helps.
>
> Hector
>
> -----Mensaje original-----
> De: SPSSX(r) Discussion [mailto:SPSSX-L@.UGA] En nombre de Maguin, Eugene
> Enviado el: Wednesday, July 11, 2012 14:03
> Para: SPSSX-L@.UGA
> Asunto: Re: Discrete time survival
>
> Thanks, Hector. I'm not using Cox regression. I've constructed a so-called person-period data set and am using logistic regression to analyze it. I'm using 'time chunks' to mean periods. Maybe confusing nomenclature. In each period there is a probability of the event and an associated odds. Relative to the odds for the reference period, an odds ratio can be computed for each other period. The B that I referred to was the coefficient for one of the periods.
>
> Using a longer aggregation period will eliminate no-event periods, I understand.
>
> I have read that using Cox regression with discrete time may give problems because of the multiple events in a period (ties).
>
> Gene Maguin
>
> -----Original Message-----
> From: Hector Maletta [mailto:hmaletta@.com]
> Sent: Wednesday, July 11, 2012 12:26 PM
> To: Maguin, Eugene; SPSSX-L@.UGA
> Subject: RE: Discrete time survival
>
> I do not think it is wise to correct reality to make it fit the model, but (if anything) the other way around.
|
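Matthew's remission example and the monotonicity of survival are compatible, and a few lines make that concrete: the per-period hazard can decline over time (relapse risk falling the longer someone stays in remission), yet the survival function, the running product of (1 - hazard), still never increases. The hazard values below are made up for illustration.

```python
# Made-up declining per-period hazards (e.g. relapse risk falling with
# time in remission).
hazards = [0.30, 0.20, 0.10, 0.05]

# Discrete-time survival function: S(t) = product of (1 - h(j)), j <= t.
surv = []
s = 1.0
for h in hazards:
    s *= 1.0 - h
    surv.append(s)

print(surv)  # each value is <= the one before it
```

So a decreasing hazard answers a conditional question ("given you made it this far"), while the survival function is computed on the original population, which is the distinction Hector draws below.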
Thanks, Matthew, for the clarifications.
Regarding your cancer patients:

1. The fact that the probability of staying alive (provided you are not dead already) increases with the time already spent in remission does not counter the fact that the survival function is monotonically non-increasing: at each time t, the surviving proportion of the initial population can be no greater than the surviving proportion at time t-1. This survival rate is not computed on those surviving to that date, but on the original population of patients.

2. If the relative hazard rate of a type of patient (relative to the reference type of patient) changes over time, that is not a proportional-hazards case, and some remedy should be introduced to account for that, such as time-dependent covariates. Otherwise, the surviving proportion of patients of type A must stand in a constant proportion to the surviving proportion of patients of type B at all times, which is the basic assumption of Cox regression without time-dependent covariates.

Hector

-----Mensaje original-----
De: Poes, Matthew Joseph [mailto:[hidden email]]
Enviado el: Wednesday, July 11, 2012 18:35
Para: 'Hector Maletta'; '[hidden email]'
Asunto: RE: Discrete time survival
The Hazard function is the probability that you experience an event at a given time, given that you didn't experience that event in the previous time. Doesn't the censoring allow the model to calculate this such that it's not creating a probability for someone already dead? I assume that was the point you were trying to make? If what you are saying is that someone's probability of being alive can't increase with time, that isn't true either. Cancer patients have an increased probability of staying in remission from cancer the longer they go after going into remission with the cancer. Alcoholics have a higher probability of staying sober the longer they stay sober after treatment (i.e. they are more likely to no relapse once they have put together a great deal of recovery). The order aspect of time is up to you. When it comes to any statistical analysis of time, the interpretation of time in a specific order is still researcher interpretation specific, the model can still do "wonky" time comparisons that make no sense. Statistics doesn't understand that time flows from past to present, only we do. It is up to you to utilize the variable indicating time and censoring to ensure that the analysis makes sense. Am I misunderstanding your concern? Matthew J Poes Research Data Specialist Center for Prevention Research and Development University of Illinois 510 Devonshire Dr. Champaign, IL 61820 Phone: 217-265-4576 email: [hidden email] -----Original Message----- From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Hector Maletta Sent: Wednesday, July 11, 2012 4:09 PM To: [hidden email] Subject: Re: Discrete time survival Bruce, I understand that, but to me it is not the same thing. 
I have not read Singer & Willet's book, but I tend to think a person-period dataset does not imply that survival probabilities decrease monotonically over time: if you start being alive today, it may well happen (with such dataset) that your predicted probability of surviving up to next November is lower than your probability of still being alive next December, which would be a bit unreal. Even a multilevel model with persons and periods cannot account to the ordered nature of periods: they would be simply several "periods" equivalent to several "observations", with no particular order. Perhaps an ordering of periods could be achieved by using "time elapsed" as one of the predictors, but it all looks to me as a convoluted way of arriving at the same point. Perhaps some people can more easily manage logistic regression software than survival analysis software, but to me they look equally easy or equally difficult. And errors of interpretation can arise in either, no matter how familiar the software (recall the frequent mistake of using the "classification table" in logistic regression as a criterion of goodness of fit, which is wrong because "fit" in a probabilistic prediction is a fit of the probability, measured as a proportion computed over a group of cases, not the actual occurrence of the event to every individual having p>0.50 and the non-occurrence where p<0.50). Hector -----Mensaje original----- De: SPSSX(r) Discussion [mailto:[hidden email]] En nombre de Bruce Weaver Enviado el: Wednesday, July 11, 2012 16:44 Para: [hidden email] Asunto: Re: Discrete time survival Hector, as Gene said in his first post, he is using "discrete time survival analysis", and following advice given in Singer & Willett's (2003) book. In chapter 11 of that book, S&W show how one can fit a "discrete-time hazard model" using a logistic regression procedure. 
To quote: "Our goal is to show that although the [discrete-time hazard] model may appear complex and unfamiliar, it can be fit using software that is familiar by applying standard logistic regression analysis in the /person-period data set/." A "person-period data set" has multiple rows per person, one row per period of observation. There is also an Event indicator variable (call it EVT). For persons who experience the event, EVT=0 on all rows but the last. For those who do not experience the event, EVT=0 on all rows. For more details, see Singer & Willett's book (particularly chapters 11 & 12). HTH. Hector Maletta wrote > > Gene, > I do not know the details of your research program; I thought you were > dealing with a SURVIVAL problem, i.e. trying to ascertain the > probabilities of surviving for a given span of time (after a certain > "start" state) before an event strikes. This is not the same as the > probabilities of the event at various periods in comparison with the > probability of the event at some reference period. An example of the > latter would be to ascertain the probability of tornados in May, June, > July, August, September etc relative to their probability in the > "reference" month, say January. An example of the former is the > probability that a given location would "survive" > without > any tornado, from January until a variable date (January to May, > January to June, January to September etc). It is a completely > different problem. 
In the former you may use logistic regression > (probability of an event for cases described by category 1, 2, 3, 4, > ...., k, relative to the reference probability of the event for cases > described by the reference category); for instance, suppose you > investigate the probability of tornados with one single predictor, > "Month of year"; this predictor has 12 categories, one per month, and > you use one of the months (say, May) as your reference category (you > may have data from multiple years for the same location, or from > multiple locations in the same year; multiple years for given location > may seem more logical as an example, given the geographical > variability of tornados, and their relatively stable recurrence over > time). You may use also some other concurrent predictor, say > accumulated annual rainfall over the 12 months to the start of the > tornado season, for all years (or > locations) considered. From these data, you obtain the odds of > tornados in September relative to January (probability of tornados in > September, divided by probability of tornados in January), and also > the odds of tornados for all the other months. In total, you obtain > eleven odds (one per month, all relative to January). Your logistic > function for the probability of tornadoes in a given month is > p(k)=EXP(BX)/[1+EXP(BX)] where BX=b0+b1X1+b2X2. The logarithm of the > odds is BX, and the odds are BX. The odds that a tornado happens in a > given month k, say July, relative to the base period (May) in years > with rainfall x(t), equal the probability of a tornado happening in > month k, relative to the probability of a tornado happening in May, > for years with cumulative 12-month rainfall=x(t). (I use May in this > example as the probability of tornados in January is likely to be > zero). > > In the survival case the situation is different. 
Suppose you use > survival analysis to predict the chances of surviving different > lengths of time without experiencing a tornado (say, at a given area, > like Kansas), using a "time variable" (month of year) and perhaps some > predictor (say, rainfall again). In this case, time is NOT a > predictor: it is the one-directional dimension along which events can > occur. Your hazard function will have only one predictor (rainfall). > The proportional cumulative hazard rate h(t) for a year with rainfall > X=x(t) will mean: "number of events expected to have occurred from > starting time, say January, to month t, in years with rainfall x(t)". > (I use January in this example as the reference time in survival > analysis should be at the start of the relevant period, not in the > middle of it; one may use also March or April, if the start of the > tornado season is always after March). The associated survival > probability up to month k, for a year with rainfall x(t), i.e. p(tk), > gives you the probability of not having a tornado until month k, for > year t with rainfall x(t). > > Logistic regression estimates the probability of a tornado in each > different month. Survival analysis estimates the probability of not > having the tornado up to each different month. > > Cox regression is a particular kind of "proportional hazard" survival > analysis, where (ordinarily) the hazard rates stand to the reference > or base hazard rate in a constant proportion over time. If your > chances of survival are twice as large as mine, guys like you would be > twice more likely to survive than guys like me up to every month, no > matter how distant the month considered. No chance that your survival > chances approach mine over time: > they remain twice as large. 
For example, if 800 mm/yr of accumulated > rainfall up to start of season afford a reduction of 20% in the > incidence of tornadoes (relative to the reference case which has, say, > 500 mm/yr rainfall), this reduction of 20% in the odds of tornadoes > will hold for all time intervals, i.e. for the relative chances of > tornadoes up to all months. > Cox regression may, however, accommodate non proportional hazards by > introducing time-dependent covariates (such as accumulated rainfall UP > TO EACH MONTH). Other models of survival analysis may account for more > sophisticated relationships between time, covariates and events. > > Hope this helps. > > Hector > > -----Mensaje original----- > De: SPSSX(r) Discussion [mailto:SPSSX-L@.UGA] En nombre de Maguin, > Eugene Enviado el: Wednesday, July 11, 2012 14:03 > Para: SPSSX-L@.UGA > Asunto: Re: Discrete time survival > > Thanks, Hector. I'm not using Cox regression. I've constructed a > so-called person-period data and am using logistic regression to > analyze it. I'm using 'time chunks' to mean periods. Maybe confusing > nomenclature. In each period there is a probability of the event and > an associated odds. Relative to the odds for the reference period an > odds ratio can be computed each other period. The B that referred to > was the coefficient for one of the periods. > > Using a longer aggregation period will eliminate no event periods, I > understand. > > I have read that using Cox regression with discrete time may give > problems because of the multiple events in a period (ties). > > Gene Maguin > > > > -----Original Message----- > From: Hector Maletta [mailto:hmaletta@.com] > Sent: Wednesday, July 11, 2012 12:26 PM > To: Maguin, Eugene; SPSSX-L@.UGA > Subject: RE: Discrete time survival > > I do not think it is wise to correct reality to make it fit the model, > but (if anything) the other way around. 
> On the other hand, survival analysis is about probabilities, and it is > quite likely that at any particular period the number of events is > larger or smaller than predicted, including zero events as a limiting > case. When the total number of events is 42 over 1200 periods, > zero-event periods are a necessity. > Time chunks might be useful for some purposes, but (1) by lumping time > periods together you are deliberately forfeiting information; and (2) > the actual length of the chunks is arbitrary: your conclusions may > vary if you use 2 months, 3 months, or 17 or 43 days as your > time-chunk convention. At any rate one should try different ways of > carving the chunks, aiming at using the shortest chunk that is > possible to > a > coefficient estimate that is extremely sensible to the random > occurrence of one more or one less event in a given chunk). > You do not clarify what kind of survival analysis you are doing. If it > is Cox regression, recall that call regression uses only information > on the succession or time-order of events, and not on the exact length > of time elapsed. > Also, Cox regression generates a function for the hazard rate (or > conversely the survival probability) including EXP(BX), with B > coefficients for each predictor, and these B coefficients are one > single set of coefficients for the entire analysis, not one > coefficient per chunk of time. Thus I do not understand what you mean > by "a large B coefficient for that chunk". > > Hope this helps. > > Hector > > -----Mensaje original----- > De: SPSSX(r) Discussion [mailto:SPSSX-L@.UGA] En nombre de Maguin, > Eugene Enviado el: Wednesday, July 11, 2012 12:54 > Para: SPSSX-L@.UGA > Asunto: Discrete time survival > > I'm working on a discrete time survival analysis (and following along > with Singer and Willett's chapters) and I've run into a problem that > may occur fairly often in discrete time. 
A bit of background: 42 out > of 261 persons had the event over a span of about 1200 days, which is > my time unit. I elected to work in discrete time rather than > continuous time because I thought, maybe incorrectly, that discrete > time would be easier to work with and present to an inexperienced > audience. I'm aggregating time to eliminate time chunks with no > events. Using a three month aggregation, I have one time chunk with no > events. The estimation gives a large B coefficient for that chunk, > which is no surprise. My question is whether an acceptable remedy is > to add a tiny value to that cell, changing the 0 to, say, .001, if you > think in crosstab terms. Then, if it an acceptable remedy, is the > doing of it just a matter of changing the event status indicator from > 0 to .001 for a randomly chosen case a! > t that time point? > > Thanks, Gene Maguin > > ===================== > To manage your subscription to SPSSX-L, send a message to > LISTSERV@.UGA (not to SPSSX-L), with no body text except the command. > To leave the list, send the command SIGNOFF SPSSX-L For a list of > commands to manage subscriptions, send the command INFO REFCARD > > ===================== > To manage your subscription to SPSSX-L, send a message to > LISTSERV@.UGA (not to SPSSX-L), with no body text except the command. > To leave the list, send the command SIGNOFF SPSSX-L For a list of > commands to manage subscriptions, send the command INFO REFCARD > > ===================== > To manage your subscription to SPSSX-L, send a message to > LISTSERV@.UGA (not to SPSSX-L), with no body text except the command. > To leave the list, send the command SIGNOFF SPSSX-L For a list of > commands to manage subscriptions, send the command INFO REFCARD > ----- -- Bruce Weaver [hidden email] http://sites.google.com/a/lakeheadu.ca/bweaver/ "When all else fails, RTFM." NOTE: My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above. 
--
View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Discrete-time-survival-tp5714138p5714149.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.
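Gene's zero-event chunk is, in logistic-regression terms, a case of quasi-complete separation: the dummy for that period perfectly predicts non-occurrence. A minimal sketch (the event counts below are hypothetical, not Gene's data) of why the fitted B for that chunk runs away:

```python
import math

# hypothetical (events, number at risk) per three-month chunk -- not Gene's data
events_at_risk = [(6, 261), (9, 255), (0, 246), (8, 246), (7, 238)]

logits = []
for d, n in events_at_risk:
    h = d / n  # observed hazard in this chunk
    # with one dummy per chunk, the ML estimate of each chunk's log-odds is
    # logit(observed hazard); logit(0) is minus infinity, so the estimate
    # for the zero-event chunk diverges
    logits.append(math.log(h / (1 - h)) if 0 < h < 1 else float("-inf"))

print(logits)
```

This is also why adding a fake .001 only masks the problem: it replaces minus infinity with an arbitrary large value chosen by the analyst rather than by the data. One common alternative is to merge the offending chunk with a neighbour, or to smooth the baseline (e.g. a polynomial in time instead of one dummy per chunk).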
In reply to this post by Hector Maletta
Hector, if you want to pursue it, another reference is Paul Allison's #46 Sage 'green book' "Event History Analysis" published in 1984.
Gene Maguin

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Hector Maletta
Sent: Wednesday, July 11, 2012 5:09 PM
To: [hidden email]
Subject: Re: Discrete time survival

Bruce, I understand that, but to me it is not the same thing. I have not read Singer & Willett's book, but I tend to think a person-period dataset does not imply that survival probabilities decrease monotonically over time: if you start being alive today, it may well happen (with such a dataset) that your predicted probability of surviving up to next November is lower than your probability of still being alive next December, which would be a bit unreal. Even a multilevel model with persons and periods cannot account for the ordered nature of periods: they would simply be several "periods" equivalent to several "observations", with no particular order. Perhaps an ordering of periods could be achieved by using "time elapsed" as one of the predictors, but it all looks to me like a convoluted way of arriving at the same point.

Perhaps some people can more easily manage logistic regression software than survival analysis software, but to me they look equally easy or equally difficult. And errors of interpretation can arise in either, no matter how familiar the software (recall the frequent mistake of using the "classification table" in logistic regression as a criterion of goodness of fit, which is wrong because "fit" in a probabilistic prediction is a fit of the probability, measured as a proportion computed over a group of cases, not the actual occurrence of the event for every individual having p>0.50 and the non-occurrence where p<0.50).
Hector

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Bruce Weaver
Sent: Wednesday, July 11, 2012 16:44
To: [hidden email]
Subject: Re: Discrete time survival

Hector, as Gene said in his first post, he is using "discrete time survival analysis", and following advice given in Singer & Willett's (2003) book. In chapter 11 of that book, S&W show how one can fit a "discrete-time hazard model" using a logistic regression procedure. To quote: "Our goal is to show that although the [discrete-time hazard] model may appear complex and unfamiliar, it can be fit using software that is familiar by applying standard logistic regression analysis in the /person-period data set/."

A "person-period data set" has multiple rows per person, one row per period of observation. There is also an event indicator variable (call it EVT). For persons who experience the event, EVT=0 on all rows except the last, where EVT=1. For those who do not experience the event, EVT=0 on all rows. For more details, see Singer & Willett's book (particularly chapters 11 & 12). HTH.

Hector Maletta wrote:
>
> Gene,
> I do not know the details of your research program; I thought you were dealing with a SURVIVAL problem, i.e. trying to ascertain the probabilities of surviving for a given span of time (after a certain "start" state) before an event strikes. This is not the same as the probabilities of the event at various periods in comparison with the probability of the event at some reference period. An example of the latter would be to ascertain the probability of tornados in May, June, July, August, September, etc., relative to their probability in the "reference" month, say January. An example of the former is the probability that a given location would "survive" without any tornado, from January until a variable date (January to May, January to June, January to September, etc.). It is a completely different problem.
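Bruce's person-period layout can be made concrete with a small sketch (the helper name and the two example records below are hypothetical):

```python
def person_period(records):
    """Expand (person_id, last_period_observed, had_event) records into
    one row per person per period at risk, with an EVT indicator."""
    rows = []
    for pid, last, had_event in records:
        for t in range(1, last + 1):
            # EVT = 1 only on the final row of a person who has the event;
            # censored persons get EVT = 0 on every row
            evt = 1 if (had_event and t == last) else 0
            rows.append((pid, t, evt))
    return rows

# person 1 has the event in period 3; person 2 is censored after period 4
rows = person_period([(1, 3, True), (2, 4, False)])
```

Running a standard logistic regression of EVT on period dummies (plus any covariates) in this data set is exactly the discrete-time hazard model of Singer & Willett's chapter 11.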
> In the latter you may use logistic regression (probability of an event for cases described by category 1, 2, 3, 4, ..., k, relative to the reference probability of the event for cases described by the reference category). For instance, suppose you investigate the probability of tornados with one single predictor, "month of year"; this predictor has 12 categories, one per month, and you use one of the months (say, May) as your reference category. (You may have data from multiple years for the same location, or from multiple locations in the same year; multiple years for a given location may seem more logical as an example, given the geographical variability of tornados and their relatively stable recurrence over time.) You may also use some other concurrent predictor, say accumulated annual rainfall over the 12 months to the start of the tornado season, for all years (or locations) considered. From these data you obtain the odds of tornados in September relative to the reference month (probability of tornados in September, divided by probability of tornados in May), and likewise the odds of tornados for all the other months. In total, you obtain eleven odds (one per month, all relative to May). Your logistic function for the probability of tornados in a given month is p(k)=EXP(BX)/[1+EXP(BX)], where BX=b0+b1X1+b2X2. The logarithm of the odds is BX, and the odds are EXP(BX). The odds that a tornado happens in a given month k, say July, relative to the base period (May) in years with rainfall x(t), equal the probability of a tornado happening in month k, relative to the probability of a tornado happening in May, for years with cumulative 12-month rainfall=x(t). (I use May in this example as the probability of tornados in January is likely to be zero.)
>
> In the survival case the situation is different.
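Before the survival case is developed further, the odds arithmetic above can be checked numerically; every coefficient and rainfall value below is hypothetical:

```python
import math

# hypothetical coefficients: intercept, July dummy, rainfall slope
b0, b_july, b_rain = -4.0, 1.2, 0.002
rain = 600.0  # hypothetical cumulative 12-month rainfall (mm)

def bx(july_dummy):
    # BX = b0 + b1*X1 + b2*X2 with X1 = month dummy, X2 = rainfall
    return b0 + b_july * july_dummy + b_rain * rain

p_july    = math.exp(bx(1)) / (1 + math.exp(bx(1)))  # p = EXP(BX)/[1+EXP(BX)]
odds_july = math.exp(bx(1))                          # odds = EXP(BX)
odds_may  = math.exp(bx(0))                          # reference month (May)

# the July-vs-May odds ratio reduces to EXP(b_july), whatever the rainfall
odds_ratio = odds_july / odds_may
```

The last line shows why a single coefficient per month suffices: the rainfall term cancels out of the ratio, leaving EXP(b_july) as the month's odds ratio against the reference month.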
> Suppose you use survival analysis to predict the chances of surviving different lengths of time without experiencing a tornado (say, in a given area, like Kansas), using a "time variable" (month of year) and perhaps some predictor (say, rainfall again). In this case, time is NOT a predictor: it is the one-directional dimension along which events can occur. Your hazard function will have only one predictor (rainfall). The proportional cumulative hazard rate h(t) for a year with rainfall X=x(t) will mean: "number of events expected to have occurred from the starting time, say January, to month t, in years with rainfall x(t)". (I use January in this example as the reference time in survival analysis should be at the start of the relevant period, not in the middle of it; one may also use March or April, if the start of the tornado season is always after March.) The associated survival probability up to month k, for a year with rainfall x(t), i.e. p(tk), gives you the probability of not having a tornado until month k, for year t with rainfall x(t).
>
> Logistic regression estimates the probability of a tornado in each different month. Survival analysis estimates the probability of not having a tornado up to each different month.
>
> Cox regression is a particular kind of "proportional hazards" survival analysis, where (ordinarily) the hazard rates stand to the reference or base hazard rate in a constant proportion over time. If your chances of survival are twice as large as mine, guys like you would be twice as likely as guys like me to survive up to every month, no matter how distant the month considered. There is no chance that your survival chances approach mine over time: they remain twice as large.
> For example, if 800 mm/yr of accumulated rainfall up to the start of the season affords a reduction of 20% in the incidence of tornados (relative to the reference case which has, say, 500 mm/yr of rainfall), this reduction of 20% in the odds of tornados will hold for all time intervals, i.e. for the relative chances of tornados up to all months.
> Cox regression may, however, accommodate non-proportional hazards by introducing time-dependent covariates (such as accumulated rainfall UP TO EACH MONTH). Other models of survival analysis may account for more sophisticated relationships between time, covariates and events.
>
> Hope this helps.
>
> Hector
>
> -----Original Message-----
> From: SPSSX(r) Discussion [mailto:SPSSX-L@.UGA] On Behalf Of Maguin, Eugene
> Sent: Wednesday, July 11, 2012 14:03
> To: SPSSX-L@.UGA
> Subject: Re: Discrete time survival
>
> Thanks, Hector. I'm not using Cox regression. I've constructed a so-called person-period dataset and am using logistic regression to analyze it. I'm using 'time chunks' to mean periods; maybe confusing nomenclature. In each period there is a probability of the event and an associated odds. Relative to the odds for the reference period, an odds ratio can be computed for each other period. The B that I referred to was the coefficient for one of the periods.
>
> Using a longer aggregation period will eliminate no-event periods, I understand.
>
> I have read that using Cox regression with discrete time may give problems because of the multiple events in a period (ties).
>
> Gene Maguin
>
> -----Original Message-----
> From: Hector Maletta [mailto:hmaletta@.com]
> Sent: Wednesday, July 11, 2012 12:26 PM
> To: Maguin, Eugene; SPSSX-L@.UGA
> Subject: RE: Discrete time survival
>
> I do not think it is wise to correct reality to make it fit the model, but (if anything) the other way around.
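A closing numerical note on the hazard-versus-survival distinction discussed above: in discrete time, the survival probability up to period k is the running product of (1 - hazard) over periods 1 through k. It is therefore non-increasing by construction (which addresses Hector's monotonicity worry), and a zero-hazard period simply leaves the curve flat. The hazard values below are hypothetical:

```python
def survival_curve(hazards):
    """Discrete-time survival: S(k) = product over j <= k of (1 - h_j)."""
    s, curve = 1.0, []
    for h in hazards:
        s *= (1.0 - h)
        curve.append(s)
    return curve

# a zero-hazard period (period 3 here) is perfectly legitimate: it just
# leaves the survival probability unchanged for that period
surv = survival_curve([0.02, 0.03, 0.00, 0.04])
```

So a chunk with no events poses no conceptual problem for the survival function; the estimation trouble arises only when each period gets its own free logistic coefficient.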