survival analysis: organizing the data

survival analysis: organizing the data

J Scelza
I have a question about survival analysis. We are looking at a data set in
which employment status is recorded for clients at more than one time, for
example, on a quarterly basis over the period of a year. So, we will have
clients who:

1) remained employed for the entire duration of the study;
2) remained unemployed for the entire study;
3) changed from employed at some point to unemployed and remained unemployed
thereafter;
4) changed from unemployed at some point to employed and remained employed
thereafter; or
5) changed from one employment status to the other several times
(oscillating status).

And there can be as many as 5 cases per client, sometimes more.

We want to use survival analysis because we have censored cases in the data
set. Time is measured in days. Employment status is employed = 1, unemployed
= 2. And for now, we are not using any covariates in the study. We are just
looking at time and employment status.

My question is: how should we arrange the data set?

Survival analysis is typically used for things like patient survival, so
it only takes into account the length of time until, for example, a
patient's death.


Here, our variable - employment status - often changes. So, should we create
one data set that includes only clients whose first status is employed and
then a second data set which includes clients whose first status is
unemployed?

Thanks in advance.

Re: survival analysis: organizing the data

Hector Maletta
         Apparently your subjects can fluctuate between employment and
unemployment. Thus your model is not a simple "time to event" model, like a
life table, in which the "event" is terminal.
         The "time to event" may be counted from the latest "event" or
from the start of observation. In the latter case, observations are left
censored, since the subjects may already have been in that state for some
time when the study commences. In other words: people who are unemployed at
the start of the study and become employed 3 months later have a "time to
event" that appears to be 3 months but is actually longer, since they had
been unemployed since some time before the start of the study. By contrast,
people who become unemployed during the study and then find a new job have
a perfectly defined duration of unemployment. Thus your model should
include both left and right censored cases.
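
To make the left/right censoring concrete, here is a minimal Python sketch of
one way to arrange the data as one row per spell. It assumes records of the
form (client_id, day, status), with status coded 1 = employed and
2 = unemployed as in the original post. The function name build_spells, the
study_end parameter and the sample records are hypothetical, and the sketch
assumes a status change happens on the day it is first observed, which is
only an approximation when status is recorded quarterly.

```python
from itertools import groupby
from operator import itemgetter

EMPLOYED, UNEMPLOYED = 1, 2  # coding taken from the original post


def build_spells(records, study_end):
    """Collapse (client_id, day, status) records into one row per spell.

    Assumption: a status change takes effect on the day it is first
    observed. The first spell of each client is flagged as left censored
    (it may have started before observation began) and the last spell as
    right censored (no exit was seen before study_end).
    """
    spells = []
    ordered = sorted(records, key=itemgetter(0, 1))
    for client_id, rows in groupby(ordered, key=itemgetter(0)):
        rows = list(rows)
        # Keep only the first record of each run of identical statuses.
        starts = [rows[0]] + [
            cur for prev, cur in zip(rows, rows[1:]) if cur[2] != prev[2]
        ]
        for i, (_, start_day, status) in enumerate(starts):
            is_last = i == len(starts) - 1
            end_day = study_end if is_last else starts[i + 1][1]
            spells.append({
                "client_id": client_id,
                "status": status,
                "start": start_day,
                "duration": end_day - start_day,
                "left_censored": i == 0,
                "right_censored": is_last,
            })
    return spells


# Hypothetical quarterly records: (client_id, day since study start, status)
sample_records = [
    (1, 0, UNEMPLOYED), (1, 90, EMPLOYED), (1, 180, EMPLOYED), (1, 270, UNEMPLOYED),
    (2, 0, EMPLOYED), (2, 90, EMPLOYED),
]
sample_spells = build_spells(sample_records, study_end=365)
```

With one row per spell there is no need for two separate data sets: the
status column can be used to select or stratify employment and unemployment
spells, and the two censoring flags say which durations are incomplete.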
         If you do not have any predictor, what is your "model"? You may
have a simple "Markov-like" model in which people change (or do not change)
from state "i" to state "j" in a given time period, and thus your task is
computing the transition rates per unit of time. Or you may have a
Kaplan-Meier model to estimate survival rates. Or you may use the initial
state and the number of prior transitions (or the time since the latest
transition) as predictors, and apply Cox regression to model the hazard of
the next event.
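
As a rough illustration of the first two options, the Python sketch below
computes crude transition rates (exits per day at risk in each status) and a
Kaplan-Meier curve from spell rows. The spell rows, field names and function
names are made up for illustration. In practice the Kaplan-Meier estimate
would be run separately for each status, and how to treat left-censored
spells is a judgment call that this sketch does not settle.

```python
def transition_rates(spells):
    """Observed exits from each status divided by days at risk in that status."""
    exposure = {}
    events = {}
    for s in spells:
        exposure[s["status"]] = exposure.get(s["status"], 0) + s["duration"]
        if not s["right_censored"]:
            # A spell that is not right censored ended in an observed exit.
            events[s["status"]] = events.get(s["status"], 0) + 1
    return {
        status: events.get(status, 0) / days
        for status, days in exposure.items()
        if days > 0
    }


def kaplan_meier(durations, observed):
    """Return [(t, S(t))] at each distinct time with at least one event.

    durations: spell lengths in days
    observed:  True if the spell ended in an event, False if right censored
    """
    pairs = sorted(zip(durations, observed))
    at_risk = len(pairs)
    survival = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        events_at_t = 0
        leaving = 0
        # Gather every spell that ends (event or censoring) at time t.
        while i < len(pairs) and pairs[i][0] == t:
            events_at_t += int(pairs[i][1])
            leaving += 1
            i += 1
        if events_at_t:
            survival *= 1 - events_at_t / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


# Hypothetical spell rows: status 1 = employed, 2 = unemployed
example_spells = [
    {"status": 2, "duration": 90, "right_censored": False},
    {"status": 1, "duration": 180, "right_censored": False},
    {"status": 2, "duration": 95, "right_censored": True},
    {"status": 1, "duration": 365, "right_censored": True},
]
rates = transition_rates(example_spells)

# Kaplan-Meier for the unemployment spells only
unemployed = [s for s in example_spells if s["status"] == 2]
km_unemployed = kaplan_meier(
    [s["duration"] for s in unemployed],
    [not s["right_censored"] for s in unemployed],
)
```

The rates are per day because time is measured in days; multiplying by the
length of a quarter would give an approximate per-quarter figure.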

         Perhaps these random thoughts will help you think about how to
proceed.

         Hector

         -----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of J Scelza
Sent: 25 September 2007 16:42
To: [hidden email]
Subject: survival analysis: organizing the data
