Cox Regression - interpreting results, output not 'naturally' coded

Brad
Hi,

In Stata the results of a Cox model are 'naturally' coded into dummy
variables, in the sense that _Ivar_1 corresponds to var==1, _Ivar_2 to
var==2, _Ivar_3 to var==3, etc.

This does not appear to be the case in SPSS. If var=2 were missing or
omitted from the model, the dummy variables in the output would lose
parity with the values of var: var(1) corresponds correctly to var=1, but
var(2) would then correspond to var=3 (as var=2 is missing).

Where var has many different categories (around 30 in my case), the output
is difficult to interpret.

Is there a way to 'naturally' code the output to the values of var in SPSS?

Also, are around 30 categorical values of a predictor variable too many
for Cox modelling?

Many thanks in advance,

Brad

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Re: Cox Regression - interpreting results, output not 'naturally' coded

Maguin, Eugene
1) Recode statement.
2) Depends on sample size.
Gene Maguin

Re: Cox Regression - interpreting results, output not 'naturally' coded

Brad
On Wed, 31 Oct 2012 13:48:04 -0400, Maguin, Eugene <[hidden email]>
wrote:

>1) Recode statement.
>2) Depends on sample size.
>Gene Maguin


1) Recode merely changes the values of the categorical variable. I don't
understand how this addresses the issue. The problem arises when the Cox
model creates dummy variables 'on the fly' (kind of) to represent each
category value. The number attached to each dummy variable created by the
Cox model does not always equal the categorical value of the covariate.

If var is my covariate, with categorical values of 0, 1, 2 and 3, the
output of the model will be (assigning 0 as reference):

var
var(1)
var(2)
var(3)

However, if var=2 is omitted from the model, the output becomes:

var
var(1)
var(2)

Now var(2) no longer represents var=2 (It now represents var=3).

To put it another way, in Stata I could have a value of var=99, and if
this were included the output would become:

var
var_1
var_2
var_3
var_99

Easy to interpret. Var=99 becomes var_99. Whereas in SPSS it would be

var
var(1)
var(2)
var(3)
var(4)

Where var=99 has been assigned to var(4). This becomes very confusing when
you're working with multiple and changing categorical values of the
covariate.

I don't see how a recode command will address this?

Thanks,
Brad
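
The relabelling described in the post above can be reproduced with a minimal Python sketch. It assumes, as the post describes, that SPSS numbers the contrast parameters by position among the sorted observed categories once the reference category is dropped. The function name `positional_labels` and its arguments are hypothetical and not part of SPSS or Stata.

```python
def positional_labels(values, name="var", reference="first"):
    """Map positional labels such as var(1) to the category value they stand for.

    Assumption: labels are assigned by position among the sorted observed
    categories after the reference category is removed.
    """
    categories = sorted(set(values))
    if reference == "first":
        kept = categories[1:]
    elif reference == "last":
        kept = categories[:-1]
    else:
        raise ValueError("reference must be 'first' or 'last'")
    return {f"{name}({i})": value for i, value in enumerate(kept, start=1)}


full = positional_labels([0, 1, 2, 3])         # all categories present
gap = positional_labels([0, 1, 3])             # var=2 absent
with_99 = positional_labels([0, 1, 2, 3, 99])  # extra code 99

for label, value in gap.items():
    print(label, "-> var =", value)
```

Keeping a lookup like this next to the output is one way to read a long parameter table when some category codes are absent.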

Re: Cox Regression - interpreting results, output not 'naturally' coded

Maguin, Eugene
Well. So, if you haven't already guessed, understand that I don't know Stata. Thus, I had to make up what might be going on and translate that into SPSS. I still think recode is what you need to use. Here's my line of thought. You said you had a categorical variable that was to be a predictor. Let's say the variable had 3 values, although your variable had many more. Your Stata command must have been something like this:

stcox age i.dose

where i.dose is a categorical variable with three values of, let's say, 1, 2, 3 (whether the 'i.' in front of 'dose' is salient, I don't know, but you would). I also think there are some things missing from the command, like the DV and the event indicator, but I don't know that either.

Anyway, the way I understood Stata to work was this. Stata somehow recognizes that a variable is categorical and then internally creates a set of contrast variables such that the lowest value of the original variable, 1 in this case, is the reference category and two contrast variables are created. One piece that you emphasized was that the contrast variables are named with respect to the value of the original variable. So, here, the two contrast variables are cv2, which contrasts i.dose=2 against i.dose=1, and cv3, which is 3 against 1. The other piece that you emphasized, and maybe I didn't understand correctly, is that cv2=2 and cv3=3. I think of the standard coding as being cv2=1 if i.dose=2 and cv3=1 if i.dose=3. That's an interesting coding scheme but I don't think it matters.

Now you switch to spss and you have something like COXREG VARIABLES = Time WITH dose/STATUS = (1)/
CATEGORICAL = dose/CONTRAST (dose) = DEVIATION (1)/ENTER dose.
And you have the problem you describe, I agree.

Why did I say 'Recode'? Because:
Recode dose(1 3=0)(2=1) into dose2/dose(1 2=0)(3=1) into dose3. /* standard coding.
OR
Recode dose(1 3=0)(2=2) into dose2/dose(1 2=0)(3=3) into dose3. /* stata?? coding.

Then:
COXREG VARIABLES = Time WITH dose2 dose3/STATUS = (1)/ENTER dose2 dose3.

What do you think?
Gene Maguin
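
On whether the two coding schemes matter: a small arithmetic sketch, assuming a single indicator that is coded either 1/0 or c/0 for the same category. Multiplying a covariate by a constant c divides its fitted coefficient by c, so the hazard ratio for the category against the reference is unchanged and only the reported B differs. The value 1.8 is a made-up hazard ratio for illustration and `rescaled_coefficient` is a hypothetical helper.

```python
import math


def rescaled_coefficient(beta_indicator, code_value):
    """Coefficient obtained when the 1/0 indicator is recoded to code_value/0."""
    return beta_indicator / code_value


beta_standard = math.log(1.8)  # hypothetical B under 1/0 coding
beta_value_coded = rescaled_coefficient(beta_standard, 2)  # dose2 coded 2/0

# Hazard ratio for dose=2 versus the reference under each coding scheme.
hr_standard = math.exp(beta_standard * 1)
hr_value_coded = math.exp(beta_value_coded * 2)

print(round(hr_standard, 6), round(hr_value_coded, 6))
```

In practice this means Exp(B) in the output can be read directly as the category's hazard ratio only under the 1/0 coding.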



Re: Cox Regression - interpreting results, output not 'naturally' coded

Bruce Weaver
Administrator
I haven't been following this thread very closely.  However, you always have the option of computing your own (1-0) indicator variables with names that are meaningful to you, and including k-1 of them in the model as covariates, not factors (where covariate translates to continuous variable and factor to categorical variable).  This creates extra work, especially if there are product terms in the model.  But it should give you what you're after.  

Over to Jon, who will now tell you about a Python-based extension command for computing indicator variables.  ;-)

HTH.
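
A minimal sketch of the hand-rolled approach described above, written in Python only to generate the SPSS COMPUTE statements as text. It assumes the category codes are non-negative integers, so that names such as var_99 are valid variable names. The function `indicator_syntax` is hypothetical and is not the extension command mentioned in this thread.

```python
def indicator_syntax(name, values, reference):
    """Build SPSS COMPUTE statements for k-1 value-named 1/0 indicators."""
    lines = []
    new_names = []
    for value in sorted(set(values)):
        if value == reference:
            continue  # the reference category gets no indicator
        new_name = f"{name}_{value}"
        new_names.append(new_name)
        # In SPSS a logical expression evaluates to 1 (true) or 0 (false).
        lines.append(f"COMPUTE {new_name} = ({name} = {value}).")
    lines.append("EXECUTE.")
    return new_names, "\n".join(lines)


names, syntax = indicator_syntax("var", [0, 1, 3, 99], reference=0)
print(syntax)
```

The generated variables would then be entered in the model as ordinary covariates, so each row of the output carries the category value in its name.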



--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING: 
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).
Re: Cox Regression - interpreting results, output not 'naturally' coded

Jon K Peck
The SPSSINC CREATE DUMMIES extension command will create and label these indicator variables automatically rather than having to write out all those COMPUTE statements.  It can even do interaction terms.

:-)


Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
new phone: 720-342-5621



