interaction in a linear regression model


interaction in a linear regression model

Myung Ki
Hello, everybody.

I have queries about interaction.

Here is the model:

Y (Y1-Y4) = b0 + b1X1 + b2X2 + b3X1*X2 + e

In one model, both X1 (4 levels) and X2 (5 levels) are categorical and Y
is continuous. Proc glm gives me many lines of estimates, one for each
combination of levels. For illustration purposes I thought it might be better
to have one estimate rather than displaying estimates for all combinations of
levels, so I entered X1 and X2 as continuous variables. I am not sure whether
this is the right approach.

In another model, Y and X1 are continuous and X2 is categorical (5 levels).
When I fit this model without telling SAS that X2 is categorical, the
p-values for every Y (Y1-Y4) were significant (p-values based on Type III
SS). However, if I model X2 as categorical, all but one of the Ys were not
significant. When I looked at the data and plotted them, the latter looks
more sensible. But, to be consistent with the previous model in presentation,
I would prefer to have one (overall) estimate.

So the questions are:
1) whether entering a categorical variable as continuous in order to create
an interaction term is correct and, if the results differ, which is correct;
2) when the interaction term involves categorical variable(s), whether the
p-value from Type III SS can be used for an overall assessment of the
interaction term;
3) if (2) is the case, what would be a better way to display so many
estimates, and whether there is any alternative way.

Any suggestions and pointers to relevant references will be appreciated.

Thanks in advance.

Myung ki, PhD

University College London

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Re: interaction in a linear regression model

mpirritano
If X1 and X2 are categorical then you need to recode them in order to enter
them into a linear regression, using dummy coding or effect coding. Otherwise
you're treating the categories in X1 and X2 as if they were continuous
intervals on a scale, which probably doesn't make sense for categorical
variables.

Then to look at interactions you'd look at the products of each
dummy/effect-coded variable for X1 with each dummy/effect-coded variable
for X2.
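
As a rough illustration of the coding described above, here is a minimal
Python sketch (not SAS or SPSS syntax). It assumes dummy coding with the
first level as the reference category; the level codes and the helper names
dummy_code and design_row are hypothetical and only serve the example.

```python
from itertools import product

def dummy_code(value, levels):
    """Return 0/1 indicators for every level except the first (the reference)."""
    return [1 if value == lev else 0 for lev in levels[1:]]

def design_row(x1, x2, x1_levels, x2_levels):
    """Build one design row: intercept, X1 dummies, X2 dummies, dummy products."""
    d1 = dummy_code(x1, x1_levels)
    d2 = dummy_code(x2, x2_levels)
    # One interaction column per pair (X1 dummy, X2 dummy).
    interaction = [a * b for a, b in product(d1, d2)]
    return [1] + d1 + d2 + interaction

X1_LEVELS = [1, 2, 3, 4]       # hypothetical codes for the 4-level factor
X2_LEVELS = [1, 2, 3, 4, 5]    # hypothetical codes for the 5-level factor

row = design_row(3, 5, X1_LEVELS, X2_LEVELS)
n_interaction = (len(X1_LEVELS) - 1) * (len(X2_LEVELS) - 1)
print(len(row), n_interaction)
```

With 4 and 5 levels this gives (4-1)*(5-1) = 12 interaction columns, which is
why the categorical model reports so many estimates, whereas entering both
variables as continuous leaves a single product term X1*X2.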

My favorite reference for interaction effects in regression is Jaccard &
Turrisi (2003). It's a little green Sage University Paper. Very
thorough.

Good luck.

matt

Matthew Pirritano, Ph.D.
Research Analyst IV
Medical Services Initiative (MSI)
Orange County Health Care Agency
(714) 568-5648


-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
Myung Ki
Sent: Thursday, June 03, 2010 12:31 PM
To: [hidden email]
Subject: interaction in a linear regression model


Re: interaction in a linear regression model

Garry Gelade
Myung seems to be using *SAS*/GLM. SPSS/GLM can handle categorical variables
directly without need for dummy codings.

Whether it is appropriate to treat the independents as continuous depends on
what they represent and how they are encoded. Often, Likert scale responses
(which strictly speaking are ordinal) are treated as continuous if there are
5 or more categories and no-one seems to object.
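
To make the point about encoding concrete, here is a small Python sketch with
made-up numbers (the group means below are hypothetical, not from Myung's
data). It assumes equal group sizes, in which case regressing Y on the
numeric codes gives the same line as regressing the group means on the codes.

```python
from statistics import mean

# Hypothetical mean of Y at each of the five levels of X2 (made-up numbers).
codes = [1, 2, 3, 4, 5]
group_means = [10.0, 10.5, 15.0, 11.0, 10.0]

# Least-squares line through (code, group mean): what entering the codes as
# a continuous predictor fits when the groups are equal in size.
x_bar, y_bar = mean(codes), mean(group_means)
slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(codes, group_means))
         / sum((x - x_bar) ** 2 for x in codes))
intercept = y_bar - slope * x_bar

# The straight line forces equal steps between adjacent levels.
linear_fit = [intercept + slope * x for x in codes]
residuals = [y - f for y, f in zip(group_means, linear_fit)]
print(slope, residuals)
```

Treating X2 as categorical would reproduce each group mean exactly, while the
single slope only captures a linear trend across the codes. If the means are
not roughly evenly spaced in the order of the codes, the two models can give
quite different answers.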

Garry

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
Pirritano, Matthew
Sent: 03 June 2010 21:35
To: [hidden email]
Subject: Re: interaction in a linear regression model


Re: interaction in a linear regression model

Granaas, Michael
In reply to this post by mpirritano
A nit: There is also the more general case of orthogonal contrast coding
that should be considered. Off the top of my head Keppel and Wickens have a
two-chapter introduction that is highly readable. There should be other
texts that do a nice job with the topic as well.
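
For readers who have not met contrast coding, here is a minimal Python sketch
of one common orthogonal set, Helmert-type contrasts. The function names are
hypothetical, and this is only one of many possible orthogonal codings; the
choice of contrasts should follow the research question.

```python
def helmert_contrasts(k):
    """Return k-1 Helmert-type contrasts for a factor with k levels.

    Contrast j compares level j+1 with the levels that come before it.
    """
    contrasts = []
    for j in range(1, k):
        contrasts.append([-1] * j + [j] + [0] * (k - j - 1))
    return contrasts

def dot(u, v):
    """Plain dot product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))

C = helmert_contrasts(5)   # 5 levels, as for X2 in the original question

# Each contrast sums to zero and every pair of contrasts is orthogonal.
sums = [sum(c) for c in C]
cross = [dot(C[i], C[j]) for i in range(len(C)) for j in range(i + 1, len(C))]
print(sums, cross)
```

The resulting coded columns are mutually uncorrelated only when the group
sizes are equal; with unequal n the contrasts are still valid codes but are
no longer orthogonal in the data.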
 
Michael 

****************************************************
Michael Granaas             [hidden email]
Assoc. Prof.                Phone: 605 677 5295
Dept. of Psychology         FAX:  605 677 3195
University of South Dakota
414 E. Clark St.
Vermillion, SD 57069
*****************************************************
