Hello,
I would like to include an interaction term between a categorical and a continuous variable in a multiple regression. I would appreciate it if someone could help me with this. Thank you in advance.

Best wishes,
Maria
Maria,
You can accomplish this as follows:

1.) Dummy code the categorical variable (k-1 dummy codes for k levels of the categorical variable).
2.) Compute the product of the continuous variable and each one of the dummy codes (k-1 cross-products).
3.) Enter the continuous variable and the dummy-coded categorical variable in the regression model (i.e., the main effects).
4.) Then, for the interaction, enter all of the k-1 cross-products from Step #2 as one block.

The increment in R-squared at this step is due to the addition of the interaction between the continuous and categorical variables. The beta weight for each cross-product shows how the effect of the continuous variable at that level of the categorical variable differs from its effect in the reference category.

If you wish to refer to a text on this matter, you could look at West and Aiken, Cohen and Cohen, or Pedhazur, for example.

Best,
Stephen Brand

For personalized and professional consultation in statistics and research design, visit www.statisticsdoc.com
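The four steps above can be sketched in a few lines of Python. This is only an illustration, not the SPSS procedure itself: the data are simulated, and the names (`x`, `group`, `r_squared`, and so on) are made up for the example. It assumes NumPy is available and uses level 0 as the reference category.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 300
x = rng.normal(size=n)                # continuous predictor (simulated)
group = rng.integers(0, 3, size=n)    # categorical predictor with k = 3 levels

# Step 1: k-1 dummy codes; level 0 is the reference category.
dummies = np.column_stack([(group == g).astype(float) for g in (1, 2)])

# Step 2: k-1 cross-products of the continuous variable with each dummy.
cross = dummies * x[:, None]

# Simulated outcome whose slope on x differs across the three groups.
slopes = np.array([0.5, 1.0, 1.5])
y = 1.0 + slopes[group] * x + 0.3 * dummies[:, 0] + rng.normal(size=n)


def r_squared(X, y):
    """R-squared of an ordinary least-squares fit of y on X plus an intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)


# Step 3: main effects only.
main = np.column_stack([x, dummies])
r2_main = r_squared(main, y)

# Step 4: add the whole block of cross-products.
full = np.column_stack([main, cross])
r2_full = r_squared(full, y)

# Increment in R-squared and the F statistic for the added block.
delta_r2 = r2_full - r2_main
q = cross.shape[1]                    # number of added terms (k-1)
df_resid = n - full.shape[1] - 1      # residual df of the full model
f_change = (delta_r2 / q) / ((1.0 - r2_full) / df_resid)

print(f"R2 main = {r2_main:.3f}, R2 full = {r2_full:.3f}")
print(f"delta R2 = {delta_r2:.3f}, F({q}, {df_resid}) = {f_change:.2f}")
```

The F statistic tests the k-1 cross-products jointly, which is why they are entered as a single block rather than one at a time.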
Depending on your continuous variable, you may also want to include squared terms. In a recent investigation I used interaction terms for age and gender, where age is a continuous variable and gender is coded 1=female and 0=male.

However, some previous research suggested that age was best modelled as a quadratic rather than a linear term, so the square of age was also included. This resulted in five candidate terms: age, age², female, age × female and age² × female.

The Mallows (1973) approach involves: (i) estimating a model for each of the 32 possible combinations of the age and gender terms (ranging from including none of the terms through to including all five); (ii) calculating a summary measure called Mallows' Cp, based on the number of variables 'p' in the model; and (iii) selecting as the 'best subset' the collection with Mallows' Cp closest to p+1.

Mallows CL (1973): Some comments on Cp. Technometrics 15. pp 661--676.

Muir Houston
Research Fellow
CRLL
Institute of Education
University of Stirling
FK9 4LA
01786-46-7615
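A rough sketch of the all-subsets search described above, in Python with NumPy. Everything here is an assumption made for illustration: the data are simulated, age is centered at its sample mean, and Cp is computed as SSE/s² - n + 2(p+1), where s² is the residual mean square of the full five-term model and p counts predictors excluding the intercept. Under that definition the full model has Cp = p+1 by construction, so the sketch ranks only the other 31 subsets.

```python
from itertools import combinations

import numpy as np

rng = np.random.default_rng(1)
n = 500
age = rng.uniform(20, 70, size=n)                  # simulated ages
female = rng.integers(0, 2, size=n).astype(float)  # 1 = female, 0 = male
age_c = age - age.mean()                           # centered age (assumption)

# The five candidate terms.
terms = {
    "age": age_c,
    "age2": age_c ** 2,
    "female": female,
    "age*female": age_c * female,
    "age2*female": age_c ** 2 * female,
}

# Simulated outcome (hypothetical coefficients).
y = (2.0 + 0.05 * age_c - 0.002 * age_c ** 2 + 0.5 * female
     + 0.03 * age_c * female + rng.normal(size=n))


def sse(columns):
    """Residual sum of squares of an OLS fit on the given columns plus an intercept."""
    X = np.column_stack([np.ones(n)] + columns)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


names = list(terms)
k = len(names)

# Residual mean square of the full model, used as the error-variance estimate.
s2_full = sse([terms[t] for t in names]) / (n - k - 1)

# Fit all 2**5 = 32 subsets and compute Cp for each.
results = []
for size in range(k + 1):
    for subset in combinations(names, size):
        p = len(subset)
        cp = sse([terms[t] for t in subset]) / s2_full - n + 2 * (p + 1)
        results.append((subset, p, cp))

# Rank the proper subsets by how close Cp is to p+1.
candidates = [r for r in results if r[1] < k]
best = min(candidates, key=lambda r: abs(r[2] - (r[1] + 1)))
print("subset:", best[0], "p =", best[1], "Cp =", round(best[2], 2))
```

In practice one would also weigh parsimony, for example by preferring the smallest subset whose Cp is acceptably close to p+1.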
Some comments.
At 02:50 AM 2/22/2007, Maria Sapouna wrote:

>I would like to include an interaction term between a categorical and
>a continuous variable in a multiple regression.

At 05:32 AM 2/22/2007, Statisticsdoc wrote, giving correct advice:

>2.) Compute the product of the continuous variable and [the dummy code
>for all but one category of the categorical];
>
>3.) Enter [first] the continuous variable and the dummy-coded
>categorical variable in the regression model (i.e., the main effects)
>
>4.) Then, enter all of the k-1 cross-products from Step #2 as one
>block.

a. In the SPSS command REGRESSION, /METHOD=TEST is good for this.

b. You've just added k-1 independent variables. Check (carefully) that your sample size is still adequate. (Overall warning: Testing interaction effects often raises needed sample size drastically.)

At 05:43 AM 2/22/2007, Muir Houston wrote:

>You may want to include squared terms as well - in a recent
>investigation I used interaction terms for age and gender, where age
>is continuous and gender is coded 1=female and 0=male. [In my study,]
>the square of age was also included - a series of interaction terms
>which included all combinations of age*female and age2*female [where
>'age2'=age**2 and 'female' is the dummy variable for female].

c. This may be useful, but it 'eats' sample size even more quickly: it's 2*(k-1)+1 new independent variables, instead of k-1. (The extra '+1' is the main-effect squared term.) If the categorical is gender, i.e. k=2, this isn't so severe. But even three more independent variables isn't trivial; check sample size requirements very carefully. (Did I say this before?)

d. Don't use age**2. For many populations (e.g., any population consisting of adults, say age>=20), age and age**2 are highly correlated, enough to impair precision of estimation badly. Use something like (age-40)**2, if 40 is reasonably near the mean age in your population.
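Point d is easy to check numerically. The sketch below assumes a hypothetical adult population with ages spread evenly between 20 and 60 (so 40 is close to the mean) and compares the correlation of age with the raw and the centered squared term; NumPy is assumed to be available.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical adult sample: ages uniform on [20, 60], mean close to 40.
age = rng.uniform(20, 60, size=1000)

age_sq_raw = age ** 2              # raw squared term
age_sq_centered = (age - 40) ** 2  # squared term centered near the mean

# Pearson correlation of age with each version of the squared term.
r_raw = np.corrcoef(age, age_sq_raw)[0, 1]
r_centered = np.corrcoef(age, age_sq_centered)[0, 1]

print(f"corr(age, age**2)      = {r_raw:.3f}")
print(f"corr(age, (age-40)**2) = {r_centered:.3f}")
```

The centered version carries the same curvature information, but it is much less collinear with the linear age term, so both coefficients can be estimated more precisely.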