Multiple Regression & Interactions

Multiple Regression & Interactions

Stats Q
Dear all,

Does anyone know whether there are sample size requirements for dichotomous
predictors in multiple regression? That is, for the dichotomous predictor,
what is the smallest number of cases per group that is allowed?

Also, I have another query regarding interpreting interaction effects in
SPSS's multiple regression. When a cross-product term is created by
multiplying together a predictor that is positively associated with the DV
(e.g., happiness) and a predictor that is negatively associated with the DV
(e.g., depression), would the resulting product term be expected to show a
positive or negative beta coefficient? I'm sure there's a really simple
answer to this.

Thank you in advance.
K S Scot

Re: Multiple Regression & Interactions

Jeff-125

Regarding the first issue - I'm not sure I understand. Mathematically, the
number of cases doesn't matter as long as the predictor isn't a constant
(i.e., all cases in the same group). Practically, if you have a small number
of cases in one group, you won't be able to examine the group difference
accurately: you may get an estimate for the regression coefficient, but its
standard error will be large and the p value will be high.
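
To illustrate, here is a minimal sketch in Python (statsmodels); the 5/95
group split and the effect size are invented for illustration:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n, n_small = 100, 5                      # only 5 cases in one group
group = np.r_[np.ones(n_small), np.zeros(n - n_small)]
y = 0.5 * group + rng.normal(0, 1, n)    # true group difference = 0.5

X = sm.add_constant(group)
fit = sm.OLS(y, X).fit()
print(fit.params[1], fit.bse[1], fit.pvalues[1])
# The coefficient is estimated, but with only 5 cases in one group its
# standard error is large, so the p value will usually be high even
# though the effect is real.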

Regarding the second issue - the sign of the bivariate correlations doesn't
really matter. What matters is whether there is an interaction between the
effects (by definition). In other words, say happy days/month is positively
related to the amount of time spent outside the house per month, while
depressed days/month is negatively related. A significant interaction might
imply, for example, that when there are many depressed days in a month, the
desire to go outside on happy days is reduced.
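
In regression terms, here is a minimal sketch in Python (statsmodels formula
API); the variable names and the simulated effects are made up to mirror the
example above:

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)
n = 200
happy = rng.poisson(10, n).astype(float)      # happy days/month
depressed = rng.poisson(5, n).astype(float)   # depressed days/month
# Positive main effect of happy days, negative main effect of depressed
# days, and a negative interaction: depression dampens the effect of
# happy days on going outside.
outside = (2.0 * happy - 1.5 * depressed
           - 0.1 * happy * depressed + rng.normal(0, 5, n))
df = pd.DataFrame({"happy_days": happy, "depressed_days": depressed,
                   "time_outside": outside})

# 'a * b' in the formula expands to a + b + a:b, i.e., both main
# effects plus their cross-product term.
model = smf.ols("time_outside ~ happy_days * depressed_days", df).fit()
print(model.params)

The sign of the happy_days:depressed_days coefficient reflects how one
predictor's slope changes as the other increases; it is not determined by
the signs of the two bivariate correlations.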

Jeff

Re: Multiple Regression & Interactions

Stats Q

Hi Jeff,

Re: the first issue. There are guidelines regarding sample size requirements for multiple regression, but I couldn't find any guidelines on whether dichotomous predictors have to have a certain number of cases per group. I know SPSS will run the analysis anyway, but I imagine the results aren't reliable unless there are, say, more than 20 cases per group. As you say, "you may get an estimate for the regression coefficient, but the p value will be high".

I follow what you're saying in your example about interactions. I thought there would be a simple way to look at it :-)

Thank you for your help Jeff.


K S Scot



Re: Multiple Regression & Interactions

Hector Maletta
Jeff,

When one value of a dichotomy has a low proportion, the variance of the
variable is LOWER in absolute terms, but higher relative to the proportion
itself. We can say the dichotomous variable with the lower proportion has a
lower standard deviation but a higher coefficient of variation. For a
dichotomy, the variance is p(1-p), which has a maximum value of 0.25 for
p=0.5 and decreases steadily as p moves away from 0.5 in either direction.
For p=0.1 or p=0.9 the variance is 0.1 x 0.9 = 0.09. Now, the standard error
of the estimate of this proportion is the square root of p(1-p)/n, where n
is the sample size. Suppose your total sample is n=100. If p=0.5 the
standard error is the square root of 0.25/100, i.e. 0.05. An approximate
confidence interval of two standard errors would be +/- 0.10 around the
estimate, i.e. from 0.4 to 0.6. You cannot be sure which of the alternatives
is in the majority, but at least you are pretty certain that neither is
zero. Now if the observed proportion of one of the alternatives were p=0.10,
the standard error would be the square root of 0.09/100, i.e. 0.03. This is
lower than in the previous case in absolute terms (0.03 < 0.05) but larger
relative to the proportion (0.03/0.10 > 0.05/0.50). An approximate
confidence interval of two standard errors would be 0.10 +/- 0.06, going
from 0.04 to 0.16. This interval does not contain zero. With this sample
size (100) you can therefore be approximately 95% confident that if the
sample proportion is 0.10, the population proportion is larger than zero.
But with a smaller total sample (say n=50 or n=25) you probably cannot (work
it out as an exercise). So the minimum sample needed depends on the size of
the proportion (0.10 in this example) and the level of confidence desired
(95% in this example). If your sample is not sufficient for 95% confidence,
try 90% confidence. It is riskier, of course, but such are the perils of
statistics. There are also some guys around willing to go for less than 90%,
but don't try it at home. It's too dangerous.
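
Hector's "exercise" is quick to check numerically; a sketch in plain Python
(nothing SPSS-specific, just the arithmetic above):

import math

def ci_two_se(p, n):
    # Approximate 95% CI: p +/- 2 standard errors, SE = sqrt(p(1-p)/n)
    se = math.sqrt(p * (1 - p) / n)
    return p - 2 * se, p + 2 * se

for n in (100, 50, 25):
    lo, hi = ci_two_se(0.10, n)
    print("n=%3d: 0.10 +/- 2*SE -> (%.3f, %.3f)" % (n, lo, hi))
# n=100: (0.040, 0.160)  -- excludes zero
# n= 50: (0.015, 0.185)  -- barely excludes zero
# n= 25: (-0.020, 0.220) -- includes zero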



Hector




Re: Multiple Regression & Interactions

F. Gabarrot
Hi everyone.

An alternative way to determine your sample size, which is much more
reliable than any rule of thumb (20 participants per group, 10 observations
per parameter, and so on), is to perform a power analysis.

Power can be defined as the probability of rejecting H0 (your null
hypothesis) when it is actually false. In other words, it is the probability
of finding an effect that really exists. Power depends on three factors:
your decision threshold (alpha, most of the time .05, i.e., a 95% confidence
level), the effect size, and the sample size.

If you know (or can approximate) the effect size and you choose a particular
alpha level, you can then perform a power analysis to determine a
recommended sample size for a desired power. Cohen (1988) recommends a power
of .80.

Let's take an example. Imagine you ran an experiment with 2 experimental
groups, each with 20 participants (the classic rule of thumb: 20
participants per group). The effect you are looking for is expected to be of
medium size (Cohen's d = .50, equivalent to an effect-size correlation of
.243), and your decision threshold is alpha = .05, one-tailed (meaning that
if your test statistic does not exceed the critical value corresponding to
this threshold, you will not reject H0). Given this information, power can
be calculated: your 20-participants-per-group comparison of means will have
a power of only .46.
If you wish to have .80 power, your experiment must have about 50
participants per group.
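
These figures can be reproduced with any power calculator; for instance, a
sketch in Python using statsmodels (the one-tailed alternative is chosen to
match the numbers above):

from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Power for d = .50, alpha = .05, 20 per group, one-tailed test
power = analysis.solve_power(effect_size=0.5, nobs1=20, alpha=0.05,
                             ratio=1.0, alternative="larger")
print("power with 20 per group: %.2f" % power)        # ~0.46

# Sample size per group needed for power = .80
n_needed = analysis.solve_power(effect_size=0.5, power=0.8, alpha=0.05,
                                ratio=1.0, alternative="larger")
print("n per group for .80 power: %.1f" % n_needed)   # ~50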

This reasoning about power is very close to Hector's answer.

If you are looking for more information on power, you may read the
following:

Cohen, J. (1988). Statistical power analysis for the behavioral sciences
(2nd ed.). Hillsdale, NJ: Erlbaum.

Cohen, J. (1992). A power primer. Psychological Bulletin, 112(1), 155-159.

Regards.

Fabrice.

****************************
Fabrice Gabarrot
PhD Student - Social Psychology
University of Geneva - Switzerland


