
Re: Multiple Regression & Interactions

Posted by Hector Maletta on Aug 08, 2006; 8:01pm
URL: http://spssx-discussion.165.s1.nabble.com/Multiple-Regression-Interactions-tp1070140p1070144.html

Jeff,

When one value of a dichotomy has a low proportion, the variance of the
variable is LOWER, but the standard deviation is higher relative to the
observed proportion. In other words, the dichotomous variable with the
lower proportion has a lower standard deviation but a higher coefficient
of variation. For a dichotomy, the variance is p(1-p), which has its
maximum value of 0.25 at p=0.5 and decreases steadily as p moves away
from 0.5 in either direction. For p=0.1 or p=0.9 the variance is
0.1 x 0.9 = 0.09.

The standard error of the estimate of this proportion is the square root
of p(1-p)/n, where n is the sample size. Suppose your total sample is
n=100. If p=0.5, the standard error is the square root of 0.25/100, i.e.
0.05. An approximate confidence interval of two standard errors would be
+/- 0.10 around the estimate, i.e. from 0.40 to 0.60. You cannot be sure
which of the alternatives is in the majority, but at least you are
pretty certain that neither is zero. If instead the observed proportion
of one of the alternatives were p=0.10, the standard error would be the
square root of 0.09/100, i.e. 0.03. This is lower than in the previous
case in absolute terms (0.03 < 0.05) but larger in relation to the
proportion (0.03/0.10 > 0.05/0.50). An approximate confidence interval
of two standard errors would be 0.10 +/- 0.06, going from 0.04 to 0.16.
This interval does not contain zero, so with this sample size (100) you
can be approximately 95% confident that if the sample proportion is 0.10
the population proportion is larger than zero. But with a smaller total
sample (say n=50 or n=25) you probably cannot (work it out as an
exercise).

So the minimum sample needed depends on the size of the proportion (0.10
in this example) and the level of confidence desired (95% in this
example). If your sample is not sufficient for 95% confidence, try 90%
confidence. It is riskier, of course, but such are the perils of
statistics. There are also some guys around willing to go for less than
90%, but don't try it at home. It's too dangerous.
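
Here is a quick Python sketch of this arithmetic (the function name is
mine, purely for illustration), which you can use to work out the n=50
and n=25 cases:

import math

def proportion_ci(p, n, z=2.0):
    # Standard error of a sample proportion: sqrt(p*(1-p)/n).
    se = math.sqrt(p * (1 - p) / n)
    # Approximate confidence interval of two standard errors (z=2).
    return se, p - z * se, p + z * se

for n in (100, 50, 25):
    for p in (0.5, 0.1):
        se, lo, hi = proportion_ci(p, n)
        print("n=%3d  p=%.2f  se=%.3f  CI: %.3f to %.3f" % (n, p, se, lo, hi))

For n=100 this reproduces the two intervals above; running the smaller
sample sizes shows where the lower bound drops to zero or below.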



Hector



  _____

From: SPSSX(r) Discussion [mailto:[hidden email]] On behalf of Stats Q
Sent: Tuesday, August 08, 2006 10:15 AM
To: [hidden email]
Subject: Re: Multiple Regression & Interactions



Hi Jeff,

Re: the first issue. There are guidelines regarding sample size
requirements for multiple regression, but I couldn't find any guidelines
on whether dichotomous predictors need a certain number of cases per
group. I know SPSS will run the analysis anyway, but I imagine the
results aren't reliable unless there are, say, more than 20 cases per
group. As you say, "you may get an estimate for the regression
coefficient, but the p value will be high".

I follow what you're saying in your example about interactions. I thought
there would be a simple way to look at it :-)

Thank you for your help, Jeff.


K S Scot


  _____


From:  Jeff <[hidden email]>
Reply-To:  Jeff <[hidden email]>
To:  [hidden email]
Subject:  Re: Multiple Regression & Interactions
Date:  Mon, 7 Aug 2006 09:35:37 -0600

>At 06:01 AM 8/7/2006, you wrote:
>>Does anyone know whether there are sample size requirements for
>>dichotomous predictors in multiple regression? That is, for the
>>dichotomous predictor, what is the smallest number of cases per group
>>that is allowed?
>>
>>Also, I have another query regarding interpreting interaction effects
>>in SPSS's multiple regression. When a cross product term is created by
>>multiplying together a predictor which is positively associated with
>>the DV (e.g., happiness) and a predictor that is negatively associated
>>with the DV (e.g., depression), would the resulting product term be
>>expected to show a positive or negative beta coefficient? I'm sure
>>there's a really simple answer to this.
>>
>>Thank you in advance.
>>K S Scot
>
>Regarding the first issue - I'm not sure I understand - mathematically,
>the number of cases doesn't matter as long as the predictor isn't a
>constant - e.g., all cases in the same group. Practically, if you have a
>small number of cases in one group, you won't be able to accurately
>examine the group differences - i.e., you may get an estimate for the
>regression coefficient, but the p value will be high.
>
>Regarding the second issue - the sign of the bivariate correlations
>doesn't really matter. What matters is whether there is an interaction
>between the effects (by definition). In other words, let's say happy
>days/month is positively related to the amount of time spent outside of
>the house/month, while depressed days/month is negatively related. A
>significant interaction, for example, might imply that if there are many
>depressed days/month, the desire to go outside during happy days is
>reduced.
>
>Jeff
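
To put Jeff's point in numbers, here is a small simulation sketch, in
Python with statsmodels rather than SPSS syntax (the variable names just
echo the happy/depressed example and are purely illustrative). One
predictor is positively related to the outcome and the other negatively,
yet the sign of the product term is simply whatever the data contain -
here a negative interaction is built in and then recovered:

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)
n = 200
happy = rng.normal(size=n)       # hypothetical "happy days/month"
depressed = rng.normal(size=n)   # hypothetical "depressed days/month"

# Simulate an outcome with a built-in negative interaction: depression
# dampens the effect of happy days on time spent outside the house.
time_outside = (2.0 * happy - 1.5 * depressed
                - 0.8 * happy * depressed + rng.normal(size=n))

df = pd.DataFrame({"happy": happy, "depressed": depressed,
                   "time_outside": time_outside})

# In the formula, 'a * b' expands to a + b + a:b (both main effects
# plus the cross product term).
fit = smf.ols("time_outside ~ happy * depressed", data=df).fit()
print(fit.params)  # the happy:depressed coefficient comes out near -0.8

The same model can be fit in SPSS by building the product variable with
COMPUTE and entering it in REGRESSION alongside the two predictors.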



