Dear all,
Does anyone know whether there are sample size requirements for dichotomous predictors in multiple regression? That is, for the dichotomous predictor, what is the smallest number of cases per group that is allowed?

Also, I have another query regarding interpreting interaction effects in SPSS's multiple regression. When a cross-product term is created by multiplying together a predictor that is positively associated with the DV (e.g., happiness) and a predictor that is negatively associated with the DV (e.g., depression), would the resulting product term be expected to show a positive or negative beta coefficient? I'm sure there's a really simple answer to this.

Thank you in advance.
K S Scot
Regarding the first issue - I'm not sure I understand. Mathematically, the number of cases doesn't matter as long as the predictor isn't a constant, i.e., as long as not all cases fall in the same group. Practically, if you have a small number of cases in one group, you won't be able to examine the group difference accurately: you may get an estimate for the regression coefficient, but the p value will be high.

Regarding the second issue - the sign of the bivariate correlations doesn't really matter. What matters is whether there is an interaction between the effects (by definition). In other words, say happy days per month is positively related to the amount of time spent outside the house per month, while depressed days per month is negatively related. A significant interaction might imply, for example, that when there are many depressed days in a month, the desire to go outside on happy days is reduced.

Jeff
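To make Jeff's point concrete, here is a minimal simulation sketch - hypothetical Python/statsmodels code, not SPSS syntax and not part of the original exchange; the variable names and coefficients are invented for illustration. The interaction coefficient's sign is whatever the data say it is, regardless of the signs of the main effects:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
n = 500

happy = rng.normal(size=n)     # predictor positively related to the DV
depress = rng.normal(size=n)   # predictor negatively related to the DV

# DV built with a NEGATIVE interaction, echoing Jeff's reading: many
# depressed days dampen the effect of happy days on going outside.
# Flip the -0.8 to +0.8 and the fitted interaction flips sign too --
# the signs of the main effects do not constrain it.
outside = (2.0 * happy - 1.5 * depress
           - 0.8 * happy * depress + rng.normal(size=n))

X = sm.add_constant(np.column_stack([happy, depress, happy * depress]))
model = sm.OLS(outside, X).fit()
print(model.params)  # recovers approx [0, 2.0, -1.5, -0.8]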
Hi Jeff,

Re: the first issue. There are guidelines regarding sample size requirements for multiple regression, but I couldn't find any guidelines on whether dichotomous predictors need a minimum number of cases per group. I know SPSS will run the analysis anyway, but I imagine the results aren't reliable unless there are, say, more than 20 cases per group. As you say, "you may get an estimate for the regression coefficient, but the p value will be high".

I follow what you're saying in your example about interactions. I thought there would be a simple way to look at it :-)

Thank you for your help Jeff.

K S Scot
Jeff,
When one value of a dichotomy has a low proportion, the variance of the variable is LOWER, but the standard error becomes a larger fraction of the proportion itself. We can say the dichotomous variable with the lower proportion has a lower standard deviation but a higher coefficient of variation.

For a dichotomy, the variance is p(1-p), which has a maximum value of 0.25 at p=0.5 and decreases steadily as p moves away from 0.5 in either direction. For p=0.1 (or p=0.9) the variance is 0.1 x 0.9 = 0.09. The standard error of the estimate of this proportion is the square root of p(1-p)/n, where n is the sample size.

Suppose your total sample is n=100. If p=0.5, the standard error is the square root of 0.25/100, which is 0.05. An approximate confidence interval of two standard errors would be +/- 0.10 around the estimate, i.e., from 0.4 to 0.6. You cannot be sure which of the alternatives is in the majority, but at least you are pretty certain that neither is zero.

Now suppose the observed proportion of one of the alternatives is p=0.10. The standard error would be the square root of 0.09/100, which is 0.03. This is lower than the previous case in absolute terms (0.03 < 0.05) but larger relative to the proportion (0.03/0.10 > 0.05/0.50). An approximate confidence interval of two standard errors would be 0.10 +/- 0.06, going from 0.04 to 0.16. This interval does not contain zero, so with this sample size (100) you can be approximately 95% confident that if the sample proportion is 0.10, the population proportion is larger than zero. With a smaller total sample (say n=50 or n=25) you probably cannot (work it out as an exercise; a small sketch follows below).

So the minimum sample needed depends on the size of the proportion (0.10 in this example) and the level of confidence desired (95% in this example). If your sample is not sufficient for 95% confidence, try 90% confidence. It is riskier, of course, but such are the perils of statistics. There are also some guys around willing to go for less than 90%, but don't try it at home. It's too dangerous.

Hector
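A small sketch of Hector's arithmetic - hypothetical plain-Python code, not SPSS syntax and not part of the original post - using his two-standard-error approximation; the last two rows are the n=50 and n=25 "exercise" cases:

from math import sqrt

def prop_se_ci(p, n, z=2.0):
    """Standard error of a sample proportion and an approximate
    confidence interval of +/- z standard errors around it."""
    se = sqrt(p * (1 - p) / n)
    return se, (p - z * se, p + z * se)

# Hector's two worked cases, then the smaller-sample exercise cases.
for p, n in [(0.5, 100), (0.1, 100), (0.1, 50), (0.1, 25)]:
    se, (lo, hi) = prop_se_ci(p, n)
    print(f"p={p:.2f}, n={n:3d}: se={se:.3f}, approx 95% CI=({lo:.3f}, {hi:.3f})")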
Hi everyone.
An alternative way to determine your sample size, which is much more reliable than any rule of thumb (20 participants per group, 10 observations per parameter, and so on), is to perform a power analysis. Power can be defined as the probability of rejecting H0 (your null hypothesis) when it is actually false; in other words, it is the probability of finding an effect that really exists. Power depends on three factors: your decision threshold (alpha, most of the time .05, i.e., 95% confidence), the effect size, and the sample size. If you know (or can approximate) the effect size and you choose a particular alpha level, you can then perform a power analysis to determine the recommended sample size for a desired power. Cohen (1988) recommends a power of .80.

Let's take an example. Imagine you ran an experiment with two experimental groups, each with 20 participants (the classical rule of thumb: 20 participants per group). The effect you are looking for is expected to be of medium size (Cohen's d = .50, equivalent to an effect size correlation of .243), and your decision threshold is alpha = .05 (meaning that if your test statistic falls below the critical value corresponding to this threshold, you will not reject H0). Given this information, power can be calculated: your 20-participants-per-group comparison of means will have a power of .46 (the figures here correspond to a one-tailed test). If you want a power of .80, your experiment must have about 50 participants per group. This reasoning about power is very close to Hector's answer.
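As a rough illustration of Fabrice's calculation - a hypothetical Python/statsmodels sketch, not SPSS and not part of the original message - his figures can be reproduced with an off-the-shelf power routine for a one-tailed independent-samples t-test:

from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Power of the 20-per-group design (d = .50, alpha = .05, one-tailed):
power = analysis.power(effect_size=0.5, nobs1=20, alpha=0.05,
                       alternative='larger')
print(f"power with n=20 per group: {power:.2f}")     # ~0.46

# Sample size per group needed for Cohen's recommended .80 power:
n_needed = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.8,
                                alternative='larger')
print(f"n per group for .80 power: {n_needed:.1f}")  # ~50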
If you are looking for more information concerning power, you may read the following:

Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Erlbaum.

Cohen, J. (1992). A power primer. Psychological Bulletin, 112(1), 155-159.

Regards,
Fabrice.

****************************
Fabrice Gabarrot
PhD Student - Social Psychology
University of Geneva - Switzerland