Dear members,

My linear regression analysis has seven binary predictors, n = 47, and (of course) a continuous dependent variable. The overall regression ANOVA is nonsignificant (F = 1.489, p = .200). The confusing part is that two of the seven predictors are significant (p < .05). I don't think there is a multicollinearity problem, because the collinearity diagnostics look fine. For example, no beta coefficient is greater than 1.0; tolerances of the predictors range from .559 to .814; VIFs range from 1.224 to 1.669; and correlation coefficients among predictors range from .009 to .757, though most are below .30.

Any comments are welcome.

Thank you.
E.
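For anyone who wants to reproduce this kind of collinearity check outside SPSS, here is a minimal Python sketch. The data are simulated stand-ins (the original dataset isn't posted), and it assumes numpy, pandas, and statsmodels are available:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(0)
X = pd.DataFrame(rng.integers(0, 2, size=(47, 7)),   # n = 47, seven binary predictors
                 columns=[f"x{i}" for i in range(1, 8)])
exog = sm.add_constant(X)                            # include a constant, as SPSS does

for i, name in enumerate(X.columns, start=1):        # index 0 is the constant; skip it
    vif = variance_inflation_factor(exog.values, i)
    print(f"{name}: VIF = {vif:.3f}, tolerance = {1 / vif:.3f}")
```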
The overall F not being significant should tell you to stop there.
With seven individual predictors each being tested individually, you are multiplying the chances of obtaining 2 significant t tests by chance. In other words, you think you are testing at alpha = .05, but you are actually testing at a larger alpha. Many researchers correct for this with a Bonferroni correction. Chances are that your significant findings will not survive once that is done.

David Greenberg, Sociology Department, New York U.
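To put numbers on this, here is an illustrative Python sketch (not anything run in SPSS; it assumes only the alpha level and the test count given in the thread):

```python
# Family-wise error rate for k = 7 tests at a nominal alpha of .05,
# and the Bonferroni-corrected per-test threshold.
alpha, k = 0.05, 7

fwer = 1 - (1 - alpha) ** k   # P(>= 1 false positive) if the tests were independent
per_test = alpha / k          # Bonferroni-corrected per-test alpha

print(f"effective family-wise alpha: {fwer:.4f}")     # ~0.3017, not .05
print(f"Bonferroni per-test alpha:   {per_test:.4f}") # ~0.0071
```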
Dear David,

All seven predictors were entered together into a multiple regression model (using the ENTER method). The overall F was nonsignificant while, at the same time, two of the seven predictors were significant (p < .05). A Bonferroni correction is out of context in this discussion because all predictors were entered into the model simultaneously; that is, only one multiple regression was analyzed.

Thank you.
E.
You are totally mistaken. The point is not to do the correction on
the overall regression. That needs no correction. But you are doing 7 tests on the coefficients. Imagine a world in which, in the population, all those coefficients are zero. If you use a nominal alpha of .05, the probability of getting any one estimate significant by chance is 1 in 20, but with 7 tests the probability of getting 2 significant out of 7 is elevated. It is quite a bit higher than .05.

David Greenberg
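For reference, a binomial sketch of the arithmetic (it assumes, unrealistically, that the 7 t tests are independent; in a real regression they are correlated, so these are only rough reference values):

```python
from scipy.stats import binom

k, alpha = 7, 0.05
p_ge_1 = 1 - binom.pmf(0, k, alpha)   # P(>= 1 "significant" test) ~= 0.302
p_ge_2 = binom.sf(1, k, alpha)        # P(>= 2 "significant" tests) ~= 0.044
print(p_ge_1, p_ge_2)
# Dependence among the coefficient tests can push the two-or-more
# probability above this independent-case figure.
```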
Here's another way to understand what Greenberg is saying:
(1) The overall F test is an omnibus test that, in the case of multiple regression, tests whether the multiple R is significantly different from zero. A non-significant R in this situation implies that there is no correlation between the dependent/outcome variable and the set of predictors. (Equivalently, R can be interpreted as the Pearson r between the actual values of Y and the predicted values of Y, i.e., Y-hat.)

(2) You have 7 predictors in your equation, and each can be evaluated for significance (either the slope b is not equal to zero, or the increase in R^2 produced by the predictor is greater than zero). Each predictor is evaluated with a per-comparison alpha (alpha-pc) of .05. With 7 predictors, you have 7 tests, each done at alpha-pc. The problem with multiple testing like this is that there is also an overall Type I error rate, alpha-overall, which represents the probability of falsely rejecting a true null hypothesis (in this case, that the correlations are all equal to zero) somewhere across the 7 tests.

(3) The formula is

alpha-overall = 1 - (1 - alpha-pc)^k

where k is the number of tests being done -- in this case k = 7 (the ^k means raised to the power of k). If alpha-pc = 0.05, then

alpha-overall = 1 - (1 - .05)^7 = 1 - (.95)^7 = 1 - 0.6983 = 0.3017

In words, after 7 tests there is about a 30% chance of having committed at least one Type I error. This is usually considered unacceptably high, so people set alpha-overall = 0.05, which implies that each alpha-pc has to be reduced. One method is:

"corrected" alpha-pc = alpha-overall / k = .05/7 = 0.007

Now compare the p-value of each predictor in the equation and see whether it is less than 0.007. It is likely that none will be.

(4) The Bonferroni correction is this reduction of the Type I error rate (alpha) used with a group of tests. The omnibus F of the regression only tells you whether or not there is a significant relationship between the dependent/criterion variable and the independents/predictors. In the case of multiple regression it does not tell you which predictor is involved in the relationship, which is why you have to do additional testing (a two-stage testing process). As the number of predictors increases, the probability that one or more of them will be statistically significant by chance (a Type I error) increases, and this is what one wants to guard against.

I hope I was clear.

-Mike Palij
New York University
[hidden email]
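The 30% figure is easy to check by simulation. The sketch below (illustrative Python, assuming numpy and statsmodels; the data are randomly generated, not the poster's) draws many null datasets matching the thread's setup -- n = 47, 7 binary predictors, y unrelated to all of them -- and counts how often at least one uncorrected coefficient test comes out significant:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
n, k, reps, hits = 47, 7, 2000, 0

for _ in range(reps):
    X = sm.add_constant(rng.integers(0, 2, size=(n, k)).astype(float))
    y = rng.standard_normal(n)              # global null: y is pure noise
    pvals = sm.OLS(y, X).fit().pvalues[1:]  # coefficient p-values, intercept dropped
    hits += (pvals < 0.05).any()

print(f"P(at least one p < .05 under the null) ~= {hits / reps:.3f}")  # near 0.30
```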
All the posts so far have been working from the assumption that all
seven tests are of equal and independent priority and importance. That may be the case. But, in my own clinical research, having more than two or three "most important" hypotheses is what arises when the work is thoroughly exploratory. In other words: this is not what would (for my sort of research, in any case) be a strong experimental design for /testing/ in a known area; rather, it seems to be a first stab at finding something.

Why do two variables correlate above 0.70? This is exceedingly high for dichotomous measures -- where, in fact, the maximum correlation is limited by the commensurate skew of the marginal distributions.

You might report what you have as a thoroughly exploratory result ... though I would also look at the t-tests as univariate explorations.

Not merely exploratory? Then your design has low power. If I had a client bring these data to me, I would suggest that, in order to have any power for "moderate-size" effects with N = 47, they need to select one or two primary hypotheses or test a single, a-priori composite score.

--
Rich Ulrich
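To make the low-power point concrete, here is a rough Python/scipy sketch. The effect-size convention is an assumption (Cohen's f^2 with noncentrality lambda = f^2 * N, and f^2 = 0.15 as a stand-in for a "moderate" effect, since the thread specifies no effect size):

```python
from scipy.stats import f as f_dist, ncf

N, k, alpha, f2 = 47, 7, 0.05, 0.15   # f2 = 0.15 is Cohen's "medium" effect size
df1, df2 = k, N - k - 1               # numerator df = 7, denominator df = 39

crit = f_dist.ppf(1 - alpha, df1, df2)        # critical F under the null
power = 1 - ncf.cdf(crit, df1, df2, f2 * N)   # noncentral F tail probability
print(f"power ~= {power:.2f}")                # well below the conventional .80
```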
Rich makes some good points and I'd like to say a few
things in response regarding the process of planning a statistical analysis.

On Thursday, January 14, 2016 3:05 AM, Rich Ulrich wrote:

>All the posts so far have been working from the assumption
>that all seven tests are of equal and independent priority
>and importance.

This is essentially true, mainly because an analysis was already done and the results were looked at and interpreted. Since all 7 variables were entered into a simultaneous model, the researcher HAD to have some justification for doing so instead of building a regression model from the systematic entry of individual or groups of variables (e.g., does var5 add anything to R^2 AFTER var1 to var4 are in the equation?). After looking at the results of the simultaneous model, one could do such a model-building exercise, but now one is on a fishing expedition. Instead of answering research questions, one is now trying to better understand the nature of the data and the patterns that may exist among the variables.

>That may be the case. But, in my own clinical research, having
>more than two or three "most important" hypotheses is what
>arises when the work is thoroughly exploratory. In other words:
>This is not what would (for my sort of research, in any case) be
>a strong experimental design for /testing/ in a known area;
>rather, it would seem to be a first stab at finding something.

Although I am somewhat in agreement with what Rich says above, if one really has a couple of "most important" hypotheses, then something like planned comparisons is appropriate (in ANOVA, specific differences between specific means; in multiple regression, specific predictors used in reduced models in contrast to a full simultaneous model). The amount of knowledge one has about the phenomenon helps to determine what types of statistical analyses and tests one will do. When one has limited knowledge (great ignorance), a two-stage process (first an omnibus test and, if significant, multiple comparisons of some sort) is likely to be used. When one has greater knowledge and is concerned with only a few specific relationships or models, planned comparisons or tests of specific patterns of relationship among variables are more appropriate, whether in regression analysis or structural equation modeling (the latter shouldn't be done with a sample of N = 47).

In my own experience analyzing clinical research data, I find that sometimes the researcher is knowledgeable and sometimes has no idea what is going on (in more senses than one). In the second case, the researcher may go fishing and do all sorts of additional analyses that were not initially planned (driven by not obtaining the results one expected) but that may be written up as though they were part of the analysis plan. For example, consider a study of how a clinical intervention (e.g., cognitive behavior therapy or a drug) affects a group of people (e.g., people with a clinical diagnosis of depression). One does the study and finds no significant effect (e.g., no change between pre- and post-intervention, no difference between intervention and placebo/control/reference group). This usually causes dissatisfaction in various people (e.g., the principal investigator, the funding agency), and one way to deal with it is to TRY to find some significant result.
So the researcher may ask that the participants be divided into three groups on the basis of severity of condition (e.g., in the case of depression, low, moderate, or high levels of clinical depression). One does additional analyses and, lo and behold, the high-depression group shows an effect but the other groups don't. The problem here is treating this result as though one had planned the analysis, instead of as a tentative result/hypothesis that requires new data to show that it is an actual effect and not a Type I error arising from all the additional testing.

>Why do two variables correlate above 0.70? This is exceedingly
>high for dichotomous measures -- where, in fact, the max-corr is
>limited by the commensurate skew of the marginal distributions.

Just to add to what Rich says above: the maximum r (phi coefficient) is determined by the proportions in the two dichotomous variables. If we use the values 0 and 1 for each of the two variables, the maximum phi of 1.00 is obtained when prop(X1 = 1) = prop(X2 = 1); with equal marginals the phi coefficient has an upper bound of +1.00 and a lower bound of -1.00, like the ordinary Pearson r. But if prop(X1 = 1) is not equal to prop(X2 = 1), this is no longer true and the maximum phi falls into a smaller interval. Guilford and Fruchter is just one source on this point (right now I'm using the 5th ed. [1973] of their "Fundamental Statistics in Psychology and Education", pp. 306-310, but other sources also provide this information). The maximum value of phi can be calculated by the following equation (cf. G&F, eqn 14.24, p. 309; written here with the variables ordered so that p1 <= p2):

max phi = sqrt[ (p1/q1) * (q2/p2) ]

where p1 is the proportion with X1 = 1, q1 the proportion with X1 = 0, and similarly p2 is the proportion with X2 = 1 and q2 the proportion with X2 = 0. Table 14.9 on G&F's p. 308 shows what happens to the maximum phi when p1 = .50 but p2 takes on different values. All this suggests that one should examine the 2x2 table for the two variables involved.

>You might report what you have as a thoroughly exploratory result ...
>though, I would also look at the t-tests as univariate explorations.
>Not merely exploratory? Then your design has low power.

Remember that a power calculation should be done BEFORE the analysis of the data, not used as an excuse AFTER you have the results. Prospective power analysis commits one to specifying the effect sizes one believes exist, or at least is interested in. A source on the "evils" of retrospective power analysis is Hoenig, J. M., & Heisey, D. M. (2001). The abuse of power: The pervasive fallacy of power calculations for data analysis. The American Statistician, 55(1), 19-24. One can also search scholar.google.com on the distinction between prospective and retrospective power analysis and the problems associated with the latter. Ultimately, the problem is that the researcher has no f'n clue about the probability distributions the data have.

>If I had a client bring these data to me, I would suggest that,
>in order to have any power for "moderate-size" effects with
>N=47, they need to select one or two primary hypotheses or
>test a single, a-priori composite score.

If I had a client come to me with these data, I'd send them to Rich. ;-)

-Mike Palij
New York University
[hidden email]
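A tiny Python sketch of that bound (illustrative; it implements the formula above, with the marginals ordered so that p1 <= p2):

```python
import math

def max_phi(p1: float, p2: float) -> float:
    """Upper bound on phi for dichotomies with P(X1=1) = p1 and P(X2=1) = p2."""
    if p1 > p2:
        p1, p2 = p2, p1                # order the marginals so the bound is <= 1
    q1, q2 = 1 - p1, 1 - p2
    return math.sqrt((p1 / q1) * (q2 / p2))

print(max_phi(0.5, 0.5))   # 1.000 -- equal marginals permit a perfect phi
print(max_phi(0.2, 0.7))   # ~0.327 -- skewed marginals cap phi well below 1
```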