Hi everyone,
One of our post-docs is having trouble with a regression model he is trying to run. He is trying to predict an outcome in babies from pregnancy variables collected from the mothers during gestation. He has 75 participants. He entered four control variables on the first step, and then two other predictors on the 2nd step. So we're trying to see whether these two predictors are significant above and beyond the four variables we are controlling for on the first step of the regression.

The F change for adding these 2 predictors on Step 2 is significant, and the R2 change is .19, so not big, but not bad. The problem is that the overall regression model is not significant. He also has an interaction between baby's gender and outcome.

Is this possible? Might it be a problem with multicollinearity? Since it's a pilot study, should we stick to reporting partial correlations between the pregnancy variables and the baby's outcome variable, partialling out the variables we entered on the first step of the regression?

Any ideas greatly appreciated!! Many thanks for the help!
Susan |
Was the first model with 4 variables significant?

Dr. Paul R. Swank, Professor and Director of Research
Children's Learning Institute
University of Texas Health Science Center-Houston |
In reply to this post by sgthomson99
Hi Susan,
Some things to try:
1. Assess the bivariate associations among all of your predictor variables to see whether any of them are strongly correlated.
2. Enter the one variable you consider to be the most important, then enter each other variable by itself alongside it and assess the change in the coefficient of your most important variable. This can give you an idea of the effect of one variable on another.
3. Look at your tolerance values to see if you have multicollinearity. |
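The correlation and tolerance checks suggested above can be sketched outside SPSS as well. The following is a minimal illustration in plain Python, assuming complete data and using made-up variable names and values (`maternal_age`, `bmi`, `stress` are hypothetical, not the post-doc's variables). It relies on the identity that a predictor's tolerance equals the reciprocal of the matching diagonal element of the inverse of the predictor correlation matrix (that diagonal element is the VIF).

```python
import math

def correlation_matrix(columns):
    """Pearson correlations between equal-length lists of numbers."""
    n = len(columns[0])
    centered = []
    for col in columns:
        mean = sum(col) / n
        centered.append([v - mean for v in col])
    norms = [math.sqrt(sum(v * v for v in c)) for c in centered]
    k = len(columns)
    return [[sum(a * b for a, b in zip(centered[i], centered[j]))
             / (norms[i] * norms[j]) for j in range(k)] for i in range(k)]

def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting."""
    k = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]

def tolerances(columns):
    """Tolerance (1 - R^2 of each predictor on the others) = 1 / VIF."""
    inverse = invert(correlation_matrix(columns))
    return [1.0 / inverse[j][j] for j in range(len(columns))]

# Hypothetical predictor values, for illustration only.
maternal_age = [24, 31, 28, 35, 22, 30, 27, 33]
bmi = [21.0, 27.5, 24.0, 29.0, 20.5, 26.0, 23.5, 28.0]
stress = [3, 5, 2, 4, 4, 1, 5, 2]

predictors = [maternal_age, bmi, stress]
for row in correlation_matrix(predictors):
    print([round(r, 2) for r in row])
print([round(t, 3) for t in tolerances(predictors)])
```

Tolerances near 1 indicate a predictor that is nearly independent of the others; values near 0 indicate multicollinearity. In SPSS the same quantities are reported when tolerance statistics are requested from the regression procedure.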
In reply to this post by sgthomson99
> Date: Tue, 29 Mar 2011 17:51:03 +0000
> From: [hidden email]
> Subject: significant F change, but nonsignificant regression model overall
> To: [hidden email]
>
> The F change for adding these 2 predictors on Step 2 is significant -
> and the R2 change is .19 so not big, but not bad. The problem is the
> overall regression model is not significant. He also has an
> interaction between baby's gender and outcome.
>
> Is this possible? Might it be a problem with multicollinearity?

Assuming that the description is accurate, Paul Swank has pointed at the issue - the first four variables were not significant.

But for a designed test of the two, their impact is thoroughly irrelevant. I repeat, the overall test of the regression is totally irrelevant, because the post-doc designated, at the start, the test of two variables.

I am unsure of what you mean by saying "interaction" between gender and outcome. "Outcome" usually means "what is being predicted". In a different style of testing, "interaction with outcome" denotes a main effect for one predictor (here: gender).

In regression, "interaction" usually refers to a cross-product of two predictors, not Predictor and Outcome. My tentative conclusion is that your statement about Gender is a summary of what shows up among the first 4 variables entered. Should you take it seriously? - Well, it was specified a priori as a control variable. I don't know what he wants to say about the 4 control variables, but that is a separate matter for discussion. The designed test was positive.

--
Rich Ulrich

=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD |
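The difference between the overall F test and the F-change test can be made concrete with a small numerical sketch. The overall test divides the full-model R² across all 6 predictors, while the F-change test divides only the R² increment across the 2 added predictors, so weak control variables dilute the former but not the latter. The code below is an illustration under stated assumptions, not the post-doc's analysis: the R² values (.05 and .15) are hypothetical, and the F-distribution tail probability is computed with a hand-written regularized incomplete beta function so that only the standard library is needed (with SciPy installed, `scipy.stats.f.sf` would be the usual route).

```python
import math

def _betacf(a, b, x):
    """Continued fraction for the incomplete beta function (Lentz's method)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, 301):
        for numerator in (
            m * (b - m) * x / ((a + 2 * m - 1.0) * (a + 2 * m)),
            -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1.0)),
        ):
            d = 1.0 + numerator * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + numerator / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < 1e-13:
            break
    return h

def betai(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b

def f_pvalue(f, df1, df2):
    """Upper-tail probability P(F > f) for an F(df1, df2) distribution."""
    return betai(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f))

def overall_f(r2, n, k):
    """F test that all k predictors together explain nothing."""
    df1, df2 = k, n - k - 1
    f = (r2 / df1) / ((1.0 - r2) / df2)
    return f, f_pvalue(f, df1, df2)

def f_change(r2_reduced, r2_full, n, k_reduced, k_full):
    """F test for the R^2 increment when going from k_reduced to k_full predictors."""
    df1, df2 = k_full - k_reduced, n - k_full - 1
    f = ((r2_full - r2_reduced) / df1) / ((1.0 - r2_full) / df2)
    return f, f_pvalue(f, df1, df2)

n = 75                             # sample size from the thread
r2_step1, r2_step2 = 0.05, 0.15    # hypothetical R^2 values, for illustration
print("Step 1 overall:", overall_f(r2_step1, n, 4))
print("Step 2 overall:", overall_f(r2_step2, n, 6))
print("F change:      ", f_change(r2_step1, r2_step2, n, 4, 6))
```

With these hypothetical inputs the F change is 4.0 on (2, 68) degrees of freedom, with p of about .023, while the overall Step 2 model has F = 2.0 on (6, 68) degrees of freedom, with p of about .078. So a significant F change alongside a nonsignificant overall model is arithmetically possible and does not by itself signal multicollinearity.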
In reply to this post by Swank, Paul R
Hello Susan. To expand on what Paul asked, please report the R-squared values and F-tests for both models. My guess is that the R-squared value for model 1 is quite low and not statistically significant.
Also, you said there is an interaction between baby's gender and outcome. I'm not sure what that means. Interactions involve two or more explanatory variables, not an explanatory variable and the outcome variable. Please clarify what you mean. Cheers, Bruce
--
Bruce Weaver bweaver@lakeheadu.ca http://sites.google.com/a/lakeheadu.ca/bweaver/ "When all else fails, RTFM." PLEASE NOTE THE FOLLOWING: 1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above. 2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/). |
In reply to this post by Rich Ulrich
My apologies to everyone for mixing up the interaction. I was in too much of a rush and definitely should have had our post-doc email the group with his questions.
The interaction is between baby's gender and one of the predictors.

For Model 1 with the 4 covariates entered, the Multiple R is .5, and F test is not significant. For Model 2 where he entered the 4 covariates on Step 1 and the 2 variables he is most interested in on Step 2, the Multiple R is .6 and F test is still not significant. But a priori, he was most interested in the 2 variables entered on Step 2 - and this is where the F change is significant. One of the two variables is significant on Step 2.

So can he focus on the F change being significant and ignore the fact that the overall model is not significant and ignore the fact that the 4 variables on Step 1 were not significant either?

Thanks so much.
Susan

> Date: Tue, 29 Mar 2011 14:29:26 -0400
> From: [hidden email]
> Subject: Re: significant F change, but nonsignificant regression model overall
> To: [hidden email]
>
> But for a designed test of the two, their impact is thoroughly irrelevant.
> I repeat, the overall test of the regression is totally irrelevant,
> because the post-doc designated, at the start, the test of two variables. |
Before making a serious recommendation, I think that I would like to know more about the data. However, I would be hesitant about reporting the "significant" change in F in the context of nonsignificant models. Consider the following analog: a person conducts a one-way ANOVA with six levels. The overall ANOVA is not significant but post hoc testing reveals one contrast between means to be significant. Should one report the one significant post hoc test and ignore or downplay the non-significant ANOVA? My advice here would be unless you planned on doing just that particular contrast (in which case why was the ANOVA and other tests done?) you should treat it as a spuriously significant result (i.e., the result of doing too many tests -- one is bound to get a significant result on a purely chance basis with enough tests).

The situation described below is not exactly like this but it does suggest that one should be suspicious about the significant change in F. I would want to know more about the distributions of the individual variables, the correlations between the variables as well as examine their scatterplots, and take a closer look at how the difference for R= 0.00 (model 1) minus R=0.00 (model 2) gives rise to a non-zero difference.
-Mike Palij
New York University
|
In reply to this post by Mike
Mike,
You seem to have missed the comment,

> He has entered four variables to control for on
> the first step, and then two other predictors on the 2nd step. So
> we're trying to see if these two predictors are significant above and
> beyond the four variables we are controlling for on the first step of
> the regression.

________________________________
> Date: Tue, 29 Mar 2011 15:51:48 -0400
> From: [hidden email]
> Subject: Re: significant F change, but nonsignificant regression model overall
> To: [hidden email]
>
> Before making a serious recommendation, I think that I would like
> to know more about the data. However, I would be hesitant about
> reporting the "significant" change in F in the context of nonsignificant
> models. Consider the following analog: a person conducts a one-way
> ANOVA with six levels. The overall ANOVA is not significant but
> post hoc testing reveals one contrast between means to be significant.

[snip, rest of irrelevant example, and more]

It's a designed test, so the other results are irrelevant.

--
Rich Ulrich |
In reply to this post by Swank, Paul R
Hi Susan,
If this is a pilot study, can I assume that you will have more data in the actual study? If profiling is necessary and you would like to avoid interactions, you might want to try C5 or other decision trees that profile against a dependent variable. This might not be the best way, but do note that data mining is data driven, and you might need at least 300 cases so that the data mining model can identify patterns in the data file.

Warmest Regards
Dorraj Oet

> Date: Tue, 29 Mar 2011 12:58:22 -0500
> From: [hidden email]
> Subject: Re: significant F change, but nonsignificant regression model overall
> To: [hidden email]
>
> Was the first model with 4 variables significant? |
In reply to this post by Rich Ulrich
On Tuesday, March 29, 2011 11:36 pm, Rich Ulrich wrote:
> Mike,
> You seem to have missed the comment,
>
>>>> He has entered four variables to control for on
>>>> the first step, and then two other predictors on the 2nd step. So
>>>> we're trying to see if these two predictors are significant above and
>>>> beyond the four variables we are controlling for on the first step of
>>>> the regression.

No, I didn't miss this comment. Let's review what we might know about the situation (at least from my perspective):

(1) The analyst is doing setwise regression, comparable to an ANCOVA, entering 4 variables/covariates as the first set. As mentioned elsewhere, these covariates are NOT significantly related to the dependent variable. This implies that the multiple correlation and its squared version are zero, or R1=0.00. One could, I think, legitimately ask why one continued to use these as covariates or kept them in the model when the second set was entered -- one argument could be based on the expectation that there is a suppressor relationship among the predictors, but until we hear from the person who actually ran the analysis, I don't believe this was the strategy.

(2) After the second set of predictors was entered there still was NO significant relationship between the predictors and the dependent variable. So, for this model R and R^2 are both equal to zero, or R2=0.00.

(3) There is a "significant increase in R^2" (F change) when the second set of predictors was entered. This has me puzzled. It is not clear to me why or how this could occur. If R1(set 1/model 1)=0.00 and R2(set 2/model 2)=0.00, then why would R2-R1 != 0.00? I suspect that maybe there really is a pattern of relationships present but that there is insufficient statistical power to detect them (the researcher either needs to get more subjects or better measurements). There may be other reasons but I think one needs to examine the data in order to figure it out (one explanation is that it is just a Type I error).

Rich, how would you explain what happens in (3) above?

-Mike Palij
New York University
[hidden email] |
The problem is one of sample size. The original control variables do not result in a significant model. Does this mean they have no effect? No, it means you don't have enough power to detect that size effect. It may be that the effect size is worrisome enough to demand control even in the absence of significance. If the question really is do x5 and x6 really predict over and above x1 through x4, then they should probably be included. However, if x5 and x6 add significantly to x1-x4, then the fact that x1- x4 do not account for significant variability can pull down the R squared for the full model. Given this is a pilot study, I think we might be okay saying that x5 and x6 do predict significantly over and above x1 - x4 but I would look more at effect size here than statistical significance. The pilot study should be to estimate effect size so that we can design a suitably large full scale study to actually test the hypothesis.
Dr. Paul R. Swank, Professor and Director of Research
Children's Learning Institute
University of Texas Health Science Center-Houston

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Michael Palij
Sent: Wednesday, March 30, 2011 6:34 AM
To: [hidden email]
Subject: Re: significant F change, but nonsignificant regression model overall

(3) There is a "significant increase in R^2" (F change) when the second set of predictors was entered. This has me puzzled. It is not clear to me why or how this could occur. If R1(set 1/model 1)=0.00 and R2(set 2/model 2)=0.00, then why would R2-R1 != 0.00? |
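The suggestion to look at effect size rather than significance can be illustrated with Cohen's f² for a set of added predictors, f² = (R²_full - R²_reduced) / (1 - R²_full). The sketch below is a minimal example that assumes the rounded Multiple R values reported earlier in the thread (.5 for Step 1 and .6 for Step 2); because those inputs are rounded, the result is only a rough indication, and the function name is illustrative.

```python
def cohen_f2_increment(r2_reduced, r2_full):
    """Cohen's f^2 for the predictors added on top of the reduced model."""
    if not 0.0 <= r2_reduced <= r2_full < 1.0:
        raise ValueError("need 0 <= r2_reduced <= r2_full < 1")
    return (r2_full - r2_reduced) / (1.0 - r2_full)

# Rounded Multiple R values as reported in the thread (Step 1 and Step 2).
r_step1, r_step2 = 0.5, 0.6
f2 = cohen_f2_increment(r_step1 ** 2, r_step2 ** 2)
print(round(f2, 3))
```

By Cohen's conventional benchmarks (f² of .02 small, .15 medium, .35 large), a value in this range would count as roughly medium. An estimate like this is the kind of input a pilot study can supply for planning the sample size of the full study.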
On Wednesday, March 30, 2011 11:06 AM, Paul Swank wrote:
>The problem is one of sample size. The original control variables
>do not result in a significant model. Does this mean they have no
>effect? No, it means you don't have enough power to detect that
>size effect.

A null result implies two possible conditions:

(1) The null hypothesis is true.
(2) The null hypothesis is false but there is insufficient power to reject it.

Which one of the above conditions one chooses to believe depends on a bunch of factors, such as previous research that is comparable to the current study -- if this is truly a pilot study where no one has done something like this before, then condition (1) is, I think, the more prudent choice. However, given people's cognitive biases, including the sunk cost effect, one is probably loath to entertain condition (1) because no one wants one's research to support null results outside of SEM or other modeling situations.

Of course, as you point out below, one way to determine which condition is most consistent with the evidence is to (a) define a specific effect size that one wants to detect, (b) specify a specific level of statistical power (say, between .80 and .95), and (c) then identify the sample size needed to detect the specified effect size. One might politely ask if this was done before the collection of this data, a good practice that is more often observed in the breach. Perhaps one should read Jack Cohen's writings before going to sleep at night to remember what good research conduct is. If one did, then there would be less ambiguity about which of the two conditions above holds. If one knows what effect size one wants to detect and one has appropriate power (say, .95-.99), then a null result is clearly more consistent with condition (1). A retrospective power analysis is clearly indicated if one believes that condition (2) holds.

-Mike Palij
New York University
[hidden email]

>The pilot study should be to estimate effect size so that we
>can design a suitably large full scale study to actually test the hypothesis. |
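Steps (a) to (c) above can be sketched as a small a priori power calculation for the F-change test. This is an illustration under explicit assumptions rather than a definitive procedure: the noncentrality parameter is taken as lambda = f² × n (one common convention; Cohen's own tables use a slightly different multiplier), the target effect size f² = .15 is simply Cohen's "medium" benchmark, and all function names are hypothetical. The noncentral F distribution is evaluated as a Poisson-weighted sum of incomplete beta functions, using only the standard library.

```python
import math

def _betacf(a, b, x):
    """Continued fraction for the incomplete beta function (Lentz's method)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, 301):
        for numerator in (
            m * (b - m) * x / ((a + 2 * m - 1.0) * (a + 2 * m)),
            -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1.0)),
        ):
            d = 1.0 + numerator * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + numerator / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < 1e-13:
            break
    return h

def betai(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b

def f_upper_tail(f, df1, df2):
    """P(F > f) for a central F(df1, df2) distribution."""
    return betai(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f))

def f_critical(alpha, df1, df2):
    """Critical value with upper-tail area alpha, found by bisection."""
    lo, hi = 0.0, 1.0
    while f_upper_tail(hi, df1, df2) > alpha:
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if f_upper_tail(mid, df1, df2) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def power_increment(f2, n, k_added, k_total, alpha=0.05):
    """Power of the F-change test, assuming noncentrality lambda = f2 * n."""
    df1, df2 = k_added, n - k_total - 1
    crit = f_critical(alpha, df1, df2)
    x = df1 * crit / (df2 + df1 * crit)
    half_lambda = f2 * n / 2.0
    weight = math.exp(-half_lambda)       # Poisson weight for j = 0
    cdf, j = 0.0, 0
    while j < 1000:
        cdf += weight * betai(df1 / 2.0 + j, df2 / 2.0, x)
        j += 1
        weight *= half_lambda / j         # Poisson weight for the next j
        if j > half_lambda and weight < 1e-15:
            break
    return 1.0 - cdf

def required_n(f2, k_added, k_total, target_power=0.80, alpha=0.05, n_max=5000):
    """Smallest n whose F-change test reaches the target power."""
    for n in range(k_total + 2, n_max + 1):
        if power_increment(f2, n, k_added, k_total, alpha) >= target_power:
            return n
    raise ValueError("target power not reached within n_max")

# (a) effect size f2 = .15, (b) power = .80, (c) solve for n; 2 added of 6 predictors.
print(required_n(0.15, k_added=2, k_total=6))
print(round(power_increment(0.15, 75, 2, 6), 3))   # power at the pilot's n = 75
```

Dedicated tools such as G*Power perform the same kind of calculation; the point of the sketch is only to show how the three ingredients (effect size, power, sample size) fit together, and results will shift with the noncentrality convention used.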
Actually the null is never true. Sometimes it is not very false.
Dr. Paul R. Swank, Professor and Director of Research
Children's Learning Institute
University of Texas Health Science Center-Houston

-----Original Message-----
From: Mike Palij [mailto:[hidden email]]
Sent: Wednesday, March 30, 2011 11:34 AM
To: Swank, Paul R; [hidden email]
Cc: Mike Palij
Subject: Re: significant F change, but nonsignificant regression model overall

A null result implies two possible conditions:
(1) The null hypothesis is true.
(2) The null hypothesis is false but there is insufficient power to reject it. |
In reply to this post by Mike
> Date: Wed, 30 Mar 2011 07:34:20 -0400
> From: [hidden email]
> Subject: Re: significant F change, but nonsignificant regression model overall
> To: [hidden email]
>
> On Tuesday, March 29, 2011 11:36 pm, Rich Ulrich wrote:
> >
> > Mike,
> > You seem to have missed the comment,
> >
> >>>> He has entered four variables to control for on
> >>>> the first step, and then two other predictors on the 2nd step. So
> >>>> we're trying to see if these two predictors are significant above and
> >>>> beyond the four variables we are controlling for on the first step of
> >>>> the regression.
>
> No, I didn't miss this comment. Let's review what we might know about
> the situation (at least from my perspective):
>
> (1) The analyst is doing setwise regression, comparable to an ANCOVA,
> entering 4 variables/covariates as the first set. As mentioned elsewhere,
> these covariates are NOT significantly related to the dependent variable.

Mike,

No, they are not "doing setwise regression", whatever that new phrase means, if that is what you intended. And they are certainly not doing Stepwise regression, which is what you seem to discuss later.

The analysis used an intentional, pre-designated order of entry of terms. The statistical tool was a regression program. There were 4 variables which were "controlled for", as explicitly described.

- That analysis tests two variables, with 4 variables being "controlled for."

An analogous ANCOVA would be a two-factor design with 4 covariates, where the covariates are included for ... whatever purposes.

In more detail -- The "whole ANOVA" will have a test with d.f.= (4+groups-1). The test (or tests) on the two factors do *not* rely on the covariates being either significant or not-significant. If you control for something highly correlated with outcome (pre-scores, often), the covariates are highly significant. If you control for "nuisance" variables, you hope that the nuisance variables do not have much effect because that complicates interpretations.

But in either case, you do *not* use the overall test on (covariates + hypotheses) as a guide to the inference on hypotheses.

[snip, description of a "stepwise" process; irrelevant to discussion of testing taken as a defined hierarchy.]

--
Rich Ulrich |
In reply to this post by Mike
After a few tries, I mimicked this result (more or less) with some randomly generated data and 60 cases.
* Generate data .
* X1 to X6 are random numbers.
* Only X5 and X6 are related to Y.

numeric Y x1 to x6 (f8.2).
do repeat x = x1 to x6.
- compute x = rv.normal(50,10).
end repeat.
compute Y = 50 + .2*x5 + .4*x6 + rv.normal(0,15).
exe.

REGRESSION
  /STATISTICS COEFF OUTS R ANOVA CHANGE
  /DEPENDENT Y
  /METHOD=ENTER x1 to x4
  /METHOD=ENTER x5 x6.

Model 1: R-sq = 0.027, F(4, 55) = .388, p = .817
Model 2: R-sq = 0.186, F(6, 53) = 2.014, p = .080
Change in R-sq = 0.158, F(2, 53) = 5.15, p = .009

When the goal is to control for potential confounders, one sometimes sees the steps reversed, with the variable (or variables) of main interest entered first, and the potential confounders added on the next step. This is commonly done with logistic regression, for example, where crude and adjusted odds ratios are reported (from models 1 and 2 respectively). For the data above, here's what I get when I do it that way:

Model 1: R-sq = 0.148, F(2, 57) = 4.939, p = .011
Model 2: R-sq = 0.186, F(6, 53) = 2.014, p = .080
Change in R-sq = 0.038, F(4, 53) = 0.618, p = .652

Even though the change in R-sq is clearly not significant, I like to compare (via the eyeball test) the coefficients for X5 and X6 in the two models. If there is no confounding, then the values should be pretty similar in the two models.

Model  Variable    B     SE     p
  1      X5      .448   .190   .022
         X6      .432   .202   .036
  2      X5      .522   .210   .016
         X6      .459   .210   .033

Mike, would you be any happier with this second approach to the analysis?

Cheers,
Bruce
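The F statistics reported above can be checked by hand from the R-sq values alone. What follows is a minimal Python sketch, not part of the original SPSS run; the function name `f_from_r2` is made up for illustration. It applies the standard formula F = ((R2_full - R2_reduced) / df_change) / ((1 - R2_full) / df_residual). Because the inputs are the R-sq values rounded to three decimals, the results should only approximately match the SPSS output.

```python
def f_from_r2(r2_full, r2_reduced, df_change, df_resid):
    """F statistic for the R-squared gained by adding df_change predictors.

    With r2_reduced = 0 this is the overall F test for the full model.
    """
    return ((r2_full - r2_reduced) / df_change) / ((1.0 - r2_full) / df_resid)


n = 60                 # cases in the simulated data set
df_resid = n - 6 - 1   # residual df for the 6-predictor model (53)

# Overall test of Model 2: all 6 predictors share the numerator (6 df).
f_overall = f_from_r2(0.186, 0.0, 6, df_resid)

# F change at Step 2: only the 2 added predictors are in the numerator (2 df).
# R-sq rose from .027 (covariates only) to .186 (all six predictors).
f_change = f_from_r2(0.186, 0.027, 2, df_resid)

print(f"overall F(6, {df_resid}) = {f_overall:.2f}")
print(f"F change(2, {df_resid}) = {f_change:.2f}")
```

Both tests use the same denominator. The overall test divides the explained variance by 6 df, four of which belong to covariates that explain almost nothing, while the change test divides nearly the same amount by only 2 df. That is how the F change can be significant while the overall F is not.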
--
Bruce Weaver bweaver@lakeheadu.ca http://sites.google.com/a/lakeheadu.ca/bweaver/ "When all else fails, RTFM." PLEASE NOTE THE FOLLOWING: 1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above. 2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/). |
In reply to this post by Rich Ulrich
On Wednesday, March 30, 2011 2:55 PM, Rich Ulrich wrote:
>Mike Palij wrote:
>> No, I didn't miss this comment. Let's review what we might know about
>> the situation (at least from my perspective):
>>
>> (1) The analyst is doing setwise regression, comparable to an ANCOVA,
>> entering 4 variables/covariates as the first set. As mentioned elsewhere,
>> these covariates are NOT significantly related to the dependent variable.
>
>Mike,
>No, they are not "doing setwise regression", whatever that new
>phrase means, if that is what you intended.

That "new phrase" can be found in Cohen and Cohen (1975) in their Chapter 4, "Sets of Independent Variables". Of particular relevance is section 4.2, "The simultaneous and hierarchical models for sets". What you and the OP described was a hierarchical or sequential setwise regression analysis. See pp. 127-144 if you have a copy handy. If anything, you should say "whatever that arcane phrase means".

As for your description of the analysis, do you really keep variables that don't provide any useful information in the equation? I hope you report shrunken or adjusted R^2 when you report your results, because they should be considerably smaller than R^2 as a result of the additional useless predictors. It should give a person pause.

-Mike Palij
New York University
[hidden email] |
In reply to this post by Bruce Weaver
Here are some of those results again, with adjusted R-squared values added (in response to Mike's comment in another post).
METHOD 1: Entering the 4 confounders first, then the 2 variables of interest

Model 1: R-sq = 0.027, F(4, 55) = .388, p = .817, Adj R-sq = -.043
Model 2: R-sq = 0.186, F(6, 53) = 2.014, p = .080, Adj R-sq = .093
Change in R-sq = 0.158, F(2, 53) = 5.15, p = .009

METHOD 2: Entering the 2 variables of interest first, then the 4 confounders

Model 1: R-sq = 0.148, F(2, 57) = 4.939, p = .011, Adj R-sq = .118
Model 2: R-sq = 0.186, F(6, 53) = 2.014, p = .080, Adj R-sq = .093
Change in R-sq = 0.038, F(4, 53) = 0.618, p = .652

HTH.
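For anyone who wants to see where the adjusted values come from, here is a small Python sketch (again not part of the SPSS output; `adjusted_r2` is an illustrative name) using the usual formula Adj R-sq = 1 - (1 - R-sq)(n - 1)/(n - k - 1), with n = 60 cases and k predictors. The inputs are the rounded R-sq values above, so agreement with SPSS is only to about two or three decimals.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for a model with k predictors fitted to n cases."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


n = 60
covariates_only = adjusted_r2(0.027, n, 4)  # Method 1, Model 1
key_vars_only = adjusted_r2(0.148, n, 2)    # Method 2, Model 1
all_six = adjusted_r2(0.186, n, 6)          # Model 2 under either method

for label, value in [("covariates only", covariates_only),
                     ("X5 and X6 only", key_vars_only),
                     ("all six predictors", all_six)]:
    print(f"{label}: Adj R-sq = {value:.3f}")
```

The penalty for the four extra predictors is enough to push the six-predictor model below the two-predictor model, even though its unadjusted R-sq is higher.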
--
Bruce Weaver |
In reply to this post by Bruce Weaver
Bruce,
A few points:

(1) The OP said the following:

|For Model 2 where he entered the 4 covariates on Step 1 and
|the 2 variables he is most interested in on Step 2, the Multiple R
|is .6 and F test is still not significant. But a priori, he was most
|interested in the 2 variables entered on Step 2 - and this is
|where the F change is significant. One of the two variables is
|significant on Step 2.

In your example both X5 and X6 are significantly related to the dep var, but to make it really relevant only one of these should be significant. This may or may not make a difference, depending upon the constraints the data put on the range of allowable values.

(2) I don't like the second method of entering the nuisance variables after the critical variables. I still think it is a foolish thing to do because (a) it adds no useful information and (b) it makes the model nonsignificant -- the model with only X5 and X6 is clearly better. As for the significant increase in R^2, I suggest you look at the difference between the adjusted R^2 values -- that should be much smaller because of the penalty of having 6 predictors in the second model.

(3) The goals of doing an ANCOVA have traditionally been (a) to reduce the error variance by removing the variance in it that is associated with the covariate (no association, no reduction in error variance, thus no point in keeping the covariates) and (b) if the groups in the ANOVA have different means on the covariate, to adjust the means to compensate for differences on the covariates. If one has a copy of Howell's 7th ed. Stat Methods for Psych, see the material on pages 598-609. Both of these require the covariates to be entered first (indeed, in ANCOVA terms, entering the covariates after the regular ANOVA would be bizarre). In the ANCOVA context, keeping nonsignificant covariates makes no sense.
-Mike Palij
New York University
[hidden email] |
Hi Mike. I don't have time to address all of your points right now, but here are a couple of quick comments.
Re your point 1, I didn't have enough patience to keep fiddling with it until I got a data set that met that condition. I suspect that one could concoct a data set meeting that condition that still shows the same general pattern of results though.

Re your point 2, I should have added that when people use this approach (key variables first, then nuisance variables), they have the option of reverting to the simpler model (and often do) if the nuisance variables add nothing useful. One thing to consider is whether the change in R-sq is statistically significant; but as I mentioned, I also like to check that the coefficients for the key variables are not too different in the two models. If they are, it suggests something a bit fishy might be going on. But in some cases, of course, adding the nuisance variables does result in improved fit of the model, and possibly changes in the coefficients for the key variables. So in that case, one would obviously stick with the more complex model.

As you've probably seen by now, I posted another message a while ago that has the adjusted R-sq values for my example with random data. And as you probably expected, Adj R-sq for model 2 is lower than for model 1 when the nuisance variables are added second. This is another pretty good sign that one should consider reverting to Model 1.

Cheers,
Bruce

p.s. - Here are the data for my random number example, in case anyone wants to play around with it.
ID       Y     x1     x2     x3     x4     x5     x6
 1   68.87  48.79  65.52  32.83  46.57  49.68  50.41
 2   77.83  37.03  59.01  50.83  58.44  67.38  65.47
 3   74.96  63.20  62.50  44.59  31.19  71.42  58.67
 4   66.83  37.10  52.60  58.99  55.92  63.61  52.10
 5   89.31  53.34  55.78  42.63  57.21  61.94  40.51
 6   88.92  42.98  41.55  49.90  52.92  54.90  32.59
 7   84.52  45.10  43.88  67.67  61.20  65.58  25.97
 8   71.03  51.50  59.75  40.35  53.97  27.20  57.42
 9   90.30  35.67  42.71  47.18  48.81  55.29  57.62
10   82.03  54.73  40.85  49.57  64.83  36.19  51.69
11   74.90  50.93  52.39  44.54  45.33  53.13  47.99
12   81.51  49.38  53.75  49.38  39.24  46.70  40.37
13   98.03  37.70  43.40  49.28  49.51  43.83  44.92
14   97.00  48.67  49.31  40.79  47.47  48.79  47.49
15   68.77  49.13  65.20  55.54  67.96  52.64  57.70
16   89.69  52.83  45.73  55.60  46.59  48.89  37.61
17   41.05  62.22  36.89  31.77  49.43  36.90  32.90
18   69.90  54.74  36.74  63.09  75.08  42.78  51.80
19   72.18  49.39  55.51  35.44  54.24  60.74  41.36
20   71.35  53.96  36.54  22.17  48.72  47.93  32.03
21   67.55  50.33  39.52  49.40  47.92  46.42  49.87
22   57.96  52.97  42.08  61.42  47.01  42.43  52.60
23   72.59  52.14  51.45  48.08  43.25  50.97  54.75
24   79.43  42.07  34.99  28.99  75.75  33.72  45.27
25   98.00  48.57  30.43  46.96  39.50  44.50  47.91
26   92.54  32.80  39.10  53.62  50.50  43.57  57.61
27   69.40  57.39  67.77  68.30  49.33  55.33  66.43
28   69.23  37.75  56.03  64.98  46.18  57.15  42.81
29   60.53  53.97  47.93  48.30  49.18  39.33  49.69
30   45.42  42.87  54.18  50.04  37.56  46.02  39.47
31   75.66  71.98  45.94  57.29  35.81  40.17  49.39
32   87.42  37.58  40.88  52.57  27.52  35.19  57.27
33   69.26  40.32  63.45  56.72  55.60  50.81  48.60
34   67.01  56.82  50.11  41.32  57.04  39.51  54.33
35  108.41  62.20  54.69  54.91  62.37  57.80  55.15
36   89.07  43.66  56.98  38.51  45.55  51.19  64.90
37   91.75  73.27  48.97  58.70  46.24  60.23  52.54
38   79.19  53.14  52.16  35.82  53.97  67.37  41.93
39   93.08  45.82  60.89  44.59  51.37  64.52  54.42
40   25.68  48.56  38.87  51.27  43.72  54.75  41.63
41   76.89  43.52  45.51  32.75  45.15  49.65  44.21
42   87.52  59.25  52.09  57.91  52.07  64.11  43.07
43   91.29  46.26  35.32  46.97  54.77  55.38  80.68
44   75.65  37.66  38.25  52.39  44.49  53.13  55.76
45   79.83  32.62  75.66  49.90  56.71  56.68  54.74
46   93.66  48.84  59.99  39.71  38.28  38.93  68.83
47   77.16  33.17  38.94  53.12  30.47  40.20  61.00
48   78.99  57.50  59.34  54.62  35.23  46.06  40.72
49   62.73  51.51  48.49  70.91  36.68  40.46  43.22
50   85.68  21.41  38.36  62.87  27.46  40.93  56.01
51   62.48  46.58  67.54  47.85  33.46  39.55  45.70
52   61.28  47.16  50.70  37.73  60.64  43.36  55.58
53  107.35  44.61  39.74  60.34  49.34  50.16  52.04
54   65.51  55.57  40.58  44.00  50.03  55.89  54.24
55   85.11  51.98  51.38  46.19  36.61  36.27  64.20
56   91.36  42.71  60.44  66.88  48.58  62.01  53.99
57   89.99  68.91  48.34  49.55  57.85  66.75  53.52
58   55.38  40.13  50.45  41.90  53.80  41.21  41.20
59  117.70  40.08  49.32  50.25  54.41  72.35  51.54
60   67.47  55.77  55.62  52.25  47.86  47.63  41.52
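For readers without SPSS, the same kind of experiment can be sketched in plain Python. This is an illustrative re-creation under stated assumptions, not the code that produced the data above: the seed, the helper names `solve` and `ols_r_squared`, and the use of the normal equations are all my own choices, and a different random draw will give different R-sq values from those reported in this thread.

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def ols_r_squared(rows, y):
    """R-squared of an ordinary least squares fit of y on the predictors in rows.

    An intercept column is added; coefficients come from the normal equations.
    """
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    b = solve(xtx, xty)
    fitted = [sum(bi * xi for bi, xi in zip(b, r)) for r in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


rng = random.Random(2011)  # arbitrary seed, chosen only for repeatability
n = 60
xs = [[rng.gauss(50, 10) for _ in range(6)] for _ in range(n)]       # x1..x6
y = [50 + 0.2 * r[4] + 0.4 * r[5] + rng.gauss(0, 15) for r in xs]    # only x5, x6 matter

r2_covariates = ols_r_squared([r[:4] for r in xs], y)  # Step 1: x1..x4
r2_full = ols_r_squared(xs, y)                         # Step 2: x1..x6

df_resid = n - 6 - 1
f_change = ((r2_full - r2_covariates) / 2) / ((1 - r2_full) / df_resid)
f_overall = (r2_full / 6) / ((1 - r2_full) / df_resid)

print(f"Model 1 R-sq = {r2_covariates:.3f}, Model 2 R-sq = {r2_full:.3f}")
print(f"overall F(6, {df_resid}) = {f_overall:.2f}, F change(2, {df_resid}) = {f_change:.2f}")
```

Changing the seed a few times shows how often a draw of 60 cases gives a sizeable F change alongside a much smaller overall F, which is the pattern the original question asked about.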
--
Bruce Weaver |