significant F change, but nonsignificant regression model overall

sgthomson99
Hi everyone,
One of our post-docs is having trouble with a regression model he is trying to run.  He is trying to predict an outcome in babies from pregnancy variables collected from the mothers during gestation.  He has 75 participants.  He entered four variables to control for on the first step, and then two other predictors on the second step.  So we're trying to see whether these two predictors are significant above and beyond the four variables we are controlling for on the first step of the regression.
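If it helps, I believe the syntax he ran was essentially the following (the variable names here are made up):

REGRESSION
  /STATISTICS COEFF OUTS R ANOVA CHANGE
  /DEPENDENT outcome
  /METHOD=ENTER ctrl1 ctrl2 ctrl3 ctrl4
  /METHOD=ENTER pred1 pred2.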
 
The F change for adding these 2 predictors on Step 2 is significant, and the R2 change is .19 - so not big, but not bad.  The problem is that the overall regression model is not significant.  He also has an interaction between baby's gender and outcome.
 
Is this possible?  Might it be a problem with multicollinearity? 
 
Since it's a pilot study, should we stick with reporting partial correlations between the pregnancy variables and the baby outcome variable, partialling out the variables we entered on the first step of the regression?
 
Any ideas greatly appreciated!!  Many thanks for the help!
 
Susan

Re: significant F change, but nonsignificant regression model overall

Swank, Paul R

Was the first model with 4 variables significant?

 

Dr. Paul R. Swank
Professor and Director of Research
Children's Learning Institute
University of Texas Health Science Center-Houston



Re: significant F change, but nonsignificant regression model overall

parisec
In reply to this post by sgthomson99
Hi Susan,
 
Some things to try:
 
1. Assess the bivariate associations among all of your predictor variables to see whether some of them are strongly correlated with each other.
2. Enter the one variable you consider most important on its own, then add each other variable one at a time alongside it and assess the change in the coefficient of your most important variable. This can give you an idea of the effect of one variable on another.
3. Look at the tolerance statistics to see whether you have multicollinearity (see the syntax sketch below).
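A minimal sketch of (1) and (3) in syntax, with hypothetical variable names (y = outcome, c1 to c4 = the Step 1 controls, p1 and p2 = the Step 2 predictors), might be:

* Bivariate correlations among all the predictors.
CORRELATIONS /VARIABLES=c1 c2 c3 c4 p1 p2.
* Tolerance/VIF plus collinearity diagnostics for the full model.
REGRESSION
  /STATISTICS COEFF OUTS R ANOVA TOL COLLIN
  /DEPENDENT y
  /METHOD=ENTER c1 c2 c3 c4 p1 p2.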



Re: significant F change, but nonsignificant regression model overall

Rich Ulrich
In reply to this post by sgthomson99

Assuming that the description is accurate, Paul Swank has
pointed at the issue - the first four variables were not significant.

But for a designed test of the two, their impact is thoroughly irrelevant.
I repeat: the overall test of the regression is totally irrelevant,
because the post-doc designated, at the outset, the test of the two variables.


I am unsure what you mean by an "interaction" between gender
and outcome.  "Outcome" usually means "what is being predicted".
In a different style of testing, "interaction with outcome" denotes
a main effect for one predictor (here: gender).

In regression, "interaction" usually refers to a cross-product of
two predictors, not of predictor and outcome.  My tentative conclusion
is that your statement about gender is a summary of what shows up
among the first 4 variables entered.  Should you take it seriously?
 - Well, it was specified a priori as a control variable.  I don't
know what he wants to say about the 4 control variables, but that is
a separate matter for discussion.  The designed test was positive.

--
Rich Ulrich





Re: significant F change, but nonsignificant regression model overall

Bruce Weaver
In reply to this post by Swank, Paul R
Hello Susan.  To expand on what Paul asked, please report the R-squared values and F-tests for both models.  My guess is that the R-squared value for model 1 is quite low and not statistically significant.  

Also, you said there is an interaction between baby's gender and outcome.  I'm not sure what that means.  Interactions involve two or more explanatory variables, not an explanatory variable and the outcome variable.  Please clarify what you mean.

Cheers,
Bruce


--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."


Re: significant F change, but nonsignificant regression model overall

sgthomson99
In reply to this post by Rich Ulrich
My apologies to everyone for mixing up the interaction.  I was in too much of a rush, and I definitely should have had our post-doc email the group with his questions himself. 
 
The interaction is between baby's gender and one of the predictors. 

For Model 1, with the 4 covariates entered, the multiple R is .5 and the F test is not significant.
 
For Model 2, where he entered the 4 covariates on Step 1 and the 2 variables he is most interested in on Step 2, the multiple R is .6 and the F test is still not significant.
But a priori he was most interested in the 2 variables entered on Step 2 - and this is where the F change is significant.  One of the two variables is significant on Step 2.
 
So can he focus on the F change being significant, and ignore both the fact that the overall model is not significant and the fact that the 4 variables on Step 1 were not significant either?
 
Thanks so much.
 
Susan
 

 

Re: significant F change, but nonsignificant regression model overall

Mike
Before making a serious recommendation, I think I would like
to know more about the data.  However, I would be hesitant about
reporting the "significant" change in F in the context of nonsignificant
models.  Consider the following analogy: a person conducts a one-way
ANOVA with six levels.  The overall ANOVA is not significant, but
post hoc testing reveals one contrast between means to be significant.
Should one report the one significant post hoc test and ignore or
downplay the non-significant ANOVA?  My advice would be that
unless you had planned to do just that particular contrast (in which
case, why were the ANOVA and the other tests done?), you should treat
it as a spuriously significant result (i.e., the result of doing too many
tests -- with enough tests, one is bound to get a significant result on
a purely chance basis).  The situation described below is not exactly
like this, but it does suggest that one should be suspicious about
the significant change in F.  I would want to know more about the
distributions of the individual variables and the correlations between
the variables, examine their scatterplots, and take a closer
look at how the difference between R = 0.00 (Model 1) and R = 0.00 (Model 2)
gives rise to a non-zero difference. 
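If it were my data, a quick first pass in syntax might look something
like this (variable names hypothetical -- y for the outcome, c1 to c4
for the Step 1 controls, p1 and p2 for the Step 2 predictors):
 
EXAMINE VARIABLES=y c1 c2 c3 c4 p1 p2
  /PLOT HISTOGRAM NPPLOT.
CORRELATIONS /VARIABLES=y c1 c2 c3 c4 p1 p2.
GRAPH /SCATTERPLOT(MATRIX)=c1 c2 c3 c4 p1 p2 y.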
 
-Mike Palij
New York University
 
 

Re: significant F change, but nonsignificant regression model overall

Rich Ulrich
In reply to this post by Mike
Mike,
You seem to have missed the comment,

> > > He has entered four variables to control for on
> > > the first step, and then two other predictors on the 2nd step. So
> > > we're trying to see if these two predictors are significant above and
> > > beyond the four variables we are controlling for on the first step of
> > > the regression.



It's a designed test, so the other results are irrelevant.

--
Rich Ulrich




Re: significant F change, but nonsignificant regression model overall

Jarrod Teo-2
In reply to this post by Swank, Paul R
Hi Susan,
 
If this is a pilot study, can I assume that you will have more data in the actual study?  If profiling is necessary and you would like to avoid the interaction, you might want to try C5.0 or other decision trees that profile against a dependent variable.
 
This might not be the best way, but do note that data mining is data driven, and you might need at least 300 cases before a data-mining model can identify patterns in the data file.  A rough sketch follows.
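For what it's worth, C5.0 lives in Clementine/Modeler; SPSS Statistics itself offers CHAID/CRT-style trees through the TREE procedure (in the Decision Trees module, if I remember correctly).  A minimal call might look like this, with made-up variable names:
 
TREE outcome BY gender pred1 pred2
  /METHOD TYPE=CRT.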
 
Warmest Regards
Dorraj Oet
 



Re: significant F change, but nonsignificant regression model overall

Mike
In reply to this post by Rich Ulrich
On Tuesday, March 29, 2011 11:36 pm, Rich Ulrich wrote:
>
> Mike,
> You seem to have missed the comment,
>
>>>> He has entered four variables to control for on
>>>> the first step, and then two other predictors on the 2nd step. So
>>>> we're trying to see if these two predictors are significant above and
>>>> beyond the four variables we are controlling for on the first step of
>>>> the regression.

No, I didn't miss this comment.  Let's review what we might know about
the situation (at least from my perspective):

(1) The analyst is doing setwise regression, comparable to an ANCOVA,
entering 4 variables/covariates as the first set.  As mentioned elsewhere,
these covariates are NOT significantly related to the dependent variable.
This implies that the multiple correlation and its squared version are zero,
or R1 = 0.00.  One could, I think, legitimately ask why one continued to
use these as covariates, or kept them in the model when the second set
was entered -- one argument could be based on the expectation that
there is a suppressor relationship among the predictors, but until we hear
from the person who actually ran the analysis, I don't believe this was
the strategy.

(2) After the second set of predictors was entered, there still was NO
significant relationship between the predictors and the dependent variable.
So, for this model, R and R^2 are both equal to zero, or R2 = 0.00.

(3) There is a "significant increase in R^2" (F change) when the second
set of predictors was entered.  This has me puzzled.  It is not clear to
me why or how this could occur.  If R1 (set 1/model 1) = 0.00 and
R2 (set 2/model 2) = 0.00, then why would R2 - R1 != 0.00?  I suspect
that maybe there really is a pattern of relationships present, but
insufficient statistical power to detect it (the researcher either
needs to get more subjects or better measurements).  There may be
other reasons, but I think one needs to examine the data to figure
out what is going on (one explanation is that it is just a Type I
error).

Rich, how would you explain what happens in (3) above?

-Mike Palij
New York University
[hidden email]


Re: significant F change, but nonsignificant regression model overall

Swank, Paul R
The problem is one of sample size.  The original control variables do not result in a significant model.  Does this mean they have no effect?  No, it means you don't have enough power to detect an effect of that size.  It may be that the effect size is worrisome enough to demand control even in the absence of significance.  If the question really is whether x5 and x6 predict over and above x1 through x4, then x1-x4 should probably be included.  However, if x5 and x6 add significantly to x1-x4, the fact that x1-x4 do not account for significant variability can pull down the R squared for the full model.  Given that this is a pilot study, I think we might be okay saying that x5 and x6 do predict significantly over and above x1-x4, but I would look more at effect size here than at statistical significance.  The point of the pilot study should be to estimate effect size, so that we can design a suitably large full-scale study to actually test the hypothesis.
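For reference, the effect size for an added set can be indexed by Cohen's f-squared, f^2 = R^2-change / (1 - R^2-full).  Taking the figures reported earlier at face value (R^2 change = .19; full-model R^2 = .6^2 = .36), that gives f^2 = .19/.64, or about .30, which falls between Cohen's conventional "medium" (.15) and "large" (.35) benchmarks -- assuming, of course, that those pilot estimates hold up.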

Dr. Paul R. Swank,
Professor and Director of Research
Children's Learning Institute
University of Texas Health Science Center-Houston



Re: significant F change, but nonsignificant regression model overall

Mike
"Swank, Paul R"
On Wednesday, March 30, 2011 11:06 AM, Paul Swank wrote:
>The problem is one of sample size. The original control variables
>do not result in a significant model. Does this mean they have no
>effect? No, it means you don't have enough power to detect that
>size effect.

A null result implies two possible conditions:

(1)  The null hypothesis is true.

(2)  The null hypothesis is false but there is insufficient power
to reject it.

Which one of the above conditions one chooses to believe depends
on a number of factors, such as previous research that is comparable
to the current study -- if this is truly a pilot study where no one has
done something like this before, then condition (1) is, I think, the
more prudent choice.  However, given people's cognitive biases,
including the sunk-cost effect, one is probably loath to entertain
condition (1), because no one wants one's research to support null
results outside of SEM or other modeling situations.

Of course, as you point out below, one way to determine which
condition is most consistent with the evidence is to (a) define a
specific effect size that one wants to detect, (b) specify a specific
level of statistical power (say, between .80 and .95), and (c) then
identify the sample size needed to detect the specified effect size.

One might politely ask whether this was done before the data were
collected -- a good practice that is more often honored in the breach.
Perhaps one should read Jack Cohen's writings before going to sleep
at night, to remember what good research conduct is.  If one did,
then there would be less ambiguity about which of the two conditions
above holds.  If one knows what effect size one wants to detect and
one has appropriate power (say, .95-.99), then a null result is clearly
more consistent with condition (1).  A retrospective power analysis
is clearly indicated if one believes that condition (2) holds.

-Mike Palij
New York University
[hidden email]


Re: significant F change, but nonsignificant regression model overall

Swank, Paul R
Actually the null is never true. Sometimes it is not very false.

Dr. Paul R. Swank,
Professor and Director of Research
Children's Learning Institute
University of Texas Health Science Center-Houston



Re: significant F change, but nonsignificant regression model overall

Rich Ulrich
In reply to this post by Mike
> On Tuesday, March 29, 2011 11:36 pm, Rich Ulrich wrote:
> >
> > Mike,
> > You seem to have missed the comment,
> >
> >>>> He has entered four variables to control for on
> >>>> the first step, and then two other predictors on the 2nd step. So
> >>>> we're trying to see if these two predictors are significant above and
> >>>> beyond the four variables we are controlling for on the first step of
> >>>> the regression.
>
> No, I didn't miss this comment. Let's review what we might know about
> the situation (at least from my perspective):
>
> (1) The analyst is doing setwise regression, comparable to an ANCOVA,
> entering 4 variables/covariates as the first set. As mentioned elsewhere,
> these covariates are NOT significantly related to the dependent variable.

Mike,
No, they are not "doing setwise regression", whatever that new
phrase means, if that is what you intended.  And they are certainly
not doing stepwise regression, which is what you seem to discuss
later.

The analysis used an intentional, pre-designated order of entry of terms.
The statistical tool was a regression program.  There were 4 variables
which were "controlled for", as explicitly described.

 - That analysis tests two variables, with 4 variables being "controlled for."
An analogous ANCOVA would be a two-factor design with 4 covariates, where
the covariates are included for ... whatever purposes.  In more detail --
the "whole ANOVA" will have a test with d.f. = (4 + groups - 1).  The test
(or tests) on the two factors does *not* rely on the covariates being
either significant or non-significant.  If you control for something
highly correlated with outcome (pre-scores, often), the covariates are
highly significant.  If you control for "nuisance" variables, you hope
that the nuisance variables do not have much effect, because that
complicates interpretation.  But in either case, you do *not* use
the overall test on (covariates + hypotheses) as a guide to the
inference on the hypotheses.

[snip, description of a "stepwise" process; irrelevant to
discussion of testing taken as a defined hierarchy.]

--
Rich Ulrich




Re: significant F change, but nonsignificant regression model overall

Bruce Weaver
In reply to this post by Mike
After a few tries, I mimicked this result (more or less) with some randomly generated data and 60 cases.

* Generate data .
* X1 to X6 are random numbers.
* Only X5 and X6 are related to Y.

numeric Y x1 to x6 (f8.2).
do repeat x = x1 to x6.
-  compute x = rv.normal(50,10).
end repeat.
compute Y = 50 + .2*x5 + .4*x6 + rv.normal(0,15).
exe.
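* Note: the data are fresh random draws on every run, so results will
   differ from run to run; to reproduce a particular run, one could fix
   the seed first, e.g. SET SEED=123456 (any value will do).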

REGRESSION
  /STATISTICS COEFF OUTS R ANOVA CHANGE
  /DEPENDENT Y
  /METHOD=ENTER x1 to x4
  /METHOD=ENTER x5 x6.

Model 1:  R-sq = 0.027, F(4, 55) = .388, p = .817
Model 2:  R-sq = 0.186, F(6, 53) = 2.014, p = .080
Change in R-sq = 0.158, F(2, 53) = 5.15, p = .009
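For anyone wanting to check by hand how that combination can arise, note that the two tests have different numerators and numerator df (figures below are rounded; the small discrepancy from the reported 5.15 is rounding in the R-sq values):
 
F-change  = [(R2_2 - R2_1)/2] / [(1 - R2_2)/(N - k - 1)]
          = [(.186 - .027)/2] / [(1 - .186)/53] = .0795/.0154 = approx 5.18
F-overall = (R2_2/6) / [(1 - R2_2)/53] = .031/.0154 = approx 2.02
 
The change test concentrates the added variance on just 2 df, while the overall test spreads everything across 6 df, four of which contribute almost nothing.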

When the goal is to control for potential confounders, one sometimes sees the steps reversed, with the variable (or variables) of main interest entered first, and the potential confounders added on the next step.  This is commonly done with logistic regression, for example, where crude and adjusted odds ratios are reported (from models 1 and 2 respectively).  For the data above, here's what I get when I do it that way:

Model 1:  R-sq = 0.148, F(2, 57) = 4.939, p = .011
Model 2:  R-sq = 0.186, F(6, 53) = 2.014, p = .080
Change in R-sq = 0.038, F(4, 53) = 0.618, p = .652

Even though the change in R-sq is clearly not significant, I like to compare (via the eyeball test) the coefficients for X5 and X6 in the two models.  If there is no confounding, then the values should be pretty similar in the two models.

Model   Variable     B      SE      p
  1       X5        .448   .190   .022
  1       X6        .432   .202   .036
  2       X5        .522   .210   .016
  2       X6        .459   .210   .033


Mike, would you be any happier with this second approach to the analysis?  

Cheers,
Bruce


--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."


Re: significant F change, but nonsignificant regression model overall

Mike
In reply to this post by Rich Ulrich
On Wednesday, March 30, 2011 2:55 PM, Rich Ulrich wrote:
>Mike Palij wrote:

>> No, I didn't miss this comment. Let's review what we might know about
>> the situation (at least from my perspective):
>>
>> (1) The analyst is doing setwise regression, comparable to an ANCOVA,
>> entering 4 variables/covariates as the first set. As mentioned elsewhere,
>> these covariates are NOT significantly related to the dependent variable.
>
>Mike,
>No, they are not "doing setwise regression", whatever that new
>phrase means, if that is what you intended.

That "new phrase" can be found in Cohen and Cohen (1975) in their
Chapter 4 "Sets of Independent Variables". Of particular relevance
is section 4.2 "The simultaneous and hierarchical models for sets".
What you and the OP described was a hierarchical or sequential
setwise regression analysis.  See pp127-144 if you have a copy
handy.  If anything, you should say "whatever that arcane phrase
means".

As for your description of the analysis, do you really keep variables
that don't provide any useful information in the equation?  I hope you
report shrunken or adjusted R^2 when you report your results, because
it should be considerably smaller than R^2 as a result of the additional
useless predictors.  That should give a person pause.

-Mike Palij
New York University
[hidden email]


Re: significant F change, but nonsignificant regression model overall

Bruce Weaver
In reply to this post by Bruce Weaver
Here are some of those results again, with adjusted R-squared values added (in response to Mike's comment in another post).

METHOD 1:  Entering the 4 confounders first, then the 2 variables of interest

Model 1:  R-sq = 0.027, F(4, 55) = .388, p = .817     Adj R-sq = -.043
Model 2:  R-sq = 0.186, F(6, 53) = 2.014, p = .080   Adj R-sq =  .093
Change in R-sq = 0.158, F(2, 53) = 5.15, p = .009


METHOD 2:  Entering the 2 variables of interest first, then the 4 confounders

Model 1:  R-sq = 0.148, F(2, 57) = 4.939, p = .011   Adj R-sq = .118
Model 2:  R-sq = 0.186, F(6, 53) = 2.014, p = .080   Adj R-sq =  .093
Change in R-sq = 0.038, F(4, 53) = 0.618, p = .652
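For reference, the adjustment applied is the usual Adj R-sq = 1 - (1 - R-sq)(N - 1)/(N - k - 1); e.g., for Model 1 under Method 1, 1 - (1 - .027)(59/55) = -.043, matching the value above.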

HTH.



Re: significant F change, but nonsignificant regression model overall

Mike
In reply to this post by Bruce Weaver
Bruce,

A few points:

(1) The OP said the following:

|For Model 2 where he entered the 4 covariates on Step 1 and
|the 2 variables he is most interested in on Step 2, the Multiple R
|is .6 and F test is still not significant. But a priori, he was most
|interested in the 2 variables entered on Step 2 - and this is
|where the F change is significant.  One of the two variables is
|significant on Step 2.

In your example below, both X5 and X6 are significantly related
to the dependent variable, but to make it really relevant, only one
of them should be significant.  This may or may not make a difference,
depending upon the constraints the data put on the range of
allowable values.

(2)  I don't like the second method of entering the nuisance
variables after the critical variables; I still think it is a foolish
thing to do because (a) it adds no useful information and (b) it makes
the model nonsignificant -- the model with only X5 and X6
is clearly better.  As for the significant increase in R^2, I suggest
you look at the difference between the adjusted R^2 values -- that
difference should be much smaller because of the penalty for having
6 predictors in the second model.
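
A quick check of that arithmetic, using the R^2 values already reported
(n = 60) and Adj R^2 = 1 - (1 - R^2)(n - 1)/(n - k - 1):

  X5 and X6 only (k = 2):      1 - (1 - .148)(59/57) = .118
  All six predictors (k = 6):  1 - (1 - .186)(59/53) = .094

So on the adjusted scale the six-predictor model actually loses ground.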

(3)  The goals of doing an ANCOVA have traditionally been
(a) to reduce the error variance by removing the variance in it that is
associated with the covariate (no association, no reduction in
error variance, thus no point in keeping the covariates) and
(b) if the groups in the ANOVA have different means on the
covariate, to adjust the means to compensate for the
differences on the covariates.  If one has a copy of Howell's
Statistical Methods for Psychology (7th ed), see the material on
pages 598-609.  Both of these goals require the covariates to be
entered first (indeed, in ANCOVA terms, entering the covariates
after the regular ANOVA would be bizarre).  In the ANCOVA context,
keeping nonsignificant covariates makes no sense.
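
In SPSS terms, that traditional setup would look something like the
following minimal sketch, with hypothetical names (outcome, group,
pretest) standing in for the real variables:

UNIANOVA outcome BY group WITH pretest
  /PRINT=PARAMETER
  /DESIGN=pretest group.

The covariate is part of the model from the start; there is no separate
step in which it is entered after the ANOVA.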

-Mike Palij
New York University
[hidden email]



----- Original Message -----
From: "Bruce Weaver" <[hidden email]>
To: <[hidden email]>
Sent: Wednesday, March 30, 2011 3:35 PM
Subject: Re: significant F change, but nonsignificant regression model overall


> After a few tries, I mimicked this result (more or less) with some randomly
> generated data and 60 cases.
>
> * Generate data .
> * X1 to X6 are random numbers.
> * Only X5 and X6 are related to Y.
>
> numeric Y x1 to x6 (f8.2).
> do repeat x = x1 to x6.
> -  compute x = rv.normal(50,10).
> end repeat.
> compute Y = 50 + .2*x5 + .4*x6 + rv.normal(0,15).
> exe.
>
> REGRESSION
>  /STATISTICS COEFF OUTS R ANOVA CHANGE
>  /DEPENDENT Y
>  /METHOD=ENTER x1 to x4
>  /METHOD=ENTER x5 x6.
>
> Model 1:  R-sq = 0.027, F(4, 55) = .388, p = .817
> Model 2:  R-sq = 0.186, F(6, 53) = 2.014, p = .080
> Change in R-sq = 0.158, F(2, 53) = 5.15, p = .009

Re: significant F change, but nonsignificant regression model overall

Bruce Weaver
Administrator
Hi Mike.  I don't have time to address all of your points right now, but here are a couple of quick comments.

Re your point 1, I didn't have enough patience to keep fiddling with it until I got a data set that met that condition.  I suspect, though, that one could concoct a data set that meets that condition and still shows the same general pattern of results.
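
One untested way to push the generating equation toward that condition would be to shrink one of the two slopes, e.g.:

compute Y = 50 + .05*x5 + .4*x6 + rv.normal(0,15).

A slope that small relative to the error SD of 15 makes X5 essentially noise, though with these effect sizes one would likely still need a few tries before X6 alone comes out significant.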

Re your point 2, I should have added that when people use this approach (key variables first, then nuisance variables), they have the option of reverting to the simpler model (and often do) if the nuisance variables add nothing useful.  One thing to consider is whether the change in R-sq is statistically significant; but as I mentioned, I also like to check that the coefficients for the key variables are not too different in the two models.  If they differ noticeably, it suggests something a bit fishy might be going on.
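
For anyone who wants to probe the fishy case directly, adding TOL to the /STATISTICS subcommand prints tolerance and VIF alongside the coefficients:

REGRESSION
  /STATISTICS COEFF OUTS R ANOVA CHANGE TOL
  /DEPENDENT Y
  /METHOD=ENTER x5 x6
  /METHOD=ENTER x1 to x4.

Low tolerance (high VIF) for the key variables would point to collinearity with the nuisance variables.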

But in some cases, of course, adding the nuisance variables does result in improved model fit, and possibly in changed coefficients for the key variables.  In that case, one would obviously stick with the more complex model.

As you've probably seen by now, I posted another message a while ago with the adjusted R-sq values for my example with random data.  And as you probably expected, adjusted R-sq for Model 2 is lower than for Model 1 when the nuisance variables are added second.  This is another pretty good sign that one should consider reverting to Model 1.

Cheers,
Bruce

p.s. - Here are the data for my random number example, in case anyone wants to play around with it.

   ID        Y       x1       x2       x3       x4       x5       x6
    1    68.87    48.79    65.52    32.83    46.57    49.68    50.41
    2    77.83    37.03    59.01    50.83    58.44    67.38    65.47
    3    74.96    63.20    62.50    44.59    31.19    71.42    58.67
    4    66.83    37.10    52.60    58.99    55.92    63.61    52.10
    5    89.31    53.34    55.78    42.63    57.21    61.94    40.51
    6    88.92    42.98    41.55    49.90    52.92    54.90    32.59
    7    84.52    45.10    43.88    67.67    61.20    65.58    25.97
    8    71.03    51.50    59.75    40.35    53.97    27.20    57.42
    9    90.30    35.67    42.71    47.18    48.81    55.29    57.62
   10    82.03    54.73    40.85    49.57    64.83    36.19    51.69
   11    74.90    50.93    52.39    44.54    45.33    53.13    47.99
   12    81.51    49.38    53.75    49.38    39.24    46.70    40.37
   13    98.03    37.70    43.40    49.28    49.51    43.83    44.92
   14    97.00    48.67    49.31    40.79    47.47    48.79    47.49
   15    68.77    49.13    65.20    55.54    67.96    52.64    57.70
   16    89.69    52.83    45.73    55.60    46.59    48.89    37.61
   17    41.05    62.22    36.89    31.77    49.43    36.90    32.90
   18    69.90    54.74    36.74    63.09    75.08    42.78    51.80
   19    72.18    49.39    55.51    35.44    54.24    60.74    41.36
   20    71.35    53.96    36.54    22.17    48.72    47.93    32.03
   21    67.55    50.33    39.52    49.40    47.92    46.42    49.87
   22    57.96    52.97    42.08    61.42    47.01    42.43    52.60
   23    72.59    52.14    51.45    48.08    43.25    50.97    54.75
   24    79.43    42.07    34.99    28.99    75.75    33.72    45.27
   25    98.00    48.57    30.43    46.96    39.50    44.50    47.91
   26    92.54    32.80    39.10    53.62    50.50    43.57    57.61
   27    69.40    57.39    67.77    68.30    49.33    55.33    66.43
   28    69.23    37.75    56.03    64.98    46.18    57.15    42.81
   29    60.53    53.97    47.93    48.30    49.18    39.33    49.69
   30    45.42    42.87    54.18    50.04    37.56    46.02    39.47
   31    75.66    71.98    45.94    57.29    35.81    40.17    49.39
   32    87.42    37.58    40.88    52.57    27.52    35.19    57.27
   33    69.26    40.32    63.45    56.72    55.60    50.81    48.60
   34    67.01    56.82    50.11    41.32    57.04    39.51    54.33
   35   108.41    62.20    54.69    54.91    62.37    57.80    55.15
   36    89.07    43.66    56.98    38.51    45.55    51.19    64.90
   37    91.75    73.27    48.97    58.70    46.24    60.23    52.54
   38    79.19    53.14    52.16    35.82    53.97    67.37    41.93
   39    93.08    45.82    60.89    44.59    51.37    64.52    54.42
   40    25.68    48.56    38.87    51.27    43.72    54.75    41.63
   41    76.89    43.52    45.51    32.75    45.15    49.65    44.21
   42    87.52    59.25    52.09    57.91    52.07    64.11    43.07
   43    91.29    46.26    35.32    46.97    54.77    55.38    80.68
   44    75.65    37.66    38.25    52.39    44.49    53.13    55.76
   45    79.83    32.62    75.66    49.90    56.71    56.68    54.74
   46    93.66    48.84    59.99    39.71    38.28    38.93    68.83
   47    77.16    33.17    38.94    53.12    30.47    40.20    61.00
   48    78.99    57.50    59.34    54.62    35.23    46.06    40.72
   49    62.73    51.51    48.49    70.91    36.68    40.46    43.22
   50    85.68    21.41    38.36    62.87    27.46    40.93    56.01
   51    62.48    46.58    67.54    47.85    33.46    39.55    45.70
   52    61.28    47.16    50.70    37.73    60.64    43.36    55.58
   53   107.35    44.61    39.74    60.34    49.34    50.16    52.04
   54    65.51    55.57    40.58    44.00    50.03    55.89    54.24
   55    85.11    51.98    51.38    46.19    36.61    36.27    64.20
   56    91.36    42.71    60.44    66.88    48.58    62.01    53.99
   57    89.99    68.91    48.34    49.55    57.85    66.75    53.52
   58    55.38    40.13    50.45    41.90    53.80    41.21    41.20
   59   117.70    40.08    49.32    50.25    54.41    72.35    51.54
   60    67.47    55.77    55.62    52.25    47.86    47.63    41.52
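
In case it saves anyone some typing, the listing can be read in with something like this (only the first two rows are shown; paste the full listing between BEGIN DATA and END DATA):

DATA LIST LIST / ID Y x1 x2 x3 x4 x5 x6.
BEGIN DATA
 1    68.87    48.79    65.52    32.83    46.57    49.68    50.41
 2    77.83    37.03    59.01    50.83    58.44    67.38    65.47
END DATA.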


--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."
