Re: significant F change, but nonsignificant regression model overall

Posted by Mike
URL: http://spssx-discussion.165.s1.nabble.com/significant-F-change-but-nonsignificant-regression-model-overall-tp4269810p4273376.html

On Wednesday, March 30, 2011 11:08 PM, Rich Ulrich wrote:

>Mike Palij had written in response to Ulrich's earlier post:
>>Rich Ulrich wrote:
>> >Mike,
>> >No, they are not "doing setwise regression", whatever that new
>> >phrase means, if that is what you intended.
>>
>> That "new phrase" can be found in Cohen and Cohen (1975) in their
>> Chapter 4 "Sets of Independent Variables". Of particular relevance
>> is section 4.2 "The simultaneous and hierarchical models for sets".
>> What you and the OP described was a hierarchical or sequential
>> setwise regression analysis.
>
> Fine.  I would not have stumbled over the phrase, if you had not
> continued on so differently, with an explicit description of "stepwise"
> that expects decreasing contributions of the next variables.

Rich, where do I use the term "stepwise" in my original post, or even
refer to a process that enters or removes variables from a regression
on the basis of the criteria typically used in stepwise procedures?
Here's what I said:

[contents of previous post by Mike P]
|No, I didn't miss this comment.  Let's review what we might know about
|the situation (at least from my perspective):
|
|(1) The analyst is doing setwise regression, comparable to an ANCOVA,
|entering 4 variables/covariates as the first set.  As mentioned elsewhere,
|these covariates are NOT significantly related to the dependent variable.
|This implies that the multiple correlation and its squared version are zero,
|or R1=0.00.  One could, I think, legitimately ask why one continued to
|use these as covariates, or kept them in the model, when the second set
|was entered -- one argument could be based on the expectation that
|there is a suppressor relationship among the predictors, but until we
|hear from the person who actually ran the analysis, I don't believe
|this was the strategy.
|
|(2) After the second set of predictors was entered, there still was NO
|significant relationship between the predictors and the dependent variable.
|So, for this model, R and R^2 are both equal to zero, or R2=0.00.
|
|(3) There is a "significant increase in R^2" (F change) when the second
|set of predictors was entered.  This has me puzzled.  It is not clear to
|me why or how this could occur.  If R1(set 1/model 1)=0.00 and
|R2(set 2/model 2)=0.00, then why would R2-R1 != 0.00?  I suspect
|that maybe there really is a pattern of relationships present but that
|there is insufficient statistical power to detect them (the researcher
|either needs to get more subjects or better measurements). There
|may be other reasons, but I think one needs to examine the data
|in order to figure out which (one explanation is that it is just a
|Type I error).
|
|Rich, how would you explain what happens in (3) above?
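
(For concreteness, the setwise/hierarchical run described in (1)-(3)
above looks something like this in SPSS syntax -- y, cov1 to cov4, x1,
and x2 are placeholder names, not the OP's actual variables:

* Hierarchical (setwise) regression: covariates as set 1,
  variables of interest as set 2.
REGRESSION
  /STATISTICS R ANOVA CHANGE
  /DEPENDENT y
  /METHOD=ENTER cov1 cov2 cov3 cov4
  /METHOD=ENTER x1 x2.

Each /METHOD=ENTER adds one set, and the CHANGE keyword produces the
R-sq-change and F-change tests at each step.)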

> Cohen & Cohen is a book I own, I've read, and I've recommended
> multiple times. Based on your comments here, and discussion in later
> posts, we are now discussing the same model. But you were way
> off, in what I responded to.

You took issue with my use of the term "setwise" regression, then
went on to confuse it with stepwise procedures, and now you're
telling me that I'm "way off"?

>> As for your description of the analysis, do you really keep variables
>> that don't provide any useful information in the equation?
>
> Yes.  In my area (research in psychiatry), when the prescribed testing
> controls for several variables, that is what is ordinarily reported --
> especially if there is a discernible difference in outcomes.  Sometimes
> the coefficients vary a tad, even for "nonsignificant" nuisance
> covariates.  Depending on the circumstances, it is sometimes acceptable
> to report the simpler equation; considering that option raises the risk
> or suspicion of cherry-picking of results.

First, if you check for my name in PubMed, you'll see that I have been
involved in psychiatric research, so I'm somewhat familiar with what
gets done there.  Second, what you say above simply does not make
sense.  The point behind ANCOVA, or the use of covariates in multiple
regression, is to reduce the error variance by identifying variables
that are systematically related to the dependent variable and/or to
adjust for differences among groups on the covariates -- see Howell's
presentation on pages 598-621 of the 7th edition of his Statistical
Methods for Psychology.  I agree that if a group of covariates is used,
you should report whether they are significant or not, but then you
go on to use only those that are significant in the subsequent stages
of the analysis.
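
(As a minimal sketch of what I mean, again with placeholder names --
y, group, and cov1 are illustrative only -- the classic ANCOVA setup
in SPSS syntax is:

* ANCOVA: effect of group on y, adjusted for the covariate cov1.
UNIANOVA y BY group WITH cov1
  /DESIGN=cov1 group.

A covariate earns its keep here only if it actually reduces the error
variance.)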

>> I hope you
>> report shrunken or adjusted R^2 when you report your results because
>> they should be considerably smaller than R^2 as a result of the additional
>> useless predictors. It should give a person pause.
>
> For 2 variables with 75 subjects, the reduction is not large.  Of course,
> the effect for 6 variables is larger, but that R^2 is clearly of no interest.

Perhaps you missed Bruce Weaver's simulated data example?  When he
entered the four covariates first and, though they were not significant,
retained them in the complete or simultaneous model, he got this:

|METHOD 1:  Entering the 4 confounders first, then the 2 variables of
|interest
|Model 1:  R-sq = 0.027, F(4, 55) = .388, p = .817,  Adj R-sq = -.043
|Model 2:  R-sq = 0.186, F(6, 53) = 2.014, p = .080,  Adj R-sq =  .093
|Change in R-sq = 0.158, F(2, 53) = 5.15, p = .009
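
Just to make the arithmetic explicit (my own check, using nothing but
the numbers Bruce reported): the F change is the gain in R-sq per
added predictor, divided by the full model's unexplained variance per
residual df,

  F change = [(R-sq_full - R-sq_reduced)/m] / [(1 - R-sq_full)/(N - k - 1)]

where m is the number of predictors added and k the total number in
the full model.  With m = 2, k = 6, and N = 60, that is
(0.158/2)/(0.814/53) = 5.15 on (2, 53) df, matching the output above.
The change test spends only 2 numerator df, while the overall test for
Model 2 spreads the same R-sq over 6, which is how a significant F
change can coexist with a nonsignificant overall F.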

Note that the model containing only the nonsignificant covariates
produces a NEGATIVE adjusted R-sq.  Adding the two variables of
interest raises the adjusted R-sq to .093, but consider what happens
if one enters the two variables of interest first:

|METHOD 2:  Entering the 2 variables of interest first, then the 4
|confounders
|Model 1:  R-sq = 0.148, F(2, 57) = 4.939, p = .011, Adj R-sq = .118
|Model 2:  R-sq = 0.186, F(6, 53) = 2.014, p = .080, Adj R-sq =  .093
|Change in R-sq = 0.038, F(4, 53) = 0.618, p = .652
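
(Same check with the sets reversed: (0.038/4)/(0.814/53) = 0.62 on
(4, 53) df -- the four covariates add essentially nothing once the two
variables of interest are in.)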

On the basis of ordinary R-sq, the full model appears to be the better
model, but one has to remember that as the number of predictors/IVs in
a regression equation increases, so will R-sq, which is one reason why
one focuses on the adjusted R-sq.  So, with just the two variables of
interest, the adjusted R-sq = .118.  Adding in the nonsignificant
covariates increases the ordinary R-sq but REDUCES the adjusted R-sq.
This would have become apparent in Method 1 if one had left out the
nonsignificant covariates.  Their inclusion confuses the matter and may
lead one to think that the full model is better, when in fact the
reduced model with just the 2 predictors/IVs is the best model.
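
One can see why from the adjustment itself (the standard formula, the
one SPSS reports):

  Adj R-sq = 1 - (1 - R-sq)*(N - 1)/(N - k - 1)

Plugging in Bruce's values with N = 60: the 2-predictor model gives
1 - (1 - 0.148)*(59/57) = .118, while the full 6-predictor model gives
1 - (1 - 0.186)*(59/53) = .093.  The four extra predictors buy only
.038 in R-sq but cost 4 error df, so the adjusted value drops.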

I don't doubt that what you are doing is consistent with what others
have done in their analyses, but think about it: what is the
justification for that practice?  Does it enlighten or confuse?

-Mike Palij
New York University
[hidden email]
