significant F change, but nonsignificant regression model overall

Re: significant F change, but nonsignificant regression model overall

Mike
On Wednesday, March 30, 2011 5:58 PM, Bruce Weaver wrote:
> Hi Mike.  I don't have time to address all of your points right now, but here
> are a couple of quick comments.

No problem.  I hear that there's a life beyond this mailing list. ;-)

> Re your point 1 below, I didn't have enough patience to keep fiddling with
> it until I got a data set that met that condition.  I suspect that one could
> concoct a data set meeting that condition that still shows the same general
> pattern of results though.

I think you did a great job coming up with the simulated dataset.  The
real issue is what the actual dataset shows.  I have a feeling it's
not a pretty picture.

> Re your point 2, I should have added that when people use this approach (key
> variables first, then nuisance variables), they have the option of reverting
> to the simpler model (and often do) if the nuisance variables add nothing
> useful.  One thing to consider is whether the change in R-sq is
> statistically significant; but as I mentioned, I also like to check that the
> coefficients for the key variables are not too different in the two models
> as well.  If they are, it suggests something a bit fishy might be going on.
>
> But in some cases, of course, adding the nuisance variables does result in
> improved fit of the model, and possibly changes in the coefficients for the
> key variables.  So in that case, one would obviously stick with the more
> complex model.

Once upon a time, when I was doing logistic regression, I remember
that such things were done, but that the order of entry was the opposite
of what one would do in an ANOVA/ANCOVA framework.  All I can
remember is that it worked, even though I wasn't completely happy about
it (I once figured out why that should happen, but I've since forgotten,
just as I've forgotten how to do certain proofs off the top of my head).

> As you've probably seen by now, I posted another message a while ago that
> has the adjusted R-sq values for my example with random data.  And as you
> probably expected, Adj R-sq for model 2 is lower than for model 1 when the
> nuisance variables are added second.  This is another pretty good sign that
> one should consider reverting to Model 1.

Yeah, the Adj-R-sqs were pretty ugly.  I did a double-take at the
analysis where you had a negative Adj-R-sq.  Nothing says pathological
case like a negative Adj-R-sq.  From the second method, I copy the
results below:

Model 1:  R-sq = 0.148, F(2, 57) = 4.939, p = .011   Adj R-sq = .118
Model 2:  R-sq = 0.186, F(6, 53) = 2.014, p = .080   Adj R-sq =  .093

Just looking at the R-sq values, one would think that adding the
nonsignificant nuisance variables improves the model, but one has to
remember that R-sq never decreases as predictors are added to the
equation.  The Adj-R-sq for the full model clearly indicates that (a) the
additional variables are impairing the model and (b) one should be
cautious about relying on plain R-sq.
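
(For anyone checking the arithmetic: the adjustment SPSS reports is, as
far as I know, the usual Wherry/Ezekiel formula for n cases and p
predictors,

  Adj R-sq = 1 - (1 - R-sq)(n - 1)/(n - p - 1)

which reproduces the values above: 1 - (1 - .148)(59/57) = .118 for
Model 1, and 1 - (1 - .186)(59/53) = .093, within rounding, for Model 2.)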

-Mike Palij
New York University
[hidden email]

> Cheers,
> Bruce
>
> p.s. - Here are the data for my random number example, in case anyone wants
> to play around with it.
>
>   ID        Y       x1       x2       x3       x4       x5       x6
>    1    68.87    48.79    65.52    32.83    46.57    49.68    50.41
>    2    77.83    37.03    59.01    50.83    58.44    67.38    65.47
>    3    74.96    63.20    62.50    44.59    31.19    71.42    58.67
>    4    66.83    37.10    52.60    58.99    55.92    63.61    52.10
>    5    89.31    53.34    55.78    42.63    57.21    61.94    40.51
>    6    88.92    42.98    41.55    49.90    52.92    54.90    32.59
>    7    84.52    45.10    43.88    67.67    61.20    65.58    25.97
>    8    71.03    51.50    59.75    40.35    53.97    27.20    57.42
>    9    90.30    35.67    42.71    47.18    48.81    55.29    57.62
>   10    82.03    54.73    40.85    49.57    64.83    36.19    51.69
>   11    74.90    50.93    52.39    44.54    45.33    53.13    47.99
>   12    81.51    49.38    53.75    49.38    39.24    46.70    40.37
>   13    98.03    37.70    43.40    49.28    49.51    43.83    44.92
>   14    97.00    48.67    49.31    40.79    47.47    48.79    47.49
>   15    68.77    49.13    65.20    55.54    67.96    52.64    57.70
>   16    89.69    52.83    45.73    55.60    46.59    48.89    37.61
>   17    41.05    62.22    36.89    31.77    49.43    36.90    32.90
>   18    69.90    54.74    36.74    63.09    75.08    42.78    51.80
>   19    72.18    49.39    55.51    35.44    54.24    60.74    41.36
>   20    71.35    53.96    36.54    22.17    48.72    47.93    32.03
>   21    67.55    50.33    39.52    49.40    47.92    46.42    49.87
>   22    57.96    52.97    42.08    61.42    47.01    42.43    52.60
>   23    72.59    52.14    51.45    48.08    43.25    50.97    54.75
>   24    79.43    42.07    34.99    28.99    75.75    33.72    45.27
>   25    98.00    48.57    30.43    46.96    39.50    44.50    47.91
>   26    92.54    32.80    39.10    53.62    50.50    43.57    57.61
>   27    69.40    57.39    67.77    68.30    49.33    55.33    66.43
>   28    69.23    37.75    56.03    64.98    46.18    57.15    42.81
>   29    60.53    53.97    47.93    48.30    49.18    39.33    49.69
>   30    45.42    42.87    54.18    50.04    37.56    46.02    39.47
>   31    75.66    71.98    45.94    57.29    35.81    40.17    49.39
>   32    87.42    37.58    40.88    52.57    27.52    35.19    57.27
>   33    69.26    40.32    63.45    56.72    55.60    50.81    48.60
>   34    67.01    56.82    50.11    41.32    57.04    39.51    54.33
>   35   108.41    62.20    54.69    54.91    62.37    57.80    55.15
>   36    89.07    43.66    56.98    38.51    45.55    51.19    64.90
>   37    91.75    73.27    48.97    58.70    46.24    60.23    52.54
>   38    79.19    53.14    52.16    35.82    53.97    67.37    41.93
>   39    93.08    45.82    60.89    44.59    51.37    64.52    54.42
>   40    25.68    48.56    38.87    51.27    43.72    54.75    41.63
>   41    76.89    43.52    45.51    32.75    45.15    49.65    44.21
>   42    87.52    59.25    52.09    57.91    52.07    64.11    43.07
>   43    91.29    46.26    35.32    46.97    54.77    55.38    80.68
>   44    75.65    37.66    38.25    52.39    44.49    53.13    55.76
>   45    79.83    32.62    75.66    49.90    56.71    56.68    54.74
>   46    93.66    48.84    59.99    39.71    38.28    38.93    68.83
>   47    77.16    33.17    38.94    53.12    30.47    40.20    61.00
>   48    78.99    57.50    59.34    54.62    35.23    46.06    40.72
>   49    62.73    51.51    48.49    70.91    36.68    40.46    43.22
>   50    85.68    21.41    38.36    62.87    27.46    40.93    56.01
>   51    62.48    46.58    67.54    47.85    33.46    39.55    45.70
>   52    61.28    47.16    50.70    37.73    60.64    43.36    55.58
>   53   107.35    44.61    39.74    60.34    49.34    50.16    52.04
>   54    65.51    55.57    40.58    44.00    50.03    55.89    54.24
>   55    85.11    51.98    51.38    46.19    36.61    36.27    64.20
>   56    91.36    42.71    60.44    66.88    48.58    62.01    53.99
>   57    89.99    68.91    48.34    49.55    57.85    66.75    53.52
>   58    55.38    40.13    50.45    41.90    53.80    41.21    41.20
>   59   117.70    40.08    49.32    50.25    54.41    72.35    51.54
>   60    67.47    55.77    55.62    52.25    47.86    47.63    41.52
>
>
>
> Mike Palij wrote:
>>
>> Bruce,
>>
>> A few points:
>>
>> (1) The OP said the following:
>>
>> |For Model 2 where he entered the 4 covariates on Step 1 and
>> |the 2 variables he is most interested in on Step 2, the Multiple R
>> |is .6 and F test is still not significant. But a priori, he was most
>> |interested in the 2 variables entered on Step 2 - and this is
>> |where the F change is significant.  One of the two variables is
>> |significant on Step 2.
>>
>> In your example below both X5 and X6 are significantly related
>> to the dep var but to make it really relevant only one of these
>> should be significant.  This may or may not make a difference,
>> depending upon the constraints the data put on the range of
>> allowable values.
>>
>> (2)  I don't like the second method of entering the nuisance
>> variables after the critical variables; I still think that it is a foolish
>> thing to do because (a) it adds no useful information and (b) it makes
>> the model nonsignificant -- the model with only X5 and X6
>> is clearly better.  As for the significant increase in R^2, I suggest
>> you look at the difference between adjusted R^2 -- that should
>> be much smaller because of the penalty of having 6 predictors
>> in the second model.
>>
>> (3)  The goals of doing an ANCOVA have traditionally been
>> (a) to reduce the error variance by removing the variance in it that is
>> associated with the covariate (no association, no reduction in
>> error variance, thus no point in keeping the covariates) and
>> (b) if the groups in the ANOVA have different means on the
>> covariate, to adjust the group means to compensate for the
>> differences on the covariates.  If one has a copy of Howell's 7th ed
>> Stat Methods for Psych, see the material on pages 598-609.  Both
>> of these require the covariates to be entered first (indeed, in
>> ANCOVA terms, entering the covariates after the regular
>> ANOVA would be bizarre).  In the ANCOVA context, keeping
>> nonsignificant covariates makes no sense.
>>
>> -Mike Palij
>> New York University
>> [hidden email]
>>
>>
>>
>> ----- Original Message -----
>> From: "Bruce Weaver" <[hidden email]>
>> To: <[hidden email]>
>> Sent: Wednesday, March 30, 2011 3:35 PM
>> Subject: Re: significant F change, but nonsignificant regression model
>> overall
>>
>>
>>> After a few tries, I mimicked this result (more or less) with some
>>> randomly generated data and 60 cases.
>>>
>>> * Generate data .
>>> * X1 to X6 are random numbers.
>>> * Only X5 and X6 are related to Y.
>>>
>>> numeric Y x1 to x6 (f8.2).
>>> do repeat x = x1 to x6.
>>> -  compute x = rv.normal(50,10).
>>> end repeat.
>>> compute Y = 50 + .2*x5 + .4*x6 + rv.normal(0,15).
>>> exe.
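>>>
>>> (A note for anyone re-running this: RV.NORMAL depends on the
>>> random-number seed, so reproducing a particular realization requires
>>> setting the seed first, e.g.
>>>
>>> SET SEED = 123456.
>>>
>>> where 123456 is just an example value.)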
>>>
>>> REGRESSION
>>>  /STATISTICS COEFF OUTS R ANOVA CHANGE
>>>  /DEPENDENT Y
>>>  /METHOD=ENTER x1 to x4
>>>  /METHOD=ENTER x5 x6.
>>>
>>> Model 1:  R-sq = 0.027, F(4, 55) = .388, p = .817
>>> Model 2:  R-sq = 0.186, F(6, 53) = 2.014, p = .080
>>> Change in R-sq = 0.158, F(2, 53) = 5.15, p = .009
>>>
>>> When the goal is to control for potential confounders, one sometimes
>>> sees the steps reversed, with the variable (or variables) of main
>>> interest entered first, and the potential confounders added on the next
>>> step.  This is commonly done with logistic regression, for example,
>>> where crude and adjusted odds ratios are reported (from models 1 and 2
>>> respectively).  For the data above, here's what I get when I do it that
>>> way:
>>>
>>> Model 1:  R-sq = 0.148, F(2, 57) = 4.939, p = .011
>>> Model 2:  R-sq = 0.186, F(6, 53) = 2.014, p = .080
>>> Change in R-sq = 0.038, F(4, 53) = 0.618, p = .652
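>>>
>>> For the record, that second run is the same REGRESSION command as
>>> above with the two METHOD subcommands swapped:
>>>
>>> REGRESSION
>>>  /STATISTICS COEFF OUTS R ANOVA CHANGE
>>>  /DEPENDENT Y
>>>  /METHOD=ENTER x5 x6
>>>  /METHOD=ENTER x1 to x4.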
>>>
>>> Even though the change in R-sq is clearly not significant, I like to
>>> compare (via the eyeball test) the coefficients for X5 and X6 in the
>>> two models.  If there is no confounding, then the values should be
>>> pretty similar in the two models.
>>>
>>> Model   Variable      B      SE      p
>>>   1        X5        .448   .190   .022
>>>            X6        .432   .202   .036
>>>   2        X5        .522   .210   .016
>>>            X6        .459   .210   .033
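>>>
>>> (In percentage terms, B for X5 shifts from .448 to .522, about 17%,
>>> and B for X6 from .432 to .459, about 6%.  If one likes the common
>>> 10%-change-in-estimate rule of thumb for flagging confounding, only
>>> the X5 shift would even invite a second look.)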
>>>
>>>
>>> Mike, would you be any happier with this second approach to the analysis?
>>>
>>> Cheers,
>>> Bruce
>>>
>>>
>>>
>>> Mike Palij wrote:
>>>>
>>>> On Tuesday, March 29, 2011 11:36 pm, Rich Ulrich wrote:
>>>> >
>>>> > Mike,
>>>> > You seem to have missed the comment,
>>>> >
>>>> >>>> He has entered four variables to control for on the first step,
>>>> >>>> and then two other predictors on the 2nd step.  So we're trying
>>>> >>>> to see if these two predictors are significant above and beyond
>>>> >>>> the four variables we are controlling for on the first step of
>>>> >>>> the regression.
>>>>
>>>> No, I didn't miss this comment.  Let's review what we might know about
>>>> the situation (at least from my perspective):
>>>>
>>>> (1) The analyst is doing setwise regression, comparable to an ANCOVA,
>>>> entering 4 variables/covariates as the first set.  As mentioned elsewhere,
>>>> these covariates are NOT significantly related to the dependent variable.
>>>> This implies that the multiple correlation and its squared version are
>>>> zero, or R1=0.00.  One could, I think, legitimately ask why one continued
>>>> to use these as covariates or keep them in the model when the second set
>>>> was entered -- one argument could be based on the expectation that
>>>> there is a suppressor relationship among the predictors, but until we hear
>>>> from the person who actually ran the analysis, I don't believe this was
>>>> the strategy.
>>>>
>>>> (2) After the second set of predictors was entered, there still was NO
>>>> significant relationship between the predictors and the dependent
>>>> variable.
>>>> So, for this model, R and R^2 are both equal to zero, or R2=0.00.
>>>>
>>>> (3) There is a "significant increase in R^2" (F change) when the second
>>>> set of predictors was entered.  This has me puzzled.  It is not clear to
>>>> me why or how this could occur.  If R1(set 1/model 1)=0.00 and
>>>> R2(set 2/model 2)=0.00, then why would R2-R1 != 0.00?  I suspect
>>>> that maybe there really is a pattern of relationships present but that
>>>> there is insufficient statistical power to detect them (the researcher
>>>> either needs to get more subjects or better measurements).  There
>>>> may be other reasons, but I think one needs to examine the data
>>>> in order to figure out what happened (one explanation is that it is
>>>> just a Type I error).
>>>>
>>>> Rich, how would you explain what happens in (3) above?
>>>>
>>>> -Mike Palij
>>>> New York University
>>>> [hidden email]
>>>>
>>>
>>>
>>> --
>>> Bruce Weaver
>>> [hidden email]
>>> http://sites.google.com/a/lakeheadu.ca/bweaver/
>>>
>>> "When all else fails, RTFM."
>>>
>>> NOTE: My Hotmail account is not monitored regularly.
>>> To send me an e-mail, please use the address shown above.
>>>
>>>
>>
>>
>
>
> --
> Bruce Weaver
> [hidden email]
> http://sites.google.com/a/lakeheadu.ca/bweaver/
>
> "When all else fails, RTFM."
>
> NOTE: My Hotmail account is not monitored regularly.
> To send me an e-mail, please use the address shown above.
>
>


Re: significant F change, but nonsignificant regression model overall

Rich Ulrich
In reply to this post by Mike
> Date: Wed, 30 Mar 2011 15:56:30 -0400
> From: [hidden email]
> Subject: Re: significant F change, but nonsignificant regression model overall
> To: [hidden email]
>
> On Wednesday, March 30, 2011 2:55 PM, Rich Ulrich wrote:
> >Mike Palij wrote:
>
> >> No, I didn't miss this comment. Let's review what we might know about
> >> the situation (at least from my perspective):
> >>
> >> (1) The analyst is doing setwise regression, comparable to an ANCOVA,
> >> entering 4 variables/covariates as the first set. As mentioned elsewhere,
> >> these covariates are NOT significantly related to the dependent variable.
> >
> >Mike,
> >No, they are not "doing setwise regression", whatever that new
> >phrase means, if that is what you intended.
>
> That "new phrase" can be found in Cohen and Cohen (1975) in their
> Chapter 4 "Sets of Independent Variables". Of particular relevance
> is section 4.2 "The simultaneous and hierarchical models for sets".
> What you and the OP described was a hierarchical or sequential
> setwise regression analysis.


Fine.  I would not have stumbled over the phrase, if you had not
continued on so differently, with an explicit description of "stepwise"
that expects decreasing contributions of the next variables.  Cohen &
Cohen is a book I own, I've read, and I've recommended multiple times.
Based on your comments here, and discussion in later posts, we are
now discussing the same model. But you were way off, in what I responded to.

> See pp. 127-144 if you have a copy handy.  If anything, you should
> say "whatever that arcane phrase means".
>
> As for your description of the analysis, do you really keep variables
> that don't provide any useful information in the equation?

Yes.  In my area (research in psychiatry), when the prescribed testing
controls for several variables, that is what is ordinarily reported --
especially if there is a discernible difference in outcomes.  Sometimes
the coefficients vary a tad, even for "nonsignificant" nuisance
covariates.  Depending on the circumstances, it is sometimes acceptable
to report the simpler equation; but considering that option raises the
risk, or at least the suspicion, of cherry-picking results.


> I hope you report shrunken or adjusted R^2 when you report your
> results because they should be considerably smaller than R^2 as a
> result of the additional useless predictors.  It should give a person
> pause.

For 2 variables with 75 subjects, the reduction is not large.  Of course,
the effect for 6 variables is larger, but that R^2 is clearly of no interest.
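
To make that concrete, using the usual adjustment formula
Adj R^2 = 1 - (1 - R^2)(n - 1)/(n - k - 1): with n = 75, the shrinkage
factor (n - 1)/(n - k - 1) is 74/72 (about 1.03) for k = 2 predictors
versus 74/68 (about 1.09) for k = 6.  An R^2 of .15, say, would adjust
to roughly .13 in the first case and .08 in the second.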

--
Rich Ulrich




Re: significant F change, but nonsignificant regression model overall

Mike
On Wednesday, March 30, 2011 11:08 PM, Rich Ulrich wrote:

>Mike Palij had written in response to Ulrich's earlier post:
>>Rich Ulrich wrote:
>> >Mike,
>> >No, they are not "doing setwise regression", whatever that new
>> >phrase means, if that is what you intended.
>>
>> That "new phrase" can be found in Cohen and Cohen (1975) in their
>> Chapter 4 "Sets of Independent Variables". Of particular relevance
>> is section 4.2 "The simultaneous and hierarchical models for sets".
>> What you and the OP described was a hierarchical or sequential
>> setwise regression analysis.
>
> Fine.  I would not have stumbled over the phrase, if you had not
> continued on so differently, with an explicit description of "stepwise"
> that expects decreasing contributions of the next variables.

Rich, where do I use the term "stepwise" in my original post or even
refer to a process that relies on the entry or removal of variables in/out
of a regression on the basis of some criterion that is typically used
in stepwise procedures? Here's what I said:

[contents of previous post by Mike P]
|No, I didn't miss this comment.  Let's review what we might know about
|the situation (at least from my perspective):
|
|(1) The analyst is doing setwise regression, comparable to an ANCOVA,
|entering 4 variables/covariates as the first set.  As mentioned elsewhere,
|these covariates are NOT significantly related to the dependent variable.
|This implies that the multiple correlation and its squared version are zero,
|or R1=0.00.  One could, I think, legitimately ask why one continued to
|use these as covariates or keep them in the model when the second set
|was entered -- one argument could be based on the expectation that
|there is a suppressor relationship among the predictors, but until we hear
|from the person who actually ran the analysis, I don't believe this was
|the strategy.
|
|(2) After the second set of predictors was entered, there still was NO
|significant relationship between the predictors and the dependent variable.
|So, for this model, R and R^2 are both equal to zero, or R2=0.00.
|
|(3) There is a "significant increase in R^2" (F change) when the second
|set of predictors was entered.  This has me puzzled.  It is not clear to
|me why or how this could occur.  If R1(set 1/model 1)=0.00 and
|R2(set 2/model 2)=0.00, then why would R2-R1 != 0.00?  I suspect
|that maybe there really is a pattern of relationships present but that
|there is insufficient statistical power to detect them (the researcher
|either needs to get more subjects or better measurements). There
|may be other reasons, but I think one needs to examine the data
|in order to figure out what happened (one explanation is that it is
|just a Type I error).
|
|Rich, how would you explain what happens in (3) above?

> Cohen & Cohen is a book I own, I've read, and I've recommended
> multiple times. Based on your comments here, and discussion in later
> posts, we are now discussing the same model. But you were way
> off, in what I responded to.

You took issue with my use of the term "setwise" regression and
then went on to confuse it with stepwise procedures, and you're
telling me that I'm "way off"?

>> As for your description of the analysis, do you really keep variables
>> that don't provide any useful information in the equation?
>
> Yes.  In my area (research in psychiatry), when the prescribed testing
> controls for several variables, that is what is ordinarily reported --
> especially if there is a discernible difference in outcomes.  Sometimes
> the coefficients vary a tad, even for "nonsignificant" nuisance
> covariates.  Depending on the circumstances, it is sometimes acceptable
> to report the simpler equation; but considering that option raises the
> risk, or at least the suspicion, of cherry-picking results.

First, if you check for my name in PubMed, you'll see that I have been
involved in psychiatric research, so I'm somewhat familiar with what
gets done there.  Second, you simply do not make sense in what you
say above.  The point behind ANCOVA or the use of covariates in
multiple regression is to reduce the error variance by identifying variables
that are systematically related to the dependent variable and/or to
adjust for differences among groups on the covariates -- see Howell's
presentation on pages 598-621 in his 7th ed Stat Methods for Psych.
I agree that if a group of covariates is used, you should report whether
they are significant or not, but then you go on to use only those that
are significant in the subsequent stages of analysis.
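
(To see the error-variance point in its simplest form: with a single
covariate X, the ANCOVA error variance is, to a first approximation, the
ANOVA error variance multiplied by (1 - r^2), where r is the pooled
within-group correlation between X and the dependent variable.  A
covariate with r near zero therefore buys essentially nothing, while
still costing error degrees of freedom.)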

>> I hope you report shrunken or adjusted R^2 when you report your
>> results because they should be considerably smaller than R^2 as a
>> result of the additional useless predictors.  It should give a person
>> pause.
>
> For 2 variables with 75 subjects, the reduction is not large.  Of course,
> the effect for 6 variables is larger, but that R^2 is clearly of no interest.

Perhaps you missed Bruce Weaver's simulated data example?  Entering
the covariates first and, though they are not significant, retaining
them in the complete or simultaneous model, he got this:

|METHOD 1:  Entering the 4 confounders first, then the 2 variables of
|interest
|Model 1:  R-sq = 0.027, F(4, 55) = .388, p = .817,  Adj R-sq = -.043
|Model 2:  R-sq = 0.186, F(6, 53) = 2.014, p = .080,  Adj R-sq =  .093
|Change in R-sq = 0.158, F(2, 53) = 5.15, p = .009

Note that the model including only the nonsignificant covariates
produces a NEGATIVE adjusted-R-sq.  Adding the two variables of
interest then raises the Adj-R-sq, but consider what happens if one
enters the two variables of interest first:

|METHOD 2:  Entering the 2 variables of interest first, then the 4
|confounders
|Model 1:  R-sq = 0.148, F(2, 57) = 4.939, p = .011, Adj R-sq = .118
|Model 2:  R-sq = 0.186, F(6, 53) = 2.014, p = .080, Adj R-sq =  .093
|Change in R-sq = 0.038, F(4, 53) = 0.618, p = .652

On the basis of ordinary R-sq, the full model appears to be the better
model, but one has to remember that as the number of predictors/IVs
in a regression equation increases, so will R-sq, which is one reason
why one focuses on the adjusted-R-sq.  So, with just the two variables
of interest, the adjusted-R-sq = .118.  Adding in the nonsignificant
covariates increases the regular R-sq but REDUCES the adjusted-R-sq.
This would have become apparent if one had left out the nonsignificant
covariates in Method 1.  Their inclusion confuses the matter and may
lead one to think that the full model is better when in fact the reduced
model with just 2 predictors/IVs is the best model.
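
(For anyone checking Bruce's F-change values, the standard formula for
adding k predictors, with df-error = n - p(full) - 1 = 53 here, is

  F change = (change in R-sq / k) / ((1 - R-sq for full model) / df-error)

so (0.158/2)/((1 - 0.186)/53) = 5.15 and (0.038/4)/((1 - 0.186)/53) = 0.62,
matching Bruce's output within rounding.)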

I don't doubt that you may be doing things consistent with what
others may have done in terms of analysis, but think about it: what
is the justification for it?  Does this practice enlighten or confuse?

-Mike Palij
New York University
[hidden email]


Re: significant F change, but nonsignificant regression model overall

Rich Ulrich
Warning.  This post is about statistics/design rather than SPSS.



> Date: Thu, 31 Mar 2011 09:34:16 -0400
> From: [hidden email]
> Subject: Re: significant F change, but nonsignificant regression model overall
> To: [hidden email]
>
> On Wednesday, March 30, 2011 11:08 PM, Rich Ulrich wrote:
> >Mike Palij had written in response to Ulrich's earlier post:
> >>Rich Ulrich wrote:
> >> >Mike,

[snip]
I'm going to snip a bunch and respond to just a couple of points
on strategies of experimental design.

> Rich, where do I use the term "stepwise" in my original post or even
> refer to a process that relies on the entry or removal of variables in/out
> of a regression on the basis of some criterion that is typically used
> in stepwise procedures? Here's what I said:

Here is what I was judging by --
***** excerpted from Mike's earlier post
(3) There is a "significant increase in R^2" (F change) when the second
set of predictors was entered.  This has me puzzled.  It is not clear to
me why or how this could occur. ***** end of excerpt

Was this a rhetorical use of "not clear to me"?

I took it to be a failure to grasp a simple and appropriate
model of testing.

It now seems that perhaps it was a rhetorical slur on a style of
testing that Mike would not use.  I think the error is Mike's.

[snip several paragraphs]

>
> First, if you check for my name in PubMed, you'll see that I have been
> involved in psychiatric research, so I'm somewhat familiar with what
> gets done there. Second, you simply do not make sense in what you
> say above. The point behind ANCOVA or the use of covariates in
> multiple regression is to reduce the error variance by identifying variables
> that are systematically related to the dependent variable and/or to
> adjust for differences among groups on the covariates -- see Howell's
> presentation on pages 598-621 in his 7th ed Stat Methods for Psych.
> I agree that if a group of covariates is used, you should report whether
> they are significant or not, but then you go on to use only those that
> are significant in the subsequent stages of analysis.

Concerning the last phrase - I do not consider *that*  to be a
generally acceptable style of testing, though many people resort
to it, often justified by the lack of sufficient d.f. to support
larger analyses.  It has problems in "model-building" -- which might
be remediated by sufficient cross-validation -- but I don't think it
is, at all, an acceptable strategy for dealing with a set of 4 nuisance
variables, as in this example of "testing".

I find it hard to believe that Howell's book, which has a good
reputation, would broadly endorse the notion, "Use only the significant
covariates for subsequent analyses."  That strategy reeks of all
the problems known for stepwise selection.


[snip, Bruce's example]

>[ concerning the example ...]
> On the basis of ordinary R-sq, the full model appears to be the better
> model, but one has to remember that as the number of predictors/IVs
> in a regression equation increases, so will R-sq, which is one reason
> why one focuses on the adjusted-R-sq.  So, with just the two variables
> of interest, the adjusted-R-sq = .118.  Adding in the nonsignificant
> covariates increases the regular R-sq but REDUCES the adjusted-R-sq.
> This would have become apparent if one had left out the nonsignificant
> covariates in Method 1.  Their inclusion confuses the matter and may
> lead one to think that the full model is better when in fact the reduced
> model with just 2 predictors/IVs is the best model.

As I understand the problem at hand, it was intended to be a *test*
of two variables, and not an exercise in model-building for any
predictive purposes.  I've mainly been concerned with adjusted
R^2 when considering actual prediction, not testing.  And prediction
needs a much larger R^2 than any of these.

>
> I don't doubt that you may be doing things consistent with what
> others may have done in terms of analysis, but think about it: what
> is the justification for it?  Does this practice enlighten or confuse?

Conservative principles of testing have to assume that there
can be *problems* -- from nuisance variables, or whatever.  I
think it is a mistake to confuse testing, where effects are apt
to be marginal, with model-building, where you don't have anything
if you don't have very strong effects, somewhere, at the start.

As to testing --
Mike seems to recommend that one can use multivariate test on several
potential confounding variables, and then omit them all when the overall
test on them (like this one) fails to reject.  I hope that there are
not many journals that accept this strategy.

--
Rich Ulrich



