Re: significant F change, but nonsignificant regression model overall

Posted by Mike on
URL: http://spssx-discussion.165.s1.nabble.com/significant-F-change-but-nonsignificant-regression-model-overall-tp4269810p4272463.html

On Wednesday, March 30, 2011 5:58 PM, Bruce Weaver wrote:
> Hi Mike.  I don't have time to address all of your points right now, but
> here are a couple of quick comments.

No problem.  I hear that there's a life beyond this mailing list. ;-)

> Re your point 1 below, I didn't have enough patience to keep fiddling with
> it until I got a data set that met that condition.  I suspect that one could
> concoct a data set meeting that condition that still shows the same general
> pattern of results though.

I think you did a great job coming up with the simulated dataset.  The
real issue is what the actual dataset shows.  I have a feeling it's
not a pretty picture.

> Re your point 2, I should have added that when people use this approach (key
> variables first, then nuisance variables), they have the option of reverting
> to the simpler model (and often do) if the nuisance variables add nothing
> useful.  One thing to consider is whether the change in R-sq is
> statistically significant; but as I mentioned, I also like to check that the
> coefficients for the key variables are not too different in the two models
> as well.  If they are, it suggests something a bit fishy might be going on.
>
> But in some cases, of course, adding the nuisance variables does result in
> improved fit of the model, and possibly changes in the coefficients for the
> key variables.  So in that case, one would obviously stick with the more
> complex model.

Once upon a time, when I was into doing logistic regression, I remember
that such things were done, but that it was the opposite of what one would
do in an ANOVA/ANCOVA framework.  All I can remember is that it worked,
even though I wasn't completely happy about that (I once figured out why
it should happen, but I've forgotten the reason, the way I've forgotten
how to do certain proofs off the top of my head).

> As you've probably seen by now, I posted another message a while ago that
> has the adjusted R-sq values for my example with random data.  And as you
> probably expected, Adj R-sq for model 2 is lower than for model 1 when the
> nuisance variables are added second.  This is another pretty good sign that
> one should consider reverting to Model 1.

Yeah, the Adj-R-sqs were pretty ugly.  I did a double-take at the point
in your analysis where you had a negative Adj-R-sq.  Nothing says
"pathological case" like a negative Adj-R-sq.  From the second method,
I copy the results below:

Model 1:  R-sq = 0.148, F(2, 57) = 4.939, p = .011   Adj R-sq = .118
Model 2:  R-sq = 0.186, F(6, 53) = 2.014, p = .080   Adj R-sq =  .093
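As a quick arithmetic check, both the overall F tests and the F for the change
in R-sq can be recovered from the R-sq values alone.  A minimal Python sketch
using the reported figures (the small discrepancies against the SPSS output
come from R-sq being rounded to three decimals):

```python
def overall_f(r_sq, n, p):
    """Overall F test for a regression with p predictors: F(p, n - p - 1)."""
    return (r_sq / p) / ((1 - r_sq) / (n - p - 1))

def f_change(r_sq_full, r_sq_reduced, n, p_full, p_added):
    """F test for the change in R-sq when p_added predictors join the model."""
    num = (r_sq_full - r_sq_reduced) / p_added
    den = (1 - r_sq_full) / (n - p_full - 1)
    return num / den

n = 60
f1 = overall_f(0.148, n, 2)            # ~4.95  (reported: F(2, 57) = 4.939)
f2 = overall_f(0.186, n, 6)            # ~2.02  (reported: F(6, 53) = 2.014)
fc = f_change(0.186, 0.148, n, 6, 4)   # ~0.62  (reported: F(4, 53) = 0.618)
```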

Just looking at R-sq, one would think that adding the nonsignificant
nuisance variables improves the model, but one has to remember that R
never decreases as predictors are added to the equation.  The Adj-R-sq
for the full model clearly indicates that (a) the additional variables are
impairing the model and (b) one should be cautious about relying on
plain R-sq.
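The penalty is simple enough to verify by hand: Adj R-sq = 1 - (1 - R-sq)(n - 1)/(n - p - 1).  A minimal Python sketch with the figures from this thread; the negative value is presumably the one from Bruce's first method, where the four noise covariates were entered alone:

```python
def adj_r_sq(r_sq, n, p):
    """Adjusted R-squared: penalizes R-sq for the number of predictors p."""
    return 1 - (1 - r_sq) * (n - 1) / (n - p - 1)

n = 60
print(adj_r_sq(0.148, n, 2))   # ~0.118  (Model 1 above)
print(adj_r_sq(0.186, n, 6))   # ~0.094  (Model 2 above; .093 from unrounded R-sq)
# Four pure-noise covariates alone: the penalty drives Adj R-sq below zero.
print(adj_r_sq(0.027, n, 4))   # ~ -0.044
```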

-Mike Palij
New York University
[hidden email]

> Cheers,
> Bruce
>
> p.s. - Here are the data for my random number example, in case anyone wants
> to play around with it.
>
>   ID        Y       x1       x2       x3       x4       x5       x6
>    1    68.87    48.79    65.52    32.83    46.57    49.68    50.41
>    2    77.83    37.03    59.01    50.83    58.44    67.38    65.47
>    3    74.96    63.20    62.50    44.59    31.19    71.42    58.67
>    4    66.83    37.10    52.60    58.99    55.92    63.61    52.10
>    5    89.31    53.34    55.78    42.63    57.21    61.94    40.51
>    6    88.92    42.98    41.55    49.90    52.92    54.90    32.59
>    7    84.52    45.10    43.88    67.67    61.20    65.58    25.97
>    8    71.03    51.50    59.75    40.35    53.97    27.20    57.42
>    9    90.30    35.67    42.71    47.18    48.81    55.29    57.62
>   10    82.03    54.73    40.85    49.57    64.83    36.19    51.69
>   11    74.90    50.93    52.39    44.54    45.33    53.13    47.99
>   12    81.51    49.38    53.75    49.38    39.24    46.70    40.37
>   13    98.03    37.70    43.40    49.28    49.51    43.83    44.92
>   14    97.00    48.67    49.31    40.79    47.47    48.79    47.49
>   15    68.77    49.13    65.20    55.54    67.96    52.64    57.70
>   16    89.69    52.83    45.73    55.60    46.59    48.89    37.61
>   17    41.05    62.22    36.89    31.77    49.43    36.90    32.90
>   18    69.90    54.74    36.74    63.09    75.08    42.78    51.80
>   19    72.18    49.39    55.51    35.44    54.24    60.74    41.36
>   20    71.35    53.96    36.54    22.17    48.72    47.93    32.03
>   21    67.55    50.33    39.52    49.40    47.92    46.42    49.87
>   22    57.96    52.97    42.08    61.42    47.01    42.43    52.60
>   23    72.59    52.14    51.45    48.08    43.25    50.97    54.75
>   24    79.43    42.07    34.99    28.99    75.75    33.72    45.27
>   25    98.00    48.57    30.43    46.96    39.50    44.50    47.91
>   26    92.54    32.80    39.10    53.62    50.50    43.57    57.61
>   27    69.40    57.39    67.77    68.30    49.33    55.33    66.43
>   28    69.23    37.75    56.03    64.98    46.18    57.15    42.81
>   29    60.53    53.97    47.93    48.30    49.18    39.33    49.69
>   30    45.42    42.87    54.18    50.04    37.56    46.02    39.47
>   31    75.66    71.98    45.94    57.29    35.81    40.17    49.39
>   32    87.42    37.58    40.88    52.57    27.52    35.19    57.27
>   33    69.26    40.32    63.45    56.72    55.60    50.81    48.60
>   34    67.01    56.82    50.11    41.32    57.04    39.51    54.33
>   35   108.41    62.20    54.69    54.91    62.37    57.80    55.15
>   36    89.07    43.66    56.98    38.51    45.55    51.19    64.90
>   37    91.75    73.27    48.97    58.70    46.24    60.23    52.54
>   38    79.19    53.14    52.16    35.82    53.97    67.37    41.93
>   39    93.08    45.82    60.89    44.59    51.37    64.52    54.42
>   40    25.68    48.56    38.87    51.27    43.72    54.75    41.63
>   41    76.89    43.52    45.51    32.75    45.15    49.65    44.21
>   42    87.52    59.25    52.09    57.91    52.07    64.11    43.07
>   43    91.29    46.26    35.32    46.97    54.77    55.38    80.68
>   44    75.65    37.66    38.25    52.39    44.49    53.13    55.76
>   45    79.83    32.62    75.66    49.90    56.71    56.68    54.74
>   46    93.66    48.84    59.99    39.71    38.28    38.93    68.83
>   47    77.16    33.17    38.94    53.12    30.47    40.20    61.00
>   48    78.99    57.50    59.34    54.62    35.23    46.06    40.72
>   49    62.73    51.51    48.49    70.91    36.68    40.46    43.22
>   50    85.68    21.41    38.36    62.87    27.46    40.93    56.01
>   51    62.48    46.58    67.54    47.85    33.46    39.55    45.70
>   52    61.28    47.16    50.70    37.73    60.64    43.36    55.58
>   53   107.35    44.61    39.74    60.34    49.34    50.16    52.04
>   54    65.51    55.57    40.58    44.00    50.03    55.89    54.24
>   55    85.11    51.98    51.38    46.19    36.61    36.27    64.20
>   56    91.36    42.71    60.44    66.88    48.58    62.01    53.99
>   57    89.99    68.91    48.34    49.55    57.85    66.75    53.52
>   58    55.38    40.13    50.45    41.90    53.80    41.21    41.20
>   59   117.70    40.08    49.32    50.25    54.41    72.35    51.54
>   60    67.47    55.77    55.62    52.25    47.86    47.63    41.52
>
>
>
> Mike Palij wrote:
>>
>> Bruce,
>>
>> A few points:
>>
>> (1) The OP said the following:
>>
>> |For Model 2 where he entered the 4 covariates on Step 1 and
>> |the 2 variables he is most interested in on Step 2, the Multiple R
>> |is .6 and F test is still not significant. But a priori, he was most
>> |interested in the 2 variables entered on Step 2 - and this is
>> |where the F change is significant.  One of the two variables is
>> |significant on Step 2.
>>
>> In your example below both X5 and X6 are significantly related
>> to the dep var but to make it really relevant only one of these
>> should be significant.  This may or may not make a difference,
>> depending upon the constraints the data put on the range of
>> allowable values.
>>
>> (2)  I don't like the second method of entering the nuisance
>> variables after the critical variables; I still think it is a foolish
>> thing to do because (a) it adds no useful information and (b) it makes
>> the overall model nonsignificant -- the model with only X5 and X6
>> is clearly better.  As for the significant increase in R^2, I suggest
>> you look at the difference between the adjusted R^2 values -- that
>> should be much smaller because of the penalty for having 6 predictors
>> in the second model.
>>
>> (3)  The goals of doing an ANCOVA have traditionally been
>> (a) to reduce the error variance by removing the variance in it that is
>> associated with the covariate (no association, no reduction in
>> error variance, thus no point in keeping the covariates) and
>> (b) if the groups in the ANOVA have different means on the
>> covariate, to adjust the means to compensate for differences
>> on the covariates.  If one has a copy of Howell's 7th ed of
>> Statistical Methods for Psychology, see the material on pages 598-609.
>> Both of these goals require the covariates to be entered first (indeed,
>> in ANCOVA terms, entering the covariates after the regular
>> ANOVA would be bizarre).  In the ANCOVA context, keeping
>> nonsignificant covariates makes no sense.
>>
>> -Mike Palij
>> New York University
>> [hidden email]
>>
>>
>>
>> ----- Original Message -----
>> From: "Bruce Weaver" <[hidden email]>
>> To: <[hidden email]>
>> Sent: Wednesday, March 30, 2011 3:35 PM
>> Subject: Re: significant F change, but nonsignificant regression model
>> overall
>>
>>
>>> After a few tries, I mimicked this result (more or less) with some
>>> randomly generated data and 60 cases.
>>>
>>> * Generate data .
>>> * X1 to X6 are random numbers.
>>> * Only X5 and X6 are related to Y.
>>>
>>> numeric Y x1 to x6 (f8.2).
>>> do repeat x = x1 to x6.
>>> -  compute x = rv.normal(50,10).
>>> end repeat.
>>> compute Y = 50 + .2*x5 + .4*x6 + rv.normal(0,15).
>>> exe.
>>>
>>> REGRESSION
>>>  /STATISTICS COEFF OUTS R ANOVA CHANGE
>>>  /DEPENDENT Y
>>>  /METHOD=ENTER x1 to x4
>>>  /METHOD=ENTER x5 x6.
>>>
>>> Model 1:  R-sq = 0.027, F(4, 55) = .388, p = .817
>>> Model 2:  R-sq = 0.186, F(6, 53) = 2.014, p = .080
>>> Change in R-sq = 0.158, F(2, 53) = 5.15, p = .009
>>>
>>> When the goal is to control for potential confounders, one sometimes sees
>>> the steps reversed, with the variable (or variables) of main interest
>>> entered first and the potential confounders added on the next step.  This
>>> is commonly done with logistic regression, for example, where crude and
>>> adjusted odds ratios are reported (from models 1 and 2, respectively).
>>> For the data above, here's what I get when I do it that way:
>>>
>>> Model 1:  R-sq = 0.148, F(2, 57) = 4.939, p = .011
>>> Model 2:  R-sq = 0.186, F(6, 53) = 2.014, p = .080
>>> Change in R-sq = 0.038, F(4, 53) = 0.618, p = .652
>>>
>>> Even though the change in R-sq is clearly not significant, I like to
>>> compare (via the eyeball test) the coefficients for X5 and X6 in the two
>>> models.  If there is no confounding, then the values should be pretty
>>> similar in the two models.
>>>
>>> Model   Variable      B     SE      p
>>> 1       X5         .448   .190   .022
>>>         X6         .432   .202   .036
>>> 2       X5         .522   .210   .016
>>>         X6         .459   .210   .033
>>>
>>>
>>> Mike, would you be any happier with this second approach to the analysis?
>>>
>>> Cheers,
>>> Bruce
>>>
>>>
>>>
>>> Mike Palij wrote:
>>>>
>>>> On Tuesday, March 29, 2011 11:36 pm, Rich Ulrich wrote:
>>>> >
>>>> > Mike,
>>>> > You seem to have missed the comment,
>>>> >
>>>> >>>> He has entered four variables to control for on the first step,
>>>> >>>> and then two other predictors on the 2nd step.  So we're trying
>>>> >>>> to see if these two predictors are significant above and beyond
>>>> >>>> the four variables we are controlling for on the first step of
>>>> >>>> the regression.
>>>>
>>>> No, I didn't miss this comment.  Let's review what we might know about
>>>> the situation (at least from my perspective):
>>>>
>>>> (1) The analyst is doing setwise regression, comparable to an ANCOVA,
>>>> entering 4 variables/covariates as the first set.  As mentioned elsewhere,
>>>> these covariates are NOT significantly related to the dependent variable.
>>>> This implies that the multiple correlation and its squared version are
>>>> zero, or R1=0.00.  One could, I think, legitimately ask why one continued
>>>> to use these as covariates, or kept them in the model when the second set
>>>> was entered -- one argument could be based on the expectation that there
>>>> is a suppressor relationship among the predictors, but until we hear from
>>>> the person who actually ran the analysis, I don't believe this was the
>>>> strategy.
>>>>
>>>> (2) After the second set of predictors was entered, there was still NO
>>>> significant relationship between the predictors and the dependent
>>>> variable.  So, for this model, R and R^2 are both equal to zero,
>>>> or R2=0.00.
>>>>
>>>> (3) There is a "significant increase in R^2" (F change) when the second
>>>> set of predictors was entered.  This has me puzzled.  It is not clear to
>>>> me why or how this could occur.  If R1 (set 1/model 1) = 0.00 and
>>>> R2 (set 2/model 2) = 0.00, then why would R2 - R1 != 0.00?  I suspect
>>>> that maybe there really is a pattern of relationships present but that
>>>> there is insufficient statistical power to detect it (the researcher
>>>> either needs more subjects or better measurements).  There may be other
>>>> reasons, but I think one needs to examine the data in order to figure
>>>> this out (one explanation is that it is just a Type I error).
>>>>
>>>> Rich, how would you explain what happens in (3) above?
>>>>
>>>> -Mike Palij
>>>> New York University
>>>> [hidden email]
>>>>
>>>> =====================
>>>> To manage your subscription to SPSSX-L, send a message to
>>>> [hidden email] (not to SPSSX-L), with no body text except the
>>>> command. To leave the list, send the command
>>>> SIGNOFF SPSSX-L
>>>> For a list of commands to manage subscriptions, send the command
>>>> INFO REFCARD
>>>>
>>>
>>>
>>> -----
>>> --
>>> Bruce Weaver
>>> [hidden email]
>>> http://sites.google.com/a/lakeheadu.ca/bweaver/
>>>
>>> "When all else fails, RTFM."
>>>
>>> NOTE: My Hotmail account is not monitored regularly.
>>> To send me an e-mail, please use the address shown above.
>>>
>>> --
>>> View this message in context:
>>> http://spssx-discussion.1045642.n5.nabble.com/significant-F-change-but-nonsignificant-regression-model-overall-tp4269810p4272153.html
>>> Sent from the SPSSX Discussion mailing list archive at Nabble.com.
>>>
>>
>>
>
>
>
>
