Dear members,

My linear regression analysis has seven binary predictors, n = 47, and (of course) a continuous dependent variable. The overall regression ANOVA is nonsignificant (F = 1.489, p = .200). The confusing part is that two of the seven predictors are significant (p < .05). I don't think there is a multicollinearity problem, because the collinearity diagnostics look fine. For example, no beta coefficient is greater than 1.0; tolerances of the predictors range from .559 to .814; VIFs range from 1.224 to 1.669; and correlation coefficients among predictors range from .009 to .757, though most are below .30.

Any comments are welcome.

Thank you.
E.
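For anyone who wants to reproduce this kind of collinearity check outside SPSS, here is a minimal Python sketch. The data are simulated stand-ins (the original dataset isn't posted), and it assumes numpy, pandas, and statsmodels are available:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(0)
X = pd.DataFrame(rng.integers(0, 2, size=(47, 7)),   # n = 47, seven binary predictors
                 columns=[f"x{i}" for i in range(1, 8)])
exog = sm.add_constant(X)                            # include a constant, as SPSS does

for i, name in enumerate(X.columns, start=1):        # index 0 is the constant; skip it
    vif = variance_inflation_factor(exog.values, i)
    print(f"{name}: VIF = {vif:.3f}, tolerance = {1 / vif:.3f}")
```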
The overall F not being significant should tell you to stop there.
With seven individual predictors each being tested individually, you are multiplying the chances of obtaining 2 significant t tests by chance. In other words, you think you are testing at alpha = .05, but you are actually testing at a larger alpha. Many researchers correct for this with a Bonferroni correction. Chances are that your significant findings will not survive once that is done.

David Greenberg, Sociology Department, New York U.
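To put numbers on this, here is an illustrative Python sketch (not anything run in SPSS; it assumes only the alpha level and the test count given in the thread):

```python
# Family-wise error rate for k = 7 tests at a nominal alpha of .05,
# and the Bonferroni-corrected per-test threshold.
alpha, k = 0.05, 7

fwer = 1 - (1 - alpha) ** k   # P(>= 1 false positive) if the tests were independent
per_test = alpha / k          # Bonferroni-corrected per-test alpha

print(f"effective family-wise alpha: {fwer:.4f}")     # ~0.3017, not .05
print(f"Bonferroni per-test alpha:   {per_test:.4f}") # ~0.0071
```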
Dear David,

All seven predictors were entered together into a multiple regression model (using the ENTER method). The overall F was nonsignificant while, at the same time, two of the seven predictors were significant (p < .05). A Bonferroni correction is out of context in this discussion because all predictors were entered into the model simultaneously; that is, only one multiple regression was analyzed.

Thank you.
E.
You are totally mistaken. The point is not to do the correction on
the overall regression. That needs no correction. But you are doing 7 tests on the coefficients. Imagine a world in which, in the population, all those coefficients are zero. If you use a nominal alpha of .05, the probability of getting any one estimate significant by chance is 1 in 20, but with 7 tests the probability of getting 2 significant out of 7 is elevated. It is quite a bit higher than .05.

David Greenberg
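For reference, a binomial sketch of the arithmetic (it assumes, unrealistically, that the 7 t tests are independent; in a real regression they are correlated, so these are only rough reference values):

```python
from scipy.stats import binom

k, alpha = 7, 0.05
p_ge_1 = 1 - binom.pmf(0, k, alpha)   # P(>= 1 "significant" test) ~= 0.302
p_ge_2 = binom.sf(1, k, alpha)        # P(>= 2 "significant" tests) ~= 0.044
print(p_ge_1, p_ge_2)
# Dependence among the coefficient tests can push the two-or-more
# probability above this independent-case figure.
```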
Here's another way to understand what Greenberg is saying:
(1) The overall F test is an omnibus test that, in the case of multiple regression, tests whether the multiple R is significantly different from zero. A non-significant R in this situation implies that there is no correlation between the dependent/outcome variable and the set of predictors. (Equivalently, R can be interpreted as the Pearson r between the actual values of Y and the predicted values of Y, i.e., Y-hat.)

(2) You have 7 predictors in your equation, and each can be evaluated for significance (either the slope b is not equal to zero, or the increase in R^2 produced by the predictor is greater than zero). Each predictor is evaluated with a per-comparison alpha (alpha-pc) of .05. With 7 predictors, you have 7 tests, each done at alpha-pc. The problem with multiple testing like this is that there is also an overall Type I error rate, alpha-overall, which represents the probability of falsely rejecting a true null hypothesis (in this case, that the correlations are all equal to zero) somewhere across the 7 tests.

(3) The formula is

alpha-overall = 1 - (1 - alpha-pc)^k

where k is the number of tests being done -- in this case k = 7 (the ^k means raised to the power of k). If alpha-pc = 0.05, then

alpha-overall = 1 - (1 - .05)^7 = 1 - (.95)^7 = 1 - 0.6983 = 0.3017

In words, after 7 tests there is about a 30% chance of having committed at least one Type I error. This is usually considered unacceptably high, so people set alpha-overall = 0.05, which implies that each alpha-pc has to be reduced. One method is:

"corrected" alpha-pc = alpha-overall / k = .05/7 = 0.007

Now compare the p-value of each predictor in the equation and see whether it is less than 0.007. It is likely that none will be.

(4) The Bonferroni correction is this reduction of the Type I error rate (alpha) used with a group of tests. The omnibus F of the regression only tells you whether or not there is a significant relationship between the dependent/criterion variable and the independents/predictors. In the case of multiple regression it does not tell you which predictor is involved in the relationship, which is why you have to do additional testing (a two-stage testing process). As the number of predictors increases, the probability that one or more of them will be statistically significant by chance (a Type I error) increases, and this is what one wants to guard against.

I hope I was clear.

-Mike Palij
New York University
[hidden email]
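The 30% figure is easy to check by simulation. The sketch below (illustrative Python, assuming numpy and statsmodels; the data are randomly generated, not the poster's) draws many null datasets matching the thread's setup -- n = 47, 7 binary predictors, y unrelated to all of them -- and counts how often at least one uncorrected coefficient test comes out significant:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
n, k, reps, hits = 47, 7, 2000, 0

for _ in range(reps):
    X = sm.add_constant(rng.integers(0, 2, size=(n, k)).astype(float))
    y = rng.standard_normal(n)              # global null: y is pure noise
    pvals = sm.OLS(y, X).fit().pvalues[1:]  # coefficient p-values, intercept dropped
    hits += (pvals < 0.05).any()

print(f"P(at least one p < .05 under the null) ~= {hits / reps:.3f}")  # near 0.30
```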
All the posts so far have been working from the assumption that all
seven tests are of equal and independent priority and importance. That may be the case. But, in my own clinical research, having more than two or three "most important" hypotheses is what arises when the work is thoroughly exploratory. In other words: this is not what would (for my sort of research, in any case) be a strong experimental design for /testing/ in a known area; rather, it seems to be a first stab at finding something.

Why do two variables correlate above 0.70? This is exceedingly high for dichotomous measures -- where, in fact, the maximum correlation is limited by the commensurate skew of the marginal distributions.

You might report what you have as a thoroughly exploratory result ... though I would also look at the t-tests as univariate explorations.

Not merely exploratory? Then your design has low power. If I had a client bring these data to me, I would suggest that, in order to have any power for "moderate-size" effects with N = 47, they need to select one or two primary hypotheses or test a single, a-priori composite score.

--
Rich Ulrich
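To make the low-power point concrete, here is a rough Python/scipy sketch. The effect-size convention is an assumption (Cohen's f^2 with noncentrality lambda = f^2 * N, and f^2 = 0.15 as a stand-in for a "moderate" effect, since the thread specifies no effect size):

```python
from scipy.stats import f as f_dist, ncf

N, k, alpha, f2 = 47, 7, 0.05, 0.15   # f2 = 0.15 is Cohen's "medium" effect size
df1, df2 = k, N - k - 1               # numerator df = 7, denominator df = 39

crit = f_dist.ppf(1 - alpha, df1, df2)        # critical F under the null
power = 1 - ncf.cdf(crit, df1, df2, f2 * N)   # noncentral F tail probability
print(f"power ~= {power:.2f}")                # well below the conventional .80
```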
Rich makes some good points and I'd like to say a few
things in response regarding the process of planning a statistical analysis.

On Thursday, January 14, 2016 3:05 AM, Rich Ulrich wrote:

>All the posts so far have been working from the assumption
>that all seven tests are of equal and independent priority
>and importance.

This is essentially true, mainly because an analysis was already done and the results were looked at and interpreted. Since all 7 variables were entered into a simultaneous model, the researcher HAD to have some justification for doing so instead of building a regression model from the systematic entry of individual or groups of variables (e.g., does var5 add anything to R^2 AFTER var1 to var4 are in the equation?). After looking at the results of the simultaneous model, one could do such a model-building exercise, but now one is on a fishing expedition. Instead of answering research questions, one is now trying to better understand the nature of the data and the patterns that may exist among the variables.

>That may be the case. But, in my own clinical research, having
>more than two or three "most important" hypotheses is what
>arises when the work is thoroughly exploratory. In other words:
>This is not what would (for my sort of research, in any case) be
>a strong experimental design for /testing/ in a known area;
>rather, it would seem to be a first stab at finding something.

Although I am somewhat in agreement with what Rich says above, if one really has a couple of "most important" hypotheses, then something like planned comparisons is appropriate (in ANOVA, specific differences between specific means; in multiple regression, specific predictors used in reduced models in contrast to a full simultaneous model). The amount of knowledge one has about the phenomenon helps to determine what types of statistical analyses and tests one will do. When one has limited knowledge (great ignorance), a two-stage process (first an omnibus test and, if significant, multiple comparisons of some sort) is likely to be used. When one has greater knowledge and is concerned with only a few specific relationships or models, planned comparisons or tests of specific patterns of relationship among variables are more appropriate, whether in regression analysis or structural equation modeling (the latter shouldn't be done with a sample of N = 47).

In my own experience analyzing clinical research data, I find that sometimes the researcher is knowledgeable and sometimes has no idea what is going on (in more senses than one). In the second case, the researcher may go fishing and do all sorts of additional analyses that were not initially planned (driven by not obtaining the results one expected) but that may be written up as though they were part of the analysis plan. For example, consider a study of how a clinical intervention (e.g., cognitive behavior therapy or a drug) affects a group of people (e.g., people with a clinical diagnosis of depression). One does the study and finds no significant effect (e.g., no change between pre- and post-intervention, no difference between intervention and placebo/control/reference group). This usually causes dissatisfaction in various people (e.g., the principal investigator, the funding agency), and one way to deal with it is to TRY to find some significant result.
So the researcher may ask that the participants be divided into three groups on the basis of severity of condition (e.g., in the case of depression, low, moderate, or high levels of clinical depression). One does additional analyses and, lo and behold, the high-depression group shows an effect but the other groups don't. The problem here is treating this result as though one had planned the analysis, instead of as a tentative result/hypothesis that requires new data to show that it is an actual effect and not a Type I error arising from all the additional testing.

>Why do two variables correlate above 0.70? This is exceedingly
>high for dichotomous measures -- where, in fact, the max-corr is
>limited by the commensurate skew of the marginal distributions.

Just to add to what Rich says above: the maximum r (phi coefficient) is determined by the proportions in the two dichotomous variables. If we use the values 0 and 1 for each of the two variables, the maximum phi of 1.00 is obtained when prop(X1 = 1) = prop(X2 = 1); with equal marginals the phi coefficient has an upper bound of +1.00 and a lower bound of -1.00, like the ordinary Pearson r. But if prop(X1 = 1) is not equal to prop(X2 = 1), this is no longer true and the maximum phi falls into a smaller interval. Guilford and Fruchter is just one source on this point (right now I'm using the 5th ed. [1973] of their "Fundamental Statistics in Psychology and Education", pp. 306-310, but other sources also provide this information). The maximum value of phi can be calculated by the following equation (cf. G&F, eqn 14.24, p. 309; written here with the variables ordered so that p1 <= p2):

max phi = sqrt[ (p1/q1) * (q2/p2) ]

where p1 is the proportion with X1 = 1, q1 the proportion with X1 = 0, and similarly p2 is the proportion with X2 = 1 and q2 the proportion with X2 = 0. Table 14.9 on G&F's p. 308 shows what happens to the maximum phi when p1 = .50 but p2 takes on different values. All this suggests that one should examine the 2x2 table for the two variables involved.

>You might report what you have as a thoroughly exploratory result ...
>though, I would also look at the t-tests as univariate explorations.
>Not merely exploratory? Then your design has low power.

Remember that a power calculation should be done BEFORE the analysis of the data, not used as an excuse AFTER you have the results. Prospective power analysis commits one to specifying the effect sizes one believes exist, or at least is interested in. A source on the "evils" of retrospective power analysis is Hoenig, J. M., & Heisey, D. M. (2001). The abuse of power: The pervasive fallacy of power calculations for data analysis. The American Statistician, 55(1), 19-24. One can also search scholar.google.com on the distinction between prospective and retrospective power analysis and the problems associated with the latter. Ultimately, the problem is that the researcher has no f'n clue about the probability distributions the data have.

>If I had a client bring these data to me, I would suggest that,
>in order to have any power for "moderate-size" effects with
>N=47, they need to select one or two primary hypotheses or
>test a single, a-priori composite score.

If I had a client come to me with these data, I'd send them to Rich. ;-)

-Mike Palij
New York University
[hidden email]
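A tiny Python sketch of that bound (illustrative; it implements the formula above, with the marginals ordered so that p1 <= p2):

```python
import math

def max_phi(p1: float, p2: float) -> float:
    """Upper bound on phi for dichotomies with P(X1=1) = p1 and P(X2=1) = p2."""
    if p1 > p2:
        p1, p2 = p2, p1                # order the marginals so the bound is <= 1
    q1, q2 = 1 - p1, 1 - p2
    return math.sqrt((p1 / q1) * (q2 / p2))

print(max_phi(0.5, 0.5))   # 1.000 -- equal marginals permit a perfect phi
print(max_phi(0.2, 0.7))   # ~0.327 -- skewed marginals cap phi well below 1
```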