Another recent thread (see link below) reminded me of a question that has occurred to me frequently over the years: Why do the GLM commands (GLM, UNIANOVA, CSGLM) NOT include the same multicollinearity diagnostics that REGRESSION has? Taking it further, why do they not compute Fox & Monette's (1992) Generalized Variance Inflation Factor (gVIF)? Their article was published in 1992--it's high time SPSS implemented gVIF, IMO!

References

http://spssx-discussion.1045642.n5.nabble.com/Multicollinearity-Logistic-regression-td5739623.html

Fox, J., & Monette, G. (1992). Generalized collinearity diagnostics. Journal of the American Statistical Association, 87(417), 178-183. doi:10.2307/2290467

--
Bruce Weaver
http://sites.google.com/a/lakeheadu.ca/bweaver/
"When all else fails, RTFM."
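For readers who don't have the article at hand, the quantity in question can be stated briefly. As I read Fox & Monette (1992), the generalized VIF for a term whose indicator/contrast columns form a block of the model matrix is

    GVIF = det(R11) * det(R22) / det(R)

where R is the correlation matrix of all the predictors (excluding the constant), R11 is the submatrix for the columns belonging to the term, and R22 is the submatrix for the remaining columns. They also suggest reporting GVIF^(1/(2*Df)), with Df the number of columns for the term, so that terms with different df are on a comparable scale; for a one-column term this reduces to the ordinary VIF.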
(Forwarded Bruce's email to project management.)

But the GVIF is straightforward to compute from the coefficient correlation matrices, as outlined here. Ignoring the ridge part, the formula given can be computed using the MATRIX procedure.

However, as I apparently noted in the link Bruce cited, it's a misnomer to talk of testing for multicollinearity, since it is a matter of degree unless the X'X matrix is singular. Calculating the VIF using REGRESSION, with or without the sampling weight, might give a good idea of whether a substantial degree of multicollinearity is present.

It also reminds me that ridge regression, while not available as a CS procedure, could be useful in addressing collinearity. RIDGE is available in the CATREG procedure. If using that, though, watch out for zero values in the regressors, because that procedure considers such values to be missing :-(. Add a 1 if necessary.
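To sketch what that MATRIX computation might look like, here is a minimal, untested example. It assumes a hypothetical model in which a three-level factor enters as two dummies, d1 and d2, alongside a single covariate x; the variable names and column ranges are placeholders to adjust for your own model. It builds the predictor correlation matrix from the active dataset and applies the determinant formula given above. (The sketch works from the predictor correlations; as I understand Fox & Monette, starting from the correlation matrix of the coefficient estimates, as mentioned above, yields the same ratio.)

* Untested sketch: GVIF for a hypothetical factor entered as two dummies (d1, d2)
* plus one covariate (x).  Variable names and column ranges are placeholders.
MATRIX.
GET xm /VARIABLES = d1 d2 x /FILE = * /MISSING = OMIT.
COMPUTE n = NROW(xm).
* Center the columns, then form the covariance and correlation matrices of the predictors.
COMPUTE dev = xm - MAKE(n,1,1) * (CSUM(xm)/n).
COMPUTE covm = T(dev)*dev/(n - 1).
COMPUTE d = MDIAG(SQRT(DIAG(covm))).
COMPUTE r = INV(d)*covm*INV(d).
* The factor occupies columns 1 and 2; the remaining predictor is column 3.
COMPUTE r11 = r({1,2},{1,2}).
COMPUTE r22 = r(3,3).
COMPUTE gvif = DET(r11)*DET(r22)/DET(r).
* Fox & Monette's comparable scale, GVIF^(1/(2*Df)) with Df = 2, via EXP/LN.
COMPUTE gsca = EXP(LN(gvif)/(2*2)).
PRINT gvif /TITLE = "GVIF for the two-df factor".
PRINT gsca /TITLE = "GVIF^(1/(2*Df)), Df = 2".
END MATRIX.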
In reply to this post by Bruce Weaver
I am going to speculate here, but I think that most statistics programs do not provide multicollinearity statistics for ANOVA analyses (today we include these analyses under the heading of GLM) because of a traditional distinction that had been made, at least in psychology and the biomedical fields, between when one uses ANOVA and when one uses multiple regression analysis.

If one conducts a bona fide true experiment where one manipulates/controls the independent variables, then one makes sure that the independent variables are orthogonal to each other (i.e., uncorrelated). For example, if one uses a 2x2 design, the two variables do not represent attributes of the subjects/participants (e.g., sex/gender, age, etc.) and the design is balanced -- either the cells all have the same sample sizes or they are proportional according to some criterion that keeps the two independent variables orthogonal.

When the independent variables are orthogonal to each other, the calculations one needs to do simplify relative to the calculations for multiple regression -- this is why ANOVA analyses in intro psych stats textbooks have focused on how to calculate the appropriate Sums of Squares (SS) for the independent variables and error term(s). After determining the degrees of freedom (df) for each source, the Mean Square for each source is easily calculated as MS = SS/df, and the F-ratio is simply F = MS-A/MS-Error (where A is one independent variable), and so on. These calculations are inappropriate if the independent variables are not orthogonal; one then has to use multiple regression to "save" the analysis because the design has been mucked up, so to speak.

Experimentalists would argue that these types of "true experiments" allow one to attribute a causal role to each of the independent variables if appropriate controls are in place (e.g., no confounding variables are present). The use of Randomized Control Trials (RCTs) to test the effect of a drug relative to a placebo or a standard treatment depends upon the orthogonality of the independent variables (one example I use is A = Drug vs Placebo, B = Psychotherapy [2 levels], and the AxB 2-way interaction). In this context, if there is a main effect of drug, one can legitimately say that the drug had a causal effect (hopefully, a beneficial effect). Similarly, one can make a comparable statement for psychotherapy, but if the interaction is significant, one doesn't focus on the main effects; instead, one focuses on how the effect of one independent variable varies as a function of the second. Note that true experiments typically have a small number of independent variables -- generally 4 or fewer (Zar's text goes up to 5 IVs).

If the independent variables are non-orthogonal, one really shouldn't make statements about the causal effects of the independent variables, because the non-orthogonality causes the main effect of one independent variable to be correlated with the main effect of the other -- the effects are confounded. Experimental psychologists for most of the 20th century used this logic and relied on ANOVA analyses (as did experimentalists in the biomedical fields, which is why the BMDP statistical package had such great ANOVA programs long before SPSS did, especially for within-subject/repeated measures designs; SAS had good between-subject design analysis capabilities, but it wasn't until the 1980s that within-subject analyses were made easier to specify in a way that prevented errors of incorrect design specification).
Today, one might say that researchers are much more casual about causality, and non-orthogonality is seen as less of a problem. In the 1960s Jack Cohen and others argued that ANOVA analyses of designs with orthogonal independent variables are a special case of multiple regression analysis and showed how the traditional ANOVA analyses could be done in regression. The first edition of his regression text brought together the threads of this argument, though many experimental psychologists didn't see the benefits, and regression was seen as being tainted because it was used with correlated independent variables, which undermined causal attributions. Social scientists using data from nonexperimental designs would beg to differ.

So, long story short: for traditional experimentalists using orthogonal independent variables, there is no multicollinearity, so why provide measures of it? If one is using non-orthogonal/correlated independent variables, well, one is going to need to know how much non-orthogonality there is.

In the last quarter of the 20th century it became generally accepted to view ANOVA and regression analyses as specific instances of the General Linear Model, and the GLM as a subset of more general linear/nonlinear analyses. But the distinction between analyses of true experiments versus quasi- and nonexperimental designs is maintained by having separate programs/procedures for ANOVA and regression instead of a single GLM program (not the one currently in SPSS). In the structural equation modeling (SEM) field, LISREL and MPLUS (I assume other SEM programs as well) have in fact presented this view: one can do everything from t-tests to MANOVA, canonical correlation analysis, and most of the other multivariate analyses, with new capabilities being regularly added. So why use SPSS when an SEM package can do almost everything one wants (one can always use Excel or undergraduates to help clean and organize the data ;-)?

Okay, have I exceeded tl;dr?

-Mike Palij
New York University
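To put the orthogonality point in concrete terms, here is a toy illustration with hypothetical data (not from the original post): in a balanced 2x2 layout with effect-coded factors, the two main-effect codes and their product are exactly uncorrelated, so REGRESSION reports Tolerance and VIF of 1 for every term, which is why collinearity diagnostics have little to add for a true balanced experiment.

* Toy example, hypothetical data: balanced 2x2 with effect-coded factors a and b, two cases per cell.
DATA LIST FREE / a b y.
BEGIN DATA
-1 -1 10  -1 -1 12
-1  1 14  -1  1 15
 1 -1 11   1 -1 13
 1  1 18   1  1 20
END DATA.
COMPUTE ab = a*b.
* With equal cell sizes the three regressors are orthogonal, so Tolerance = VIF = 1 for a, b, and ab.
REGRESSION
  /STATISTICS COEFF R ANOVA TOL
  /DEPENDENT y
  /METHOD = ENTER a b ab.

Delete a case or two to unbalance the cells and rerun it, and the VIFs move above 1 -- the situation where measures like the (G)VIF start to earn their keep.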