Another recent thread (see link below) reminded me of a question that has occurred to me frequently over the years: Why do the GLM commands (GLM, UNIANOVA, CSGLM) NOT include the same multicollinearity diagnostics that REGRESSION has? Taking it further, why do they not compute Fox & Monette's (1992) Generalized Variance Inflation Factor (gVIF)? Their article was published in 1992--it's high time SPSS implemented gVIF, IMO!

References

http://spssx-discussion.1045642.n5.nabble.com/Multicollinearity-Logistic-regression-td5739623.html

Fox, J., & Monette, G. (1992). Generalized collinearity diagnostics. Journal of the American Statistical Association, 87(417), 178-183. doi:10.2307/2290467

--
Bruce Weaver
http://sites.google.com/a/lakeheadu.ca/bweaver/
"When all else fails, RTFM."
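For readers who don't have the article at hand, the quantity in question can be stated briefly. As I read Fox & Monette (1992), the generalized VIF for a term whose indicator/contrast columns form a block of the model matrix is

    GVIF = det(R11) * det(R22) / det(R)

where R is the correlation matrix of all the predictors (excluding the constant), R11 is the submatrix for the columns belonging to the term, and R22 is the submatrix for the remaining columns. They also suggest reporting GVIF^(1/(2*Df)), with Df the number of columns for the term, so that terms with different df are on a comparable scale; for a one-column term this reduces to the ordinary VIF.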
(Forwarded Bruce's email to project management.)

But the GVIF is straightforward to compute from the coefficient correlation matrices, as outlined here. Ignoring the ridge part, the formula given can be computed using the MATRIX procedure.

However, as I apparently noted in the link Bruce cited, it's a misnomer to talk of testing for multicollinearity, since it is a matter of degree unless the X'X matrix is singular. Calculating the VIF using REGRESSION, with or without the sampling weight, might give a good idea of whether a substantial degree of multicollinearity is present.

It also reminds me that ridge regression, while not available as a CS procedure, could be useful in addressing collinearity. RIDGE is available in the CATREG procedure. If using that, though, watch out for zero values in the regressors, because that procedure considers such values to be missing :-(. Add a 1 if necessary.
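To sketch what that MATRIX computation might look like, here is a minimal, untested example. It assumes a hypothetical model in which a three-level factor enters as two dummies, d1 and d2, alongside a single covariate x; the variable names and column ranges are placeholders to adjust for your own model. It builds the predictor correlation matrix from the active dataset and applies the determinant formula given above. (The sketch works from the predictor correlations; as I understand Fox & Monette, starting from the correlation matrix of the coefficient estimates, as mentioned above, yields the same ratio.)

* Untested sketch: GVIF for a hypothetical factor entered as two dummies (d1, d2)
* plus one covariate (x).  Variable names and column ranges are placeholders.
MATRIX.
GET xm /VARIABLES = d1 d2 x /FILE = * /MISSING = OMIT.
COMPUTE n = NROW(xm).
* Center the columns, then form the covariance and correlation matrices of the predictors.
COMPUTE dev = xm - MAKE(n,1,1) * (CSUM(xm)/n).
COMPUTE covm = T(dev)*dev/(n - 1).
COMPUTE d = MDIAG(SQRT(DIAG(covm))).
COMPUTE r = INV(d)*covm*INV(d).
* The factor occupies columns 1 and 2; the remaining predictor is column 3.
COMPUTE r11 = r({1,2},{1,2}).
COMPUTE r22 = r(3,3).
COMPUTE gvif = DET(r11)*DET(r22)/DET(r).
* Fox & Monette's comparable scale, GVIF^(1/(2*Df)) with Df = 2, via EXP/LN.
COMPUTE gsca = EXP(LN(gvif)/(2*2)).
PRINT gvif /TITLE = "GVIF for the two-df factor".
PRINT gsca /TITLE = "GVIF^(1/(2*Df)), Df = 2".
END MATRIX.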
In reply to this post by Bruce Weaver
I am going to speculate here, but I think that most statistics programs do not provide multicollinearity statistics for ANOVA analyses (today we include these analyses under the heading of GLM) because of a traditional distinction that had been made, at least in psychology and the biomedical fields, between when one uses ANOVA and when one uses multiple regression analysis.

If one conducts a bona fide true experiment where one manipulates/controls the independent variables, then one makes sure that the independent variables are orthogonal to each other (i.e., uncorrelated). For example, if one uses a 2x2 design, the two variables do not represent attributes of the subjects/participants (e.g., sex/gender, age, etc.) and the design is balanced -- either the cells all have the same sample sizes or they are proportional according to some criterion that keeps the two independent variables orthogonal.

When the independent variables are orthogonal to each other, the calculations one needs to do simplify relative to the calculations for multiple regression -- this is why ANOVA analyses in intro psych stats textbooks have focused on how to calculate the appropriate Sums of Squares (SS) for the independent variables and error term(s). After determining the degrees of freedom (df) for each source, the Mean Square for each source is easily calculated as MS = SS/df, and the F-ratio is simply F = MS-A/MS-Error (where A is one independent variable), and so on. These calculations are inappropriate if the independent variables are not orthogonal; one then has to use multiple regression to "save" the analysis because the design has been mucked up, so to speak.

Experimentalists would argue that these types of "true experiments" allow one to attribute a causal role to each of the independent variables if appropriate controls are in place (e.g., no confounding variables are present). The use of Randomized Control Trials (RCTs) to test the effect of a drug relative to a placebo or a standard treatment depends upon the orthogonality of the independent variables (one example I use is A = Drug vs Placebo, B = Psychotherapy [2 levels], and the AxB 2-way interaction). In this context, if there is a main effect of drug, one can legitimately say that the drug had a causal effect (hopefully, a beneficial effect). Similarly, one can make a comparable statement for psychotherapy, but if the interaction is significant, one doesn't focus on the main effects; instead, one focuses on how the effect of one independent variable varies as a function of the second. Note that true experiments typically have a small number of independent variables -- generally 4 or fewer (Zar's text goes up to 5 IVs).

If the independent variables are non-orthogonal, one really shouldn't make statements about the causal effects of the independent variables, because the non-orthogonality causes the main effect of one independent variable to be correlated with the main effect of the other -- the effects are confounded. Experimental psychologists for most of the 20th century used this logic and relied on ANOVA analyses (as did experimentalists in the biomedical fields, which is why the BMDP statistical package had such great ANOVA programs long before SPSS did, especially for within-subject/repeated measures designs; SAS had good between-subject design analysis capabilities, but it wasn't until the 1980s that within-subject analyses were made easier to specify in a way that prevented errors of incorrect design specification).
Today, one might say that researchers are much more casual about causality, and non-orthogonality is seen as less of a problem. In the 1960s Jack Cohen and others argued that ANOVA analyses of designs with orthogonal independent variables are a special case of multiple regression analysis and showed how the traditional ANOVA analyses could be done in regression. The first edition of his regression text brought together the threads of this argument, though many experimental psychologists didn't see the benefits, and regression was seen as being tainted because it was used with correlated independent variables, which undermined causal attributions. Social scientists using data from nonexperimental designs would beg to differ.

So, long story short: for traditional experimentalists using orthogonal independent variables, there is no multicollinearity, so why provide measures of it? If one is using non-orthogonal/correlated independent variables, well, one is going to need to know how much non-orthogonality there is.

In the last quarter of the 20th century it became generally accepted to view ANOVA and regression analyses as specific instances of the General Linear Model, and the GLM as a subset of more general linear/nonlinear analyses. But the distinction between analyses of true experiments versus quasi- and nonexperimental designs is maintained by having separate programs/procedures for ANOVA and regression instead of a single GLM program (not the one currently in SPSS). In the structural equation modeling (SEM) field, LISREL and MPLUS (I assume other SEM programs as well) have in fact presented this view: one can do everything from t-tests to MANOVA, canonical correlation analysis, and most of the other multivariate analyses, with new capabilities being regularly added. So why use SPSS when an SEM package can do almost everything one wants (one can always use Excel or undergraduates to help clean and organize the data ;-)?

Okay, have I exceeded tl;dr?

-Mike Palij
New York University
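To put the orthogonality point in concrete terms, here is a toy illustration with hypothetical data (not from the original post): in a balanced 2x2 layout with effect-coded factors, the two main-effect codes and their product are exactly uncorrelated, so REGRESSION reports Tolerance and VIF of 1 for every term, which is why collinearity diagnostics have little to add for a true balanced experiment.

* Toy example, hypothetical data: balanced 2x2 with effect-coded factors a and b, two cases per cell.
DATA LIST FREE / a b y.
BEGIN DATA
-1 -1 10  -1 -1 12
-1  1 14  -1  1 15
 1 -1 11   1 -1 13
 1  1 18   1  1 20
END DATA.
COMPUTE ab = a*b.
* With equal cell sizes the three regressors are orthogonal, so Tolerance = VIF = 1 for a, b, and ab.
REGRESSION
  /STATISTICS COEFF R ANOVA TOL
  /DEPENDENT y
  /METHOD = ENTER a b ab.

Delete a case or two to unbalance the cells and rerun it, and the VIFs move above 1 -- the situation where measures like the (G)VIF start to earn their keep.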