SPSSX Discussion - Re: Interpretation of principal component regression results

Re: Interpretation of principal component regression results

Posted by Jon K Peck on Jun 22, 2012; 1:29pm
URL: http://spssx-discussion.165.s1.nabble.com/Interpretation-of-principal-component-regression-results-tp5713752p5713758.html

You might want to consider Partial Least Squares for this situation; it is available as an extension command for Statistics. Ridge, lasso, or elastic net regression, all available in the Categories option, are other possibilities.
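For readers who have not met ridge regression, here is a minimal sketch of what the penalty does when two predictors are almost collinear. It is written in Python rather than SPSS syntax, the data are simulated, and the names (`ridge_two_predictors`, `lam`) are made up for illustration. It is not meant to show how the Categories procedure is implemented.

```python
import math
import random


def standardize(values):
    """Return z-scores using the sample standard deviation (n - 1)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


def correlation(za, zb):
    """Pearson correlation of two already-standardized variables."""
    return sum(a * b for a, b in zip(za, zb)) / (len(za) - 1)


def ridge_two_predictors(r12, r1y, r2y, lam):
    """Solve (R + lam * I) b = r_xy for two standardized predictors."""
    d = 1.0 + lam
    det = d * d - r12 * r12
    b1 = (d * r1y - r12 * r2y) / det
    b2 = (d * r2y - r12 * r1y) / det
    return b1, b2


# Hypothetical data: x2 is x1 plus a little noise, so the two are nearly collinear.
random.seed(2012)
n = 100
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [a + random.gauss(0, 0.1) for a in x1]
y = [a + b + random.gauss(0, 1) for a, b in zip(x1, x2)]

z1, z2, zy = standardize(x1), standardize(x2), standardize(y)
r12, r1y, r2y = correlation(z1, z2), correlation(z1, zy), correlation(z2, zy)

# lam = 0 is ordinary least squares on the standardized variables;
# larger lam shrinks the coefficient vector toward zero.
results = {lam: ridge_two_predictors(r12, r1y, r2y, lam) for lam in (0.0, 0.1, 1.0)}
for lam, (b1, b2) in results.items():
    print(f"lambda={lam:4.1f}  b1={b1:7.3f}  b2={b2:7.3f}")
```

Unlike PC regression, ridge keeps every original predictor in the model, so the coefficients can be read on the original variables directly, at the cost of some bias from the shrinkage.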

Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
new phone: 720-342-5621

From: Bruce Weaver <[hidden email]>
To: [hidden email]
Date: 06/22/2012 07:11 AM
Subject: Re: [SPSSX-L] Interpretation of principal component regression results
Sent by: "SPSSX(r) Discussion" <[hidden email]>

I don't have time to tackle your questions below. This post is just to acquaint you with an article by Hadi & Ling (1998) that highlights potential problems with PC regression. If you have institutional access to JSTOR, you can download it here:

http://www.jstor.org/stable/10.2307/2685559

For those without access, here's the abstract.

Many textbooks on regression analysis include the methodology of principal components regression (PCR) as a way of treating multicollinearity problems. Although we have not encountered any strong justification of the methodology, we have encountered, through carrying out the methodology in well-known data sets with severe multicollinearity, serious actual and potential pitfalls in the methodology. We address these pitfalls as cautionary notes, numerical examples that use well-known data sets. We also illustrate by theory and example that it is possible for the PCR to fail miserably in the sense that when the response variable is regressed on all of the p principal components (PCs), the first (p - 1) PCs contribute nothing toward the reduction of the residual sum of squares, yet the last PC alone (the one that is always discarded according to PCR methodology) contributes everything. We then give conditions under which the PCR totally fails in the above sense.

HTH.

RuudM123 wrote
>
> Hello,
>
> I have a question about the interpretation of individual variables using a
> PCA regression method. And because PCR requires a different interpretation
> procedure I would like to ask how the following information should be
> interpreted?
>
> First it should be noted that I use a metric DV, revenue. In addition, I
> have 6 IV, all metric.
> A correlation matrix of these 6 IV indicate very high pearson correlation
> coefficients, even above .90.
> In order to remedy the problem of multicollinearity I have used a
> principal component analysis to transform the correlated variables into
> uncorrelated principal components (factor scores) using the VARIMAX
> rotation method. In sum the 6 IV can be explained by 3 components.
>
> Next I run a OLS multiple regression, with revenue as dependent variable
> and the three factor scores as independent. Results indicate significant
> R2 changes when a new factor score is added to the first that was
> included. Overall, the model with 3 factor scores shows an adjusted R2 of
> .875.
> *Is there a reason why this R2 is so high based on the use of PCA?*
>
> In addition, all factor scores have large t-values ranging from 2,875 to
> 14,505 that are significant at p < 0,01.
>
> Now I come to the point of interpretation, and I understand that
> interpreting beta coefficients will only tell me that a one-unit increase
> in factor 1 will increase revenue by .892. *Although I would like to go
> further and interpret the effect of the individual IV included in the
> factors.* I thought that I need the factor loadings in order to do so, the
> results are provided below:
>
> The beta coefficients for the factors are as follows:
> Factor score 1 = .892 (sign at p < .001)
> Factor score 2 = -.246 (sign at p < .01)
> Factor score 3 = .177 (sign at p < .001)
>
> The factor loadings for factor one are as follows:
> IV1 = .971
> IV2 = .985
> IV3 = -.952
>
> Example interpretation: Factor score 1 is positively related to revenue,
> and therefore an increase in factor score 1 will increase revenue by .892.
> In addition, the positive loadings for IV1 and IV2 indicate that an
> increase in IV1 and IV2 will cause an increase in revenue. Although the
> negative loading of IV3 indicate that a decrease of IV3 will cause an
> increase in revenue.
> *Is this interpretation correct?*
>
> In addition, I would like to conclude that a one-unit increase of IV1 (IV2
> and IV3) will cause an increase (decrease) in revenue of .???? Is it
> possible to make such an interpretation, and if so how can I do this in
> SPSS??
>
> Thanks in advance for your help!!
>

-----
--
Bruce Weaver
[hidden email]
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

NOTE: My Hotmail account is not monitored regularly.
To send me an e-mail, please use the address shown above.
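
On RuudM123's last question (what a one-unit change in an individual IV implies for revenue): a rough sketch of the usual back-transformation, assuming the component scores are linear combinations of the standardized IVs, as they are for principal components. Each IV's implied coefficient is then its score coefficient times the component's regression coefficient, summed over the retained components. The example below is Python, not SPSS, and uses simulated data with only two IVs and one retained component so the arithmetic can be checked by hand. All names and numbers in it are hypothetical.

```python
import math
import random


def standardize(values):
    """Return z-scores using the sample standard deviation (n - 1)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


def slope_through_origin(x, y):
    """OLS slope of y on one centered predictor x (the intercept is zero)."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


# Hypothetical data: two strongly correlated IVs and a DV that depends on both.
random.seed(1)
n = 200
iv1 = [random.gauss(0, 1) for _ in range(n)]
iv2 = [a + random.gauss(0, 0.3) for a in iv1]
dv = [2.0 * a + 1.0 * b + random.gauss(0, 1) for a, b in zip(iv1, iv2)]

z1, z2, zy = standardize(iv1), standardize(iv2), standardize(dv)
r = sum(a * b for a, b in zip(z1, z2)) / (n - 1)

# With two standardized IVs and r > 0, the first component weights both IVs
# equally and has eigenvalue 1 + r. The score coefficient below rescales the
# component so that its scores have variance 1.
w = 1.0 / math.sqrt(2.0 * (1.0 + r))
score1 = [w * (a + b) for a, b in zip(z1, z2)]

# Regress the standardized DV on the single retained component score.
gamma1 = slope_through_origin(score1, zy)

# Back-transform: implied standardized coefficient of each original IV.
implied = [w * gamma1, w * gamma1]

# The two routes give identical predictions.
pred_from_score = [gamma1 * s for s in score1]
pred_from_ivs = [implied[0] * a + implied[1] * b for a, b in zip(z1, z2)]
print("component beta:", round(gamma1, 3))
print("implied IV coefficients:", [round(b, 3) for b in implied])
```

With three rotated components the same idea applies: multiply the component score coefficient matrix (which FACTOR prints with /PRINT FSCORE, as far as I know) by the vector of factor-score betas. The loadings give the direction of each IV's association with a factor but not these coefficients. The implied coefficients are also constrained by the components that were kept, so they will generally differ from the coefficients of an ordinary regression on the six IVs.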