
Re: Interpretation of principal component regression results

Posted by Jon K Peck on Jun 22, 2012; 1:29pm
URL: http://spssx-discussion.165.s1.nabble.com/Interpretation-of-principal-component-regression-results-tp5713752p5713758.html

You might want to consider Partial Least Squares for this situation.  That is available as an extension command for Statistics.  Or perhaps ridge, lasso or elastic net regression available in the Categories option.

Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
new phone: 720-342-5621




From:        Bruce Weaver <[hidden email]>
To:        [hidden email]
Date:        06/22/2012 07:11 AM
Subject:        Re: [SPSSX-L] Interpretation of principal component regression results
Sent by:        "SPSSX(r) Discussion" <[hidden email]>




I don't have time to tackle your questions below.  This post is just to
acquaint you with an article by Hadi & Ling (1998) that highlights potential
problems with PC regression.  If you have institutional access to JSTOR, you
can download it here:

 
http://www.jstor.org/stable/10.2307/2685559

For those without access, here's the abstract.

Many textbooks on regression analysis include the methodology of principal
components regression (PCR) as a way of treating multicollinearity problems.
Although we have not encountered any strong justification of the
methodology, we have encountered, through carrying out the methodology in
well-known data sets with severe multicollinearity, serious actual and
potential pitfalls in the methodology. We address these pitfalls as
cautionary notes, numerical examples that use well-known data sets. We also
illustrate by theory and example that it is possible for the PCR to fail
miserably in the sense that when the response variable is regressed on all
of the p principal components (PCs), the first (p - 1) PCs contribute
nothing toward the reduction of the residual sum of squares, yet the last PC
alone (the one that is always discarded according to PCR methodology)
contributes everything. We then give conditions under which the PCR totally
fails in the above sense.
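
The failure mode described in the abstract can be reproduced with a small constructed example. What follows is only a minimal sketch in plain Python and is not taken from Hadi & Ling: the data (`u`, `v`), the two predictors `x1` and `x2`, and the helper names are all hypothetical, chosen so that the response depends only on the small `x1 - x2` contrast while the predictors are almost perfectly correlated.

```python
import math

def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))

def r_squared(y, x):
    """R^2 from a simple regression of centered y on one centered predictor."""
    return dot(x, y) ** 2 / (dot(x, x) * dot(y, y))

# Hypothetical, already-centered data: u is a large shared signal and
# v is a small contrast that is orthogonal to u.
u = [-3.0, -1.0, 1.0, 3.0, -3.0, -1.0, 1.0, 3.0]
v = [0.1, -0.1, -0.1, 0.1, -0.1, 0.1, 0.1, -0.1]

x1 = [a + b for a, b in zip(u, v)]   # two nearly collinear predictors
x2 = [a - b for a, b in zip(u, v)]
y = list(v)                          # response driven only by x1 - x2

# Sample covariance matrix of (x1, x2).
n = len(y)
s11 = dot(x1, x1) / (n - 1)
s22 = dot(x2, x2) / (n - 1)
s12 = dot(x1, x2) / (n - 1)
corr_x = s12 / math.sqrt(s11 * s22)

# Closed-form eigenvalues of the symmetric 2x2 covariance matrix.
mean_var = (s11 + s22) / 2
half_gap = math.sqrt(((s11 - s22) / 2) ** 2 + s12 ** 2)
eigenvalues = [mean_var + half_gap, mean_var - half_gap]

def pc_scores(lam):
    """Scores on the principal component with eigenvalue lam."""
    w1, w2 = s12, lam - s11          # unnormalized eigenvector
    norm = math.hypot(w1, w2)
    return [(w1 * a + w2 * b) / norm for a, b in zip(x1, x2)]

share_first = eigenvalues[0] / sum(eigenvalues)
r2_first = r_squared(y, pc_scores(eigenvalues[0]))
r2_last = r_squared(y, pc_scores(eigenvalues[1]))

print(f"corr(x1, x2)              = {corr_x:.3f}")
print(f"variance share of 1st PC  = {share_first:.3f}")
print(f"R^2 of y on first PC only = {r2_first:.6f}")
print(f"R^2 of y on last PC only  = {r2_last:.6f}")
```

In this constructed case the first PC carries nearly all of the predictor variance yet explains essentially none of the response, while the last PC, the one PCR would discard, explains essentially all of it. Real data will rarely be this extreme; the point is only that the variance a component explains among the predictors says nothing about its relevance to the response.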

HTH.


RuudM123 wrote
>
> Hello,
>
> I have a question about interpreting the individual variables in a
> principal component regression (PCR). Because PCR requires a different
> interpretation procedure, I would like to ask how the following
> information should be interpreted.
>
> First, it should be noted that I use a metric DV, revenue. In addition, I
> have 6 IVs, all metric.
> A correlation matrix of these 6 IVs shows very high Pearson correlation
> coefficients, even above .90.
> To remedy the problem of multicollinearity, I have used a
> principal component analysis to transform the correlated variables into
> uncorrelated principal components (factor scores) using the VARIMAX
> rotation method. In sum, the 6 IVs can be explained by 3 components.
>
> Next, I ran an OLS multiple regression with revenue as the dependent
> variable and the three factor scores as independent variables. The results
> indicate significant R2 changes when a new factor score is added to the
> first that was included. Overall, the model with 3 factor scores shows an
> adjusted R2 of .875.
> *Is there a reason why this R2 is so high based on the use of PCA?*
>
> In addition, all factor scores have large t-values ranging from 2.875 to
> 14.505 that are significant at p < .01.
>
> Now I come to the point of interpretation. I understand that
> interpreting beta coefficients will only tell me that a one-unit increase
> in factor 1 will increase revenue by .892. *However, I would like to go
> further and interpret the effect of the individual IVs included in the
> factors.* I thought that I would need the factor loadings to do so; the
> results are provided below:
>
> The beta coefficients for the factors are as follows:
> Factor score 1 = .892 (significant at p < .001)
> Factor score 2 = -.246 (significant at p < .01)
> Factor score 3 = .177 (significant at p < .001)
>
> The factor loadings for factor one are as follows:
> IV1 = .971
> IV2 = .985
> IV3 = -.952
>
> Example interpretation: Factor score 1 is positively related to revenue,
> and therefore an increase in factor score 1 will increase revenue by .892.
> In addition, the positive loadings for IV1 and IV2 indicate that an
> increase in IV1 and IV2 will cause an increase in revenue. However, the
> negative loading of IV3 indicates that a decrease in IV3 will cause an
> increase in revenue. *Is this interpretation correct?*
>
> In addition, I would like to conclude that a one-unit increase of IV1 (IV2
> and IV3) will cause an increase (decrease) in revenue of .???? Is it
> possible to make such an interpretation, and if so how can I do this in
> SPSS??
>
> Thanks in advance for your help!!
>
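
On the last question above (how much revenue changes per unit of an individual IV), one common approach is to push the component coefficients back through the weights that define the scores. If the saved scores are linear combinations of the standardized IVs, F = Z W, then regressing on F with coefficients g implies coefficients W g on the standardized IVs. The sketch below is plain Python and rests on several assumptions: the three factor-score coefficients are the ones quoted in the post and are taken to be standardized betas; the score coefficient matrix (the weights used to compute the scores, which is not the same thing as the loadings) and all standard deviations are hypothetical placeholders, since the post gives none of them.

```python
# Regression coefficients for the three factor scores, as quoted in the post
# (assumed here to be standardized betas).
gamma = [0.892, -0.246, 0.177]

# HYPOTHETICAL component score coefficient matrix: one row per IV (IV1..IV6),
# one column per component. These are the weights used to build the scores
# from the standardized IVs, not the loadings. Replace with real values.
score_coef = [
    [0.35, 0.02, -0.01],
    [0.36, -0.03, 0.02],
    [-0.34, 0.01, 0.03],
    [0.01, 0.52, -0.02],
    [-0.02, 0.51, 0.04],
    [0.03, -0.01, 0.98],
]

def implied_betas(weights, coefs):
    """Coefficients on the standardized IVs implied by the component fit (W g)."""
    return [sum(w * g for w, g in zip(row, coefs)) for row in weights]

betas_z = implied_betas(score_coef, gamma)

# HYPOTHETICAL standard deviations, used only to rescale from standardized
# units to raw units: change in revenue per one raw unit of each IV.
sd_revenue = 1200.0
sd_iv = [4.0, 2.5, 10.0, 0.8, 15.0, 3.0]
betas_raw = [b * sd_revenue / s for b, s in zip(betas_z, sd_iv)]

for i, (bz, br) in enumerate(zip(betas_z, betas_raw), start=1):
    print(f"IV{i}: standardized = {bz:+.4f}, raw units = {br:+.2f}")
```

Coefficients recovered this way describe the fitted three-component model, so they are generally not the same as the OLS coefficients from regressing revenue on all six IVs directly, and they carry the usual caution against reading them causally.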


-----
--
Bruce Weaver
[hidden email]
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

NOTE: My Hotmail account is not monitored regularly.
To send me an e-mail, please use the address shown above.

