
Re: Interpretation of principal component regression results

Posted by Rich Ulrich on Jun 22, 2012; 6:03pm
URL: http://spssx-discussion.165.s1.nabble.com/Interpretation-of-principal-component-regression-results-tp5713752p5713764.html

No, the PCA is not the main reason that R2 is so high.

From what you describe, the first three variables are
"measuring" almost the same thing as each other; any
one of them alone will probably give a very high R2, even
ignoring the others.
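
A quick simulated illustration of that point (all numbers invented,
not the poster's data): when three predictors are noisy copies of one
underlying quantity, each one alone already reaches nearly the same
high R2.

```python
# Hypothetical sketch: three near-duplicate predictors each give
# almost the same high R-squared on their own. Pure simulation.
import numpy as np

rng = np.random.default_rng(0)
n = 200
size = rng.normal(10, 2, n)             # latent "size" driving everything
iv1 = size + rng.normal(0, 0.3, n)      # three noisy copies of size
iv2 = size + rng.normal(0, 0.3, n)
iv3 = size + rng.normal(0, 0.3, n)
y = 2 * size + rng.normal(0, 1, n)      # outcome also driven by size

def r_squared(X, y):
    """OLS R-squared with an intercept, via least squares."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))

# Each predictor alone gives roughly the same (high) R-squared
for name, x in [("iv1", iv1), ("iv2", iv2), ("iv3", iv3)]:
    print(name, round(r_squared(x.reshape(-1, 1), y), 3))
```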

But I expect that you ought to go back to the start, and
re-figure the units or measures of your analysis.  Why are
the r's so high among the predictors?  (Can you do
something about that from logic, without the mess of PCA?)

Hmm...  revenue ...  I can imagine where the SIZE of the
enterprises, or what-have-you, is giving big correlations, and
that you could be seeing big r's only as artifacts of failing
to translate your measures to remove the uninteresting
component of size.  Businesses sometimes use "revenue
per square foot" in comparing shops.

Super-high intercorrelation of predictors generally means
that you should be measuring *something* differently, if
you want to interpret something like "separate aspects" of
prediction.  But -- Do you get a simple, interpretive factor
from your PCA?  If the units are the same for two similar
(r > .95) items, you will have a simpler story to tell if
you take their simple average.  And then, you preserve the
"second dimension" of the two by using their difference as
another predictor.  Or, if these were something like "total
floor space" and "storage floor space", you might want to
use one of those scores alone (rather than the average)
and then the difference.  Or, "Percent used as storage"
would show a difference in a less size-dependent way.
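
A sketch of that average-plus-difference idea (simulated data; the
"total" and "storage" names are only stand-ins): replacing two highly
correlated predictors by their mean and their difference leaves the
fit unchanged, since the pair spans the same column space, while the
new predictors are much easier to talk about.

```python
# Illustrative simulation of the average + difference reparameterization.
import numpy as np

rng = np.random.default_rng(1)
n = 200
total = rng.normal(100, 15, n)                  # e.g. total floor space
storage = 0.4 * total + rng.normal(0, 2, n)     # highly correlated with total
y = 0.5 * total + 0.8 * storage + rng.normal(0, 5, n)

def fit_r2(X, y):
    """OLS R-squared with an intercept."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))

# Original, collinear parameterization ...
r2_raw = fit_r2(np.column_stack([total, storage]), y)
# ... versus average and difference: identical fit (same column space)
avg, diff = (total + storage) / 2, total - storage
r2_rep = fit_r2(np.column_stack([avg, diff]), y)
print(round(r2_raw, 6), round(r2_rep, 6))
```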

Of course, going back to what I mentioned before, if
you were predicting Revenues from Floor-space, you might
be better advised to divide Revenues *by* Floor-space in
order to get a size-independent outcome.
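
A toy version of that ratio adjustment (everything here is simulated):
raw revenue tracks floor space almost perfectly, but revenue per
square foot no longer carries the size artifact.

```python
# Simulated shops: dividing by size removes the size artifact.
import numpy as np

rng = np.random.default_rng(2)
n = 300
floor = rng.uniform(500, 5000, n)          # shop size, sq ft
efficiency = rng.normal(50, 5, n)          # revenue per sq ft, size-free
revenue = floor * efficiency

# Raw revenue is dominated by size ...
print(round(np.corrcoef(revenue, floor)[0, 1], 2))
# ... but revenue per square foot is essentially uncorrelated with size
print(round(np.corrcoef(revenue / floor, floor)[0, 1], 2))
```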

I'll repeat:  re-figure your units.  Eliminate "size" as an
artifact, if that's accounting for high r's.  Look at the
measures rationally, and first consider simple averages
or ratios that are familiar or easy to make sense of.

--
Rich Ulrich

> Date: Fri, 22 Jun 2012 00:47:08 -0700

> From: [hidden email]
> Subject: Interpretation of principal component regression results
> To: [hidden email]
>
> Hello,
>
> I have a question about interpreting individual variables in a principal
> component regression (PCR). Because PCR requires a different interpretation
> procedure, I would like to ask how the following results should be
> interpreted.
>
> First it should be noted that I use a metric DV, revenue. In addition, I
> have 6 IVs, all metric.
> A correlation matrix of these 6 IVs indicates very high Pearson correlation
> coefficients, some even above .90.
> In order to remedy the problem of multicollinearity I have used a principal
> component analysis to transform the correlated variables into uncorrelated
> principal components (factor scores) using the VARIMAX rotation method. In
> sum, the 6 IVs can be explained by 3 components.
>
> Next I run an OLS multiple regression, with revenue as the dependent variable and
> the three factor scores as independent. Results indicate significant R2
> changes when a new factor score is added to the first that was included.
> Overall, the model with 3 factor scores shows an adjusted R2 of .875.
> *Is there a reason why this R2 is so high based on the use of PCA?*
>
> In addition, all factor scores have large t-values ranging from 2.875 to
> 14.505 that are significant at p < .01.
>
> Now I come to the point of interpretation, and I understand that
> interpreting beta coefficients will only tell me that a one-unit increase in
> factor 1 will increase revenue by .892. *However, I would like to go further
> and interpret the effects of the individual IVs included in the factors.* I
> think I need the factor loadings in order to do so; the results are
> provided below:
>
> The beta coefficients for the factors are as follows:
> Factor score 1 = .892 (sign at p < .001)
> Factor score 2 = -.246 (sign at p < .01)
> Factor score 3 = .177 (sign at p < .001)
>
> The factor loadings for factor one are as follows:
> IV1 = .971
> IV2 = .985
> IV3 = -.952
>
> Example interpretation: Factor score 1 is positively related to revenue, and
> therefore an increase in factor score 1 will increase revenue by .892. In
> addition, the positive loadings for IV1 and IV2 indicate that an increase in
> IV1 and IV2 will cause an increase in revenue. However, the negative loading
> of IV3 indicates that a decrease in IV3 will cause an increase in revenue.
> *Is this interpretation correct?*
>
> In addition, I would like to conclude that a one-unit increase of IV1 (IV2
> and IV3) will cause an increase (decrease) in revenue of .???? Is it
> possible to make such an interpretation, and if so how can I do this in
> SPSS??
>
> Thanks in advance for your help!!
>
>
>
>
>
>
> --
> View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Interpretation-of-principal-component-regression-results-tp5713752.html
> Sent from the SPSSX Discussion mailing list archive at Nabble.com.
>
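
A minimal sketch of the component-regression workflow the question
describes, in Python rather than SPSS (all data simulated, variable
names invented). It shows how the component coefficients can be chained
back through the loadings to get per-IV coefficients on the
standardized scale. One caveat: with VARIMAX-rotated factor scores, as
in the question, the back-transformation should use the factor score
coefficient matrix rather than these raw eigenvectors.

```python
# Hedged sketch of principal component regression with back-transformation
# to per-predictor coefficients (simulated data only).
import numpy as np

rng = np.random.default_rng(3)
n = 500
latent = rng.normal(size=n)                        # one latent driver
X = np.column_stack([latent + rng.normal(0, 0.2, n) for _ in range(3)])
y = X @ np.array([1.0, 0.5, -0.5]) + rng.normal(0, 1, n)

# Standardize, then PCA via eigenvectors of the correlation matrix
Z = (X - X.mean(0)) / X.std(0)
eigval, eigvec = np.linalg.eigh(np.corrcoef(Z, rowvar=False))
order = np.argsort(eigval)[::-1]
V = eigvec[:, order][:, :1]                        # keep the first component
scores = Z @ V

# OLS of y on the retained component score
beta, *_ = np.linalg.lstsq(
    np.column_stack([np.ones(n), scores]), y, rcond=None)
gamma = beta[1:]                                   # component coefficient

# Per-IV coefficients on the standardized scale: chain rule through V,
# since y ~ a + (Z @ V) @ gamma  =  a + Z @ (V @ gamma)
b_std = V @ gamma
print(np.round(b_std, 2))
```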