Interpretation of principal component regression results


Interpretation of principal component regression results

RuudM123
Hello,

I have a question about interpreting individual variables when using a principal components regression (PCR) method. Because PCR requires a different interpretation procedure, I would like to ask how the following information should be interpreted.

First, it should be noted that I use a metric DV, revenue, and 6 IVs, all metric.
A correlation matrix of these 6 IVs shows very high Pearson correlation coefficients, some above .90.
To remedy the multicollinearity, I used a principal component analysis to transform the correlated variables into uncorrelated principal components (factor scores), with the VARIMAX rotation method. In sum, the 6 IVs can be explained by 3 components.

Next, I ran an OLS multiple regression with revenue as the dependent variable and the three factor scores as predictors. Results indicate significant R2 changes as each additional factor score is entered after the first. Overall, the model with 3 factor scores shows an adjusted R2 of .875.
Is there a reason why this R2 is so high, based on the use of PCA?

In addition, all factor scores have large t-values, ranging from 2.875 to 14.505, that are significant at p < .01.

Now I come to the point of interpretation. I understand that the beta coefficients will only tell me that a one-unit increase in factor 1 increases revenue by .892. However, I would like to go further and interpret the effects of the individual IVs included in the factors. I thought I would need the factor loadings to do so; the results are provided below:

The beta coefficients for the factors are as follows:
Factor score 1 = .892 (significant at p < .001)
Factor score 2 = -.246 (significant at p < .01)
Factor score 3 = .177 (significant at p < .001)

The factor loadings for factor one are as follows:
IV1 = .971
IV2 = .985
IV3 = -.952

Example interpretation: Factor score 1 is positively related to revenue, so an increase in factor score 1 will increase revenue by .892. In addition, the positive loadings for IV1 and IV2 indicate that an increase in IV1 or IV2 will cause an increase in revenue, while the negative loading for IV3 indicates that a decrease in IV3 will cause an increase in revenue. Is this interpretation correct?

In addition, I would like to conclude that a one-unit increase in IV1 (IV2, IV3) will cause an increase (decrease) in revenue of .???? Is it possible to make such an interpretation, and if so, how can I do this in SPSS?

Thanks in advance for your help!!
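
For concreteness, here is a minimal sketch of the pipeline described above in SPSS syntax. It is a sketch only: the variable names (revenue, iv1 to iv6) are placeholders, and it assumes three retained components with the default regression-method factor scores.

* PCA on the six predictors with VARIMAX rotation; /SAVE=REG saves
* regression-method factor scores as FAC1_1 to FAC3_1.
FACTOR
  /VARIABLES=iv1 iv2 iv3 iv4 iv5 iv6
  /PRINT=INITIAL EXTRACTION ROTATION FSCORE
  /CRITERIA=FACTORS(3)
  /EXTRACTION=PC
  /ROTATION=VARIMAX
  /SAVE=REG(ALL).

* OLS regression of revenue on the saved scores, entered one block at
* a time so /STATISTICS=CHA reports the R2 change per component.
REGRESSION
  /STATISTICS=COEFF R ANOVA CHA
  /DEPENDENT=revenue
  /METHOD=ENTER FAC1_1
  /METHOD=ENTER FAC2_1
  /METHOD=ENTER FAC3_1.

The FSCORE keyword prints the factor score coefficient matrix, which is what is needed later in this thread to push the interpretation down to the individual IVs.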





Re: Interpretation of principal component regression results

Bruce Weaver
Administrator
In reply to this post by RuudM123
I don't have time to tackle your questions.  This post is just to acquaint you with an article by Hadi & Ling (1998) that highlights potential problems with PC regression.  If you have institutional access to JSTOR, you can download it here:

   http://www.jstor.org/stable/10.2307/2685559

For those without access, here's the abstract.

Many textbooks on regression analysis include the methodology of principal components regression (PCR) as a way of treating multicollinearity problems. Although we have not encountered any strong justification of the methodology, we have encountered, through carrying out the methodology in well-known data sets with severe multicollinearity, serious actual and potential pitfalls in the methodology. We address these pitfalls as cautionary notes, numerical examples that use well-known data sets. We also illustrate by theory and example that it is possible for the PCR to fail miserably in the sense that when the response variable is regressed on all of the p principal components (PCs), the first (p - 1) PCs contribute nothing toward the reduction of the residual sum of squares, yet the last PC alone (the one that is always discarded according to PCR methodology) contributes everything. We then give conditions under which the PCR totally fails in the above sense.

HTH.

--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING: 
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).

Re: Interpretation of principal component regression results

Poes, Matthew Joseph
In reply to this post by RuudM123
Your interpretation is roughly correct.  I might suggest that you consider the CATREG procedure in SPSS instead, though.  It will let you deal with the multicollinearity using ridge regression, the lasso, or the elastic net for regularization, and it doesn't suffer from the problems mentioned in the article Bruce Weaver shared.  You can look at the variables independently or, if you prefer, in factor form (you need to do that beforehand, just as you did with the PCR).

The coefficients can be thought of as standardized coefficients in this case, so you can treat them as standard deviation units.  If you can figure out what 1 standard deviation is in the original units, you simply multiply the two.  As far as I know, there is no way to force SPSS to spit that out for you; you will need to hand-calculate the values.  However, you can't really make as clear an interpretation as you hope with the principal components you have created; all you can easily say is that a 1-unit increase in that component is equal to a .892*(SD) change in revenue.  It always ends up a bit more ambiguous.

Hope that helps.
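
To illustrate the hand calculation: regression-method factor scores are linear combinations of the standardized IVs, so multiplying the 6x3 factor score coefficient matrix (printed by FACTOR with /PRINT=FSCORE) by the 3x1 vector of estimated component coefficients gives the implied per-SD coefficient of each original IV; multiplying further by the SD of revenue converts a standardized beta to raw revenue units.  Here is a sketch in SPSS MATRIX syntax; the score coefficients in W are made-up placeholders, not real output.

MATRIX.
* W = factor score coefficients (rows IV1 to IV6, columns components
* 1 to 3); replace these placeholder values with your FSCORE table.
COMPUTE W = { .35,  .02, -.01;
              .36,  .01,  .00;
             -.33,  .03,  .02;
              .01,  .48, -.05;
              .02,  .47,  .04;
              .00,  .05,  .95}.
* G = estimated coefficients for the three factor scores.
COMPUTE G = {.892; -.246; .177}.
* Implied coefficient of each standardized IV.
COMPUTE BSTD = W * G.
PRINT BSTD /TITLE='Implied per-SD coefficients, IV1 to IV6'.
END MATRIX.

Even then, a "holding the other IVs fixed" reading remains strained when the raw IVs correlate above .9, which is the ambiguity noted above.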

Matthew J Poes
Research Data Specialist
Center for Prevention Research and Development
University of Illinois
510 Devonshire Dr.
Champaign, IL 61820
Phone: 217-265-4576
email: [hidden email]



Re: Interpretation of principal component regression results

Jon K Peck
In reply to this post by Bruce Weaver
You might want to consider Partial Least Squares for this situation.  That is available as an extension command for Statistics.  Or perhaps ridge, lasso, or elastic net regression, available in the Categories option.

Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
new phone: 720-342-5621





Re: Interpretation of principal component regression results

David Marso
Administrator
In reply to this post by RuudM123
I thought people (should have) stopped doing this sort of thing about 25 years ago in favor of SEM?
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"

Re: Interpretation of principal component regression results

Poes, Matthew Joseph
In reply to this post by RuudM123
I missed this initially in reviewing your email.  I think you should carefully review your steps and be sure you haven't done something silly.  The large R2 and large t-values all suggest that you may be trying to predict your DV with an IV, or set of IVs, that are essentially the same thing as the DV.  The amount of variance explained certainly isn't impossible, but it's high enough that I would start checking.  An R2 that large implies a multiple correlation of over .9.  Any chance you included the DV, or something nearly equal to the DV, somewhere in there?  You didn't include it in the PCA, right?  It seems to me that the correlations among all the IVs, and even with the DV, are very high.  While this could be great (you have found the perfect predictors), it could also mean they are all essentially measuring the same thing.  For instance, if you know all the factors associated with someone's paid taxes, you can predict their salary to a very high degree, but they are so interrelated, what's the point?  If you know someone's BMI and height, you can predict their weight to a very high degree; again, not really so useful.  Just make sure you haven't done something like that.
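
A quick way to run that check in SPSS (a sketch; revenue and iv1 to iv6 stand in for whatever your variables are actually called):

* Raw correlations of the DV with each IV, before any PCA.
CORRELATIONS
  /VARIABLES=revenue WITH iv1 iv2 iv3 iv4 iv5 iv6
  /PRINT=TWOTAIL.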

Matthew J Poes
Research Data Specialist
Center for Prevention Research and Development
University of Illinois
510 Devonshire Dr.
Champaign, IL 61820
Phone: 217-265-4576
email: [hidden email]




FW: Interpretation of principal component regression results

Anthony Babinec
In reply to this post by RuudM123
In a nutshell, the problem with principal components regression is that
the principal components are formed without taking into account the
association between the predictors and the target variable. As others
have noted, you might consider PLS or methods such as ridge regression,
the lasso, or the elastic net. For a reference on these, see The Elements
of Statistical Learning 2nd edition by Hastie, Tibshirani, and Friedman.
As you learn about these methods, you need to consider whether
standardizing the variables makes a difference in the answer you get.
A newer method that works well in your situation is correlated component
regression. This method is implemented in CORExpress and in the Excel
add-in XLSTAT. For tutorials and background papers, see the Statistical
Innovations website.

Tony Babinec
[hidden email]


Re: Interpretation of principal component regression results

Rich Ulrich
In reply to this post by RuudM123
No, the PCA is not the main reason that R2 is so high.

From what you describe, the first three variables are
"measuring" almost the same thing as each other; and any
one of them is going to (probably) give a very high R2, even
ignoring the others.

But I expect that you ought to go back to the start, and
re-figure the units or measures of your analysis.  Why are
the r's so high among the predictors?  (Can you do
something about that from logic, without the mess of PCA?)

Hmm...  revenue ...  I can imagine where the SIZE of the
enterprises, or what-have-you, is giving big correlations, and
that you could be seeing big r's only as artifacts of failing
to translate your measures to remove the uninteresting
component of size.  Businesses sometimes use "revenue
per square foot" in comparing shops.

Super-high intercorrelation of predictors generally means
that you should be measuring *something* differently, if
you want to interpret something like "separate aspects" of
prediction.  But do you get a simple, interpretable factor
from your PCA?  If the units are the same for two similar
(r > .95) items, you would have a simpler story to tell if
you took their simple average.  And then you preserve the
"second dimension" of the two by using their difference as
another predictor.  Or, if these were something like "total
floor space" and "storage floor space", you might want to
use one of those scores alone (rather than the average)
and then the difference.  Or, "percent used as storage"
would show the difference in a less size-dependent way.

Of course, going back to what I mentioned before, if
you were predicting Revenues from Floor-space, you might
be better advised to divide Revenues *by* Floor-space in
order to get a size-independent outcome.

I'll repeat:  re-figure your units.  Eliminate "size" as an
artifact, if that's accounting for high r's.  Look at the
measures rationally and first consider simple averages
or ratios that are familiar or that it would be easy to make
sense of. 
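
For example, re-expressions of that kind are one-liners in SPSS syntax (a sketch only; every variable name here is hypothetical):

* Strip out "size" as the shared artifact.
COMPUTE rev_per_sqft = revenue / total_sqft.
COMPUTE space_avg    = MEAN(total_sqft, storage_sqft).
COMPUTE space_diff   = total_sqft - storage_sqft.
COMPUTE pct_storage  = 100 * storage_sqft / total_sqft.
EXECUTE.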

--
Rich Ulrich


Re: Interpretation of principal component regression results

RuudM123
In reply to this post by RuudM123
Dear All,

Thank you for all your explanatory comments. I changed my dependent variable for one that doesn't correlate so highly with the independent variables, and the results are good.

For illustration purposes, I would like to make sure that I am not doing anything wrong, so I want to ask the following question about PCR:

Results:
Factor 1 - beta coefficient of .452, significant at p < .001
IV1 - factor loading of .985
IV3 - factor loading of -.952

Factor 2 - beta coefficient of -.456, significant at p < .001
IV5 (interaction of IV1*IV3) - factor loading of .854

Conclusions:
- IV1 (in factor 1) has a significant beta coefficient of .452 (for the factor) and a factor loading of .985 (for the variable), which means that IV1 has a positive linear relation with the dependent.

- IV3 (in factor 1) has a significant beta coefficient of .452 (for the factor) and a factor loading of -.952 (for the variable), which means that IV3 has a negative linear relation with the dependent.

- And IV5, the interaction term IV1*IV3 (in factor 2), has a significant beta coefficient of -.456 (for factor 2) and a factor loading of .854 (for the variable). Because IV5 is the interaction of IV1 and IV3, does this mean that the relation of IV1 to the dependent starts off positive and significant, but as IV3 increases, that relation weakens (at a negative rate that PCR cannot specify for IV5)? That would mean IV3 has a moderating effect: as it increases, it shrinks the direct relation of IV1 to the dependent.

Do you have any elaborating comments on this final interpretation? I am not 100% sure my interpretation is correct, hence this final question.

Thank you very much in advance.

Re: Interpretation of principal component regression results

Willbaileyz @ E
In reply to this post by RuudM123
Getting in at the end of this thread, so perhaps my question isn't appropriate, but... You said:

"I changed my dependent for one that didn't correlate so high with the
independent variables and results are good."

This sounds to me like you changed your hypothesis so as to get the results you were looking for, or at least to get 'better' results? That is not a viable research approach, so please explain your reasoning and its soundness.

W


Re: Interpretation of principal component regression results

Rich Ulrich
In reply to this post by RuudM123
Okay.  It sounds like you took my recommendation of
examining the units and simplifying the problem.  You
originally had predictors 1-6, and high intercorrelations.
Now you have two predictor variables, created from IV1
and IV3, plus their interaction; and these are entered and
considered through their role in factors ... which (?) include
some fairly trivial weights for the other original IVs?

No, I would not be satisfied with that description.
What happened to the other variables?  Should you
focus on saying something -- or, as a first step,
*describing* what you have -- in the best terms that
you can, for IV1 and IV3?  (And incorporate everything
else as a subsequent step.)  What you have presented
does not seem like a revealing description, even if you
could fill in the terms.

I'd say that it is pretty much impossible to use coefficients
alone to interpret the effects of IV1, IV3, and their interaction,
given the *high* correlation between them.  "Scaling"
peculiarities (like basement or ceiling effects, or other
differences in intervals) could account for a vast range
of results.  Explore this.  Look at outcomes for various
explicit combinations, and look at the *fitted*, predicted
outcomes similarly.  This will show you what you are asking
for: a description of what you have.  It may also suggest,
through systematic deviations of fit, where there is still a
problem of fit.

It might be possible to advise more concretely if you
described your variables concretely.

--
Rich Ulrich

