Dear All,
I have started to look at PCA in SPSS and have a question about interpreting some of the output and how it relates to the "mathematical theory". I have the following (greatly simplified) definition of PCA for the first component:

For an (n x p) matrix of raw (unstandardised) data X, the 1st PC can be given by

Y = a1X1 + a2X2 + ... + apXp

where Y is an n x 1 matrix (of new component scores?), ai is an element of the eigenvector corresponding to the largest eigenvalue of the correlation matrix, and Xi is the n x 1 vector corresponding to column i of the data matrix X.

Then I run PCA (unrotated) in SPSS and get, amongst other things, (1) the (loadings) Component Matrix, (2) the Component Score Coefficient Matrix, and (3) the factor scores, which are saved in my SPSS data sheet.

So basically, my question is: where in the SPSS output, if anywhere, are my Y matrix from the above definition and my weighting values ai?

Am I right in saying that my definition relates to the raw (unstandardised) data, but the SPSS output relates to standardised data?

Am I right in saying that Y, as defined above, is not displayed in SPSS, but the normalised Y matrix is? And that this normalised Y matrix is in fact displayed as fac1_1 in my SPSS data sheet?

Am I right in saying that the (loadings) Component Matrix is a normalised version of my eigenvector matrix, and that if I divide each loading by sqrt(eigenvalue), I will get my eigenvector elements, ai, above? I don't think this is correct, because it is the Component Score Coefficient Matrix that is used to calculate fac1_1 etc.

I am confused. Sorry this is so long; I am just trying to straighten it out in my head.

Help appreciated.
Barry
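A minimal sketch of the definition above, assuming the variables are first standardized (which is what SPSS works with) and using Python with NumPy on made-up data; the data, the variable names and the 50 x 3 size are all hypothetical. It takes the eigenvector for the largest eigenvalue of the correlation matrix as the weights ai, forms Y = a1X1 + ... + apXp with each Xi a column, and shows that Y has mean 0 and variance equal to that eigenvalue, i.e. it is an unnormalised component score.

```python
import numpy as np

# Hypothetical data: n = 50 cases, p = 3 correlated variables (not from the thread).
rng = np.random.default_rng(0)
mix = np.array([[1.0, 0.5, 0.2],
                [0.0, 1.0, 0.4],
                [0.0, 0.0, 1.0]])
raw = rng.normal(size=(50, 3)) @ mix

# Standardize each column with the N-1 standard deviation.
Z = (raw - raw.mean(axis=0)) / raw.std(axis=0, ddof=1)

# Eigendecomposition of the correlation matrix, sorted by decreasing eigenvalue
# (numpy.linalg.eigh returns them in ascending order).
R = np.corrcoef(raw, rowvar=False)
eigval, eigvec = np.linalg.eigh(R)
order = np.argsort(eigval)[::-1]
eigval, eigvec = eigval[order], eigvec[:, order]

# First PC as in the definition: Y = a1*X1 + ... + ap*Xp, with Xi = column i of Z.
a = eigvec[:, 0]
Y = Z @ a

print("mean of Y:", Y.mean())
print("variance of Y:", Y.var(ddof=1), "largest eigenvalue:", eigval[0])
```

Dividing this Y by the square root of the largest eigenvalue would give a score with unit variance.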
Barry:
The ai's that you are referring to are the Factor Score coefficients as displayed in the Factor Score Coefficient Matrix. You are right that these ai's are applied to the standardized X's, so if you need to relate the unstandardized original variables to your factor scores, you need to divide the ai's by the standard deviation of the corresponding original X variables.
Dan

>From: Barry <[hidden email]>
>Subject: Interpretation of PCA
>Date: Wed, 7 Mar 2007 06:51:05 -0500
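A sketch of Dan's point in Python/NumPy on hypothetical data; it does not call SPSS. It assumes that, for an unrotated PCA, the Component Score Coefficient Matrix equals the inverse correlation matrix times the loadings (the regression method), which reduces to eigenvector / sqrt(eigenvalue). Dividing each coefficient by its variable's standard deviation then reproduces the scores from the raw variables, provided a constant (the means weighted by the rescaled coefficients) is also subtracted, a detail the reply leaves implicit.

```python
import numpy as np

# Hypothetical raw data on very different scales (not from the thread).
rng = np.random.default_rng(1)
raw = rng.normal(loc=[10.0, 50.0, 3.0], scale=[2.0, 8.0, 0.5], size=(40, 3))
raw[:, 1] += 3.0 * raw[:, 0]            # make two of the variables correlated

mean = raw.mean(axis=0)
sd = raw.std(axis=0, ddof=1)
Z = (raw - mean) / sd                   # standardized X's

R = np.corrcoef(raw, rowvar=False)
eigval, eigvec = np.linalg.eigh(R)
order = np.argsort(eigval)[::-1]
eigval, eigvec = eigval[order], eigvec[:, order]

loadings = eigvec * np.sqrt(eigval)     # Component Matrix
# Assumption: score coefficients = R^-1 * loadings (regression method).
coef = np.linalg.solve(R, loadings)

scores = Z @ coef                       # analogue of fac1_1, fac2_1, ...

# The same scores from the raw variables: divide each coefficient by the SD
# of its variable, and subtract a constant that absorbs the means.
raw_coef = coef / sd[:, None]
constant = mean @ raw_coef
scores_from_raw = raw @ raw_coef - constant

print(np.allclose(scores, scores_from_raw))
```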
In reply to this post by Barry-43
PCA works on standardized data. If you save standardized variables, using Descriptives, in a data file called tmp.sav, you can run the syntax below to see how PCA works, either on the standardized data using singular value decomposition, or on the correlation matrix using eigenvalue decomposition. The results of both are equal up to a possible difference in sign for components.

The loadings are in the component matrix. The component (factor) scores (what you call Y) are not displayed but can be saved as variables. The component score coefficient matrix displays the regression coefficients for the regression of a component score on the standardized variables.

Regards,
Anita van der Kooij
Data Theory Group
Leiden University

DESCRIPTIVES VARIABLES= varlist /SAVE.
* Save only the Z variables created above to tmp.sav before running MATRIX.

MATRIX.
* PCA on standardized data (normalized on 1 instead of N-1) (SVD) *.
get zdata /file = 'c:\path\tmp.sav'.
compute N = NROW(zdata).
compute M = NCOL(zdata).
* Scale the z-scores so that each column has unit sum of squares *.
compute zdata = zdata / SQRT(N-1).
* zdata = K * singval * T(L): K left, L right singular vectors *.
CALL SVD (zdata, K, singval, L).
compute singval = singval(1:m,1:m).
compute eigval = singval**2.
compute load = L * singval.
* Rescale the scores to unit variance *.
compute fscores = K( : ,1:m) * SQRT(N-1).
print eigval.
print load.
print fscores.
END MATRIX.

MATRIX.
* PCA on correlation matrix (EVD) *.
get zdata /file = 'c:\path\tmp.sav'.
compute N = NROW(zdata).
compute M = NCOL(zdata).
compute zdata = zdata / SQRT(N-1).
* With unit-sum-of-squares columns, T(zdata) * zdata is the correlation matrix *.
compute R = T(zdata) * zdata.
CALL EIGEN (R, L, eigval).
compute eigval = MDIAG(eigval).
compute load = L * SQRT(eigval).
* Recover the left singular vectors from the eigenvectors *.
compute K = (zdata * L) * INV(SQRT(eigval)).
compute fscores = K( : ,1:m) * SQRT(N-1).
print eigval.
print load.
print fscores.
END MATRIX.

**********************************************************************
This email and any files transmitted with it are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you have received this email in error please notify the system manager.
**********************************************************************
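For readers without SPSS, here is a rough NumPy translation of the two MATRIX programs above, run on made-up data instead of tmp.sav (the data and the variable names are assumptions). It checks the claim that the SVD route and the correlation-matrix (EVD) route give the same eigenvalues, and the same loadings and scores up to the sign of each component. numpy.linalg.eigh returns eigenvalues in ascending order, so they are re-sorted to descending order to line up with the SVD.

```python
import numpy as np

# Hypothetical data standing in for the Z variables in tmp.sav.
rng = np.random.default_rng(2)
raw = rng.normal(size=(30, 4))
raw[:, 1] += raw[:, 0]
raw[:, 3] -= 0.5 * raw[:, 2]

N, M = raw.shape
Z = (raw - raw.mean(axis=0)) / raw.std(axis=0, ddof=1)  # z-scores (N-1 SD)
zdata = Z / np.sqrt(N - 1)                              # unit sum of squares

# Route 1: SVD of the scaled data, zdata = K @ diag(singval) @ Lt.
K, singval, Lt = np.linalg.svd(zdata, full_matrices=False)
eig_svd = singval ** 2
load_svd = Lt.T * singval
fscores_svd = K * np.sqrt(N - 1)

# Route 2: eigendecomposition of the correlation matrix.
R = zdata.T @ zdata
eig_evd, L = np.linalg.eigh(R)
order = np.argsort(eig_evd)[::-1]           # descending, to match the SVD
eig_evd, L = eig_evd[order], L[:, order]
load_evd = L * np.sqrt(eig_evd)
K_evd = (zdata @ L) / np.sqrt(eig_evd)
fscores_evd = K_evd * np.sqrt(N - 1)

# Align the arbitrary sign of each component before comparing.
signs = np.sign(np.sum(load_svd * load_evd, axis=0))
print(np.allclose(eig_svd, eig_evd))
print(np.allclose(load_svd, load_evd * signs))
print(np.allclose(fscores_svd, fscores_evd * signs))
```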
In reply to this post by Barry-43
Dear Dan and Anita,
Thank you for taking the time to reply. However, having read a bit more, I want to question this. I've been reading Dunteman's PCA book and still haven't got this right in my head. This is what I understand; please correct me where I am wrong.

Assumption: we are using standardised data in this analysis.

Using the correlation matrix, our aim is initially to calculate the eigenvalues (lambda{i}) and corresponding eigenvectors (a{i}) of the correlation matrix. The eigenvalues identify the amount of variance accounted for by each PC. The eigenvectors (the a{i}'s) are the weightings of the principal components and can be used in the linear expression for each PC to determine the PC scores. By the scores, I mean the values of the new components that are going to replace our original correlated variables, and that can be used in any future analysis instead of the original variables. To obtain the PC loadings, we multiply a{i} by the square root of lambda{i}.

Now, in SPSS, one of the outputs is the loadings matrix (given in SPSS as the Component Matrix). I'm assuming that this is equal to the PC loadings I mentioned above. So in theory, if I divide each of these loadings by the square root of the corresponding eigenvalue (that is, lambda{i}), then I get the eigenvectors (the a{i}'s I am talking about above). But SPSS doesn't actually display these eigenvectors (and hence does not specify the weightings used in the linear expression for each PC). The expression I mean is of the form (for example, for the 1st component)

Y{1} = a{11}X1 + a{12}X2 + ... for however many variables we have.

SPSS does, however, display things called Component Score coefficients (in the Component Score Coefficient Matrix), and it is these that are used to calculate the component scores (according to SPSS and, I think, what others have said), which can be saved into the SPSS worksheet.
However, as far as I understand (and can see), these Component Score coefficients are not the same as the a{i}'s (the eigenvectors, or weightings) used in the linear expression for each PC. So the component scores calculated in SPSS are not the same as the PC scores I am talking about above. It is this that is causing the confusion in my head. Can you please explain what I am misunderstanding in the above?

Many thanks.
Barry
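A sketch, in NumPy on hypothetical data, of the relationships listed above, assuming unrotated PCA on the correlation matrix and assuming the score coefficients are eigenvector / sqrt(eigenvalue) (equivalently loading / eigenvalue). It shows that dividing the loadings by sqrt(eigenvalue) does recover the eigenvectors, and that scores built with the eigenvector weights differ from the standardized scores only by the factor sqrt(eigenvalue) per component, so both describe the same components.

```python
import numpy as np

# Hypothetical data (not from the thread).
rng = np.random.default_rng(3)
raw = rng.normal(size=(60, 3))
raw[:, 2] += 0.8 * raw[:, 0] - 0.3 * raw[:, 1]
Z = (raw - raw.mean(axis=0)) / raw.std(axis=0, ddof=1)

R = np.corrcoef(raw, rowvar=False)
eigval, a = np.linalg.eigh(R)
order = np.argsort(eigval)[::-1]
eigval, a = eigval[order], a[:, order]

loadings = a * np.sqrt(eigval)            # Component Matrix
recovered_a = loadings / np.sqrt(eigval)  # back to the eigenvectors
coef = a / np.sqrt(eigval)                # assumed score coefficients

Y = Z @ a                                 # eigenvector-weighted PC scores
fac = Z @ coef                            # standardized scores (like fac1_1, ...)

print(np.allclose(recovered_a, a))
print(np.allclose(fac, Y / np.sqrt(eigval)))   # same components, rescaled
```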
>Please correct me in places where I am wrong.
Corrections inserted below.
Regards, Anita

>On using the correlation matrix, our aim is initially to calculate the
>eigenvalues (lambda{i}) and corresponding eigenvectors (a{i}) of the
>correlation matrix. The eigenvalues identify the amount of variance
>accounted for by each PC.

Yes.

>The eigenvectors (the a{i}'s) are the weightings
>of the principal components and can be used in the linear expressions of
>each PC to determine the PC scores.

No, the loadings are the weights.

>By the scores, I mean the values of
>the new components that are going to be used to replace our original number
>of correlated variables, and that can be used in any future analysis
>instead of the original variables. To obtain the PC loadings, we multiply
>a{i} by the sq root of lambda{i}.

Yes.

>Now, in SPSS, one of the outputs is the loadings matrix (given in SPSS as
>the Component matrix). I'm assuming that this is equal to the PC loadings
>I have mentioned above.

Yes.

>So in theory, if I divide each of these loadings
>by the sq root of the corresponding eigenvalue (that is, lambda{i}), then I
>get the eigenvectors (the a{i}'s that I am talking above).

Yes.

>But SPSS doesn't actually display these eigenvectors (and hence does not specify the
>weightings which are used in the linear expression for each PC
>expression.)

See above: the loadings are the weights to use, not the eigenvectors.

>Y{1} = a{11}X1 + a{12}X2 + so on for whatever number of variables we have.
>SPSS does however display these things called Component Score coefficients
>(in the Component Score Coefficient Matrix), and it is these that are used
>to calculate the component scores (according to SPSS and I think what others
>have said), which can be saved into the SPSS worksheet.

No, again, the loadings are used.

>However, as far as I understand (and can see), these Component Score
>coefficients are not the same as a{i}, eigenvectors, or weightings, which
>are used in the linear expression for each PC expression. So the component
>scores calculated in SPSS are not the same as the PC scores I am talking
>about above.

To obtain the component scores, sum the variables weighted with the loadings and standardize the result:

Y{1} = a{11}X1 + a{12}X2 + ...

where a{11} here denotes a loading: eigenvector{11} * SQRT(eigenvalue{1}). Component score 1 is ZY{1}, the standardized Y{1}. The component score coefficients are the (standardized) regression coefficients you get if you regress ZY{1} (the dependent variable) on the variables (the independents).
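The two-step recipe and the regression interpretation above can be checked numerically. The following is a NumPy sketch on made-up data (everything in it is illustrative, not SPSS output): weight the standardized variables by the first column of loadings, standardize the result to get ZY1, then regress ZY1 on the standardized variables by least squares. The slopes come out as loading / eigenvalue, and the fit is exact because ZY1 is a linear combination of the variables.

```python
import numpy as np

# Hypothetical data (not from the thread).
rng = np.random.default_rng(4)
raw = rng.normal(size=(80, 3))
raw[:, 1] += 0.6 * raw[:, 0]
raw[:, 2] += 0.4 * raw[:, 1]
Z = (raw - raw.mean(axis=0)) / raw.std(axis=0, ddof=1)

R = np.corrcoef(raw, rowvar=False)
eigval, eigvec = np.linalg.eigh(R)
order = np.argsort(eigval)[::-1]
eigval, eigvec = eigval[order], eigvec[:, order]
loadings = eigvec * np.sqrt(eigval)

# Step 1: weight the standardized variables by the loadings of component 1.
Y1 = Z @ loadings[:, 0]
# Step 2: standardize the result to get the component score ZY1.
ZY1 = (Y1 - Y1.mean()) / Y1.std(ddof=1)

# Regress ZY1 on the standardized variables (no intercept needed, all centered).
beta, *_ = np.linalg.lstsq(Z, ZY1, rcond=None)

print(np.allclose(beta, loadings[:, 0] / eigval[0]))
print(np.allclose(Z @ beta, ZY1))               # exact fit: R^2 = 1
```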
You can compute Y{p} using loadings or using eigenvectors.
If you compute Y{p} using the loadings, its mean is 0 and its sum of squares is the square of the pth eigenvalue, so ZY{p} (standardized Y{p}) = Y{p} / eigval{p}. If you compute Y{p} using the eigenvectors, its mean is 0 and its sum of squares is the pth eigenvalue, so ZY{p} = Y{p} / sqrt(eigval{p}). (This holds with the variables scaled to unit sum of squares, as in the MATRIX syntax; with the usual N-1 scaling, read "variance" for "sum of squares".)

You can see this by writing the SVD of the data:

X = K D A'

where X is the data matrix, K the left singular vectors, A the right singular vectors (equal to the eigenvectors), and D the diagonal matrix of singular values, with K'K = A'A = I. (The eigendecomposition of the correlation matrix R is R = X'X = A D^2 A'; D^2 holds the eigenvalues.)

L = A D are the loadings
K = X A D^-1 are the component scores

So

K{p} = ( l{1p}*X1 + l{2p}*X2 + ... ) / eigval{p}
     = ( a{1p}*sqrt(eigval{p})*X1 + a{2p}*sqrt(eigval{p})*X2 + ... ) / eigval{p}
     = ( a{1p}*X1 + a{2p}*X2 + ... ) * sqrt(eigval{p}) / eigval{p}
     = ( a{1p}*X1 + a{2p}*X2 + ... ) / sqrt(eigval{p})

This is the standard normalization in PCA (called 'variable normalization'); with this normalization the eigenvalues are in the loadings: L'L = D'A'AD = D^2 = eigenvalues.

Other normalizations are often used for biplots (joint plots of variables and subjects). For example, with 'subject normalization' the eigenvalues are in the component scores: the loadings are A and the component scores are K D. With 'symmetrical normalization' the eigenvalues are spread equally over the loadings and the component scores: loadings = A D^1/2, component scores = K D^1/2. With the CATPCA procedure in the Categories module (nonlinear PCA for both categorical and numerical data, which can also perform linear PCA), you can request the biplot and choose a normalization option.

Regards,
Anita van der Kooij
Data Theory Group
Leiden University

________________________________
From: Kooij, A.J. van der
Sent: Mon 12/03/2007 19:06
Subject: RE: Re: Interpretation of PCA
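A NumPy sketch of the SVD argument above, on hypothetical data with the standardized variables scaled to unit sum of squares (as in the MATRIX syntax). It checks K = X A D^-1, the two expressions for K{p}, L'L = D^2, and that the variable, subject and symmetrical normalizations all reproduce X as scores times transposed loadings, differing only in where the singular values go. The dictionary of normalizations is just an illustration; it is not how CATPCA is configured.

```python
import numpy as np

# Hypothetical data; X = standardized variables scaled to unit sum of squares.
rng = np.random.default_rng(5)
raw = rng.normal(size=(25, 4))
raw[:, 0] += raw[:, 3]
N = raw.shape[0]
X = (raw - raw.mean(axis=0)) / raw.std(axis=0, ddof=1) / np.sqrt(N - 1)

# SVD: X = K D A'  (K left, A right singular vectors, D singular values).
K, d, At = np.linalg.svd(X, full_matrices=False)
A = At.T
D = np.diag(d)
eigval = d ** 2                                   # eigenvalues of R = X'X

L = A @ D                                         # loadings (variable normalization)

print(np.allclose(X.T @ X, A @ D @ D @ A.T))      # R = A D^2 A'
print(np.allclose(K, X @ A @ np.linalg.inv(D)))   # K = X A D^-1
print(np.allclose(K, (X @ L) / eigval))           # K{p} = (l{1p}X1 + ...) / eigval{p}
print(np.allclose(K, (X @ A) / np.sqrt(eigval)))  # K{p} = (a{1p}X1 + ...) / sqrt(eigval{p})
print(np.allclose(L.T @ L, np.diag(eigval)))      # L'L = D^2

# Alternative normalizations move the singular values between scores and
# loadings; every choice still reproduces X as scores @ loadings'.
normalizations = {
    "variable": (K, A @ D),
    "subject": (K @ D, A),
    "symmetrical": (K @ np.sqrt(D), A @ np.sqrt(D)),
}
for name, (scores, loads) in normalizations.items():
    print(name, np.allclose(scores @ loads.T, X))
```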