Hello everyone,
This is my first time posting, so if I need to change anything in this post please let me know! I am working on a paper myself and came across this research topic called AESPI (Aggregated Energy Security Performance Indicator). (The paper can be found here for those interested: https://www.sciencedirect.com/science/article/pii/S0306261912007337)

So the same author has successfully applied AESPI in the case of Thailand (https://www.sciencedirect.com/science/article/pii/S0306261914003985) and has included all standardized data for 45 years for all 25 indicators.

So here comes my dilemma: I have tried to reproduce their results in SPSS by performing a PCA on the standardized data, but including all variables leads to the "This matrix is not positive definite" error when I try to run a KMO test. Additionally, the eigenvalues reported in the paper are different from mine: I get only 3 components, while they get 5 in their paper.

I have included the SPSS file I was using and a picture of the original data.
<http://spssx-discussion.1045642.n5.nabble.com/file/t341397/DataPic.png>
<http://spssx-discussion.1045642.n5.nabble.com/file/t341397/Rotated_Values.png>
Thailand_Data_Test_Comparison.sav <http://spssx-discussion.1045642.n5.nabble.com/file/t341397/Thailand_Data_Test_Comparison.sav>

--
Sent from: http://spssx-discussion.1045642.n5.nabble.com/

=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Administrator
List members who do not use Nabble can find links to the uploaded files here:
http://spssx-discussion.1045642.n5.nabble.com/Trying-to-reproduce-PCA-analysis-of-a-published-paper-but-not-getting-same-results-td5735493.html

HTH.
--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING:
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).
In reply to this post by Daradai
Okay, I'll make a couple of simple points, and I'm sure that if I am terribly wrong (or even slightly), someone will correct me.

https://stats.idre.ucla.edu/sas/output/factor-analysis/

"b. Eigenvalue – This is the initial eigenvalue. An eigenvalue is the variance of the factor. Because this is an unrotated solution, the first factor will account for the most variance, the second will account for the second highest amount of variance, and so on. Some of the eigenvalues are negative because the matrix is not of full rank. This means that there are probably only four dimensions (corresponding to the four factors whose eigenvalues are greater than zero). Although it is strange to have a negative variance, this happens because the factor analysis is only analyzing the common variance, which is less than the total variance. *******If we were doing a principal components analysis, we would have had 1’s on the diagonal, which means that all of the variance is being analyzed (which is another way of saying that we are assuming that we have no measurement error), and we would not have negative eigenvalues. In general, it is not uncommon to have negative eigenvalues.********"

So, make sure that you don't have any negative eigenvalues if you are doing a principal components analysis. Otherwise, you ****might**** want to do a principal factor analysis instead (which may be what your original source did but did not report correctly). I note that the SPSS output for FACTOR does not provide this warning.

(3) The UCLA IDRE center does provide an annotated output for a principal factor analysis, which you can examine here:
https://stats.idre.ucla.edu/spss/output/factor-analysis/

However, let me point out something that is presented in the front matter of this webpage. Quoting:

"Factor analysis is a technique that requires a large sample size. Factor analysis is based on the correlation matrix of the variables involved, and correlations usually need a large sample size before they stabilize. Tabachnick and Fidell (2001, page 588) cite Comrey and Lee’s (1992) advice regarding sample size: 50 cases is very poor, 100 is poor, 200 is fair, 300 is good, 500 is very good, and 1000 or more is excellent. As a rule of thumb, a bare minimum of 10 observations per variable is necessary to avoid computational difficulties."

You say that you have 45 years, but the table you present indicates that a few variables do not have values for certain years. That means, if year is the unit of analysis, you have fewer than 45 usable years, or a little more than 1 case per variable. What is wrong with this picture?

I will leave it to others to suggest ways of dealing with this situation.

-Mike Palij
New York University
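The UCLA passage's point about the diagonal can be checked numerically. Below is a minimal NumPy sketch, assuming squared multiple correlations (SMCs) as the communality estimates (a common starting choice for principal axis factoring) and a made-up 3-variable correlation matrix. With 1's on the diagonal (PCA) the eigenvalues work out to 0.5, 1 and 1.5; with SMCs on the diagonal they are -0.25, 0 and 0.75, so a negative eigenvalue appears even though R itself is positive definite. The helper names smc and reduced_matrix are illustrative, not SPSS routines.

```python
import numpy as np

def smc(R):
    """Squared multiple correlation of each variable with all of the others."""
    return 1.0 - 1.0 / np.diag(np.linalg.inv(R))

def reduced_matrix(R):
    """Copy of R with SMC communality estimates on the diagonal instead of 1's."""
    Rr = R.copy()
    np.fill_diagonal(Rr, smc(R))
    return Rr

# Hypothetical correlation matrix: x1 and x2 correlate 0.5, x3 is unrelated.
R = np.array([[1.0, 0.5, 0.0],
              [0.5, 1.0, 0.0],
              [0.0, 0.0, 1.0]])

pca_eigs = np.linalg.eigvalsh(R)                 # 1's on the diagonal (PCA)
fa_eigs = np.linalg.eigvalsh(reduced_matrix(R))  # SMCs on the diagonal (factor analysis)
print("PCA eigenvalues:", pca_eigs)
print("Reduced-matrix eigenvalues:", fa_eigs)
```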
In reply to this post by Daradai
I looked at Tables 4 and 5 (your data, I think), and here are some notes.
The columns ("observations") run from 1986 to 2030; however, the set with "complete data", which is what a factor analysis will use, starts with 2004. Twenty-seven observations will be highly unstable for components or factors of 25 variables. If you are analyzing correlations (rather than raw values), you are at about the minimum for full rank. The KMO result says to me that your data are not full rank. Somewhere, you have collinearity.
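The "minimum for full rank" remark follows from the fact that a correlation matrix computed from N cases has rank at most min(P, N - 1), because centering each variable uses up one degree of freedom. This small NumPy sketch uses simulated data (not the Thailand series; the sizes 20, 27 and 45 are illustrative) and reports the rank and the smallest eigenvalue. With N = 20 and P = 25 at least six eigenvalues must be zero; at N = 27 the matrix can be full rank, but its smallest eigenvalue tends to sit close to zero, which is one reason such solutions are unstable.

```python
import numpy as np

def corr_eigenvalues(X):
    """Eigenvalues (ascending) of the correlation matrix of X; rows are cases."""
    return np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))

def numerical_rank(eigs, rel_tol=1e-10):
    """Count eigenvalues that are clearly above zero relative to the largest one."""
    return int(np.sum(eigs > rel_tol * eigs.max()))

rng = np.random.default_rng(0)
P = 25                                  # number of indicators
for N in (20, 27, 45):                  # hypothetical numbers of complete years
    X = rng.standard_normal((N, P))     # simulated data, not the Thailand series
    eigs = corr_eigenvalues(X)
    print(f"N={N}: rank {numerical_rank(eigs)} of {P}, "
          f"smallest eigenvalue {eigs.min():.1e}")
```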
The first three variables (eco-1.1 to eco-1.3) appear to be practically identical across the range of years. Are these versions of each other?
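Near-duplicate variables are enough to push a correlation matrix toward singularity. This sketch assumes, purely for illustration, that eco-1.1 to eco-1.3 behave like one series plus about 1% independent noise each. The two tiny eigenvalues and the very large condition number are the numerical symptoms that make such a matrix hard to invert.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 27
base = rng.standard_normal(N)
# Hypothetical stand-ins for eco-1.1 to eco-1.3: one series plus 1% noise each.
X = np.column_stack([base + 0.01 * rng.standard_normal(N) for _ in range(3)])

eigs = np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))
print("eigenvalues:", eigs)                       # one near 3, two near 0
print("condition number:", eigs.max() / eigs.min())
```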
Most or all of the variables show a strong "year" trend. Since the years 2018 to 2030 haven't happened yet, I assume that these values are projections. The formulas used for projecting would produce collinearity if the later columns are linear combinations of the earlier columns.
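One way to see how projections create collinearity, assuming (hypothetically; the thread doesn't say how the 2018 to 2030 values were produced) that each indicator's future values are a straight-line extension of its 2004 to 2017 values. All 13 projected rows then lie on a single straight line in 25-dimensional space, so together with the 14 observed rows they span at most 15 dimensions after centering, and the 25 x 25 correlation matrix has rank at most 15 even though there are 27 rows.

```python
import numpy as np

rng = np.random.default_rng(2)
P = 25
t_obs = np.arange(14)        # years 2004-2017 coded 0..13 (assumed observed)
t_proj = np.arange(14, 27)   # years 2018-2030 coded 14..26 (assumed projected)

# Simulated observed values: a random trend per indicator plus noise.
X_obs = np.outer(t_obs, rng.standard_normal(P)) + rng.standard_normal((t_obs.size, P))

# Hypothetical projection rule: extend each indicator's least-squares line.
slope, intercept = np.polyfit(t_obs, X_obs, deg=1)
X_proj = np.outer(t_proj, slope) + intercept

X = np.vstack([X_obs, X_proj])   # 27 complete rows
eigs = np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))
rank = int(np.sum(eigs > 1e-10 * eigs.max()))
print(f"{X.shape[0]} rows, but the correlation matrix has rank {rank} of {P}")
```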
A principal component analysis /can/ show you as many components (if full rank) as you have variables. Getting 3 or 5 from a set of data depends on what you specify as options.
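A sketch of why the number of retained components is a choice rather than a property of the data: the eigenvalues of a correlation matrix always sum to the number of variables, and how many you keep depends on the rule applied. The eigenvalue-greater-than-1 rule shown here is, to my knowledge, SPSS FACTOR's default extraction criterion (MINEIGEN(1) on /CRITERIA), with FACTORS(n) for a fixed number; check the FACTOR syntax reference to confirm. The data are simulated and the function names are hypothetical.

```python
import numpy as np

def pca_eigenvalues(X):
    """Eigenvalues of the correlation matrix (PCA on standardized data), largest first."""
    return np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))[::-1]

def n_components(eigs, min_eigen=1.0, fixed=None):
    """Components kept: `fixed` if given, otherwise those with eigenvalue > min_eigen."""
    if fixed is not None:
        return fixed
    return int(np.sum(eigs > min_eigen))

rng = np.random.default_rng(3)
X = rng.standard_normal((45, 25))   # simulated stand-in for 45 years x 25 indicators
eigs = pca_eigenvalues(X)
print("sum of eigenvalues:", round(eigs.sum(), 6))   # 25.0, the number of variables
print("kept by eigenvalue > 1 rule:", n_components(eigs))
print("kept when 5 are requested:", n_components(eigs, fixed=5))
```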
-- Rich Ulrich
From: SPSSX(r) Discussion <[hidden email]> on behalf of Daradai <[hidden email]>
Sent: Monday, February 5, 2018 7:21:59 PM
To: [hidden email]
Subject: Trying to reproduce PCA analysis of a published paper, but not getting same results
In reply to this post by Daradai
Not positive definite (p.d.) means the correlation matrix has either
some zero or some negative (or both) eigenvalues. Zero eigenvalues
appear when there are linear dependencies among variables or when
N<P (number of cases is less than number of variables). Negative
eigenvalues may appear if there were missing data which you deleted
in "pairwise" manner, or when the correlation matrix was not
computed from data but estimated somehow or simply borrowed and
entered with not enough precision.
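The pairwise-deletion case is easy to reproduce. In this hypothetical 9-row data set each pair of variables is observed on a different set of rows, so pairwise deletion yields r12 = +1, r23 = +1 and r13 = -1, a combination no complete data set could produce. The resulting matrix has eigenvalues -1, 2 and 2, so it is not positive definite and a Cholesky factorization fails. pairwise_corr is an illustrative helper, not an SPSS routine.

```python
import numpy as np

def pairwise_corr(X):
    """Correlation matrix using pairwise deletion: each r_ij uses only the
    rows where both variable i and variable j are non-missing (NaN = missing)."""
    P = X.shape[1]
    R = np.eye(P)
    for i in range(P):
        for j in range(i + 1, P):
            ok = ~np.isnan(X[:, i]) & ~np.isnan(X[:, j])
            R[i, j] = R[j, i] = np.corrcoef(X[ok, i], X[ok, j])[0, 1]
    return R

nan = np.nan
# Hypothetical data: each pair of variables is observed on different rows.
X = np.array([
    [1, 1, nan], [2, 2, nan], [3, 3, nan],   # x1 & x2 only: r = +1
    [nan, 1, 1], [nan, 2, 2], [nan, 3, 3],   # x2 & x3 only: r = +1
    [1, nan, 3], [2, nan, 2], [3, nan, 1],   # x1 & x3 only: r = -1
])
R = pairwise_corr(X)
eigs = np.linalg.eigvalsh(R)
print("eigenvalues:", eigs)   # one is negative, so R is not positive definite
try:
    np.linalg.cholesky(R)
except np.linalg.LinAlgError as err:
    print("Cholesky failed:", err)
```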
Note also that the KMO index isn't needed in PCA; it is of value in factor analysis. PCA easily tolerates a non-p.d. matrix, but factor analysis (most methods) doesn't. If you intend to use PCA as "factor analysis" (i.e., you are going to interpret the factors as real latent variables generating the data), your matrix should be p.d.
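The "This matrix is not positive definite" message on a KMO request is consistent with how KMO is defined: the standard overall formula needs the inverse of the correlation matrix to obtain partial correlations. This sketch is written from the textbook definition, not from SPSS's source, and the kmo function and example matrix are hypothetical. For the 3-variable matrix below the value works out to 0.5 (up to rounding), while a matrix of three perfectly collinear variables is rejected.

```python
import numpy as np

def kmo(R):
    """Overall Kaiser-Meyer-Olkin measure for a correlation matrix R.

    KMO = sum(r_ij^2) / (sum(r_ij^2) + sum(p_ij^2)) over all i != j, where the
    partial correlations p_ij come from the inverse of R. That inverse only
    exists when R is positive definite, so check that first.
    """
    try:
        np.linalg.cholesky(R)
    except np.linalg.LinAlgError:
        raise ValueError("correlation matrix is not positive definite; KMO is undefined") from None
    Rinv = np.linalg.inv(R)
    d = np.sqrt(np.diag(Rinv))
    partial = -Rinv / np.outer(d, d)        # partial correlations (off-diagonal part)
    off = ~np.eye(R.shape[0], dtype=bool)   # mask selecting i != j
    r2 = np.sum(R[off] ** 2)
    p2 = np.sum(partial[off] ** 2)
    return r2 / (r2 + p2)

R = np.array([[1.0, 0.5, 0.0],
              [0.5, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
print(kmo(R))                # about 0.5 for this hypothetical matrix
try:
    kmo(np.ones((3, 3)))     # three perfectly collinear variables
except ValueError as err:
    print(err)
```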
Thank you, everyone, for your great help. Unfortunately, I have just started delving into statistics and SPSS, so it will take me some time to understand all of the intricacies you have discussed here. I want to include an answer I just received by email from someone who found the solution to my main issue, namely my results deviating from the source:

"It seems that the authors of the article used the option 'Replace with mean' under factor analysis Options/Missing Values. In SPSS version 24 this seems to produce the same summary statistics (Table 7) and rotated loadings (Table 8), but slightly different KMO/Bartlett results (Table 6)."

Again, thank you everyone for your great help!
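A minimal sketch of what mean substitution does, assuming it simply replaces each missing value with the mean of that variable's observed values (which is what the SPSS option name suggests; in FACTOR syntax I believe this is /MISSING=MEANSUB). All rows are retained, whereas listwise deletion drops every year with a gap, so the two settings analyze different data and give different eigenvalues. Keep in mind that mean substitution shrinks variances and tends to pull correlations toward zero. The 6 x 3 data set and the mean_substitute helper are made up for illustration.

```python
import numpy as np

def mean_substitute(X):
    """Replace each NaN with its column mean (computed from the non-missing values)."""
    X = np.array(X, dtype=float)   # work on a copy
    col_means = np.nanmean(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = col_means[cols]
    return X

nan = np.nan
# Hypothetical: 6 years, 3 indicators, early years missing for the third one.
X = np.array([[1.0, 2.0, nan],
              [2.0, 1.0, nan],
              [3.0, 4.0, 3.0],
              [4.0, 3.0, 5.0],
              [5.0, 6.0, 4.0],
              [6.0, 5.0, 6.0]])

complete = ~np.isnan(X).any(axis=1)
print("rows kept by listwise deletion:", complete.sum())    # 4
Xm = mean_substitute(X)
print("rows kept after mean substitution:", Xm.shape[0])    # 6
print(np.round(np.corrcoef(Xm, rowvar=False), 3))
```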