|
Dear list: I conducted a principal component analysis with 35 items and 822 participants. SPSS reports 8 "factors" that account for approximately 55% of the variance. A reviewer has asked whether 55% of the variance is acceptable. Is anybody aware of a reference that addresses what is considered "acceptable"? Any thoughts on how one could handle this? martin sherman
Martin F. Sherman, Ph.D. Professor of Psychology Director of Masters Education: Thesis Track Loyola College Psychology Department 222 B Beatty Hall 4501 North Charles Street Baltimore, MD 21210 410 617-2417 (office) 410 617-5341 (fax) [hidden email] ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
|
While I cannot put my hands on the specific course notes, I recall learning that a good rule of thumb is 70% variance accounted for.
Anecdotally, I work in market research, and in practice I usually come close, but have rarely achieved that hurdle. Thanks, Brandon |
|
Hello,
is it even possible to have a rule for this? I could imagine that in an exact science, biology for example, there can be a method that measures variables with high precision, so that many items are highly correlated. In market research, on the other hand, respondents may be asked to rate poorly correlated items on a scale of 1 to 5. I would expect the explained variance to be much lower in the latter case, so can we use the same rule for both? I would suggest comparing your explained variance with that of similar studies in the same field. Anyway, more experienced list members will probably give better advice. By the way, I also work in market research, and 55% explained variance is not at all unusual. Best, Jindra |
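Jindra's point, that the share of variance explained depends on how strongly the items are intercorrelated, can be illustrated with a minimal sketch. It assumes an idealized case that is not in the thread: every pair of items has the same correlation r. The function name `first_component_share` and the values of r are hypothetical choices for illustration; only the count of 35 items comes from the original question.

```python
import numpy as np

def first_component_share(n_items: int, r: float) -> float:
    """Share of total variance carried by the largest eigenvalue of an
    equicorrelation matrix (all off-diagonal correlations equal to r)."""
    corr = np.full((n_items, n_items), r)
    np.fill_diagonal(corr, 1.0)
    eigenvalues = np.linalg.eigvalsh(corr)  # returned in ascending order
    return float(eigenvalues[-1] / eigenvalues.sum())

# Assumed correlation levels: weakly, moderately and strongly related items.
for r in (0.1, 0.3, 0.6):
    share = first_component_share(35, r)
    print(f"r = {r:.1f}: first component share = {share:.3f}")
```

In this idealized case the largest eigenvalue is 1 + (p - 1)r for p items, so the share is (1 + (p - 1)r)/p and rises with r. Real item sets are messier, but the direction of the effect is the same, which is why a single threshold is hard to apply across fields.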
|
Hi,
Looking at other studies in the field is a good suggestion for assessing what's acceptable. Interesting that the reviewer wasn't familiar with this. I think rules of thumb are fine, as long as they are treated as such... I don't recall ever losing sleep over missing the rule of thumb I picked up from my professor 14 or 15 years ago. As has often been discussed and debated, the "all eigenvalues greater than 1" approach is also a rule of thumb. So we create and apply them as we see fit (I don't use this particular one, though). I think the issue arises when the "of thumb" part gets lost and one begins to treat it as just a "rule". This is especially true in EFA/PCA, one of the more judgement-oriented quantitative techniques in the toolkit, IMHO. Thanks, Brandon |
|
Surely this is a question which can only be answered with knowledge of the subject area and the reason for pursuing the research. In practical terms it might or might not be useful. "Principal components" is a method of condensing data rather than a true statistical method (such as maximum likelihood factor analysis). The question is how you (or SPSS by default) decide how many factors there really are in the data. You have retained 8 factors from a possible 35, and from a factor analysis point of view you have been aiming to reproduce the original correlation or covariance matrix as well as possible with the right number of latent variables, after which all other possible "factors" correspond only to random error. When people mine gold the amount of metal extracted is a very small proportion of the ore, but nevertheless it is valuable! David Hitchin |
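David's description of retaining a few components to reproduce the correlation matrix can be sketched as follows. This is only an illustration under stated assumptions: the data are simulated (hypothetical), the helper name `reproduce_correlations` is made up for this note, and the sketch uses principal components rather than maximum likelihood factor analysis.

```python
import numpy as np

def reproduce_correlations(corr, k):
    """Rebuild a correlation matrix from its k largest principal components.

    Returns the reproduced matrix and the share of total variance retained.
    """
    eigenvalues, eigenvectors = np.linalg.eigh(corr)  # ascending order
    order = np.argsort(eigenvalues)[::-1]             # largest first
    eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]
    # Component loadings: eigenvectors scaled by the square root of their eigenvalues.
    loadings = eigenvectors[:, :k] * np.sqrt(eigenvalues[:k])
    reproduced = loadings @ loadings.T
    retained = float(eigenvalues[:k].sum() / eigenvalues.sum())
    return reproduced, retained

# Hypothetical data, simulated only for illustration: 822 respondents, 6 items
# that share one common trait plus item-specific noise.
rng = np.random.default_rng(0)
trait = rng.standard_normal((822, 1))
items = 0.7 * trait + rng.standard_normal((822, 6))
corr = np.corrcoef(items, rowvar=False)

for k in (1, 3, 6):
    reproduced, retained = reproduce_correlations(corr, k)
    max_residual = np.abs(corr - reproduced).max()
    print(f"k = {k}: variance retained = {retained:.2f}, largest residual = {max_residual:.3f}")
```

With all components kept, the matrix is reproduced exactly; with fewer, the residuals show what the discarded components were carrying. Whether that remainder is error or something of substance is the judgement the thread is about.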
|
I agree with David Hitchin that all hinges on the purpose of the analysis.
Also, that even a small amount of variance explained can be a valid result, like gold in ore. But I add some comments.

1. Suppose you obtain the 35 factors, and all factors after the 4th or 8th or whatever have very small eigenvalues (explain very small shares of total variance). If you have a relatively small sample, perhaps those last factors yield values that are not statistically significant, i.e. you cannot decide whether they explain anything at all, or are just random error. But if you have enough cases (say, several million, as I had recently with census data) even the smallest shares of variance may be statistically significant (i.e. you can be 95% confident that they are different from zero in the population, regardless of random error in that particular measurement). In this latter case, should you stop after the 4th or 8th factor, or recognize the presence of the other, minor factors as well? It does not depend on whether they are considered random error (they are not): it depends on the nature of your problem and the nature of your THEORY about how to interpret the data.

2. In Psychology, PCA and other factor analysis techniques are often applied to sets of variables intended to measure the same underlying trait (say, cognitive ability or aggressiveness), and thus you expect that only one factor would dominate the scene, explaining most of the inter-correlation among observed variables. All other correlation observed is alien to your intent and problem. It may not be random, but you're not interested. Perhaps part of the correlation between observed variables reflects respondents' experience with written psychological tests, or English reading ability, or nervousness, or whatever, but if you're only after cognitive ability and you GUESS that cognitive ability is the main factor (not the second or third in importance) then you use the first factor and that's it. Of course, if your items or tests make your first factor explain only 20% of total variance, you'd rather look for better indicators of cognitive ability, or select just those observed variables with higher loadings on the first factor, but that's another story.

3. In other kinds of problems you may not be after one dominant underlying trait, but exploring other possibilities, such as a multi-dimensional concept, e.g. the standard of living, which may be composed of several (correlated or uncorrelated) dimensions, the indicators of which might be only loosely correlated, such as indicators of household equipment and indicators of subjective well-being. In those cases, several different factors may be at play, and there is no way of telling in advance how many. In principle there might be 35 such factors if you start with 35 variables, but probably there would be several groups of variables that are so closely correlated AND substantively connected (e.g. possession of several kinds of electronic home equipment) that you may consider them as indicators of the same underlying factor. In cases like this, you look for the underlying dimensions of your overarching multi-dimensional concept, and (in exploratory stages) may accept various numbers of factors, limited only by the number of variables, with the significance of results determined by sample size. The resulting factors might be rotated obliquely if you suspect they are correlated with one another rather than orthogonal; this should ordinarily result in a better structure, i.e. variables associated more closely with one or two factors, not loosely associated with many. In a confirmatory phase you can use a more rigorous model using only the selected factors and their postulated relationships to test the validity of your theory.

Hope this helps.
Hector |
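The eigenvalues and "shares of total variance" that Hector refers to can be tabulated with a short sketch. The data below are simulated and purely hypothetical (two latent traits behind six items), and the helper name `variance_table` is invented for this note. The count of eigenvalues above 1 corresponds to the rule of thumb mentioned earlier in the thread and is shown only for comparison, not as a recommendation.

```python
import numpy as np

def variance_table(corr):
    """Eigenvalues (largest first), each one's share of total variance,
    and the running cumulative share."""
    eigenvalues = np.linalg.eigvalsh(corr)[::-1]  # eigvalsh sorts ascending
    share = eigenvalues / eigenvalues.sum()
    return eigenvalues, share, np.cumsum(share)

# Hypothetical data for illustration: 822 simulated respondents, 6 items
# driven by two latent traits plus noise.
rng = np.random.default_rng(0)
traits = rng.standard_normal((822, 2))
pattern = np.array([[0.8, 0.0], [0.7, 0.0], [0.6, 0.0],
                    [0.0, 0.8], [0.0, 0.7], [0.0, 0.6]])
items = traits @ pattern.T + 0.6 * rng.standard_normal((822, 6))
corr = np.corrcoef(items, rowvar=False)

eigenvalues, share, cumulative = variance_table(corr)
n_default = int((eigenvalues > 1.0).sum())  # "eigenvalues greater than 1" rule

for i, (ev, s, c) in enumerate(zip(eigenvalues, share, cumulative), start=1):
    print(f"{i}: eigenvalue = {ev:.3f}, share = {s:.3f}, cumulative = {c:.3f}")
print(f"components with eigenvalue > 1: {n_default}")
```

Because the eigenvalues of a correlation matrix sum to the number of items, each share is simply the eigenvalue divided by that number. The table says nothing about whether the small components are substantively meaningful; that remains a question of theory and purpose.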
|
Thanks, Martin, for the additional details. One remark: when you say that "PCA generated 8 factors" you probably mean that SPSS stopped extracting factors after the eighth one, but this is just because of the default rule of stopping when eigenvalues fall below 1. You can override that rule with a subcommand of the FACTOR command stating the number of factors to be extracted (the maximum is the number of observed variables in the checklists). Factors with smaller eigenvalues may still be useful, and their interpretation may also be illuminating.

When you have your, say, 8 factor scores, do you suppose you should work with those scores, which by definition are uncorrelated with one another, as 8 different variables, or do you have a theory that you should somehow "add" them together into some kind of overall score representing the whole set of checklists? Do they all point to the same effect (threat exposure, for instance)? In the latter case, you may construct an overall score or scale defined as a weighted sum of all relevant factor scores, using as weights the proportion of variance explained by each factor. I applied this to a set of census variables representing different dimensions of the standard of living of about 2 million households in Bolivia, using up to 40 factors, some of them very idiosyncratic (related to only one or two variables, but statistically significant and, in a substantive way, significantly affecting the results for some specific cases); you may find the resulting papers at http://ssrn.com/abstract=896029, http://ssrn.com/abstract=896030 and http://ssrn.com/abstract=896032. In that case, besides, most of my variables were categorical variables recoded as dummies, for which classical factor analysis is not the best alternative (CATPCA, or categorical principal component analysis, would be better, but its current version in SPSS stores the whole database in memory, and with 2 million cases that was too much for my computer).
Besides, you can also rotate the factors, not only to obtain a better "structure" (you already have a pretty good one, it seems) but to find out whether some oblique, or correlated, form of the factors is still more revealing. With SPSS you can use several rotation methods maximizing different things. All the best. Hector

-----Original Message----- From: Martin Sherman [mailto:[hidden email]] Sent: 06 August 2008 10:00 To: Hector Maletta Subject: Re: Results of a PCA indicates 55% of variance accounted for

Hector: Thank you. Let me digest what you have written. I am working with 800 home healthcare aides and have a questionnaire that has three separate checklists: one for household and job-related risk, one for environmental exposures (pollution, peeling paint), and one for experiencing different forms of threat (verbal, physical). The PCA generated 8 factors. The environmental exposures checklist formed one factor, with all items loading at least .40 on it. The threat items also loaded on one factor. The other items, from the household and job-related checklist, loaded on 6 factors. The factors were fairly distinct (potential for violence, transportation issues, one major household and job-related factor, another smaller household and job-related factor, and two smaller factors). All told, the eight factors accounted for 55% of the variance, and from what I can tell from meta-analytic studies of variance accounted for, 55% tends to be the midpoint. Thus we have about 45% of the variance being "noise". When I create factor scores (using unit weighting), all of my factors are correlated with my outcome measures to varying degrees. Just thought I would provide you with some background. Thanks again, and if you have any other thoughts I would appreciate them. martin |
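Hector's suggestion of an overall score, a weighted sum of factor scores with each factor's proportion of variance explained as its weight, might look roughly like the sketch below. Several things here are assumptions rather than anything stated in the thread: the data are simulated, the helper names `component_scores` and `weighted_composite` are hypothetical, scores are standardized principal component scores (not the unit-weighted scores Martin describes), and the weights are rescaled to sum to 1.

```python
import numpy as np

def component_scores(data, k):
    """Standardized scores on the k largest principal components of the
    correlation matrix, plus each component's share of total variance."""
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    corr = np.corrcoef(data, rowvar=False)
    eigenvalues, eigenvectors = np.linalg.eigh(corr)  # ascending order
    order = np.argsort(eigenvalues)[::-1][:k]         # k largest
    eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]
    scores = z @ eigenvectors / np.sqrt(eigenvalues)  # unit variance, uncorrelated
    share = eigenvalues / data.shape[1]               # eigenvalue / number of items
    return scores, share

def weighted_composite(scores, share):
    """Weighted sum of component scores; weights are the variance shares,
    rescaled to sum to 1 (the rescaling is an assumption of this sketch)."""
    weights = share / share.sum()
    return scores @ weights

# Hypothetical data for illustration: 822 simulated respondents, 6 items.
rng = np.random.default_rng(1)
trait = rng.standard_normal((822, 1))
data = 0.7 * trait + rng.standard_normal((822, 6))

scores, share = component_scores(data, k=3)
composite = weighted_composite(scores, share)
print("variance shares:", np.round(share, 3))
print("composite score, first five cases:", np.round(composite[:5], 3))
```

One caution with any such sum: the sign of each component is arbitrary, so before adding scores together each component would need to be oriented so that higher values mean the same thing (more exposure, say). That is part of Hector's question about whether the factors all point to the same effect.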
