SPSSX Discussion

Results of a PCA indicates 55% of variance accounted for

msherman

Results of a PCA indicates 55% of variance accounted for

Dear list: I conducted a principal component analysis (PCA) with 35 items and 822 participants. SPSS reports 8 "factors" that account for approximately 55% of the variance. A reviewer has asked whether 55% of the variance is acceptable. Is anybody aware of a reference that addresses what is considered "acceptable"? Any thoughts on how one could handle this? martin sherman

Martin F. Sherman, Ph.D.
Professor of Psychology
Director of Masters Education: Thesis Track
Loyola College
Psychology Department
222 B Beatty Hall
4501 North Charles Street
Baltimore, MD 21210

410 617-2417 (office)
410 617-5341 (fax)

[hidden email]

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Brandon Paris

Re: Results of a PCA indicates 55% of variance accounted for

While I cannot put my hands on the specific course notes, I recall learning that a good rule of thumb is 70% variance accounted for.

Anecdotally, I work in market research, and in practice I usually come close, but have rarely achieved that hurdle.

Thanks,
Brandon

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Martin Sherman
Sent: Tuesday, August 05, 2008 10:48 AM
To: [hidden email]
Subject: Results of a PCA indicates 55% of variance accounted for


Jerabek Jindrich

Re: Results of a PCA indicates 55% of variance accounted for

Hello,

is it even possible to have a rule for this?

I could imagine that in an exact science, biology for example, a method can measure variables with high precision, so that many items are highly correlated. In market research, on the other hand, respondents may be asked to rate poorly correlated items on a scale of 1 to 5. I would expect the explained variance to be much lower in the latter case, so can the same rule really apply to both?

I would suggest comparing your explained variance with that of similar studies in the same field.
Anyway, more experienced list members will probably give better advice.

Btw, I also work in market research, and 55% explained variance is not unusual.

best
Jindra

> ------------ Original message ------------
> From: Brandon Paris <[hidden email]>
> Subject: Re: Results of a PCA indicates 55% of variance accounted for
> Date: 05.8.2008 21:04:19
> ----------------------------------------

Brandon Paris

Re: Results of a PCA indicates 55% of variance accounted for

Hi,

Looking at other studies in the field is a good suggestion for assessment of what's acceptable. Interesting that the reviewer wasn't familiar with this.

I think rules of thumb are fine, as long as they are treated as such . . . I don't recall ever losing sleep over not achieving the rule of thumb I picked up from my professor 14 or 15 years ago.

As has often been discussed and debated, the "all eigenvalues greater than 1" approach is itself a rule of thumb. We create and apply such rules as we see fit (I don't use this particular one, though). I think the problem arises when the "of thumb" part gets lost and one begins to treat it simply as a "rule". This is especially true in EFA/PCA . . . one of the more judgement-oriented quantitative techniques in the toolkit, IMHO.
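
As a rough illustration of the "eigenvalues greater than 1" rule and of how the cumulative percentage of variance is obtained, here is a minimal Python sketch. It assumes NumPy and uses synthetic data (822 simulated respondents, 35 simulated items); the function name `kaiser_summary` and the simulated structure are hypothetical and are not taken from the original analysis.

```python
import numpy as np

def kaiser_summary(data):
    """Return eigenvalues of the correlation matrix (descending),
    the number of eigenvalues above 1, and the cumulative share of variance."""
    corr = np.corrcoef(data, rowvar=False)        # items are columns
    eigvals = np.linalg.eigvalsh(corr)[::-1]      # eigvalsh returns ascending order
    share = eigvals / eigvals.sum()               # eigenvalues sum to the number of items
    n_kaiser = int((eigvals > 1.0).sum())         # "eigenvalue greater than 1" count
    return eigvals, n_kaiser, np.cumsum(share)

# Synthetic stand-in for a questionnaire: 8 latent dimensions plus item noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(822, 8))
pattern = rng.normal(size=(8, 35))
items = latent @ pattern + rng.normal(scale=3.0, size=(822, 35))

eigvals, n_kaiser, cum_share = kaiser_summary(items)
print("components with eigenvalue > 1:", n_kaiser)
print("share of variance they account for:", round(float(cum_share[n_kaiser - 1]), 3))
```

The percentage reported for the retained components is simply the sum of their eigenvalues divided by the number of items, so it moves with the retention rule as much as with the data.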

Thanks,
Brandon

-----Original Message-----
From: Jerabek Jindrich [mailto:[hidden email]]
Sent: Tuesday, August 05, 2008 5:16 PM
To: Brandon Paris
Cc: [hidden email]
Subject: Re: Results of a PCA indicates 55% of variance accounted for


David Hitchin

Re: Results of a PCA indicates 55% of variance accounted for

In reply to this post by msherman

Quoting Martin Sherman <[hidden email]>:

> Dear list: I conducted a Principal Component analysis with 35 items
> and 822 participants. The SPSS reveals that there are 8 "Factors"
> which account for approximately 55% of the variance. A reviewer has
> asked the question as to whether the 55% of the variance is
> acceptable. Is anybody aware of a reference that addresses what is
> considered "acceptable"?? Any thoughts on how one could handle this?

Surely this is a question which can only be answered with knowledge of
the subject area and the reason for pursuing the research. In practical
terms it might or might not be useful.

"Principal components" is a method of condensing data rather than a true
statistical method (such as maximum likelihood factor analysis). The
question is how you (or SPSS by default) decide how many factors there
really are in the data. You have retained 8 factors from a possible 35,
and from a factor analysis point of view you have been aiming to
reproduce the original correlation or covariance matrix as well as
possible with the right number of latent variables, after which all
other possible "factors" correspond only to random error.
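
The idea of reproducing the correlation matrix from a limited number of components can be sketched as follows. This is only an illustration under stated assumptions: it uses NumPy, principal component loadings (eigenvectors scaled by the square roots of their eigenvalues) rather than maximum likelihood factor analysis, and synthetic data. The function name `reproduced_correlation` is hypothetical.

```python
import numpy as np

def reproduced_correlation(corr, k):
    """Reproduce a correlation matrix from its first k principal components."""
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]             # sort components by descending eigenvalue
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Component loadings: eigenvectors scaled by sqrt(eigenvalue).
    loadings = eigvecs[:, :k] * np.sqrt(np.clip(eigvals[:k], 0.0, None))
    reproduced = loadings @ loadings.T            # diagonal holds the communalities
    residual = corr - reproduced                  # what the k components leave unexplained
    return loadings, reproduced, residual

# Synthetic example: 6 items driven by 2 latent dimensions plus noise.
rng = np.random.default_rng(1)
data = rng.normal(size=(500, 2)) @ rng.normal(size=(2, 6)) + rng.normal(size=(500, 6))
corr = np.corrcoef(data, rowvar=False)

_, reproduced_2, residual_2 = reproduced_correlation(corr, 2)
_, _, residual_all = reproduced_correlation(corr, 6)
off_diag = np.triu_indices(6, 1)
print("largest off-diagonal residual with 2 components:",
      float(np.abs(residual_2[off_diag]).max()))
```

With all components retained the residual matrix is zero; the practical question is how small the off-diagonal residuals need to be for the purpose at hand.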

When people mine gold the amount of metal extracted is a very small
proportion of the ore, but nevertheless it is valuable!

David Hitchin

Hector Maletta

Re: Results of a PCA indicates 55% of variance accounted for

I agree with David Hitchin that all hinges on the purpose of the analysis.
Also, that even a small amount of variance explained can be a valid result,
like gold in ore. But I add some comments.
1. Suppose you obtain the 35 factors, and all factors after the 4th or 8th
or whatever have very small eigenvalues (explain very small shares of total
variance). If you have a relatively small sample, perhaps those last factors
yield values that are not statistically significant, i.e. you cannot decide
whether they explain anything at all, or are just random error. But if you
have enough cases (say, several million, as I had recently with census data)
even the smallest shares of variance may be statistically significant (i.e.
you can be 95% confident that they are different from zero in the
population, regardless of random error in that particular measurement). In
this latter case, should you stop after the 4th or 8th factor, or recognize
the presence of the other, minor factors as well? It does not depend on
whether they are considered random error (they are not): it depends on the
nature of your problem and the nature of your THEORY about how to interpret
the data.
2. In psychology, PCA and other factor analysis techniques are often applied
to sets of variables intended to measure the same underlying trait (say,
cognitive ability or aggressiveness), and thus you expect that only one
factor would dominate the scene, explaining most of the inter-correlation
among observed variables. All other correlation observed is alien to your
intent and problem. It may be not random, but you're not interested. Perhaps
part of the correlation between observed variables reflects respondents'
experience with written psychological tests, or English reading ability, or
nervousness, or whatever, but if you're only after cognitive ability and you
GUESS that cognitive ability is the main factor (not the second or third in
importance) then you use the first factor and that's it. Of course, if your
items or tests make your first factor explain only 20% of total variance,
you'd rather look for better indicators of cognitive ability, or select just
those observed variables with higher loadings on the first factor, but
that's another story.
3. In other kinds of problems you may not be after one dominant underlying
trait, but exploring other possibilities, such as a multiple-dimension
concept, e.g. the standard of living, which may be composed of several
(correlated or uncorrelated) dimensions, the indicators of which might be
only loosely correlated, such as indicators of household equipment and
indicators of subjective well-being. In those cases, several different
factors may be at play, and there is no way of telling in advance how many.
In principle there might be 35 such factors if you start with 35 variables,
but probably there would be several groups of variables that are so closely
correlated AND substantively connected (e.g. possession of several kinds of
electronic home equipment) that you may consider them as indicators of the
same underlying factor. In cases like this, you look for the underlying
dimensions of your overarching multi-dimensional concept, and (in
exploratory stages) may accept various numbers of factors, only limited by
number of variables, and the significance of results determined by sample
size. The resulting factors might be rotated obliquely if you suspect they
might be correlated among them, and not orthogonal; this should ordinarily
result in a better structure, i.e. variables associated more closely with
one factor or two, not loosely associated with many. In a confirmatory phase
you can use a more rigorous model using only the selected factors and their
postulated relationships to test the validity of your theory.
Hope this helps.
Hector

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
David Hitchin
Sent: 06 August 2008 02:37
To: [hidden email]
Subject: Re: Results of a PCA indicates 55% of variance accounted for


Hector Maletta

Re: Results of a PCA indicates 55% of variance accounted for

Thanks, Martin, for the additional details. One remark: when you say that
"PCA generated 8 factors" you probably mean that SPSS stopped extracting
factors after the eighth one, but this is just because of the default rule
of stopping when eigenvalues are below 1. You can override that rule with a
subcommand in the FACTOR command stating the number of factors to be
extracted (maximum is as many as observed variables in the checklists).
Factors with lesser eigenvalues may still be useful, and their
interpretation may also be illuminating.

When you have your, say, 8 factor scores, do you suppose you should work
with those scores, which by definition are not correlated among them, as 8
different variables, or you have a theory that, in fact, you should somehow
"add" them together into some kind of overall score representing the whole
set of checklists? Do they point all to the same effect (threat exposure,
for instance)?

In this latter case, you may construct an overall score or scale defined as
a weighted sum of all relevant factor scores, using as weights the
proportion of variance explained by each factor. I applied this to a set of
census variables representing different dimensions of the standard of living
of about 2 million households in Bolivia, and using up to 40 factors, some
of them very idiosyncratic (related to only one or two variables, but
statistically significant and --in a substantive way-- significantly
affecting the results for some specific cases): you may find the resulting
papers at http://ssrn.com/abstract=896029, http://ssrn.com/abstract=896030
and http://ssrn.com/abstract=896032. In that case, besides, most of my
variables were categorical variables recoded as dummies, for which classical
factor analysis is not the best alternative (CATPCA or categorical principal
component analysis would be better, but its current version in SPSS stores
the whole database in memory, and with 2 million cases that was too much for
my computer).
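
A minimal sketch of such a variance-weighted overall score, assuming NumPy and synthetic data, might look like the following. The function name `weighted_composite`, the use of unrotated principal component scores, and the choice to use the raw proportions as weights without rescaling them to sum to 1 are all assumptions made for illustration; this is not the procedure from the cited papers.

```python
import numpy as np

def weighted_composite(data, k):
    """Overall score = sum of the first k standardized component scores,
    each weighted by its proportion of total variance."""
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)   # standardize items
    corr = np.corrcoef(data, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1][:k]                       # k largest eigenvalues
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = (z @ eigvecs) / np.sqrt(eigvals)    # uncorrelated scores with unit variance
    weights = eigvals / corr.shape[0]            # proportion of total variance per component
    composite = scores @ weights
    return scores, weights, composite

rng = np.random.default_rng(2)
data = rng.normal(size=(300, 3)) @ rng.normal(size=(3, 10)) + rng.normal(size=(300, 10))
scores, weights, composite = weighted_composite(data, 3)
print("weights:", np.round(weights, 3))
```

One caveat: the sign of each eigenvector is arbitrary, so every component has to be oriented substantively (for example, so that higher means "better off") before the scores are added together, otherwise the composite is not interpretable.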

Besides, you can also rotate the factors, not only to obtain a better
"structure" (you already have a pretty good one, it seems) but to find out
whether some oblique, or correlated, form of the factors is still more
revealing. With SPSS you can use several rotation methods maximizing
different things.

All the best.

Hector

-----Original Message-----
From: Martin Sherman [mailto:[hidden email]]
Sent: 06 August 2008 10:00
To: Hector Maletta
Subject: Re: Results of a PCA indicates 55% of variance accounted for

Hector: Thank you. Let me digest what you have written. I am working with
800 home healthcare aides and have a questionnaire with three separate
checklists: one for household and job-related risk, one for environmental
exposures (pollution, peeling paint), and one for experiencing different
forms of threat (verbal, physical). The PCA generated 8 factors.
The environmental exposures checklist formed one factor, with all items
loading at least .40 on it. The threat items also loaded on one
factor. The other items, from the household and job-related checklist, loaded
on 6 factors. The factors were fairly distinct (potential for violence,
transportation issues, one major household and job-related factor, another
smaller household and job-related factor, and two smaller factors). All told,
the eight factors accounted for 55% of the variance, and from what I can tell
from meta-analytic studies of variance accounted for, 55% tends to be
the mid-point. Thus we have about 45% of the variance being
"noise". When I create factor scores (using unit weighting), all of my
factors are correlated with my outcome measures to varying degrees. Just
thought I would provide you with some background. Thanks again, and if you have
any other thoughts I would appreciate them. martin
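
Unit-weighted factor scores of the kind described above could be computed roughly as in this Python sketch (NumPy assumed). The loading matrix is made up, and several choices are assumptions for illustration only: each item is assigned to the single component on which its absolute loading is largest, items below the .40 cutoff are dropped, and negatively loading items are reverse-keyed. The function name `unit_weighted_scores` is hypothetical.

```python
import numpy as np

def unit_weighted_scores(z_items, loadings, threshold=0.40):
    """Sum standardized items with weights of +1, -1 or 0 instead of exact loadings."""
    abs_load = np.abs(loadings)
    best = abs_load.argmax(axis=1)                 # component with the largest |loading| per item
    keep = abs_load.max(axis=1) >= threshold       # drop items that load below the cutoff
    weights = np.zeros_like(loadings, dtype=float)
    rows = np.arange(loadings.shape[0])[keep]
    weights[rows, best[keep]] = np.sign(loadings[rows, best[keep]])
    return z_items @ weights, weights

# Hypothetical loadings for 5 items on 2 components.
loadings = np.array([
    [0.72,  0.10],
    [0.65, -0.05],
    [0.12,  0.58],
    [0.08, -0.61],
    [0.30,  0.25],   # loads below .40 everywhere, so it is dropped
])
rng = np.random.default_rng(3)
z_items = rng.normal(size=(10, 5))                 # stand-in for standardized item responses
unit_scores, weights = unit_weighted_scores(z_items, loadings)
print(weights)
```

Unlike regression-based component scores, unit-weighted scores are generally correlated with one another, which is worth keeping in mind when relating them to outcome measures.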

>>> Hector Maletta <[hidden email]> 8/6/2008 8:11 AM >>>