PCA: Shift, Twist, Butterfly

PCA: Shift, Twist, Butterfly

jimjohn
I'm just starting to understand how to conduct PCA, and am having trouble interpreting the results. I've been told that the first component represents a parallel shift, the second represents a twist, and the third represents a butterfly of my original data.

From what I understand, if I have three components, then to come up with my new first factor I take the different components from the component matrix and do:
Component 1 * Value of Original Variable 1 + Component 2 * Value of Original Variable 2 + ...

How does this translate into a parallel shift of my original data? Any ideas? Thanks so much in advance!
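(For concreteness, here is a small sketch of how component scores are actually formed; the data, seed, and variable names are invented. In PCA, the first component score weights ALL standardized variables by the first column of weights, rather than pairing one loading with one variable as in the formula above.)

```python
import numpy as np

# Illustrative only: three made-up correlated variables.
rng = np.random.default_rng(0)
common = rng.normal(size=200)
X = np.column_stack([common + 0.3 * rng.normal(size=200) for _ in range(3)])

Z = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize each variable
R = np.corrcoef(Z, rowvar=False)           # correlation matrix
eigvals, eigvecs = np.linalg.eigh(R)       # eigenvalues in ascending order
w1 = eigvecs[:, -1]                        # weights for the FIRST component only

# First component score: a weighted sum over ALL observed variables.
score1 = Z @ w1
```

The variance of `score1` equals the first eigenvalue of the correlation matrix, which is how "variance explained by the first component" is usually reported.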

Re: Shift, Twist, Butterfly

Hector Maletta
Jimjohn,
I do not know who told you such nonsense, but IMHO it's all wrong. PCA
identifies underlying factors that "explain" the correlations between your
observed variables. In other words, if you were to measure those underlying
(actually non measurable) factors or components, the correlation between
your observed variables, controlling for the underlying factors, would be
zero or near zero. There is no direct link between each factor and one
specific variable (such as the one you sketch) mandating the multiplication
of variable 1 times component 1 + variable 2 * component 2, etc. Each
component is linked to all variables (though it may be correlated more
strongly with some of them), and each variable is linked to all underlying
components.
If all your observed variables are closely correlated (e.g. if you have
several independent indexes of the same psychological trait, as in several
intelligence tests), the first component represents the most important
common component that could be constructed to explain the greater part of
the intercorrelations between observed variables. Controlling for that first
component maximises the amount of variance explained by a single (non
observed) variable. It usually leaves a portion of unexplained variance in
observed variables. The second component is the best you can do to explain
the remaining variance in observed variables, and the third and further
components, in turn, explain the remaining variance not explained by
previous factors. In the traditional interpretation going back to the early
20th Century study of cognitive ability by Spearman, the first factor
represented "general intelligence", and the other components may represent
other independent factors affecting test scores, such as a general
familiarity with test-taking situations, socioeconomic status, specific
abilities linked with some specific tests, and what not.
PCA is but one of various factor analysis techniques. I recommend you first
read some elementary text on factor analysis, to get more acquainted with
the nature, scope and limitations of PCA.
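(A toy numerical sketch of this successive extraction, with invented data: each later component accounts for a smaller share of the total variance, and the shares sum to one.)

```python
import numpy as np

# Assumed data, not from the thread: four "tests" sharing one common trait.
rng = np.random.default_rng(1)
trait = rng.normal(size=(500, 1))
tests = trait + 0.5 * rng.normal(size=(500, 4))   # correlated observed variables

R = np.corrcoef(tests, rowvar=False)
eigvals = np.sort(np.linalg.eigvalsh(R))[::-1]    # descending eigenvalues
share = eigvals / eigvals.sum()                   # variance explained per component
# share[0] is the dominant "general" component; later shares shrink toward zero.
```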

Hector



=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Re: Shift, Twist, Butterfly

SR Millis-3
Hector,

I don't agree with you.  Exploratory factor analysis (EFA) is not the same "thing" as principal component analysis (PCA).  PCA is not a type of EFA.  PCA and EFA are different statistical methods that are designed to achieve different objectives (Bentler & Kano, 1990).

EFA is based on the common factor model, which postulates that each item in a battery of measured items is a linear function of one or more common factors and one unique factor.  Common factors are latent (unobserved) factors.  Conversely, PCA doesn't differentiate between common and unique variance.  Hence, principal components are not latent variables---and it isn't conceptually correct to equate them with common factors.  The goal of PCA is data reduction, i.e., taking scores on a large set of measured items and reducing them to a smaller set of composite variables.  In contrast, the goal of EFA is to identify latent constructs, i.e., understanding the structure of the correlations among the measured variables or items.

Some have argued that PCA and EFA produce similar results.  However, this is not always the case.  Differences are likely to emerge when communalities are low (e.g., .40).
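(One way to see the operational difference, sketched with a hypothetical correlation matrix: PCA eigen-decomposes the matrix with unities on the diagonal, while a common-factor method such as principal-axis factoring replaces the diagonal with communality estimates, here squared multiple correlations, so that only common variance is analyzed.)

```python
import numpy as np

# Hypothetical 3x3 correlation matrix, purely for illustration.
R = np.array([[1.0, 0.5, 0.4],
              [0.5, 1.0, 0.3],
              [0.4, 0.3, 1.0]])

# PCA: unities on the diagonal -> total variance analyzed.
pca_vals, _ = np.linalg.eigh(R)

# Principal-axis FA: diagonal replaced by communality estimates.
smc = 1.0 - 1.0 / np.diag(np.linalg.inv(R))   # squared multiple correlations
R_h = R.copy()
np.fill_diagonal(R_h, smc)
fa_vals, _ = np.linalg.eigh(R_h)

# The first factor accounts for less variance than the first component,
# because unique variance has been excluded from the analysis.
```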


Scott R Millis, PhD, ABPP (CN,CL,RP), CStat, CSci
Professor & Director of Research
Dept of Physical Medicine & Rehabilitation
Dept of Emergency Medicine
Wayne State University School of Medicine
261 Mack Blvd
Detroit, MI 48201
Email:  [hidden email]
Tel: 313-993-8085
Fax: 313-966-7682



Re: Shift, Twist, Butterfly

Hector Maletta
Scott,
Sorry, but I insist on my view. Factor analysis is a general technique, with
many variants. One of the variants concerns the assumptions on communality:
in PCA the whole variance is subjected to analysis, and therefore the sum of
factors' contributions explains 100% of the variance of all observed
variables, while other variants start with an estimate of the amount of
"common" variance and separate it from the rest of variance, usually
supposed to be due to idiosyncratic characteristics of each observed
variable, unrelated to other observed variables. In this case, the technique
computes factors explaining the common variance only, leaving the
idiosyncratic part unexplained (attributed to peculiarities of each observed
variable). This is hardly such a transcendental difference. From the
mathematical or computational point of view there is really no big deal of a
difference: the same mathematical procedure applies in both cases, the only
difference being in the values at the main diagonal of the intercorrelation
matrix: 1's in PCA, and the assumed communalities in the other alternatives.

The purpose of the analysis, and the nature of the variables, would dictate
which alternative is more adequate for each analysis, but that hardly
impinges on the problem posed by Jimjohn: I limited myself to his case, in
which he is using PCA.
I do not agree either that PCA is not usable for locating latent factors.
That is precisely what factor analysis does, either under PCA hypotheses or
otherwise. It estimates one or more fictitious variables ('constructs') that
in case they existed would 'explain' (statistically speaking) the observed
correlation of observed variables. In other words, controlling for the
components or factors would reduce or eliminate the correlation between
observed variables. These factors or components are not objective things:
they are just mathematical constructs, and their values (and even their
correlations with observed variables, or 'loadings') would be altered by a
change in the frame of reference (i.e. by rotation). Taking just the major
components (in PCA) and disregarding the rest achieves precisely the kind of
data reduction you are talking about. That is precisely what Spearman did at
the beginning of the 20th Century to identify "general intelligence", in
fact just the first factor extracted in a PCA procedure applied to a number
of inter-correlated cognitive-ability tests. A similar procedure was used,
years later, by Thurstone to extract several factors (corresponding to
various different cognitive abilities such as linguistic, quantitative,
spatial, etc), for which Thurstone added rotation to the repertoire of
factor analysis in order to define the factors in such a way that each was
clearly more correlated with one different set of observed variables (this
neat attribute was, of course, dependent on the particular frame of
reference chosen, and was not a property of the factors themselves; the
factors continued to be just mathematical artifacts, not real things). Other
variants of FA (more exactly, EFA) were later introduced to deal with other
specific (computational) problems.
Eventually, confirmatory factor analysis emerged, in which only certain
pre-specified effects and factors were included. Some of the effects are
postulated to be zero, and the procedure is, in the end, not very different
from Structural Equation Modeling. EFA always "succeeds": every data set can
be subjected to EFA, and the results cannot be challenged. CFA, instead, may
be refuted if it fails to reproduce the observed data structure.
However different the intents and results, all these procedures are just
variants of the same basic procedure, factor analysis, which is in turn just
a corollary or application of the general linear model where ordinary least
square regression, ANOVA and other statistical workhorses also belong. Their
common feature is that all variables are supposed to be linearly related
with an error term, so that for any set of variables (Y, X, Z, ...) it is
postulated that Y = a + bX + cZ + ... + e, where "e" is a random error with zero
mean, and the coefficients (a, b, c, ...) are computed by minimizing the sum
of the squared errors (e²) over all cases.
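(The least-squares idea in that last paragraph can be sketched with synthetic data; the "true" coefficients a=1, b=2, c=-0.5 are arbitrary choices for the demo.)

```python
import numpy as np

# Y = a + b*X + c*Z + e, with e a zero-mean random error.
rng = np.random.default_rng(2)
n = 300
X = rng.normal(size=n)
Z = rng.normal(size=n)
e = rng.normal(scale=0.1, size=n)
Y = 1.0 + 2.0 * X - 0.5 * Z + e

# Design matrix with an intercept column; lstsq minimizes sum of squared errors.
A = np.column_stack([np.ones(n), X, Z])
coef, *_ = np.linalg.lstsq(A, Y, rcond=None)
a_hat, b_hat, c_hat = coef
```

With a small error variance the recovered coefficients land close to the values used to generate the data.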




Re: Shift, Twist, Butterfly

Swank, Paul R
PC analysis has a fundamentally different purpose than does PA. PC determines components of the measures that account for the maximum amount of variance and are independent of one another. The purpose of such components is data reduction. There is no rotation needed, nor is there any concern for what the underlying structure of the measures is. You just want a smaller number of predictors that are independent of each other. PA, on the other hand, is used to try to fathom the underlying structure of the variables. Its purpose is different. You are searching for latent variables with PA analyses. You are trying to partition the variance of the variables into three parts: common factor variance, specific factor variance, and error variance. PCA does not do this. PA is much closer to CFA than PCA is. I'd look to the writings of Preacher and MacCallum on Tom Swift and his electric factor analysis machine for more detail.
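(The data-reduction use of PCA can be sketched as follows, with made-up data: six correlated predictors are compressed into two mutually uncorrelated component scores.)

```python
import numpy as np

rng = np.random.default_rng(3)
latent = rng.normal(size=(400, 2))                  # two underlying dimensions
B = rng.normal(size=(6, 2))                         # arbitrary loading pattern
X = latent @ B.T + 0.3 * rng.normal(size=(400, 6))  # six observed predictors

Z = (X - X.mean(axis=0)) / X.std(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.corrcoef(Z, rowvar=False))
W = eigvecs[:, -2:]        # keep the two largest components
scores = Z @ W             # 6 predictors -> 2 uncorrelated composites
```

The two retained scores are exactly uncorrelated, which is what makes them convenient replacement predictors in a subsequent regression.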

Dr. Paul R. Swank,
Professor and Director of Research
Children's Learning Institute
University of Texas Health Science Center-Houston


-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Hector Maletta
Sent: Sunday, August 23, 2009 8:04 PM
To: [hidden email]
Subject: Re: Shift, Twist, Butterfly

Scott,
Sorry but I insist in my view. Factor analysis is a general technique, with
many variants. One of the variants concerns the assumptions on commonality:
in PCA the whole variance is subjected to analysis, and therefore the sum of
factors' contributions explains 100% of the variance of all observed
variables, while other variants start with an estimate of the amount of
"common" variance and separate it from the rest of variance, usually
supposed to be due to idiosyncratic characteristics of each observed
variable, unrelated to other observed variables. In this case, the technique
computes factors explaining the common variance only, leaving the
idiosyncratic part unexplained (attributed to peculiarities of each observed
variable). This is hardly such a transcendental difference. From the
mathematical or computational point of view there is really no big deal of a
difference: the same mathematical procedure applies in both cases, the only
difference being in the values at the main diagonal of the intercorrelation
matrix: 1's in PCA, and the assumed commonalities in the other alternatives.

The purpose of the analysis, and the nature of the variables, would dictate
which alternative is more adequate for each analysis, but that hardly
impinges on the problem posed by Jimjohn: I limited myself to his case, in
which he is using PCA.
I do not agree either that PCA is not usable for locating latent factors.
That is precisely what factor analysis do, either under PCA hypotheses or
otherwise. It estimates one or more fictitious variables ('constructs') that
in case they existed would 'explain' (statistically speaking) the observed
correlation of observed variables. In other words, controlling for the
components or factors would reduce or eliminate the correlation between
observed variables. These factors or components are not objective things:
they are just mathematical constructs, and their values (and even their
correlations with observed variables, or 'loadings') would be altered by a
change in the frame of reference (i.e. by rotation). Taking just the major
components (in PCA) and disregarding the rest achieves precisely the kind of
data reduction you are talking about. That is precisely what Spearman did at
the beginning of the 20th Century to identify "general intelligence", in
fact just the first factor extracted in a PCA procedure applied to a number
of inter-correlated cognitive-ability tests. A similar procedure was used,
years later, by Thurstone to extract several factors (corresponding to
various different cognitive abilities such as linguistic, quantitative,
spatial, etc), for which Thurstone added rotation to the repertoire of
factor analysis in order to define the factors in such a way that each was
clearly more correlated with one different set of observed variables (this
neat attribute was, of course, dependent on the particular frame of
reference chosen, and was not a property of the factors themselves; the
factors continued to be just mathematical artifacts, not real things). Other
variants of FA (more exactly, EFA) were later introduced to deal with other
specific (computational) problems.
Eventually, confirmatory factor analysis emerged, in which only certain
pre-specified effects and factors were included. Some of the effects are
postulated to be zero, and the procedure is, at last, not very different
from Structural Equation Modeling. EFA always "succeeds": every data set can
be treated to EFA, and the results cannot be challenged. CFA, instead, may
be refuted if it fails to reproduce the observed data structure.
However different the intents and results, all these procedures are just
variants of the same basic procedure, factor analysis, which is in turn just
a corollary or application of the general linear model where ordinary least
square regression, ANOVA and other statistical workhorses also belong. Their
common feature is that all variables are supposed to be linearly related
with an error term, so that for any set of variables (Y, X, Z, ...) it is
postulated that Y=a + bX + cZ...+e, where "e" is a random error with zero
mean, and the coefficients (a, b, c...) are computed by minimizing the sum
of the squared errors (e2) over all cases.



-----Original Message-----
From: SR Millis [mailto:[hidden email]]
Sent: 23 August 2009 18:55
To: Hector Maletta; SPSS
Subject: Re: Shift, Twist, Butterfly

Hector,

I don't agree with you.  Exploratory factor analysis (EPA) is not the same
"thing" as principal component analysis (PCA).  PCA is not a type of EFA.
PCA and EFA are different statistical methods that are designed to achieve
different objectives (Bentler & Kano, 1990).

EFA is based on the common factor model that postulates that each item in a
battery of measured items is a linear function of one or more common factors
and one unique factor.  Common factors are latent (unobserved) factors.
Conversely, PCA doesn't differentiate between common and unique variance.
Hence, principal components are not latent variables---and it isn't
conceptually correct to equate them with common factors.  The goal of PCA is
data reduction, i.e., taking scores on a large set of measured items and
reducing the scores on a smaller set of composite variables.  In contrast,
the goal of EFA is to identify latent constructs, i.e., understanding the
structure of the correlations among the measured variables or items.

Some have argued that PCA and EFA produce similar results.  However, this is
not always the case.  Differences emerge are likely when communalities are
low (e.g., .40).


Scott R Millis, PhD, ABPP (CN,CL,RP), CStat, CSci
Professor & Director of Research
Dept of Physical Medicine & Rehabilitation
Dept of Emergency Medicine
Wayne State University School of Medicine
261 Mack Blvd
Detroit, MI 48201
Email:  [hidden email]
Tel: 313-993-8085
Fax: 313-966-7682


--- On Sun, 8/23/09, Hector Maletta <[hidden email]> wrote:

> From: Hector Maletta <[hidden email]>
> Subject: Re: Shift, Twist, Butterfly
> To: [hidden email]
> Date: Sunday, August 23, 2009, 7:27 PM
> Jimjohn,
> I do not know who told you such nonsense, but IMHO it's all
> wrong. PCA
> identifies underlying factors that "explain" the
> correlations between your
> observed variables. In other words, if you were to measure
> those underlying
> (actually non measurable) factors or components, the
> correlation between
> your observed variables, controlling for the underlying
> factors, would be
> zero or near zero. There is no direct link between each
> factor and one
> specific variable (such as the one you sketch) mandating
> the multiplication
> of variable 1 times component 1 + variable 2 * component 2,
> etc. Each
> component is linked to all variables (though it may be
> correlated more
> strongly with sone of them), and each variable is linked to
> all underlying
> components.
> If all your observed variables are closely correlated (e.g.
> if you have
> several independent indexes of the same psychological
> trait, as in several
> intelligence tests), the first component represents the
> most important
> common component that could be constructed to explain the
> greater part of
> the intercorrelations between observed variables.
> Controlling for that first
> component maximises the amount of variance explained by a
> single (non
> observed) variable. It leaves usually a portion of
> unexplained variance in
> observed variables. The second component is the best you
> can do to explain
> the remainded variance in observed variables, and the third
> and further
> components, in turn, explain the remaining variance not
> explained by
> previous factors. In the traditional interpretation going
> back to the early
> 20th Century study of cognitive ability by Spearmann, the
> first factors
> represented "general intelligence", and the other
> components may represent
> other independent factors affecting test scores, such as a
> general
> familiarity with test-taking situations, socioeconomic
> status, specific
> abilities linked with some specific tests, and what not.
> PCA is but one of various factor analysis techniques. I
> recommend you first
> read some elementary text on factor analysis, to get more
> acquainted with
> the nature, scope and limitations of PCA.
>
> Hector
>
>
>
=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Reply | Threaded
Open this post in threaded view
|

Re: Shifts, Twists and Butterflies

Fernando Mazariegos

Hi there,

The terms "shift", "twist" and "butterfly" come from finance and risk management, mostly to describe changes in the yield curve.

From http://en.wikipedia.org/wiki/Fixed_income_attribution, which measures the returns generated by various sources of risk in a fixed income portfolio:

  • shift measures the degree to which a curve has moved upwards or downwards, in parallel, across all maturities.
  • twist measures the degree to which the curve has steepened or flattened.
  • curvature (or butterfly, or curve reshaping) measures the degree to which the term structure has become more or less curved.

<snip>…
A factor-based model of yield curve movements is calculated by deriving the covariance matrix of yield shifts at predefined maturities, and calculating the eigenvectors and eigenvalues of this matrix. Each eigenvector corresponds to a fundamental model of the yield curve, and each eigenvector is orthogonal, so that the curve movement on any given day is a linear combination of the basis eigenvectors. The eigenvalues of this matrix then give the relative weights, or importance, of these curve shifts.

So they can be related to factor analysis via covariance matrices and eigenvalues, but they are not the same thing, nor can they be interpreted in the same way.
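The eigenvector description in the quoted passage can be sketched numerically. The snippet below is a toy illustration in Python/numpy, not real market data: daily yield-curve shifts are simulated from a known parallel-shift driver and a known twist driver, and PCA (eigen-decomposition of the covariance matrix of the shifts) recovers them as the first two components:

```python
import numpy as np

rng = np.random.default_rng(0)
n_days = 2000  # hypothetical sample of daily curve moves at 5 maturities

# Simulated daily yield shifts, in basis points, from two known drivers:
# a parallel shift (equal at all maturities) and a twist (slope), plus noise.
shift = rng.normal(0, 5, n_days)[:, None] * np.ones(5)
twist = rng.normal(0, 2, n_days)[:, None] * np.linspace(-1, 1, 5)
dy = shift + twist + rng.normal(0, 0.3, (n_days, 5))

# PCA = eigen-decomposition of the covariance matrix of the yield shifts.
cov = np.cov(dy, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]            # largest eigenvalue first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

pc1, pc2 = eigvecs[:, 0], eigvecs[:, 1]
print("PC1:", np.round(pc1, 2))   # same sign at every maturity -> "shift"
print("PC2:", np.round(pc2, 2))   # changes sign along the curve -> "twist"
print("share of variance:", np.round(eigvals / eigvals.sum(), 3))
```

The eigenvalues give the relative weights of the curve movements, exactly as the quoted passage says; on real yield data a third component typically shows the "butterfly" (curvature) pattern.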

My two cents,

Fernando Mazariegos
Director
Mapelligent
Geomarketing in Central America

-----Original Message-----
From: SPSSX(r) Discussion [[hidden email]] On Behalf Of Swank, Paul R
Sent: Monday, 24 August 2009 12:45 p.m.
To: [hidden email]
Subject: Re: Shift, Twist, Butterfly

PC analysis has a fundamentally different purpose than does PA. PC determines components of the measures that account for the maximum amount of variance and are independent of one another. The purpose of such components is data reduction. There is no rotation needed, nor is there any concern for what the underlying structure of the measures is. You just want a smaller number of predictors that are independent of each other. PA, on the other hand, is used to try to fathom the underlying structure of the variables. Its purpose is different. You are searching for latent variables with PA analyses. You are trying to partition the variance of the variables into three parts: common factor variance, specific factor variance, and error variance. PCA does not do this. PA is much closer to CFA than PCA is. I'd look to the writings of Preacher and MacCallum on Tom Swift and his electric factor analysis machine for more detail.

Dr. Paul R. Swank,
Professor and Director of Research
Children's Learning Institute
University of Texas Health Science Center-Houston


-----Original Message-----
From: SPSSX(r) Discussion [[hidden email]] On Behalf Of Hector Maletta
Sent: Sunday, August 23, 2009 8:04 PM
To: [hidden email]
Subject: Re: Shift, Twist, Butterfly

Scott,
Sorry, but I insist on my view. Factor analysis is a general technique, with many variants. One of the variants concerns the assumptions on communality: in PCA the whole variance is subjected to analysis, and therefore the sum of the factors' contributions explains 100% of the variance of all observed variables, while other variants start with an estimate of the amount of "common" variance and separate it from the rest of the variance, usually supposed to be due to idiosyncratic characteristics of each observed variable, unrelated to other observed variables. In this case, the technique computes factors explaining the common variance only, leaving the idiosyncratic part unexplained (attributed to peculiarities of each observed variable). This is hardly such a transcendental difference. From the mathematical or computational point of view there is really not much of a difference: the same mathematical procedure applies in both cases, the only difference being in the values on the main diagonal of the intercorrelation matrix: 1's in PCA, and the assumed communalities in the other alternatives.

The purpose of the analysis, and the nature of the variables, would dictate which alternative is more adequate for each analysis, but that hardly impinges on the problem posed by Jimjohn: I limited myself to his case, in which he is using PCA.

I do not agree either that PCA is not usable for locating latent factors. That is precisely what factor analysis does, either under PCA hypotheses or otherwise. It estimates one or more fictitious variables ('constructs') that, if they existed, would 'explain' (statistically speaking) the observed correlation of the observed variables. In other words, controlling for the components or factors would reduce or eliminate the correlation between observed variables. These factors or components are not objective things: they are just mathematical constructs, and their values (and even their correlations with observed variables, or 'loadings') would be altered by a change in the frame of reference (i.e. by rotation). Taking just the major components (in PCA) and disregarding the rest achieves precisely the kind of data reduction you are talking about. That is precisely what Spearman did at the beginning of the 20th century to identify "general intelligence", in fact just the first factor extracted in a PCA procedure applied to a number of inter-correlated cognitive-ability tests. A similar procedure was used, years later, by Thurstone to extract several factors (corresponding to various different cognitive abilities such as linguistic, quantitative, spatial, etc.), for which Thurstone added rotation to the repertoire of factor analysis in order to define the factors in such a way that each was clearly more correlated with one different set of observed variables (this neat attribute was, of course, dependent on the particular frame of reference chosen, and was not a property of the factors themselves; the factors continued to be just mathematical artifacts, not real things). Other variants of FA (more exactly, EFA) were later introduced to deal with other specific (computational) problems.

Eventually, confirmatory factor analysis emerged, in which only certain pre-specified effects and factors are included. Some of the effects are postulated to be zero, and the procedure is, in the end, not very different from Structural Equation Modeling. EFA always "succeeds": every data set can be subjected to EFA, and the results cannot be challenged. CFA, instead, may be refuted if it fails to reproduce the observed data structure.

However different the intents and results, all these procedures are just variants of the same basic procedure, factor analysis, which is in turn just a corollary or application of the general linear model, to which ordinary least squares regression, ANOVA and other statistical workhorses also belong. Their common feature is that all variables are supposed to be linearly related, with an error term, so that for any set of variables (Y, X, Z, ...) it is postulated that Y = a + bX + cZ + ... + e, where "e" is a random error with zero mean, and the coefficients (a, b, c, ...) are computed by minimizing the sum of the squared errors (e²) over all cases.



-----Original Message-----
From: SR Millis [[hidden email]]
Sent: 23 August 2009 18:55
To: Hector Maletta; SPSS
Subject: Re: Shift, Twist, Butterfly

Hector,

I don't agree with you.  Exploratory factor analysis (EFA) is not the same "thing" as principal component analysis (PCA).  PCA is not a type of EFA.

PCA and EFA are different statistical methods that are designed to achieve different objectives (Bentler & Kano, 1990).

EFA is based on the common factor model, which postulates that each item in a battery of measured items is a linear function of one or more common factors and one unique factor.  Common factors are latent (unobserved) factors.

Conversely, PCA doesn't differentiate between common and unique variance.
Hence, principal components are not latent variables, and it isn't conceptually correct to equate them with common factors.  The goal of PCA is data reduction, i.e., taking scores on a large set of measured items and reducing them to scores on a smaller set of composite variables.  In contrast, the goal of EFA is to identify latent constructs, i.e., to understand the structure of the correlations among the measured variables or items.

Some have argued that PCA and EFA produce similar results.  However, this is not always the case.  Differences are likely to emerge when communalities are low (e.g., .40).


Scott R Millis, PhD, ABPP (CN,CL,RP), CStat, CSci
Professor & Director of Research
Dept of Physical Medicine & Rehabilitation
Dept of Emergency Medicine
Wayne State University School of Medicine

261 Mack Blvd
Detroit, MI 48201
Email:  [hidden email]
Tel: 313-993-8085
Fax: 313-966-7682



Reply | Threaded
Open this post in threaded view
|

Re: Shift, Twist, Butterfly

Hector Maletta
In reply to this post by Hector Maletta
Azam,
SPSS produces factor scores for each case, i.e. the score of each case on each component, through the /SAVE keyword on the FACTOR command. These scores are obtained as linear combinations of the variables, weighted by their loadings on each factor. SPSS creates a new variable for each factor score and gives each new variable a standard name, but you can rename them as you wish.
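What /SAVE produces can be illustrated outside SPSS. The numpy sketch below (hypothetical data; a conceptual illustration, not SPSS's internal code) computes regression-method component scores for a full PCA solution, i.e. one new score variable per component, formed as a loading-weighted linear combination of the standardized observed variables:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3)) @ rng.normal(size=(3, 3))  # correlated toy data
Z = (X - X.mean(axis=0)) / X.std(axis=0)                 # standardize, as FACTOR does

R = np.corrcoef(Z, rowvar=False)                         # correlation matrix
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]                        # largest component first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
loadings = eigvecs * np.sqrt(eigvals)                    # component loadings

# One saved score variable per component: a linear combination of the
# observed variables, weighted through the loadings (regression method).
scores = Z @ (loadings / eigvals)

print(np.round(scores.std(axis=0), 6))   # each saved score has unit variance
```

For a PCA extraction these saved scores are exact (unit variance, mutually uncorrelated); for common-factor extractions they are estimates.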

Hector

-----Original Message-----
From: [hidden email] [mailto:[hidden email]]
Sent: 24 August 2009 16:46
To: Hector Maletta
Subject: RE: Shift, Twist, Butterfly

Thanks so much Hector, appreciate that! I am hearing of one thing
called inverse factor loadings regarding PCA's, which should return
factor values for each variable for each case. Has anyone heard of
this? and if so, how can I get SPSS to do that? Thanks in advance!





Reply | Threaded
Open this post in threaded view
|

Re: Shift, Twist, Butterfly

Hector Maletta
In reply to this post by Swank, Paul R
I insist: PCA and other types of factor analysis rest on different
assumptions and may be used for different purposes, but all are essentially
the same statistical model and procedure. One assumes a communality of 1,
the others a communality < 1, i.e. some unique variance, but except for that
the rest is exactly the same. You can rotate PCA solutions just as you can
rotate other factor analyses: nothing hinders that. As with all statistical
procedures, you choose the variant that best suits your dataset and theory.

Moreover, the objective is not necessarily data reduction, in the sense of
getting a SMALLER number of (latent) variables instead of your original set
of (observed) variables. Your goal might be to replace your k observed
variables with k latent factors, for instance to separate the information in
your k (correlated) variables into k orthogonal (uncorrelated) factors.

One thing is the use to which one habitually puts these procedures; another
thing altogether is the nature of the procedures themselves.
Since all kinds of factor analysis create constructs that do not really
exist, which set of constructs you'd prefer depends on the kind of
analytical problem you are facing, and the sort of theory you have about the
problem.
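That full-replacement use is easy to demonstrate. In the numpy sketch below (hypothetical data), all k = 4 components are kept, so nothing is reduced: the k scores are mutually uncorrelated, yet they carry all the information, since the original variables can be reconstructed exactly:

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(300, 4)) @ rng.normal(size=(4, 4))  # 4 correlated variables
Xc = X - X.mean(axis=0)                                  # centered copy

# Keep all k components: not data reduction, just an orthogonal change of basis.
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
scores = Xc @ eigvecs                                    # k uncorrelated factors

score_cov = np.cov(scores, rowvar=False)
off_diag = score_cov - np.diag(np.diag(score_cov))
assert np.allclose(off_diag, 0, atol=1e-8)               # scores are uncorrelated

X_back = scores @ eigvecs.T + X.mean(axis=0)             # invert the rotation
assert np.allclose(X_back, X)                            # nothing was lost
```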
Hector

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
Swank, Paul R
Sent: 24 August 2009 13:45
To: [hidden email]
Subject: Re: Shift, Twist, Butterfly

PC analysis has a fundamentally different purpose that does PA. PC
determines components of the measures that account for the maximum amount of
variance and are independent of one another. The purpose for such components
is data reduction. There is no rotation needed nor is there any concern for
what the underlying structure of the measures are. You just want a smaller
number of predictors that are independent of each other. PA, on the other
hand, is used to try and fathom the underlying structure of the variables.
Its purpose is different. You are searching for latent variables with PA
analyses. You are trying to partition the variance of the variables into the
three parts, common factor variance, specific factor variance, and error
variance. PCA does not do this. PA is much closer to CFA that PCA  is. I'd
look to the writings of Preacher and MacCallum on Tom Swift and his Electric
factor analysis machine for more detail.

Dr. Paul R. Swank,
Professor and Director of Research
Children's Learning Institute
University of Texas Health Science Center-Houston


-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
Hector Maletta
Sent: Sunday, August 23, 2009 8:04 PM
To: [hidden email]
Subject: Re: Shift, Twist, Butterfly

Scott,
Sorry but I insist in my view. Factor analysis is a general technique, with
many variants. One of the variants concerns the assumptions on commonality:
in PCA the whole variance is subjected to analysis, and therefore the sum of
factors' contributions explains 100% of the variance of all observed
variables, while other variants start with an estimate of the amount of
"common" variance and separate it from the rest of variance, usually
supposed to be due to idiosyncratic characteristics of each observed
variable, unrelated to other observed variables. In this case, the technique
computes factors explaining the common variance only, leaving the
idiosyncratic part unexplained (attributed to peculiarities of each observed
variable). This is hardly such a transcendental difference. From the
mathematical or computational point of view there is really no big deal of a
difference: the same mathematical procedure applies in both cases, the only
difference being in the values at the main diagonal of the intercorrelation
matrix: 1's in PCA, and the assumed commonalities in the other alternatives.

The purpose of the analysis, and the nature of the variables, would dictate
which alternative is more adequate for each analysis, but that hardly
impinges on the problem posed by Jimjohn: I limited myself to his case, in
which he is using PCA.
I do not agree either that PCA is not usable for locating latent factors.
That is precisely what factor analysis do, either under PCA hypotheses or
otherwise. It estimates one or more fictitious variables ('constructs') that
in case they existed would 'explain' (statistically speaking) the observed
correlation of observed variables. In other words, controlling for the
components or factors would reduce or eliminate the correlation between
observed variables. These factors or components are not objective things:
they are just mathematical constructs, and their values (and even their
correlations with observed variables, or 'loadings') would be altered by a
change in the frame of reference (i.e. by rotation). Taking just the major
components (in PCA) and disregarding the rest achieves precisely the kind of
data reduction you are talking about. That is precisely what Spearman did at
the beginning of the 20th Century to identify "general intelligence", in
fact just the first factor extracted in a PCA procedure applied to a number
of inter-correlated cognitive-ability tests. A similar procedure was used,
years later, by Thurstone to extract several factors (corresponding to
various different cognitive abilities such as linguistic, quantitative,
spatial, etc), for which Thurstone added rotation to the repertoire of
factor analysis in order to define the factors in such a way that each was
clearly more correlated with one different set of observed variables (this
neat attribute was, of course, dependent on the particular frame of
reference chosen, and was not a property of the factors themselves; the
factors continued to be just mathematical artifacts, not real things). Other
variants of FA (more exactly, EFA) were later introduced to deal with other
specific (computational) problems.
Eventually, confirmatory factor analysis emerged, in which only certain
pre-specified effects and factors were included. Some of the effects are
postulated to be zero, and the procedure is, at last, not very different
from Structural Equation Modeling. EFA always "succeeds": every data set can
be treated to EFA, and the results cannot be challenged. CFA, instead, may
be refuted if it fails to reproduce the observed data structure.
However different the intents and results, all these procedures are just
variants of the same basic procedure, factor analysis, which is in turn just
a corollary or application of the general linear model where ordinary least
square regression, ANOVA and other statistical workhorses also belong. Their
common feature is that all variables are supposed to be linearly related
with an error term, so that for any set of variables (Y, X, Z, ...) it is
postulated that Y=a + bX + cZ...+e, where "e" is a random error with zero
mean, and the coefficients (a, b, c...) are computed by minimizing the sum
of the squared errors (e2) over all cases.



-----Original Message-----
From: SR Millis [mailto:[hidden email]]
Sent: 23 August 2009 18:55
To: Hector Maletta; SPSS
Subject: Re: Shift, Twist, Butterfly

Hector,

I don't agree with you.  Exploratory factor analysis (EPA) is not the same
"thing" as principal component analysis (PCA).  PCA is not a type of EFA.
PCA and EFA are different statistical methods that are designed to achieve
different objectives (Bentler & Kano, 1990).

EFA is based on the common factor model that postulates that each item in a
battery of measured items is a linear function of one or more common factors
and one unique factor.  Common factors are latent (unobserved) factors.
Conversely, PCA doesn't differentiate between common and unique variance.
Hence, principal components are not latent variables---and it isn't
conceptually correct to equate them with common factors.  The goal of PCA is
data reduction, i.e., taking scores on a large set of measured items and
reducing the scores on a smaller set of composite variables.  In contrast,
the goal of EFA is to identify latent constructs, i.e., understanding the
structure of the correlations among the measured variables or items.

Some have argued that PCA and EFA produce similar results.  However, this is
not always the case.  Differences emerge are likely when communalities are
low (e.g., .40).


Scott R Millis, PhD, ABPP (CN,CL,RP), CStat, CSci
Professor & Director of Research
Dept of Physical Medicine & Rehabilitation
Dept of Emergency Medicine
Wayne State University School of Medicine
261 Mack Blvd
Detroit, MI 48201
Email:  [hidden email]
Tel: 313-993-8085
Fax: 313-966-7682


--- On Sun, 8/23/09, Hector Maletta <[hidden email]> wrote:

> From: Hector Maletta <[hidden email]>
> Subject: Re: Shift, Twist, Butterfly
> To: [hidden email]
> Date: Sunday, August 23, 2009, 7:27 PM
> Jimjohn,
> I do not know who told you such nonsense, but IMHO it's all
> wrong. PCA
> identifies underlying factors that "explain" the
> correlations between your
> observed variables. In other words, if you were to measure
> those underlying
> (actually non measurable) factors or components, the
> correlation between
> your observed variables, controlling for the underlying
> factors, would be
> zero or near zero. There is no direct link between each
> factor and one
> specific variable (such as the one you sketch) mandating
> the multiplication
> of variable 1 times component 1 + variable 2 * component 2,
> etc. Each
> component is linked to all variables (though it may be
> correlated more
> strongly with sone of them), and each variable is linked to
> all underlying
> components.
> If all your observed variables are closely correlated (e.g.
> if you have
> several independent indexes of the same psychological
> trait, as in several
> intelligence tests), the first component represents the
> most important
> common component that could be constructed to explain the
> greater part of
> the intercorrelations between observed variables.
> Controlling for that first
> component maximises the amount of variance explained by a
> single (non
> observed) variable. It leaves usually a portion of
> unexplained variance in
> observed variables. The second component is the best you
> can do to explain
> the remainded variance in observed variables, and the third
> and further
> components, in turn, explain the remaining variance not
> explained by
> previous factors. In the traditional interpretation going
> back to the early
> 20th Century study of cognitive ability by Spearmann, the
> first factors
> represented "general intelligence", and the other
> components may represent
> other independent factors affecting test scores, such as a
> general
> familiarity with test-taking situations, socioeconomic
> status, specific
> abilities linked with some specific tests, and what not.
> PCA is but one of various factor analysis techniques. I
> recommend you first
> read some elementary text on factor analysis, to get more
> acquainted with
> the nature, scope and limitations of PCA.
>
> Hector
>
>
>
> -----Original Message-----
> From: SPSSX(r) Discussion [mailto:[hidden email]]
> On Behalf Of
> jimjohn
> Sent: 23 August 2009 18:03
> To: [hidden email]
> Subject: PCA: Shift, Twist, Butterfly
>
> I'm just starting to understand how to conduct PCA, and am having
> trouble interpreting the results. I've been told that the first
> component represents a parallel shift, the second represents a twist,
> and the third represents a butterfly of my original data.
>
> From what I understand, if I have three components, then to come up
> with my new first factor I take the different components from the
> component matrix and do:
> Component 1 * value of original variable 1 + Component 2 * value of
> original variable 2 + ...
>
> How does this translate into a parallel shift of my original data? Any
> ideas? Thanks so much in advance!
> --
> View this message in context:
> http://www.nabble.com/PCA%3A-Shift%2C-Twist%2C-Butterfly-tp25108336p25108336.html
> Sent from the SPSSX Discussion mailing list archive at Nabble.com.
>
> =====================
> To manage your subscription to SPSSX-L, send a message to
> [hidden email]
> (not to SPSSX-L), with no body text except the
> command. To leave the list, send the command
> SIGNOFF SPSSX-L
> For a list of commands to manage subscriptions, send the
> command
> INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: Shift, Twist, Butterfly

Swank, Paul R
Well, I insist to the contrary so we will have to agree to disagree.

Dr. Paul R. Swank,
Professor and Director of Research
Children's Learning Institute
University of Texas Health Science Center-Houston

-----Original Message-----
From: Hector Maletta [mailto:[hidden email]]
Sent: Monday, August 24, 2009 6:03 PM
To: Swank, Paul R; [hidden email]
Subject: RE: Shift, Twist, Butterfly

I insist: PCA and other types of factor analysis rest on different
assumptions and may be used for different purposes, but all are essentially
the same statistical model and procedure. One assumes a communality of 1,
the others a communality < 1, i.e. some unique variance, but apart from that
the rest is exactly the same. You can rotate PCA solutions just as you can
rotate other factor solutions: nothing hinders that. As with all statistical
procedures, you choose the variant that best suits your dataset and theory.

Moreover, the objective is not necessarily data reduction, in the sense of
getting a SMALLER number of (latent) variables instead of your original set
of (observed) variables. Your goal might be to replace your k observed
variables with k latent factors, for instance to separate the information
in your k (correlated) variables into k orthogonal (uncorrelated) factors.

One thing is the use to which one habitually puts these procedures, and
another thing altogether is the nature of the procedures.
Since all kinds of factor analysis create constructs that do not really
exist, what set of constructs you'd prefer would depend on the kind of
analytical problem you are facing, and the sort of theory you have about the
problem.
Hector

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
Swank, Paul R
Sent: 24 August 2009 13:45
To: [hidden email]
Subject: Re: Shift, Twist, Butterfly

PC analysis has a fundamentally different purpose than does PA. PC
determines components of the measures that account for the maximum amount of
variance and are independent of one another. The purpose of such components
is data reduction. There is no rotation needed, nor is there any concern for
what the underlying structure of the measures is. You just want a smaller
number of predictors that are independent of each other. PA, on the other
hand, is used to try to fathom the underlying structure of the variables.
Its purpose is different. You are searching for latent variables with PA
analyses. You are trying to partition the variance of the variables into
three parts: common factor variance, specific factor variance, and error
variance. PCA does not do this. PA is much closer to CFA than PCA is. I'd
look to the writings of Preacher and MacCallum on Tom Swift and his electric
factor analysis machine for more detail.
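[Editor's note: the three-way partition Paul describes can be written down for a single standardized variable under the common factor model, x = λF + specific + error, with the three sources independent. A toy illustration (the numbers are made up):

```python
# Hypothetical standardized variable under the common factor model:
# x = lam * F + specific + error, with F, specific, error independent.
lam = 0.7                              # loading on the common factor
var_specific = 0.31                    # specific (reliable but unshared)
var_error = 0.20                       # measurement error

communality = lam ** 2                 # common factor variance (h^2)
uniqueness = var_specific + var_error  # specific + error variance
total_var = communality + uniqueness

print(communality, uniqueness, total_var)  # ~0.49, 0.51, 1.0
```

PCA, by contrast, analyzes the full unit variance of each standardized variable without separating the unique part, which is the difference at issue in this exchange.]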

Dr. Paul R. Swank,
Professor and Director of Research
Children's Learning Institute
University of Texas Health Science Center-Houston


-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
Hector Maletta
Sent: Sunday, August 23, 2009 8:04 PM
To: [hidden email]
Subject: Re: Shift, Twist, Butterfly

Scott,
Sorry, but I insist on my view. Factor analysis is a general technique with
many variants. One of the variants concerns the assumptions on communality:
in PCA the whole variance is subjected to analysis, and therefore the sum of
factors' contributions explains 100% of the variance of all observed
variables, while other variants start with an estimate of the amount of
"common" variance and separate it from the rest of variance, usually
supposed to be due to idiosyncratic characteristics of each observed
variable, unrelated to other observed variables. In this case, the technique
computes factors explaining the common variance only, leaving the
idiosyncratic part unexplained (attributed to peculiarities of each observed
variable). This is hardly such a transcendental difference. From the
mathematical or computational point of view there is really not much of a
difference: the same mathematical procedure applies in both cases, the only
difference being the values on the main diagonal of the intercorrelation
matrix: 1's in PCA, and the assumed communalities in the other alternatives.
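[Editor's note: the point that the only computational difference lies on the diagonal can be sketched directly. A small illustration (my own, assuming NumPy; squared multiple correlations are one common choice of initial communality estimate):

```python
import numpy as np

# Illustrative correlation matrix for three positively correlated tests.
R = np.array([[1.0, 0.6, 0.5],
              [0.6, 1.0, 0.4],
              [0.5, 0.4, 1.0]])

# PCA: eigendecompose R as-is (1's on the diagonal => all variance analyzed).
pca_vals = np.linalg.eigvalsh(R)[::-1]

# Principal-axis factoring: replace the diagonal with communality
# estimates (here, squared multiple correlations) and eigendecompose
# the "reduced" matrix -- the only computational difference.
smc = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
R_reduced = R.copy()
np.fill_diagonal(R_reduced, smc)
paf_vals = np.linalg.eigvalsh(R_reduced)[::-1]

print(pca_vals.sum())   # equals 3: PCA accounts for the total variance
print(paf_vals.sum())   # less than 3: only the estimated common variance
```

The same eigendecomposition runs in both branches; only the diagonal entries differ, as the message says.]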

The purpose of the analysis, and the nature of the variables, would dictate
which alternative is more adequate for each analysis, but that hardly
impinges on the problem posed by Jimjohn: I limited myself to his case, in
which he is using PCA.
I do not agree either that PCA is not usable for locating latent factors.
That is precisely what factor analysis does, whether under PCA hypotheses or
otherwise. It estimates one or more fictitious variables ('constructs') that
in case they existed would 'explain' (statistically speaking) the observed
correlation of observed variables. In other words, controlling for the
components or factors would reduce or eliminate the correlation between
observed variables. These factors or components are not objective things:
they are just mathematical constructs, and their values (and even their
correlations with observed variables, or 'loadings') would be altered by a
change in the frame of reference (i.e. by rotation). Taking just the major
components (in PCA) and disregarding the rest achieves precisely the kind of
data reduction you are talking about. That is precisely what Spearman did at
the beginning of the 20th Century to identify "general intelligence", in
fact just the first factor extracted in a PCA procedure applied to a number
of inter-correlated cognitive-ability tests. A similar procedure was used,
years later, by Thurstone to extract several factors (corresponding to
various different cognitive abilities such as linguistic, quantitative,
spatial, etc), for which Thurstone added rotation to the repertoire of
factor analysis in order to define the factors in such a way that each was
clearly more correlated with one different set of observed variables (this
neat attribute was, of course, dependent on the particular frame of
reference chosen, and was not a property of the factors themselves; the
factors continued to be just mathematical artifacts, not real things). Other
variants of FA (more exactly, EFA) were later introduced to deal with other
specific (computational) problems.
Eventually, confirmatory factor analysis emerged, in which only certain
pre-specified effects and factors were included. Some of the effects are
postulated to be zero, and the procedure is, at last, not very different
from Structural Equation Modeling. EFA always "succeeds": every data set can
be subjected to EFA, and the results cannot be challenged. CFA, instead, may
be refuted if it fails to reproduce the observed data structure.
However different the intents and results, all these procedures are just
variants of the same basic procedure, factor analysis, which is in turn just
a corollary or application of the general linear model where ordinary least
square regression, ANOVA and other statistical workhorses also belong. Their
common feature is that all variables are supposed to be linearly related
with an error term, so that for any set of variables (Y, X, Z, ...) it is
postulated that Y = a + bX + cZ + ... + e, where "e" is a random error with
zero mean, and the coefficients (a, b, c, ...) are computed by minimizing
the sum of the squared errors (e^2) over all cases.
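[Editor's note: the least-squares corollary in the last paragraph can be sketched numerically. An illustration (my own; the coefficients and data are made up):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data following Y = a + b*X + c*Z + e with known coefficients.
n = 500
X = rng.normal(size=n)
Z = rng.normal(size=n)
e = rng.normal(scale=0.1, size=n)
Y = 1.0 + 2.0 * X - 0.5 * Z + e

# Minimize the sum of squared errors via least squares on the
# design matrix D = [1, X, Z].
D = np.column_stack([np.ones(n), X, Z])
beta, *_ = np.linalg.lstsq(D, Y, rcond=None)
print(beta)   # close to the true coefficients [1.0, 2.0, -0.5]
```

The recovered coefficients approach (a, b, c) as the error shrinks, which is the sense in which regression, ANOVA, and factor analysis share the general linear model.]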



-----Original Message-----
From: SR Millis [mailto:[hidden email]]
Sent: 23 August 2009 18:55
To: Hector Maletta; SPSS
Subject: Re: Shift, Twist, Butterfly

Hector,

I don't agree with you.  Exploratory factor analysis (EFA) is not the same
"thing" as principal component analysis (PCA).  PCA is not a type of EFA.
PCA and EFA are different statistical methods that are designed to achieve
different objectives (Bentler & Kano, 1990).

EFA is based on the common factor model that postulates that each item in a
battery of measured items is a linear function of one or more common factors
and one unique factor.  Common factors are latent (unobserved) factors.
Conversely, PCA doesn't differentiate between common and unique variance.
Hence, principal components are not latent variables, and it isn't
conceptually correct to equate them with common factors.  The goal of PCA is
data reduction, i.e., taking scores on a large set of measured items and
reducing them to a smaller set of composite variables.  In contrast, the
goal of EFA is to identify latent constructs, i.e., to understand the
structure of the correlations among the measured variables or items.

Some have argued that PCA and EFA produce similar results.  However, this is
not always the case.  Differences are likely to emerge when communalities are
low (e.g., .40).
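[Editor's note: the low-communality point can be illustrated with a population one-factor model. When communalities are low, the first principal component's loadings come out noticeably larger than the true factor loadings. A sketch (my own, assuming NumPy):

```python
import numpy as np

# Population one-factor model with LOW communalities:
# every loading is 0.5, so each communality is 0.25.
p = 6
lam = np.full(p, 0.5)
R = np.outer(lam, lam)
np.fill_diagonal(R, 1.0)          # implied correlation matrix

# PCA loadings for the first component: sqrt(eigenvalue) * eigenvector.
vals, vecs = np.linalg.eigh(R)    # ascending order
w1 = vecs[:, -1] * np.sign(vecs[0, -1])
pc_loadings = np.sqrt(vals[-1]) * w1

print(lam)          # true factor loadings: all 0.5
print(pc_loadings)  # noticeably larger than 0.5
```

Here the first-component "loadings" overstate the variables' relation to the common factor, so equating components with common factors is riskiest exactly when communalities are low.]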


Scott R Millis, PhD, ABPP (CN,CL,RP), CStat, CSci
Professor & Director of Research
Dept of Physical Medicine & Rehabilitation
Dept of Emergency Medicine
Wayne State University School of Medicine
261 Mack Blvd
Detroit, MI 48201
Email:  [hidden email]
Tel: 313-993-8085
Fax: 313-966-7682


Reply | Threaded
Open this post in threaded view
|

Re: Shift, Twist, Butterfly

Hector Maletta
Done.
Hector

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
Swank, Paul R
Sent: 24 August 2009 18:17
To: [hidden email]
Subject: Re: Shift, Twist, Butterfly

Well, I insist to the contrary so we will have to agree to disagree.

Dr. Paul R. Swank,
Professor and Director of Research
Children's Learning Institute
University of Texas Health Science Center-Houston

Reply | Threaded
Open this post in threaded view
|

Re: Shift, Twist, Butterfly

Hector Maletta

James,

I agree with you. However, Paul seems not to agree that we agree. So be it.

 

Hector

 


From: James C. Whanger [mailto:[hidden email]]
Sent: 24 August 2009 19:08
To: Hector Maletta
Cc: [hidden email]
Subject: Re: Shift, Twist, Butterfly

 

I think you should instead "agree to agree".  Hector described what PCA and FA have "in common", including similar mathematical underpinnings and how the two statistical procedures relate in a conceptual hierarchy.  Paul and Scott described how PCA and FA "are different", pointing out that additional mathematical components are obtained from FA and that each method supports different logical inferences based on results and thus serves a different purpose.  Both of these arguments are true and do not contradict each other.

  

On Mon, Aug 24, 2009 at 7:29 PM, Hector Maletta <[hidden email]> wrote:

Done.

Hector

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
Swank, Paul R

Sent: 24 August 2009 18:17
To: [hidden email]
Subject: Re: Shift, Twist, Butterfly

Well, I insist to the contrary so we will have to agree to disagree.

Dr. Paul R. Swank,
Professor and Director of Research
Children's Learning Institute
University of Texas Health Science Center-Houston

-----Original Message-----
From: Hector Maletta [mailto:[hidden email]]
Sent: Monday, August 24, 2009 6:03 PM
To: Swank, Paul R; [hidden email]
Subject: RE: Shift, Twist, Butterfly

I insist: PCA and other types of Factor Analysis rest on different
assumptions and may be used for different purposes, but all are essentially
the same statistical model and procedure. One assumes a "communality" of 1,
the others a communality < 1, i.e. some unique variance, but except for that
the rest is exactly the same. You can rotate PCA solutions just as you can
rotate other factor analyses: nothing hinders that. As with all statistical
procedures, you choose the variant that best suits your dataset and theory.

Moreover, the objective is not necessarily data reduction, in the sense of
getting a SMALLER number of (latent) variables instead of your original set
of (observed) variables. Your goal might be to replace your k observed
variables with k latent factors, in order, for instance, to separate the
information in your k (correlated) variables into k orthogonal
(uncorrelated) factors.

One thing is the use to which one habitually puts these procedures, and
another thing altogether is the nature of the procedures.
Since all kinds of factor analysis create constructs that do not really
exist, what set of constructs you'd prefer would depend on the kind of
analytical problem you are facing, and the sort of theory you have about the
problem.
Hector
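
Hector's point that k correlated variables can be replaced by k uncorrelated components without losing anything can be sketched in a few lines. This is an illustrative Python computation, not SPSS syntax, and the correlation value r = .6 is made up:

```python
import math

# Two standardized variables with correlation r have the 2x2 correlation
# matrix [[1, r], [r, 1]], whose eigenvalues are 1 + r and 1 - r, with
# eigenvectors (1, 1)/sqrt(2) and (1, -1)/sqrt(2).
r = 0.6
eigenvalues = [1 + r, 1 - r]
v1 = [1 / math.sqrt(2), 1 / math.sqrt(2)]
v2 = [1 / math.sqrt(2), -1 / math.sqrt(2)]

# Keeping as many components as variables discards nothing: the
# eigenvalues sum to the total standardized variance, k = 2.
print(sum(eigenvalues))  # 2.0

# The component directions are orthogonal, so the k component scores
# are uncorrelated even though the original variables were not.
print(v1[0] * v2[0] + v1[1] * v2[1])  # 0.0
```

The two components together still explain 100% of the variance; data reduction only happens if you then discard the smaller one.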

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
Swank, Paul R
Sent: 24 August 2009 13:45
To: [hidden email]
Subject: Re: Shift, Twist, Butterfly

PC analysis has a fundamentally different purpose than does PA. PC
determines components of the measures that account for the maximum amount of
variance and are independent of one another. The purpose of such components
is data reduction. There is no rotation needed, nor is there any concern for
what the underlying structure of the measures is. You just want a smaller
number of predictors that are independent of each other. PA, on the other
hand, is used to try to fathom the underlying structure of the variables.
Its purpose is different. You are searching for latent variables with PA
analyses. You are trying to partition the variance of the variables into
three parts: common factor variance, specific factor variance, and error
variance. PCA does not do this. PA is much closer to CFA than PCA is. I'd
look to the writings of Preacher and MacCallum on Tom Swift and his electric
factor analysis machine for more detail.

Dr. Paul R. Swank,
Professor and Director of Research
Children's Learning Institute
University of Texas Health Science Center-Houston


-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
Hector Maletta
Sent: Sunday, August 23, 2009 8:04 PM
To: [hidden email]
Subject: Re: Shift, Twist, Butterfly

Scott,
Sorry, but I insist on my view. Factor analysis is a general technique, with
many variants. One of the variants concerns the assumptions on communality:
in PCA the whole variance is subjected to analysis, and therefore the sum of
factors' contributions explains 100% of the variance of all observed
variables, while other variants start with an estimate of the amount of
"common" variance and separate it from the rest of variance, usually
supposed to be due to idiosyncratic characteristics of each observed
variable, unrelated to other observed variables. In this case, the technique
computes factors explaining the common variance only, leaving the
idiosyncratic part unexplained (attributed to peculiarities of each observed
variable). This is hardly such a transcendental difference. From the
mathematical or computational point of view there is really not much of a
difference: the same mathematical procedure applies in both cases, the only
difference being in the values on the main diagonal of the intercorrelation
matrix: 1's in PCA, and the assumed communalities in the other alternatives.
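
The diagonal substitution described above can be shown concretely. A minimal Python sketch, with hypothetical correlations and communality estimates (not from any real dataset):

```python
# PCA and common factor analysis operate on matrices that differ only
# on the main diagonal.
r12, r13, r23 = 0.5, 0.4, 0.3
R_pca = [[1.0, r12, r13],
         [r12, 1.0, r23],
         [r13, r23, 1.0]]          # PCA: 1's on the diagonal

h2 = [0.45, 0.40, 0.25]           # assumed communality estimates
R_fa = [row[:] for row in R_pca]
for i in range(3):
    R_fa[i][i] = h2[i]            # FA: communalities on the diagonal

# Every off-diagonal entry is identical in the two matrices.
same_off_diagonal = all(R_pca[i][j] == R_fa[i][j]
                        for i in range(3) for j in range(3) if i != j)
print(same_off_diagonal)          # True
print(R_pca[0][0], R_fa[0][0])    # 1.0 0.45
```

The same extraction machinery is then applied to either matrix; only the variance offered up for analysis changes.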

The purpose of the analysis, and the nature of the variables, would dictate
which alternative is more adequate for each analysis, but that hardly
impinges on the problem posed by Jimjohn: I limited myself to his case, in
which he is using PCA.
I do not agree either that PCA is not usable for locating latent factors.
That is precisely what factor analysis does, either under PCA hypotheses or
otherwise. It estimates one or more fictitious variables ('constructs') that
in case they existed would 'explain' (statistically speaking) the observed
correlation of observed variables. In other words, controlling for the
components or factors would reduce or eliminate the correlation between
observed variables. These factors or components are not objective things:
they are just mathematical constructs, and their values (and even their
correlations with observed variables, or 'loadings') would be altered by a
change in the frame of reference (i.e. by rotation). Taking just the major
components (in PCA) and disregarding the rest achieves precisely the kind of
data reduction you are talking about. That is precisely what Spearman did at
the beginning of the 20th Century to identify "general intelligence", in
fact just the first factor extracted in a PCA procedure applied to a number
of inter-correlated cognitive-ability tests. A similar procedure was used,
years later, by Thurstone to extract several factors (corresponding to
various different cognitive abilities such as linguistic, quantitative,
spatial, etc), for which Thurstone added rotation to the repertoire of
factor analysis in order to define the factors in such a way that each was
clearly more correlated with one different set of observed variables (this
neat attribute was, of course, dependent on the particular frame of
reference chosen, and was not a property of the factors themselves; the
factors continued to be just mathematical artifacts, not real things). Other
variants of FA (more exactly, EFA) were later introduced to deal with other
specific (computational) problems.
Eventually, confirmatory factor analysis emerged, in which only certain
pre-specified effects and factors were included. Some of the effects are
postulated to be zero, and the procedure is, at last, not very different
from Structural Equation Modeling. EFA always "succeeds": every data set can
be subjected to EFA, and the results cannot be challenged. CFA, instead, may
be refuted if it fails to reproduce the observed data structure.
However different the intents and results, all these procedures are just
variants of the same basic procedure, factor analysis, which is in turn just
a corollary or application of the general linear model, where ordinary least
squares regression, ANOVA and other statistical workhorses also belong. Their
common feature is that all variables are supposed to be linearly related,
with an error term, so that for any set of variables (Y, X, Z, ...) it is
postulated that Y = a + bX + cZ + ... + e, where "e" is a random error with zero
mean, and the coefficients (a, b, c, ...) are computed by minimizing the sum
of the squared errors (e²) over all cases.
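
The least-squares fitting just described can be made concrete in the one-predictor case. A minimal Python sketch with invented numbers:

```python
# One-predictor case of Y = a + bX + e: minimizing the sum of squared
# errors gives the closed-form estimates
#   b = cov(X, Y) / var(X)   and   a = mean(Y) - b * mean(X).
X = [1, 2, 3, 4, 5]
Y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(X)
mx, my = sum(X) / n, sum(Y) / n
b = (sum((x - mx) * (y - my) for x, y in zip(X, Y))
     / sum((x - mx) ** 2 for x in X))
a = my - b * mx
residuals = [y - (a + b * x) for x, y in zip(X, Y)]
print(round(b, 4), round(a, 4))    # 1.99 0.05
print(abs(sum(residuals)) < 1e-9)  # residuals sum to ~0: True
```

The same minimization principle, generalized to many variables, underlies regression, ANOVA and the factor-analytic family alike.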



-----Original Message-----
From: SR Millis [mailto:[hidden email]]
Sent: 23 August 2009 18:55
To: Hector Maletta; SPSS
Subject: Re: Shift, Twist, Butterfly

Hector,

I don't agree with you.  Exploratory factor analysis (EFA) is not the same
"thing" as principal component analysis (PCA).  PCA is not a type of EFA.
PCA and EFA are different statistical methods that are designed to achieve
different objectives (Bentler & Kano, 1990).

EFA is based on the common factor model that postulates that each item in a
battery of measured items is a linear function of one or more common factors
and one unique factor.  Common factors are latent (unobserved) factors.
Conversely, PCA doesn't differentiate between common and unique variance.
Hence, principal components are not latent variables---and it isn't
conceptually correct to equate them with common factors.  The goal of PCA is
data reduction, i.e., taking scores on a large set of measured items and
reducing them to scores on a smaller set of composite variables.  In contrast,
the goal of EFA is to identify latent constructs, i.e., to understand the
structure of the correlations among the measured variables or items.

Some have argued that PCA and EFA produce similar results.  However, this is
not always the case.  Differences are likely to emerge when communalities are
low (e.g., .40).
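
That divergence can be illustrated with a toy one-factor model (a Python sketch; the loadings are hypothetical). With two standardized items each loading λ on a single common factor, the implied correlation is r = λ², which is also the true communality; the first principal component of the full correlation matrix, by contrast, accounts for (1 + r)/2 of each item's variance:

```python
# Toy one-factor model: two standardized items each load lambda on one
# common factor, so r = lambda**2 = true communality. The first
# principal component of the full 2x2 correlation matrix explains
# (1 + r)/2 of each item's variance, exceeding the communality by
# (1 - r)/2.
for h2 in (0.9, 0.4, 0.1):        # true communalities, high to low
    r = h2                         # implied inter-item correlation
    pc1_share = (1 + r) / 2        # variance share of first component
    print(h2, round(pc1_share, 3), round(pc1_share - h2, 3))
```

The overstatement, (1 − r)/2, grows from .05 at a communality of .90 to .45 at .10 — consistent with the claim that PCA and EFA agree closely only when communalities are high.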


Scott R Millis, PhD, ABPP (CN,CL,RP), CStat, CSci
Professor & Director of Research
Dept of Physical Medicine & Rehabilitation
Dept of Emergency Medicine
Wayne State University School of Medicine
261 Mack Blvd
Detroit, MI 48201
Email:  [hidden email]
Tel: 313-993-8085
Fax: 313-966-7682



Reply | Threaded
Open this post in threaded view
|

Re: Shift, Twist, Butterfly

SR Millis-3
In reply to this post by Hector Maletta
Hector,

I agree that we must disagree: I'm with Paul on this issue.

Scott R Millis, PhD, ABPP (CN,CL,RP), CStat, CSci
Professor & Director of Research
Dept of Physical Medicine & Rehabilitation
Dept of Emergency Medicine
Wayne State University School of Medicine
261 Mack Blvd
Detroit, MI 48201
Email:  [hidden email]
Tel: 313-993-8085
Fax: 313-966-7682



=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: Shift, Twist, Butterfly

jimjohn
In reply to this post by Hector Maletta
Thanks again Hector! I have a problem with SPSS PCA.
From what I understand, I can relate the variables to the factors by
using the component matrix, but I don't get anything close to the
actual variables when I solve this equation using the component
matrix and the factors SPSS saves.

I'm thinking this is probably because SPSS standardizes the factors,
and I should be looking at the unstandardized factors. Can anyone
confirm if this is so? If so, is there any way to get SPSS to
output the unstandardized factors instead?

Thanks in advance!



Reply | Threaded
Open this post in threaded view
|

Re: Shift, Twist, Butterfly

Hector Maletta
Azam,
The factor scores are standardized (z-scores with zero mean and unit
standard deviation), and they are expressed as a function of STANDARDIZED
observed variables; i.e., the observed variables are converted into z-scores
and then multiplied by the component score coefficients in order to compute
component scores.
The same holds in the reverse case: the (standardized) factor scores may
be used to estimate the predicted (STANDARDIZED) value of an observed
variable. To convert these standardized values of observed variables into
the original observed variables, multiply by the standard deviation of the
original observed variable and add its mean.
There is no such thing as an unstandardized factor score or component score,
because such (unobserved) variables have no objective unit of
measurement. Their measurement is just the z-score, i.e. their distance from
their own mean, measured in units of their own standard deviation.
Hector
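
Hector's point can be checked numerically. The sketch below is in Python/NumPy rather than SPSS, and the data, seed, and names are invented for illustration: the component scores come out as z-scores (zero mean, unit SD), and keeping all components reproduces the STANDARDIZED variables exactly, so raw values are only recovered after multiplying by each variable's SD and adding its mean.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 5 correlated variables, 200 cases (pure illustration).
n, p = 200, 5
common = rng.normal(size=(n, 1))
X = common + 0.5 * rng.normal(size=(n, p))

# Standardize the observed variables: PCA here works on the correlation
# matrix, which is what factoring z-scores amounts to.
mean, sd = X.mean(axis=0), X.std(axis=0)
Z = (X - mean) / sd

# PCA via eigendecomposition of the correlation matrix.
R = np.corrcoef(Z, rowvar=False)
eigval, eigvec = np.linalg.eigh(R)          # ascending order
order = np.argsort(eigval)[::-1]
eigval, eigvec = eigval[order], eigvec[:, order]

# Loadings (component matrix) and standardized component scores.
loadings = eigvec * np.sqrt(eigval)         # variables x components
scores = Z @ eigvec / np.sqrt(eigval)       # cases x components

# The scores are z-scores: zero mean, unit SD.
print(scores.mean(axis=0).round(6))         # ~0 for every component
print(scores.std(axis=0).round(6))          # ~1 for every component

# All components together reproduce the STANDARDIZED variables;
# converting back to raw units needs x = z*SD + mean.
Z_hat = scores @ loadings.T
X_hat = Z_hat * sd + mean
print(np.allclose(X_hat, X))                # True
```

With fewer retained components, `Z_hat` is only an approximation, which is why jimjohn's reconstruction did not match the raw variables: the scores are standardized, and the unstandardization step was missing.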
Reply | Threaded
Open this post in threaded view
|

Re: Shift, Twist, Butterfly

jimjohn
thanks again hector, really appreciate your help! can i bother you
with one more question?

basically i have thirteen variables which each represent the weekly
change in interest rates for different maturities (so one variable
represents the weekly change in the one-month rate, another the
weekly change in the 2-year rate, etc.).

i'm trying to use PCA to reduce the number of correlated variables.
usually, what is expected when PCA is run on these interest rates (or
so i'm told) is that the first three factors represent 99% of the
variability in these changes. the first factor usually represents a
parallel shift. so if i plot each of my original interest rate
variables, the curve looks upward sloping, because the longer the
maturity, the higher the interest rate. the changes in the shape of
this plot over time should be explained by these three factors:
factor 1 should be a parallel shift, and factor 2 should be a twist,
where the shorter-maturity rates are higher and the longer-maturity
rates are lower.

basically, now that i've run PCA on the change in rates for each
variable, i want to somehow convert my factors back into the original
variables and compare them with the original variables, to see
if these parallel shifts and twists actually happen. that way i would
also know how large the parallel shift is, so i could say that on
average the rates shift in parallel by x% and this explains 75% of the
variability.

if i just look at my component matrix and plot the components vs
the variables, i do see that component 1 looks parallel and component
2 looks like a twist.

the problem is i don't know how to convert the data back so that i can
compare it with my actual original interest rate variables and see
that factor 1 results in parallel changes to the variables, etc.

at first i was thinking i could use the formula Variable1 = Component
1 * Factor 1 + Component 2 * Factor 2 + ..., and i could just isolate
Component 1 * Factor 1, look at the mean of that distribution, and
compare it with the mean of my original variable; that would tell me
how large the parallel shift is. but since my results are always
standardized, the mean of that distribution is always 0.

hope this makes sense, i know this is kind of all over the place. any
ideas? thanks so much!
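
The shift/twist pattern described above can be reproduced on simulated data. The sketch below is Python rather than SPSS, and the level/slope driving factors, magnitudes, and seed are assumptions of the toy example, not real rates: when weekly changes are driven by a common level move plus a slope move, PC1's loadings come out roughly flat across maturities (parallel shift) and PC2's loadings flip sign between the short and long end (twist).

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate 500 weeks of changes for 13 maturities, driven by a
# level (shift) factor plus a slope (twist) factor plus small noise.
n = 500
level = rng.normal(0.0, 0.10, size=(n, 1))        # moves all rates together
slope = rng.normal(0.0, 0.05, size=(n, 1))
twist_shape = np.linspace(1.0, -1.0, 13)          # short end up, long end down
dX = level + slope * twist_shape + rng.normal(0.0, 0.01, size=(n, 13))

# PCA on the correlation matrix of the changes.
Z = (dX - dX.mean(axis=0)) / dX.std(axis=0)
eigval, eigvec = np.linalg.eigh(np.corrcoef(Z, rowvar=False))
eigval, eigvec = eigval[::-1], eigvec[:, ::-1]    # descending order

pc1 = eigvec[:, 0] * np.sqrt(eigval[0])           # loadings, component 1
pc2 = eigvec[:, 1] * np.sqrt(eigval[1])           # loadings, component 2

share = eigval[:3].sum() / 13                     # trace of a 13x13 corr matrix is 13
print(f"first three components explain {share:.1%} of the variance")
print("PC1 loadings share one sign (parallel shift):",
      np.all(pc1 > 0) or np.all(pc1 < 0))
print("PC2 loadings flip sign short vs long (twist):", pc2[0] * pc2[-1] < 0)
```

Plotting `pc1` and `pc2` against maturity gives exactly the pictures described in the post: a roughly horizontal line and a line crossing zero.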


Reply | Threaded
Open this post in threaded view
|

Re: Shift, Twist, Butterfly

Hector Maletta
It makes sense. You only need to convert the standardized results of your
equation into unstandardized ones. The standardized results are z = (x -
mean)/SD, and therefore the original variables are x = SD * z + mean. Use the
mean and SD of the original variables to effect this conversion.

Hector
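
Hector's conversion can be sketched in Python (not SPSS; the toy data and names are invented). It also suggests one way around the "mean is always 0" problem in the previous post: since factor scores average zero by construction, the average factor-1 contribution is necessarily zero, so a more informative summary is the typical size (standard deviation) of each factor's contribution expressed in the original units.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy rate-change data: 3 maturities driven by one common factor.
n = 300
X = rng.normal(0.0, 0.08, size=(n, 1)) + 0.02 * rng.normal(size=(n, 3))

mean, sd = X.mean(axis=0), X.std(axis=0)
Z = (X - mean) / sd
eigval, eigvec = np.linalg.eigh(np.corrcoef(Z, rowvar=False))
eigval, eigvec = eigval[::-1], eigvec[:, ::-1]
loadings = eigvec * np.sqrt(eigval)
scores = Z @ eigvec / np.sqrt(eigval)

# Full reconstruction in original units: x = SD * z + mean.
X_hat = (scores @ loadings.T) * sd + mean
print("full reconstruction matches:", np.allclose(X_hat, X))   # True

# Factor 1's contribution alone, in ORIGINAL units. Only the z-part is
# split factor by factor; the mean belongs to the total, added once.
shift1 = np.outer(scores[:, 0], loadings[:, 0]) * sd

# Its mean is ~0 (factor scores are centered), so report its typical
# magnitude per variable instead of its mean.
print("typical factor-1 move per variable:", shift1.std(axis=0).round(4))
```

For the yield-curve case, `shift1.std(axis=0)` being roughly equal across maturities is what "on average, the rates shift in parallel by x%" looks like numerically.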
