I'm just starting to understand how to conduct PCA, and am having trouble interpreting the results. I've been told that the first component represents a parallel shift, the second represents a twist, and the third represents a butterfly of my original data.

From what I understand, if I have three components, then to come up with my new first factor I take the different components from the component matrix and do: Component 1 * Value of Original Variable 1 + Component 2 * Value of Original Variable 2 + ... How does this translate into a parallel shift of my original data? Any ideas? Thanks so much in advance!
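For concreteness: in the unrotated case, component scores are weighted sums running across the variables, with the weights for component k taken from column k of the component matrix — one such sum per component, not a mixing of different components as sketched above. A minimal numpy illustration with made-up data (SPSS's component score coefficient matrix rescales these weights, but the idea is the same):

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 3)) @ rng.normal(size=(3, 3))   # made-up correlated data

# Standardise, then each component score is a weighted sum ACROSS VARIABLES:
# score on component 1 = w11*z1 + w21*z2 + w31*z3, with one weight per
# variable, all taken from the FIRST column of the weight matrix W.
Z = (X - X.mean(axis=0)) / X.std(axis=0)
R = np.corrcoef(Z, rowvar=False)
eigvals, W = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]          # largest component first
eigvals, W = eigvals[order], W[:, order]

comp_scores = Z @ W                        # column k = scores on component k
```

The resulting score columns are mutually uncorrelated, and the variance of column k equals the k-th eigenvalue, which is what "the first component explains the most variance" means operationally.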
Jimjohn,
I do not know who told you such nonsense, but IMHO it's all wrong. PCA identifies underlying factors that "explain" the correlations between your observed variables. In other words, if you were to measure those underlying (actually non-measurable) factors or components, the correlation between your observed variables, controlling for the underlying factors, would be zero or near zero. There is no direct link between each factor and one specific variable (such as the one you sketch) mandating the multiplication of variable 1 times component 1 + variable 2 times component 2, etc. Each component is linked to all variables (though it may be correlated more strongly with some of them), and each variable is linked to all underlying components.

If all your observed variables are closely correlated (e.g. if you have several independent indexes of the same psychological trait, as in several intelligence tests), the first component represents the most important common component that could be constructed to explain the greater part of the intercorrelations between observed variables. Controlling for that first component maximises the amount of variance explained by a single (non-observed) variable. It usually leaves a portion of unexplained variance in the observed variables. The second component is the best you can do to explain the remaining variance in the observed variables, and the third and further components, in turn, explain the remaining variance not explained by previous factors.

In the traditional interpretation, going back to the early 20th-century study of cognitive ability by Spearman, the first factor represented "general intelligence", and the other components may represent other independent factors affecting test scores, such as a general familiarity with test-taking situations, socioeconomic status, specific abilities linked with some specific tests, and what not. PCA is but one of various factor analysis techniques.
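The point that the components successively soak up variance, and that every variable loads on every component, can be sketched numerically. The data below are invented (three noisy measures of one common trait); nothing here comes from the thread:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: three noisy measures of one underlying trait.
trait = rng.normal(size=500)
scores = np.column_stack([trait + rng.normal(scale=s, size=500)
                          for s in (0.5, 0.7, 0.9)])

# PCA on the correlation matrix: eigenvectors are the components,
# eigenvalues are the amounts of variance each component explains.
R = np.corrcoef(scores, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]            # largest component first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

explained = eigvals / eigvals.sum()          # proportion of total variance
loadings = eigvecs * np.sqrt(eigvals)        # dense: every variable, every component
```

With strongly intercorrelated variables like these, the first component takes the lion's share of the variance and correlates with all three variables in the same direction; the later components mop up what remains.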
I recommend you first read some elementary text on factor analysis, to get more acquainted with the nature, scope and limitations of PCA.

Hector
Hector,
I don't agree with you. Exploratory factor analysis (EFA) is not the same "thing" as principal component analysis (PCA). PCA is not a type of EFA. PCA and EFA are different statistical methods that are designed to achieve different objectives (Bentler & Kano, 1990).

EFA is based on the common factor model, which postulates that each item in a battery of measured items is a linear function of one or more common factors and one unique factor. Common factors are latent (unobserved) factors. Conversely, PCA doesn't differentiate between common and unique variance. Hence, principal components are not latent variables, and it isn't conceptually correct to equate them with common factors.

The goal of PCA is data reduction, i.e., taking scores on a large set of measured items and reducing them to scores on a smaller set of composite variables. In contrast, the goal of EFA is to identify latent constructs, i.e., to understand the structure of the correlations among the measured variables or items.

Some have argued that PCA and EFA produce similar results. However, this is not always the case. Differences are likely to emerge when communalities are low (e.g., .40).

Scott R Millis, PhD, ABPP (CN,CL,RP), CStat, CSci
Professor & Director of Research
Dept of Physical Medicine & Rehabilitation
Dept of Emergency Medicine
Wayne State University School of Medicine
261 Mack Blvd
Detroit, MI 48201
Email: [hidden email]
Tel: 313-993-8085
Fax: 313-966-7682
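A small numeric sketch of the common factor model described here, with made-up loadings and unique variances. The model writes the covariance matrix as ΛΛ' + Ψ, and the first principal component of that matrix does not separate the two pieces:

```python
import numpy as np

# Common factor model in matrix form: Sigma = L L' + Psi, where L holds the
# loadings on a single common factor and Psi is diagonal (unique variances).
# All numbers are invented for illustration.
L = np.array([0.9, 0.8, 0.7, 0.6])
Psi = np.diag([0.16, 0.25, 0.36, 0.49])
Sigma = np.outer(L, L) + Psi

# The common-factor part of each item's variance (its communality) is L_j**2;
# the remainder is unique variance, which EFA models separately.
communalities = L**2

# PCA on Sigma makes no such separation: its first eigenvalue exceeds the
# total common variance because the component absorbs unique variance too.
eigvals = np.linalg.eigvalsh(Sigma)[::-1]
common_variance = (L**2).sum()
```

This is the sense in which a principal component is a composite of everything in the items, while a common factor is defined only by the shared part.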
Scott,
Sorry, but I insist on my view. Factor analysis is a general technique with many variants. One of the variants concerns the assumptions about communality: in PCA the whole variance is subjected to analysis, and therefore the sum of the factors' contributions explains 100% of the variance of all observed variables, while other variants start with an estimate of the amount of "common" variance and separate it from the rest of the variance, which is usually supposed to be due to idiosyncratic characteristics of each observed variable, unrelated to the other observed variables. In that case, the technique computes factors explaining the common variance only, leaving the idiosyncratic part unexplained (attributed to peculiarities of each observed variable).

This is hardly such a transcendental difference. From the mathematical or computational point of view there is really not a big difference: the same mathematical procedure applies in both cases, the only difference being the values on the main diagonal of the intercorrelation matrix: 1's in PCA, and the assumed communalities in the other alternatives. The purpose of the analysis, and the nature of the variables, dictate which alternative is more adequate for each analysis, but that hardly impinges on the problem posed by Jimjohn: I limited myself to his case, in which he is using PCA.

I do not agree either that PCA is not usable for locating latent factors. That is precisely what factor analysis does, either under PCA hypotheses or otherwise. It estimates one or more fictitious variables ('constructs') that, if they existed, would 'explain' (statistically speaking) the observed correlations among the observed variables. In other words, controlling for the components or factors would reduce or eliminate the correlation between observed variables.
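A minimal sketch of the claim that the two variants share the same eigen-procedure and differ only in the diagonal. The correlation matrix below is made up, and squared multiple correlations are used only as one common initial communality estimate:

```python
import numpy as np

# A made-up correlation matrix among four observed variables.
R = np.array([[1.0, 0.6, 0.5, 0.4],
              [0.6, 1.0, 0.5, 0.4],
              [0.5, 0.5, 1.0, 0.4],
              [0.4, 0.4, 0.4, 1.0]])

def extract(matrix):
    """The eigen-procedure shared by both variants, largest factors first."""
    vals, vecs = np.linalg.eigh(matrix)
    order = np.argsort(vals)[::-1]
    return vals[order], vecs[:, order]

# PCA: analyse R as-is, with 1's on the diagonal (total variance enters).
pca_vals, _ = extract(R)

# Principal-axis style variant: put estimated communalities on the diagonal
# (here squared multiple correlations, a conventional initial estimate).
smc = 1 - 1 / np.diag(np.linalg.inv(R))
R_reduced = R.copy()
np.fill_diagonal(R_reduced, smc)
paf_vals, _ = extract(R_reduced)
```

Identical code path, different diagonal: the reduced matrix carries less total variance, so its factors explain less in absolute terms, but the extraction mechanics are the same.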
These factors or components are not objective things: they are just mathematical constructs, and their values (and even their correlations with observed variables, or 'loadings') would be altered by a change in the frame of reference (i.e. by rotation). Taking just the major components (in PCA) and disregarding the rest achieves precisely the kind of data reduction you are talking about.

That is precisely what Spearman did at the beginning of the 20th century to identify "general intelligence", in fact just the first factor extracted by a PCA procedure applied to a number of inter-correlated cognitive-ability tests. A similar procedure was used, years later, by Thurstone to extract several factors (corresponding to various cognitive abilities such as linguistic, quantitative, spatial, etc.), for which Thurstone added rotation to the repertoire of factor analysis in order to define the factors in such a way that each was clearly more correlated with one distinct set of observed variables (this neat attribute was, of course, dependent on the particular frame of reference chosen, and was not a property of the factors themselves; the factors continued to be just mathematical artifacts, not real things).

Other variants of FA (more exactly, EFA) were later introduced to deal with other specific (computational) problems. Eventually, confirmatory factor analysis emerged, in which only certain pre-specified effects and factors are included. Some of the effects are postulated to be zero, and the procedure is, in the end, not very different from structural equation modeling. EFA always "succeeds": every data set can be subjected to EFA, and the results cannot be challenged. CFA, instead, may be refuted if it fails to reproduce the observed data structure.
However different the intents and results, all these procedures are just variants of the same basic procedure, factor analysis, which is in turn just a corollary or application of the general linear model, where ordinary least squares regression, ANOVA and other statistical workhorses also belong. Their common feature is that all variables are supposed to be linearly related, with an error term, so that for any set of variables (Y, X, Z, ...) it is postulated that Y = a + bX + cZ + ... + e, where "e" is a random error with zero mean, and the coefficients (a, b, c, ...) are computed by minimizing the sum of the squared errors (e²) over all cases.
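The least-squares machinery in that last sentence can be sketched directly. The coefficients and sample size below are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 1000

# Generate data from the linear model Y = a + bX + cZ + e,
# with known coefficients (a=1, b=2, c=-3) and random error e.
X = rng.normal(size=n)
Z = rng.normal(size=n)
e = rng.normal(scale=0.5, size=n)
Y = 1 + 2 * X - 3 * Z + e

# Ordinary least squares: choose (a, b, c) to minimise sum of e_i squared.
design = np.column_stack([np.ones(n), X, Z])
coef, *_ = np.linalg.lstsq(design, Y, rcond=None)
a_hat, b_hat, c_hat = coef
```

With this much data the fitted coefficients land close to the generating values, which is the shared engine underneath regression, ANOVA and the factor-analytic family.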
PC analysis has a fundamentally different purpose than does PA. PC determines components of the measures that account for the maximum amount of variance and are independent of one another. The purpose of such components is data reduction. There is no rotation needed, nor is there any concern for what the underlying structure of the measures is. You just want a smaller number of predictors that are independent of each other. PA, on the other hand, is used to try to fathom the underlying structure of the variables. Its purpose is different. You are searching for latent variables with PA analyses. You are trying to partition the variance of the variables into three parts: common factor variance, specific factor variance, and error variance. PCA does not do this. PA is much closer to CFA than PCA is. I'd look to the writings of Preacher and MacCallum on repairing Tom Swift's electric factor analysis machine for more detail.
Dr. Paul R. Swank, Professor and Director of Research
Children's Learning Institute
University of Texas Health Science Center-Houston
Hi there,

The terms "shift", "twist" and "butterfly" are used in finance and risk management, mostly to describe yield curve changes. From http://en.wikipedia.org/wiki/Fixed_income_attribution, on measuring the returns generated by various sources of risk in a fixed-income portfolio:
<snip>…
So they can be related to factor analysis through covariances and eigenvalues, but they are not the same thing, nor can they be interpreted in the same way.

My two cents,
Fernando Mazariegos
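For readers coming from the finance side, the link can be made concrete: run PCA on a panel of daily yield-curve changes, and the leading components typically come out looking like a parallel shift, a slope change and a curvature change. Below is a minimal sketch in Python; the maturities, factor magnitudes and noise level are all invented for illustration.

```python
# Hypothetical illustration: why the first principal component of
# yield-curve changes is often labelled a "parallel shift".
import numpy as np

rng = np.random.default_rng(0)
maturities = np.array([1, 2, 5, 10, 20, 30], dtype=float)

# Simulate 1000 days of curve changes as a mix of three latent moves:
# a parallel shift, a slope change (twist) and a curvature change (butterfly).
shift = np.ones_like(maturities)
twist = (maturities - maturities.mean()) / maturities.std()
butterfly = twist**2 - (twist**2).mean()

scores = rng.normal(size=(1000, 3)) * [1.0, 0.5, 0.25]   # shift dominates
changes = scores @ np.vstack([shift, twist, butterfly])
changes += rng.normal(scale=0.05, size=changes.shape)    # idiosyncratic noise

# PCA = eigendecomposition of the covariance matrix of the changes.
cov = np.cov(changes, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
pc1 = eigvecs[:, order[0]]

# The first component has near-equal loadings at every maturity:
# moving along it shifts the whole curve up or down in parallel.
print(np.round(pc1 / pc1[0], 2))
```

The "shift" label is thus a property of this kind of data (highly correlated moves across maturities), not of PCA itself, which is the point made above.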
-----Original Message-----
PC analysis has a fundamentally different purpose than does PA. PC determines components of the measures that account for the maximum amount of variance and are independent of one another. The purpose of such components is data reduction. There is no rotation needed, nor is there any concern for what the underlying structure of the measures is: you just want a smaller number of predictors that are independent of each other. PA, on the other hand, is used to try to fathom the underlying structure of the variables. Its purpose is different: you are searching for latent variables with PA analyses. You are trying to partition the variance of the variables into three parts: common factor variance, specific factor variance, and error variance. PCA does not do this. PA is much closer to CFA than PCA is. I'd look to the writings of Preacher and MacCallum on Tom Swift and his electric factor analysis machine for more detail.

Dr. Paul R. Swank
-----Original Message-----
Scott,
Sorry, but I insist on my view. Factor analysis is a general technique, with many variants. One of the variants concerns the assumptions on communality: in PCA the whole variance is subjected to analysis, and therefore the sum of the factors' contributions explains 100% of the variance of all observed variables, while other variants start with an estimate of the amount of "common" variance and separate it from the rest, usually supposed to be due to idiosyncratic characteristics of each observed variable, unrelated to the other observed variables. In this case, the technique computes factors explaining the common variance only, leaving the idiosyncratic part unexplained (attributed to peculiarities of each observed variable). This is hardly a fundamental difference. From the mathematical or computational point of view there is really not much of a difference: the same procedure applies in both cases, the only difference being the values on the main diagonal of the intercorrelation matrix: 1's in PCA, and the estimated communalities in the other alternatives.

The purpose of the analysis, and the nature of the variables, would dictate which alternative is more adequate for each analysis, but that hardly impinges on the problem posed by Jimjohn: I limited myself to his case, in which he is using PCA. I do not agree either that PCA is not usable for locating latent factors.
That is precisely what factor analyses do, either under PCA hypotheses or otherwise: they estimate one or more fictitious variables ('constructs') that, if they existed, would 'explain' (statistically speaking) the observed correlations among the observed variables. In other words, controlling for the components or factors would reduce or eliminate the correlation between observed variables. These factors or components are not objective things: they are just mathematical constructs, and their values (and even their correlations with observed variables, or 'loadings') would be altered by a change in the frame of reference (i.e. by rotation).

Taking just the major components (in PCA) and disregarding the rest achieves precisely the kind of data reduction you are talking about. That is what Spearman did at the beginning of the 20th century to identify "general intelligence": in fact, just the first factor extracted by a PCA procedure applied to a number of inter-correlated cognitive-ability tests. A similar procedure was used, years later, by Thurstone to extract several factors (corresponding to different cognitive abilities such as linguistic, quantitative, spatial, etc.), for which Thurstone added rotation to the repertoire of factor analysis, in order to define the factors in such a way that each was clearly more correlated with one distinct set of observed variables (this neat attribute was, of course, dependent on the particular frame of reference chosen, and was not a property of the factors themselves; the factors remained mathematical artifacts, not real things).

Other variants of FA (more exactly, EFA) were later introduced to deal with other specific (computational) problems. Eventually, confirmatory factor analysis emerged, in which only certain pre-specified effects and factors are included; some effects are postulated to be zero, and the procedure is, in the end, not very different from structural equation modeling. EFA always "succeeds": every data set can be subjected to EFA, and the results cannot be challenged. CFA, instead, may be refuted if it fails to reproduce the observed data structure.
However different the intents and results, all these procedures are just variants of the same basic procedure, factor analysis, which is in turn an application of the general linear model, where ordinary least squares regression, ANOVA and other statistical workhorses also belong. Their common feature is that all variables are supposed to be linearly related, with an error term, so that for any set of variables (Y, X, Z, ...) it is postulated that Y = a + bX + cZ + ... + e, where "e" is a random error with zero mean, and the coefficients (a, b, c, ...) are computed by minimizing the sum of the squared errors (e²) over all cases.
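Hector's "same procedure, different diagonal" point can be sketched numerically. The snippet below (Python/NumPy, with an invented correlation matrix, and squared multiple correlations used as one common choice of communality estimate) runs the same eigendecomposition twice: once with unities and once with communalities on the diagonal.

```python
# Sketch: PCA and principal-axis-style factoring share one algorithm;
# only the diagonal of the correlation matrix differs.
# The correlation matrix R below is invented for illustration.
import numpy as np

R = np.array([[1.0, 0.6, 0.5],
              [0.6, 1.0, 0.4],
              [0.5, 0.4, 1.0]])

def first_factor_loadings(R, communalities=None):
    """Loadings of the first factor from an eigendecomposition of R,
    optionally with communality estimates replacing the unit diagonal."""
    M = R.copy()
    if communalities is not None:
        np.fill_diagonal(M, communalities)
    eigvals, eigvecs = np.linalg.eigh(M)     # ascending eigenvalues
    v, lam = eigvecs[:, -1], eigvals[-1]
    loadings = v * np.sqrt(lam)
    return loadings if loadings.sum() > 0 else -loadings

pca_loadings = first_factor_loadings(R)      # PCA: 1's on the diagonal

# Principal-axis style: squared multiple correlations as communalities.
smc = 1 - 1 / np.diag(np.linalg.inv(R))
paf_loadings = first_factor_loadings(R, communalities=smc)

print(np.round(pca_loadings, 3))
print(np.round(paf_loadings, 3))
```

The loadings differ in size (the reduced diagonal shrinks them), but the computation is identical, which is exactly the claim being debated in this thread.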
-----Original Message-----

Hector, I don't agree with you. Exploratory factor analysis (EFA) is not the same "thing" as principal component analysis (PCA). PCA is not a type of EFA. PCA and EFA are different statistical methods that are designed to achieve different objectives (Bentler & Kano, 1990). EFA is based on the common factor model, which postulates that each item in a battery of measured items is a linear function of one or more common factors and one unique factor. Common factors are latent (unobserved) factors. Conversely, PCA doesn't differentiate between common and unique variance. Hence, principal components are not latent variables, and it isn't conceptually correct to equate them with common factors. The goal of PCA is data reduction, i.e., reducing scores on a large set of measured items to scores on a smaller set of composite variables. In contrast, the goal of EFA is to identify latent constructs, i.e., to understand the structure of the correlations among the measured variables or items.
Some have argued that PCA and EFA produce similar results. However, this is not always the case: differences are likely to emerge when communalities are low (e.g., .40).

Scott R Millis, PhD, ABPP (CN, CL, RP), CStat, CSci
Professor & Director of Research
Dept of Physical Medicine & Rehabilitation
Dept of Emergency Medicine
Wayne State University School of Medicine
261 Mack Blvd
Detroit, MI 48201
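Millis's caveat about low communalities is easy to demonstrate with a toy one-factor model (all numbers invented): when the true loadings are modest, first-component PCA loadings overshoot the common-factor loadings, while a decomposition with the true communalities on the diagonal recovers them exactly.

```python
# Numerical sketch of the low-communality caveat, with invented numbers.
import numpy as np

true_loadings = np.array([0.5, 0.5, 0.5, 0.5])   # communalities of .25
R = np.outer(true_loadings, true_loadings)
np.fill_diagonal(R, 1.0)                          # unique variance fills the rest

# PCA: eigendecomposition with 1's on the diagonal.
eigvals, eigvecs = np.linalg.eigh(R)
pca_loadings = np.abs(eigvecs[:, -1]) * np.sqrt(eigvals[-1])

# Common-factor style: same decomposition, communalities on the diagonal.
Rc = R.copy()
np.fill_diagonal(Rc, true_loadings**2)
eigvals_c, eigvecs_c = np.linalg.eigh(Rc)
fa_loadings = np.abs(eigvecs_c[:, -1]) * np.sqrt(eigvals_c[-1])

print(np.round(pca_loadings, 3))   # noticeably larger than the true 0.5
print(np.round(fa_loadings, 3))    # recovers 0.5
```

With high communalities the two sets of loadings converge, which is why the disagreement in this thread matters mainly for weakly inter-correlated batteries.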
In reply to this post by Hector Maletta
Azam,
SPSS produces factor scores for each case, i.e. the score of each case on each component, through the added keyword /SAVE in the FACTOR command. These scores are obtained as linear combinations of the variables, weighted by their loadings on each factor. SPSS creates a new variable for each factor score and gives each new variable a standard name, but you can rename them as you wish.

Hector

-----Original Message-----
From: [hidden email]
Sent: 24 August 2009 16:46
To: Hector Maletta
Subject: RE: Shift, Twist, Butterfly

Thanks so much Hector, appreciate that! I am hearing of one thing called inverse factor loadings regarding PCAs, which should return factor values for each variable for each case. Has anyone heard of this? And if so, how can I get SPSS to do that? Thanks in advance!
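As a rough Python analogue of what /SAVE produces for a PCA solution (random illustrative data; the score-coefficient matrix shown is the standard one for principal components), the saved scores Hector describes are just linear combinations of the standardized variables:

```python
# Sketch: component scores as linear combinations of standardized variables,
# weighted via the loadings, one new score variable per component per case.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4)) @ rng.normal(size=(4, 4))  # correlated variables

Z = (X - X.mean(axis=0)) / X.std(axis=0)     # standardize, as FACTOR does
R = np.corrcoef(Z, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
lam, V = eigvals[order[:2]], eigvecs[:, order[:2]]  # keep 2 components

loadings = V * np.sqrt(lam)     # correlations of variables with components
weights = loadings / lam        # component-score coefficient matrix
scores = Z @ weights            # one new "variable" per component, per case

# Component scores come out standardized and uncorrelated with each other.
print(np.round(np.cov(scores, rowvar=False), 3))
```

There is no separate "inverse factor loadings" step here: the score coefficients are derived directly from the loadings, which may be what that term referred to.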
In reply to this post by Swank, Paul R
I insist: PCA and other types of Factor Analysis rest on different
assumptions and may be used for different purposes, but all are essentially the same statistical model and procedure. One assumes a communality of 1, the others a communality < 1, i.e. some unique variance, but apart from that the rest is exactly the same. You can rotate PCA solutions just as you can rotate other factor analyses: nothing hinders that. As with all statistical procedures, you choose the variant that best suits your dataset and theory.

Moreover, the objective is not necessarily data reduction, in the sense of getting a SMALLER number of (latent) variables instead of your original set of (observed) variables. Your goal might be to replace your k observed variables with k latent factors, for instance to get the information of your k (correlated) variables separated into k orthogonal, uncorrelated factors. One thing is the use to which one habitually puts these procedures, and another thing altogether is the nature of the procedures. Since all kinds of factor analysis create constructs that do not really exist, which set of constructs you'd prefer depends on the kind of analytical problem you are facing, and the sort of theory you have about the problem.

Hector
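Hector's frame-of-reference remark can be checked directly: rotate retained component loadings by any orthogonal matrix, and the individual loadings change while the reproduced correlation structure does not (Python sketch with an arbitrary invented matrix):

```python
# Sketch: rotated factors are a change of frame, not different "things".
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(5, 5))
R = np.corrcoef(A @ A.T + 5 * np.eye(5))   # an arbitrary correlation matrix

eigvals, eigvecs = np.linalg.eigh(R)
L = eigvecs[:, -2:] * np.sqrt(eigvals[-2:])  # loadings of 2 largest components

# Any orthogonal 2x2 matrix is a legitimate "rotation" of the solution.
theta = 0.7
T = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
L_rot = L @ T

# Individual loadings differ, but the reproduced correlation structure
# L @ L.T is identical: the factors are frames of reference.
print(np.allclose(L @ L.T, L_rot @ L_rot.T))   # True
```

This is why rotation criteria like varimax are about interpretability, not fit: every orthogonal rotation explains exactly the same variance.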
Well, I insist to the contrary so we will have to agree to disagree.
Dr. Paul R. Swank, Professor and Director of Research Children's Learning Institute University of Texas Health Science Center-Houston -----Original Message----- From: Hector Maletta [mailto:[hidden email]] Sent: Monday, August 24, 2009 6:03 PM To: Swank, Paul R; [hidden email] Subject: RE: Shift, Twist, Butterfly I insist: PCA and other types of Factor Analysis rest on different assumptions and may be used for different purposes, but all are essentially the same statistical model and procedure. One assumes a "commonality" of 1, the others a commonality < 1, i.e. some unique variance, but except for that the rest is exactly the same. You can rotate PCA solutions just as you can rotate other factor analyses: nothing hinders that. As with all statistical procedures, you choose the variant that best suits your dataset and theory. Moreover, the objective is not necessarily data reduction, in the sense of getting a SMALLER number of (latent) variables instead of your original set of (observed) variables. Your goal might be to replace your k observed variables with k latent factors, with the goal for instance of getting the information of your k (correlated) variables separated into (k) orthogonal or uncorrelated factors. One thing is the use to which one habitually puts these procedures, and another thing altogether is the nature of the procedures. Since all kinds of factor analysis create constructs that do not really exist, what set of constructs you'd prefer would depend on the kind of analytical problem you are facing, and the sort of theory you have about the problem. Hector -----Original Message----- From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Swank, Paul R Sent: 24 August 2009 13:45 To: [hidden email] Subject: Re: Shift, Twist, Butterfly PC analysis has a fundamentally different purpose that does PA. PC determines components of the measures that account for the maximum amount of variance and are independent of one another. 
The purpose for such components is data reduction. There is no rotation needed nor is there any concern for what the underlying structure of the measures are. You just want a smaller number of predictors that are independent of each other. PA, on the other hand, is used to try and fathom the underlying structure of the variables. Its purpose is different. You are searching for latent variables with PA analyses. You are trying to partition the variance of the variables into the three parts, common factor variance, specific factor variance, and error variance. PCA does not do this. PA is much closer to CFA that PCA is. I'd look to the writings of Preacher and MacCallum on Tom Swift and his Electric factor analysis machine for more detail. Dr. Paul R. Swank, Professor and Director of Research Children's Learning Institute University of Texas Health Science Center-Houston -----Original Message----- From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Hector Maletta Sent: Sunday, August 23, 2009 8:04 PM To: [hidden email] Subject: Re: Shift, Twist, Butterfly Scott, Sorry but I insist in my view. Factor analysis is a general technique, with many variants. One of the variants concerns the assumptions on commonality: in PCA the whole variance is subjected to analysis, and therefore the sum of factors' contributions explains 100% of the variance of all observed variables, while other variants start with an estimate of the amount of "common" variance and separate it from the rest of variance, usually supposed to be due to idiosyncratic characteristics of each observed variable, unrelated to other observed variables. In this case, the technique computes factors explaining the common variance only, leaving the idiosyncratic part unexplained (attributed to peculiarities of each observed variable). This is hardly such a transcendental difference. 
From the mathematical or computational point of view there is really no big deal of a difference: the same mathematical procedure applies in both cases, the only difference being in the values at the main diagonal of the intercorrelation matrix: 1's in PCA, and the assumed commonalities in the other alternatives. The purpose of the analysis, and the nature of the variables, would dictate which alternative is more adequate for each analysis, but that hardly impinges on the problem posed by Jimjohn: I limited myself to his case, in which he is using PCA. I do not agree either that PCA is not usable for locating latent factors. That is precisely what factor analysis do, either under PCA hypotheses or otherwise. It estimates one or more fictitious variables ('constructs') that in case they existed would 'explain' (statistically speaking) the observed correlation of observed variables. In other words, controlling for the components or factors would reduce or eliminate the correlation between observed variables. These factors or components are not objective things: they are just mathematical constructs, and their values (and even their correlations with observed variables, or 'loadings') would be altered by a change in the frame of reference (i.e. by rotation). Taking just the major components (in PCA) and disregarding the rest achieves precisely the kind of data reduction you are talking about. That is precisely what Spearman did at the beginning of the 20th Century to identify "general intelligence", in fact just the first factor extracted in a PCA procedure applied to a number of inter-correlated cognitive-ability tests. 
A similar procedure was used, years later, by Thurstone to extract several factors (corresponding to various different cognitive abilities such as linguistic, quantitative, spatial, etc), for which Thurstone added rotation to the repertoire of factor analysis in order to define the factors in such a way that each was clearly more correlated with one different set of observed variables (this neat attribute was, of course, dependent on the particular frame of reference chosen, and was not a property of the factors themselves; the factors continued to be just mathematical artifacts, not real things). Other variants of FA (more exactly, EFA) were later introduced to deal with other specific (computational) problems. Eventually, confirmatory factor analysis emerged, in which only certain pre-specified effects and factors were included. Some of the effects are postulated to be zero, and the procedure is, at last, not very different from Structural Equation Modeling. EFA always "succeeds": every data set can be treated to EFA, and the results cannot be challenged. CFA, instead, may be refuted if it fails to reproduce the observed data structure. However different the intents and results, all these procedures are just variants of the same basic procedure, factor analysis, which is in turn just a corollary or application of the general linear model where ordinary least square regression, ANOVA and other statistical workhorses also belong. Their common feature is that all variables are supposed to be linearly related with an error term, so that for any set of variables (Y, X, Z, ...) it is postulated that Y=a + bX + cZ...+e, where "e" is a random error with zero mean, and the coefficients (a, b, c...) are computed by minimizing the sum of the squared errors (e2) over all cases. -----Original Message----- From: SR Millis [mailto:[hidden email]] Sent: 23 August 2009 18:55 To: Hector Maletta; SPSS Subject: Re: Shift, Twist, Butterfly Hector, I don't agree with you. 
Exploratory factor analysis (EFA) is not the same "thing" as principal component analysis (PCA). PCA is not a type of EFA. PCA and EFA are different statistical methods that are designed to achieve different objectives (Bentler & Kano, 1990). EFA is based on the common factor model, which postulates that each item in a battery of measured items is a linear function of one or more common factors and one unique factor. Common factors are latent (unobserved) factors. Conversely, PCA does not differentiate between common and unique variance. Hence, principal components are not latent variables, and it is not conceptually correct to equate them with common factors. The goal of PCA is data reduction, i.e., taking scores on a large set of measured items and reducing them to a smaller set of composite variables. In contrast, the goal of EFA is to identify latent constructs, i.e., to understand the structure of the correlations among the measured variables or items. Some have argued that PCA and EFA produce similar results. However, this is not always the case: differences are likely to emerge when communalities are low (e.g., .40).

Scott R Millis, PhD, ABPP (CN,CL,RP), CStat, CSci Professor & Director of Research Dept of Physical Medicine & Rehabilitation Dept of Emergency Medicine Wayne State University School of Medicine 261 Mack Blvd Detroit, MI 48201 Email: [hidden email] Tel: 313-993-8085 Fax: 313-966-7682
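Millis's description of PCA as data reduction (a large set of measured items replaced by a few composite variables) can be sketched as follows. The simulated battery, item counts, and seed are all invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated battery of 8 items driven by 2 underlying sources
# (purely illustrative data, not from any real instrument).
n, k = 500, 8
sources = rng.normal(size=(n, 2))
weights = rng.normal(size=(2, k))
items = sources @ weights + 0.3 * rng.normal(size=(n, k))

# PCA via SVD of the standardised data: keep the first 2 components
# as composite scores in place of the 8 original items.
Zs = (items - items.mean(0)) / items.std(0)
U, s, Vt = np.linalg.svd(Zs, full_matrices=False)
scores = Zs @ Vt[:2].T                       # n x 2 composite variables
explained = (s[:2] ** 2).sum() / (s ** 2).sum()
print(f"variance retained by 2 of 8 components: {explained:.2f}")
```

The two composite scores are exactly uncorrelated with each other and retain most of the items' variance, which is the data-reduction use both sides of the thread agree PCA serves.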
|
|
Done.
Hector

-----Original Message----- From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Swank, Paul R Sent: 24 August 2009 18:17 To: [hidden email] Subject: Re: Shift, Twist, Butterfly

Well, I insist to the contrary, so we will have to agree to disagree.

Dr. Paul R. Swank, Professor and Director of Research Children's Learning Institute University of Texas Health Science Center-Houston

-----Original Message----- From: Hector Maletta [mailto:[hidden email]] Sent: Monday, August 24, 2009 6:03 PM To: Swank, Paul R; [hidden email] Subject: RE: Shift, Twist, Butterfly

I insist: PCA and other types of factor analysis rest on different assumptions and may be used for different purposes, but all are essentially the same statistical model and procedure. One assumes a "communality" of 1, the others a communality < 1, i.e. some unique variance; but except for that, the rest is exactly the same. You can rotate PCA solutions just as you can rotate other factor analyses: nothing hinders that. As with all statistical procedures, you choose the variant that best suits your dataset and theory.

Moreover, the objective is not necessarily data reduction, in the sense of getting a SMALLER number of (latent) variables instead of your original set of (observed) variables. Your goal might be to replace your k observed variables with k latent factors, for instance in order to separate the information in your k (correlated) variables into k orthogonal (uncorrelated) factors.

One thing is the use to which these procedures are habitually put; another thing altogether is the nature of the procedures themselves. Since all kinds of factor analysis create constructs that do not really exist, the set of constructs you prefer will depend on the kind of analytical problem you are facing and the sort of theory you have about the problem.
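Hector's point that PCA solutions can be rotated like any other factor solution can be illustrated with a plain varimax rotation. The loading matrix below is hypothetical, and the routine is the standard Kaiser varimax algorithm in its SVD form, not anything specific to SPSS:

```python
import numpy as np

def varimax(L, n_iter=100, tol=1e-8):
    """Varimax rotation of a p x k loading matrix (standard SVD form)."""
    p, k = L.shape
    Rm = np.eye(k)
    var_old = 0.0
    for _ in range(n_iter):
        B = L @ Rm
        # Gradient-style update of the orthogonal rotation matrix.
        U, s, Vt = np.linalg.svd(L.T @ (B ** 3 - B * (B ** 2).sum(axis=0) / p))
        Rm = U @ Vt
        if s.sum() < var_old * (1.0 + tol):
            break
        var_old = s.sum()
    return L @ Rm

# Hypothetical unrotated loadings for 4 variables on 2 components.
L = np.array([[0.7,  0.5],
              [0.6,  0.6],
              [0.6, -0.5],
              [0.7, -0.6]])
L_rot = varimax(L)

# Rotation changes the loadings but leaves the communalities
# (row sums of squared loadings) untouched.
print(np.round((L ** 2).sum(1), 3))
print(np.round((L_rot ** 2).sum(1), 3))
```

The two printed communality vectors are identical, which is the sense in which rotation merely changes the frame of reference, as Hector argues, whether the loadings came from PCA or from another extraction method.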
Hector

-----Original Message----- From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Swank, Paul R Sent: 24 August 2009 13:45 To: [hidden email] Subject: Re: Shift, Twist, Butterfly

PC analysis has a fundamentally different purpose than does PA. PC determines components of the measures that account for the maximum amount of variance and are independent of one another. The purpose of such components is data reduction. There is no rotation needed, nor is there any concern for what the underlying structure of the measures is. You just want a smaller number of predictors that are independent of each other. PA, on the other hand, is used to try to fathom the underlying structure of the variables. Its purpose is different. You are searching for latent variables with PA analyses. You are trying to partition the variance of the variables into three parts: common factor variance, specific factor variance, and error variance. PCA does not do this. PA is much closer to CFA than PCA is. I'd look to the writings of Preacher and MacCallum on Tom Swift and his electric factor analysis machine for more detail.

Dr. Paul R. Swank, Professor and Director of Research Children's Learning Institute University of Texas Health Science Center-Houston

-----Original Message----- From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Hector Maletta Sent: Sunday, August 23, 2009 8:04 PM To: [hidden email] Subject: Re: Shift, Twist, Butterfly

Scott, Sorry, but I insist on my view. Factor analysis is a general technique, with many variants.
One of the variants concerns the assumptions about communality: in PCA the whole variance is subjected to analysis, and therefore the sum of the factors' contributions explains 100% of the variance of all observed variables, while other variants start with an estimate of the amount of "common" variance and separate it from the rest of the variance, usually supposed to be due to idiosyncratic characteristics of each observed variable, unrelated to the other observed variables. In this case, the technique computes factors explaining the common variance only, leaving the idiosyncratic part unexplained (attributed to peculiarities of each observed variable). This is hardly such a transcendental difference.
|
|
James, I agree with you. However, Paul seems not to agree that we agree. So be it.

Hector

From: James C. Whanger

I think you should instead "agree to agree". Hector described what PCA and FA have "in common", including similar mathematical underpinnings and how the two statistical procedures relate in a conceptual hierarchy. Paul and Scott described how PCA and FA "are different", pointing out that additional mathematical components are obtained from FA and that each supports different logical inferences based on its results, and thus serves a different purpose. Both of these arguments are true, and they do not contradict each other. |
|
Hector,
I agree that we must disagree: I'm with Paul on this issue. Scott R Millis, PhD, ABPP (CN,CL,RP), CStat, CSci Professor & Director of Research Dept of Physical Medicine & Rehabilitation Dept of Emergency Medicine Wayne State University School of Medicine 261 Mack Blvd Detroit, MI 48201 Email: [hidden email] Tel: 313-993-8085 Fax: 313-966-7682 --- On Mon, 8/24/09, Hector Maletta <[hidden email]> wrote: > From: Hector Maletta <[hidden email]> > Subject: Re: Shift, Twist, Butterfly > To: [hidden email] > Date: Monday, August 24, 2009, 7:29 PM > Done. > Hector > > -----Original Message----- > From: SPSSX(r) Discussion [mailto:[hidden email]] > On Behalf Of > Swank, Paul R > Sent: 24 August 2009 18:17 > To: [hidden email] > Subject: Re: Shift, Twist, Butterfly > > Well, I insist to the contrary so we will have to agree to > disagree. > > Dr. Paul R. Swank, > Professor and Director of Research > Children's Learning Institute > University of Texas Health Science Center-Houston > > -----Original Message----- > From: Hector Maletta [mailto:[hidden email]] > Sent: Monday, August 24, 2009 6:03 PM > To: Swank, Paul R; [hidden email] > Subject: RE: Shift, Twist, Butterfly > > I insist: PCA and other types of Factor Analysis rest on > different > assumptions and may be used for different purposes, but all > are essentially > the same statistical model and procedure. One assumes a > "commonality" of 1, > the others a commonality < 1, i.e. some unique variance, > but except for that > the rest is exactly the same. You can rotate PCA solutions > just as you can > rotate other factor analyses: nothing hinders that. As with > all statistical > procedures, you choose the variant that best suits your > dataset and theory. > > Moreover, the objective is not necessarily data reduction, > in the sense of > getting a SMALLER number of (latent) variables instead of > your original set > of (observed) variables. 
Your goal might be to replace your > k observed > variables with k latent factors, with the goal for instance > of getting the > information of your k (correlated) variables separated into > (k) orthogonal > or uncorrelated factors. > > One thing is the use to which one habitually puts these > procedures, and > another thing altogether is the nature of the procedures. > Since all kinds of factor analysis create constructs that > do not really > exist, what set of constructs you'd prefer would depend on > the kind of > analytical problem you are facing, and the sort of theory > you have about the > problem. > Hector > > -----Original Message----- > From: SPSSX(r) Discussion [mailto:[hidden email]] > On Behalf Of > Swank, Paul R > Sent: 24 August 2009 13:45 > To: [hidden email] > Subject: Re: Shift, Twist, Butterfly > > PC analysis has a fundamentally different purpose that does > PA. PC > determines components of the measures that account for the > maximum amount of > variance and are independent of one another. The purpose > for such components > is data reduction. There is no rotation needed nor is there > any concern for > what the underlying structure of the measures are. You just > want a smaller > number of predictors that are independent of each other. > PA, on the other > hand, is used to try and fathom the underlying structure of > the variables. > Its purpose is different. You are searching for latent > variables with PA > analyses. You are trying to partition the variance of the > variables into the > three parts, common factor variance, specific factor > variance, and error > variance. PCA does not do this. PA is much closer to CFA > that PCA is. I'd > look to the writings of Preacher and MacCallum on Tom Swift > and his Electric > factor analysis machine for more detail. > > Dr. Paul R. 
Swank, > Professor and Director of Research > Children's Learning Institute > University of Texas Health Science Center-Houston > > > -----Original Message----- > From: SPSSX(r) Discussion [mailto:[hidden email]] > On Behalf Of > Hector Maletta > Sent: Sunday, August 23, 2009 8:04 PM > To: [hidden email] > Subject: Re: Shift, Twist, Butterfly > > Scott, > Sorry but I insist in my view. Factor analysis is a general > technique, with > many variants. One of the variants concerns the assumptions > on commonality: > in PCA the whole variance is subjected to analysis, and > therefore the sum of > factors' contributions explains 100% of the variance of all > observed > variables, while other variants start with an estimate of > the amount of > "common" variance and separate it from the rest of > variance, usually > supposed to be due to idiosyncratic characteristics of each > observed > variable, unrelated to other observed variables. In this > case, the technique > computes factors explaining the common variance only, > leaving the > idiosyncratic part unexplained (attributed to peculiarities > of each observed > variable). This is hardly such a transcendental difference. > From the > mathematical or computational point of view there is really > no big deal of a > difference: the same mathematical procedure applies in both > cases, the only > difference being in the values at the main diagonal of the > intercorrelation > matrix: 1's in PCA, and the assumed commonalities in the > other alternatives. > > The purpose of the analysis, and the nature of the > variables, would dictate > which alternative is more adequate for each analysis, but > that hardly > impinges on the problem posed by Jimjohn: I limited myself > to his case, in > which he is using PCA. > I do not agree either that PCA is not usable for locating > latent factors. > That is precisely what factor analysis do, either under PCA > hypotheses or > otherwise. 
It estimates one or more fictitious variables > ('constructs') that > in case they existed would 'explain' (statistically > speaking) the observed > correlation of observed variables. In other words, > controlling for the > components or factors would reduce or eliminate the > correlation between > observed variables. These factors or components are not > objective things: > they are just mathematical constructs, and their values > (and even their > correlations with observed variables, or 'loadings') would > be altered by a > change in the frame of reference (i.e. by rotation). Taking > just the major > components (in PCA) and disregarding the rest achieves > precisely the kind of > data reduction you are talking about. That is precisely > what Spearman did at > the beginning of the 20th Century to identify "general > intelligence", in > fact just the first factor extracted in a PCA procedure > applied to a number > of inter-correlated cognitive-ability tests. A similar > procedure was used, > years later, by Thurstone to extract several factors > (corresponding to > various different cognitive abilities such as linguistic, > quantitative, > spatial, etc), for which Thurstone added rotation to the > repertoire of > factor analysis in order to define the factors in such a > way that each was > clearly more correlated with one different set of observed > variables (this > neat attribute was, of course, dependent on the particular > frame of > reference chosen, and was not a property of the factors > themselves; the > factors continued to be just mathematical artifacts, not > real things). Other > variants of FA (more exactly, EFA) were later introduced to > deal with other > specific (computational) problems. > Eventually, confirmatory factor analysis emerged, in which > only certain > pre-specified effects and factors were included. 
Some of > the effects are > postulated to be zero, and the procedure is, at last, not > very different > from Structural Equation Modeling. EFA always "succeeds": > every data set can > be treated to EFA, and the results cannot be challenged. > CFA, instead, may > be refuted if it fails to reproduce the observed data > structure. > However different the intents and results, all these > procedures are just > variants of the same basic procedure, factor analysis, > which is in turn just > a corollary or application of the general linear model > where ordinary least > square regression, ANOVA and other statistical workhorses > also belong. Their > common feature is that all variables are supposed to be > linearly related > with an error term, so that for any set of variables (Y, X, > Z, ...) it is > postulated that Y=a + bX + cZ...+e, where "e" is a random > error with zero > mean, and the coefficients (a, b, c...) are computed by > minimizing the sum > of the squared errors (e2) over all cases. > > > > -----Original Message----- > From: SR Millis [mailto:[hidden email]] > Sent: 23 August 2009 18:55 > To: Hector Maletta; SPSS > Subject: Re: Shift, Twist, Butterfly > > Hector, > > I don't agree with you. Exploratory factor analysis > (EPA) is not the same > "thing" as principal component analysis (PCA). PCA is > not a type of EFA. > PCA and EFA are different statistical methods that are > designed to achieve > different objectives (Bentler & Kano, 1990). > > EFA is based on the common factor model that postulates > that each item in a > battery of measured items is a linear function of one or > more common factors > and one unique factor. Common factors are latent > (unobserved) factors. > Conversely, PCA doesn't differentiate between common and > unique variance. > Hence, principal components are not latent variables---and > it isn't > conceptually correct to equate them with common > factors. 
The goal of PCA is > data reduction, i.e., taking scores on a large set of > measured items and > reducing the scores on a smaller set of composite > variables. In contrast, > the goal of EFA is to identify latent constructs, i.e., > understanding the > structure of the correlations among the measured variables > or items. > > Some have argued that PCA and EFA produce similar > results. However, this is > not always the case. Differences emerge are likely > when communalities are > low (e.g., .40). > > > Scott R Millis, PhD, ABPP (CN,CL,RP), CStat, CSci > Professor & Director of Research > Dept of Physical Medicine & Rehabilitation > Dept of Emergency Medicine > Wayne State University School of Medicine > 261 Mack Blvd > Detroit, MI 48201 > Email: [hidden email] > Tel: 313-993-8085 > Fax: 313-966-7682 > > > --- On Sun, 8/23/09, Hector Maletta <[hidden email]> > wrote: > > > From: Hector Maletta <[hidden email]> > > Subject: Re: Shift, Twist, Butterfly > > To: [hidden email] > > Date: Sunday, August 23, 2009, 7:27 PM > > Jimjohn, > > I do not know who told you such nonsense, but IMHO > it's all > > wrong. PCA > > identifies underlying factors that "explain" the > > correlations between your > > observed variables. In other words, if you were to > measure > > those underlying > > (actually non measurable) factors or components, the > > correlation between > > your observed variables, controlling for the > underlying > > factors, would be > > zero or near zero. There is no direct link between > each > > factor and one > > specific variable (such as the one you sketch) > mandating > > the multiplication > > of variable 1 times component 1 + variable 2 * > component 2, > > etc. Each > > component is linked to all variables (though it may > be > > correlated more > > strongly with sone of them), and each variable is > linked to > > all underlying > > components. > > If all your observed variables are closely correlated > (e.g. 
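Hector's remark that regression, ANOVA and factor analysis are all applications of the general linear model can be sketched in a few lines. This is a toy illustration on made-up data (nothing from the thread): Y is built as a + bX + cZ + e, and ordinary least squares recovers the coefficients by minimizing the sum of squared errors.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Hypothetical predictors X, Z and a response built as Y = a + bX + cZ + e
X = rng.normal(size=n)
Z = rng.normal(size=n)
e = rng.normal(scale=0.5, size=n)          # random error with zero mean
Y = 2.0 + 1.5 * X - 0.8 * Z + e

# Ordinary least squares: choose (a, b, c) minimizing the sum of squared errors
A = np.column_stack([np.ones(n), X, Z])    # design matrix with intercept column
coef, *_ = np.linalg.lstsq(A, Y, rcond=None)
a, b, c = coef
```

With this many cases the estimates land close to the true values a = 2.0, b = 1.5, c = -0.8.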
|
In reply to this post by Hector Maletta
Thanks again Hector! I have a problem with SPSS PCA.
From what I understand, I can relate the variables to the factors by using the component matrix, but I don't get anything close to the actual variables when I solve this equation using the component matrix and the factors SPSS saves.

I'm thinking this is probably because SPSS standardizes the factors and I should be looking at the unstandardized factors. Can anyone confirm if this is so? If so, is there any way I can get SPSS to output the unstandardized factors instead?

Thanks in advance!
|
Azam,
The factor scores are standardized (z-scores with zero mean and unit standard deviation), and they are expressed as a function of STANDARDIZED observed variables, i.e. the observed variables are converted into z-scores and then multiplied by the component score coefficients in order to compute component scores.

The same is valid in the reverse case: the (standardized) factor scores may be used to estimate the predicted (STANDARDIZED) value of an observed variable. To convert these standardized values back into the original observed variables, multiply by the standard deviation of the original observed variable and then add its mean.

There is no such thing as an unstandardized factor score or component score, because such (unobserved) variables do not have an objective unit of measurement. Their measurement is just the z-score, i.e. their distance from their own mean, measured in units of their own standard deviation.

Hector
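A minimal numpy sketch of what Hector describes, on simulated data rather than SPSS output (the assumption is that this mirrors what SPSS FACTOR does for a PCA extraction; the data are invented): component scores are built from z-scored variables and come out standardized themselves, and retaining all components reproduces the standardized variables exactly, which can then be rescaled with each variable's own mean and SD.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300

# Three correlated hypothetical variables sharing a common source f
f = rng.normal(size=n)
x = np.column_stack([f + 0.3 * rng.normal(size=n) for _ in range(3)])

mean, sd = x.mean(axis=0), x.std(axis=0)
z = (x - mean) / sd                       # PCA here works on z-scores
R = z.T @ z / n                           # correlation matrix

eigval, V = np.linalg.eigh(R)
order = np.argsort(eigval)[::-1]          # largest eigenvalue first
eigval, V = eigval[order], V[:, order]

loadings = V * np.sqrt(eigval)            # component loading matrix
scores = z @ V / np.sqrt(eigval)          # standardized component scores

# Retaining all components reproduces the *standardized* variables,
# which must then be rescaled to recover the original metric:
z_hat = scores @ loadings.T
x_hat = z_hat * sd + mean
```

The scores have zero mean and unit SD, exactly as Hector says, and `x_hat` matches `x` only after multiplying by the SD and adding the mean.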
|
thanks again hector, really appreciate your help! can i bother you
with one more question? basically i have thirteen variables which each represent the weekly change in interest rates for different maturities (so one variable represents the weekly change in the one-month rate, another variable represents the weekly change in the 2-year rate, etc.). im trying to use PCA to reduce the number of correlated variables.

usually, what is expected when PCA is run on these interest rates (or so im told) is that the first three factors represent 99% of the variability in these changes. the first factor usually represents a parallel shift. so if i plot each of my original interest rate variables, it looks upward sloping because the longer the maturity, the higher the interest rate is. the changes in the shape of this plot over time will be explained by these three factors. Factor 1 should be a parallel shift, Factor 2 should be a twist where the shorter maturity rates are higher and the longer maturity rates are lower.

basically now that ive run PCA on the change in rates for each variable, i want to somehow convert my factors back into the original variables and compare them with my original variables to see if these parallel shifts and twists actually happen. that way i would know how much the parallel shift is too, so i could explain that on average, the rates shift in parallel by x% and this explains 75% of the variability. if i just look at my component matrix, and i plot the components vs the variables, i do see that component 1 looks parallel and component 2 looks like a twist. the problem is i dont know how to convert the data back so that i can compare it with my actual original interest rate variables and see that factor 1 results in parallel changes to the variables, etc.
at first i was thinking i could use the formula Variable1 = Component1 * Factor1 + Component2 * Factor2 + ..., and i could just isolate Component1 * Factor1, look at the mean of that distribution, and compare it with the mean of my original variable, and that would tell me how much the parallel shift is. but since my results are always standardized, the mean of that distribution would always be 0.

hope this makes sense, i know this is kind of all over the place. any ideas? thanks so much!
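What jimjohn describes can be checked on simulated data. The sketch below invents a toy term structure (13 maturities, hypothetical level and slope factors; not real rates): the first eigenvector of the covariance matrix of the weekly changes comes out with the same sign across all maturities (a parallel shift), and the second changes sign from the short end to the long end (a twist).

```python
import numpy as np

rng = np.random.default_rng(2)
n_weeks, n_mat = 520, 13
maturity = np.linspace(1 / 12, 30, n_mat)          # hypothetical maturities, in years

# Simulated weekly rate changes driven by a level (shift) and a slope (twist) factor
level = rng.normal(scale=0.10, size=(n_weeks, 1))  # moves all rates together
slope = rng.normal(scale=0.05, size=(n_weeks, 1))  # short end up, long end down
twist_shape = np.linspace(1, -1, n_mat)
dr = level + slope * twist_shape + rng.normal(scale=0.01, size=(n_weeks, n_mat))

# PCA via eigendecomposition of the covariance matrix of the changes
C = np.cov(dr, rowvar=False)
eigval, V = np.linalg.eigh(C)
order = np.argsort(eigval)[::-1]
eigval, V = eigval[order], V[:, order]

pc1, pc2 = V[:, 0], V[:, 1]
if pc1.sum() < 0:                                  # eigenvector signs are arbitrary
    pc1 = -pc1
if pc2[0] < 0:
    pc2 = -pc2

explained = eigval[:2].sum() / eigval.sum()        # variance share of first two PCs
```

Plotting `pc1` and `pc2` against `maturity` gives the flat "shift" and sloping "twist" shapes the thread is about, and `explained` shows the first two components carrying nearly all the variability, as expected.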
|
It makes sense. You need only convert the standardized results of your equation into unstandardized ones. The standardized results are z = (x - mean)/SD, and therefore the original variables are x = SD * z + mean. Use the mean and SD of the original variables to effect this conversion.

Hector
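Hector's back-conversion is two lines of arithmetic. A tiny sketch with made-up rates (each variable must use its own mean and SD):

```python
import numpy as np

x = np.array([4.1, 4.6, 5.0, 5.3, 5.5])   # hypothetical observed rates (%)
mean, sd = x.mean(), x.std()

z = (x - mean) / sd                        # standardize: z = (x - mean)/SD
x_back = sd * z + mean                     # Hector's back-conversion: x = SD*z + mean
```

The round trip recovers the original values exactly, which is why predicted standardized variables from the factor-score equation can be mapped back onto the original rates.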
