SPSSX Discussion

Re: Principal Component Analysis in Different Measurement Units

Hans Chen

Re: Principal Component Analysis in Different Measurement Units

My friend performed a Principal Component Analysis using SPSS 13.0 and got different results from two versions of the same data. The first data set is the raw data: 85 (proportion of urban population), $80354 (GDP per capita); the second data set is the adjusted data: 0.85 (proportion of urban population), $80.354 thousand (GDP per capita). The second data set differs from the first only in measurement units. With the first data set, my friend got four principal components; with the second, he got five. Could you advise what might be the reason for the difference?

Thanks for your help.

Han Chen

Maguin, Eugene

Re: Principal Component Analysis in Different Measurement Units

It's hard to see how 85 is a proportion of urban population; it looks more like a percentage. I wonder if the problem might be the large ratio in scale between pairs of variables. For example, 80354 is roughly 1000 times larger than 85, so the ratio of their variances will be roughly 1E6 (10^6). Perhaps significant digits are being lost in the eigenvalue solution. I suggest that your friend rescale the data so that the variables are on approximately the same scale: 85 and 80.354, or .85 and .80354.
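To make the scale argument concrete, here is a small Python/NumPy sketch with made-up values (only the first row echoes the figures in the question; every other number is hypothetical). It shows that, in the raw units, the variance of GDP in dollars dwarfs the variance of the urban share, and that multiplying each column by a positive constant leaves the correlation matrix unchanged.

```python
import numpy as np

# Hypothetical toy data in the raw units: urban share as 0-100, GDP per capita in dollars.
urban_pct = np.array([85.0, 62.0, 74.0, 91.0, 55.0, 68.0])
gdp_usd = np.array([80354.0, 41200.0, 52800.0, 97100.0, 30500.0, 47900.0])
raw = np.column_stack([urban_pct, gdp_usd])

# The same data in the "adjusted" units: proportion 0-1 and GDP in thousands of dollars.
adjusted = raw * np.array([0.01, 0.001])

def variance_ratio(data):
    """Ratio of the largest to the smallest column variance."""
    v = data.var(axis=0, ddof=1)
    return v.max() / v.min()

print(f"variance ratio, raw units:      {variance_ratio(raw):.3g}")
print(f"variance ratio, adjusted units: {variance_ratio(adjusted):.3g}")

# Rescaling each column by a positive constant leaves the correlations unchanged.
r_raw = np.corrcoef(raw, rowvar=False)
r_adj = np.corrcoef(adjusted, rowvar=False)
print("correlations identical:", np.allclose(r_raw, r_adj))
```

Because variance scales with the square of the unit factor, the GDP variance shrinks by 10^6 and the urban-share variance by 10^4, so the ratio drops by a factor of 100 while the correlations stay put.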

Gene Maguin

From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Han Chen
Sent: Tuesday, May 20, 2014 1:44 PM
To: [hidden email]
Subject: Re: Principal Component Analysis in Different Measurement Units

Thanks for your help.

Han Chen

Ian Martin-2

Re: Principal Component Analysis in Different Measurement Units

Maybe he used the variance-covariance (VCV) matrix as input instead of the correlation matrix?
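A minimal sketch of the difference this would make, assuming simulated data (the five variables, the scale factors and the seed are all hypothetical) and a naive fixed eigenvalue-greater-than-1 cutoff applied to whichever matrix is analyzed. This is plain NumPy, not SPSS's FACTOR procedure. Correlation-matrix eigenvalues are the same under both unit systems; covariance-matrix eigenvalues, and therefore the number of components passing a fixed cutoff, generally are not.

```python
import numpy as np

rng = np.random.default_rng(2014)

# Hypothetical data: 200 cases, 5 indicators driven by 2 latent factors plus noise.
n, p = 200, 5
latent = rng.normal(size=(n, 2))
loadings = rng.normal(size=(2, p))
base = latent @ loadings + rng.normal(scale=0.8, size=(n, p))

# The same variables expressed in two different (illustrative) unit systems.
units_a = base * np.array([10.0, 1000.0, 1.0, 50.0, 0.5])
units_b = base * np.array([0.1, 1.0, 1.0, 5.0, 0.5])

def eigenvalues(data, use_correlation):
    """Eigenvalues, largest first, of the correlation or covariance matrix."""
    if use_correlation:
        m = np.corrcoef(data, rowvar=False)
    else:
        m = np.cov(data, rowvar=False)
    return np.linalg.eigvalsh(m)[::-1]

def kaiser_count(eigs):
    """Number of eigenvalues greater than a fixed cutoff of 1.0."""
    return int(np.sum(eigs > 1.0))

for label, use_corr in [("correlation", True), ("covariance", False)]:
    ka = kaiser_count(eigenvalues(units_a, use_corr))
    kb = kaiser_count(eigenvalues(units_b, use_corr))
    print(f"{label:>11}: units A -> {ka} components, units B -> {kb} components")
```

With covariances, the cutoff of 1.0 is compared against eigenvalues expressed in squared measurement units, so the component count is an artifact of the units chosen.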

Ian

In reply to Maguin, Eugene (May 20, 2014, 2:08 PM)

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Bruce Weaver

Re: Principal Component Analysis in Different Measurement Units

We'd be able to tell if the OP had posted the two sets of syntax. ;-)

--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING:
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).

Rich Ulrich

Re: Principal Component Analysis in Different Measurement Units

In reply to this post by Hans Chen

If he was analyzing correlations, I should hope that the algorithm for
computing r is robust enough that there would be no excuse for the
correlations to come out different. So, the likely explanation is some
unnoted change in data. Look at all the means and r's.

If he was analyzing variances (the covariance matrix) - which must be
considered a mistake here - there is no reason for the loadings from the
two analyses to have much resemblance. And the "number of factors extracted"
is not meaningfully related to a cutoff of 1.0, if that was used. When
PCA is employed on correlations, the 1.0 represents the amount of
variance to be explained for each variable, and "less than one" says
that the factor is worth less than a single variable and thus might be
ignored for subsequent rotation... assuming you are working from a
theory about important latent factors.
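The reasoning behind the 1.0 cutoff can be checked numerically: the eigenvalues of a p x p correlation matrix sum to p (its trace, since every diagonal entry is 1), so an average component carries 1.0, one variable's worth of variance. A minimal NumPy sketch for two variables correlated at r = 0.6, whose eigenvalues are 1 + r and 1 - r:

```python
import numpy as np

def kaiser_summary(r):
    """For a correlation matrix r, return its eigenvalues (largest first),
    their total, and how many exceed 1.0, i.e. explain more than one variable's worth."""
    eigs = np.linalg.eigvalsh(r)[::-1]
    return eigs, eigs.sum(), int(np.sum(eigs > 1.0))

# Two variables correlated at 0.6.
r = np.array([[1.0, 0.6],
              [0.6, 1.0]])
eigs, total, kept = kaiser_summary(r)
print("eigenvalues:", eigs)
print("total variance:", total)
print("components with eigenvalue > 1:", kept)
```

The second component (eigenvalue 0.4) explains less than a single variable would, which is why the rule drops it.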

--
Rich Ulrich

David Marso

Re: Principal Component Analysis in Different Measurement Units

Changing the scale of a variable by a (positive) multiplicative constant will NOT change the CORRELATIONS!
Since a PCA of the correlation matrix (the FACTOR default) is solely a function of the correlations, I would attribute your finding to pilot error
(either in calculation or reporting)!
Please compare the Means, SDs, Ns and R matrix (recalculate means and SDs based on the different 'scaling').
If they don't match then you know what happened.
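The comparison suggested here can be sketched in Python/NumPy. The function name `compare_rescaled` and the toy data are hypothetical; the sketch assumes the two files hold the same cases in the same order and that the unit change is a positive multiplicative factor per column (0.01 for the urban share and 0.001 for GDP, matching the conversion described in the question).

```python
import numpy as np

def compare_rescaled(x1, x2, factors, tol=1e-9):
    """Compare N, means, SDs and the correlation matrix of two data sets,
    assuming x2 should equal x1 with column j multiplied by factors[j] > 0."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    f = np.asarray(factors, dtype=float)
    if x1.shape != x2.shape:
        return {"N": False, "means": False, "SDs": False, "R": False}
    # Rescale x1's means and SDs by the unit factors and compare with x2's.
    means_ok = np.allclose(x1.mean(axis=0) * f, x2.mean(axis=0), rtol=tol)
    sds_ok = np.allclose(x1.std(axis=0, ddof=1) * f, x2.std(axis=0, ddof=1), rtol=tol)
    # With positive factors the correlations should agree up to rounding.
    r_ok = np.allclose(np.corrcoef(x1, rowvar=False),
                       np.corrcoef(x2, rowvar=False), rtol=tol, atol=tol)
    return {"N": True, "means": bool(means_ok), "SDs": bool(sds_ok), "R": bool(r_ok)}

# Hypothetical example: the first row echoes the question's figures, the rest are made up.
raw = np.array([[85.0, 80354.0],
                [62.0, 41200.0],
                [74.0, 52800.0],
                [91.0, 97100.0]])
factors = [0.01, 0.001]
adjusted = raw * factors
print(compare_rescaled(raw, adjusted, factors))

# Simulate a data-entry slip in the converted file (52.8 typed as 5.28).
bad = adjusted.copy()
bad[2, 1] = 5.28
print(compare_rescaled(raw, bad, factors))
```

A False entry for means, SDs or R points at the data rather than at the PCA itself.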

Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"