Re: Principal Component Analysis in Different Measurement Units


Hans Chen
My friend ran a Principal Component Analysis in SPSS 13.0 on two versions of the same data and got different results. The first data set holds the raw values, e.g. 85 (percentage of urban population) and $80354 (GDP per capita); the second holds the rescaled values, e.g. 0.85 (proportion of urban population) and $80.354 thousand (GDP per capita). The second data set differs from the first only in measurement units. From the first data set my friend got four principal components; from the second he got five. Could you advise what might be the reason for the difference?
Thanks for your help.

Han Chen



Maguin, Eugene

Hard to see that 85 is the proportion of urban population. I wonder if the problem might be the large ratios between pairs of variables. For example, 80354 is roughly 1000 times larger than 85, so the ratio of variances will be roughly 1E6 (1000 squared). Perhaps significant digits are being lost in the eigenvalue solution. I suggest that your friend rescale the data so that the variables are on approximately the same scale: 85 and 80.354, or .85 and .80354.
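The arithmetic behind that variance ratio can be checked with a short sketch. The data values below are invented for illustration (only 85 and 80354 appear in the original post), and the variable names are hypothetical. Since Var(cX) = c^2 Var(X), expressing GDP in thousands of dollars should shrink the variance ratio by a factor of about 1E6.

```python
from statistics import variance

# Invented illustration data; only 85 and 80354 come from the original post.
urban_pct = [85.0, 62.0, 47.0, 91.0, 70.0, 55.0]
gdp_usd = [80354.0, 41200.0, 23800.0, 95100.0, 52700.0, 30900.0]

# Ratio of sample variances in the raw units.
ratio_raw = variance(gdp_usd) / variance(urban_pct)

# Express GDP in thousands of dollars, one of the rescalings suggested above.
gdp_thousands = [g / 1000 for g in gdp_usd]
ratio_rescaled = variance(gdp_thousands) / variance(urban_pct)

print(f"variance ratio, raw units:     {ratio_raw:.3g}")
print(f"variance ratio, GDP in $1000s: {ratio_rescaled:.3g}")
print(f"shrink factor:                 {ratio_raw / ratio_rescaled:.3g}")
```

Whether a ratio of this size actually costs SPSS any precision in the eigenvalue solution is not something this sketch can show; it only confirms the size of the ratio.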

 

Gene Maguin

 

In reply to the post by Han Chen, sent Tuesday, May 20, 2014 1:44 PM

 

 


Ian Martin-2
Maybe he analyzed the variance-covariance (VCV) matrix instead of the correlation matrix?

Ian

In reply to the post by Maguin, Eugene, sent May 20, 2014, at 2:08 PM

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Bruce Weaver
Administrator
We'd be able to tell if the OP had posted the two sets of syntax.  ;-)


--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING: 
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).

Rich Ulrich
In reply to this post by Hans Chen
If he was analyzing correlations, I should hope that the algorithm for
computing r is robust enough that there would be no excuse for the
correlations to come out different. So the likely explanation is some
unnoted change in the data. Look at all the means and r's.

If he was analyzing covariances - which must be considered a mistake
here - there is no reason for the loadings to have much resemblance
across the two analyses. And the "number of factors extracted" is not
meaningfully related to a cutoff of 1.0, if that was used. When PCA is
employed on correlations, the 1.0 represents the amount of variance
contributed by each standardized variable, and an eigenvalue "less than
one" says that the factor is worth less than a single variable and thus
might be ignored for subsequent rotation... assuming you are working
from a theory about important latent factors.
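A minimal sketch of why a fixed cutoff of 1.0 behaves differently on covariances than on correlations. It uses only two variables so that the eigenvalues have a closed form. The data values are invented (only 85 and 80354 come from the original post), the helper names are hypothetical, and this is not SPSS's algorithm, just the same arithmetic on a toy case.

```python
import math
from statistics import mean


def cov(x, y):
    """Sample covariance with an n - 1 denominator."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


def eigenvalues_sym_2x2(a, b, d):
    """Eigenvalues of the symmetric matrix [[a, b], [b, d]], largest first."""
    mid = (a + d) / 2
    half = math.hypot((a - d) / 2, b)
    return [mid + half, mid - half]


def pca_eigenvalues(x, y, use_correlation):
    """Eigenvalues of the 2x2 correlation or covariance matrix of x and y."""
    sxx, syy, sxy = cov(x, x), cov(y, y), cov(x, y)
    if use_correlation:
        r = sxy / math.sqrt(sxx * syy)
        return eigenvalues_sym_2x2(1.0, r, 1.0)
    return eigenvalues_sym_2x2(sxx, sxy, syy)


def count_above_one(eigenvalues):
    """Number of components kept by an 'eigenvalue > 1.0' rule."""
    return sum(1 for lam in eigenvalues if lam > 1.0)


# Invented illustration data in the raw units (percent, dollars).
urban_pct = [85.0, 62.0, 47.0, 91.0, 70.0, 55.0]
gdp_usd = [80354.0, 41200.0, 23800.0, 95100.0, 52700.0, 30900.0]

# The same data in the rescaled units (proportion, thousands of dollars).
urban_prop = [u / 100 for u in urban_pct]
gdp_thousands = [g / 1000 for g in gdp_usd]

results = {}
for label, x, y in [("raw", urban_pct, gdp_usd),
                    ("rescaled", urban_prop, gdp_thousands)]:
    cov_eigs = pca_eigenvalues(x, y, use_correlation=False)
    corr_eigs = pca_eigenvalues(x, y, use_correlation=True)
    results[label] = (cov_eigs, corr_eigs)
    print(f"{label:9s} covariance:  kept {count_above_one(cov_eigs)}")
    print(f"{label:9s} correlation: kept {count_above_one(corr_eigs)}")
```

With these made-up numbers the covariance analysis keeps two components in the raw units and one after rescaling, while the correlation analysis keeps one either way. The eigenvalues of a correlation matrix always sum to the number of variables, which is what gives the 1.0 cutoff its "worth one variable" meaning there.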

--
Rich Ulrich







David Marso
Administrator
Changing the scale of a variable by a multiplicative constant will NOT change the CORRELATIONS!
Since a PCA run on the correlation matrix (the SPSS default) is solely a function of the correlations, I would attribute your finding to pilot error
(either in calculation or reporting)!
Please compare the Means, SDs, Ns and R matrix (recalculate the means and SDs to allow for the different 'scaling').
If they don't match then you know what happened.
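The comparison described above can be sketched as follows. The data are invented, the function names are hypothetical, and the keying error in the rescaled GDP column is simulated on purpose to show what a mismatch looks like; in practice the two columns would come from the two SPSS data files.

```python
import math
from statistics import mean, stdev


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def matches_after_rescaling(raw, rescaled, factor, rel_tol=1e-9):
    """Check N, mean, and SD of `rescaled` against `raw` times `factor`."""
    return {
        "same_n": len(raw) == len(rescaled),
        "mean_matches": math.isclose(mean(raw) * factor, mean(rescaled),
                                     rel_tol=rel_tol),
        "sd_matches": math.isclose(stdev(raw) * factor, stdev(rescaled),
                                   rel_tol=rel_tol),
    }


# Invented illustration data in the raw units (percent, dollars).
urban_pct = [85.0, 62.0, 47.0, 91.0, 70.0, 55.0]
gdp_usd = [80354.0, 41200.0, 23800.0, 95100.0, 52700.0, 30900.0]

# Rescaled copies (proportion, thousands of dollars).
urban_prop = [u / 100 for u in urban_pct]
gdp_thousands = [g / 1000 for g in gdp_usd]
gdp_thousands[1] = 14.2  # simulated keying error: 41.2 entered as 14.2

urban_check = matches_after_rescaling(urban_pct, urban_prop, 0.01)
gdp_check = matches_after_rescaling(gdp_usd, gdp_thousands, 0.001)
r_raw = pearson_r(urban_pct, gdp_usd)
r_rescaled = pearson_r(urban_prop, gdp_thousands)

print("urban:", urban_check)
print("gdp:  ", gdp_check)
print(f"r raw = {r_raw:.4f}, r rescaled = {r_rescaled:.4f}")
```

Here the urban variable passes every check, while the altered GDP column fails on the mean and SD and the correlation shifts, which is the kind of unnoted change in the data that would make the two analyses disagree.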

Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"