factor analysis using tetrachoric correlation matrix

7 messages

factor analysis using tetrachoric correlation matrix

Verónica
I have managed to compute the tetrachoric correlation matrix.
I then run a factor analysis on this matrix to get the scores of the 17
investment funds, but the FACTOR command I use gives me an error.

I´m sending you the syntax to see if you can help me.

BEGIN PROGRAM R.
library (foreign)
library (polycor)
library (psych)
library (corpcor)
library (spss_18.0)
fondos20=read.spss("C:/Users/Veronica/Desktop/domingo/SPSS FACTORIAL HEDONICOS/Factor20.sav",to.data.frame=TRUE)
fondos20
fondos20$D2=factor(fondos20[,1])
fondos20$D3=factor(fondos20[,2])
fondos20$D4=factor(fondos20[,3])
fondos20$D5=factor(fondos20[,4])
fondos20$D6=factor(fondos20[,5])
fondos20$D7=factor(fondos20[,6])
fondos20$D9=factor(fondos20[,7])
fondos20$D10=factor(fondos20[,8])
fondos20$D11=factor(fondos20[,9])
fondos20$D12=factor(fondos20[,10])
fondos20$D13=factor(fondos20[,11])
fondos20$D14=factor(fondos20[,12])
fondos20$D16=factor(fondos20[,13])
fondos20$D17=factor(fondos20[,14])
fondos20$D18=factor(fondos20[,15])
fondos20$D19=factor(fondos20[,16])
fondos20$D22=factor(fondos20[,17])
fondos20$D23=factor(fondos20[,18])
fondos20$D25=factor(fondos20[,19])
fondos20$D26=factor(fondos20[,20])
fondos20$D27=factor(fondos20[,21])
fondos20$D28=factor(fondos20[,22])
fondos20$D29=factor(fondos20[,23])
fondos20$D32=factor(fondos20[,24])
matriztetra=hetcor(fondos20,ML=FALSE,std.err=FALSE,bins=4,pd=TRUE)
print (matriztetra)
END PROGRAM.
BEGIN PROGRAM R.
fondos20=spssdata.GetDataFromSPSS(variables="D2 D3 D4 D5 D6 D7 D9 D10 D11 D12 D13 D14 D16 D17 D18 D19 D22 D23 D25 D26 D27 D28 D29 D32")
print (fondos20)
END PROGRAM.
BEGIN PROGRAM R.
fondos20=spssdata.GetDataFromSPSS(variables="D2 D3 D4 D5 D6 D7 D9 D10 D11 D12 D13 D14 D16 D17 D18 D19 D22 D23 D25 D26 D27 D28 D29 D32",
factorMode="levels")
END PROGRAM.
BEGIN PROGRAM R.
tetraco=hetcor(fondos20, ML=FALSE, std.err=FALSE, bins=4, pd=TRUE)
print (tetraco)
END PROGRAM.
BEGIN PROGRAM R.
str (matriztetra)
END PROGRAM.
BEGIN PROGRAM R.
spsspivottable.Display(matriztetra$types,title="correlations types")
END PROGRAM.
BEGIN PROGRAM R.
spsspivottable.Display(matriztetra$correlations,title="correlations matrix")
END PROGRAM.
BEGIN PROGRAM R.
str (matriztetra)
print (matriztetra$correlations)
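# Variable names used for both the rows and the columns of the 24 x 24 matrix.
nombres <- c("D2","D3","D4","D5","D6","D7","D9","D10","D11","D12","D13","D14",
             "D16","D17","D18","D19","D22","D23","D25","D26","D27","D28","D29","D32")
# dimnames must be a list of two vectors (row names, column names).
tetracoricafinal <- matrix(matriztetra$correlations, nrow=24, ncol=24, byrow=TRUE,
                           dimnames=list(nombres, nombres))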
print (tetracoricafinal)
END PROGRAM.
FACTOR
/MATRIX= IN (COR=*)
/MISSING LISTWISE
/ANALYSIS D2 D3 D4 D5 D6 D7 D9 D10 D11 D12 D13 D14 D16 D17 D18 D19 D22 D23
D25 D26 D27 D28 D29 D32
/PRINT
/PRINT INITIAL SIG DET KMO INV REPR AIC EXTRACTION FSCORE
/FORMAT SORT
/PLOT EIGEN
/CRITERIA MINEIGEN(1) ITERATE(25)
/EXTRACTION PC
/ROTATION NOROTATE
/SAVE REG (ALL).

Thank you very much for your time and attention.

Regards,

Verónica

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Re: factor analysis using tetrachoric correlation matrix

Marta Garcia-Granero
Verónica wrote:
> I have managed to compute the tetrachoric correlation matrix.
> I then run a factor analysis on this matrix to get the scores of the 17
> investment funds, but the FACTOR command I use gives me an error.
>

Buenas tardes(*) Verónica:

You should give us more details about the error FACTOR is giving:
does it fail while reading the matrix, or is the problem related to the
computation of KMO? I have not used R to compute a tetrachoric
correlation matrix; I used a program written by Dirk Enzmann (see the link
below), and I had no problem making FACTOR accept the resulting matrix.

http://www2.jura.uni-hamburg.de/instkrim/kriminologie/Mitarbeiter/Enzmann/Software/tetcorr.txt

Best regards from Pamplona (snowing),
Marta GG

(*) With Martin's permission to use Spanish ;)
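
For FACTOR to accept a matrix computed outside SPSS, the active dataset has to be in matrix format, and one way to get there is a MATRIX DATA command. Below is a minimal sketch in Python that only builds such a command as text. The helper name matrix_data_syntax, the three variable names, the correlations and the N of 100 are all made up for illustration; nothing here comes from Verónica's data.

```python
def matrix_data_syntax(names, lower, n):
    """Build SPSS MATRIX DATA syntax from a lower-triangular correlation matrix.

    names: variable names; lower: rows of the lower triangle, diagonal
    included; n: number of cases behind each correlation.
    """
    lines = ["MATRIX DATA VARIABLES=ROWTYPE_ " + " ".join(names) + ".",
             "BEGIN DATA"]
    # One N value per variable, then one CORR row per variable.
    lines.append("N " + " ".join(str(n) for _ in names))
    for row in lower:
        lines.append("CORR " + " ".join(f"{r:.3f}" for r in row))
    lines.append("END DATA.")
    return "\n".join(lines)


# Hypothetical three-variable example; the correlations are invented.
syntax = matrix_data_syntax(
    ["D2", "D3", "D4"],
    [[1.0], [0.41, 1.0], [0.28, 0.35, 1.0]],
    n=100,
)
print(syntax)
```

The printed text can be pasted ahead of FACTOR /MATRIX=IN(COR=*). Since factor scores are computed from case-level data, a /SAVE subcommand is unlikely to work when only a matrix is supplied.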


--
For miscellaneous SPSS related statistical stuff, visit:
http://gjyp.nl/marta/


Re: factor analysis using tetrachoric correlation matrix

Martin Holt
Hi Marta,

(*) I like a little Spanish... it makes me feel that the group is connected
across the planet... but it's probably best to imagine you're talking to a
child. I came 8th in the UK in Latin 'O' Level, but it doesn't seem to help. (I
think the Romans who invaded Spain spoke their own version.)

But when we get "Ver ónica" and "I´m"...

Hope the snow's manageable/enjoyable,

Martin



----- Original Message -----
From: "Marta García-Granero" <[hidden email]>
To: <[hidden email]>
Sent: Wednesday, March 10, 2010 3:16 PM
Subject: Re: factor analysis using tetrachoric correlation matrix



Re: factor analysis using tetrachoric correlation matrix

ron
In reply to this post by Marta Garcia-Granero
I am trying to perform a factor analysis on a matrix of tetrachoric correlations that I computed manually (per Sheskin 2007). There are 17 variables, and each correlation is based on 212 subjects. I have managed to format the file in a way that SPSS version 19 will recognize, but the factor analysis stops (extraction is not performed). The matrix is not positive definite. Because the SPSS syntax requires standard deviations, I computed and included them for each variable, even though the variables are dichotomously scored. The syntax I used is copied below. Can anyone suggest how it needs to be changed so that the factor analysis will run?


MATRIX DATA VARIABLES=V1 TO V17 /CONTENTS=MEAN SD N_SCALAR CORR.
BEGIN DATA
0.66509 0.28302 0.50000 0.68868 0.12736 0.45755 0.58491 0.15566 0.01887 0.02358 0.34434 0.04717 0.91038 0.08962 0.14623 0.92453 0.45755
0.47307 0.45153 0.50118 0.46413 0.33416 0.49937 0.49390 0.36339 0.13638 0.15211 0.47628 0.21250 0.28632 0.28632 0.35417 0.26478 0.49937
212
1.000
0.983 1.000
0.538 -0.078 1.000
-0.110 0.402 -0.717 1.000
0.037 0.870 -0.745 -0.071 1.000
0.642 -0.907 -0.141 -0.405 -0.767 1.000
0.998 -0.815 -0.965 0.936 0.753 0.999 1.000
0.900 0.545 0.989 -0.965 -0.833 -1.000 0.537 1.000
-0.559 -0.843 0.976 -0.802 0.951 0.647 -0.708 -0.867 1.000
0.981 0.153 -0.768 0.344 -0.899 -0.878 -0.835 -0.039 0.955 1.000
0.948 0.310 -0.987 0.933 -0.981 0.986 0.373 0.224 -0.390 0.967 1.000
0.193 0.659 0.967 -0.818 -0.125 0.142 0.054 0.978 -0.371 -0.381 0.841 1.000
0.992 -0.560 -0.560 0.252 0.751 0.560 -0.560 0.560 -0.560 -0.992 -0.992 -0.560 1.000
-0.997 0.186 0.858 0.210 -0.267 -0.192 0.209 0.910 -0.598 -0.988 0.989 0.385 0.349 1.000
0.965 -0.225 -0.950 0.458 -0.598 0.256 -0.680 -0.598 -0.598 0.828 -0.598 0.914 0.045 -0.598 1.000
-0.422 0.796 0.998 0.215 -0.506 0.348 -0.615 0.828 -0.598 -0.792 0.995 0.971 -0.448 -0.611 -0.598 1.000
0.971 0.390 -0.941 0.247 -0.967 -0.849 -0.997 -0.535 0.315 0.999 0.373 -0.902 0.813 0.508 0.395 -0.291 1.000
END DATA.
LIST.

FACTOR MATRIX=IN(COR=*)
/PRINT= EXTRACTION ROTATION CORRELATION REPR
/PLOT EIGEN ROTATION
 /CRITERIA MINEIGEN(1) ITERATE(25)
/EXTRACTION=PC
 /ROTATION=PROMAX
 /METHOD=CORRELATION.

FACTOR MATRIX=IN(COR=*)
/PRINT= EXTRACTION ROTATION
/PLOT EIGEN ROTATION
 /CRITERIA MINEIGEN(.45) ITERATE(25)
/EXTRACTION=PC
 /ROTATION=PROMAX
 /METHOD=CORRELATION.

FACTOR MATRIX=IN(COR=*)
/PRINT= ALL
/PLOT EIGEN ROTATION
 /CRITERIA MINEIGEN(1) ITERATE(25)
/EXTRACTION=PAF
 /ROTATION=PROMAX
 /METHOD=CORRELATION.
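
On the standard deviations: for a variable scored 0/1, the sample SD follows directly from the mean p and the sample size n as sqrt(p(1-p) n/(n-1)). The sketch below, in Python, checks the first three means and SDs from the MATRIX DATA block above against that formula. The function name dichotomous_sd is an illustrative choice, and the n-1 denominator is an assumption about how the SDs were computed.

```python
import math


def dichotomous_sd(p, n):
    """Sample SD (n - 1 denominator) of a 0/1 variable with mean p."""
    return math.sqrt(p * (1 - p) * n / (n - 1))


n = 212
# (mean, reported SD) for V1, V2 and V3, copied from the MATRIX DATA block.
reported = [(0.66509, 0.47307), (0.28302, 0.45153), (0.50000, 0.50118)]

for mean, sd in reported:
    print(f"mean={mean:.5f} reported SD={sd:.5f} formula SD={dichotomous_sd(mean, n):.5f}")
```

The values agree to within rounding, so the SD row carries no information beyond the means; whatever stops the extraction, it is not this part of the input.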

Re: factor analysis using tetrachoric correlation matrix

David Marso
Administrator
Does there *NOT* seem to be something terribly *WRONG* with this matrix?
For now, simply inspect V6 and V7 (corr = .999).  Now examine the vector of correlations of V6 and V7 with the other variables.  Hmmmmm.  How did you say these came about?  Are you certain of the accuracy of the calculations?  What does a matrix of Pearson coefficients look like?  What is the simple 2x2 table of V6 and V7?
It must be something like:

        0      1
0      87      1
1       0    124

But then only one of the means is correct.  If the correlation is .999, the two variables should be nearly identical.
This is only a small glimpse of what appears to be a very screwy matrix.
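
The point about the means can be checked with a few lines of arithmetic. This Python sketch takes the guessed 2x2 table above at face value (it is a guess, not real data) and compares its marginal proportions with the means reported for V6 (0.45755) and V7 (0.58491). Which variable sits on the rows and which on the columns is an assumption.

```python
# Guessed 2x2 table from the post: table[i][j] is the count of cases
# with row variable = i and column variable = j.
table = [[87, 1],
         [0, 124]]

n = sum(sum(row) for row in table)

# Proportion of 1s on each margin, i.e. the mean of each 0/1 variable.
row_mean = sum(table[1]) / n
col_mean = (table[0][1] + table[1][1]) / n

print(f"n = {n}")
print(f"row margin mean    = {row_mean:.5f}  (reported V7 mean: 0.58491)")
print(f"column margin mean = {col_mean:.5f}  (reported V6 mean: 0.45755)")
```

With so few off-diagonal cases the two margins are forced to be almost equal, so a table like this cannot produce means as far apart as 0.458 and 0.585.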

Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"

Re: factor analysis using tetrachoric correlation matrix

Rich Ulrich
In reply to this post by ron
As David says, there is absolutely, certainly, something wrong with your
correlations.  I know that 'tetrachoric'  is distinct from a product-moment
correlation, but your deviations from anything reasonable are absurdly large. 

The first three lines of your triangular-form of the r's --
> 1.000
> 0.983 1.000
> 0.538 -0.078 1.000

When variable 1 is almost identical to variable 2, there will not
be a distinct difference in their correlations with variable 3, such as
the 0.54 versus -0.08 above.  Your matrix is full of near-one
correlations, along with highly diverse profiles of correlations
with other variables.  Such results are not possible for Pearson
(product-moment) r's.
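
The three correlations quoted above can be tested directly. For a 3x3 correlation matrix to be positive semidefinite, r23 must lie within r12*r13 plus or minus sqrt((1 - r12^2)(1 - r13^2)), which is the same as requiring a non-negative determinant. The Python sketch below applies that standard condition to r12 = 0.983, r13 = 0.538 and r23 = -0.078; the function names are illustrative.

```python
import math


def r23_bounds(r12, r13):
    """Range of r23 that keeps the 3x3 correlation matrix positive semidefinite."""
    half_width = math.sqrt((1 - r12 ** 2) * (1 - r13 ** 2))
    return r12 * r13 - half_width, r12 * r13 + half_width


def det3(r12, r13, r23):
    """Determinant of the 3x3 correlation matrix with these off-diagonals."""
    return 1 + 2 * r12 * r13 * r23 - r12 ** 2 - r13 ** 2 - r23 ** 2


r12, r13, r23 = 0.983, 0.538, -0.078
low, high = r23_bounds(r12, r13)
det = det3(r12, r13, r23)

print(f"admissible r23: [{low:.3f}, {high:.3f}]; observed r23 = {r23}")
print(f"determinant = {det:.3f}")
```

The observed r23 falls well below the admissible range and the determinant is negative, so these three values alone rule out a Pearson matrix. Tetrachoric r's estimated pair by pair are not bound by this constraint, which is how such a matrix can arise at all.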

So if you do your computations right, it might work.  Or it might not.

Tetrachoric r's are affected by the skewness of margins of the
dichotomies, so they *can*  be a lot more inconsistent than
what you would get with Pearsons.  - When it happens to
any sizable degree, your factor analysis will fail for that same
reason.  If you actually do have some tetrachoric r's over
0.90, I would expect the matrix to be ill-conditioned, that is,
not positive-definite. 

I've seen people use tetrachorics because they liked the idea
of larger correlations in the matrix than the 0.3's and 0.4's that they
saw with Pearson r's.   But they were only seeing boosts to 0.5
or so.  Whether that is useful or not, is not at all obvious.  But with
values that are intermediate (not near perfect), there is less chance
that the matrix will become ill-conditioned.

There *might* be a gain in using tetrachoric r's instead of Pearson's
if that can undo the biases of skewness when your items vary a lot in
their marginal counts.  I've never seen it happen, but the theory
seems conceivable.

But do keep in mind that if you hope to get decent factors, the usual
recommendation of "10 cases per variable" should be doubled or
tripled, because of the increased error of tetrachoric r's.

--
Rich Ulrich


> Date: Mon, 5 Sep 2011 06:13:09 -0700

> From: [hidden email]
> Subject: Re: factor analysis using tetrachoric correlation matrix
> To: [hidden email]

Re: factor analysis using tetrachoric correlation matrix

Jon K Peck
My first thought was that this pattern was at least mathematically possible, however implausible, but attempting a Cholesky factorization shows that this matrix is not positive semidefinite, so it can't be a correlation matrix.
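
A check of this kind can be sketched in plain Python. The code below is not the routine Jon ran, just a textbook Cholesky attempt on the lower triangle ron posted; it stops at the first pivot that is not positive. The function name first_bad_pivot and the tolerance are illustrative choices.

```python
import math

# Lower triangle of the 17 x 17 matrix, copied from ron's MATRIX DATA block.
LOWER = """
1.000
0.983 1.000
0.538 -0.078 1.000
-0.110 0.402 -0.717 1.000
0.037 0.870 -0.745 -0.071 1.000
0.642 -0.907 -0.141 -0.405 -0.767 1.000
0.998 -0.815 -0.965 0.936 0.753 0.999 1.000
0.900 0.545 0.989 -0.965 -0.833 -1.000 0.537 1.000
-0.559 -0.843 0.976 -0.802 0.951 0.647 -0.708 -0.867 1.000
0.981 0.153 -0.768 0.344 -0.899 -0.878 -0.835 -0.039 0.955 1.000
0.948 0.310 -0.987 0.933 -0.981 0.986 0.373 0.224 -0.390 0.967 1.000
0.193 0.659 0.967 -0.818 -0.125 0.142 0.054 0.978 -0.371 -0.381 0.841 1.000
0.992 -0.560 -0.560 0.252 0.751 0.560 -0.560 0.560 -0.560 -0.992 -0.992 -0.560 1.000
-0.997 0.186 0.858 0.210 -0.267 -0.192 0.209 0.910 -0.598 -0.988 0.989 0.385 0.349 1.000
0.965 -0.225 -0.950 0.458 -0.598 0.256 -0.680 -0.598 -0.598 0.828 -0.598 0.914 0.045 -0.598 1.000
-0.422 0.796 0.998 0.215 -0.506 0.348 -0.615 0.828 -0.598 -0.792 0.995 0.971 -0.448 -0.611 -0.598 1.000
0.971 0.390 -0.941 0.247 -0.967 -0.849 -0.997 -0.535 0.315 0.999 0.373 -0.902 0.813 0.508 0.395 -0.291 1.000
"""

rows = [[float(x) for x in line.split()] for line in LOWER.strip().splitlines()]


def first_bad_pivot(lower, tol=1e-12):
    """Attempt a Cholesky factorization of a symmetric matrix given as its
    lower triangle. Return (index, pivot) for the first pivot that is not
    positive, or None if the factorization succeeds."""
    n = len(lower)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = lower[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= tol:
                    return i, s
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return None


bad = first_bad_pivot(rows)
print("failed pivot (0-based index, value):", bad)
```

The factorization already breaks down at the third variable with a clearly negative pivot, so the failure is not a borderline rounding issue.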

Regards,

Jon Peck (no "h")
Senior Software Engineer, IBM
[hidden email]
new phone: 720-342-5621




From:        Rich Ulrich <[hidden email]>
To:        [hidden email]
Date:        09/05/2011 06:57 PM
Subject:        Re: [SPSSX-L] factor analysis using tetrachoric correlation matrix
Sent by:        "SPSSX(r) Discussion" <[hidden email]>



