factor analysis for binary variables

E. Bernardo
Hi everyone,

  The 70 variables that will be subjected to factor analysis are binary (coded 0 and 1).  May I know your thoughts on which extraction and rotation methods to use?

  Thank you in advance for your help.

  Eins


=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Re: factor analysis for binary variables

Art Kendall
What do you want to accomplish by using factor analysis?

Are the variables designed ahead of time to belong to summative scales?
How many scales?
Is this the first time this set of items has been used?

Are you only interested in the common variance (e.g., with items
designed to measure a construct)? Or is there a reason you want to
include unique item variance in the variance accounted for?

It is traditional to go ahead with principal axis factoring on
dichotomous items when you are making scales.  However, now that CATPCA
is available, I suggest that you use that.  As an exercise, you might
want to run your analysis with CATPCA and with FACTOR and see how the
results compare on which items load cleanly together.

You would need to check the documentation to choose specifications for
CATPCA.

If you are trying to build scales, I suggest 1) using PAF, since you
would only be interested in the common variance, 2) using varimax rotation
so that the final scales have divergent validity, 3) using at least the
scree plot and preferably parallel analysis (see the archives of this list
for that) to ballpark the number of factors to retain, and 4) keeping only
items that have an absolute loading of .4 or so on one factor and not more
than .25 or so on any other factor.
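
The retention rule in point 4 can be sketched as follows. This is a minimal Python illustration, not SPSS syntax. The function name `retain_items`, the example loadings, and the exact cutoffs (.40 for the primary loading, .25 for cross-loadings) are assumptions chosen to mirror the "or so" values above.

```python
def retain_items(loadings, primary_min=0.40, cross_max=0.25):
    """Return {item: factor_index} for items that load cleanly on one factor.

    `loadings` maps an item name to its rotated loadings, one per factor.
    The cutoffs are illustrative defaults, not fixed rules.
    """
    kept = {}
    for item, row in loadings.items():
        abs_row = [abs(x) for x in row]
        # Index of the factor with the largest absolute loading.
        primary = max(range(len(abs_row)), key=abs_row.__getitem__)
        others = [v for i, v in enumerate(abs_row) if i != primary]
        if abs_row[primary] >= primary_min and all(v <= cross_max for v in others):
            kept[item] = primary
    return kept


# Hypothetical rotated loadings for four items on two factors.
example = {
    "q1": [0.62, 0.10],    # clean on factor 0
    "q2": [0.45, 0.38],    # cross-loads, dropped
    "q3": [-0.05, -0.71],  # clean on factor 1 (sign is ignored)
    "q4": [0.30, 0.12],    # primary loading too weak, dropped
}
print(retain_items(example))  # {'q1': 0, 'q3': 1}
```

The absolute value is used throughout, so a reverse-keyed item with a strong negative loading is kept on its factor.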

Art Kendall
Social Research Consultants


Re: factor analysis for binary variables

Kooij, A.J. van der
>It is traditional to go ahead with principal axis factoring on
>dichotomous items when you are making scales.  However, now that CATPCA
>is available, I suggest that you use that.  As an exercise, you might
>want to run your analysis with CATPCA and with FACTOR and see how the
>results compare on which items load cleanly together.

If all variables are binary, the result of CATPCA is equal to that of standard PCA. For a binary variable the transformation is always linear, so no matter which optimal scaling level you choose, the result is equal to the result with the numeric scaling level. (Think of the transformation plot: there are 2 categories, and you can only fit a straight line between 2 points.)
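
A small numeric check of this argument, written as a Python sketch rather than CATPCA syntax: any quantification of two categories is a linear function of the 0/1 codes, so the correlations (and hence the PCA solution) are unchanged apart from a possible sign flip. The data and the helper names below are made up for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def requantify(codes, value_for_0, value_for_1):
    """Assign arbitrary category quantifications to a 0/1 variable."""
    return [value_for_1 if c == 1 else value_for_0 for c in codes]


# Made-up binary items.
x = [0, 1, 1, 0, 1, 0, 1, 1]
y = [0, 1, 0, 0, 1, 1, 1, 1]

r_raw = pearson(x, y)
# Any two distinct values give a linear recoding: new = v0 + (v1 - v0) * old.
r_rescaled = pearson(requantify(x, -2.3, 0.7), y)
# Reversing the category order only flips the sign of the correlation.
r_reversed = pearson(requantify(x, 5.0, 1.0), y)

print(round(r_raw, 6), round(r_rescaled, 6), round(r_reversed, 6))
```

The first two printed values agree and the third is their negative, which is the "straight line between 2 points" picture in numbers.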

Anita van der Kooij
Data Theory Group
Leiden University

________________________________

From: SPSSX(r) Discussion on behalf of Art Kendall
Sent: Sun 12/10/2008 16:56
To: [hidden email]
Subject: Re: factor analysis for binary variables




Re: factor analysis for binary variables

Art Kendall
Is there a way in CATPCA to separate the unique and common variance,
analogous to putting reliability estimates on the diagonal in principal
axis factoring?

Art


Re: factor analysis for binary variables

E. Bernardo
I received an off-list message advising me to obtain the tetrachoric correlation matrix of the 70 variables and then use that matrix for the EFA.  What are your comments?

  Eins




Re: factor analysis for binary variables

Marta Garcia-Granero
Eins Bernardo wrote:
> I received an off-list message advising that I have to obtain the tetrachoric correlation matrix of the 70 variables, then I will use that matrix to do EFA.  What are your comments?
>

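It could be a good idea, assuming your binary variables are suitable for
the tetrachoric correlation coefficient (that is, each one can reasonably
be regarded as a dichotomized continuous, normally distributed variable).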

Check this page (near the end):

http://www2.jura.uni-hamburg.de/instkrim/kriminologie/Mitarbeiter/Enzmann/Software/Enzmann_Software.html

There's a program to compute a tetrachoric correlation matrix that can
be imported into an SPSS dataset and used with FACTOR.

HTH,
Marta García-Granero

--

For miscellaneous statistical stuff, visit:
http://gjyp.nl/marta/


Re: factor analysis for binary variables

David Hitchin
In reply to this post by E. Bernardo
Factor analysis is employed for several different purposes. Some of them
are "statistical" in the sense of taking into account distributions and
sampling theory, while others are merely concerned with getting some
insight into data.

I remember a project many years ago in which a student investigated
superstitions held (or not) by other students. I think that there were
about 80 questions, each of them scored zero or one.

Before trying any analysis I thought that so few students would be
superstitious that most of the responses would be zero (leaving little
variance to analyse) or that some would just reply in a haphazard way.

I submitted the whole data set to SPSS for a principal factor analysis
followed by varimax rotation, and the results were extremely clear and
easy to interpret. There was one large general factor (whether people
were superstitious on the whole, or not). The remaining factors all made
sense: one on which the "religious" questions loaded high, one for
"actions" (e.g. crossing one's fingers for luck), one for
"pseudoscientific" beliefs (e.g. that trailing a chain from your car to
earth static prevents travel sickness), and so on.

The moral of this story is that often one need not be too statistically
purist; crude methods can often find most of what is useful in data.
After many years of performing factor analyses by many different methods
I have come to the conclusion that results seem to fall into one of two
groups: those analyses which make no sense at all, however much you
tweak them, and those where the results are clear whatever method is
used, although a little tweaking helps to sharpen the focus.

David Hitchin


Doing EFA using correlation matrix as input

E. Bernardo
In reply to this post by Art Kendall
How do I do EFA using the correlation matrix as input?  I think this is beyond point-and-click in SPSS!  Please help.

  Eins



Re: Doing EFA using correlation matrix as input

Marta Garcia-Granero
Eins Bernardo wrote:
> How to do EFA using the correlation matrix as input?  I think this is beyond the point-click in SPSS!  Please help.
>

FACTOR
  /MATRIX = IN(cor=*)
  /ANALYSIS ......
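
`IN(cor=*)` reads the correlation matrix from the active dataset, so the matrix has to be there in SPSS matrix format first. One way to get it there, if the correlations were computed outside SPSS, is to generate MATRIX DATA syntax and run it before FACTOR. The Python helper below is a sketch under assumptions: the function name, the variable names, and the numbers are invented, and the emitted layout (ROWTYPE_ with N and CORR rows, lower triangle including the diagonal) should be checked against the MATRIX DATA documentation for your SPSS version.

```python
def matrix_data_syntax(names, corr, n):
    """Build SPSS MATRIX DATA syntax text from a correlation matrix.

    names : variable names, in matrix order
    corr  : matrix as nested lists; only the lower triangle is read
    n     : number of cases the correlations are based on
    """
    lines = [
        "MATRIX DATA VARIABLES=ROWTYPE_ " + " ".join(names) + ".",
        "BEGIN DATA",
        "N " + " ".join(str(n) for _ in names),
    ]
    for i in range(len(names)):
        # Row i of the lower triangle, diagonal included.
        row = " ".join(f"{corr[i][j]:.4f}" for j in range(i + 1))
        lines.append("CORR " + row)
    lines.append("END DATA.")
    return "\n".join(lines)


# Hypothetical 3 x 3 tetrachoric matrix based on 200 cases.
example_corr = [
    [1.0, 0.45, 0.30],
    [0.45, 1.0, 0.52],
    [0.30, 0.52, 1.0],
]
print(matrix_data_syntax(["item1", "item2", "item3"], example_corr, 200))
```

The printed text can be pasted into a syntax window and run ahead of the FACTOR command above.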


Marta

--
For miscellaneous statistical stuff, visit:
http://gjyp.nl/marta/
