SPSSX Discussion

PCA for dichotomous data

news

Jul 08, 2011; 2:12pm

PCA for dichotomous data

Hello,

Eurobarometer 66.1 provides data on social values, which I would like to
use, together with other influences, to explain churchgoing.
The social-values item battery consists of 12 questions with yes/no
answers; each respondent can choose up to three of the items.

What I need is a procedure like PCA for dichotomous data, but I don't
have access to CATPCA. I calculated proximities with the Dice measure
to correct for the high probability that neither of two given items will
be selected (Dice ignores such joint non-selections). I used PROXIMITIES
to calculate the similarity of the variables:

PROXIMITIES v327 to v338
/VIEW=VARIABLE
/MEASURE= dice (1,0) .
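For reference, the Dice measure requested by /MEASURE=DICE(1,0) compares two 0/1 variables through the counts a (both 1), b and c (exactly one of them is 1). Its value is 2a/(2a+b+c), and joint 0/0 cases never enter. The short Python sketch below is illustrative only: Python is not part of the original thread, and the function names are hypothetical. It is meant to mirror what PROXIMITIES computes with /VIEW=VARIABLE, but it has not been checked against SPSS output.

```python
from itertools import product


def dice(x, y):
    """Dice similarity of two 0/1 vectors: 2a / (2a + b + c).

    a = cases where both are 1; b, c = cases where exactly one is 1.
    Cases where both are 0 are ignored entirely.
    """
    a = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 1)
    b = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 0)
    c = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 1)
    denom = 2 * a + b + c
    # Undefined when neither variable is ever 1.
    return 2 * a / denom if denom else float("nan")


def dice_matrix(columns):
    """Variable-by-variable Dice matrix; columns is a list of 0/1 lists."""
    k = len(columns)
    m = [[0.0] * k for _ in range(k)]
    for i, j in product(range(k), repeat=2):
        m[i][j] = dice(columns[i], columns[j])
    return m


x = [1, 1, 0, 0, 1, 0]
y = [1, 0, 0, 0, 1, 1]
print(dice(x, y))                        # 2*2 / (2*2 + 1 + 1) = 2/3
print(dice(x + [0] * 10, y + [0] * 10))  # unchanged: joint zeros are ignored
```

Appending respondents who chose neither item leaves the value unchanged, which is the property the original post relies on.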

Once PROXIMITIES has produced the matrix, can I input it into FACTOR as a
correlation matrix? And how do I move from this variable-based analysis
back to a case-based analysis?
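Conceptually, principal components extraction from a matrix treated as correlations is an eigendecomposition. Loadings are eigenvectors scaled by the square root of their eigenvalues. Standardized case scores are the z-scored data multiplied by eigenvector/sqrt(eigenvalue).

The NumPy sketch below illustrates that arithmetic only. It is not SPSS's implementation, and the function names are hypothetical. The case scores are well defined only when the matrix is the correlation matrix of the same data, so using them after a Dice matrix would be a heuristic at best.

```python
import numpy as np


def principal_components(matrix, n_components):
    """Eigendecompose a symmetric matrix the way PCA treats a correlation matrix.

    Returns (eigenvalues, loadings) for the n_components largest eigenvalues;
    each loading column is an eigenvector scaled by sqrt(eigenvalue).
    """
    m = np.asarray(matrix, dtype=float)
    if not np.allclose(m, m.T):
        raise ValueError("matrix must be symmetric")
    eigvals, eigvecs = np.linalg.eigh(m)              # ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]  # keep the largest
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    if np.any(eigvals <= 0):
        raise ValueError("retained eigenvalues must be positive")
    return eigvals, eigvecs * np.sqrt(eigvals)


def component_scores(data, eigvals, loadings):
    """Standardized case scores; valid only when the matrix was corr(data)."""
    x = np.asarray(data, dtype=float)
    z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)
    # weight = eigenvector / sqrt(eigenvalue) = loading / eigenvalue
    return z @ (loadings / eigvals)


# Usage with simulated data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
data = rng.normal(size=(50, 3))
eigvals, loadings = principal_components(np.corrcoef(data, rowvar=False), 2)
scores = component_scores(data, eigvals, loadings)
print(scores.std(axis=0, ddof=1))  # unit variance per component
```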

Is there a better alternative for getting a variable structure from
dichotomous variables?

TIA,
F. Thomas

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Hector Maletta

Jul 08, 2011; 3:15pm

Re: PCA for dichotomous data

If the data are dichotomous, conventional PCA (the SPSS FACTOR procedure) is
exactly the same as categorical PCA (the SPSS CATPCA procedure), since any
rescaling of a two-category variable is linear and optimal scaling has
nothing to change. The latter is only required when the original data are
multi-category variables (either nominal or ordinal): it iteratively generates
optimal scaling values for the categories and then performs a Principal
Component Analysis of the resulting (interval-level) variables.
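The equivalence rests on two facts. FACTOR, run on raw data, analyses Pearson correlations, and for two 0/1 variables the Pearson correlation equals the phi coefficient of their 2x2 table. The small Python check below is an illustration added here, not part of the thread.

Unlike the Dice measure, phi does count the joint 0/0 cell d. Adding respondents who chose neither item therefore changes it.

```python
import math


def counts_2x2(x, y):
    """Cell counts for two 0/1 vectors: a=(1,1), b=(1,0), c=(0,1), d=(0,0)."""
    pairs = list(zip(x, y))
    return (pairs.count((1, 1)), pairs.count((1, 0)),
            pairs.count((0, 1)), pairs.count((0, 0)))


def phi(x, y):
    """Phi coefficient: (ad - bc) / sqrt((a+b)(c+d)(a+c)(b+d))."""
    a, b, c, d = counts_2x2(x, y)
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))


def pearson(x, y):
    """Ordinary Pearson correlation, as computed from raw data."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


x = [1, 1, 0, 0, 1, 0]
y = [1, 0, 0, 0, 1, 1]
print(phi(x, y), pearson(x, y))          # both 1/3
print(phi(x + [0] * 10, y + [0] * 10))   # 23/39: joint zeros matter here
```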

I wonder whether the fact that each respondent may choose up to three of the
12 dichotomous items has any influence on this. It depends, I surmise, on
how you want to treat those data.
(a) You may treat each CHOICE as a case. In this fashion, there would be
one case (one row in the dataset) for each combination of respondent and
choice, with up to three (but not necessarily three) choices per respondent.
In that case my advice above applies, although the analysis may require a
two-level model to distinguish between intra- and inter-respondent effects.
(b) You may treat each RESPONDENT as a case. In this option, different
respondents may have different COMBINATIONS of responses. The maximum number
of combinations (all combinations of three out of 12) is probably much higher
than the number of respondents in your sample, so only a small proportion of
all combinations will show up. These observed combinations may be treated as
a single NOMINAL multi-category variable with many values. For this kind of
approach CATPCA would be appropriate, but I caution that the number of
distinct combinations observed must not be large: with N respondents and M
observed combinations you have N-M-1 degrees of freedom, which may be a
fairly low number, thus invalidating the results in statistical terms. If
only a few response patterns are observed and the number of respondents is
comparatively very large, you'd be OK, but beware of too many choices and
too few subjects.
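Both options can be sketched in a few lines of Python. This is an illustrative assumption: the toy respondents below are hypothetical, and only the item names v327 to v338 come from the original post.

- **Option (a)** reshapes each respondent's 12 yes/no answers into one row per choice.
- **Option (b)** counts the distinct response patterns.

For scale, choosing exactly three of 12 items gives C(12,3) = 220 possible patterns, or 299 if respondents may choose anywhere from zero to three items. The N-M-1 figure is Hector's rule of thumb as stated above; the code only computes it.

```python
from collections import Counter
from math import comb

ITEMS = [f"v{n}" for n in range(327, 339)]  # the 12 items v327 to v338

# Option (b): how many response patterns are possible?
exactly_three = comb(len(ITEMS), 3)                       # 220
up_to_three = sum(comb(len(ITEMS), k) for k in range(4))  # 1 + 12 + 66 + 220 = 299


def choices_to_long(respondents):
    """Option (a): one (respondent_id, item) row per selected item.

    respondents: list of (respondent_id, [0/1 answers in ITEMS order]).
    Respondents who selected nothing produce no rows at all.
    """
    return [(rid, item)
            for rid, answers in respondents
            for item, value in zip(ITEMS, answers)
            if value == 1]


def pattern_summary(respondents):
    """Option (b): N respondents, M distinct observed patterns, and N - M - 1."""
    patterns = Counter(tuple(answers) for _, answers in respondents)
    n, m = len(respondents), len(patterns)
    return n, m, n - m - 1


# Hypothetical toy data: three respondents.
sample = [
    (1, [1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]),  # chose v327 and v330
    (2, [0] * 12),                               # chose nothing
    (3, [1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]),  # same pattern as respondent 1
]
print(choices_to_long(sample))  # [(1, 'v327'), (1, 'v330'), (3, 'v327'), (3, 'v330')]
print(pattern_summary(sample))  # (3, 2, 0)
```

Respondent 2 disappears entirely in the option (a) layout. This is worth keeping in mind given the "logical missing" issue raised later in the thread.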

Hector

news

Jul 08, 2011; 4:53pm

Re: PCA for dichotomous data

Thank you, Hector, for your answer.

The fact that respondents have only three choices for 12 items produces
lots of cases with zero as the entry. But the respondents did not say no;
they just said nothing. It is a sort of logical missing value. Nevertheless,
as a first attempt I tried a PCA on the dichotomous variables, but the large
number of zero entries violated the procedure's conditions, of course (and
made the procedure fail).

In any case, CATPCA is not a solution as I don't have access to this
module.

As my ultimate intention is to explain churchgoing (regular churchgoers
vs. all the rest), I am currently running a discriminant analysis on the
original items, without factor-analysing them first.

Ryan

Jul 11, 2011; 1:48am

Re: PCA for dichotomous data

You could use information obtained from a Rasch model to assess
dimensionality. It should be possible to fit a Rasch model to binary
variables (e.g., yes/no survey items) via the GENLINMIXED procedure in
SPSS 19. This procedure does not require responses to all items from
every person, but missing data are assumed to be missing at random (MAR).
Whether the MAR assumption is tenable for your data is unclear to me.
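To make the Rasch suggestion concrete: in the Rasch model, the log-odds that person p answers yes to item i is theta_p - b_i, a person trait minus an item difficulty. Written as a mixed logistic model, theta_p becomes a random intercept per person and b_i a fixed effect per item. That formulation needs the data in long format, one row per person and item.

The Python sketch below only illustrates the formula and the reshaping. It assumes this layout is what you would hand to GENLINMIXED; no SPSS syntax is shown, and the item names are hypothetical.

```python
import math


def rasch_probability(theta, b):
    """Rasch model: P(yes) = exp(theta - b) / (1 + exp(theta - b)).

    theta: the person's latent trait; b: the item's difficulty (location).
    Computed in a form that avoids overflow for large |theta - b|.
    """
    x = theta - b
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def to_person_item_rows(respondents, item_names):
    """Long layout: one (person, item, response) row per answered item.

    Answers coded None are treated as missing and dropped; a mixed model
    fitted to these rows then relies on the missing-at-random assumption.
    """
    rows = []
    for person, answers in respondents:
        for item, value in zip(item_names, answers):
            if value is not None:
                rows.append((person, item, value))
    return rows


print(rasch_probability(0.0, 0.0))  # 0.5: trait level equals item difficulty
print(to_person_item_rows([(1, [1, 0, None])], ["i1", "i2", "i3"]))
# [(1, 'i1', 1), (1, 'i2', 0)]
```

Whether the poster's "said nothing" zeros should be coded 0 or None is exactly the MAR question raised above.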

Ryan
