
Re: Reference category for dummies in factor analysis

Posted by Kooij, A.J. van der on Aug 17, 2006; 6:50pm
URL: http://spssx-discussion.165.s1.nabble.com/Reference-category-for-dummies-in-factor-analysis-tp1070339p1070356.html

Hector,
Some remarks:
>...categorical factor analysis by alternating least squares (ALSCAL in SPSS jargon) ...
ALSCAL is MDS (multidimensional scaling). The preferred procedure for MDS is PROXSCAL, which was added to SPSS some versions ago.
 
>... such as optimal scaling or multiple correspondence, but initially
>tried PCA because of its mathematical properties, which come in handy for
>the intended use of the scale in the project. Notice that in this particular
>application we use factor analysis only as an intermediate step, i.e. as a
>way of constructing a scale that is a linear combination of variables taking
>their covariances into account. We are not interested in the factors
>themselves.
With optimal scaling you obtain transformed (that is, optimally quantified) variables that are continuous. All mathematical properties of PCA also apply to CATPCA, but with respect to the transformed variables. The scale you obtain using CATPCA is continuous.
Some years ago a UN paper was published that used CATPCA to create a scale for variables much the same as those you describe. If you are interested I can try to find the reference.
 
Regards,
Anita van der Kooij
Data Theory Group
Leiden University.



________________________________

From: SPSSX(r) Discussion on behalf of Hector Maletta
Sent: Thu 17/08/2006 19:04
To: [hidden email]
Subject: Re: Reference category for dummies in factor analysis



Dan,
Yours is a sound question. Latent classes unfortunately would not do in this
case because we need a continuous scale, not a set of discrete classes, even
if they are ordered. We have considered using categorical factor analysis by
alternating least squares (ALSCAL in SPSS jargon) or other nonparametric
procedures such as optimal scaling or multiple correspondence, but initially
tried PCA because of its mathematical properties, which come in handy for
the intended use of the scale in the project. Notice that in this particular
application we use factor analysis only as an intermediate step, i.e. as a
way of constructing a scale that is a linear combination of variables taking
their covariances into account. We are not interested in the factors
themselves.
Now about the use of FA with dummy variables: there are conflicting opinions
in the literature about this. Half the library is in favour and the other
half is against. Dummies can indeed be considered as interval scales, since
they have only one interval between their two values, and that interval is
implicitly used as their unit of measurement. The main objection is about
normality of their sampling distribution. Binary random variables have a
binomial distribution, which approximates the normal as n (sample size)
grows larger. Another frequent objection is about normality of residuals in
regression: obviously, if you predict a binary with a binary prediction,
your predicted value would be either 1 or 0, and the residual would be either 0
or 1, so you'll have either all residuals to one side of your predictions,
or all residuals to the other side, and you'll never have residuals normally
distributed around your prediction. Take your pick in the library.
However, I do not wish for this thread to become a discussion of our use of
factor analysis in this way, but only of the particular question of the
impact of choosing one or another reference category. The other discussion
is most interesting, but we can address it later.

Hector

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On behalf of Dan Zetu
Sent: Thursday, August 17, 2006 1:36 PM
To: [hidden email]
Subject: Re: Reference category for dummies in factor analysis

Hector:

What I am having a little difficulty comprehending is how a classical factor
analysis can be conducted on a set of dummy (binary) variables? I thought
that's what latent class analysis was for. Perhaps I am missing something in
your post?

Dan


>From: Hector Maletta <[hidden email]>
>Reply-To: Hector Maletta <[hidden email]>
>To: [hidden email]
>Subject: Reference category for dummies in factor analysis
>Date: Thu, 17 Aug 2006 12:52:55 -0300
>
>Dear colleagues,
>
>I am re-posting (slightly re-phrased for added clarity) a question I sent
>the list about a week ago without eliciting any response as yet. I hope
>some factor analysis experts may be able to help.
>
>In a research project on which we work together, a colleague of mine
>constructed a scale based on factor scores obtained through classical
>factor analysis (principal components) of a number of categorical census
>variables, all transformed into dummies. The variables concerned the
>standard of living of households and included quality of dwelling and
>basic services such as sanitation, water supply, electricity and the like.
>(The scale was not simply the score for the first factor, but the average
>score of several factors, weighted by their respective contribution to
>explaining the overall variance of observed variables, but this is, I
>surmise, beside the point.)
>
>Now, he found out that the choice of reference or "omitted" category for
>defining the dummies has an influence on results. He first ran the analysis
>using the first category of all categorical variables as the reference
>category, and then repeated the analysis using the last category as the
>reference or omitted category, whatever they might be. He found that the
>resulting scale varied not only in absolute value but also in the shape of
>its distribution.
>
>I can understand that the absolute value of the factor scores may change,
>and even the ranking of the categories of the various variables (in terms
>of their average scores) may also be different, since after all the list of
>dummies used has varied and the categories are tallied each time against a
>different reference category. But the shape of the scale distribution
>should not change, I guess, especially not in a drastic manner. In this
>case the shape of the scale frequency distribution did change. Both
>distributions were roughly normal, with a kind of "hump" on one side, one
>of them on the left and the other on the right, probably due to the change
>in reference categories, but also with changes in the range of the scale
>and other details.
>
>Also, he found that the two scales were not perfectly correlated and,
>moreover, that their correlation was negative. That the correlation was
>negative may be understandable: the first category in such census variables
>is usually a "good" one (for instance, a home with walls made of brick or
>concrete) and the last one is frequently a "bad" one (earthen floor) or a
>residual heterogeneous one including bad options ("other" kinds of roof).
>But since the two scales are just different combinations of the same
>categorical variables based on the same statistical treatment of their
>given covariance matrix, one should expect a closer, indeed a perfect,
>correlation, even if a negative one is possible for the reasons stated
>above. Changing the reference category should be like changing the unit of
>measurement or the position of the zero point (like passing from Celsius
>to Fahrenheit), a decision not affecting the correlation coefficient with
>other variables. In this case, instead, the two scales had r = -0.54,
>implying they shared only 29% of their variance, even in the extreme case
>when ALL the possible factors (as many as variables) were extracted and all
>their scores averaged into the scale (so that the entire variance, common
>or specific, of the whole set of variables was taken into account).
>
>I should add that the dataset was a large sample of census data, and all
>the results were statistically significant.
>
>Any ideas why choosing different reference categories for dummy conversion
>could have such impact on results? I would greatly appreciate your thoughts
>in this regard.
>
>Hector
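
The effect described in the question can be reproduced on a toy case. What follows is only a minimal sketch under several assumptions that are not in the original post: it uses Python with NumPy rather than SPSS, it runs PCA on the correlation matrix, and it takes the scale to be the first-component score alone (not the weighted average of several factors used in the project). The function names make_dummies, first_pc_scores, spans_same_space and reference_category_correlation are hypothetical, chosen for illustration.

```python
import numpy as np


def make_dummies(codes, omit):
    """Dummy-code integer category columns, dropping one reference category.

    codes: (n, k) integer array, one column per categorical variable.
    omit:  "first" or "last", the category left out as the reference.
    """
    columns = []
    for j in range(codes.shape[1]):
        levels = np.unique(codes[:, j])
        kept = levels[1:] if omit == "first" else levels[:-1]
        columns.extend((codes[:, j] == level).astype(float) for level in kept)
    return np.column_stack(columns)


def first_pc_scores(x):
    """Scores on the first principal component of the correlation matrix of x."""
    z = (x - x.mean(axis=0)) / x.std(axis=0)
    eigenvalues, eigenvectors = np.linalg.eigh(np.corrcoef(z, rowvar=False))
    # eigh returns eigenvalues in ascending order, so the last column
    # belongs to the largest eigenvalue. Its sign is arbitrary.
    return z @ eigenvectors[:, -1]


def spans_same_space(codes):
    """True if both dummy sets (plus a constant) span the same column space."""
    ones = np.ones((codes.shape[0], 1))
    first = np.hstack([ones, make_dummies(codes, "first")])
    both = np.hstack([first, make_dummies(codes, "last")])
    return np.linalg.matrix_rank(first) == np.linalg.matrix_rank(both)


def reference_category_correlation(codes):
    """Correlation between the scales built with first vs. last category omitted."""
    scale_first = first_pc_scores(make_dummies(codes, "first"))
    scale_last = first_pc_scores(make_dummies(codes, "last"))
    return np.corrcoef(scale_first, scale_last)[0, 1]


# One variable with three equally frequent categories (synthetic data).
codes = np.repeat([0, 1, 2], 100).reshape(-1, 1)
print("same information in both codings:", spans_same_space(codes))
print("|r| between the two scales:", abs(reference_category_correlation(codes)))
```

In this toy case the two dummy codings carry exactly the same information, yet the two first-component scales correlate at only about |r| = 0.5. The sign of r is not meaningful in the sketch because eigenvector signs are arbitrary. The reason is that standardizing each dummy and extracting components is not invariant to the non-orthogonal change of basis that a change of reference category amounts to, unlike a change of unit or zero point applied to a single variable.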


