http://spssx-discussion.165.s1.nabble.com/Reference-category-for-dummies-in-factor-analysis-tp1070339p1070354.html
that's what latent class analysis was for. Perhaps I am missing something in
>From: Hector Maletta <[hidden email]>
>Reply-To: Hector Maletta <[hidden email]>
>To: [hidden email]
>Subject: Reference category for dummies in factor analysis
>Date: Thu, 17 Aug 2006 12:52:55 -0300
>
>Dear colleagues,
>
>I am re-posting (slightly rephrased for added clarity) a question I sent
>to the list about a week ago that has not yet elicited any response. I hope
>some factor analysis experts may be able to help.
>
>In a research project on which we work together, a colleague of mine
>constructed a scale based on factor scores obtained through classical
>factor analysis (principal components) of a number of categorical census
>variables, all transformed into dummies. The variables concerned the
>standard of living of households and included quality of dwelling and
>basic services such as sanitation, water supply, electricity and the like.
>(The scale was not simply the score for the first factor, but the average
>score of several factors, weighted by their respective contribution to
>explaining the overall variance of the observed variables, but this is, I
>surmise, beside the point.)
>
>Now, he found out that the choice of reference or "omitted" category for
>defining the dummies has an influence on results. He first ran the analysis
>using the first category of all categorical variables as the reference
>category, and then repeated the analysis using the last category as the
>reference or omitted category, whatever they might be. He found that the
>resulting scale varied not only in absolute value but also in the shape of
>its distribution.
>
>I can understand that the absolute value of the factor scores may change,
>and that even the ranking of the categories of the various variables (in
>terms of their average scores) may differ, since after all the list of
>dummies used has varied and the categories are tallied each time against a
>different reference category. But the shape of the scale distribution
>should not change, I guess, especially not in a drastic manner. In this
>case the shape of the scale frequency distribution did change. Both
>distributions were roughly normal, with a kind of "hump" on one side, one
>of them on the left and the other on the right, probably due to the change
>in reference categories, but also with changes in the range of the scale
>and other details.
>
>Also, he found that the two scales were not perfectly correlated and,
>moreover, that their correlation was negative. That the correlation was
>negative may be understandable: the first category in such census variables
>is usually a "good" one (for instance, a home with walls made of brick or
>concrete) and the last one is frequently a "bad" one (earthen floor) or a
>residual heterogeneous one including bad options ("other" kinds of roof).
>But since the two scales are just different combinations of the same
>categorical variables based on the same statistical treatment of their
>given covariance matrix, one should expect a closer, indeed a perfect
>correlation, even if a negative one is possible for the reasons stated
>above. Changing the reference category should be like changing the unit of
>measurement or the position of the zero point (like passing from Celsius to
>Fahrenheit), a decision not affecting the correlation coefficient with
>other variables. In this case, instead, the two scales had r = -0.54,
>implying they shared only 29% of their variance, even in the extreme case
>when ALL the possible factors (as many as variables) were extracted and all
>their scores averaged into the scale (so that the entire variance, common
>or specific, of the whole set of variables was taken into account).
>
>I should add that the dataset was a large sample of census data, and all
>the results were statistically significant.
>
>Any ideas why choosing different reference categories for dummy conversion
>could have such an impact on results? I would greatly appreciate your
>thoughts in this regard.
>
>Hector
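
The experiment described in the quoted message can be sketched in a few lines of Python with NumPy. This is only an illustration under stated assumptions, not the procedure actually run on the census data: the data are synthetic, the helper names (dummies, weighted_pca_scale) are hypothetical, the components are taken from the correlation matrix of the dummies, and the sign of each component (which principal components leave arbitrary) is fixed by a convention chosen here. The sketch builds the dummies twice, omitting the first and then the last category, computes the variance-weighted average of all component scores for each coding, and prints the correlation between the two scales. It also checks the one relation that does hold exactly: each set of dummies is an affine function of the other.

```python
import numpy as np

def dummies(codes, n_levels, omit):
    """Indicator columns for every level of one variable except `omit`."""
    keep = [k for k in range(n_levels) if k != omit]
    return np.column_stack([(codes == k).astype(float) for k in keep])

def weighted_pca_scale(X):
    """Average of all standardized component scores, weighted by each
    component's share of total variance (assumed reading of the message)."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    eigval, eigvec = np.linalg.eigh(np.corrcoef(X, rowvar=False))
    # Component signs are arbitrary; as an assumption, orient each component
    # so that its largest-magnitude loading is positive.
    idx = np.argmax(np.abs(eigvec), axis=0)
    eigvec = eigvec * np.sign(eigvec[idx, np.arange(eigvec.shape[1])])
    scores = (Z @ eigvec) / np.sqrt(eigval)   # unit-variance component scores
    return scores @ (eigval / eigval.sum())

# Synthetic stand-in for the census items: three categorical variables
# driven by one latent "standard of living" plus noise (hypothetical data).
rng = np.random.default_rng(0)
n = 5000
latent = rng.normal(size=n)
levels = [3, 4, 3]
cats = [np.digitize(latent + rng.normal(size=n), np.linspace(-1, 1, k - 1))
        for k in levels]

# Same variables, two codings: omit the first level, then the last level.
X_first = np.hstack([dummies(c, k, omit=0) for c, k in zip(cats, levels)])
X_last = np.hstack([dummies(c, k, omit=k - 1) for c, k in zip(cats, levels)])

scale_first = weighted_pca_scale(X_first)
scale_last = weighted_pca_scale(X_last)
r = np.corrcoef(scale_first, scale_last)[0, 1]
print("correlation between the two scales:", r)

# Each coding is an exact affine function of the other (constant + linear map),
# so a regression of one set of dummies on the other leaves no residual.
A = np.column_stack([np.ones(n), X_first])
coef, *_ = np.linalg.lstsq(A, X_last, rcond=None)
max_resid = np.abs(X_last - A @ coef).max()
print("largest residual of the affine fit:", max_resid)
```

The affine fit shows in what sense the recoding resembles a change of units: the two sets of dummies carry the same information. The map between them is not a per-variable rescaling, however, so whether the resulting scales agree can be read from the printed correlation rather than assumed.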