http://spssx-discussion.165.s1.nabble.com/Reference-category-for-dummies-in-factor-analysis-tp1070339p1070345.html
you might try a pfa and see what the communalities look like.
single variable had dummies.
can go to something like 1022 or 1023 depending on the sign.
your problem. I would urge you to post a description of what you are
trying to do and the problem you ran into on the lists I mentioned.
Just to stir the pot. Is it possible that the relation of the variables
> Art,
>
> Thanks for your interesting response. We used PCA, with 1.00
> communality, i.e. extracting 100% variance. By "classical" I meant
> parametric factor analysis and not any form of optimal scaling or
> alternating least squares forms of data reduction. I am now
> considering these other alternatives on the advice of Anita van der Kooij.
>
> I share your uncertainty about why results change depending on which
> category is omitted, and that was the original question starting this
> thread. Since nobody else seems to have an answer I will offer one
> purely numerical hypothesis. The exercise with the varying results, it
> turns out, was done by my colleague not with the entire census but
> with a SAMPLE of the Peru census (about 200,000 households, still a
> lot but perhaps not so much for so many variables and factors), and
> the contributions of the later factors were pretty small. SPSS provides,
> as is well known, a precision of no more than about 15 significant digits.
> So it is just possible that some matrix figures for some of the minor
> factors differed only on the 15th or 16th decimal place (or further
> down), and then were taken as equal, and this may have caused some
> matrix to be singular or (most probably) near singular, and the
> results to be still computable but unstable. Moreover, some of the
> categories in census questions used as reference or omitted categories
> were populated by very few cases, which may have compounded the
> problem. Since running this on the entire census (which would enhance
> statistical significance and stability of results) takes a lot of
> computer time and has to be done several times with different
> reference categories, we have not done it yet but will proceed soon
> and report back. But I wanted to know whether some mathematical reason
> existed for the discrepancy.
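
The precision argument above can be checked outside SPSS. The following Python sketch is not from the thread. It assumes that SPSS stores numeric values as IEEE 754 double-precision numbers, and the matrix entries are made up for illustration. It shows that a difference in the 17th significant digit is lost, and that a 2 x 2 correlation matrix whose off-diagonal entry is that close to 1 has a determinant that evaluates to exactly zero.

```python
import sys

# Doubles carry about 15 to 17 significant decimal digits.
print(sys.float_info.dig)      # decimal digits guaranteed to survive a round trip
print(sys.float_info.epsilon)  # gap between 1.0 and the next larger double

# A difference far below epsilon is lost when added to 1.0.
tiny = 1e-17
same = (1.0 + tiny == 1.0)

# Hypothetical 2 x 2 correlation matrix [[1, r], [r, 1]] with r extremely close to 1.
r = 1.0 - tiny                 # rounds to exactly 1.0 in double precision
det = 1.0 * 1.0 - r * r        # determinant of the matrix

print(same, r == 1.0, det)
```

A matrix like this is singular as far as the computer is concerned, even though the underlying quantities differ on paper. Whether this actually happened in the census analysis is not established in the thread.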
>
> About why I would want to create a single score out of multiple
> factors, let us leave it for another occasion since it is a rather
> complicated story of a project connecting factor analysis with index
> number theory and economic welfare theory.
>
> Hector
>
> ------------------------------------------------------------------------
>
> From: Art Kendall [mailto:[hidden email]]
> Sent: Friday, August 18, 2006 9:34 AM
> To: Kooij, A.J. van der; [hidden email]
> CC: [hidden email]
> Subject: Re: Reference category for dummies in factor analysis
>
> This has been an interesting discussion. I don't know why the FA and
> scores would change depending on which category is omitted. Were
> there errors in recoding to dummies that could have created different
> missing values?
>
> You also said classical FA, but then said PCA. What did you use for
> communality estimates? 1.00? Squared multiple correlations?
>
> (I'm not sure why you would create a single score if you have multiple
> factors either, but that is another question.)
>
> What I do know is that people who know a lot more about CA, MDS, and
> factor analysis than I do (like Joe Kruskal, Doug Carroll, Willem
> Heiser, Phipps Arabie, Shizuhiko Nishisato, et al.) follow the
> class-l and mpsych-l discussion lists.
> See <http://aris.ss.uci.edu/smp/mpsych.html>
> and <http://www.classification-society.org/csna/lists.html#class-l>
>
> Art Kendall
> [hidden email]
>
> Kooij, A.J. van der wrote:
>
>>... trouble because any category of each original census question would be an exact linear
>>function of the remaining categories of the question.
>
>Yes, but this gives trouble in regression, not in PCA, as far as I know.
>
>>In the indicator matrix, one category will have zeroes on all indicator variables.
>
>No. And sorry, I was thinking of CA on an indicator matrix, but that is "sort of" PCA. See the syntax below (object scores, i.e. component scores, are equal to the CA row scores; category quantifications are equal to the CA column scores).
>
>Regards,
>Anita.
>
>* Raw data: 7 cases on three nominal variables.
>data list free/v1 v2 v3.
>begin data.
>1 2 3
>2 1 3
>2 2 2
>3 1 1
>2 3 4
>2 2 2
>1 2 4
>end data.
>
>* Multiple correspondence analysis on the categorical variables.
>Multiple Correspondence v1 v2 v3
> /analysis v1 v2 v3
> /dim=2
> /critit .0000001
> /print discrim quant obj
> /plot none.
>
>* CATPCA with all variables scaled as multiple nominal (mnom).
>catpca v1 v2 v3
> /analysis v1 v2 v3 (mnom)
> /dim=2
> /critit .0000001
> /print quant obj
> /plot none.
>
>* The same data as a 7 x 10 indicator matrix, one column per category.
>data list free/v1cat1 v1cat2 v1cat3 v2cat1 v2cat2 v2cat3 v3cat1 v3cat2 v3cat3 v3cat4 .
>begin data.
>1 0 0 0 1 0 0 0 1 0
>0 1 0 1 0 0 0 0 1 0
>0 1 0 0 1 0 0 1 0 0
>0 0 1 1 0 0 1 0 0 0
>0 1 0 0 0 1 0 0 0 1
>0 1 0 0 1 0 0 1 0 0
>1 0 0 0 1 0 0 0 0 1
>end data.
>
>* Simple correspondence analysis on the indicator matrix.
>CORRESPONDENCE
> TABLE = all (7,10)
> /DIMENSIONS = 2
> /NORMALIZATION = cprin
> /PRINT = RPOINTS CPOINTS
> /PLOT = none .
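
The step from the first data set (v1, v2, v3) to the second (the ten 0/1 columns) in the syntax above can be reproduced as follows. This Python sketch is an illustration, not part of the original post. The helper name `to_indicator_row` is hypothetical, and the category counts (3, 3 and 4) are read off the column names in the second `data list`.

```python
# Categorical data exactly as in the first "data list" above (v1, v2, v3).
data = [
    (1, 2, 3), (2, 1, 3), (2, 2, 2), (3, 1, 1),
    (2, 3, 4), (2, 2, 2), (1, 2, 4),
]
# Number of categories per variable: 3 + 3 + 4 gives 10 indicator columns.
n_categories = (3, 3, 4)


def to_indicator_row(case):
    """Expand one case into 0/1 indicators, one per category, none omitted."""
    row = []
    for value, k in zip(case, n_categories):
        row.extend(1 if value == cat else 0 for cat in range(1, k + 1))
    return row


indicator = [to_indicator_row(case) for case in data]
for row in indicator:
    print(*row)
```

The printed rows should match the seven data lines of the indicator matrix, and every row contains exactly three ones, one per original variable.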
>
>________________________________
>
>From: SPSSX(r) Discussion on behalf of Hector Maletta
>Sent: Thu 17/08/2006 19:56
>To: [hidden email]
>Subject: Re: Reference category for dummies in factor analysis
>
>Thank you, Anita. I will certainly look into your suggestion about CATPCA.
>However, I suspect some mathematical properties of the scores generated by
>CATPCA are not the ones I hope to have in our scale, because of the
>non-parametric nature of the procedure (too long to explain here, and I am
>not sure I understand it myself).
>As for your second idea, I think if you try to apply PCA on dummies not
>omitting any category you'd run into trouble because any category of each
>original census question would be an exact linear function of the remaining
>categories of the question. In the indicator matrix, one category will have
>zeroes on all indicator variables, and that one is the "omitted" category.
>
>Hector
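
The two codings being contrasted in this exchange can be put side by side. This Python sketch uses a made-up three-category variable (the values are hypothetical, not census data). It shows that with a full indicator coding every row sums to 1, so each column is an exact linear function of the others, and that the "all zeroes" pattern only appears once a reference category has been dropped.

```python
# Hypothetical nominal variable with three categories, observed on six cases.
values = [1, 2, 2, 3, 1, 3]

# Full indicator coding: one 0/1 column per category, none omitted.
full = [[1 if v == cat else 0 for cat in (1, 2, 3)] for v in values]

# Each row sums to 1, so any column equals 1 minus the sum of the others.
row_sums = [sum(row) for row in full]
dependent = all(row[0] == 1 - row[1] - row[2] for row in full)

# Dummy coding with category 1 as the omitted reference category.
dummies = [row[1:] for row in full]

# Cases in the reference category are the ones with zeros on every dummy.
all_zero_cases = [v for v, row in zip(values, dummies) if sum(row) == 0]

print(row_sums, dependent, all_zero_cases)
```

So Hector's description applies to the dummy-coded matrix with one category dropped, while the indicator matrix Anita refers to has no all-zero rows.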
>
>-----Original Message-----
>From: SPSSX(r) Discussion [mailto:[hidden email]] On behalf of Kooij, A.J. van der
>Sent: Thursday, August 17, 2006 2:37 PM
>To: [hidden email]
>Subject: Re: Reference category for dummies in factor analysis
>
>CATPCA (in the Data Reduction menu, under Optimal Scaling) is PCA for
>(ordered/ordinal and unordered/nominal) categorical variables; no need to
>use dummies then.
>Using PCA on dummies, I think you should not omit dummies (for nominal
>variables you can do PCA on an indicator matrix, which has columns that can
>be regarded as dummy variables: a column for each category, thus without
>omitting one).
>
>Regards,
>Anita van der Kooij
>Data Theory Group
>Leiden University.
>
>________________________________
>
>From: SPSSX(r) Discussion on behalf of Hector Maletta
>Sent: Thu 17/08/2006 17:52
>To: [hidden email]
>Subject: Reference category for dummies in factor analysis
>
>Dear colleagues,
>
>I am re-posting (slightly re-phrased for added clarity) a question I sent
>the list about a week ago without eliciting any response as yet. I hope some
>factor analysis experts may be able to help.
>
>In a research project on which we work together, a colleague of mine
>constructed a scale based on factor scores obtained through classical factor
>analysis (principal components) of a number of categorical census variables
>all transformed into dummies. The variables concerned the standard of living
>of households and included quality of dwelling and basic services such as
>sanitation, water supply, electricity and the like. (The scale was not
>simply the score for the first factor, but the average score of several
>factors, weighted by their respective contribution to explaining the overall
>variance of observed variables, but this is, I surmise, beside the point.)
>
>Now, he found out that the choice of reference or "omitted" category for
>defining the dummies has an influence on results. He first ran the analysis
>using the first category of all categorical variables as the reference
>category, and then repeated the analysis using the last category as the
>reference or omitted category, whatever they might be. He found that the
>resulting scale varied not only in absolute value but also in the shape of
>its distribution.
>
>I can understand that the absolute value of the factor scores may change and
>even the ranking of the categories of the various variables (in terms of
>their average scores) may also be different, since after all the list of
>dummies used has varied and the categories are tallied each time against a
>different reference category. But the shape of the scale distribution should
>not change, I guess, especially not in a drastic manner. In this case the
>shape of the scale frequency distribution did change. Both distributions
>were roughly normal, with a kind of "hump" on one side, one of them on the
>left and the other on the right, probably due to the change in reference
>categories, but also with changes in the range of the scale and other
>details.
>
>Also, he found that the two scales were not perfectly correlated, and
>moreover, that their correlation was negative. That the correlation was
>negative may be understandable: the first category in such census variables
>is usually a "good" one (for instance, a home with walls made of brick or
>concrete) and the last one is frequently a "bad" one (earthen floor) or a
>residual heterogeneous one including bad options ("other" kinds of roof).
>But since the two scales are just different combinations of the same
>categorical variables based on the same statistical treatment of their given
>covariance matrix, one should expect a closer, indeed a perfect correlation,
>even if a negative one is possible for the reasons stated above. Changing
>the reference category should be like changing the unit of measurement or
>the position of the zero point (like passing from Celsius to Fahrenheit), a
>decision not affecting the correlation coefficient with other variables. In
>this case, instead, the two scales had r = -0.54, implying they shared only
>29% of their variance, even in the extreme case when ALL the possible
>factors (as many as variables) were extracted and all their scores averaged
>into the scale, and therefore the entire variance, common or specific, of
>the whole set of variables was taken into account.
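
One possible mathematical reason is sketched below as an illustration, not as the answer reached in this thread. Switching the reference category replaces one dummy by a linear combination of the others (d1 = 1 - d2 - d3). That is not a rescaling of each variable separately, so it is unlike the Celsius to Fahrenheit case, and the correlation matrix analysed by PCA changes. The Python sketch uses a made-up three-category variable (frequencies 6, 3 and 1 are hypothetical), and the helper `pearson` is written out here rather than taken from any library.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical variable: category 1 in 6 cases, category 2 in 3, category 3 in 1.
values = [1] * 6 + [2] * 3 + [3] * 1
ind = {cat: [1 if v == cat else 0 for v in values] for cat in (1, 2, 3)}

# Correlation between the two retained dummies under each choice of reference.
r_omit_first = pearson(ind[2], ind[3])  # category 1 omitted
r_omit_last = pearson(ind[1], ind[2])   # category 3 omitted

# For a 2 x 2 correlation matrix the eigenvalues are 1 + |r| and 1 - |r|,
# so the first principal component explains (1 + |r|) / 2 of the variance.
share_omit_first = (1 + abs(r_omit_first)) / 2
share_omit_last = (1 + abs(r_omit_last)) / 2

# The shared variance quoted above: r = -0.54 gives r squared of about 0.29.
shared_variance = (-0.54) ** 2

print(round(r_omit_first, 4), round(r_omit_last, 4))
print(round(share_omit_first, 3), round(share_omit_last, 3))
print(round(shared_variance, 4))
```

In this toy case the same variable yields a weak correlation between dummies under one reference category and a strong one under the other, so the components extracted differ. Whether this fully accounts for the census results is not settled here.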
>
>I should add that the dataset was a large sample of census data, and all the
>results were statistically significant.
>
>Any ideas why choosing different reference categories for dummy conversion
>could have such impact on results? I would greatly appreciate your thoughts
>in this regard.
>
>Hector