Hi
I am struggling with a PCA on dichotomous data (1-0). My data stem from a content analysis of 726 paragraphs. For each paragraph, the presence of each of 18 codes was indicated. I am looking for racial ideologies and am only interested in the relationships between the codes. I performed a PCA on the paragraphs-by-codes matrix (with varimax rotation) and got a nicely interpretable solution. Now I am unsure whether I can report these PCA results, because my data are categorical. I tried CATPCA, but I don't know whether you can rotate the components in a way similar to VARIMAX and whether you can save component scores. Thanks! Kaat |
In reply to this post by Kaat
Kaat,
Dichotomous variables can legitimately be treated as interval variables. The trouble with categorical variables such as nationality or ethnicity is that the intervals, or real differences, between categories cannot be given a reasonable value. With dichotomous variables that problem does not exist: the variable has only two possible values and therefore only one possible interval. Define that interval (the difference between Yes and No) as your unit of measurement, so that passing from No to Yes is a unit increment. Since this single interval need not be compared to anything else, there is no ambiguity.

CATPCA is for variables with 3 or more categories (ordered or unordered). For dichotomies, it gives the same solution as traditional PCA. Thus your solution is OK.

Now, suppose you start with multi-category questions and reduce them to a series of dummies. In this case, even if the original variables were strongly inter-correlated, you generate some correlations that are necessarily negative. For instance, wooden roofs will be negatively correlated with tile roofs, because one excludes the other. This produces a lot of negative correlation coefficients in the extended matrix of categories for all variables, even if the original variables WERE positively correlated (in the sense that poor walls were correlated with poor roofs and poor sanitary services). This distorts the results of factor analysis, including reducing the variance explained by the first factor.

Thus, CATPCA should be used for problems involving multi-category variables, and classical factor analysis for the rest (including authentic binary variables such as Yes-No questions).

Beware that, unlike the classic FACTOR command, the CATPCA command requires holding the entire dataset in memory, which greatly limits the number of cases and variables you can process. Using a stripped-down database with just the variables you actually need may allow CATPCA to work, but with large datasets it doesn't work.

Hector

(In reply to Kaat's message of Tuesday, July 06, 2010 2:05 PM, subject: factor analysis on dichotomous data.)

=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L. For a list of commands to manage subscriptions, send the command INFO REFCARD.
|
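The dummy-coding point above can be checked with a small sketch. This is plain Python rather than SPSS syntax, the roof counts are made up for illustration, and `pearson` is a helper defined here, not anything from the thread. It shows that two dummies derived from the same multi-category variable are negatively correlated by construction.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical roof materials for 100 dwellings (illustrative counts only).
roofs = ["wood"] * 40 + ["tile"] * 35 + ["metal"] * 25

# Dummy-code two of the mutually exclusive categories.
wood = [1 if r == "wood" else 0 for r in roofs]
tile = [1 if r == "tile" else 0 for r in roofs]

r_wood_tile = pearson(wood, tile)

# For two dummies of the same variable, with proportions p and q, the
# correlation is -sqrt(p*q / ((1-p)*(1-q))): negative by construction,
# because the two dummies can never both be 1.
p, q = sum(wood) / len(roofs), sum(tile) / len(roofs)
r_expected = -math.sqrt(p * q / ((1 - p) * (1 - q)))

print(round(r_wood_tile, 3), round(r_expected, 3))
```

The sign does not depend on the particular counts chosen: any two non-empty, mutually exclusive categories give a negative coefficient.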
Thanks a lot for your clear explanation.
I now understand why I can consider my data as interval data. However, another assumption of factor analysis is that the variables are linearly related. My yes-no codes do not represent an underlying linear dimension but simply indicate whether a certain belief is expressed or not. Can my variables be linearly related, then? I know that phi correlations can be calculated and that they are similar to Pearson correlations, but does this also imply that the relationship between the variables is considered linear? Sorry to insist on this, but I am working on a paper and these are comments I got when presenting these data. Kaat
|
Relationships between dichotomous variables are linear. It can be proved that the percentage difference in the dependent variable between the two categories of another dichotomous variable is algebraically equivalent to a linear regression coefficient, and that the phi coefficient is equivalent to the linear (Pearson) correlation coefficient for two variables with only two values each. I don't have the reference at hand, but these are solid results that have been known for years.
Hector

(In reply to Kaat's message of Tuesday, July 06, 2010 5:06 PM, subject: Re: factor analysis on dichotomous data.)
|
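The two equivalences Hector mentions can be verified numerically. The sketch below is plain Python, not SPSS, and the 2x2 counts are hypothetical. It computes the Pearson correlation and the least-squares slope from case-level 0/1 data, and compares them with the phi coefficient and the difference in proportions computed directly from the table.

```python
import math

# Hypothetical 2x2 table of counts (illustrative only):
#            y=1   y=0
#   x=1       a     b
#   x=0       c     d
a, b, c, d = 30, 20, 10, 40

# Expand the table into one (x, y) pair per case.
pairs = [(1, 1)] * a + [(1, 0)] * b + [(0, 1)] * c + [(0, 0)] * d
x = [pair[0] for pair in pairs]
y = [pair[1] for pair in pairs]
n = len(pairs)

mean_x, mean_y = sum(x) / n, sum(y) / n
cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in pairs) / n
var_x = sum((xi - mean_x) ** 2 for xi in x) / n
var_y = sum((yi - mean_y) ** 2 for yi in y) / n

# Ordinary Pearson correlation and regression slope of y on x.
pearson_r = cov_xy / math.sqrt(var_x * var_y)
ols_slope = cov_xy / var_x

# Phi coefficient and difference in proportions, straight from the table.
phi = (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
prop_diff = a / (a + b) - c / (c + d)

print(pearson_r, phi)
print(ols_slope, prop_diff)
```

Multiplying `prop_diff` by 100 gives the percentage difference referred to above.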
Or look at it graphically: with CATPCA, nonlinear relations are expressed in the category quantifications. In the transformation plot (categories on the x-axis, category quantifications on the y-axis), the nature of a variable's relation with the other variables is reflected in the form of the transformation curve. If a variable is nonlinearly related to other variables, the transformation plot shows a nonlinear curve; a transformation plot showing a curve that is (almost) a straight line reflects a linear relation. With only 2 categories there are only 2 points, and the line through 2 points is always a straight line. Thus, the transformation plot for a dichotomous variable always shows a straight line, implying that a dichotomous variable is always linearly related to other variables.

Kind regards,
Anita van der Kooij
Data Theory Group

(In reply to Hector Maletta's message of Tue 06-Jul-10 23:23, subject: Re: factor analysis on dichotomous data.)
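One way to read the two-point argument numerically: whatever two values are assigned to the categories of a dichotomy, the result is a linear rescaling of the 0/1 coding, so the standardized variable does not change. The sketch below is plain Python with made-up data and arbitrary quantification values `q0` and `q1`; it does not run CATPCA itself.

```python
import math

def standardize(values):
    """Return z-scores using the population standard deviation."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]

# Hypothetical 0/1 code for ten paragraphs (illustrative only).
code = [0, 1, 1, 0, 0, 1, 0, 0, 1, 0]

# Two arbitrary quantifications for the categories "absent" and "present".
q0, q1 = -2.7, 5.1
quantified = [q1 if v == 1 else q0 for v in code]

z_original = standardize(code)
z_quantified = standardize(quantified)

# Largest difference between the two standardized versions.
max_gap = max(abs(u - v) for u, v in zip(z_original, z_quantified))
print(max_gap)
```

If `q1` were smaller than `q0`, the z-scores would only change sign, which reverses the sign of the correlations with other variables but not their size.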
|
In reply to this post by Hector Maletta
Hi Hector
The answer you gave Kaat is also very helpful for me. I have 2 additional questions:

1. Do you have any literature at hand I could cite concerning the fact that for authentic binary variables classical factor analysis (not CATPCA) should be used?
2. I performed a factor analysis and a CATPCA with the same data (13 binary variables, 3 components) using SPSS 23. The resulting component loadings are practically the same with classical factor analysis as with CATPCA, but with CATPCA (not with classical factor analysis) the varimax rotation fails (error message "Rotation failed to converge in 5 iterations. (Convergence = .000)."). Do you have an explanation for this difference?

Thank you! Mirjam |
Mirjam
Allow me to meddle. For binary variables, CATPCA is equivalent to standard PCA, because any monotonic quantification of a dichotomous numeric variable is necessarily linear. So you don't have to use CATPCA unless some of your variables have three or more categories. The slight difference between PCA and CATPCA that you observe with all-binary variables is merely due to the fact that CATPCA is iterative and uses a slightly different standardization. You may ignore the difference.

As for the varimax error, I can't say anything.

Are you doing factor analysis or PCA? You may do PCA on binary variables. However, it is not quite proper to do linear factor analysis on binary data. See my responses at http://stats.stackexchange.com/a/16335/3277 and http://stats.stackexchange.com/a/186026/3277

(In reply to Mirjam's message of 21.12.2015 13:36.)
|
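For readers who want to see what a standard PCA on binary variables amounts to, here is a minimal sketch in plain Python. It is not SPSS's FACTOR or CATPCA procedure, it extracts only the first unrotated component (by power iteration), and the response patterns and counts are hypothetical.

```python
import math

# Hypothetical response patterns for three 0/1 codes, with their counts.
patterns = {(1, 1, 1): 30, (1, 1, 0): 10, (1, 0, 0): 10,
            (0, 1, 0): 5, (0, 0, 1): 15, (0, 0, 0): 30}
rows = [list(p) for p, count in patterns.items() for _ in range(count)]
n, k = len(rows), 3

def zscores(column):
    """Standardize one variable using the population standard deviation."""
    mean = sum(column) / len(column)
    sd = math.sqrt(sum((v - mean) ** 2 for v in column) / len(column))
    return [(v - mean) / sd for v in column]

# z[j] holds the standardized values of variable j. The correlation matrix R
# of 0/1 variables is the matrix of phi coefficients.
z = [zscores([row[j] for row in rows]) for j in range(k)]
R = [[sum(u * w for u, w in zip(z[i], z[j])) / n for j in range(k)]
     for i in range(k)]

# Power iteration for the leading eigenvector of R.
v = [1 / math.sqrt(k)] * k
for _ in range(200):
    w = [sum(R[i][j] * v[j] for j in range(k)) for i in range(k)]
    norm = math.sqrt(sum(x * x for x in w))
    v = [x / norm for x in w]

Rv = [sum(R[i][j] * v[j] for j in range(k)) for i in range(k)]
eigenvalue = sum(a * b for a, b in zip(v, Rv))

# Loadings are the eigenvector scaled by sqrt(eigenvalue); component scores
# are the standardized data projected onto the eigenvector.
loadings = [math.sqrt(eigenvalue) * x for x in v]
scores = [sum(v[j] * z[j][i] for j in range(k)) for i in range(n)]

print(eigenvalue, loadings)
```

Nothing in the computation depends on the variables being continuous: it needs only the correlation matrix, which for binary data is well defined.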