|
Hi all,
I am trying to run a PCA in SPSS, using categorical principal components from the optimal scaling options. All my variables are presence/absence (1/0) data. I have 29 variables and 13,000 samples. But I am not getting any output. It says:
"A case(s) has only missing data on the active variables, all to be treated as passive. The case(s) is handled as a supplementary object (s).The following variables have less than 3 valid active cases: Q7B41, QQ7B31. Only the Descriptive Statistics tables can be computed. This command is not executed"
I cannot understand this message. What does "active variables" mean? I would be very grateful for any suggestions or help.
Thank you,
Nabaneeta
=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
|
1. You have an insufficient number of valid cases for two of the variables. That is the reason why the analysis is not performed. Possibly the negative cases in those variables have been left blank (system missing): if that is the case, you may recode the SYSMIS cases to 0. For instance: RECODE Q7B41 QQ7B31 (SYSMIS=0). Please be sure that a blank is really equivalent to a negative answer before doing this.
2. For binary variables (coded 1/0 or with any other pair of values), using categorical principal component analysis (command CATPCA in the Categories module) is exactly the same as using classical principal component analysis (command FACTOR in the SPSS Base module). PCA converts all variables into z scores (centered on the mean, with unit standard deviation), so the actual numerical values do not matter for binary variables, where only one interval exists: code the two categories as 1 and 0, as 1000 and 0, or as 15456 and 3428, and the results will always be the same.
3. Some people have qualms about using PCA or other forms of factor analysis on binary variables, on the grounds that the residuals would not be normally distributed: e.g. if a variable is coded 0/1, the average will be a fraction p such that 0<p<1, and the residuals will sit at the extremes (0-p or 1-p), in a bimodal and utterly un-normal distribution. But those qualms are dismissed by most people, who happily go along applying PCA to binary data.
Hector
-----Original message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On behalf of Nabaneeta Saha
Sent: Sunday, October 10, 2010 1:17 PM
To: [hidden email]
Subject: PCA with categorical (0/1) variables |
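A small illustration of points 2 and 3 in the reply above, sketched in Python rather than SPSS syntax so it can be run anywhere. The ten answers are made up, and the helper name `z_scores` is just for this example. It standardizes the same binary variable under the three codings mentioned (1/0, 1000/0, 15456/3428) and checks that the z scores coincide, then lists the only two residual values a 0/1 variable can have around its mean p. It assumes the larger code always stands for "presence"; swapping the codes would simply flip the sign of the z scores.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize a list: subtract the mean, divide by the standard deviation."""
    m = mean(values)
    s = pstdev(values)  # population SD; the sample SD would lead to the same conclusion
    return [(v - m) / s for v in values]


# Hypothetical presence/absence answers for ten cases.
answers = [1, 0, 0, 1, 1, 0, 1, 0, 0, 0]

# The same answers under the other two codings mentioned in the reply.
coded_1000_0 = [1000 if a == 1 else 0 for a in answers]
coded_odd = [15456 if a == 1 else 3428 for a in answers]

z_a = z_scores(answers)
z_b = z_scores(coded_1000_0)
z_c = z_scores(coded_odd)

# True when all three codings give the same z scores (up to rounding error).
same = all(
    abs(x - y) < 1e-9 and abs(x - z) < 1e-9
    for x, y, z in zip(z_a, z_b, z_c)
)

# Residuals around the mean p take only two values: 0 - p and 1 - p.
p = mean(answers)
residuals = sorted({round(a - p, 10) for a in answers})

print(same)
print(residuals)
```

Since PCA on the correlation matrix only ever sees the standardized values, identical z scores mean identical components, whatever pair of codes is used.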
|
In reply to this post by Nabaneeta Saha
Hi all,
Thank you Hector. I deleted those two variables from the analysis because they did not have any positive cases. But when I ran the analysis again, imputing missing values with the mode, the output said that some of the variables have zero variance due to treating missing values as passive or listwise, so the command could not be executed. I do not understand this. Should I treat the missing values as an extra category? What does the option of treating missing values as an extra category imply?
Thank you once again.
Nabaneeta
|
|
Excluding the two variables with no positive cases is OK: they are not really “variables” but constants.
Now, about your pesky missing values: the normal procedure is to exclude a case if ANY of the variables is missing. This is called LISTWISE deletion. The problem when you have many variables is that too many cases may be excluded. There are several alternatives:
1. Exclude missing cases pairwise instead of listwise. This option uses each case for all the calculations that do not involve the missing variable. It is not recommended, because the results may not be consistent (a different set of cases is used to calculate each coefficient).
2. Impute an estimated value for the missing values. This can be done with the MISSING VALUES component of SPSS or (in the case of binary variables) with logistic regression. Note that logistic regression does not estimate the VALUE that is missing, but the PROBABILITY that the missing value actually equals 1. For instance, if Y is a binary variable with missing values, and its value can be predicted by other variables A, B, C, … Z, which need not be binary, the syntax in a simple example would be: LOGISTIC REGRESSION Y WITH A B C D … Z /METHOD ENTER /SAVE PRED(prob_Y). I have specified METHOD ENTER to generate a predictive equation with all the listed predictors (A B C … Z), but some predictors may not be statistically significant; an alternative is METHOD FSTEP, which enters the predictors one by one, starting with the best, and stops when no significant predictors remain. You need to run this for enough of the variables with missing values to obtain a sufficient number of valid cases. Start with the variables that have the most missing values. Each time you run LOGISTIC REGRESSION, a new variable is saved for each case (see the SAVE subcommand above), namely the predicted probability of a positive response, here called PROB_Y (but you need to change the name for each variable). With these probabilities there are two choices: (a) assign the value 1 whenever p>0.5, and zero otherwise. This may be tricky, especially if all or most probabilities fall on the same side of 0.5, which assigns all or almost all of the missing cases to a single category; (b) use the saved probability as the value whenever the true value is missing. This converts your binary variable into a continuous one: most cases will have the value 0 or 1, but some will have a fraction between 0 and 1. Then proceed as before.
3. As you suggest, you may consider the missing value as a third or extra category, so that your formerly binary variable now has three categories (Yes, No, NA). If you are using CATPCA, it would generate a numerical value for each category, treating the variable as a nominal scale with no particular ordering.
4. In some situations, especially in self-administered questionnaires, you may be pretty certain that missing values are actually negative responses: people intending to give a negative response, if asked to mark yes or no, may carelessly leave the answer blank (meaning “No”). If you can be sure of this, just recode the missing values as zeroes. But be careful: do it in a new variable under a new name, lest you lose the original information. And make sure you really know what those blanks mean.
Hope these ideas help.
Hector |
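To make the alternatives in the reply above concrete, here is a minimal sketch in Python (not SPSS syntax). All data are made up, `None` stands in for a missing answer, the probabilities in `prob_y` are hypothetical stand-ins for what the SAVE subcommand of LOGISTIC REGRESSION would store, and the three helper functions are illustrative names only. It counts how few cases survive listwise deletion and then applies choices (a) and (b) to the same variable.

```python
def listwise_complete(cases):
    """Keep only the cases with no missing value (None) on any variable."""
    return [c for c in cases if all(v is not None for v in c)]


def impute_threshold(observed, prob, cutoff=0.5):
    """Choice (a): a missing value becomes 1 if its probability exceeds the cutoff, else 0."""
    return [
        (1 if p > cutoff else 0) if v is None else v
        for v, p in zip(observed, prob)
    ]


def impute_probability(observed, prob):
    """Choice (b): a missing value is replaced by the predicted probability itself."""
    return [p if v is None else v for v, p in zip(observed, prob)]


# Hypothetical data: three binary variables for five cases.
cases = [(1, 0, 1), (None, 1, 0), (0, 0, None), (1, 1, 1), (0, None, 0)]
n_listwise = len(listwise_complete(cases))  # only fully answered cases remain

# Hypothetical binary variable Y and made-up predicted probabilities.
y = [1, None, 0, None, 1, None]
prob_y = [0.9, 0.2, 0.1, 0.4, 0.8, 0.3]

y_a = impute_threshold(y, prob_y)    # stays binary
y_b = impute_probability(y, prob_y)  # becomes partly continuous

print(n_listwise)
print(y_a)
print(y_b)
```

In this toy example every missing case has a probability below 0.5, so choice (a) turns all of them into zeroes, which is exactly the pitfall described under that choice; choice (b) keeps the graded information at the cost of a variable that is no longer strictly binary.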
|
In reply to this post by Nabaneeta Saha
If you have binary data, you should code the values as 1 and 2.
The Leiden routines (the optimal scaling procedures, CATPCA among them) typically treat 0 as missing. |
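Why the coding matters can be sketched in a few lines of Python. The rule used here, that anything below 1 counts as missing, is an assumption taken from the reply above and should be checked against the SPSS documentation for CATPCA; the data and the helper `valid_values` are hypothetical. Under that rule a 0/1 variable keeps only its 1s, which would explain both messages in this thread: a variable with very few positives has too few valid cases, and the cases that do remain all share one value, so the variance is zero.

```python
def valid_values(values, lowest_valid=1):
    """Drop None and anything below lowest_valid (assumed rule: 0 is treated as missing)."""
    return [v for v in values if v is not None and v >= lowest_valid]


# Hypothetical presence/absence variable coded 0/1 for ten cases.
presence = [0, 1, 0, 0, 1, 0, 0, 1, 0, 0]

kept_01 = valid_values(presence)        # every 0 is discarded
recoded = [v + 1 for v in presence]     # same answers coded 1/2
kept_12 = valid_values(recoded)         # nothing is discarded

distinct_01 = set(kept_01)  # a single value left: zero variance
distinct_12 = set(kept_12)  # both categories survive

print(len(kept_01), distinct_01)
print(len(kept_12), distinct_12)
```

Shifting the codes from 0/1 to 1/2 changes nothing about the information in the variable; it only moves both categories into the range that is kept as valid.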
|
In reply to this post by Nabaneeta Saha
Thank you very much Anthony. It worked! I changed the coding to 1/2.
Thank you Hector! I learned a lot about handling missing values, and the concept is much clearer to me now!
Nabaneeta
|
