|
Hello,
I have a question concerning hierarchical cluster analysis. I want to cluster a group of persons according to their responses to, let's say, twenty questions, q1 to q20. The questions all use the same scale, from 1 to 10, and measure certain kinds of attitudes.

Some of the respondents did not answer one question (= "dn/na") while they answered the others. After the cluster analysis has been run, no cluster is assigned to those cases, because one (or more) variables were "missing" for them.

How can I deal with this problem?
- Is it better to fill in the missing values before running the cluster analysis? Which method/algorithm can be used?
- Or is there a clustering algorithm that tolerates some missing values in the set of variables used for clustering?

Thank you for any advice.
Carsten

=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
|
As far as I know, clustering procedures in SPSS either exclude all cases with at least one missing value in any of the relevant variables, or else include all cases, treating every value as valid. On the other hand, the Missing Values component of SPSS can assign valid values to the missing cases based on their responses to other questions (not only their other attitudinal responses in your case, but also background variables such as age, sex, education, occupation, etc.).

I think you have the following options:

1. Leave out those cases for good. If they are very few, that would probably do no harm to your research and would not reduce your sample by much.

2. Use the non-valid values as if they were valid, giving them some numerical code that is not system-missing, such as 9 or 99. Remember, however, that the values in those cases do not represent an amount or quantity, so the variable (if it was an interval scale) would become a categorical variable. Some clustering procedures do not tolerate categorical variables. All in all, I do not like this option.

3. Assign estimated valid values to the cases with non-valid ones, using the Missing Values component, and proceed as if missing values had never existed. Assigning them the mean can be a very crude solution, because (given the other responses a subject has given) the mean may not be her most likely answer to the missing question.

4. If the valid responses range from positive to negative, as in a Likert scale, you may have the option of treating missing values as an indifferent response, like "I don't care", and assigning them a middle value. This is dangerous, however, because it assumes a meaning for the missing values, and perhaps they do not have that meaning in all cases. Perhaps the subjects did care, and had a definite opinion, but somehow omitted the answer.

Hope this helps.
Hector

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Carsten Pauck
Sent: 07 August 2008 11:32
To: [hidden email]
Subject: Missing values and cluster analysis |
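As a rough illustration of options 1 and 3 above, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the toy data, the function names, and in particular the nearest-neighbour imputation, which is a simple stand-in for "estimate the gap from the respondent's other answers" and is not the procedure used by the SPSS Missing Values component.

```python
from statistics import mean

# Hypothetical toy data: 6 respondents, 4 items scaled 1-10.
# None marks a "dn/na" answer.
responses = [
    [9, 8, 9, 9],
    [8, 9, 8, None],  # a high scorer who skipped the last item
    [9, 9, 8, 8],
    [2, 1, 2, 2],
    [1, 2, 2, 1],
    [2, 2, 1, 2],
]


def listwise_delete(rows):
    """Option 1: keep only respondents who answered every item."""
    return [r for r in rows if None not in r]


def impute_item_mean(rows):
    """Crude form of option 3: fill a gap with that item's mean over valid answers."""
    n_items = len(rows[0])
    item_means = [
        mean(r[j] for r in rows if r[j] is not None) for j in range(n_items)
    ]
    return [
        [item_means[j] if v is None else v for j, v in enumerate(r)] for r in rows
    ]


def impute_from_neighbours(rows, k=2):
    """Option 3 using the other answers: fill a gap with the mean answer of the
    k complete cases closest to the respondent on the items they did answer."""
    complete = listwise_delete(rows)
    filled = []
    for r in rows:
        if None not in r:
            filled.append(list(r))
            continue
        answered = [j for j, v in enumerate(r) if v is not None]
        # Squared distance to a complete case, over the answered items only.
        donors = sorted(
            complete, key=lambda c: sum((c[j] - r[j]) ** 2 for j in answered)
        )[:k]
        filled.append(
            [mean(d[j] for d in donors) if v is None else v for j, v in enumerate(r)]
        )
    return filled


print(len(listwise_delete(responses)))          # cases left after deletion
print(impute_item_mean(responses)[1][3])        # item-mean fill for the gap
print(impute_from_neighbours(responses)[1][3])  # neighbour-based fill for the gap
```

In this toy data the item mean places the high-scoring respondent near the middle of the scale on the skipped item, while the neighbour-based fill keeps her close to the respondents she otherwise resembles, which is the concern raised about mean substitution in option 3.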
|
In reply to this post by Carsten Pauck
Quoting Carsten Pauck <[hidden email]>:
> Hello,
>
> I have a question concerning a hierarchical cluster analysis. I want
> to cluster a group of persons according to their responses ...
>
> Now, some of those respondents did not answer one question (="dn/na")
> -while they answered the other ones.

Hector Maletta has offered some good advice, but there is an additional point that you might consider. You are trying to group "similar" people together, and if several people left the same question or questions unanswered, this might indicate that they are similar. Consider an example of a study into people's attitudes to sexuality. Those who were too embarrassed to answer some questions might well be similar in some respect, and other questions might help to locate both the reason for the refusal and the similarity.

In any study where there are missing values you need to consider WHY they are missing, and there are dozens of possible reasons. However, the important distinction is between "missing at random" and other kinds of missing response. Imagine that you have collected all of the data, and that someone randomly deletes numbers here and there. There is no relationship between the characteristics of the person and the fact that they have missing data. This is "missing at random" (strictly speaking, the statistical literature calls this particular case "missing completely at random"), and such numbers can often be replaced quite usefully with the mean values or, even better, by using the values of the other variables to "predict" what the missing values might have been; see the Missing Values module in SPSS if you have it.

When values are not missing at random you need to think very carefully about WHY they might be missing, and your conclusion about this will guide the next stage of your analysis.

David Hitchin |
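The two ideas in this thread, that a shared pattern of skipped questions may itself be informative and that clustering might tolerate some gaps, can be sketched in Python as follows. This is only an illustration under stated assumptions: the data, the function names, and the choice of a mean squared difference over jointly answered items are all hypothetical, and nothing here corresponds to a specific SPSS procedure.

```python
from collections import defaultdict

# Hypothetical toy data: respondent id -> answers on 4 items (1-10).
# None marks a "dn/na" answer.
responses = {
    "r1": [9, 8, None, 9],
    "r2": [8, 9, None, 8],
    "r3": [2, 1, 2, 2],
    "r4": [1, 2, 2, None],
}


def missing_pattern(row):
    """Tuple of flags, True where the item was left unanswered."""
    return tuple(v is None for v in row)


def group_by_pattern(data):
    """Collect respondents that skipped exactly the same items."""
    groups = defaultdict(list)
    for rid, row in data.items():
        groups[missing_pattern(row)].append(rid)
    return dict(groups)


def pairwise_distance(a, b):
    """Mean squared difference over the items both respondents answered.

    Dividing by the number of shared items keeps distances comparable
    between pairs with different amounts of overlap. Returns None when
    the two respondents have no answered item in common.
    """
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return None
    return sum((x - y) ** 2 for x, y in shared) / len(shared)


patterns = group_by_pattern(responses)
for pattern, ids in patterns.items():
    print(pattern, ids)

print(pairwise_distance(responses["r1"], responses["r2"]))
print(pairwise_distance(responses["r1"], responses["r3"]))
```

Inspecting the pattern groups is a first step towards asking why values are missing. A matrix of such pairwise distances could in principle be handed to a hierarchical clustering routine that accepts precomputed dissimilarities, but ignoring the unanswered items in this way is only defensible if the gaps are plausibly unrelated to the attitudes being measured.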
