I am interested in analyzing the percentage of missing data per case (i.e., how many questions are missing per individual). What is the best way to do this in SPSS?
Thanks.
Amanda Brouwer
Patient Advocacy & Treatment Lab
Amanda,
Your question is agonizingly unspecific, so let's break it apart. You need to calculate the within-respondent percentage of questions with missing data. Do you know how to do that? Do your questions have a mix of numeric and string values? If so, do you know how to handle that problem when computing the percentage missing? Once you have the within-respondent missing-data percentage, do you have questions about how to analyze it? Please indicate where you need help.
Gene Maguin

>> I am interested in analyzing the percentage of missing data per case (i.e., how many questions are missing per individual). What is the best way to do this in SPSS?
>> Thanks.
>> Amanda Brouwer
>> Patient Advocacy & Treatment Lab
>> Pearse Hall, B53
>> University of Wisconsin - Milwaukee
>> P.O. Box 413
>> Milwaukee, WI 53201
>> [hidden email]

=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L. For a list of commands to manage subscriptions, send the command INFO REFCARD.
In reply to this post by Amanda Brouwer
Amanda,
There are a number of ways to approach this question. If you simply want to know how many times a person had missing data on a set of five items, labelled Q1 to Q5, then you can use a statement like this:

count tmiss = q1 q2 q3 q4 q5 (missing) .

The variable tmiss will tell you the number of times each subject had missing data on the five questions. On the other hand, if you are interested in looking for patterns in missing data, imputing missing data, etc., then you would want to look into the Missing Value Analysis (MVA) procedure in SPSS.

Best,
Steve Brand
www.StatisticsDoc.com
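Steve's COUNT produces a count rather than the percentage Amanda asked about. As a minimal sketch of the same logic (plain Python, not SPSS syntax and not the SPSS Python plug-in), the code below uses hypothetical data with None standing in for a missing answer, counts the missing items per case, and divides by the number of items to get the within-case percentage.

```python
# Plain-Python sketch of COUNT tmiss = q1 q2 q3 q4 q5 (MISSING),
# followed by a within-case percentage.
# Hypothetical data: None stands in for a missing value.

ITEMS = ["q1", "q2", "q3", "q4", "q5"]


def count_missing(case, items):
    """Number of listed items whose value is None (or absent) for this case."""
    return sum(1 for name in items if case.get(name) is None)


def percent_missing(case, items):
    """Percentage of the listed items that are missing for this case."""
    return 100.0 * count_missing(case, items) / len(items)


cases = [
    {"id": 1, "q1": 3, "q2": None, "q3": 4, "q4": None, "q5": 2},
    {"id": 2, "q1": 1, "q2": 2, "q3": 5, "q4": 4, "q5": 3},
]

for case in cases:
    print(case["id"], count_missing(case, ITEMS), percent_missing(case, ITEMS))
```

In SPSS itself, the corresponding step after the COUNT would be a COMPUTE along the lines of `compute pctmiss = 100 * tmiss / 5.`, where pctmiss is a hypothetical variable name.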
In reply to this post by Amanda Brouwer
Amanda,
I put this back up on the list so that others can see where you want help, because I think there may well be more generic methods (i.e., ones using Python) of solving your problem than I can discuss. My advice on string variables is not very simple; perhaps another person can give you a simpler method for them.

The big problem is that you have string and numeric variables intermixed. (This is where Python may be a lot better.) My only solution is to build two lists: a numeric variables list and a string variables list.

The numeric variables are easy. Just this:

Count totmiss=v1 to v5 v6 v8 v15 to v119(missing).

Note the distinction between the MISSING and SYSMIS keywords: MISSING counts both user-missing and system-missing values, while SYSMIS counts only system-missing values.

I have less experience with strings because I seldom work with them in the context of missing data. One question is whether you have used ' ' (that is, blanks) to indicate missing data for an A4 variable, or whether you have declared a particular value, e.g., 'MD', 'NR', or 'DK', to designate missing data. I don't know the specifics of your data, and that makes it hard to be specific.

One method is to recode your string variables into numerics so that 0 means data not missing and 1 means data missing, and then do a count operation on these new variables. If you don't want to do that, read on!

Suppose all your string variables are the same width, e.g., A4, and you use both blanks and a code, e.g., 'NR', to designate missing, and both blanks and 'NR' are declared missing via the MISSING VALUES command. Then this should work:

Count totmiss2=v4 v9 to v14(missing).

If 'NR' and blanks are not declared missing, you could declare them missing and use the statement above, or do this (see page 326 in the syntax reference manual):

Count totmiss2=v4 v9 to v14('NR ',' ').

But suppose you had string variables with a range of widths. If you have declared missing values for each variable, I think this should work, but I've never used it:

Count totmiss2=v120 v125 v144 v169(missing).
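To make the MISSING versus SYSMIS distinction and the 0/1 recoding idea concrete, here is a plain-Python sketch (not SPSS syntax, and not the SPSS Python plug-in). Everything in it is hypothetical: the variable names, the user-missing code sets standing in for MISSING VALUES declarations, None standing in for a system-missing value, and 'NR' as the string code for no response.

```python
# Plain-Python sketch of two ideas from the message above.
# Hypothetical: None stands in for system-missing; USER_MISSING plays the
# role of codes declared with MISSING VALUES; "NR" is an assumed string code.

USER_MISSING = {"v1": {9}, "v2": {9}, "v3": {98, 99}}
STRING_MISSING_CODES = {"NR"}


def is_sysmis(value):
    """System-missing: no value at all."""
    return value is None


def is_missing(name, value):
    """System-missing, or equal to a declared user-missing code."""
    return is_sysmis(value) or value in USER_MISSING.get(name, set())


def count(case, names, keyword):
    """Emulate COUNT ... (MISSING) or COUNT ... (SYSMIS) for one case."""
    if keyword == "missing":
        return sum(1 for n in names if is_missing(n, case[n]))
    if keyword == "sysmis":
        return sum(1 for n in names if is_sysmis(case[n]))
    raise ValueError("keyword must be 'missing' or 'sysmis'")


def string_indicator(value):
    """Recode a string answer to 1 (missing) or 0 (not missing)."""
    trimmed = value.rstrip()
    return 1 if trimmed == "" or trimmed in STRING_MISSING_CODES else 0


case = {"v1": 9, "v2": None, "v3": 4, "s1": "NR  ", "s2": "    ", "s3": "ok  "}
numeric = ["v1", "v2", "v3"]
strings = ["s1", "s2", "s3"]

print(count(case, numeric, "missing"))  # 2: user-missing 9 on v1, system-missing v2
print(count(case, numeric, "sysmis"))   # 1: only v2
print(sum(string_indicator(case[s]) for s in strings))  # 2: s1 is "NR", s2 is blank
```

Summing the 0/1 indicators is what a COUNT on the recoded numeric variables would do, so numeric and string items end up on the same footing.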
If you don't have missing values declared, it may be easier to declare them, but you could also do this:

Count totmiss2=v4('NR',' ')/v89('NR ')/v235('NR ').

where V4 is A2, V89 is A3, and V235 is A8. The main thing I am unsure of is how SPSS handles mismatches between the length of the test string, e.g., 'NR ', and the target variable.

Lastly, if you had lots of string variables and they were all different lengths, I think I'd try this (I've never used it, but it seems like it should work). Note that I'm testing for a blank string or the presence of 'NR'. By the way, I think the char.length function returns 0 for a blank string, but it may return 1. I'm not sure about this, so test first.

Compute totmiss1=0.
Do repeat x=v278 to v305.
+ if (char.length(x) eq 0 or substr(x,1,2) eq 'NR') totmiss1=totmiss1+1.
End repeat.

>> Gene - Yes, I need to calculate the percentage of questions with missing data. No, I do not know how to do this. Yes, I do have a mix of string and numeric values. No, I do not know how to address this problem when computing the percentages.
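Amanda's answers confirm she needs one percentage across mixed numeric and string items. As a sketch of how the pieces in this thread fit together, the plain-Python code below (not SPSS syntax; all names and data hypothetical) mirrors the DO REPEAT test, treating a string answer as missing if it is empty once trailing blanks are dropped or if its first two characters are 'NR', and then combines string and numeric counts into a single within-case percentage. It assumes char.length behaves like a length with trailing blanks removed, which is exactly the point Gene says to test first.

```python
# Plain-Python sketch of the DO REPEAT test plus a combined percentage.
# Hypothetical data: None marks a missing numeric answer; string answers
# are blank-padded the way fixed-width SPSS strings are.


def string_missing(value):
    """Blank once trailing blanks are removed, or starting with "NR"."""
    return len(value.rstrip()) == 0 or value[:2] == "NR"


def percent_missing(case, numeric_names, string_names):
    """Within-case percentage of all listed items that are missing."""
    num_miss = sum(1 for n in numeric_names if case[n] is None)
    str_miss = sum(1 for s in string_names if string_missing(case[s]))
    total = len(numeric_names) + len(string_names)
    return 100.0 * (num_miss + str_miss) / total


case = {"v1": 2, "v2": None, "v3": 5, "s1": "    ", "s2": "NR  ", "s3": "yes "}
print(percent_missing(case, ["v1", "v2", "v3"], ["s1", "s2", "s3"]))  # 50.0
```

Like substr(x,1,2) eq 'NR', the prefix test also flags answers such as 'NRX', so an exact match on the trimmed value may be safer if other valid codes begin with those letters.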
