Analyzing the percentage of missing data per case

Amanda Brouwer

I am interested in analyzing the percentage of missing data per case (i.e., how many questions are missing per individual). What is the best way to do this in SPSS?

 

Thanks.

--

Amanda Brouwer

Patient Advocacy & Treatment Lab
Pearse Hall, B53
University of Wisconsin - Milwaukee
P.O. Box 413
Milwaukee, WI 53201
[hidden email]


Re: Analyzing the percentage of missing data per case

Maguin, Eugene
Amanda,

Your question is agonizingly unspecific. Let's break it apart. You need to
calculate the within-respondent percentage of questions with missing data.
Do you know how to do that? Do your questions have a mix of numeric and
string values? Do you know how to handle that problem when computing
percentage missing? Given the within-respondent missing data percentage, do
you have questions about how to analyze it? Please indicate where you need
help.

Gene Maguin


=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Re: Analyzing the percentage of missing data per case

statisticsdoc
Amanda,

There are a number of ways of taking on this question.

If you simply want to know how many times a person had missing data on a set of five items, labelled Q1 to Q5, then you can use a statement like this.

count tmiss = q1 q2 q3 q4 q5 (missing) .

The variable tmiss will tell you the number of times each subject had missing data on the five questions.
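To turn that count into the percentage asked about in the original question, something like the following might work. It is only a sketch: it assumes the five items are numeric, named q1 to q5, and adjacent in the file, and pctmiss is a made-up variable name.

```spss
* Sketch only: assumes numeric items q1 to q5, adjacent in the file.
* pctmiss is an illustrative name, and 5 is the number of items.
COUNT tmiss = q1 TO q5 (MISSING).
COMPUTE pctmiss = 100 * tmiss / 5.
FORMATS pctmiss (F5.1).
FREQUENCIES VARIABLES = pctmiss.
```

The FREQUENCIES table then shows how many respondents fall at each percentage of missing items.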

On the other hand, if you are interested in looking for patterns in missing data, imputing missing data, etc., then you would want to look into the Missing Values Analysis (MVA) procedures in SPSS.

Best,

Steve Brand

www.StatisticsDoc.com


Re: Analyzing the percentage of missing data per case

Maguin, Eugene
Amanda,

I put this back up on the list so that others can see where you want help,
because I think there may well be more generic methods (i.e., ones using
Python) of solving your problem than I can discuss. My advice on string
variables is not very simple; perhaps another person can give you a simpler
method for them.

The main difficulty is that you have string and numeric variables
intermixed. (This is where Python may be a lot better.) My only solution is
to build two lists: a numeric variables list and a string variables list.
The numeric variables are easy. Just this:

Count totmiss=v1 to v5 v6 v8 v15 to v119(missing).
*  Note the keywords: MISSING counts user-missing and system-missing values, SYSMIS counts only system-missing.
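A small illustration of that keyword distinction, as a sketch: the variables v1 to v3 and the user-missing code 9 are invented for the example and would need to be replaced with whatever applies to the real data.

```spss
* Sketch only: v1 to v3 and the code 9 are made-up examples.
MISSING VALUES v1 TO v3 (9).
* Counts user-missing (9) and system-missing values.
COUNT nmiss_all = v1 TO v3 (MISSING).
* Counts system-missing values only, so a 9 is not counted.
COUNT nmiss_sys = v1 TO v3 (SYSMIS).
EXECUTE.
```

Comparing nmiss_all with nmiss_sys for a few cases is a quick way to check which kind of missing data is actually present.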

I have less experience with strings because I seldom work with them in the
context of missing data. One question is whether you have used '    ', that
is, blanks, to indicate missing data for an A4 variable, or whether you have
declared a particular value to designate missing data, e.g., 'MD', 'NR', or
'DK'. I don't know the specifics of your data, and that makes it hard to be
specific.

One method is to recode your string variables into numerics so that 0 equals
data not missing and 1 equals data missing, and then do a count operation on
these new variables. If you don't want to do that, read on!
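A rough sketch of that recode approach follows. It assumes two A4 string variables, v4 and v9, with blanks and 'NR' as the missing codes; the indicator names v4miss and v9miss are invented for illustration.

```spss
* Sketch only: v4 and v9 are assumed to be A4 strings.
* Blanks and 'NR' are assumed to be the missing codes.
RECODE v4 v9 ('    ', 'NR  ' = 1) (ELSE = 0) INTO v4miss v9miss.
COUNT totmiss2 = v4miss v9miss (1).
EXECUTE.
```

The quoted values are padded to the full width of the variable to avoid depending on how a given SPSS version pads shorter strings.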

Suppose all your string variables are the same width, e.g., A4, and you use
both blanks and a code, e.g., 'NR', to designate missing, and both blanks and
'NR' are declared missing via the MISSING VALUES command. Then, this should
work.

Count totmiss2=v4 v9 to v14(missing).
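If the (missing) keyword in COUNT turns out not to accept string variables in a given version, a loop using the MISSING function is one alternative worth trying. This is a sketch under the same assumptions (A4 strings, blanks and 'NR' as missing codes), and it should be tested on a few cases first.

```spss
* Sketch only: assumes v4 and v9 to v14 are A4 strings.
MISSING VALUES v4 v9 TO v14 ('NR  ', '    ').
COMPUTE totmiss2 = 0.
DO REPEAT x = v4 v9 TO v14.
+  IF (MISSING(x)) totmiss2 = totmiss2 + 1.
END REPEAT.
EXECUTE.
```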

If 'NR' and blanks are not declared missing, you could declare them missing
and use the above statement or do this. Note page 326 in the syntax
reference manual.

Count totmiss2=v4 v9 to v14('NR  ','    ').

But suppose you had string variables with a range of widths. If you have
declared values to be missing for each variable, I think this should work,
but I've never used it.

Count totmiss2=v120 v125 v144 v169(missing).

If you don't have missing values declared, it may be easier to do so, but you
could also do this.

Count totmiss2=v4('NR','  ') v89('NR ') v235('NR      ').

Where V4 is A2, V89 is A3 and V235 is A8. The main thing I am unsure of is
how SPSS handles mismatches between the length of the test string, e.g.,
'NR ', and the target variable.

Lastly, if you had lots of string variables and they were all different
lengths, I think I'd try this (never used it but it seems like it should
work). Note that I'm testing for a blank string or the presence of 'NR'. By
the way, please note that I think the char.length function returns a value
of 0 for a blank string but it may return a value of 1. Not sure about this.
Test first.

Compute totmiss1=0.
Do repeat x=v278 to v305.
+  if (char.length(x) eq 0 or substr(x,1,2) eq 'NR') totmiss1=totmiss1+1.
End repeat.
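Once the numeric and string counts exist, the per-case percentage could be computed along these lines. This is a sketch: totmiss and totmiss1 are the counts from the statements above, pctmiss is an invented name, and 133 is only a placeholder for the real number of questions.

```spss
* Sketch only: totmiss and totmiss1 are the counts built earlier.
* 133 is a placeholder for the real number of questions.
COMPUTE pctmiss = 100 * (totmiss + totmiss1) / 133.
FORMATS pctmiss (F5.1).
DESCRIPTIVES VARIABLES = pctmiss.
```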





>>Gene -

Yes, I need to calculate the percentage of questions with missing data.
No, I do not know how to do this.

Yes, I do have a mix of string and numeric values.
No, I do not know how to address this problem when computing the
percentages.

