SPSSX Discussion

What to Do About Missing Data

JC-24

What to Do About Missing Data

Hello,

Sixty participants took a survey that had seven scales on it. About five
respondents returned surveys with one or two incomplete scales (they missed
one question on a scale, for example). A few questions:

The missing data appear to be random. Is there a test for that, and what do
I do?
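One simple diagnostic is to split respondents by whether they skipped a given item and compare the two groups on another variable they all answered; a large difference suggests the data are not missing completely at random. The Python sketch below assumes skipped items are stored as None, and the function names and data are hypothetical. It is not Little's MCAR test (SPSS's Missing Value Analysis procedure can report that one), it returns Welch's t and degrees of freedom without a p-value, and with only about five incomplete cases it has very little power.

```python
import math
import statistics

def welch_t(group_a, group_b):
    """Welch's (separate-variance) t statistic and its degrees of freedom.
    Each group needs at least two values."""
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    se2_a = statistics.variance(group_a) / len(group_a)
    se2_b = statistics.variance(group_b) / len(group_b)
    t = (mean_a - mean_b) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(group_a) - 1) + se2_b ** 2 / (len(group_b) - 1)
    )
    return t, df

def missingness_check(item, other_variable):
    """Compare other_variable between cases that skipped `item` (None)
    and cases that answered it; cases missing other_variable are ignored."""
    skipped = [o for i, o in zip(item, other_variable) if i is None and o is not None]
    answered = [o for i, o in zip(item, other_variable) if i is not None and o is not None]
    return welch_t(skipped, answered)

# Hypothetical data: item 3 of scale A (None = skipped) and each respondent's scale B mean
item_a3 = [4, None, 5, 3, None, 4, 5, None, 2, 4]
scale_b = [3.2, 2.1, 3.8, 3.0, 2.4, 3.5, 4.0, 2.0, 2.9, 3.6]
t, df = missingness_check(item_a3, scale_b)
print(f"t = {t:.2f}, df = {df:.1f}")
```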

I ran means and standard deviations for all seven scales, and some cases
were deleted listwise for some scales. Was that okay, and what does that
mean?
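Listwise deletion means a case is dropped from an analysis if it is missing a value on any variable in that analysis, so a respondent who skipped part of one scale can drop out of the statistics for the other scales in the same run too. The Python sketch below contrasts this with available-case (pairwise) handling, which uses every case observed on each variable separately; the layout (one dict per respondent, None for missing) and the variable names are hypothetical.

```python
def listwise(cases, variables):
    """Keep only the cases that have a value on every listed variable."""
    return [c for c in cases if all(c.get(v) is not None for v in variables)]

def means(cases, variables):
    """Per-variable means over the cases that are observed on that variable."""
    result = {}
    for v in variables:
        values = [c[v] for c in cases if c.get(v) is not None]
        result[v] = sum(values) / len(values)
    return result

cases = [
    {"scale_a": 3.0, "scale_b": 4.0},
    {"scale_a": 2.0, "scale_b": None},  # this respondent has no scale B score
    {"scale_a": 4.0, "scale_b": 2.0},
]
scales = ["scale_a", "scale_b"]
complete = listwise(cases, scales)
print(len(complete), means(complete, scales))  # listwise: complete cases only
print(means(cases, scales))                    # available-case: scale A keeps all 3 cases
```

Here the second respondent's scale A score is discarded under listwise deletion even though it was observed, which shifts the scale A mean.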

I ran a cluster analysis on three of the scales (60 cases clustered on
three dimensions). Each participant's mean score on each scale was computed
before the cluster analysis. A few participants missed one item on a scale,
but I computed the scale mean anyway from the items they did answer. Is this
okay and defensible? I really can't lose any cases when the sample is so
small to begin with. If this is okay, what "rule" makes it defensible, and
how would I explain it in a paper?
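Averaging whichever items a respondent answered (sometimes called prorating or a person-mean score) is a widely used approach when only a small share of items is missing. What makes it easier to defend is fixing a minimum-items rule in advance and reporting it. SPSS expresses such a rule with the MEAN.n function (for example, MEAN.5 returns the mean only when at least five of its arguments are valid). A minimal Python sketch, assuming a hypothetical six-item scale and a hypothetical threshold of five answered items:

```python
def scale_mean(item_scores, min_answered):
    """Mean of the answered items, or None when fewer than min_answered were answered.

    item_scores: one respondent's item responses, with None for a skipped item.
    min_answered: hypothetical minimum-items rule, e.g. 5 of 6 items.
    """
    answered = [x for x in item_scores if x is not None]
    if len(answered) < min_answered:
        return None  # too many items skipped: leave the scale score missing
    return sum(answered) / len(answered)

print(scale_mean([4, 5, None, 3, 4, 4], min_answered=5))     # 4.0
print(scale_mean([4, None, None, 3, 4, 4], min_answered=5))  # None
```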

Thanks so much in advance,

JC

Della Mora, Marcelo

Re: What to Do About Missing Data

JC, nonresponse is a serious limitation because of the loss of validity and statistical power it entails. How serious depends on the situation: sometimes a respondent skips certain items, and other times a respondent answers nothing at all. Most statistical methods for handling missing data require examining the data matrix beforehand, looking at the mechanism that generated the missing values as well as the proportion of the data they represent. That said, replacement by the mean could be a fair method.
Hope this helps.
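Replacement by the mean fills each missing value with the mean of the observed values on that variable (SPSS's RMV command offers this as the SMEAN function). A minimal Python sketch, assuming None marks a missing value and using hypothetical data; note that the filled-in variable keeps the observed mean but has a smaller variance, because every imputed value sits exactly at the center.

```python
import statistics

def mean_substitute(values):
    """Replace each None with the mean of the observed (non-None) values."""
    observed = [v for v in values if v is not None]
    fill = statistics.mean(observed)
    return [fill if v is None else v for v in values]

item = [2.0, None, 4.0, 6.0]
filled = mean_substitute(item)
print(filled)  # [2.0, 4.0, 4.0, 6.0]
# Compare the sample variance before and after filling in
print(statistics.variance([v for v in item if v is not None]), statistics.variance(filled))
```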

Regards,

Marcelo Della Mora
Citibank Marketing Credit Cards
+5411 4708-4095
[hidden email]

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]]On Behalf Of
JC
Sent: Wednesday, March 14, 2007 1:29 AM
To: [hidden email]
Subject: What to Do About Missing Data

Mark A Davenport MADAVENP

Re: What to Do About Missing Data

What Marcelo refers to here (the generation mechanism) is not so much a
question of why the data are missing as of what effect the missingness has.
Not all missing data are created equal. Data missing completely at random
(MCAR) are not normally an issue for inference (assuming you aren't taking
a significant hit in power). Unfortunately, data are rarely missing
completely at random. Data missing at random (MAR) can be an issue and
should be examined closely to determine the potential for bias. SPSS offers
several imputation methods, but keep in mind that not all missing-data
treatments are equal. Although an imputation method such as regression
imputation can be better than simply throwing out cases, regression
imputation does not preserve the random error in the variables being
imputed. Only more advanced treatments such as data augmentation can do
this. I believe AMOS does something like this, but the method I have used
with success in the past is based on the work of Schafer and Graham.
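The point about random error can be seen in a small example. Deterministic regression imputation places every imputed value exactly on the fitted line, so the imputed variable looks less noisy than the real data; stochastic regression imputation adds a random residual drawn with the model's residual standard deviation. The Python sketch below (a simple regression of one hypothetical variable on another, with None for missing and x assumed complete) shows both. It is still not data augmentation or multiple imputation, which also reflect uncertainty in the regression coefficients and repeat the imputation several times, combining the results with Rubin's rules.

```python
import math
import random
import statistics

def fit_line(xs, ys):
    """Ordinary least-squares intercept and slope for y regressed on x."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope

def regression_impute(x, y, stochastic=False, seed=0):
    """Fill None entries of y from a regression of y on x (x must be complete).

    stochastic=False: use the predicted value (deterministic regression imputation).
    stochastic=True: add a normal draw with the residual SD (stochastic regression imputation).
    """
    pairs = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    xs, ys = zip(*pairs)
    intercept, slope = fit_line(xs, ys)
    residuals = [yi - (intercept + slope * xi) for xi, yi in pairs]
    # Residual SD with n - 2 degrees of freedom (two estimated coefficients)
    resid_sd = math.sqrt(sum(r * r for r in residuals) / (len(pairs) - 2))
    rng = random.Random(seed)
    filled = []
    for xi, yi in zip(x, y):
        if yi is None:
            value = intercept + slope * xi
            if stochastic:
                value += rng.gauss(0.0, resid_sd)
            filled.append(value)
        else:
            filled.append(yi)
    return filled

# Hypothetical scale scores: y is missing for the third respondent
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, None, 8.2, 9.8]
print(regression_impute(x, y))                   # imputed value lies on the fitted line
print(regression_impute(x, y, stochastic=True))  # imputed value includes residual noise
```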

For background see

Rubin, D. B. (1987). Multiple imputation for nonresponse in surveys. New
York: Wiley.

For more on augmentation, see

Schafer, J. L., & Graham, J. W. (2002). Missing data: Our view of the state
of the art. Psychological Methods, 7, 147-177.

***************************************************************************************************************************************************************
Mark A. Davenport Ph.D.
Senior Research Analyst
Office of Institutional Research
The University of North Carolina at Greensboro
336.256.0395
[hidden email]

'An approximate answer to the right question is worth a good deal more
than an exact answer to an approximate question.' --a paraphrase of J. W.
Tukey (1962)

"Della Mora, Marcelo [GCG-LATAM]" <[hidden email]>
Sent by: "SPSSX(r) Discussion" <[hidden email]>
03/14/2007 09:42 AM
Please respond to: "Della Mora, Marcelo [GCG-LATAM]" <[hidden email]>
To: [hidden email]
cc:
Subject: Re: What to Do About Missing Data
