Hello,
60 participants took a survey that had seven scales on it. About five respondents returned surveys with one or two incomplete scales (they missed one question on a scale, for example). A few questions:

The missing data appears random, but is there a test for that, and what do I do?

I ran means and standard deviations for all seven scales, and some cases were deleted listwise for some scales. Was that okay, and what does that mean?

I ran a cluster analysis for three of the scales (60 cases clustered on three dimensions). Each participant had a mean score for each scale computed prior to the cluster analysis. A few participants missed one item on a scale, but I computed the scale mean anyway using fewer than the total number of questions. Is this okay/defensible? I really can't lose any cases with the sample so small to begin with. If this is okay, what "rule" makes it defensible? How would I explain this in a paper?

Thanks so much in advance,
JC
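A minimal Python sketch of the scoring approach JC describes, assuming each missing answer is stored as `None`: average the items that were answered, but only when at least `min_items` of them are present. The function names and the threshold value are hypothetical, chosen for illustration. The sketch also shows why this counts as a form of single imputation: averaging the available items gives exactly the same score as filling each gap with the respondent's own mean on the answered items.

```python
from typing import List, Optional, Sequence

def scale_mean(items: Sequence[Optional[float]], min_items: int) -> Optional[float]:
    """Mean of the answered items, or None if fewer than min_items were answered.

    min_items is a hypothetical threshold; the rule actually used should be
    chosen in advance and reported.
    """
    answered = [x for x in items if x is not None]
    if len(answered) < min_items:
        return None
    return sum(answered) / len(answered)

def person_mean_fill(items: Sequence[Optional[float]]) -> List[float]:
    """Replace each missing item with the respondent's mean of the answered items."""
    answered = [x for x in items if x is not None]
    if not answered:
        raise ValueError("no answered items to average")
    m = sum(answered) / len(answered)
    return [m if x is None else x for x in items]

# A 5-item scale where the respondent skipped item 3.
resp = [4, 5, None, 3, 4]
print(scale_mean(resp, min_items=4))   # 4.0

filled = person_mean_fill(resp)
print(sum(filled) / len(filled))       # 4.0, the same score
```

In SPSS syntax the `.n` suffix on statistical functions applies the same kind of minimum-valid-values rule; for example `COMPUTE scale1 = MEAN.4(q1 TO q5).` (hypothetical variable names) returns a mean only when at least four of the five items are valid.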
JC, non-response is a serious limitation because of the loss of validity and statistical power it represents. How serious it is depends on the situation: sometimes an individual fails to answer certain variables, and at other times the individual does not answer any variable. Most statistical methods for dealing with missing data require studying the data matrix beforehand, looking at the mechanism that generated the missing data as well as the proportion of the total data that is missing. That said, replacement by the mean could be a fair method. Hope this helps.

Regards,
Marcelo Della Mora
Citibank Marketing Credit Cards
+5411 4708-4095

-----Original Message-----
From: SPSSX(r) Discussion, on behalf of JC
Sent: Wednesday, March 14, 2007 1:29 AM
Subject: What to Do About Missing Data
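A minimal Python sketch of the two steps in Marcelo's reply, using a small hypothetical data matrix with `None` for skipped items. It first tabulates how much is missing per variable, per case and overall, then applies mean replacement, read here as filling a variable's gaps with that variable's observed mean (one possible reading of "replacement by the mean"). The output shows a known side effect: the mean is unchanged, but the standard deviation shrinks, because the filled-in values add cases without adding any spread.

```python
import statistics
from typing import List, Optional, Tuple

Row = List[Optional[float]]

def missing_summary(data: List[Row]) -> Tuple[List[float], List[float], float]:
    """Proportion missing per variable (column), per case (row), and overall."""
    n_rows, n_cols = len(data), len(data[0])
    per_var = [sum(row[j] is None for row in data) / n_rows for j in range(n_cols)]
    per_case = [sum(x is None for x in row) / n_cols for row in data]
    overall = sum(x is None for row in data for x in row) / (n_rows * n_cols)
    return per_var, per_case, overall

def mean_substitute(column: List[Optional[float]]) -> List[float]:
    """Replace missing values in one variable with that variable's observed mean."""
    observed = [x for x in column if x is not None]
    m = statistics.mean(observed)
    return [m if x is None else x for x in column]

# Hypothetical data: 4 respondents x 3 items; None marks a skipped item.
data = [
    [4, 5, 3],
    [2, None, 4],
    [5, 4, None],
    [3, 3, 2],
]
per_var, per_case, overall = missing_summary(data)
print(per_var)            # [0.0, 0.25, 0.25]
print(round(overall, 3))  # 0.167

col = [row[1] for row in data]
observed = [x for x in col if x is not None]
filled = mean_substitute(col)
# Same mean before and after filling, but a smaller standard deviation.
print(round(statistics.stdev(observed), 3), round(statistics.stdev(filled), 3))  # 1.0 0.816
```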
What Marcelo refers to here (the generation mechanism) is not so much a question of why the data are missing as of what effect the missingness has. Not all missing data are created equal. Data missing completely at random (MCAR) are not normally an issue for inference (assuming you aren't taking a significant hit in power). Unfortunately, data are rarely missing completely at random. Data missing at random (MAR) can be an issue and should be examined closely to determine the potential for bias.

SPSS offers several imputation methods, but keep in mind that not all missing-data treatments are equal. Although some sort of imputation, such as regression imputation, can be better than simply throwing out cases, regression imputation does not solve the problem of preserving the random error in the variables being imputed (plain regression imputation fills each gap with a predicted value, which understates the variability of the imputed variable). Only more advanced treatments, such as data augmentation, preserve that error. I believe AMOS does something like this, but the method I have used with success in the past is based on the work of Schafer and Graham.

For background, see Rubin, D. B. (1987). Multiple imputation for nonresponse in surveys. New York: Wiley.

For more on data augmentation, see Schafer, J. L., & Graham, J. W. (2002). Missing data: Our view of the state of the art. Psychological Methods, 7, 147-177.

Mark A. Davenport, Ph.D.
Senior Research Analyst
Office of Institutional Research
The University of North Carolina at Greensboro
336.256.0395

'An approximate answer to the right question is worth a good deal more than an exact answer to an approximate question.' (a paraphrase of J. W. Tukey, 1962)

In reply to: "Della Mora, Marcelo [GCG-LATAM]", 03/14/2007 09:42 AM, Subject: Re: What to Do About Missing Data
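Mark's two contrasts (plain regression imputation versus imputation that keeps random error, and a single filled-in data set versus multiple imputations) can be illustrated with a small Python sketch on simulated data. Everything here is an assumption for illustration: the simulated variables, the MAR mechanism (y is more often missing when x is high), and the helper names. The multiple-imputation step uses a simple bootstrap-plus-random-residual scheme, which is a stand-in for, not an implementation of, the data augmentation approach Schafer and Graham describe; the estimates are then combined with Rubin's (1987) rules, pooled estimate = mean of the estimates and total variance T = W + (1 + 1/m) B.

```python
import math
import random
import statistics

def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b, residual SD)."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return a, b, math.sqrt(ss_res / (len(xs) - 2))

def impute(xs, ys, rng=None):
    """Fill missing ys (None) from a regression of y on x fitted to complete cases.

    rng is None: plain (deterministic) regression imputation, predictions only.
    rng given:   refit on a bootstrap resample of the complete cases and add a
                 normal residual, so the imputations vary from draw to draw.
    """
    pairs = [(x, y) for x, y in zip(xs, ys) if y is not None]
    if rng is not None:
        pairs = [rng.choice(pairs) for _ in pairs]
    a, b, sd = fit_line([x for x, _ in pairs], [y for _, y in pairs])
    noise = (lambda: rng.gauss(0.0, sd)) if rng is not None else (lambda: 0.0)
    return [a + b * x + noise() if y is None else y for x, y in zip(xs, ys)]

def rubin_pool(estimates, variances):
    """Rubin's rules: pooled estimate and total variance T = W + (1 + 1/m) * B."""
    m = len(estimates)
    q_bar = statistics.mean(estimates)
    w = statistics.mean(variances)       # average within-imputation variance
    b = statistics.variance(estimates)   # between-imputation variance
    return q_bar, w + (1 + 1 / m) * b

# Hypothetical simulated data: 60 cases, y depends on x, and y is missing
# at random given x (more often missing when x is high).
rng = random.Random(2007)
xs = [rng.gauss(0.0, 1.0) for _ in range(60)]
ys_full = [2.0 + 0.8 * x + rng.gauss(0.0, 1.0) for x in xs]
ys = [None if x > 0.5 and rng.random() < 0.6 else y for x, y in zip(xs, ys_full)]

complete_case_mean = statistics.mean([y for y in ys if y is not None])
det = impute(xs, ys)                     # one deterministic fill

m = 20
ests, within = [], []
for _ in range(m):
    completed = impute(xs, ys, rng)      # one stochastic imputation
    ests.append(statistics.mean(completed))
    within.append(statistics.variance(completed) / len(completed))  # SE^2 of the mean
pooled_mean, total_var = rubin_pool(ests, within)

print("SD of y, deterministic fill:", round(statistics.stdev(det), 3))
print("SD of y, one stochastic fill:", round(statistics.stdev(completed), 3))
print("complete-case mean:", round(complete_case_mean, 3))
print("pooled mean, SE:", round(pooled_mean, 3), round(math.sqrt(total_var), 3))
```

The deterministic fill will typically show a smaller standard deviation than a stochastic one, which is the loss of random error Mark describes. The pooled variance adds the between-imputation term to the average within-imputation variance, so the standard error reflects uncertainty about the missing values that a single filled-in data set would ignore.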
