multiple imputation

SiriusxTR
Question 1: If I use multiple imputation for missing data, can I use every kind of test in SPSS, and will the results be acceptable? (Descriptives, frequencies, and inferential analyses too?)

Question 2: What is the acceptable threshold of missing data, as a percentage, up to which I can apply multiple imputation?

Question 3: How should I handle the 5 data sets that I get by default from multiple imputation? How do I discuss them, especially if I get 5 graphs for everything?

Re: multiple imputation

Rich Ulrich
First: These are questions about a particular application, not SPSS questions. 

Second: If you can't argue the principles yourself, the answers may depend
more on "What is acceptable in your field?"  than on "What makes statistical
sense?"  That means you should not get a specific answer unless you give
detail about your data; and preferably you need to ask people who know your
area. 

Third: generalities.  Why and how are data missing, and how
good is the imputation?  What is the application?  With a large
amount of imputation, you *do*  lose (implicitly) degrees of freedom.

3a) If data collection was supposed to be easy, "Missing" may be
an indicator of something bad -- regardless of how you try to
accommodate it, critics may remain unappeased.

3b) Are your values Missing at Random?  Dependencies make
imputation more fragile, more questionable.

3c) Are imputations "pretty good"?  Filling a single spot in a time series
is fairly harmless.  You don't want analyses to hinge on (odd)
imputed values.

3d) If you are doing a prediction, the error introduced by imputation
might be gauged against the size of the residual, in this sense:
If you have R-squared of 0.80 (say), the residual variation is 0.20,
which is only one quarter of the residual (0.80) when the R-squared
is 0.20 (large, in some applications).  The imputation error looms larger
when it is working against the smaller residual.  So, a large R-squared
implies that less imputation is acceptable.
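
To make the arithmetic in 3d concrete, here is a minimal Python sketch. It assumes variances are expressed as fractions of the total outcome variance, and the imputation-error variance of 0.05 is an invented number used only for illustration; the function names are hypothetical as well.

```python
# Illustrative only: 0.05 is a made-up imputation-error variance,
# expressed as a fraction of the total outcome variance.

def residual_variance(r_squared):
    """Fraction of outcome variance left unexplained by the model."""
    return 1.0 - r_squared

def relative_imputation_error(r_squared, imputation_error_var):
    """Imputation-error variance relative to the residual variance."""
    return imputation_error_var / residual_variance(r_squared)

for r2 in (0.80, 0.20):
    resid = residual_variance(r2)
    ratio = relative_imputation_error(r2, 0.05)
    print(f"R-squared={r2:.2f}  residual={resid:.2f}  error/residual={ratio:.4f}")
```

Under these assumptions, the same imputation error amounts to a quarter of the residual when R-squared is 0.80, but only about 6% of it when R-squared is 0.20.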

Hope this helps.

--
Rich Ulrich


> Date: Tue, 14 Feb 2012 02:41:02 -0800
> From: [hidden email]
> Subject: multiple imputation
> To: [hidden email]
>
> Question 1: if I use multiple imputation for missing data, can I use every
> kind of test in SPSS and they will be acceptable? (descriptives, frequency
> and inferential analysis too?)
>
> Question 2: What is the acceptable threshold for missing data for which I
> can apply multiple imputation? (in percentage given)
>
> Question 3: How should I handle the 5 sets of data which I get by default
> from multiple imputation? How do I discuss them?
>



Re: multiple imputation

Ryan
What do you plan on doing with these data? I ask because there are certain statistical testing procedures in SPSS (e.g., MIXED, GENLIN, GENLINMIXED)  that are capable of handling missing data without throwing out all information from a given subject. A typical example where one might employ one of the aforementioned procedures would be a repeated measures design. It is not uncommon to have a random spattering of missing data at each time point for some subjects. Concretely, if Subject 1 is missing response data at the second time point, these procedures would still use the available response data from the other time points from Subject 1 in parameter estimation. I am assuming you're dealing with data which are either missing at random (MAR) or missing completely at random (MCAR). 
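
As a rough illustration of the difference between discarding incomplete subjects and keeping every observed value, here is a small Python sketch. It only does the bookkeeping; it does not fit a mixed model, and it is not how MIXED, GENLIN, or GENLINMIXED work internally. The data values and function names are invented for the example.

```python
# Hypothetical wide-format data: one row per subject, one value per time point.
# None marks a missing measurement. All values are invented for illustration.
wide = {
    "S1": [5.1, None, 4.8, 4.5],
    "S2": [6.0, 5.7, 5.5, 5.2],
    "S3": [5.4, 5.0, None, None],
    "S4": [4.9, 4.7, 4.6, 4.1],
}

def complete_cases(data):
    """Subjects kept under listwise deletion (no missing time points)."""
    return [s for s, values in data.items() if all(v is not None for v in values)]

def to_long(data):
    """Reshape to long format: one (subject, time, value) row per observed value."""
    return [
        (subject, time, value)
        for subject, values in data.items()
        for time, value in enumerate(values, start=1)
        if value is not None
    ]

kept = complete_cases(wide)
long_rows = to_long(wide)

used_listwise = sum(len(wide[s]) for s in kept)  # values surviving listwise deletion
total_observed = len(long_rows)                  # values available in long format

print("Subjects kept by listwise deletion:", kept)
print("Observed values used after listwise deletion:", used_listwise)
print("Observed values available in long format:", total_observed)
```

In this toy data set, listwise deletion keeps only two of the four subjects, whereas the long-format layout retains every observed measurement, including those from the subjects with gaps.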

Ryan

On Tue, Feb 14, 2012 at 5:58 PM, Rich Ulrich <[hidden email]> wrote:

Re: multiple imputation

SiriusxTR
In reply to this post by Rich Ulrich
Dear Rich,

Thank you for your answer.

1) I was talking about multiple imputation in SPSS v20. It is either an add-on or already built into v20. And the data are MCAR: I ran Little's test, because I read that if the data are not MCAR, there is no way to manage missing data other than exclusion (pairwise, in each test).


2) So, I am in the medical field; I am a resident doctor. Unfortunately, I know only one person who knows something about statistics, and he is not a top expert either.
Usually, missing data in our case come from missing measurements of scale variables (missing laboratory tests) along a timeline. For example: we have data at 1 month postoperatively, 3 months present, 6 months missing, 9 months present, 12 months missing, 18 months missing. In our case, the 18-month data have a missing percentage of 18%. That is why I would like to know what the threshold would be: the point at which I can say that the variable is useless, that nothing can be done with it, and that there is no point including it in an analysis. The missing data usually arise from a lack of patient commitment: patients do not come to their follow-up visits.

THE BIGGEST PROBLEM: I was in a mathematics-physics class in high school and did well in physics, but statistics is not really my strong suit. I understand the main concepts (null hypothesis, normal distribution, statistical significance, types of variables, correlation, regression, strength of correlation), but I have problems choosing the right tests for inferential analysis (although I have tables explaining which test to use in which case). I have read a lot, but there are some things I don't understand (Somers' d, gamma, ...), because I have to see a test applied and discussed in an example; otherwise I have no clue how to interpret the results, and I can't translate them into plain English. I am fairly pragmatic: I have to see it in real-life examples. I have no time or resources to take courses, so I am trying to learn on my own and from the internet. Things that I understand and have already used (although I am sure there are some small details about these that I do not know yet): one-sample t-test, paired-samples t-test, independent-samples t-test, Wilcoxon, Mann-Whitney, Kruskal-Wallis, McNemar, crosstabulation chi-square, Cramer's V, lambda, Pearson correlation, Spearman correlation.

Right now I am stuck on the association between an ordinal variable and a nominal outcome. I ran the association tests and they are OK (Cramer's V and lambda), but I cannot figure out the regression (binomial logistic). A goodness-of-fit test says that the model is not good. So what should I do then? I want to know: supposing a case belongs to one group of the 3, what are the odds of it belonging to one or the other group of the nominal variable?

I am sorry that I write so much. I suppose that, to those who have been in this field for many years, it may sometimes seem that I ask stupid questions, but I have no other means of figuring things out.

Re: multiple imputation

Bruce Weaver
Administrator
In reply to this post by Rich Ulrich
I agree that the 2nd and 3rd questions are not SPSS questions.  But the first is, and the answer can be found here:

http://publib.boulder.ibm.com/infocenter/spssstat/v20r0m0/index.jsp?topic=%2Fcom.ibm.spss.statistics.help%2Fmi_analysis.htm

Click on "Procedures That Support Pooling".
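
On Question 3 (what to do with the 5 data sets): results from the imputed data sets are usually combined into one pooled estimate rather than reported five times, and Rubin's rules are the standard way to do that. The Python sketch below shows those rules for a single parameter. It is an illustration, not SPSS's implementation; the function name and the five coefficients and standard errors are invented.

```python
from statistics import mean, variance

def pool_rubin(estimates, std_errors):
    """Pool one parameter across m imputed data sets using Rubin's rules."""
    m = len(estimates)
    q_bar = mean(estimates)                          # pooled point estimate
    within = mean([se ** 2 for se in std_errors])    # average within-imputation variance
    between = variance(estimates)                    # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between           # total variance of the pooled estimate
    return q_bar, total ** 0.5

# Made-up regression coefficients and standard errors from 5 imputed data sets.
estimates = [1.9, 2.1, 2.0, 2.2, 1.8]
std_errors = [0.50, 0.52, 0.49, 0.51, 0.48]

pooled_estimate, pooled_se = pool_rubin(estimates, std_errors)
print(f"pooled estimate = {pooled_estimate:.3f}, pooled SE = {pooled_se:.3f}")
```

The pooled standard error is larger than the average of the five individual standard errors, because it also carries the extra uncertainty from the estimates differing across imputations.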


*** QUESTION FOR JON P ***.  
I cannot find that same list of procedures in the (v19) Command Syntax Reference Manual.  Just wondering why it's not included.  If this is not in your bailiwick, perhaps you can ask a colleague.  Thanks.


Re questions 2 and 3 from the OP, here are some fairly readable resources I've found helpful.

Acock, A. C. (2005). Working with missing values. Journal of Marriage and Family, 67, 1012-1028.

Donders, A. R. T., van der Heijden, G. J. M. G., Stijnen, T., & Moons, K. G. M. (2006). Review: A gentle introduction to imputation of missing values. Journal of Clinical Epidemiology, 59, 1087-1091.

Multiple Imputation Online. http://www.multiple-imputation.com/ 

Schafer, J. L. (1999). Multiple imputation:  A primer. Statistical Methods in Medical Research, 8, 3-15.

Schafer, J. L., & Graham, J. W. (2002). Missing data: Our view of the state of the art. Psychological Methods, 7(2), 147-177.

HTH.

--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING: 
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).

Re: multiple imputation

John F Hall
In reply to this post by SiriusxTR
You may find something useful in Jim Ring's statistical notes [pdf: 54
pages, 667 kb] for my Survey Analysis Workshop.

See:
http://surveyresearch.weebly.com/uploads/2/9/9/8/2998485/statistical_notes_2012_draft.pdf

John Hall

Email:    [hidden email]
Website:        www.surveyresearch.weebly.com
Skype:    surveyresearcher1
Phone:    (+33) (0) 2.33.45.91.47



-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of SiriusxTR
Sent: 15 February 2012 10:59
To: [hidden email]
Subject: Re: multiple imputation

--
View this message in context:
http://spssx-discussion.1045642.n5.nabble.com/multiple-imputation-tp5482077p5485490.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
