Ahhh...Missing Data Nightmare

17 messages

Ahhh...Missing Data Nightmare

Salbod
Good Day, Dear Friends,

                Nightmare: I was handed a dataset of 219 cases who responded
to 75 items rated on a 5-point scale (1 = Not at all important to
5 = Extremely important).

The items are to be organized into 7 scales (ns=14, 11, 16, 13, 12, 9, &
10).  Listwise deletion would leave only 121 valid cases; 20.1% had 1
missing and 12.3% had 2 missing.



                Right now, I have arbitrarily selected cases for inclusion in
the analyses: those missing less than 5% of items (3 items or fewer); furthermore,
in creating the subscales I used COMPUTE scale = MEAN.n-1().
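For anyone unfamiliar with the dot-n convention: SPSS's MEAN.k(v1 TO vK) returns the mean of the non-missing arguments, but only when at least k of them are valid. A rough Python equivalent of that rule (the function name and None-for-sysmis convention are illustrative):

```python
# Illustrative Python equivalent of SPSS's MEAN.k(...) rule: average the
# non-missing items, but return missing unless at least `min_valid`
# items are present.  None stands in for SPSS system-missing.
def scale_score(items, min_valid):
    valid = [v for v in items if v is not None]
    if len(valid) < min_valid:
        return None          # too few valid items: scale score is missing
    return sum(valid) / len(valid)
```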



                I just picked up Paul D. Allison's Sage University Paper
#136 on Missing Data. I should have read it a long time ago.



                I am open to suggestions and/or references.



                TIA,



                Stephen Salbod, Pace University, NYC

Re: Ahhh...Missing Data Nightmare

Hector Maletta
         Stephen,
         Three missing items out of 75 is a small number, but you are
dealing with several scales, each with only a few items. The three missing
items you are prepared to allow may be distributed across three different
scales, or concentrated in only one. Moreover, they may be missing from a
scale with 9 items or with 16, representing different proportions of items
for that particular scale. I think you should take this into account. If you
are prepared to allow a few missing items, they should be no more than
one per scale. If there are two in a scale, that scale should be the one with
the most items (16).
         However, all these decisions rest on the implicit assumption that
your items are interchangeable and highly correlated with one another within a
given scale, which is not always the case. Perhaps the missing items are
significant because they cover particular angles of the issue not covered
by other items in the scale, or because they are the most "difficult" items
in the scale, or for some other such reason. If so, the scale values of
people lacking that particular item may not be comparable to the rest.
         Hector
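Hector's rule can be made concrete. A small sketch (scale sizes taken from Stephen's post; the function and its limits are just one way to encode the rule, not anything Hector specified as code):

```python
# Hector's per-scale rule: allow at most one missing item per scale,
# except that the largest scale (16 items) may tolerate two.
scale_sizes = [14, 11, 16, 13, 12, 9, 10]   # 7 scales, 75 items total

def case_passes(missing_by_scale, scale_sizes):
    """missing_by_scale: count of missing items in each scale, in order."""
    largest = max(scale_sizes)
    for n_missing, size in zip(missing_by_scale, scale_sizes):
        limit = 2 if size == largest else 1
        if n_missing > limit:
            return False     # too many missing in this scale: drop the case
    return True
```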


Re: Ahhh...Missing Data Nightmare

bdates
In reply to this post by Salbod
There is freeware:  WINIMP, which is available for download at

http://www.technelysium.com.au/winimp.html

This will impute data based on both the respondent and variable behavior of the
data present.  The low proportion of missing data makes imputation viable.

Brian



Confidentiality Notice for Email Transmissions: The information in this
message is confidential and may be legally privileged. It is intended solely
for the addressee.  Access to this message by anyone else is unauthorised.
If you are not the intended recipient, any disclosure, copying, or
distribution of the message, or any action or omission taken by you in
reliance on it, is prohibited and may be unlawful.  Please immediately
contact the sender if you have received this message in error. Thank you.

Re: Ahhh...Missing Data Nightmare

Mark A Davenport MADAVENP
In reply to this post by Salbod
You might want to start with some of Don Rubin's work.  Joe Schafer has also
written quite a bit on the subject.  SPSS will do missing data imputation
(via EM) if you have the correct module.  When I have the option, I use
Schafer's NORM program for its data augmentation feature.  Certainly
everything hinges on how your data are missing.  We can only assume that
they are not missing completely at random.  Telling us a bit more might
help.

Mark

***************************************************
Mark A. Davenport Ph.D.
Senior Research Analyst
Office of Institutional Research
The University of North Carolina at Greensboro
336.256.0395
[hidden email]

'An approximate answer to the right question is worth a good deal more
than an exact answer to an approximate question.' --a paraphrase of J. W.
Tukey (1962)







Re: Ahhh...Missing Data Nightmare

Norton, John
Hi Stephen and List,

I'd like to add to the points in Mark's post.  Aside from the impact that a decreasing N has on the power of the analyses, a very important consideration with missing data is whether the data are missing completely at random, or whether a pattern can be detected.  When data are missing completely at random, no bias is introduced into the analyses.  However, if a pattern can be determined (for example, if a greater amount of missing data is seen in one population than in another), then bias is introduced and the stability of the analyses is further undermined.

You can investigate whether there are patterns of missing data with the Missing Value Analysis module in SPSS.  This is an add-on module which also supports a variety of methods for replacing missing values, including mean substitution, imputation via regression, and expectation maximization algorithms.

For more information, please visit: http://www.spss.com/missing_value/

John Norton
SPSS Inc.



Re: Ahhh...Missing Data Nightmare

Hector Maletta
In reply to this post by bdates
         Brian,
         I have followed your link to Winimp, but apparently it is just
file compression software, like WinZip or WinRAR. What gives?

         Hector


Re: Ahhh...Missing Data Nightmare

bdates
In reply to this post by Salbod
Hector,

My apologies to you and the list!  Winimp is the directory that's created by
the NORM program.  I had saved that address in case I lost the program from
my drive and needed it again, not realizing that it pointed to a different
program.  Sorry!  The actual address for NORM is:

http://www.stat.psu.edu/~jls/misoftwa.html

Brian




Re: Ahhh...Missing Data Nightmare

Kathy McKnight
In reply to this post by Mark A Davenport MADAVENP
Following up on Mark's recommendations, Schafer & Graham published a
"State of the Art" paper on missing data in 2002 in Psychological Methods
(Vol. 7, pp. 147-177) that discusses the strengths and weaknesses of the
different missing data handling techniques and the missing data conditions
for using them (e.g., missing at random, or MAR). It's an excellent paper, and their
conclusions in 2002 would most likely be the same in 2007 re: the best
methods currently available for handling missing data. They recommend either
a data augmentation method (e.g., the EM algorithm) or multiple imputation.
Don Rubin has remarked that multiple imputation is generally the best
method. As others on this list have noted, it depends on the missing data
conditions. When conditions are optimal, i.e. data are missing completely at
random (MCAR), a number of missing data handling techniques are useful. The
problem is, most of our data are not MCAR. That's when the missing data
literature gets complicated in terms of making a single recommendation for
what to do with missing data.

Multiple imputation seems to be the preferred method these days. If you go
with that method, you have to run the multiple imputations PRIOR to
analyzing the data. I'm not aware of any multiple imputation procedure in
SPSS. Just to clarify the remark below, if Mark is referring to the EM
algorithm below, that is not an imputation method. Nothing is imputed using
that method with missing data. This method uses the observed data as well as
an assumed underlying distribution (multivariate normality) for parameter
estimation, whether or not missing data occur in a data set.

Katherine McKnight
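The EM idea Kathy describes can be illustrated with a toy sketch: estimate the parameters of a bivariate normal (x fully observed, y partly missing) by alternating between (E) replacing missing y values with their conditional expectations under the current parameters and (M) re-estimating the moments from the completed data. This is only an illustration of the principle, not Schafer's NORM or the SPSS MVA implementation; all names are mine.

```python
# Toy EM for a bivariate normal with y missing at random given x.
def em_bivariate(xs, ys, iters=20):
    """xs: list of floats; ys: list of floats or None (missing)."""
    # Initialize moments from complete cases.
    pairs = [(x, y) for x, y in zip(xs, ys) if y is not None]
    mx = sum(x for x, _ in pairs) / len(pairs)
    my = sum(y for _, y in pairs) / len(pairs)
    vx = sum((x - mx) ** 2 for x, _ in pairs) / len(pairs)
    vy = sum((y - my) ** 2 for _, y in pairs) / len(pairs)
    cxy = sum((x - mx) * (y - my) for x, y in pairs) / len(pairs)

    n = len(xs)
    for _ in range(iters):
        beta = cxy / vx              # slope of the regression of y on x
        resvar = vy - beta * cxy     # conditional variance of y given x
        # E-step: expected y and y^2 for every case.
        ey, ey2 = [], []
        for x, y in zip(xs, ys):
            if y is not None:
                ey.append(y)
                ey2.append(y * y)
            else:
                mu = my + beta * (x - mx)       # E[y | x]
                ey.append(mu)
                ey2.append(mu * mu + resvar)    # E[y^2 | x]
        # M-step: update the moments from the completed data.
        mx = sum(xs) / n
        my = sum(ey) / n
        vx = sum((x - mx) ** 2 for x in xs) / n
        vy = sum(ey2) / n - my * my
        cxy = sum(x * e for x, e in zip(xs, ey)) / n - mx * my
    return mx, my, vx, vy, cxy
```

Note that, exactly as Kathy says, nothing is "filled in" as data here: the expectations exist only to update the parameter estimates.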


Re: Ahhh...Missing Data Nightmare

Mark A Davenport MADAVENP
In reply to this post by Salbod
An added bonus: John Graham has a few SPSS macros designed to help complete
the data augmentation step (combining estimates).  I have the website
bookmarked at the office; I'll send it along later today.

Mark

Re: Ahhh...Missing Data Nightmare

Claudiu Tufis
In reply to this post by Hector Maletta
Norm is a very good multiple imputation program. I used it with survey data
and it worked very well.

Another MI program (freeware) you might want to look at is Amelia II, which
is available at http://gking.harvard.edu/amelia/. It seems to be more
popular among political scientists. The same team that developed Amelia also
wrote some macros (links on the same page) that make it easier to combine
the results in R or Stata.

Allison (Paul Allison. 2002. Missing Data. Thousand Oaks: Sage
Publications) discusses both Norm and Amelia (a previous version) and notes:
"Both algorithms have some theoretical justification. Proponents of SIR
claim that it requires far less computer time. However, the relative
superiority of these two methods is far from settled" (Allison, 2002: 34).

I know that SAS also has a multiple imputation component (PROC MI and PROC
MIANALYZE) but I've never used them.

Claudiu




Re: Ahhh...Missing Data Nightmare

Kornbrot, Diana
In reply to this post by Kathy McKnight
Just returned from an excellent seminar on Bayesian estimation via MCMC
[Markov chain Monte Carlo] using the package REALCOM.
The materials are excellent, and it is free for academic purposes to UK academics:
http://www.cmm.bristol.ac.uk/research/Realcom/index.shtml

MCMC is both more efficient and less prone to bias than other methods.
Well worth a look for the theory, even if you don't use the package.
WinBUGS is also rumoured to be good.

Best

diana

Professor Diana Kornbrot
University of Hertfordshire
College Lane, Hatfield, AL10 9AB, UK
Email:  [hidden email]
Web:   http://web.mac.com/kornbrot/iWeb/KornbrotHome.html
Blended Learning Unit
  voice: +44[0]170 728 1315 fax: +44[0] 170 728 1320
Psychology
  voice: +44[0]170 728 4626 fax: +44[0]170 728 5073

Kornbrot
19 Elmhurst Avenue
London N2 0LT, UK
 voice: +44 [0]208 883 3657 fax: +44 [0]208 444 2081




Re: Ahhh...Missing Data Nightmare

lts1
In reply to this post by Salbod
Hi Stephen,

For a slightly different view, click below.  Roth & Switzer discuss a number
of missing data techniques, and theirs is one of the few papers I've seen that
distinguishes between missing scales and missing items within a scale.

http://division.aomonline.org/rm/1999_RMD_Forum_Missing_Data.htm

Good luck & I hope this helps.

    Best,
        Lisa

Lisa T. Stickney
Ph.D. Candidate
The Fox School of Business
     and Management
Temple University
[hidden email]



Re: Ahhh...Missing Data Nightmare

Claudiu Tufis
In reply to this post by Mark A Davenport MADAVENP
Those macros would be a blessing.

Right now I have to split the dataset by imputation, run the analyses,
import the results into Excel, and then use an Excel macro to combine the
results.

Claudiu
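The combining step Claudiu does by hand in Excel is Rubin's rules. A minimal sketch for pooling a single coefficient across m imputed analyses (purely illustrative; function name mine, and it assumes the estimates actually vary across imputations so the between-imputation variance is nonzero):

```python
# Rubin's rules for combining one statistic across m imputed datasets.
from math import sqrt

def rubin_pool(estimates, std_errors):
    m = len(estimates)
    qbar = sum(estimates) / m                    # pooled point estimate
    w = sum(se * se for se in std_errors) / m    # within-imputation variance
    b = sum((q - qbar) ** 2 for q in estimates) / (m - 1)  # between-imputation
    t = w + (1 + 1 / m) * b                      # total variance
    r = (1 + 1 / m) * b / w                      # relative increase in variance
    df = (m - 1) * (1 + 1 / r) ** 2              # Rubin's degrees of freedom
    return qbar, sqrt(t), df, r
```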


Re: Ahhh...Missing Data Nightmare

Mark A Davenport MADAVENP
SPSS macros for NORM (data augmentation step)

http://mcgee.hhdev.psu.edu/missing/index.html

http://mcgee.hhdev.psu.edu/missing/sep15/index.html

***************************************************
Mark A. Davenport Ph.D.
Senior Research Analyst
Office of Institutional Research
The University of North Carolina at Greensboro
336.256.0395
[hidden email]

'An approximate answer to the right question is worth a good deal more
than an exact answer to an approximate question.' --a paraphrase of J. W.
Tukey (1962)







SPSS for Windows Vista

cbautista
Hi list,

I have SPSS version 12 and version 13. Do these versions work with Windows
Vista, or do I need to get SPSS version 15?

Thanks,

/Christian

Re: SPSS for Windows Vista

zstatman
V15 has a hotfix for Vista. No versions prior to this are "Vista
certified".


Will
Statistical Services
 
============
info.statman@earthlink.net
http://home.earthlink.net/~z_statman/
============

Re: Ahhh...Missing Data Nightmare

Bauer, John H.
In reply to this post by Claudiu Tufis
A Python module, rubin.py, which uses Rubin's rules to combine results, is available at SPSS Developer Central:

http://www.spss.com/devcentral/index.cfm?pg=downloadDet&dId=55

The readme file assumes that multiple imputation was done by AMOS 7.0, but any validly imputed dataset will work.  AMOS 7.0 is able to save multiply imputed data, either as separate files or as a single file with a variable identifying the imputation number.  The latter is especially convenient for use in SPSS with SPLIT FILE, which is what rubin.py expects.  Multiple files are not currently supported; the files would have to be combined first.

The Python module can be used for any statistic for which the standard error is available in the same table.  First, supply syntax to compute the statistic, and identify the table containing the results using OMS identifiers.  This will run the syntax for each imputation, saving the results to a new dataset.  Next, name the variables containing the statistic and its standard error.  The final output includes two tables, one containing the statistic, standard error, t, df, and significance; the other gives the Relative Increase in Variance due to Nonresponse, and the Rate of Missing Information.

SPSS 15.0 or later is required.  If you do not have an AMOS license, a two-week free trial of AMOS 7.0 can be downloaded from http://www.spss.com/amos/ or installed from the SPSS 15.0 CD.  Again, AMOS is not required, but the examples are written assuming it was used.

Additional information can be obtained with:

BEGIN PROGRAM.
import rubin
help(rubin)
END PROGRAM.

There are also more examples in the file rubin_example.sps.


-----------------------------
The readme file for rubin.py:
-----------------------------

Given a file containing multiply imputed data saved from AMOS 7.0, this project uses Rubin's Rules to combine the results.

Before you start, run through Example 30 in the AMOS 7.0 User's Guide.  Choose to save to a single file.

Drop rubin.py in Lib/site-packages or your favorite location on the Python path, then try the examples.  In particular, Example 31 from the AMOS 7.0 User's Guide:

BEGIN PROGRAM.

from rubin import Rubin

syntax = """
REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS R ANOVA
  /CRITERIA=PIN(.05) POUT(.10)
  /NOORIGIN
  /DEPENDENT wordmean
  /METHOD=ENTER sentence  ."""

rubin = Rubin(dataset='amos_imputations',
              syntax=syntax,
              commands='Regression',
              subtypes='Coefficients')

END PROGRAM.

*************************************************.
* Inspect the dataset with the imputed statistics.
* Decide that a split on Var3 will allow
* 'B', 'Std.Error' values to be correctly combined,
* with separate values for Var3 = 'sentence'
* and for Var3 = 'wordmean' .
* ................................................

BEGIN PROGRAM.

rubin.combineStatistics(stat="B",
                        stderror="Std.Error",
                        select=None,
                        split="Var3")

END PROGRAM.


-------------------------------------------------
Sample (draft) output:
-------------------------------------------------


Statistics combined by Rubin's Rules

|__________|____________________________________|
|B         |Statistics                          |
|          |______|__________|_____|_______|____|
|          |Mean  |Std. Error|t    |df     |Sig.|
|__________|______|__________|_____|_______|____|
|(Constant)|-2.416|3.378     |-.715|143.145|.476|
|__________|______|__________|_____|_______|____|
|sentence  |1.100 |.173      |6.372|137.398|.000|
|__________|______|__________|_____|_______|____|


Missing Information

|__________|____________________________________________________________________________|
|B         |Statistics                                                                  |
|          |________________________________________________|___________________________|
|          |Relative Increase in Variance due to Nonresponse|Rate of Missing Information|
|__________|________________________________________________|___________________________|
|(Constant)|.335                                            |.261                       |
|__________|________________________________________________|___________________________|
|sentence  |.344                                            |.267                       |
|__________|________________________________________________|___________________________|

