Hi,
It was suggested that I use multiple imputation because I had many missing values (the sample size is small: about 140 organizations responded to the survey).
Before imputation, using listwise deletion, N was about 73 for the linear multiple regression, and below 50 for the moderated multiple regressions. Now, after the procedure, N has increased to 882!!!!
Do you know why? It is great, but I cannot explain to the audience at an upcoming conference where this number comes from. I need your advice, please.
|
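A short Python sketch shows why adding variables (for example, moderators and interaction terms) shrinks the listwise-deletion N. The rows below are made-up, hypothetical survey records. `complete_case_n` and `expected_complete_cases` are illustrative helpers, not SPSS functions. The expected-count formula also assumes that each variable is missing independently with the same probability, which real survey data rarely satisfy exactly.

```python
from typing import Dict, List, Optional, Sequence

# Hypothetical survey rows: one dict per responding organization, None = missing answer.
Row = Dict[str, Optional[float]]

def complete_case_n(rows: Sequence[Row], variables: Sequence[str]) -> int:
    """Count the rows that listwise deletion keeps for a model using `variables`."""
    return sum(all(row.get(v) is not None for v in variables) for row in rows)

def expected_complete_cases(n: int, p_missing: float, k: int) -> float:
    """Expected complete cases if each of k variables is missing independently with prob p_missing."""
    return n * (1 - p_missing) ** k

rows: List[Row] = [
    {"y": 3.0, "x": 2.0, "m": 1.0},
    {"y": 4.0, "x": None, "m": 2.0},
    {"y": 2.5, "x": 1.0, "m": None},
    {"y": None, "x": 3.0, "m": 4.0},
    {"y": 5.0, "x": 4.0, "m": 3.0},
]

print(complete_case_n(rows, ["y", "x"]))       # main effects only -> 3
print(complete_case_n(rows, ["y", "x", "m"]))  # adding moderator m -> 2
print(round(expected_complete_cases(140, 0.10, 5), 1))  # -> 82.7
```

Every extra variable in a model gives a row one more chance to be dropped. That is consistent with N falling from about 73 to below 50 once moderators were added.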
"Now after the procedure, N increased up to 882!!!!"
Using InterneTelepathy, I will guess that "the procedure" you refer to here is multiple imputation. Is that right? Did you happen to turn off the SPLIT FILE (by Imputation_) before running your model?
--
Bruce Weaver bweaver@lakeheadu.ca http://sites.google.com/a/lakeheadu.ca/bweaver/ "When all else fails, RTFM." PLEASE NOTE THE FOLLOWING: 1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above. 2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/). |
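Bruce's SPLIT FILE question refers to how SPSS stores multiple-imputation output. As far as I know, the imputed dataset stacks the original data (Imputation_ = 0) on top of each imputed copy (Imputation_ = 1 ... m). An analysis run without SPLIT FILE by Imputation_ therefore counts every copy as new cases. The Python sketch below mimics that stacked layout with made-up rows. `stack_imputations`, `fill_missing`, and the fixed fill values (standing in for random imputation draws) are hypothetical, not SPSS code.

```python
from collections import Counter

def stack_imputations(original, imputed_copies):
    """Stack the original rows (Imputation_ = 0) and each imputed copy (Imputation_ = 1..m)."""
    stacked = [dict(row, Imputation_=0) for row in original]
    for i, imputed in enumerate(imputed_copies, start=1):
        stacked.extend(dict(row, Imputation_=i) for row in imputed)
    return stacked

def fill_missing(rows, fill):
    """Stand-in for one imputation: replace every None with a fixed value."""
    return [{k: (fill if v is None else v) for k, v in row.items()} for row in rows]

def is_complete(row):
    """True if no analysis variable is missing (the Imputation_ marker is ignored)."""
    return all(v is not None for k, v in row.items() if k != "Imputation_")

original = [
    {"x": 1.0, "y": 2.0},
    {"x": None, "y": 3.0},
    {"x": 2.0, "y": 1.0},
    {"x": 4.0, "y": 5.0},
]
stacked = stack_imputations(original, [fill_missing(original, f) for f in (1.5, 2.5, 3.5)])

# Without SPLIT FILE: every complete row in the stacked file is counted.
print(sum(is_complete(r) for r in stacked))  # 3 original + 3 copies x 4 rows = 15
# With SPLIT FILE by Imputation_: one N per dataset.
print(dict(Counter(r["Imputation_"] for r in stacked if is_complete(r))))  # {0: 3, 1: 4, 2: 4, 3: 4}
```

Imputation fills in values; it does not add respondents. No single dataset's N should therefore exceed the number of cases in the file.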
Thank you both so much -
I actually used "Maximum Likelihood" as the "procedure" because I kept getting an error message when I tried to use Multiple Imputation (sorry for not making that clear). The warning I got from my attempt at multiple imputation is: "The imputation model for (my dependent variables) contains more than 1-- parameters. No missing values will be imputed. Reducing the number of effects in the imputation model, by merging sparse categorical variables, changing the measurement level to scale, removing two-way interactions or specifying constraints on the roles of some variables, may resolve the problem. Alternatively, increase the maximum number of parameters allowed on the MAXMODELPARAM keyword of the IMPUTE subcommand." Hmmmm, could anybody tell me what I should do?
From: <Ware>, William B <[hidden email]> Date: Friday, November 9, 2012 7:25 AM To: Tamaki Onishi <[hidden email]> Subject: RE: A huge difference in N?? Unfortunately, I think it means that you did the imputation incorrectly. I've never seen any imputation increase the sample size.
===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
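The warning is about the size of each variable's imputation model. The sketch below gives a rough illustration of why the suggested fixes help by counting regression coefficients under ordinary dummy coding:

- a scale predictor adds one coefficient;
- a categorical predictor with k levels adds k - 1;
- a categorical target with c levels needs c - 1 equations.

This is an assumption-laden approximation, not SPSS's exact internal parameter count. `model_parameters` and `merge_sparse` are hypothetical helpers.

```python
from collections import Counter
from typing import Optional, Sequence

def predictor_df(levels: Optional[int]) -> int:
    """Coefficients for one predictor: 1 if scale (levels=None), else levels - 1 dummies."""
    return 1 if levels is None else levels - 1

def model_parameters(predictor_levels: Sequence[Optional[int]],
                     target_levels: Optional[int] = None) -> int:
    """Main-effects coefficient count for one target's imputation model.
    A scale target needs one equation; a categorical target with c levels needs c - 1."""
    per_equation = 1 + sum(predictor_df(lv) for lv in predictor_levels)  # 1 = intercept
    equations = 1 if target_levels is None else target_levels - 1
    return per_equation * equations

def merge_sparse(values, min_count, other="Other"):
    """Collapse categories observed fewer than min_count times into a single level."""
    counts = Counter(values)
    return [v if counts[v] >= min_count else other for v in values]

# Hypothetical: 20 five-point items used as predictors of one five-point item.
print(model_parameters([5] * 20, target_levels=5))     # all categorical -> 324
print(model_parameters([None] * 20, target_levels=5))  # predictors as scale -> 84
print(model_parameters([None] * 20))                   # target as scale too -> 21

answers = ["A"] * 10 + ["B"] * 8 + ["C"] * 2 + ["D"] * 1
print(sorted(set(merge_sparse(answers, min_count=5))))  # ['A', 'B', 'Other']
```

Each fix named in the warning shrinks this count: treating ordinal items as scale, merging sparse categories, dropping interactions, or limiting which variables act as predictors. Raising MAXMODELPARAM only moves the limit. With about 140 cases, a model with hundreds of coefficients would be hard to estimate reliably in any case.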
"Hmmmm, could anybody tell me what I should do? "
For a start, provide a concise description of the data and their measurement properties, and the SYNTAX used to do the imputation. That would at least take the discussion beyond InTel and ESPss. --
Please reply to the list and not to my personal email.
Those desiring my consulting or training services, please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." [Matthew 7:6: "Do not give what is holy to dogs, and do not throw your pearls before swine, lest they trample them under their feet."] Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
In reply to this post by Bruce Weaver
Hello all,
I finally got a home premium package of SPSS21, which has the Multiple Imputation add-on feature. I was reading the IBM manual "Missing Values" and was about to try the example in it using sample data. According to this manual, "the sample files installed with the product can be found in the Samples subdirectory of the installation directory." I looked on the CD-ROM, etc., but was not able to locate this "Samples subdirectory". Does anybody know where I can find it? Thanks very much in advance. |
Where did you install Statistics? The
samples subdirectory will be under that.
Jon Peck (no "h") aka Kim Senior Software Engineer, IBM [hidden email] new phone: 720-342-5621 |
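If it is not obvious where Statistics was installed, a short Python sketch can search a candidate install root for a directory named Samples. The root path in the usage loop is a hypothetical example; substitute whatever folder your installation actually used. `find_samples_dirs` is an illustrative helper.

```python
from pathlib import Path

def find_samples_dirs(install_root):
    """Return every directory named 'Samples' (case-insensitive) under install_root."""
    root = Path(install_root)
    if not root.is_dir():
        return []  # nothing to search if the root does not exist
    return sorted(p for p in root.rglob("*") if p.is_dir() and p.name.lower() == "samples")

# Hypothetical install location; adjust to your own installation directory.
for path in find_samples_dirs(r"C:\Program Files\IBM\SPSS\Statistics\21"):
    print(path)
```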
Great, I located it. Thanks, Jon.
|
In reply to this post by Bruce Weaver
Hi all,
I am using SPSS Multiple Imputation by following the IBM SPSS Missing Values 19 Guide, and I am about to begin imputation. I found the "Automatic Imputation of Missing Values" section in the guide. That section says to first set the random seed by going to Transform > Random Number Generators, then Set Active Generator > Mersenne Twister, then Set Starting Point > Fixed Value, and typing 20070525 as the value. I don't understand whether this value of 20070525 applies to all datasets or only to this example. If the latter, how should I decide what value to use? Thanks very much, as always. |
The actual starting value for a random
number generator makes no difference. It is only useful if you want
to reproduce the exact same sequence of random numbers at a point in the
future. The case study might include that instruction so that your
results will match exactly those shown in the study.
Jon Peck (no "h") aka Kim Senior Software Engineer, IBM [hidden email] new phone: 720-342-5621 |
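Jon's point can be checked with Python's standard `random` module, which, like the generator chosen in the guide, is a Mersenne Twister. This only illustrates the idea: Python's sequence for 20070525 will not match the numbers SPSS produces from the same seed. `draws` is a hypothetical helper.

```python
import random
from statistics import mean

def draws(seed, k=5):
    """Return k uniform(0, 1) draws from a Mersenne Twister started at `seed`."""
    rng = random.Random(seed)  # random.Random is Python's MT19937 generator
    return [rng.random() for _ in range(k)]

print(draws(20070525) == draws(20070525))  # True: same seed, identical sequence
print(draws(20070525) == draws(12345))     # False: different seed, different sequence

# Either seed gives an equally good random stream; only reproducibility differs.
print(round(mean(draws(20070525, 10_000)), 2), round(mean(draws(12345, 10_000)), 2))
```

Any fixed value works. What matters is recording the seed you used, so that the imputations can be regenerated later.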
In reply to this post by tonishi@iupui.edu
Hi Tamaki,
I haven't used the automated imputation in SPSS in a while, but seeds are typically a way of controlling for possible collisions with other random generators. (Why one generator might produce the same set of random numbers as another is an entirely different thread, but seeds, if set properly, are a way of ensuring that you don't get the same values generated.) You can also use the same seed to regenerate the data points if you ever need to. For example, if you documented in your report that you used a seed of ########, someone who wanted to go back and check your work could use that seed to recreate those values.
It looks like the seed you are providing is a date; the IBM/SPSS manual is probably suggesting that you enter the date (2007, 05, 25 in this case). That way, if you come back to the code at a later date, do more imputation, and use the random number generator again, you'll input a _NEW_ seed value (that later date) to ensure you are not getting patterns in your random generator.
> I don't understand if this value of 20070525 is applied to all datasets,
I'm not sitting at a PC that has SPSS right now, but if I had to guess, the seed value, as with most random generators, will reset itself once you close SPSS (try it and see; I could be wrong). So this "seed" would only be for this "example" and not for all "datasets".
-J ---- J. R. Carroll |
In reply to this post by Jon K Peck
Hello,
I was trying to run the imputation, and I keep getting the following warnings:
What I did was include all the raw data from my survey. Since many of these items will later be summed or averaged into aggregated variables, there are numerous missing values to impute.
I interpret this warning message to mean that I can merge some variables. Is this only for categorical variables? Many of my IVs are based on scales and are computed by summing or averaging 5 to 20 different scale items. Should I create the aggregated variables first to form the IVs, and then run the imputation model on these IVs rather than on the original scale items?
Thanks!
|
You already have the answer to your question. The program won't execute, and to get it to execute you have to reduce the number of parameters. Your question is about how to do that. So, let's assume you have evaluated the likelihood of items being NMAR (not missing at random) and ruled that out for all items. Now you have two kinds of items: items to be used as part of a scale, such that the scale score is the variable, and items to be used as variables in their own right (e.g., sex, age, etc.). You've done a missing data analysis and identified which variables are predictors of another variable being missing (i.e., missingness) AND how well pairs of variables are correlated. Probably (but you can check this), you could treat scale items as MCAR (missing completely at random) and average the items to form scale scores. That still might be too many variables to pass through the imputation routine. If so, then you pretty much have to do the imputation for the questions/hypotheses to be investigated. Gene Maguin |
In reply to this post by tonishi@iupui.edu
Let’s check that ‘aggregate’ has the same meaning for both of us. I use ‘aggregate’ to mean summing or averaging a variable across records; in other words, the operation defined by the SPSS AGGREGATE command. What is your meaning of aggregate? What I suggested is what the following command would produce: Compute xmean=mean.5(x1 to x7). Do you, by chance, have a multilevel data structure? For instance, were employees at a number of companies surveyed about something, and now you want to know how company characteristics relate to that something? Gene Maguin
From: Onishi, Tamaki [mailto:[hidden email]] Thank you so much, Dr. Maguin. I have a slightly different question about one point you mentioned: "average the items to form scale scores." Is averaging items a standard way to create an aggregated variable? I was initially averaging scale scores, but I ended up creating many variables by summing the scores of the scale items, rather than averaging or weighted averaging them, primarily as suggested by my dissertation committee (my field is management). Their rationale is that all items are equally important, so there is no need to weight them. But I was challenged at a conference about this way of creating variables, and so far no literature has given me a clear idea of which approach is better and, more importantly, why. I really appreciate your advice on this. Thank you so much, Tamaki |