A huge difference in N??

A huge difference in N??

tonishi@iupui.edu
Hi, 

It was suggested that I use multiple imputation because I have many missing values (the sample size is small: only about 140 organizations responded to the survey). 

Before I did so, using listwise deletion, N was about 73 for linear multiple regression. For moderated multiple regressions, N was below 50. Now after the procedure, N increased up to 882!!!! 

Do you know why? It is great, but I cannot explain where this number comes from to the audience at an upcoming conference. I need your advice, please. 


Re: A huge difference in N??

Bruce Weaver
Administrator
"Now after the procedure, N increased up to 882!!!!"

Using InterneTelepathy, I will guess that "the procedure" you refer to here is multiple imputation.  Is that right?  Did you happen to turn off the SPLIT FILE (by Imputation_) before running your model?  
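
To illustrate why the SPLIT FILE question matters: SPSS saves multiply imputed data as one stacked file, with the original cases (Imputation_ = 0) followed by one completed copy per imputation. A procedure run on that file without splitting by Imputation_ counts every row. The Python sketch below (not SPSS syntax) uses hypothetical figures, 140 cases and 5 imputations, and is not meant to reproduce the 882 reported above.

```python
from collections import Counter

# Hypothetical figures: the thread only says "about 140" respondents,
# and the number of imputations is an assumption.
n_cases = 140
n_imputations = 5

# Stacked file: original data (Imputation_ = 0) plus one completed copy
# of every case per imputation (Imputation_ = 1..5).
stacked = [
    {"Imputation_": imp, "case_id": case}
    for imp in range(n_imputations + 1)
    for case in range(1, n_cases + 1)
]

# Treating the stacked file as a single dataset counts every row.
n_unsplit = len(stacked)

# Splitting by Imputation_ recovers the real number of cases per group.
n_by_imputation = Counter(row["Imputation_"] for row in stacked)

print(n_unsplit)           # 840
print(n_by_imputation[1])  # 140
```

In this sketch the inflated count is simply (1 + number of imputations) times the real sample size, which is why an N several times larger than the number of respondents is a warning sign rather than a gain.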


--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING: 
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).

Re: A huge difference in N??

tonishi@iupui.edu
Thank you both so much.

I actually used "Maximum Likelihood" as the "procedure", because I kept
getting an error message when I tried to use Multiple Imputation (sorry
for not being clear). The warning I got from my attempt to do multiple
imputation is:

"The imputation model for (my dependent variables) contains more than 100
parameters. No missing values will be imputed. Reducing the number of
effects in the imputation model, by merging sparse categories of
categorical variables, changing the measurement level of ordinal variables
to scale, removing two-way interactions, or specifying constraints on the
roles of some variables, may resolve the problem. Alternatively increase
the maximum number of parameters allowed on the MAXMODELPARAM keyword of
the IMPUTE subcommand."

Hmmmm, could anybody tell me what I should do?




From: <Ware>, William B <[hidden email]>
Date: Friday, November 9, 2012 7:25 AM
To: Tamaki Onishi <[hidden email]>
Subject: RE: A huge difference in N??


Unfortunately, I think it means that you did the imputation incorrectly.
I've never seen any imputation increase the sample size.




=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Re: A huge difference in N??

David Marso
Administrator
"Hmmmm, could anybody tell me what I should do? "
For a start, provide a concise description of the data and their measurement properties, and the SYNTAX used to do the imputation. That would at least take the discussion beyond InTel and ESPss.
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"

SPSS21 - Where is "Samples" sub directory of the installation directory??

tonishi@iupui.edu
In reply to this post by Bruce Weaver
Hello all,

I finally got a home premium package of SPSS 21, which has Multiple
Imputation as an add-on feature.

I was reading the IBM manual "Missing Values" and was about to try the
example in it using the sample data. According to the manual, "the
sample files installed with the product can be found in the Samples
subdirectory of the installation directory." I looked for it on the
CD-ROM, etc., but was not able to locate this "Samples" subdirectory.
Does anybody know where I can find it?

Thanks much in advance.


Re: SPSS21 - Where is "Samples" sub directory of the installation directory??

Jon K Peck
Where did you install Statistics?  The samples subdirectory will be under that.


Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
new phone: 720-342-5621





Re: SPSS21 - Where is "Samples" sub directory of the installation directory??

tonishi@iupui.edu
Great, I located it. Thanks, Jon.


Automatic Imputation of Missing Values - "Random Number Generators"?

tonishi@iupui.edu
In reply to this post by Bruce Weaver
Hi all,

I am using SPSS Multiple Imputation, following the IBM SPSS Missing
Values 19 guide, and am about to begin imputation. I found the
"Automatic Imputation of Missing Values" section in the guide. It says
to first set the random seed by going to Transform > Random Number
Generators, and then:

Set Active Generator --> Mersenne Twister --> Set Starting Point -->
Fixed Value, and type 20070525 as the value.

I don't understand if this value of 20070525 is applied to all datasets,
or only to this example. If the latter is the case, how should I decide
what value I should use?

Thanks much always.


Re: Automatic Imputation of Missing Values - "Random Number Generators"?

Jon K Peck
The actual starting value for a random number generator makes no difference.  It is only useful if you want to reproduce the exact same sequence of random numbers at a point in the future.  The case study might include that instruction so that your results will match exactly those shown in the study.
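
A minimal Python sketch of this point (not SPSS syntax). Python's random module also uses a Mersenne Twister, but it is seeded differently from SPSS, so the numbers themselves will not match SPSS output. The helper name draw and the second seed 12345 are made up for illustration.

```python
import random

def draw(seed, n=5):
    """Return n pseudo-random numbers from a generator started at `seed`."""
    rng = random.Random(seed)  # CPython's generator is a Mersenne Twister
    return [rng.random() for _ in range(n)]

first_run = draw(20070525)
second_run = draw(20070525)  # same seed: the identical sequence again
other_seed = draw(12345)     # different seed: a different, equally valid sequence

print(first_run == second_run)  # True
print(first_run == other_seed)  # False
```

Any fixed value serves the same purpose; what matters for reproducibility is recording the value that was used.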


Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
new phone: 720-342-5621





Re: Automatic Imputation of Missing Values - "Random Number Generators"?

J. R. Carroll
In reply to this post by tonishi@iupui.edu
Hi Tamaki, 

I haven't used the automated imputation in SPSS in a while, but seeds are typically a way of controlling for possible collisions with other random generators (explaining why one generator might produce the same set of random numbers as another is an entirely different thread, but seeds, if set properly, are a way of ensuring that the same values are not generated twice). You can also reuse the same seed to regenerate the data points if you ever need to. For example, if you documented in your report that you used a seed of ########, someone who wanted to go back and check your work could use that seed and recreate those values.

It looks like the seed you are providing is a date (2007, 05, 25 in this case), so the IBM/SPSS manual is probably suggesting that you enter the date. That way, if you come back to the code later to do more imputation and use the random number generator again, you'll enter a _NEW_ seed value (the new date) to ensure you are getting no patterns in your random generator.

> I don't understand if this value of 20070525 is applied to all datasets,
> or only to this example. If the latter is the case, how should I decide
> what value I should use?

I'm not sitting at a PC that has SPSS right now, but if I had to guess, as with most random generators, the seed value will reset itself once you close SPSS (try it and see; I could be wrong). So this "seed" would apply only to this "example" and not to all "datasets".

-J

----


J. R. Carroll






Too many parameters in the imputation model? -- should impute original data or aggregate data?

tonishi@iupui.edu
In reply to this post by Jon K Peck
Hello, 

I am trying to run the imputation, but I keep getting the following warning: 


The imputation model for ORG_NP contains more than 100 parameters. No missing values will be imputed. Reducing the number of effects in the imputation model, by merging sparse categories of categorical variables, changing the measurement level of ordinal variables to scale, removing two-way interactions, or specifying constraints on the roles of some variables, may resolve the problem. Alternatively increase the maximum number of parameters allowed on the MAXMODELPARAM keyword of the IMPUTE subcommand.


What I did was include all the raw data from my survey. Since many of these items will be summed or averaged into aggregated variables, there are numerous missing values to impute. 

From this warning message, I understand that I can merge some variables. Is this only for categorical variables? Many of my IVs are based on scales and are computed by summing or averaging 5 to 20 different scale items. Should I create the aggregated variables (the IVs) first and then run the imputation model on those IVs rather than on the original scale items? 

Thanks! 



Re: Too many parameters in the imputation model? -- should impute original data or aggregate data?

Maguin, Eugene

You already have the answer to your question. The program won’t execute, and to get it to execute you have to reduce the number of parameters. Your question is about how to do that. So, let’s assume you have evaluated the likelihood of items being NMAR (not missing at random) and ruled that out for all items. Now you have two kinds of items: items used as part of a scale, such that the scale score is the variable, and items used as variables in their own right (e.g., sex, age, etc.). You’ve done a missing data analysis and identified which variables predict another variable being missing (i.e., missingness) AND how well pairs of variables are correlated. Probably (but you can check this), you could treat scale items as MCAR (missing completely at random) and average the items to form scale scores. That still might be too many variables to pass through the imputation routine. If so, then you pretty much have to do the imputation separately for each of the questions/hypotheses to be investigated.
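
To see why the remedies listed in the warning shrink the model, here is a rough Python sketch of how parameters add up in one regression-type imputation model. It assumes the textbook rule (an intercept, one coefficient per scale predictor, and k - 1 dummy coefficients per k-category predictor); the thread does not say how SPSS counts parameters, so treat the totals as illustrative only. The function name and the 30-item survey are hypothetical.

```python
def count_parameters(predictors):
    """Count parameters of one regression-type imputation model.

    `predictors` maps a variable name to its number of categories, or to
    None for a scale (continuous) predictor. Assumes an intercept, one
    coefficient per scale predictor, and k - 1 dummy coefficients per
    k-category predictor. This is the textbook rule, not necessarily
    the rule SPSS applies.
    """
    total = 1  # intercept
    for n_categories in predictors.values():
        total += 1 if n_categories is None else n_categories - 1
    return total

# Hypothetical survey: 30 seven-point items used as predictors.
as_ordinal = {f"item{i}": 7 for i in range(1, 31)}
as_scale = {f"item{i}": None for i in range(1, 31)}

print(count_parameters(as_ordinal))  # 181
print(count_parameters(as_scale))    # 31
```

Under these assumptions, treating the items as scale, or replacing them with a handful of scale scores, brings the count well under a limit of 100.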

 

Gene Maguin

 

 


Re: Too many parameters in the imputation model? -- should impute original data or aggregate data?

Maguin, Eugene
In reply to this post by tonishi@iupui.edu

Let’s check that ‘aggregate’ has the same meaning for both of us. I use ‘aggregate’ to mean summing or averaging a variable across records; in other words, the operation defined by the SPSS AGGREGATE command. What is your meaning of aggregate?

 

What I suggested was what would be the result of the following command:

* Mean of x1 to x7, computed only when at least 5 of the 7 items are valid.
COMPUTE xmean = MEAN.5(x1 TO x7).
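
For readers unfamiliar with the .5 suffix: MEAN.5 returns the mean of the valid values only if at least 5 of the arguments are non-missing, and a missing result otherwise. A rough Python equivalent follows, with None or NaN standing in for a missing value; the function name mean_min_valid is invented for this sketch.

```python
import math

def mean_min_valid(values, min_valid):
    """Mean of the non-missing values, or None if fewer than min_valid exist.

    Missing values are represented as None or NaN.
    """
    valid = [
        v for v in values
        if v is not None and not (isinstance(v, float) and math.isnan(v))
    ]
    if len(valid) < min_valid:
        return None
    return sum(valid) / len(valid)

# Six of seven items answered: the mean of the six valid values is returned.
print(mean_min_valid([4, 4, None, 2, 2, 3, 3], 5))        # 3.0

# Only four of seven items answered: the scale score stays missing.
print(mean_min_valid([4, None, None, None, 2, 3, 3], 5))  # None
```

The threshold is a judgment call: a lower minimum keeps more cases but rests each score on fewer items.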

 

Do you, by chance, have a multilevel data structure? For instance, employees at a number of companies were surveyed about something and now you want to know about how company characteristics relate to that something?

 

Gene Maguin

 

 

From: Onishi, Tamaki [mailto:[hidden email]]
Sent: Monday, December 03, 2012 3:39 PM
To: Maguin, Eugene
Subject: Re: Too many parameters in the imputation model? -- should impute original data or aggregate data?

 

Thank you so much, Dr. Maguin. I have a slightly different question about one point you mentioned below: "average the items to form scale scores." 

Is averaging items a standard way to create an aggregated variable? I was initially averaging scale scores, but ended up creating many variables by summing the scores of the scale items, rather than averaging or weighted-averaging them, primarily at the suggestion of my dissertation committee (my field is management). Their rationale is that all items are equally important, so there is no need to weight. But I was challenged at a conference about this way of creating the variables, and so far no literature has given me a clear idea of what a good approach is and, more importantly, why. 

 

I really appreciate your advice on this. Thank you so much, 

Tamaki 
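
As a small arithmetic aside on the sum-versus-mean question: with complete data an unweighted sum is just the mean multiplied by the number of items, so the two scores carry the same information. They diverge only when items are missing. The Python sketch below uses made-up responses to a hypothetical 5-item scale and is an illustration, not a recommendation for either method.

```python
def scale_sum(items):
    """Sum of the answered items (missing items, given as None, add nothing)."""
    return sum(v for v in items if v is not None)

def scale_mean(items):
    """Mean of the answered items only."""
    valid = [v for v in items if v is not None]
    return sum(valid) / len(valid)

# Complete responses: the sum is always 5 times the mean.
complete = [[1, 2, 3, 4, 5], [5, 5, 4, 4, 2], [2, 2, 2, 3, 1]]
for items in complete:
    assert scale_sum(items) == len(items) * scale_mean(items)

# With a skipped item the two scores disagree about who scores higher.
skipped_one = [5, 5, None, 5, 5]  # answered 5 on every item seen
answered_all = [4, 4, 4, 4, 4]

print(scale_sum(skipped_one), scale_sum(answered_all))    # 20 20
print(scale_mean(skipped_one), scale_mean(answered_all))  # 5.0 4.0
```

In this example the sum quietly treats the skipped item as a zero, while the mean rescales to the items actually answered.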

 
