Hello :-)
First time posting, so forgive me if my description of what I am trying to achieve is unclear.

I need to bootstrap my correlations because age is not normally distributed (age is correlated with most variables, so I have controlled for it by regressing the tasks onto age). I would like to use all the data available, but some data are missing (e.g., participants were excluded because they didn't understand the task, or they didn't complete it), so N differs across variables. When I bootstrap all the correlations in the matrix at once, listwise deletion is used.

Is it OK to instead bootstrap each pairwise correlation separately and build my own matrix? And then run a multiple regression on these bootstrapped correlations using pairwise deletion (or use them as input to Amos)?

Any insights and advice would be greatly appreciated!

Many thanks,
Laura
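To make the question concrete, here is a minimal sketch of what bootstrapping a single pairwise correlation could look like. It is written in plain Python rather than SPSS, and everything in it is an assumption for illustration: the function names `pearson` and `bootstrap_pairwise_corr`, the use of `None` to code missing values, the percentile confidence interval, and the made-up data.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined when either variable is constant
    return sxy / math.sqrt(sxx * syy)


def bootstrap_pairwise_corr(x, y, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for r, using only cases observed on both x and y.

    Missing values are assumed to be coded as None (a hypothetical convention).
    Returns (r, ci_low, ci_high, n_pairs).
    """
    # Pairwise deletion: keep a case only if both variables are present.
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    rng = random.Random(seed)
    r_obs = pearson([p[0] for p in pairs], [p[1] for p in pairs])

    boots = []
    for _ in range(n_boot):
        # Resample whole cases (pairs) with replacement.
        sample = [pairs[rng.randrange(n)] for _ in range(n)]
        r = pearson([p[0] for p in sample], [p[1] for p in sample])
        if not math.isnan(r):
            boots.append(r)

    boots.sort()
    lo = boots[int((alpha / 2) * len(boots))]
    hi = boots[min(len(boots) - 1, int((1 - alpha / 2) * len(boots)))]
    return r_obs, lo, hi, n


# Made-up data: task_b is missing for two participants.
task_a = [1.0, 2.1, 2.9, 4.2, 5.1, 5.8, 7.2, 8.1, 9.0, 9.7]
task_b = [1.2, None, 3.1, 3.9, None, 6.3, 6.8, 8.4, 8.8, 10.1]

r, lo, hi, n = bootstrap_pairwise_corr(task_a, task_b)
print(f"r = {r:.3f}, 95% CI [{lo:.3f}, {hi:.3f}], n = {n}")
```

Repeating this for every pair of variables would give a matrix in which each entry rests on a different N. One general caveat with matrices assembled this way is that they are not guaranteed to be positive semi-definite, which can cause trouble when the matrix is used as input to a regression or SEM program.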
|
Administrator
|
If I follow, the ultimate goal is to estimate a multiple regression model that includes age plus a bunch of other variables, but you are concerned because age is not normally distributed. Right? What does the age distribution look like?
Bear in mind the following points.

1. The normal distribution is just a model, and nothing in nature is truly normal (see mkweb.bcgsc.ca/pointsofsignificance/img/Boxonmaths.pdf). (Nothing in nature is truly linear either.)
2. The key assumptions of OLS linear regression are that the *errors* (not the variables) are independently and identically distributed as normal with a mean of 0 and some variance (sigma^2). And normality of the errors is less important than their independence and homoscedasticity.

With those points in mind, you might want to just fit your regression model and then examine the residuals (e.g., using residual plots). Googling <SPSS regression residual analysis> will likely turn up some good info on how to proceed.

HTH.
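As a rough illustration of "fit the model, then look at the residuals", here is a minimal sketch in plain Python, not SPSS. It assumes a single predictor (age) and made-up numbers; the function names `fit_simple_ols` and `residual_summary` are hypothetical, and comparing the residual spread in the lower and upper halves of the fitted values is only a crude numeric stand-in for inspecting a residuals-versus-fitted plot.

```python
import statistics


def fit_simple_ols(x, y):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    return b0, b1


def residual_summary(x, y):
    """Return fitted values, residuals, and the residual SD in the lower and
    upper halves of the fitted values (a rough check on constant variance)."""
    b0, b1 = fit_simple_ols(x, y)
    fitted = [b0 + b1 * xi for xi in x]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    # Order cases by fitted value, then split them into two halves.
    order = sorted(range(len(x)), key=lambda i: fitted[i])
    half = len(order) // 2
    sd_low = statistics.stdev([resid[i] for i in order[:half]])
    sd_high = statistics.stdev([resid[i] for i in order[half:]])
    return fitted, resid, sd_low, sd_high


# Made-up data: age is clearly skewed, i.e. not normally distributed.
age = [18, 19, 19, 20, 21, 22, 24, 27, 33, 41, 52, 68]
score = [12.1, 11.4, 13.0, 12.2, 13.9, 13.1, 15.2, 15.0, 18.3, 20.9, 25.2, 31.8]

fitted, resid, sd_low, sd_high = residual_summary(age, score)
print(f"mean residual = {statistics.fmean(resid):.6f}")
print(f"residual SD, lower half of fitted values = {sd_low:.3f}")
print(f"residual SD, upper half of fitted values = {sd_high:.3f}")
```

The point of the sketch is that the checks are run on the residuals, not on age itself: a skewed predictor does not by itself violate the assumptions. Residual SDs that differ greatly between the two halves would hint at heteroscedasticity and be worth following up with proper plots.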
--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING:
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).
|
Besides Bruce's points, which are all valid, the complete cases that end up in the regression may be endogenously selected (for example, participants excluded for not understanding a task may differ systematically from those retained), which could bias the regression estimates. However, without more information about that model and the dependent variable, it is impossible to know.

On Wed, May 24, 2017 at 5:52 AM, Bruce Weaver <[hidden email]> wrote:
> If I follow, the ultimate goal is to estimate a multiple regression model
