In response to my last multiple imputation request, Art encouraged me to write down my problem in detail. He also contributed a multitude of questions, most of which I will try to answer here.
*Design & sample:*

- 9 response variables, ordinal (0, 1, 2, 3), intercorrelated (between .2 and .3). They are 9 single questions from a psychological screening instrument and load onto one factor. I suspect great heterogeneity, though, and want to look at these items individually (yes, they are just single questions and were never designed to be used independently, but that's what I will have to do, for lack of other available data). The DVs are skewed: about 40% have 0 (no problems), 30% have 1, 20% have 2, and 10% have 3 (problems nearly every day). Obviously, since they are ordinal, I cannot log-transform them. One could argue, however, that they could be considered continuous (0 = at no point during the last 2 weeks, 1 = 2 days, 2 = 4 days, 3 = nearly every day).
- 5 measurement points.
- 800 subjects. No groups, so every person has 1 data point on each response variable at each measurement point.
- Every subject experienced major stress while the study ran, so overall the response variables increase drastically (some more, some less).
- 10 baseline covariates that are all interesting in terms of explaining the increase of the response variables over time (e.g., personality facets, gender).
- Time-varying covariates (e.g., workload between this and the last measurement point).
- Missing data: the data are not missing at random. Dropouts occur for people who have higher scores on the response variables at the measurement point before dropout, which is typical for a psychological study. Missingness occurs on all variables, including the DVs: 3% missing on the DVs at the first measurement point, 40% at the 5th.

*Research Question:*

- Does predictor x1 have differential effects on the outcome variables? This is exploratory. E.g., x1 could affect only y1, y4, y5, and y6, and x2 only y5, whereas x3 affects only y1-y4 and y6-y8. This is unclear as yet, because usually people use the sum score of y1-y9 and calculate just ONE regression (e.g.) from x1 to Y(total).

*Models*

(1) One could use 9 univariate tests (repeated-measures GLMM, currently in SPSS 20, with an AR1 structure and random effects "subject" and "time") and predict each of y1 to y9 from x1 to x15. But that (a) doesn't account for the fact that the response variables are correlated, and (b) invites Type I error. I did this as a first step, however, and found that some x predict only some y, whereas some x predict all y, so this seems worth exploring further. (I might eventually have to do it this way, because multivariate response models with 9 outcomes seem to be impossible to compute.)

(2) The second option is running multivariate models and going for interaction effects between the predictors and the multivariate response. I'm currently trying to do this in R (MCMCglmm; a sketch follows below), but it's pretty hard to set up the priors, and the interpretation is messy in a model with 15 predictors * Y(multivariate).

I'd be happy about any kind of input on how I could try to answer my research question. Thanks -- T
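For option (2), a minimal MCMCglmm sketch of the multivariate ordinal model described above might look as follows. The names (y1-y9, x1, subject, data frame d) follow the post; the prior, the fixed residual structure, and the chain settings are illustrative assumptions, not a tested specification.

library(MCMCglmm)

# Wide format assumed: one row per subject x occasion, columns y1..y9
# (ordinal 0-3), predictors x1..x15, and a subject identifier.
k <- 9

# For ordinal responses the residual (co)variances are not identified,
# so the R-structure is conventionally fixed (here to an identity matrix).
prior <- list(
  R = list(V = diag(k), fix = 1),
  G = list(G1 = list(V = diag(k), nu = k))
)

m <- MCMCglmm(
  cbind(y1, y2, y3, y4, y5, y6, y7, y8, y9) ~ trait - 1 + trait:x1,  # one intercept and one x1 slope per item
  random = ~ us(trait):subject,    # correlated subject effects across the 9 items
  rcov   = ~ us(trait):units,      # correlated residuals across the 9 items
  family = rep("ordinal", k),
  prior  = prior,
  data   = d,                      # hypothetical data frame
  nitt   = 60000, burnin = 10000, thin = 25
)
summary(m)   # posterior for the per-item effects of x1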
Below are my knee-jerk responses to these issues:
Matthew J Poes
Research Data Specialist
Center for Prevention Research and Development, University of Illinois
510 Devonshire Dr., Champaign, IL 61820
Phone: 217-265-4576
email: [hidden email]

> - 9 response variables, ordinal (0,1,2,3), intercorrelated (between .2 and .3). [...]
> One could argue, however, that they could be considered continuous [...]

MP: You could use these as linear continuous variables, but it seems that this is not necessarily a tenable assumption. You could also treat them as a set of dummy-coded categorical variables, which may introduce less bias into the estimates.

MP: When you say response variable, you mean this is your DV? If so, treating them separately means running 9 separate analyses, which is a bit crazy. On top of that, error in single-item response questions used as DVs is going to be high and will create problems for the analysis. I personally would recommend against doing this. I think SEM would be a better approach if this is what you want to do.

> - 10 baseline covariates that are all interesting in terms of explaining the
> increase of response variables over time (e.g. personality facets, gender)

MP: Are these covariates important explanatory variables, or true covariates? If they are important explanatory variables, you may want to develop a set of hypotheses about how each is expected to affect the outcome and what that function would look like. Besides being important for interpretation, this may reduce the modeling complexity somewhat.

MP: Are any of these expected to interact with any others, or with any other factors in the model?

> - time-varying covariates (e.g. workload between this and last measurement point)

MP: That's fine; just make sure you put them in the right location in your model.

> - missing data: data not missing at random. [...] First measurement point 3%
> missing on DV, 5th measurement point 40% missing on DVs.

MP: You have a bunch of issues to work on here. First, NMAR means you need to add to your model the predictive reason for the missing data -- in this case, the stress score at the previous time point.
MP: I might consider, for each time point, a probability score that the next time point will be missing. I'd also consider adding a dummy variable for the missing cases. I've seen various large-scale models of psychological distress with large amounts of missing data at the later time points make it to publication; in fact, this was a specialty of a past instructor of mine. HLM itself will handle this fine, but the accuracy and generalizability of the estimates decrease at the time points with more missing data. (A rough sketch of the dropout-probability and pattern-dummy idea follows below.)

MP: I'd also look into some of the other approaches to NMAR data. I'm not an expert, but I understand pattern-mixture modeling to be a common and good approach.

MP: Little suggests using last observation carried forward as a pattern-mixture model. That would seem like the best approach to me, though I'm really not an expert on this.

> *Research Question:* - Does predictor x1 have differential effects on the outcome
> variables? [...] usually people use the sum-score of y1 - y9 and just calculate
> ONE (e.g.) regression from x1 to Y(total).

MP: And they probably do this for a reason. As I said, there is an implicit idea behind summative scales: they reduce the error in measuring the construct that the set of items collectively measures, which no single item can measure accurately.

MP: Have you considered a factor analysis of the items to see whether they load as you intend? Even if they don't, what about combining the items into summative scales as shown above? Doing 9 separate models is just nuts, and the accuracy of the estimates will not be very good.

> *Models* (1) Now, one could use 9 univariate tests [...] But that (a) doesn't
> control for the fact that the response variables are correlated, and (b) invites
> type-I error. [...]

MP: Type I error would be reduced by correctly applying a correction for the fact that you are running so many tests.

MP: The correlation among the outcomes would be addressed if you used a sum scale as I've suggested, but the correlation isn't so high as to matter much either.

MP: To fully account for things as you seem to want to, the only thing that makes sense to me is a fully specified SEM, which will account for measurement error, correlation among terms, and so on.
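A minimal sketch of the dropout-probability and pattern-dummy idea mentioned above, assuming long-format data with hypothetical columns subject, wave (1-5), and y (one item or the sum score); the logistic model and the variable names are illustrative, not MP's actual specification.

# Long format assumed, sorted by subject and wave.
d <- d[order(d$subject, d$wave), ]

# Indicator: is this subject's *next* wave missing (dropout after this wave)?
d$next_missing <- ave(as.numeric(is.na(d$y)), d$subject,
                      FUN = function(m) c(m[-1], NA))   # NA at the last wave

# Probability of dropping out at the next wave, given the current score
drop_mod <- glm(next_missing ~ y, data = d, family = binomial)
d$p_drop <- predict(drop_mod, newdata = d, type = "response")

# Pattern-mixture-style dummy: last wave actually observed, per subject
obs <- !is.na(d$y)
last_obs <- tapply(d$wave[obs], d$subject[obs], max)
d$pattern <- factor(last_obs[as.character(d$subject)])

# 'pattern' (and/or p_drop) can then enter the longitudinal model as a
# covariate, e.g. y ~ wave * pattern + ... in the mixed model.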
In reply to this post by torvon
Torvon,
I understand that your posting started as a multiple-imputation question, I assume because you have missing data. In addition to that problem, you have an ordinal-variable measurement model at five time points, with both time-constant and time-varying covariates. You say, "Dropouts occur for people who have higher score on response variables at measurement point before dropout," but I don't think that necessarily means the data are 'not missing at random'. It could be that the data are 'missing at random', since the dropout is predicted by *observed* earlier scores. My suggestion is to abandon SPSS and, if you possibly can, use Mplus; it was built for problems like this. If not Mplus, then HLM or MLwiN or Mx or an R routine.

Gene Maguin
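Gene names Mplus, HLM, MLwiN, Mx, or an R routine. As one hedged illustration of the R route, a latent growth model for a single ordinal item could be specified in lavaan roughly as below. The wide-format names (y1_t1 ... y1_t5 for item y1 at the five waves, x1, data frame d) are hypothetical, and WLSMV estimation with pairwise-present data is only one of several defensible choices given the missingness.

library(lavaan)

# Wide data assumed: y1_t1..y1_t5 = item y1 at waves 1-5, x1 = baseline predictor.
lgm <- '
  i =~ 1*y1_t1 + 1*y1_t2 + 1*y1_t3 + 1*y1_t4 + 1*y1_t5   # latent intercept
  s =~ 0*y1_t1 + 1*y1_t2 + 2*y1_t3 + 3*y1_t4 + 4*y1_t5   # latent linear slope
  i ~ x1      # does x1 predict the starting level?
  s ~ x1      # does x1 predict the increase over time?
'

fit <- growth(lgm, data = d,
              ordered = paste0("y1_t", 1:5),   # WLSMV for ordinal indicators
              missing = "pairwise")            # FIML is unavailable with WLSMV
summary(fit, fit.measures = TRUE)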
In reply to this post by torvon
See below.
> - 9 response variables, ordinal (0,1,2,3), intercorrelated (between .2 and .3). [...]
> One could argue, however, that they could be considered continuous (0=at no point
> during the last 2 weeks, 1= 2 days, 2= 4 days, 3=nearly every day).
> - 5 measurement points
> - 800 subjects. [...]

You can argue that they are continuous, and it is probably a pretty good approximation that these represent equal intervals, so they could be used as they are. But you have a whole lot of data across time, and you could find a more precise scaling from the data, or find out whether the intervals are close enough to "equal" that it doesn't matter. Item Response Theory has methods for this. But I would start by (for instance) taking Item 1 as "Groups" in a Discriminant Function and looking at the distances between the resulting group means when classified by the other 8 items. And so on.

On the other hand, it seems to me that you are wasting a lot of time and attention on items when you don't have an overall picture of the data, and you have more serious problems, like censoring. The proper regard of "differences among items" is to assume that (a) everything is basically the same, and so (b) after the overall picture is established, you *might* look to see whether separate items show a *significant* deviation from the overall pattern, based on logic and meaning. Don't try to run 100 tests and then massage some sense out of the random ones that happen to differ in p-value from each other.

> - missing data: data not missing at random. Dropouts occur for people who have
> higher score on response variables at measurement point before dropout. [...]
> First measurement point 3% missing on DV, 5th measurement point 40% missing on DVs.

Missing: Is there *any* missingness other than censoring from premature ending? With 800 cases, it sounds like you have enough that you might profitably construct a first model based on "number of periods available" -- especially since you do seem to indicate that there will be differences between these cohorts on both the demographic and dependent variables. (A rough sketch of this idea follows below.)
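A minimal sketch of the "number of periods available" idea, assuming hypothetical long-format columns subject, wave, and y; the cohort construction and the baseline comparison are illustrative only.

# How many waves does each subject actually have?
n_obs <- tapply(!is.na(d$y), d$subject, sum)
d$cohort <- factor(n_obs[as.character(d$subject)],
                   levels = 1:5, labels = paste0("waves_", 1:5))

# Do the dropout cohorts already differ at baseline?
summary(aov(y ~ cohort, data = d[d$wave == 1, ]))

# A first look per cohort (single-wave cohorts have no estimable slope):
by(d, d$cohort, function(sub) coef(lm(y ~ wave, data = sub)))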
> *Research Question:*
>
> - Does predictor x1 have differential effects on the outcome variables? This
> is exploratory. [...] usually people use the sum-score of y1 - y9 and just
> calculate ONE (e.g.) regression from x1 to Y(total).

Look first at the cross-time reliability of the items, separately. You may find that half of the separate items look too unreliable to trust, compared to the other half. If your "unreliable items" give as many suggestive results as your "reliable items" do, that's a pretty good sign that you are finding random variation. Computing 3 subfactors from the 9 scores is another route that is potentially more useful than treating the items separately. (A rough sketch of both checks follows below.)

> *Models*
>
> (1) Now, one could use 9 univariate tests (repeated measurement GLMM ...) [...]
> (2) The second option is running multivariate models [...]
>
> I'd be happy about any kind of input how I could try to answer my research
> question.
>
> Thanks.

-- Rich Ulrich
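A rough sketch of the reliability check and the 3-subfactor idea, using the psych package; the wide-format column names (y1_t1 etc.) and the use of polychoric correlations for the 0-3 items are assumptions made for illustration.

library(psych)

# Wide data assumed: columns y<i>_t<w>, item i = 1..9, wave w = 1..5.
# Cross-time stability per item: mean Spearman correlation of each
# item with itself across the five waves.
item_stability <- sapply(1:9, function(i) {
  waves <- d[, paste0("y", i, "_t", 1:5)]
  r <- cor(waves, use = "pairwise.complete.obs", method = "spearman")
  mean(r[lower.tri(r)])
})
names(item_stability) <- paste0("y", 1:9)
round(sort(item_stability), 2)   # which items look too unreliable to trust?

# Three subfactors from the 9 baseline items (polychoric, given the 0-3 scale)
items_t1 <- d[, paste0("y", 1:9, "_t1")]
fa(polychoric(items_t1)$rho, nfactors = 3, fm = "ml", rotate = "oblimin")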