In response to my last multiple imputation request, Art encouraged me to write down my problem in detail. He also contributed a multitude of questions, most of which I will try to answer here.
*Design & sample:*

- 9 response variables, ordinal (0, 1, 2, 3), intercorrelated (between .2 and .3). They are 9 single questions from a psychological screening instrument and load onto one factor. I suspect great heterogeneity, though, and want to look at these items individually (yes, they are just single questions and were never designed to be used independently, but that's what I will have to do, for lack of other available data). The DVs are skewed: about 40% have 0 (no problems), 30% have 1, 20% have 2, and 10% have 3 (problems nearly every day). Obviously, since they are ordinal, I cannot log-transform them. One could argue, however, that they could be considered continuous (0 = at no point during the last 2 weeks, 1 = 2 days, 2 = 4 days, 3 = nearly every day).
- 5 measurement points.
- 800 subjects. No groups, so every person has 1 data point on each response variable at each measurement point.
- Every subject experienced major stress while the study ran, so overall the response variables increase drastically (some more, some less).
- 10 baseline covariates that are all interesting in terms of explaining the increase of the response variables over time (e.g., personality facets, gender).
- Time-varying covariates (e.g., workload between this and the last measurement point).
- Missing data: the data are not missing at random. Dropouts occur for people who have higher scores on the response variables at the measurement point before dropout, which is typical for a psychological study. Missingness occurs on all variables, including the DVs: 3% missing on the DVs at the first measurement point, 40% at the 5th.

*Research Question:*

- Does predictor x1 have differential effects on the outcome variables? This is exploratory. E.g., x1 could affect only y1, y4, y5, and y6, and x2 only y5, whereas x3 affects only y1-y4 and y6-y8. This is unclear as yet, because usually people use the sum score of y1-y9 and calculate just ONE regression (e.g.) from x1 to Y(total).

*Models*

(1) One could use 9 univariate tests (repeated-measures GLMM, currently in SPSS 20, with an AR1 structure and random effects "subject" and "time") and predict each of y1 to y9 from x1 to x15. But that (a) doesn't account for the fact that the response variables are correlated, and (b) invites Type I error. I did this as a first step, however, and found that some x predict only some y, whereas some x predict all y, so this seems worth exploring further. (I might eventually have to do it this way, because multivariate response models with 9 outcomes seem to be impossible to compute.)

(2) The second option is running multivariate models and going for interaction effects between the predictors and the multivariate response. I'm currently trying to do this in R (MCMCglmm; a sketch follows below), but it's pretty hard to set up the priors, and the interpretation is messy in a model with 15 predictors * Y(multivariate).

I'd be happy about any kind of input on how I could try to answer my research question. Thanks -- T
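For option (2), a minimal MCMCglmm sketch of the multivariate ordinal model described above might look as follows. The names (y1-y9, x1, subject, data frame d) follow the post; the prior, the fixed residual structure, and the chain settings are illustrative assumptions, not a tested specification.

library(MCMCglmm)

# Wide format assumed: one row per subject x occasion, columns y1..y9
# (ordinal 0-3), predictors x1..x15, and a subject identifier.
k <- 9

# For ordinal responses the residual (co)variances are not identified,
# so the R-structure is conventionally fixed (here to an identity matrix).
prior <- list(
  R = list(V = diag(k), fix = 1),
  G = list(G1 = list(V = diag(k), nu = k))
)

m <- MCMCglmm(
  cbind(y1, y2, y3, y4, y5, y6, y7, y8, y9) ~ trait - 1 + trait:x1,  # one intercept and one x1 slope per item
  random = ~ us(trait):subject,    # correlated subject effects across the 9 items
  rcov   = ~ us(trait):units,      # correlated residuals across the 9 items
  family = rep("ordinal", k),
  prior  = prior,
  data   = d,                      # hypothetical data frame
  nitt   = 60000, burnin = 10000, thin = 25
)
summary(m)   # posterior for the per-item effects of x1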
Below are my knee-jerk responses to these issues:
Matthew J Poes
Research Data Specialist
Center for Prevention Research and Development, University of Illinois
510 Devonshire Dr., Champaign, IL 61820
Phone: 217-265-4576
email: [hidden email]

> - 9 response variables, ordinal (0,1,2,3), intercorrelated (between .2 and .3). [...]
> One could argue, however, that they could be considered continuous [...]

MP: You could use these as linear continuous variables, but it seems that this is not necessarily a tenable assumption. You could also treat them as a set of dummy-coded categorical variables, which may introduce less bias into the estimates.

MP: When you say response variable, you mean this is your DV? If so, treating them separately means running 9 separate analyses, which is a bit crazy. On top of that, error in single-item response questions used as DVs is going to be high and will create problems for the analysis. I personally would recommend against doing this. I think SEM would be a better approach if this is what you want to do.

> - 10 baseline covariates that are all interesting in terms of explaining the
> increase of response variables over time (e.g. personality facets, gender)

MP: Are these covariates important explanatory variables, or true covariates? If they are important explanatory variables, you may want to develop a set of hypotheses about how each is expected to affect the outcome and what that function would look like. Besides being important for interpretation, this may reduce the modeling complexity somewhat.

MP: Are any of these expected to interact with any others, or with any other factors in the model?

> - time-varying covariates (e.g. workload between this and last measurement point)

MP: That's fine; just make sure you put them in the right location in your model.

> - missing data: data not missing at random. [...] First measurement point 3%
> missing on DV, 5th measurement point 40% missing on DVs.

MP: You have a bunch of issues to work on here. First, NMAR means you need to add to your model the predictive reason for the missing data -- in this case, the stress score at the previous time point.
MP: I might consider, for each time point, a probability score that the next time point will be missing. I'd also consider adding a dummy variable for the missing cases. I've seen various large-scale models of psychological distress with large amounts of missing data at the later time points make it to publication; in fact, this was a specialty of a past instructor of mine. HLM itself will handle this fine, but the accuracy and generalizability of the estimates decrease at the time points with more missing data. (A rough sketch of the dropout-probability and pattern-dummy idea follows below.)

MP: I'd also look into some of the other approaches to NMAR data. I'm not an expert, but I understand pattern-mixture modeling to be a common and good approach.

MP: Little suggests using last observation carried forward as a pattern-mixture model. That would seem like the best approach to me, though I'm really not an expert on this.

> *Research Question:* - Does predictor x1 have differential effects on the outcome
> variables? [...] usually people use the sum-score of y1 - y9 and just calculate
> ONE (e.g.) regression from x1 to Y(total).

MP: And they probably do this for a reason. As I said, there is an implicit idea behind summative scales: they reduce the error in measuring the construct that the set of items collectively measures, which no single item can measure accurately.

MP: Have you considered a factor analysis of the items to see whether they load as you intend? Even if they don't, what about combining the items into summative scales as shown above? Doing 9 separate models is just nuts, and the accuracy of the estimates will not be very good.

> *Models* (1) Now, one could use 9 univariate tests [...] But that (a) doesn't
> control for the fact that the response variables are correlated, and (b) invites
> type-I error. [...]

MP: Type I error would be reduced by correctly applying a correction for the fact that you are running so many tests.

MP: The correlation among the outcomes would be addressed if you used a sum scale as I've suggested, but the correlation isn't so high as to matter much either.

MP: To fully account for things as you seem to want to, the only thing that makes sense to me is a fully specified SEM, which will account for measurement error, correlation among terms, and so on.
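A minimal sketch of the dropout-probability and pattern-dummy idea mentioned above, assuming long-format data with hypothetical columns subject, wave (1-5), and y (one item or the sum score); the logistic model and the variable names are illustrative, not MP's actual specification.

# Long format assumed, sorted by subject and wave.
d <- d[order(d$subject, d$wave), ]

# Indicator: is this subject's *next* wave missing (dropout after this wave)?
d$next_missing <- ave(as.numeric(is.na(d$y)), d$subject,
                      FUN = function(m) c(m[-1], NA))   # NA at the last wave

# Probability of dropping out at the next wave, given the current score
drop_mod <- glm(next_missing ~ y, data = d, family = binomial)
d$p_drop <- predict(drop_mod, newdata = d, type = "response")

# Pattern-mixture-style dummy: last wave actually observed, per subject
obs <- !is.na(d$y)
last_obs <- tapply(d$wave[obs], d$subject[obs], max)
d$pattern <- factor(last_obs[as.character(d$subject)])

# 'pattern' (and/or p_drop) can then enter the longitudinal model as a
# covariate, e.g. y ~ wave * pattern + ... in the mixed model.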
In reply to this post by torvon
Torvon,
I understand that your posting started as a multiple-imputation question, I assume because you have missing data. In addition to that problem, you have an ordinal-variable measurement model at five time points, with both time-constant and time-varying covariates. You say, "Dropouts occur for people who have higher score on response variables at measurement point before dropout," but I don't think that necessarily means the data are 'not missing at random'. It could be that the data are 'missing at random', since the dropout is predicted by *observed* earlier scores. My suggestion is to abandon SPSS and, if you possibly can, use Mplus; it was built for problems like this. If not Mplus, then HLM or MLwiN or Mx or an R routine.

Gene Maguin
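Gene names Mplus, HLM, MLwiN, Mx, or an R routine. As one hedged illustration of the R route, a latent growth model for a single ordinal item could be specified in lavaan roughly as below. The wide-format names (y1_t1 ... y1_t5 for item y1 at the five waves, x1, data frame d) are hypothetical, and WLSMV estimation with pairwise-present data is only one of several defensible choices given the missingness.

library(lavaan)

# Wide data assumed: y1_t1..y1_t5 = item y1 at waves 1-5, x1 = baseline predictor.
lgm <- '
  i =~ 1*y1_t1 + 1*y1_t2 + 1*y1_t3 + 1*y1_t4 + 1*y1_t5   # latent intercept
  s =~ 0*y1_t1 + 1*y1_t2 + 2*y1_t3 + 3*y1_t4 + 4*y1_t5   # latent linear slope
  i ~ x1      # does x1 predict the starting level?
  s ~ x1      # does x1 predict the increase over time?
'

fit <- growth(lgm, data = d,
              ordered = paste0("y1_t", 1:5),   # WLSMV for ordinal indicators
              missing = "pairwise")            # FIML is unavailable with WLSMV
summary(fit, fit.measures = TRUE)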
In reply to this post by torvon
See below.
> - 9 response variables, ordinal (0,1,2,3), intercorrelated (between .2 and .3). [...]
> One could argue, however, that they could be considered continuous (0=at no point
> during the last 2 weeks, 1= 2 days, 2= 4 days, 3=nearly every day).
> - 5 measurement points
> - 800 subjects. [...]

You can argue that they are continuous, and it is probably a pretty good approximation that these represent equal intervals, so they could be used as they are. But you have a whole lot of data across time, and you could find a more precise scaling from the data, or find out whether the intervals are close enough to "equal" that it doesn't matter. Item Response Theory has methods for this. But I would start by (for instance) taking Item 1 as "Groups" in a Discriminant Function and looking at the distances between the resulting group means when classified by the other 8 items. And so on.

On the other hand, it seems to me that you are wasting a lot of time and attention on items when you don't have an overall picture of the data, and you have more serious problems, like censoring. The proper regard of "differences among items" is to assume that (a) everything is basically the same, and so (b) after the overall picture is established, you *might* look to see whether separate items show a *significant* deviation from the overall pattern, based on logic and meaning. Don't try to run 100 tests and then massage some sense out of the random ones that happen to differ in p-value from each other.

> - missing data: data not missing at random. Dropouts occur for people who have
> higher score on response variables at measurement point before dropout. [...]
> First measurement point 3% missing on DV, 5th measurement point 40% missing on DVs.

Missing: Is there *any* missingness other than censoring from premature ending? With 800 cases, it sounds like you have enough that you might profitably construct a first model based on "number of periods available" -- especially since you do seem to indicate that there will be differences between these cohorts on both the demographic and dependent variables. (A rough sketch of this idea follows below.)
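A minimal sketch of the "number of periods available" idea, assuming hypothetical long-format columns subject, wave, and y; the cohort construction and the baseline comparison are illustrative only.

# How many waves does each subject actually have?
n_obs <- tapply(!is.na(d$y), d$subject, sum)
d$cohort <- factor(n_obs[as.character(d$subject)],
                   levels = 1:5, labels = paste0("waves_", 1:5))

# Do the dropout cohorts already differ at baseline?
summary(aov(y ~ cohort, data = d[d$wave == 1, ]))

# A first look per cohort (single-wave cohorts have no estimable slope):
by(d, d$cohort, function(sub) coef(lm(y ~ wave, data = sub)))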
> *Research Question:*
>
> - Does predictor x1 have differential effects on the outcome variables? This
> is exploratory. [...] usually people use the sum-score of y1 - y9 and just
> calculate ONE (e.g.) regression from x1 to Y(total).

Look first at the cross-time reliability of the items, separately. You may find that half of the separate items look too unreliable to trust, compared to the other half. If your "unreliable items" give as many suggestive results as your "reliable items" do, that's a pretty good sign that you are finding random variation. Computing 3 subfactors from the 9 scores is another route that is potentially more useful than treating the items separately. (A rough sketch of both checks follows below.)

> *Models*
>
> (1) Now, one could use 9 univariate tests (repeated measurement GLMM ...) [...]
> (2) The second option is running multivariate models [...]
>
> I'd be happy about any kind of input how I could try to answer my research
> question.
>
> Thanks.

-- Rich Ulrich
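A rough sketch of the reliability check and the 3-subfactor idea, using the psych package; the wide-format column names (y1_t1 etc.) and the use of polychoric correlations for the 0-3 items are assumptions made for illustration.

library(psych)

# Wide data assumed: columns y<i>_t<w>, item i = 1..9, wave w = 1..5.
# Cross-time stability per item: mean Spearman correlation of each
# item with itself across the five waves.
item_stability <- sapply(1:9, function(i) {
  waves <- d[, paste0("y", i, "_t", 1:5)]
  r <- cor(waves, use = "pairwise.complete.obs", method = "spearman")
  mean(r[lower.tri(r)])
})
names(item_stability) <- paste0("y", 1:9)
round(sort(item_stability), 2)   # which items look too unreliable to trust?

# Three subfactors from the 9 baseline items (polychoric, given the 0-3 scale)
items_t1 <- d[, paste0("y", 1:9, "_t1")]
fa(polychoric(items_t1)$rho, nfactors = 3, fm = "ml", rotate = "oblimin")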