SPSSX Discussion

mixed-effects model for longitudinal data interpretation

Anton Balabanov

Dear listers,

My question is rather a statistical one, but I have included the relevant
SPSS syntax in the message. The question is long and is split into several
parts; I would really appreciate it if someone could explain at least some
of the points to me.

I deal with longitudinal data where the level-2 units are households and the
level-1 observations are repeated measurements of the same household. One
paper relevant to my research domain suggests using mixed-effects regression
models for this kind of data
(B. Gianni et al., Panel regression models for measuring multidimensional
poverty dynamics, Statistical Methods and Applications, Springer-Verlag,
2003, 11: 259-369).

They used a mixed-effects model of this form:
Yit = b0 + b1*time + b2*x1 + ... + bk*xk + ui + eit, where
Yit - DV for household (HH) i at time t,
b0 - constant intercept,
time - a polynomial in time representing the effect of time,
x1 ... xk - time-varying covariates,
ui - HH-specific intercept (i.e., random intercept),
eit - error term with an autocorrelation structure: eit = rho*ei,t-1 + dit,
where rho is the autocorrelation coefficient and dit is a purely random error.

This is (I believe) a mixed-effects regression model with one random effect
(the HH intercept) and an AR(1) error structure.
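To make concrete what a random intercept plus AR(1) errors implies for the data, the correlation between two observations of the same household (given the fixed effects) can be written out. Below is a minimal Python sketch, used only for illustration; `implied_corr` is a hypothetical helper, and it assumes stationary AR(1) errors with innovation variance `sigma_d2` and random-intercept variance `tau2`:

```python
def implied_corr(tau2, sigma_d2, rho, lag):
    """Correlation between two observations of the same household that are
    `lag` occasions apart, given the fixed effects, under
        Yit = fixed part + ui + eit,  Var(ui) = tau2,
        eit = rho*ei,t-1 + dit,       Var(dit) = sigma_d2 (stationary AR(1))."""
    sigma_e2 = sigma_d2 / (1 - rho ** 2)      # stationary variance of eit
    cov = tau2 + sigma_e2 * rho ** abs(lag)   # ui is shared at every lag; eit decays as rho**lag
    return cov / (tau2 + sigma_e2)


# Settings of the simulation in this message: sd(ui) = 3, sd(dit) = 5, rho = 0.5
for k in range(4):
    print(k, round(implied_corr(9.0, 25.0, 0.5, k), 3))
```

With rho = 0 the function returns tau2/(tau2 + sigma_d2) at every lag, which is the ICC formula discussed later in this message; with rho > 0 the correlation decays towards tau2/(tau2 + sigma_e2) as the lag grows.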

As I have never used such complex models, I decided first to simulate some
data and play with it to get a "feel" for how it works. I tried to generate
a similar model with 500 HH and 4 timepoints of observation (variable
t = 0, 1, 2, 3), thus 2000 observations in total, of which 20% are then
randomly dropped to simulate sample attrition; one random intercept
(variable intercp), one fixed effect (variable x) and error autocorrelation
rho = 0.5. The syntax for the data generation is as follows.
Here is my first question:

******** 1. DOES IT DO WHAT I WANT? ********.

/* Set the random number seed so that the generated data are reproducible */
SET SEED = 15082006.

INPUT PROGRAM.
/* 500 households, or level-2 observations */
LOOP id=1 to 500.

/* Generate the random intercept (household difference) for each household */
COMPUTE intercp=RV.NORMAL(20,3).

/* 4 timepoints, or 500*4 = 2000 level-1 observations */
LOOP t=0 to 3.
LEAVE id intercp.

/* Binary time-varying covariate, used as a fixed effect */
COMPUTE x=RND(RV.UNIFORM(0,1)).

/* The time 0 observation is normally distributed with standard deviation */
/* SQRT(Var(dit)/(1-rho**2)), the stationary SD of an AR(1) error */
/* I took this assumption for time 0 from G. Betti et al., 2002, */
/* though I think it is not of great importance for my questions */
IF (t=0) model=4*x+intercp.
IF (t=0) y=RV.NORMAL(model,SQRT(25/(1-0.25))).
/* err holds the full model error eit = y - (3*t + intercp + 4*x) */
IF (t=0) err=y-model.

/* Later observations carry over the model error from the previous */
/* timepoint with rho=0.5, plus an independent error with SD 5 */
IF (t>0) model=3*t+intercp+4*x+0.5*LAG(err).
IF (t>0) y=model+RV.NORMAL(0,5).
IF (t>0) err=y-(3*t+intercp+4*x).

COMPUTE attrition=RV.UNIFORM(0,1).
END CASE.
END LOOP.
END LOOP.
END FILE.
END INPUT PROGRAM.
/* Delete approx. 20% of the level-1 observations */
SELECT IF (attrition<0.8).

**************************************************.
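For checking the SPSS generator, the same data-generating process (random intercept N(20, 3), binary covariate, trend 3*t, AR(1) errors with rho = 0.5 and innovation SD 5, stationary start at time 0, and 20% random attrition) can be sketched in Python. This is an illustrative sketch, not SPSS output: the Python random stream differs from SPSS's, so the seed will not reproduce the same values, and the function name `simulate` is hypothetical. The point it shows is that the AR(1) recursion is applied to the full error eit, which is kept in each row so that its autocorrelation can be checked.

```python
import random


def simulate(n_hh=500, n_t=4, rho=0.5, sd_d=5.0, keep_prob=0.8, seed=15082006):
    """Simulate y_it = 3*t + u_i + 4*x_it + e_it with a random intercept
    u_i ~ N(20, 3) and AR(1) errors e_it = rho*e_i,t-1 + d_it, d_it ~ N(0, sd_d).
    Each row is then kept with probability keep_prob (random attrition).
    Returns a list of (household, t, x, y, e) tuples."""
    rng = random.Random(seed)
    rows = []
    for hh in range(1, n_hh + 1):
        u = rng.gauss(20, 3)                            # household intercept
        # time 0: draw from the stationary distribution of the AR(1) error
        e = rng.gauss(0, sd_d / (1 - rho ** 2) ** 0.5)
        for t in range(n_t):
            if t > 0:
                e = rho * e + rng.gauss(0, sd_d)        # AR(1) recursion on the full error
            x = rng.randint(0, 1)                       # binary time-varying covariate
            y = 3 * t + u + 4 * x + e
            if rng.random() < keep_prob:                # simulated attrition
                rows.append((hh, t, x, y, e))
    return rows
```

Computing the lag-1 correlation of the stored errors within households should give a value close to 0.5. Running the same check on `y - (3*t + intercp + 4*x)` in the SPSS file is one way to verify that the generator produces AR(1) errors.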

Now I switch to the analysis. First, I fit the model without correlated
errors, using the MIXED procedure.

*****************.
/* 1. First, fit the model Yit = b0 + b1*t + b2*x + b0i + eit, */
/* where b0i is a random intercept (HH specific) */
/* NO ERROR AUTOCORRELATION INTRODUCED */
MIXED
y WITH t x
/CRITERIA = SCORING(3)
/FIXED = intercept t x
/RANDOM = intercept |SUBJECT(id)
/METHOD = ML
/PRINT = SOLUTION COVB G R
/SAVE PRED.
*****************.
All effects are significant. As fit statistics for the model, SPSS reports
the -2 log-likelihood and several information criteria based on it (AIC,
BIC, etc.). These can be used to compare a more complex model with simpler
ones. My second question is:

************* 2. But how can one check the explanatory power of the model,
i.e., how well the model fits the data? I mean some metric similar to
R-squared in multiple regression. R-squared is not computed here (and I
guess it cannot be). But in multiple regression, if I save the predicted
values, then compute their correlation with the original DV and square the
coefficient, I get R-squared. May I do the same in my case? Could this
"R-squared" be used as a rough metric of model fit?

CORRELATIONS /VARIABLES=y PRED_1.
************************************************.
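In ordinary least-squares regression with an intercept, squaring the correlation between observed and predicted values reproduces R-squared exactly, so the proposed check is easy to compute. A minimal Python sketch follows (the helper name `squared_corr` is hypothetical). For a mixed model it is only a descriptive summary, and its value depends on which predictions are used: as far as I know, SPSS MIXED's /SAVE PRED includes the predicted random effects, while /SAVE FIXPRED uses the fixed effects only.

```python
def squared_corr(observed, predicted):
    """Square of the Pearson correlation between observed and predicted values,
    i.e. what CORRELATIONS /VARIABLES=y PRED_1 gives once squared."""
    n = len(observed)
    mo = sum(observed) / n
    mp = sum(predicted) / n
    sxy = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    sxx = sum((o - mo) ** 2 for o in observed)
    syy = sum((p - mp) ** 2 for p in predicted)
    return sxy ** 2 / (sxx * syy)
```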

Continuing with the analysis: if we fit the same model in the freeware MIXREG
(by D. Hedeker and R. Gibbons, http://tigger.uic.edu/~hedeker/mix.html), it
computes the intra-cluster correlation (ICC) as follows:

ICC = Var(random intercept) / (Var(random intercept) + Var(residual)).

That is, the ICC here appears to be the proportion of the variance not
accounted for by the fixed effects that is attributable to the random
intercept. SPSS MIXED does not compute the ICC, but it estimates all the
necessary variances. Third question:

************************ 3. Is it correct to interpret this ICC as a measure
of the importance of permanent unobserved heterogeneity between households?
(The unobserved heterogeneity itself is estimated for each household as its
random intercept.) **********.
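The ICC calculation from the two variance estimates is a one-liner; a minimal Python sketch is below (the function name `icc` and the example values are hypothetical). It assumes a random-intercept model with independent residuals; with AR(1) residuals the correlation between two occasions of the same household also depends on how far apart they are.

```python
def icc(var_intercept, var_residual):
    """Intra-cluster correlation for a random-intercept model with independent
    residuals: the share of the variance left after the fixed effects that is
    due to the household-specific intercept."""
    return var_intercept / (var_intercept + var_residual)


# Hypothetical example: plug in the intercept variance and the residual
# variance estimated by MIXED for the model without AR(1) errors.
print(round(icc(9.0, 25.0), 3))
```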

Finally, let's introduce the AR(1) structure for the errors. The syntax is:

/* 2. Now split the household error term into an autocorrelated component */
/* and a purely independent error, as in the data simulation */
/* Thus, fit the model Yit = b0 + b1*t + b2*x + b0i + eit, */
/* where eit = rho*ei,t-1 + dit and rho is the autocorrelation coefficient */
/* AR(1) ERROR STRUCTURE INTRODUCED */
MIXED
y WITH t x
/CRITERIA = SCORING(3)
/FIXED = intercept t x
/RANDOM = intercept |SUBJECT(id)
/REPEATED = t |SUBJECT(id) COVTYPE(AR1)
/METHOD = ML
/PRINT = SOLUTION COVB G R
/SAVE PRED.
Happily, all effects are significant and the estimates resemble the
simulated coefficients (rho is estimated as 0.476 vs. 0.5 simulated). The
decrease in -2 log-likelihood from the previous model is significant by the
chi-square test (if models with and without the AR(1) error structure can be
compared in this way at all). But! If we check the correlations between the
DV and the predictions from models 1 and 2, the correlation with the first
prediction is about 0.1 higher than with the second!
CORRELATIONS /VARIABLES=y PRED_1 PRED_2.
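The chi-square comparison can be computed by hand from the two "-2 Log Likelihood" values. The model without AR(1) errors is the special case rho = 0 of the AR(1) model, so the two models are nested; assuming the AR(1) structure adds exactly one parameter (rho) to the residual covariance, the test has 1 df. Both models above use METHOD = ML, so their likelihoods are comparable. A minimal Python sketch (the function name `lr_test_1df` is hypothetical):

```python
import math


def lr_test_1df(m2ll_simple, m2ll_complex):
    """Likelihood-ratio test of two nested models that differ by one parameter.

    Arguments are the '-2 Log Likelihood' values of the simpler and the more
    complex model, both fitted with METHOD = ML on the same cases.
    Returns (chi-square statistic, p-value)."""
    chi2 = m2ll_simple - m2ll_complex
    if chi2 <= 0:
        return chi2, 1.0
    # Upper tail of a chi-square with 1 df: P(X > c) = P(|Z| > sqrt(c)) = erfc(sqrt(c/2))
    return chi2, math.erfc(math.sqrt(chi2 / 2))
```

Usage: call `lr_test_1df(m2ll_model1, m2ll_model2)` with the two -2 log-likelihood values read from the SPSS output.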

Thus, here is my fourth (and last) question:

***********4. Why?? I am looking for an answer from a practical point of
view. I would expect a more complex model that fits the data better to give
better predictions, at least of the sample data! But this seems not to be
the case.
*****************.

Again, I would be very grateful to anyone who finds the time to understand
the problem and provide an answer (preferably one not overloaded with
mathematics).

All the very best,

Anton

Maguin, Eugene

Plotting

All,

This is a branching-off follow-up to my earlier question about how to get
GLM to produce a plot that I think the documentation says it can produce. I
have taken a different tack.

Predicted values can be saved from GLM, or the analysis can be recast as a
regression, run with the REGRESSION procedure, and the predicted values
saved from there. However, how do I get a plot that shows the within-group
regression line for each group? Background: my model has a single
between-subjects factor and one covariate. What I want to see is a plot of
the DV against the covariate for each value of my 'by' variable, all on a
single plot, so that I can see where the lines cross. I think I have run
through the options in SPSS, but maybe not. GRAPH will do scatterplots, but
there is no provision for a by variable and no provision for showing a
regression line. IGRAPH is basically the same; although a by variable can
be specified, it produces panels, which is useless to me.
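Outside SPSS, the quantity of interest can also be computed directly: fit an ordinary least-squares line of the DV on the covariate separately within each group, then solve for the covariate value where two lines meet. A minimal Python sketch; the function names `group_lines` and `crossing` are hypothetical. Note that within-group lines can only cross if the model allows different slopes, i.e. includes the factor-by-covariate interaction; predictions from a GLM with only the factor and the covariate as main effects give parallel lines.

```python
def group_lines(rows):
    """Fit y = a + b*x separately within each group by OLS.
    rows: iterable of (group, x, y). Returns {group: (intercept, slope)}."""
    data = {}
    for g, x, y in rows:
        data.setdefault(g, []).append((x, y))
    lines = {}
    for g, pts in data.items():
        n = len(pts)
        mx = sum(x for x, _ in pts) / n
        my = sum(y for _, y in pts) / n
        sxx = sum((x - mx) ** 2 for x, _ in pts)
        sxy = sum((x - mx) * (y - my) for x, y in pts)
        slope = sxy / sxx
        lines[g] = (my - slope * mx, slope)
    return lines


def crossing(line1, line2):
    """Covariate value where two lines a + b*x intersect (None if parallel)."""
    (a1, b1), (a2, b2) = line1, line2
    if b1 == b2:
        return None
    return (a2 - a1) / (b1 - b2)
```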

Thanks, Gene Maguin

Kornbrot, Diana

Re: Plotting

I do that a lot.
Once you have saved the predicted values:
Graphs > Interactive > Scatter
Main tab
Y-axis = predicted
X-axis = covariate
Colour [or line style if it's for publication and needs to be b&w] = factor variable
Options tab
top: choose regression as the function
bottom: tick separate lines; untick group
NB: these settings can be changed afterwards by double-clicking on the graph produced.

I agree 100% that it's ridiculous that you can't get this directly in GLM.

Best
Diana

(In reply to Gene Maguin's message of 15/8/06 20:00.)

Professor Diana Kornbrot
Evaluation Co-ordinator, Blended Learning Unit
University of Hertfordshire