Dear listers,
My question is mostly a statistical one, but I have included the relevant SPSS syntax in the message. The question is long and split into several parts, and I would really appreciate it if someone could explain at least some of them.

I deal with longitudinal data where the level-2 units are households and the level-1 units are repeated measurements of the same household. A paper relevant to my research domain suggests using mixed-effects regression models for this kind of data (B. Gianni et al., Panel regression models for measuring multidimensional poverty dynamics, Statistical Methods and Applications, Springer-Verlag, 2003, 11: 259-369). They used a mixed-effects model of this form:

Yit = b0 + b1*time + b2*x1 + ... + bk*xk + ui + eit

where
Yit - DV for household (HH) i at time t,
b0 - constant intercept,
time - the polynomial that represents the effect of time,
x1 ... xk - time-varying covariates,
ui - HH-specific intercept (i.e., random intercept),
eit - error term with autocorrelation structure eit = rho*ei,t-1 + dit, where rho is the autocorrelation coefficient and dit is a purely random error.

This is (I believe) a mixed-effects regression model with one random effect (the HH intercept) and an AR1 error structure.

As I have never used such complex models, I decided first to simulate some data and play with it to get a "feel" for how it works. I tried to generate a similar model with 500 HH and 4 time points of observation (variable t = 0, 1, 2, 3), i.e. 2000 observations in total, of which about 20% are then randomly dropped to simulate sample attrition; one random intercept (variable intercp), one fixed effect (variable x), and autocorrelated errors with rho = 0.5. The data-generation syntax follows. Here is my first question:

******** 1. DOES IT GET WHAT I WANT? ********

INPUT PROGRAM.
/* Set the random generator seed so that the generated data are reproducible */
SET SEED = 15082006.
/* 500 households, or level-2 observations */
LOOP id=1 to 500.
/* Generate the random intercept (household difference) for each household */
COMPUTE intercp=RV.NORMAL(20,3).
/* 4 time points, or 500*4 = 2000 level-1 observations */
LOOP t=0 to 3.
LEAVE id intercp.
/* A time-varying covariate, to be used as a fixed effect */
COMPUTE x=RND(RV.UNIFORM(0,1)).
/* The observation at time 0 is simply normally distributed, with standard */
/* deviation SQRT(Var(d)/(1-rho**2)), where d is the independent error. */
/* I took this special assumption for time 0 from G. Betti et al., 2002, */
/* though I think it is not very important for my questions. */
IF (t=0) model=4*x+intercp.
IF (t=0) y=RV.NORMAL(model,SQRT(25/(1-0.25))).
/* Later observations carry over the model error from the */
/* previous time point with rho=0.5... */
IF (t=1) model=3*t+intercp+4*x+0.5*(lag(y)-lag(model)).
/* ...plus an independent error */
IF (t=1) y=model+RV.NORMAL(0,5).
IF (t=2) model=3*t+intercp+4*x+0.5*(lag(y)-lag(model)).
IF (t=2) y=model+RV.NORMAL(0,5).
IF (t=3) model=3*t+intercp+4*x+0.5*(lag(y)-lag(model)).
IF (t=3) y=model+RV.NORMAL(0,5).
COMPUTE attrition=RV.UNIFORM(0,1).
END CASE.
END LOOP.
END LOOP.
END FILE.
END INPUT PROGRAM.
/* Delete approx. 20% of the level-1 observations */
SELECT IF (attrition<0.8).
**************************************************.

Now I switch to the analysis. First, I would like to fit the model without correlated errors, using the MIXED procedure.

*****************.
/* 1. First, fit the model Yit = b0 + b1*t + b2*x + b0i + eit, */
/* where b0i is a random (HH-specific) intercept. */
/* NO ERROR AUTOCORRELATION INTRODUCED */
MIXED y WITH t x
 /CRITERIA = SCORING(3)
 /FIXED = intercept t x
 /RANDOM = intercept | SUBJECT(id)
 /METHOD = ML
 /PRINT = SOLUTION COVB G R
 /SAVE PRED.
*****************.

All effects are significant. As fit statistics for the model, SPSS returns the -2 log-likelihood and several information criteria, which can sometimes be used to compare a more complex model with simpler ones. My second question is:

************* 2.
But how can one check the explanatory power of the model, i.e. how well it fits the data? I mean some metric similar to R-squared in multiple regression. R-squared is not computed here (and I guess it cannot be computed). But in multiple regression, if I save the predicted values, correlate them with the original DV and square the coefficient, I get R-squared. May I do the same in my case? Could this "R-squared" be used as a rough metric of model fit?

CORRELATIONS /VARIABLES=y PRED_1.
************************************************.

Continuing with the analysis: if we fit the same model in the freeware MIXREG (by D. Hedeker and R. Gibbons, http://tigger.uic.edu/~hedeker/mix.html), it reports the intra-cluster correlation (ICC), computed as:

ICC = Var(random intercept) / (Var(random intercept) + Var(residual))

That is, the ICC here seems to be the share of the variance left unexplained by the fixed effects that is accounted for by the random intercept. SPSS MIXED does not compute the ICC, but all the necessary variances are estimated. Third question:

************************ 3. Is it correct to interpret this ICC as a measure of the importance of permanent unobserved heterogeneity between households? (The unobserved heterogeneity itself is estimated for each household as a random intercept.) **********.

Finally, let's introduce the AR1 structure for the errors. The syntax is:

/* 2. Now split the error term into an autocorrelated component */
/* and a purely independent error, as in the data simulation. */
/* Thus, fit the model Yit = b0 + b1*t + b2*x + b0i + eit, */
/* with eit = rho*ei,t-1 + dit, where rho is the autocorrelation coefficient */
/* ERROR AR1 STRUCTURE INTRODUCED */
MIXED y WITH t x
 /CRITERIA = SCORING(3)
 /FIXED = intercept t x
 /RANDOM = intercept | SUBJECT(id)
 /REPEATED = t | SUBJECT(id) COVTYPE(AR1)
 /METHOD = ML
 /PRINT = SOLUTION COVB G R
 /SAVE PRED.
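The two summary quantities in questions 2 and 3 can be written out directly. The sketch below is plain Python (standard library only, not SPSS), and the function names are made up for illustration. It checks the fact the squared-correlation idea relies on: in least-squares regression with an intercept, R-squared equals the squared correlation between observed and fitted values. It also computes the MIXREG-style ICC and the occasion-to-occasion correlation implied by a random intercept plus AR1 residuals, (Var(u) + Var(e)*rho^k) / (Var(u) + Var(e)) for occasions k apart, which falls towards the ICC as k grows. The example variances are the ones the simulation intends (intercept SD 3, independent error SD 5, so an AR1 residual variance of 25/(1-0.25)). They are not estimates from SPSS output.

```python
"""Summary quantities for questions 2 and 3 (illustrative sketch)."""


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / (saa * sbb) ** 0.5


def ols_line(x, y):
    """Intercept and slope of the least-squares line y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b


def r_squared(y, fitted):
    """Classical R-squared, 1 - SSE/SST."""
    my = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sst = sum((yi - my) ** 2 for yi in y)
    return 1 - sse / sst


def icc(var_u, var_e):
    """Var(random intercept) / (Var(random intercept) + Var(residual))."""
    return var_u / (var_u + var_e)


def occasion_corr_ar1(var_u, var_e, rho, lag):
    """Correlation of two occasions `lag` apart within one household, for a
    random intercept (variance var_u) plus AR1 residuals (variance var_e,
    correlation rho**lag between residuals)."""
    return (var_u + var_e * rho ** lag) / (var_u + var_e)


if __name__ == "__main__":
    # Made-up data: the identity holds for any least-squares fit with an intercept.
    x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    y = [2.3, 3.1, 6.0, 7.4, 9.9, 11.6]
    a, b = ols_line(x, y)
    fitted = [a + b * xi for xi in x]
    print("R2 =", r_squared(y, fitted), " r(y, fitted)^2 =", pearson(y, fitted) ** 2)

    # Values the simulation intends: Var(u) = 3**2, AR1 residual variance 25/(1-0.25).
    var_u, var_e, rho = 9.0, 25.0 / (1 - 0.25), 0.5
    print("ICC =", round(icc(var_u, var_e), 3))
    for k in (1, 2, 3):
        print("lag", k, "corr =", round(occasion_corr_ar1(var_u, var_e, rho, k), 3))
```

The R-squared identity is specific to least squares with an intercept. For a mixed model, the squared correlation between y and the saved predictions is at most a rough descriptive measure, and its value depends on whether the predictions include the random effects.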
Happily, all effects are significant and the estimates resemble the simulated coefficients (rho is estimated as 0.476 vs. 0.5 simulated). The decrease in -2 log-likelihood from the previous model is significant by the chi-square test (provided that models with and without the AR1 error structure can be compared this way). But! If we check the correlations between the DV and the predictions from models 1 and 2, the correlation with the first prediction is about 0.1 higher than with the second!

CORRELATIONS /VARIABLES=y PRED_1 PRED_2.

So here is my fourth (and last) question:

***********4. Why?? I am looking for an answer from a practical point of view. I would expect a more complex model that fits the data better to give better predictions, at least for the sample data! But that does not seem to be the case. *****************.

Again, I will be very grateful to anyone who finds the time to understand the problem and provide an answer (preferably not overloaded with mathematics).

All the very best,
Anton
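On question 1, one detail seems worth checking. If I read the INPUT PROGRAM correctly, model at t=1, 2, 3 already contains the AR term. Therefore lag(y)-lag(model) equals the full previous error only at t=1. At t=2 and t=3 it is just the previous fresh RV.NORMAL(0,5) draw. The simulated errors would then follow an AR(1) from t=0 to t=1 but an MA(1) afterwards. In theory, that process gives a correlation of 0.4 between t=2 and t=3 and 0 between t=1 and t=3, compared with 0.5 and 0.25 for a true AR(1) with rho=0.5. The sketch below simulates both recursions so the correlations can be compared. It is plain Python (standard library only), not SPSS, and it leaves out the fixed part and the random intercepts because they do not enter the error. The names simulate_errors, occasion_corr and the scheme labels are hypothetical.

```python
"""Compare the error recursion in the SPSS INPUT PROGRAM with a textbook AR(1).

Illustrative sketch only; fixed effects and random intercepts are omitted
because they do not enter the error term.
"""
import math
import random

RHO = 0.5                                      # intended autocorrelation
SD_D = 5.0                                     # SD of the fresh draw RV.NORMAL(0,5)
SD_0 = math.sqrt(SD_D ** 2 / (1 - RHO ** 2))   # SD used for the t=0 error


def simulate_errors(n, scheme, seed=15082006):
    """Return n error paths [e0, e1, e2, e3].

    scheme="spss": mimics model_t = fixed_t + rho*(lag(y) - lag(model)) and
        y_t = model_t + d_t. Because lag(model) already holds the AR term,
        lag(y) - lag(model) is e0 at t=1 but only d_(t-1) at t=2 and t=3.
    scheme="ar1": textbook AR(1), e_t = rho*e_(t-1) + d_t.
    """
    rng = random.Random(seed)
    paths = []
    for _ in range(n):
        e = [rng.gauss(0.0, SD_0)]
        carried = e[0]  # y0 - model0 is the whole time-0 error
        for _t in range(1, 4):
            d = rng.gauss(0.0, SD_D)
            if scheme == "spss":
                e.append(RHO * carried + d)
                carried = d  # y_t - model_t is only the fresh draw
            elif scheme == "ar1":
                e.append(RHO * e[-1] + d)
            else:
                raise ValueError("scheme must be 'spss' or 'ar1'")
        paths.append(e)
    return paths


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


def occasion_corr(paths, s, t):
    """Correlation between the errors at occasions s and t across households."""
    return pearson([p[s] for p in paths], [p[t] for p in paths])


if __name__ == "__main__":
    for scheme in ("spss", "ar1"):
        paths = simulate_errors(50_000, scheme)
        print(scheme,
              "corr(e2,e3) =", round(occasion_corr(paths, 2, 3), 3),
              "corr(e1,e3) =", round(occasion_corr(paths, 1, 3), 3))
```

If this check holds up, one untested way to get a true AR(1) in the INPUT PROGRAM would be to keep only the fixed part in model (3*t+intercp+4*x), store the full error in a separate, hypothetical variable err = y - model, and for t>0 compute y = model + 0.5*lag(err) + RV.NORMAL(0,5). The 0.476 estimate may partly reflect the current mix of AR(1) and MA(1) behaviour.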
All,
This is a branching-off follow-up to my earlier question about how to get GLM to produce the plot that I think the documentation says it can. I've taken a different tack.

Predicted values can be saved from GLM, or the analysis can be recast as a regression, solved with the REGRESSION procedure, and the predicted values saved from there. However, how do I get a plot that shows the within-group regression line for each group?

Background: my model has a single between-subjects factor and one covariate. So what I want to see is a plot of the DV against the covariate for each value of my 'by' variable, all on a single plot, so that I can see where the lines cross. I think I have run through the options in SPSS, but maybe not. GRAPH will do scatterplots, but there is no provision for a 'by' variable and none for showing a regression line. IGRAPH is basically the same, and although a 'by' variable can be specified, it produces panels. Useless to me.

Thanks,
Gene Maguin
I do that a lot
Once you have saved the predicted values:

Main tab: Graphs > Interactive > Scatter
 Y-axis = predicted
 X-axis = covariate
 Colour [or line style, if it's for publication and needs to be black and white] = factor variable
Options tab:
 top: choose regression as the fit function
 bottom: tick 'separate lines'; untick 'group'

NB: these can be changed afterwards by double-clicking on the graph produced.

I agree 100% with you that it's ridiculous that you can't get this in GLM directly.

Best
Diana

On 15/8/06 20:00, "Gene Maguin" <[hidden email]> wrote:
> [...]

Professor Diana Kornbrot
Evaluation Co-ordinator, Blended Learning Unit
University of Hertfordshire
College Lane, Hatfield, Hertfordshire AL10 9AB, UK
Blended Learning Unit voice +44 (0) 170 728 1315 fax +44 (0) 170 728 1320
Psychology voice +44 (0) 170 728 4626 fax +44 (0) 170 728 5073
email: [hidden email]
http://www.psy.herts.ac.uk/pub/D.E.Kornbrot/hmpage.html
Home
19 Elmhurst Avenue
London N2 0LT, UK
+44 (0) 208 883 3657
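For Gene's goal of seeing where the within-group lines cross, it may help to spell out what such a plot draws. With a regression fit line and separate lines per group, each line should be an ordinary least-squares fit of the DV on the covariate within one group. That is also what a GLM with the factor, the covariate and their interaction predicts. A model without the interaction gives parallel lines that never cross. The sketch below fits one line per group and solves a1 + b1*x = a2 + b2*x for the crossing point. It is plain Python (standard library only), and the function names and data are hypothetical.

```python
"""Within-group regression lines and their crossing point (sketch)."""
from collections import defaultdict


def group_lines(groups, x, y):
    """Fit y = a + b*x by least squares separately for each group level.

    Returns {level: (intercept, slope)}.
    """
    data = defaultdict(lambda: ([], []))
    for g, xi, yi in zip(groups, x, y):
        data[g][0].append(xi)
        data[g][1].append(yi)
    lines = {}
    for g, (gx, gy) in data.items():
        n = len(gx)
        mx, my = sum(gx) / n, sum(gy) / n
        sxx = sum((v - mx) ** 2 for v in gx)
        sxy = sum((v - mx) * (w - my) for v, w in zip(gx, gy))
        slope = sxy / sxx
        lines[g] = (my - slope * mx, slope)
    return lines


def crossing_x(line1, line2):
    """Covariate value where two lines meet, or None if they are parallel."""
    (a1, b1), (a2, b2) = line1, line2
    if b1 == b2:
        return None
    return (a2 - a1) / (b1 - b2)


if __name__ == "__main__":
    # Hypothetical data: factor levels "A" and "B", one covariate.
    groups = ["A"] * 4 + ["B"] * 4
    cov = [0.0, 1.0, 2.0, 3.0] * 2
    dv = [1.0, 3.0, 5.0, 7.0, 4.0, 5.0, 6.0, 7.0]
    lines = group_lines(groups, cov, dv)
    print(lines)                               # {'A': (1.0, 2.0), 'B': (4.0, 1.0)}
    print(crossing_x(lines["A"], lines["B"]))  # 3.0
```

With more than two groups, crossing_x can be applied to each pair of lines.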