Re: Missing values in MIXED
Posted by Kornbrot, Diana on Mar 17, 2013; 8:43am
URL: http://spssx-discussion.165.s1.nabble.com/Missing-values-in-MIXED-tp5718714p5718748.html
Ryan
Thanks
Done all that. Converting horizontal to vertical is straightforward using the data structuring wizard [don’t need syntax], once one gets the hang of it.
My ACTUAL question was:
MIXED with data in long form can cope with missing data, with a correction to the denominator df.
GLM REPEATED insists on NO missing data.
So what is the difference?
With the help of Bruce Weaver, I have NOW worked out that the difference lies in the covariance matrix used for estimation of parameters.
REPEATED applies listwise deletion and so discards any subject that does not have values for all variables.
MIXED applies pairwise deletion. I suspect the reduced df is the harmonic mean of the df for the relevant groups, but do not know.
Bruce provides the following useful refs, which suggest that using MIXED may actually be less biased than any of a whole slew of complicated imputation procedures:
Twisk & de Vente (2002): http://europepmc.org/abstract/MED/11927199
Twisk (2003): http://books.google.ca/books?hl=en&lr=&id=TCg02e-tI_cC&oi=fnd&pg=PR15&dq=Twisk+2003&ots=2GfodRIiu9&sig=z8BSBQoRaZNavIzj_QOeATBP_nw#v=onepage&q=Twisk%202003&f=false
Singer & Willett (/Applied Longitudinal Data Analysis/, Chapter 5).
I NOW recommend MIXED with an UNSTRUCTURED covariance matrix across the board. No doubt it will take time to ‘filter down’ to all users.
Output is much simpler, as all inferential tests are in 1 table.
Can do appropriate post hoc or planned comparisons, with standard errors correctly estimated from the unstructured covariance matrix.
MIXED has the limitation of not supplying effect sizes.
Jason Becksted points out that one can calculate partial eta squared = F*df1/(F*df1+df2), where df1 is the hypothesis df and df2 is the error df.
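A minimal sketch of that calculation in Python, for anyone who wants to compute the effect size by hand from the MIXED fixed-effects table. The function name and the example F and df values are hypothetical, chosen only to illustrate the formula; df2 may be non-integer, as MIXED reports it.

```python
def partial_eta_squared(f_value: float, df1: float, df2: float) -> float:
    """Partial eta squared from an F test: F*df1 / (F*df1 + df2).

    df1 is the hypothesis (numerator) df, df2 the error (denominator) df.
    """
    if df1 <= 0 or df2 <= 0:
        raise ValueError("df1 and df2 must be positive")
    if f_value < 0:
        raise ValueError("F cannot be negative")
    return (f_value * df1) / (f_value * df1 + df2)


# Hypothetical values, not taken from any real output.
eta = partial_eta_squared(4.0, 3, 24.0)
print(round(eta, 4))  # 12 / (12 + 24) = 0.3333
```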
REPEATED, no doubt ground breaking in its time [distant past], is fiddly & potentially misleading. Although the multivariate option uses the correct unstructured covariance matrix, the post hocs use SEs based on an inappropriate diagonal covariance matrix, with GG corrections. Personally, I have never seen a covariance matrix with all pairwise covariances equal; it seems improbable in the real world.
Best
Diana
On 16/03/2013 16:49, "R B" <ryan.andrew.black@...> wrote:
Diana,
In order to employ a linear mixed model in SPSS, one must construct the dataset in vertical format, such that there are "k" cases per subject, with an identification variable whose value is unique to each subject and repeated on each of that subject's cases. Assuming the within-subjects variable is nominal, ordinal, or composed of equally spaced intervals, it is common practice for it to be a numeric integer variable with sequential values from 1 through "k", the number of levels of the within-subjects variable. Finally, the response variable must be concatenated vertically, with each measurement linked to the appropriate ID and level of the within-subjects variable.
Here is an illustration:
ID  Time  y
1   1     34
1   2     22
1   3     12
1   4     11
2   1     33
2   2     32
2   3     .
2   4     22
3   1     38
3   2     37
3   3     34
3   4     30
.
.
.
.
As you can see above, the second subject was not measured at time 3. As a result, that case will be excluded from the linear mixed model analysis. However, data obtained from the other time points for that subject will be included in the analysis. The assumption we must make in order to obtain unbiased estimates from a linear mixed model is that the data are missing at random. With that said, the MIXED procedure in SPSS calculates degrees of freedom using Satterthwaite's approximation:
http://publib.boulder.ibm.com/infocenter/spssstat/v20r0m0/index.jsp?topic=%2Fcom.ibm.spss.statistics.help%2Falg_mixed_custom-tests_satterthwaite.htm
This approximation has been shown to be valid for balanced and unbalanced designs.
In addition to the benefit of not having to exclude from parameter estimation all data from subjects who happen to have randomly missing values, the MIXED procedure allows for modeling of continuous response variables using various hierarchical designs and residual covariance structures.
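To make the bookkeeping concrete, here is a small Python sketch (not SPSS syntax) that stacks the three illustrated subjects from horizontal to vertical format and counts how many measurements survive under each approach. The variable and function names are hypothetical, and None stands in for the SPSS system-missing value ".". It only illustrates which cases are retained, not how MIXED estimates parameters.

```python
# Wide (horizontal) layout: one row per subject, one value per time point.
# Values copied from the illustration; None marks the missing measurement.
wide = {
    1: [34, 22, 12, 11],
    2: [33, 32, None, 22],
    3: [38, 37, 34, 30],
}


def to_long(wide_rows):
    """Stack wide rows into (ID, Time, y) cases, with Time running 1..k."""
    return [
        (subject_id, time, y)
        for subject_id, values in sorted(wide_rows.items())
        for time, y in enumerate(values, start=1)
    ]


def listwise_complete(wide_rows):
    """Keep only subjects with no missing value (what GLM REPEATED requires)."""
    return {sid: vals for sid, vals in wide_rows.items() if None not in vals}


long_cases = to_long(wide)

# Long-format analysis: only the single case with a missing y is dropped.
usable_long = [case for case in long_cases if case[2] is not None]

# Listwise deletion: every measurement from subject 2 is discarded.
usable_listwise = to_long(listwise_complete(wide))

print(len(long_cases))       # 12 cases in total
print(len(usable_long))      # 11 cases available in long format
print(len(usable_listwise))  # 8 cases left after listwise deletion
```

With these three subjects, the long format keeps the three observed values from subject 2, while listwise deletion loses them.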
Ryan
On Fri, Mar 15, 2013 at 11:46 AM, Kornbrot, Diana <d.e.kornbrot@...> wrote:
If one uses REPEATED in procedure GLM, then it appears that all subjects must have values for all combinations of the repeated measures.
BUT using MIXED, there is then a non-integer error df.
How is SPSS actually handling the missing values?
NB: am using an unstructured covariance matrix.
Thanks for help
Best
Diana
Emeritus Professor Diana Kornbrot
email: d.e.kornbrot@...
web: http://dianakornbrot.wordpress.com/
Work
Department of Psychology
School of Life and Medical Sciences
University of Hertfordshire
College Lane, Hatfield, Hertfordshire AL10 9AB, UK
voice: +44 (0) 170 728 4626
Home
19 Elmhurst Avenue
London N2 0LT, UK
voice: +44 (0) 208 444 2081
mobile: +44 (0) 740 318 1612