SPSSX Discussion

Missing values in MIXED


Kornbrot, Diana

Missing values in MIXED

If one uses REPEATED in procedure GLM, it appears that all subjects must have values for all combinations of the repeated measures.
BUT using MIXED, there is then a non-integer error df.
How is SPSS actually handling the missing values?
NB: I am using an unstructured covariance matrix.

Thanks for help
Best
Diana

Emeritus Professor Diana Kornbrot
email: d.e.kornbrot@...
web:    http://dianakornbrot.wordpress.com/
Work
Department of Psychology
School of Life and Medical Sciences
University of Hertfordshire
College Lane, Hatfield, Hertfordshire AL10 9AB, UK
voice:   +44 (0) 170 728 4626
Home
19 Elmhurst Avenue
London N2 0LT, UK
voice:   +44 (0) 208 444 2081
mobile: +44 (0) 740 318 1612

Bruce Weaver

Re: Missing values in MIXED

Administrator

Hello Diana. I don't have a direct answer to your question, but I do have some pointers to material that may be helpful.

1. Singer & Willett (Applied Longitudinal Data Analysis, Chapter 5) talk about missing data in the multilevel model for change.

2. Twisk (Applied Multilevel Analysis, 2006, p. 107) says this:

"However, when applying multilevel analysis to longitudinal data, there is no need to have a 'complete' dataset, and furthermore, it has been shown that multilevel analysis is very flexible in handling missing data. It has even been shown that applying multilevel analysis to an incomplete dataset is even better than applying imputation methods (Twisk and de Vente, 2002; Twisk, 2003)."

Twisk & de Vente (2002): http://europepmc.org/abstract/MED/11927199
Twisk (2003): http://books.google.ca/books?hl=en&lr=&id=TCg02e-tI_cC&oi=fnd&pg=PR15&dq=Twisk+2003&ots=2GfodRIiu9&sig=z8BSBQoRaZNavIzj_QOeATBP_nw#v=onepage&q=Twisk%202003&f=false

HTH.

Cheers,
Bruce


--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING:
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).

Ryan

Re: Missing values in MIXED

In reply to this post by Kornbrot, Diana

Diana,

To fit a linear mixed model in SPSS, the dataset must be in vertical (long) format: "k" cases per subject, with an identification variable whose value is the same for every case belonging to a subject and is not shared with any other subject. Assuming the within-subjects variable is nominal, ordinal, or composed of equally-spaced intervals, it is common practice to code it as a numeric integer variable with sequential values from 1 through "k", the number of levels. Finally, the response variable must be stacked vertically, with each measurement linked to the appropriate ID and level of the within-subjects variable.

Here is an illustration:

ID Time y
1   1    34
1   2    22
1   3    12
1   4    11
2   1    33
2   2    32
2   3    .
2   4    22
3   1    38
3   2    37
3   3    34
3   4    30
.
.
.
.

As you can see above, the second subject was not measured at time 3. As a result, that case will be excluded from the linear mixed model analysis. However, data obtained at the other time points for that subject will be included in the analysis. The assumption we must make in order to obtain unbiased estimates from a linear mixed model is that the data are missing at random. With that said, the MIXED procedure in SPSS calculates degrees of freedom using Satterthwaite's approximation, which is why the error df need not be an integer:

http://publib.boulder.ibm.com/infocenter/spssstat/v20r0m0/index.jsp?topic=%2Fcom.ibm.spss.statistics.help%2Falg_mixed_custom-tests_satterthwaite.htm

This approximation has been shown to be valid for balanced and unbalanced designs.

In addition to the benefits of not having to exclude all data from subjects who happen to have data which are missing randomly for parameter estimation, the MIXED procedure allows for modeling of continuous response variables using various hierarchical designs and residual covariance structures.
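
For readers starting from a wide file, the restructuring described above can be sketched in syntax roughly as follows. This is only a sketch under assumptions: the wide variable names y1 to y4 are hypothetical (they do not appear in this thread), and the MIXED call simply mirrors a repeated-measures specification with an unstructured residual matrix. /NULL=KEEP retains the row with the missing y in the long file; MIXED then excludes that one case itself while still using the subject's other time points.

```spss
* Hypothetical wide file: one row per subject with variables ID y1 y2 y3 y4.
VARSTOCASES
  /MAKE y FROM y1 y2 y3 y4
  /INDEX=Time(4)
  /KEEP=ID
  /NULL=KEEP.

* Repeated-measures model on the long file (assumed specification).
MIXED y BY Time
  /FIXED=Time | SSTYPE(3)
  /METHOD=REML
  /PRINT=SOLUTION
  /REPEATED=Time | SUBJECT(ID) COVTYPE(UN).
```

The same long file could be produced with the Restructure Data Wizard; the syntax form just makes the step reproducible.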

Ryan


Kornbrot, Diana

Re: Missing values in MIXED

Ryan

Thanks
Done all that. Converting horizontal to vertical is straightforward using the Restructure Data Wizard [don’t need syntax], once one gets the hang of it.

My ACTUAL question was:
MIXED with data in long form can cope with missing data, with a correction to the denominator df.
GLM REPEATED insists on NO missing data.
So what is the difference?

With the help of Bruce Weaver, I have NOW worked out that the difference lies in the covariance matrix used for estimation of parameters.
REPEATED applies listwise deletion and so discards any subject that does not have values for all variables.
MIXED applies pairwise deletion. I suspect the reduced df is the harmonic mean of the df for the relevant groups, but do not know.

Bruce provides following useful refs that suggest that using MIXED may actually be less biased than any of a whole slew of complicated imputation procedures:
Twisk & de Vente (2002): http://europepmc.org/abstract/MED/11927199
Twisk (2003):
http://books.google.ca/books?hl=en&lr=&id=TCg02e-tI_cC&oi=fnd&pg=PR15&dq=Twisk+2003&ots=2GfodRIiu9&sig=z8BSBQoRaZNavIzj_QOeATBP_nw#v=onepage&q=Twisk%202003&f=false
Singer & Willett (/Applied Longitudinal Data Analysis/, Chapter 5).

I NOW recommend MIXED with UNSTRUCTURED covariance matrix across the board. No doubt it will take time to ‘filter down’ to all users.
Output is much simpler, as all inferential tests are in 1 table.
One can do appropriate post hoc or planned comparisons with standard errors correctly estimated from the unstructured covariance matrix.

MIXED has the limitation of not supplying effect sizes.
Jason Becksted points out that one can calculate partial eta squared = F*df1/(F*df1+df2), where df1 is the hypothesis df and df2 is the error df.
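
For reference, that formula follows from the usual definition of partial eta squared in a fixed-effects ANOVA. The short derivation below assumes the classical F ratio built from sums of squares; the symbols SS_effect and SS_error are introduced here for illustration and are not used elsewhere in the thread. With the Satterthwaite df that MIXED reports, the result is probably best read as an approximation.

```latex
\begin{aligned}
F &= \frac{SS_{\text{effect}}/df_1}{SS_{\text{error}}/df_2}
\quad\Longrightarrow\quad
\frac{SS_{\text{effect}}}{SS_{\text{error}}} = \frac{F\,df_1}{df_2},\\
\eta_p^2 &= \frac{SS_{\text{effect}}}{SS_{\text{effect}} + SS_{\text{error}}}
= \frac{F\,df_1}{F\,df_1 + df_2}.
\end{aligned}
```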

REPEATED, no doubt groundbreaking in its time [distant past], is fiddly & potentially misleading. Although the multivariate option uses the correct unstructured covariance matrix, the post hocs use SEs based on an inappropriate diagonal covariance matrix, with GG corrections. Personally, I have never seen a covariance matrix with all pairwise covariances equal – seems improbable in the real world.

Best

Diana


Ryan

Re: Missing values in MIXED

Diana,

See my comments below.

On Mar 17, 2013, at 4:43 AM, "Kornbrot, Diana" <[hidden email]> wrote:

> Ryan
>
> Thanks
> Done all that. Converting horizontal to vertical is straightforward using the data structuring wizard [don’t need syntax], once one gets the hang of it
>
> My ACTUAL question was:
> MIXED with data in long form can cope with missing data, with correction for denominator df
> GLM REPEATED insists on NO missing data
> So what is the difference?
>
> With the help of Bruce Weaver, I have NOW worked out that the difference lies in the covariance matrix used for estimation of parameters
> REPEATED applies listwise deletion and so discards any subjects that do not have values for all variables,
> MIXED applies pairwise deletion.

That is exactly what I showed in the illustration.

> Suspect the reduced df is harmonic mean of df for relevant groups, but do not know

No need to suspect. I provided a link to the formula for df error. I don't know what you mean by reduced.

> Bruce provides following useful refs that suggest that using MIXED may actually be less biased than any of a whole slew of complicated imputation procedures:
> Twisk & de Vente (2002): http://europepmc.org/abstract/MED/11927199
> Twisk (2003):
> http://books.google.ca/books?hl=en&lr=&id=TCg02e-tI_cC&oi=fnd&pg=PR15&dq=Twisk+2003&ots=2GfodRIiu9&sig=z8BSBQoRaZNavIzj_QOeATBP_nw#v=onepage&q=Twisk%202003&f=false
> Singer & Willett (/Applied Longitudinal Data Analysis/, Chapter 5).
>
> I NOW recommend MIXED with UNSTRUCTURED covariance matrix across the board.

That is a poor recommendation. The goal should be to find the optimal residual variance-covariance structure. You could reduce statistical power if you employ an unstructured matrix if there is a less restrictive structure that fits the data equally well (e.g., AR1, TOEP). There may be other aspects to your data as well (G-side random effects that should be incorporated).

> No doubt it will take time to ‘filter down’ to all users.
> Output much simpler as all inferential tests in 1 table
> Can do appropriate post hoc or planned comparisons with standard errors correctly estimated from unstructured covariance matrix.

That is not only true for the unstructured matrix.

> MIXED has limitation of not supplying effect sizes.
> Jason Becksted points out that one can calculate partial eta squared = F*df1/(F*df1+df2), where df1 is the hypothesis df and df2 is the error df.

So did I, publicly, when you asked. And I pointed out that one would have to employ ML to use that same formula to obtain partial eta squared from a fully balanced, fixed-effects-only design. But I would question the validity of using that formula under all circumstances, which is why I provided the alternative. For example, what if you are trying to determine the effect size of a random effect? What if your fixed effect predictor is at a higher level? There have been plenty of discussions on this matter on the multilevel listserv and in multilevel textbooks. I would not simply apply that formula to all circumstances. In fact, I would generally recommend using the second approach I showed.

> REPEATED, no doubt groundbreaking in its time [distant past], is fiddly & potentially misleading. Although the multivariate option uses correct unstructured covariance matrix, the post hocs use SEs based on inappropriate diagonal covariance matrix, with GG corrections. Personally, have never seen a covariance matrix with all pairwise covariances equal – seems improbable in the real world.

Again, there are alternatives to both extremes. It is not one versus the other.

> Best
>
> Diana


Bruce Weaver

Re: Missing values in MIXED

Administrator

Ryan, I think you meant to say *more* restrictive below, did you not? I.e., unstructured imposes no restrictions on the covariance matrix, and therefore uses up more df. When you use another structure that fits the data reasonably well, you impose restrictions that buy you some df.

Cheers,
Bruce

Ryan Black wrote

> --- snip ---

>> I NOW recommend MIXED with UNSTRUCTURED covariance matrix across the board.
>
> That is a poor recommendation. The goal should be to find the optimal residual variance-covariance structure. You could reduce statistical power if you employ an unstructured matrix if there is a *less restrictive* structure that fits the data equally well (e.g., AR1, TOEP). There may be other aspects to your data as well (G-side random effects that should be incorporated). [emphasis added]

Ryan

Re: Missing values in MIXED

In reply to this post by Ryan

Dear SPSS-L,

Diana made a bold statement that under all circumstances one should employ an unstructured residual variance-covariance structure. Let me dispel that myth immediately. Run the code BELOW, and note that a likelihood ratio test shows the first-order autoregressive structure fitting the data as well as the unstructured residual matrix. If the objective in science is to obtain the most parsimonious model that best explains the phenomenon, why would we not apply the same rule when building statistical models?

Second, for the illustration below, note that the more parsimonious model (first-order autoregressive residual structure) yields a statistically more powerful test of the fixed effect of time. In fact, with an unstructured residual matrix the fixed effect of time is not significant at alpha=.05, whereas with the first-order autoregressive matrix it is significant at alpha=.05.

This is one of many examples I could have simulated where following Diana's general recommendation to use only an unstructured matrix results not only in poor science, but in different conclusions.

Ryan

*Generate Data for Mixed Model with AR1 specification.
set seed 98734523.
new file.
input program.
* Create the working variables so that LEAVE can retain them across cases.
compute subject = -99.
compute time = -99.
compute x1 = -99.
compute x2 = -99.
compute x3 = -99.
compute e1 = -99.
compute e2 = -99.
compute e3 = -99.
compute sigma = 1.
compute rho = 0.50.
* Cholesky factor of the 3x3 AR1 correlation matrix.
compute a11 = 1.
compute a21 = rho.
compute a31 = rho**2.
compute a22 = sqrt(1 - rho**2).
compute a32 = rho*sqrt(1 - rho**2).
compute a33 = sqrt(1 - rho**2).
leave subject to a33.
loop subject = 1 to 100.
* Independent standard normals, transformed into AR1-correlated residuals.
compute x1 = rv.normal(0,1).
compute x2 = rv.normal(0,1).
compute x3 = rv.normal(0,1).
compute e1 = sigma * a11*x1.
compute e2 = sigma * (a21*x1 + a22*x2).
compute e3 = sigma * (a31*x1 + a32*x2 + a33*x3).
loop time = 1 to 3.
compute y = 1.5 + 0.20*(time=1) + 0.22*(time=2) + e1*(time=1) +
  e2*(time=2) + e3*(time=3).
end case.
end loop.
end loop.
end file.
end input program.
execute.

delete variables x1 x2 x3 sigma rho a11 a21 a31 a22 a32 a33 e1 e2 e3.

* Model 1: unstructured residual covariance matrix.
MIXED y BY time
  /FIXED=time | SSTYPE(3)
  /METHOD=REML
  /PRINT=R SOLUTION
  /REPEATED=time | SUBJECT(subject) COVTYPE(UN).

* Model 2: first-order autoregressive residual covariance matrix.
MIXED y BY time
  /FIXED=time | SSTYPE(3)
  /METHOD=REML
  /PRINT=R SOLUTION
  /REPEATED=time | SUBJECT(subject) COVTYPE(AR1).

* Likelihood ratio test of AR1 against UN, using the hard-coded -2 restricted log likelihood values.
* UN estimates 6 covariance parameters and AR1 estimates 2, hence 4 df.
compute deviance_difference = 804.669655 - 801.255252.
compute deviance_p_value = 1 - CDF.CHISQ(deviance_difference,4).
execute.
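
The 4 degrees of freedom in the test above come from counting covariance parameters. As a worked check (assuming, as the subtraction implies, that 804.669655 is the -2 restricted log likelihood under AR1 and 801.255252 the value under UN, since the nested model cannot fit better), with k = 3 time points:

```latex
\begin{aligned}
p_{\mathrm{UN}} &= \frac{k(k+1)}{2} = 6, \qquad p_{\mathrm{AR1}} = 2, \qquad df = 6 - 2 = 4,\\
\chi^2 &= 804.669655 - 801.255252 = 3.414403,\\
P(\chi^2_4 > x) &= e^{-x/2}\left(1 + \frac{x}{2}\right) \approx 0.49 \quad \text{at } x = 3.414403.
\end{aligned}
```

On these numbers, the unstructured matrix does not fit significantly better than AR1 in this simulated dataset.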


Ryan

Re: Missing values in MIXED

In reply to this post by Bruce Weaver

Hi Bruce,

That's correct. I meant to state "more restrictive." Good catch. I submitted the message before re-reading it. For those interested, the message I just posted includes an illustration of the point I was making. Note how the more restrictive model (AR1) estimates 4 fewer parameters than the least restrictive (UN).

I must say that I'm very surprised by a recommendation to always use the unstructured matrix. In my own work with repeated measures data, I have so often found that some type of residual correlation matrix that accounts for the decay of correlations among residuals from observations more distant in time fits the data as well as an unstructured matrix. Of course, I am referring to a design which has only one within-subjects variable.

What if, for example, one were to analyze data collected from a type of randomized controlled trial (RCT)? What types of structures might one consider? I'll leave that question unanswered for people to ponder.

Ryan


Ryan

Re: Missing values in MIXED

In reply to this post by Ryan

Since Bruce pointed out a typo I made, I decided to reread my entire response to the OP. I noticed another typo. In this post, I correct both typos and I have decided to add another comment. All changes are ***CAPITALIZED*** in the text BELOW my name. But, I also have another comment to make right here:

The reason I'm taking such an interest in this thread is that I have heard this general recommendation before; that is, to always use an unstructured residual covariance matrix. I don't know if there is a textbook out there that makes such a silly (at best) or dangerous (at worst) recommendation, but my guess is that it stems from the assumption that the unstructured matrix can never be wrong because it imposes no restrictions. Let me make a somewhat provocative statement: an unstructured residual variance-covariance structure applied to ALL subjects is not always the LEAST restrictive residual variance-covariance structure. I realize that in the past I have even said that an unstructured matrix is the least restrictive, but I should have couched that statement in the context of single-group designs only.

Ryan

On Sun, Mar 17, 2013 at 8:40 AM, <[hidden email]> wrote:

Diana,

See my comments below.

On Mar 17, 2013, at 4:43 AM, "Kornbrot, Diana" <[hidden email]> wrote:

Ryan

Thanks
Done all that. Converting horizontal to vertical is straightforward using the data structuring wizard [don’t need syntax], once one gets the hang of it

My ACTUAL question was:
MIXED with data in long form can cope with missing data, with correction for denominator df
GLM REPEATED insists on NO missing data
So what is the difference?

With the help of Bruce Weaver, I have NOW worked out that the difference lies in the covariance matrix used for estimation of parameters
REPEATED applies listwise deletion and so discards any subjects that do not have values for all variables,
MIXED applies pairwise deletion.

That is exactly what I showed in the illustration.

Suspect the reduced df is harmonic mean of df for relevant groups, but do not know

No need to suspect. I provided a link to the formula for df error. I don't know what you mean by reduced.

Bruce provides following useful refs that suggest that using MIXED may actually be less biased than any of a whole slew of complicated imputation procedures:
Twisk & de Vente (2002): http://europepmc.org/abstract/MED/11927199
Twisk (2003):
http://books.google.ca/books?hl=en&lr=&id=TCg02e-tI_cC&oi=fnd&pg=PR15&dq=Twisk+2003&ots=2GfodRIiu9&sig=z8BSBQoRaZNavIzj_QOeATBP_nw#v=onepage&q=Twisk%202003&f=false
Singer & Willett (/Applied Longitudinal Data Analysis/, Chapter 5).

I NOW recommend MIXED with UNSTRUCTURED covariance matrix across the board.

That is a poor recommendation. The goal should be to find the optimal residual variance-covariance structure. You could reduce statistical power by employing an unstructured matrix when there is a ***MORE*** restrictive structure that fits the data equally well (e.g., AR1, TOEP). There may be other aspects to your data as well (G-side random effects that should be incorporated).
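
To make the difference between these structures concrete, here is a minimal sketch of what AR1 and TOEP imply for the residual covariance matrix of k repeated measures. The function names and the numeric values are illustrative assumptions, not anything taken from SPSS; the point is only that AR1 describes the whole k x k matrix with 2 parameters and TOEP with k, whereas an unstructured matrix needs k(k+1)/2.

```python
def ar1_cov(k, sigma2, rho):
    """First-order autoregressive structure: cov(i, j) = sigma2 * rho**|i - j|.

    Two free parameters (sigma2, rho), whatever the value of k.
    """
    return [[sigma2 * rho ** abs(i - j) for j in range(k)] for i in range(k)]


def toeplitz_cov(band):
    """Toeplitz structure: one free value per lag, band[0] being the variance.

    len(band) == k free parameters.
    """
    k = len(band)
    return [[band[abs(i - j)] for j in range(k)] for i in range(k)]


# Hypothetical values, chosen only so the pattern is easy to read.
for row in ar1_cov(4, sigma2=2.0, rho=0.5):
    print(row)
for row in toeplitz_cov([2.0, 1.2, 0.7, 0.1]):
    print(row)
```

If the observed covariances really do follow one of these patterns, the unstructured matrix spends extra parameters estimating something a simpler structure already captures.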

No doubt it will take time to ‘filter down’ to all users
Output much simpler as all inferential tests in 1 table
Can do appropriate post hoc or planned comparisons with standard errors correctly estimated from unstructured covariance matrix.

That is not only true for the unstructured matrix.

MIXED has limitation of not supplying effect sizes.
Jason Becksted points out that one can calculate partial eta squared = F*df1/(F*df1+df2), where df1 is the hypothesis df and df2 is the error df.

So did I, publicly, when you asked. And I pointed out that one would have to employ ML to use the ***ALTERNATIVE*** formula to obtain partial eta squared from a fully balanced, fixed-effects-only design. But I would question the validity of using that formula ***ABOVE*** under all circumstances, which is why I provided the alternative. For example, what if you are trying to determine the effect size of a random effect? What if your fixed-effect predictor is at a higher level? There have been plenty of discussions on this matter on the multilevel listserv and in multilevel textbooks. I would not simply apply that formula to all circumstances. In fact, I would generally recommend using the second approach I showed. ***SPEAKING OF EFFECT SIZE, WE MUST ALSO BE CAREFUL TO DEFINE WHAT WE MEAN BY EFFECT SIZE***
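
For reference, the F-based formula quoted above is simple to compute. The sketch below is only an illustration of that arithmetic; the function name and the input values are hypothetical, and the caveats just raised (random effects, higher-level predictors, unbalanced data) still apply to whether the result means anything.

```python
def partial_eta_squared(f, df1, df2):
    """Partial eta squared from an F test: F*df1 / (F*df1 + df2).

    df1 is the hypothesis (numerator) df and df2 the error (denominator) df.
    df2 may be non-integer, as with the Satterthwaite df reported by MIXED.
    """
    if f < 0 or df1 <= 0 or df2 <= 0:
        raise ValueError("F must be non-negative and both df must be positive")
    return (f * df1) / (f * df1 + df2)


# Hypothetical F test: F(3, 36) = 4.0
print(partial_eta_squared(4.0, 3, 36))
# The same F with a non-integer error df
print(partial_eta_squared(4.0, 3, 35.5))
```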

REPEATED, no doubt groundbreaking in its time [distant past], is fiddly & potentially misleading. Although the multivariate option uses the correct unstructured covariance matrix, the post hocs use SEs based on an inappropriate diagonal covariance matrix, with GG corrections. Personally, I have never seen a covariance matrix with all pairwise covariances equal – seems improbable in the real world.

Again, there are alternatives to both extremes. It is not one versus the other.

Best

Diana

On 16/03/2013 16:49, "R B" <ryan.andrew.black@...> wrote:

Diana,

In order to employ a linear mixed model in SPSS, one must construct the dataset in vertical format, such that there are "k" cases per subject with an identification variable with non-repeating numbers for cases associated with a particular subject. Assuming the within-subjects variable is either nominal, ordinal, or is composed of equally-spaced intervals, it is common practice for the within-subjects variable to be a numeric integer variable with sequential values from 1 through "k" levels of the within-subjects variable. Finally, the response variable must be concatenated vertically with each measurement linked to the appropriate ID and level of the within-subject variable.

Here is an illustration:

ID Time y
1   1    34
1   2    22
1   3    12
1   4    11
2   1    33
2   2    32
2   3    .
2   4    22
3   1    38
3   2    37
3   3    34
3   4    30
.
.
.
.

As you can see above, the second subject was not measured at time 3. As a result, that case will be excluded from the linear mixed model analysis. However, data obtained from the other time points for that particular subject will be included in the analysis. The assumption we must make in order to obtain unbiased estimates derived from a linear mixed model is that the data are missing randomly. With that said, the MIXED procedure in SPSS calculates degrees of freedom using Satterthwaite's Approximation:

http://publib.boulder.ibm.com/infocenter/spssstat/v20r0m0/index.jsp?topic=%2Fcom.ibm.spss.statistics.help%2Falg_mixed_custom-tests_satterthwaite.htm

This approximation has been shown to be valid for balanced and unbalanced designs.
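
To see why such an approximation yields non-integer error df, it may help to look at the simplest special case: the two-sample Welch-Satterthwaite df for comparing two means with unequal variances. This is NOT the formula MIXED uses (the linked algorithm page gives the general version); it is a sketch of the same idea, with a hypothetical function name and made-up inputs.

```python
def welch_satterthwaite_df(var1, n1, var2, n2):
    """Approximate df for the difference of two independent sample means.

    df = (v1/n1 + v2/n2)**2 / ((v1/n1)**2/(n1-1) + (v2/n2)**2/(n2-1))
    """
    a = var1 / n1
    b = var2 / n2
    return (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))


# Balanced, equal variances: reduces to the familiar n1 + n2 - 2.
print(welch_satterthwaite_df(4.0, 10, 4.0, 10))
# Unbalanced, unequal variances: the df is no longer a whole number.
print(welch_satterthwaite_df(4.0, 10, 16.0, 5))
```

The df is built from variance estimates and sample sizes rather than counted from a design, so once the data are unbalanced (for example because of missing observations) there is no reason for it to be an integer.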

In addition to the benefits of not having to exclude all data from subjects who happen to have data which are missing randomly for parameter estimation, the MIXED procedure allows for modeling of continuous response variables using various hierarchical designs and residual covariance structures.
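
The long-format layout and the difference between the two ways of handling the missing value can be sketched as follows. The data are the three subjects from the illustration above; the function names are hypothetical, and the code only mimics which observations each approach keeps, not the estimation itself.

```python
# Wide format: one entry per subject, one value per time point (None = missing).
wide = {
    1: [34, 22, 12, 11],
    2: [33, 32, None, 22],
    3: [38, 37, 34, 30],
}


def to_long(wide_data):
    """Stack the repeated measures vertically as (ID, Time, y) rows."""
    rows = []
    for subject_id, values in wide_data.items():
        for time, y in enumerate(values, start=1):
            rows.append((subject_id, time, y))
    return rows


def rows_kept_long(long_rows):
    """Row-wise exclusion: drop only the rows whose response is missing."""
    return [row for row in long_rows if row[2] is not None]


def subjects_kept_listwise(wide_data):
    """Listwise exclusion: drop every subject with any missing value."""
    return {sid: v for sid, v in wide_data.items() if None not in v}


long_rows = to_long(wide)
print(len(long_rows))                        # all (ID, Time) combinations
print(len(rows_kept_long(long_rows)))        # observations usable in long format
print(sorted(subjects_kept_listwise(wide)))  # subjects surviving listwise deletion
```

With these three subjects, row-wise exclusion keeps 11 of the 12 observations, while listwise deletion keeps only subjects 1 and 3, that is, 8 observations.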

Ryan
On Fri, Mar 15, 2013 at 11:46 AM, Kornbrot, Diana <d.e.kornbrot@...> wrote:

If one uses repeated in procedure GLM then it appears that all subjects must have values for all combinations of the repeated measures
BUT using MIXED, there is then a non-integer error df
How is SPSS actually handling the missing values?
Nb Am using unstructured covariance matrix

Thanks for help
Best
Diana


Bruce Weaver

Re: Missing values in MIXED

Administrator

Off the top of my head, I can't say where I read it (and I don't have my books with me today), but I do think that at least one author I've read recommends always /starting/ with an unstructured residual covariance matrix, and imposing restrictions if/when it makes sense to do so. I wonder if this is the approach Diana was actually promoting.

Cheers,
Bruce


Rich Ulrich

Re: Missing values in MIXED

In reply to this post by Ryan

Ryan,
What you wrote suggests that using the covariances only
increases the power, and that we always want more power.
In that case, one might conclude that it is always "safe" to ignore
the extra power by using the unstructured alternative, since it
only sacrifices power.

This bothers me, because I doubt that it is true. It reminds me
of the assertion I have heard, that it is always "safe" to use
the grouped t-test instead of a paired test, because "you only lose
power." And for the t-test, *that* is not true. When the correlation
is negative, the error term is larger for the paired t-test, and so the
paired t-test is *necessarily* the right one, by virtue of the fact that
it has less power than the grouped test.
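
The t-test point can be checked with a little arithmetic. The sketch below assumes n pairs, sample SDs s1 and s2, and a within-pair correlation r; the function names and numbers are hypothetical. It compares the standard error of the mean difference under the grouped (independent-samples, pooled-variance) test with that under the paired test.

```python
import math


def se_grouped(s1, s2, n):
    """SE of the mean difference, two independent groups of size n (pooled)."""
    return math.sqrt((s1 ** 2 + s2 ** 2) / n)


def se_paired(s1, s2, r, n):
    """SE of the mean difference for n pairs with within-pair correlation r."""
    return math.sqrt((s1 ** 2 + s2 ** 2 - 2 * r * s1 * s2) / n)


# Hypothetical values: equal SDs of 1.0 and n = 10 pairs.
for r in (0.5, 0.0, -0.5):
    print(r, se_paired(1.0, 1.0, r, 10), se_grouped(1.0, 1.0, 10))
```

With a positive correlation the paired SE is the smaller of the two; with a negative correlation it is the larger, so the grouped test understates the error. The paired test also has fewer df (n - 1 rather than 2n - 2), which works in the same direction.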

I don't know how well the simple t-test generalizes to the structure
in question, but a negative intra-class correlation is not impossible,
when you use the proper definition of ICC. (I have seen a lousy
definition in one popular description of hierarchical analysis, which
defines its so-called ICC by an inadequate analogy. And it can't be
negative, so it is a flawed analogy.) Negative ICCs are not the most
common ones, but I did see a lecturer on HA who unwittingly stated
an example that featured it.

--
Rich Ulrich

Date: Sun, 17 Mar 2013 09:22:19 -0400
From: [hidden email]
Subject: Re: Missing values in MIXED
To: [hidden email]

Dear SPSS-L,

Ryan

--
... snip, lengthy example.

Ryan

Re: Missing values in MIXED

In reply to this post by Bruce Weaver

Hi Bruce,

That certainly makes sense to determine if there is a discernible pattern in the residual covariance matrix. In fact, I almost always begin with an unstructured matrix, and based on the pattern, decide which restrictive structures to test against the unstructured matrix. If there are multiple groups, then one might consider fitting group-specific unstructured matrices.
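
The workflow described above (fit the unstructured matrix first, then test more restrictive structures against it) might be sketched like this. It assumes the models share the same fixed effects and are nested, so that the difference in -2 log likelihood is referred to a chi-square distribution with df equal to the difference in the number of covariance parameters. The function names are hypothetical and the -2LL values are invented purely for illustration; real values would come from the fitted models.

```python
def n_cov_params(structure, k, groups=1):
    """Free residual covariance parameters for k repeated measures."""
    per_group = {
        "UN": k * (k + 1) // 2,  # every variance and covariance is free
        "TOEP": k,               # one parameter per lag, 0 .. k-1
        "AR1": 2,                # one variance, one autocorrelation
        "CS": 2,                 # one variance term, one common covariance
    }[structure]
    return per_group * groups    # groups > 1: a separate matrix per group


def lr_test(neg2ll_restricted, neg2ll_general, p_restricted, p_general):
    """Likelihood-ratio statistic and its df for two nested models."""
    return neg2ll_restricted - neg2ll_general, p_general - p_restricted


k = 4
# Invented -2LL values for an AR1 fit and a UN fit of the same data.
stat, df = lr_test(512.5, 505.0, n_cov_params("AR1", k), n_cov_params("UN", k))
print(stat, df)                        # compare stat with chi-square on df
print(n_cov_params("UN", k))           # one pooled unstructured matrix
print(n_cov_params("UN", k, groups=2)) # group-specific unstructured matrices
```

The last two lines show why a single unstructured matrix is not the least restrictive option in a multi-group design: fitting one unstructured matrix per group frees twice as many parameters when there are two groups.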

Best wishes,

Ryan

On Mar 17, 2013, at 3:28 PM, Bruce Weaver <[hidden email]> wrote:

> Off the top of my head, I can't say where I read it (and I don't have my
> books with me today), but I do think that at least one author I've read
> recommends always /starting/ with an unstructured residual covariance
> matrix, and imposing restrictions if/when it makes sense to do so. I wonder
> if this is the approach Diana was actually promoting.
>
> Cheers,
> Bruce

Maguin, Eugene

Re: Missing values in MIXED

Ryan,

' ... If there are multiple groups, then one might consider fitting group-specific unstructured matrices.'

How would you do that? Just to keep the discussion context clear, we have been talking about a mixed analysis that includes a repeated statement and not one that includes only a random statement. True?

Thanks, Gene Maguin

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of SUBSCRIBE SPSSX-L Anonymous
Sent: Sunday, March 17, 2013 4:44 PM
To: [hidden email]
Subject: Re: Missing values in MIXED

Hi Bruce,

That certainly makes sense to determine if there is a discernible pattern in the residual covariance matrix. In fact, I almost always begin with an unstructured matrix, and based on the pattern, decide which restrictive structures to test against the unstructured matrix. If there are multiple groups, then one might consider fitting group-specific unstructured matrices.

Best wishes,

Ryan

On Mar 17, 2013, at 3:28 PM, Bruce Weaver <[hidden email]> wrote:

> Off the top of my head, I can't say where I read it (and I don't have
> my books with me today), but I do think that at least one author I've
> read recommends always /starting/ with an unstructured residual
> covariance matrix, and imposing restrictions if/when it makes sense to
> do so. I wonder if this is the approach Diana was actually promoting.
>
> Cheers,
> Bruce
>
>
> Ryan Black wrote
>> Since Bruce pointed out a typo I made, I decided to reread my entire
>> response to the OP. I noticed another typo. In this post, I correct
>> both typos and I have decided to add another comment. All changes are
>> ***CAPITALIZED*** in the text BELOW my name. But, I also have another
>> comment to make right here:
>>
>> The reason I'm taking such an interest in this thread is that I have
>> heard this general recommendation before; that is, to always use an
>> unstructured residual-covariance matrix. I don't know if there is a
>> textbook out there that makes such a silly (at best) or dangerous (at
>> worst) recommendation, but my guess is because of the assumption that
>> the unstructured matrix can never be wrong due to the lack of
>> restrictions. Let me make a somewhat provocative statement...An
>> unstructured residual variance-covariance structure applied to ALL
>> subjects is not always the LEAST restrictive residual
>> variance-covariance structure. I realize that in the past I have even
>> said that an unstructured matrix is the least restrictive, but I
>> should have couched that statement in the context of single group designs only.
>>
>> Ryan
>> On Sun, Mar 17, 2013 at 8:40 AM, <
>
>> ryan.andrew.black@
>
>> > wrote:
>>
>>> Diana,
>>>
>>> See my comments below.
>>>
>>> On Mar 17, 2013, at 4:43 AM, "Kornbrot, Diana" <
>
>> d.e.kornbrot@.ac
>
>> >
>>> wrote:
>>>
>>> Ryan
>>>
>>> Thanks
>>> Done all that. Converting horizontal to vertical is straightforward
>>> using the data structuring wizard [don’t need syntax], once one gets
>>> the hang of it
>>>
>>> My ACTUAL question was:
>>> MIXED with data in long form can cope with missing data, with
>>> correction for denominator df.
>>> GLM REPEATED insists on NO missing data.
>>> So what is the difference?
>>>
>>> With the help of Bruce Weaver, I have NOW worked out that the
>>> difference lies in the covariance matrix used for estimation of
>>> parameters. REPEATED applies listwise deletion and so discards any
>>> subjects that do not have values for all variables; MIXED applies
>>> pairwise deletion.
>>>
>>>
>>> That is exactly what I showed in the illustration.
>>>
>>> Suspect the reduced df is harmonic mean of df for relevant groups,
>>> but do not know
>>>
>>>
>>> No need to suspect. I provided a link to the formula for df error. I
>>> don't know what you mean by reduced.
>>>
>>>
>>> Bruce provides following useful refs that suggest that using MIXED
>>> may actually be less biased than any of a whole slew of complicated
>>> imputation
>>> procedures:
>>> Twisk & de Vente (2002):
>>> http://europepmc.org/abstract/MED/11927199
>>> Twisk (2003):
>>> http://books.google.ca/books?hl=en&lr=&id=TCg02e-tI_cC&oi=fnd&pg=PR15&dq=Twisk+2003&ots=2GfodRIiu9&sig=z8BSBQoRaZNavIzj_QOeATBP_nw#v=onepage&q=Twisk%202003&f=false
>>> Singer & Willett (/Applied Longitudinal Data Analysis/, Chapter 5).
>>>
>>> I NOW recommend MIXED with UNSTRUCTURED covariance matrix across the
>>> board.
>>>
>>>
>>> That is a poor recommendation. The goal should be to find the
>>> optimal residual variance-covariance structure. You could reduce
>>> statistical power if you employ an unstructured matrix when there is a
>>> ***MORE*** restrictive structure that fits the data equally well
>>> (e.g., AR1, TOEP). There may be other aspects to your data as well
>>> (G-side random effects that should be incorporated).
>>>
>>> No doubt it will take time to ‘filter down’ to all users.
>>> Output much simpler as all inferential tests in 1 table.
>>> Can do appropriate post hoc or planned comparisons with standard
>>> errors correctly estimated from unstructured covariance matrix.
>>>
>>>
>>> That is not only true for the unstructured matrix.
>>>
>>>
>>> MIXED has limitation of not supplying effect sizes.
>>> Jason Becksted points out that one can calculate partial eta squared
>>> = F*df1/(F*df1+df2), where df1 is the hypothesis df and df2 is the
>>> error df.
>>>
>>>
>>> So did I, publicly, when you asked. And I pointed out that one would
>>> have to employ ML to use the ***ALTERNATIVE*** formula to obtain
>>> partial eta squared from a fully balanced fixed effects only design.
>>> But, I would question the validity of using that formula ***ABOVE***
>>> under all circumstances, which is why I provided the alternative.
>>> For example, what if you are trying to determine the effect size of
>>> a random effect? What if your fixed effect predictor is at a higher
>>> level? There have been plenty of discussions on this matter on the
>>> multilevel listserve and in multilevel textbooks. I would not simply
>>> apply that formula to all circumstances. In fact, I would generally
>>> recommend using the second approach I showed. ***SPEAKING OF EFFECT
>>> SIZE, WE MUST ALSO BE CAREFUL TO DEFINE WHAT WE MEAN BY EFFECT
>>> SIZE***
>>>
>>>
>>> REPEATED, no doubt ground breaking in its time [distant past], is
>>> fiddly & potentially misleading. Although the multivariate option
>>> uses correct unstructured covariance matrix, the post hocs use SEs
>>> based on inappropriate diagonal covariance matrix, with GG
>>> corrections.
>>> Personally, I have never seen a covariance matrix with all pairwise
>>> covariances equal; it seems improbable in the real world.
>>>
>>>
>>> Again, there are alternatives to both extremes. It is not one versus
>>> the other.
>>>
>>>
>>> Best
>>>
>>> Diana
>>>
>>> On 16/03/2013 16:49, "R B" <ryan.andrew.black@> wrote:
>>>
>>> Diana,
>>>
>>> In order to employ a linear mixed model in SPSS, one must construct
>>> the dataset in vertical format, such that there are "k" cases per
>>> subject with an identification variable with non-repeating numbers
>>> for cases associated with a particular subject. Assuming the
>>> within-subjects variable is either nominal, ordinal, or is composed
>>> of equally-spaced intervals, it is common practice for the
>>> within-subjects variable to be a numeric integer variable with
>>> sequential values from 1 through "k" levels of the within-subjects
>>> variable. Finally, the response variable must be concatenated
>>> vertically with each measurement linked to the appropriate ID and
>>> level of the within-subject variable.
>>>
>>> Here is an illustration:
>>>
>>> ID Time y
>>> 1 1 34
>>> 1 2 22
>>> 1 3 12
>>> 1 4 11
>>> 2 1 33
>>> 2 2 32
>>> 2 3 .
>>> 2 4 22
>>> 3 1 38
>>> 3 2 37
>>> 3 3 34
>>> 3 4 30
>>> .
>>> .
>>> .
>>> .
>>>
>>> As you can see above, the second subject was not measured at time 3.
>>> As a result, that case will be excluded from the linear mixed model analysis.
>>> However, data obtained from other time points for that particular
>>> subject will be included in the analysis. The assumption we must
>>> make in order to obtain unbiased estimates derived from a linear
>>> mixed model is that the data are missing randomly. With that said,
>>> the MIXED procedure in SPSS calculates degrees of freedom using
>>> Satterthwaite's Approximation:
>>>
>>>
>>> http://publib.boulder.ibm.com/infocenter/spssstat/v20r0m0/index.jsp?topic=%2Fcom.ibm.spss.statistics.help%2Falg_mixed_custom-tests_satterthwaite.htm
>>>
>>> This approximation has been shown to be valid for balanced and
>>> unbalanced designs.
>>>
>>> In addition to the benefits of not having to exclude all data from
>>> subjects who happen to have data which are missing randomly for
>>> parameter estimation, the MIXED procedure allows for modeling of
>>> continuous response variables using various hierarchical designs and
>>> residual covariance structures.
>>>
>>> Ryan
>>> On Fri, Mar 15, 2013 at 11:46 AM, Kornbrot, Diana <d.e.kornbrot@.ac> wrote:
>>>
>>> If one uses repeated in procedure GLM then it appears that all
>>> subjects must have values for all combinations of the repeated
>>> measures BUT using MIXED, there is then a non-integer error df. How
>>> is SPSS actually handling the missing values?
>>> Nb Am using unstructured covariance matrix
>>>
>>> Thanks for help
>>> Best
>>> Diana
>
>
>
>
>
> -----
> --
> Bruce Weaver
> [hidden email]
> http://sites.google.com/a/lakeheadu.ca/bweaver/
>
> "When all else fails, RTFM."
>
> NOTE: My Hotmail account is not monitored regularly.
> To send me an e-mail, please use the address shown above.
>
> --
> View this message in context:
> http://spssx-discussion.1045642.n5.nabble.com/Missing-values-in-MIXED-
> tp5718714p5718770.html Sent from the SPSSX Discussion mailing list
> archive at Nabble.com.
>
> =====================
> To manage your subscription to SPSSX-L, send a message to
> [hidden email] (not to SPSSX-L), with no body text except
> the command. To leave the list, send the command SIGNOFF SPSSX-L For a
> list of commands to manage subscriptions, send the command INFO
> REFCARD

Ryan

Re: Missing values in MIXED

In reply to this post by Rich Ulrich

Rich,

Did you miss that my rationale for determining the optimal var-cov matrix was parsimony? Yes, it could increase power, but that was not the thrust of my point.

Ryan

Sent from my iPhone

On Mar 17, 2013, at 4:30 PM, Rich Ulrich <[hidden email]> wrote:

Ryan,
What you wrote suggests that using the covariances only
increases the power, and that we always want more power.
In that case, one might conclude that it is always "safe" to ignore
the extra power by using the unstructured alternative, since it
only sacrifices power.

This bothers me, because I doubt that it is true. It reminds me
of the assertion I have heard, that it is always "safe" to use
the grouped t-test instead of a paired test, because "you only lose
power." And for the t-test, *that* is not true. When the correlation
is negative, the error term is larger for the paired t-test, and so the
paired t-test is *necessarily* the right one, by virtue of the fact that
it has less power than the grouped test.

I don't know how well the simple t-test generalizes to the structure
in question, but a negative intra-class correlation is not impossible,
when you use the proper definition of ICC. (I have seen a lousy
definition in one popular description of hierarchical analysis, which
defines its so-called ICC by an inadequate analogy. And it can't be
negative, so it is a flawed analogy.) Negative ICCs are not the most
common ones, but I did see a lecturer on HA who unwittingly stated
an example that featured it.
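
The t-test point can be checked numerically. What follows is only a minimal sketch with made-up scores (the data and the function name `t_statistics` are assumptions for illustration, not anything from this thread): when the two sets of scores are negatively correlated within pairs, the variance of the differences exceeds the sum of the two variances, so the paired error term is larger than the grouped one.

```python
import math
import statistics

def t_statistics(x, y):
    """Return (paired_t, grouped_t) for two equal-length samples."""
    n = len(x)
    diffs = [a - b for a, b in zip(x, y)]
    mean_diff = statistics.mean(diffs)
    # Paired test: the error term is the SD of the within-pair differences.
    se_paired = statistics.stdev(diffs) / math.sqrt(n)
    # Grouped (pooled-variance) test: the error term ignores the pairing.
    pooled_var = (statistics.variance(x) + statistics.variance(y)) / 2
    se_grouped = math.sqrt(pooled_var * 2 / n)
    return mean_diff / se_paired, mean_diff / se_grouped

# Hypothetical scores, chosen so that x rises while y falls across pairs.
x = [10, 12, 14, 16, 18]
y = [15, 13, 12, 10, 9]
paired_t, grouped_t = t_statistics(x, y)
print(f"paired t = {paired_t:.3f}, grouped t = {grouped_t:.3f}")
```

With these particular numbers the paired statistic comes out smaller in absolute value than the grouped one, which is the situation described above.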

--
Rich Ulrich

Date: Sun, 17 Mar 2013 09:22:19 -0400
From: [hidden email]
Subject: Re: Missing values in MIXED
To: [hidden email]

Dear SPSS-L,

Diana made a bold statement that under all circumstances one should employ an unstructured residual variance-covariance structure. Let me dispel that myth immediately. Run the code BELOW, and note that by employing a likelihood ratio test we observe that the first-order autoregressive structure fits the data as well as the unstructured residual matrix. If the objective in science is to obtain the most parsimonious model that best explains the phenomenon, why would we not apply the same rule when building statistical models?

Second, by using the more parsimonious model (first-order autoregressive residual structure), for the illustration below, take note that one obtains a statistically more powerful test of the fixed effect of time. In fact, by employing an unstructured residual matrix the fixed effect of time is not significant at alpha=.05, whereas the fixed effect for time is significant at alpha=.05 for the first-order autoregressive matrix.

This is one of many examples I could have simulated where using the general recommendation that Diana made to only use an unstructured matrix will result in not only poor science, but different conclusions.
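
The simulated example itself is snipped in this copy of the thread, so the following is not that code. It is only a rough sketch of the likelihood ratio arithmetic, assuming both models share the same fixed effects and estimation method, that homogeneous AR(1) is treated as a restriction of the unstructured matrix, and that the two -2 log-likelihood values are made-up numbers of the kind one would read off the MIXED output. The function names are hypothetical.

```python
import math

def n_cov_params(structure, k):
    """Number of residual covariance parameters for k repeated measures."""
    if structure == "UN":    # unstructured: k variances + k(k-1)/2 covariances
        return k * (k + 1) // 2
    if structure == "AR1":   # homogeneous AR(1): one variance, one correlation
        return 2
    raise ValueError(structure)

def chi2_sf(x, df):
    """Upper-tail chi-square probability (series for the incomplete gamma)."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 0
    while abs(term) > 1e-15 * abs(total):
        n += 1
        term *= z / (a + n)
        total += term
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return 1.0 - lower

def lr_test(neg2ll_restricted, neg2ll_general, df):
    """Likelihood ratio statistic and p-value for nested covariance structures."""
    stat = neg2ll_restricted - neg2ll_general
    return stat, chi2_sf(stat, df)

k = 4                                                # four time points
df = n_cov_params("UN", k) - n_cov_params("AR1", k)  # 10 - 2 = 8
stat, p = lr_test(412.6, 405.1, df)                  # made-up -2LL values
print(f"LR chi-square = {stat:.2f} on {df} df, p = {p:.3f}")
```

A non-significant result here would mean the eight extra parameters of the unstructured matrix do not buy a detectably better fit, which is the parsimony argument made above.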

Ryan
--
... snip, lengthy example.

Rich Ulrich

Re: Missing values in MIXED

I followed the rationale that it was better science and wiser.

I missed it, if you anywhere suggested that the wrong var-cov matrix can,
when the ICC is negative (one case is when scores add to a near-fixed
total), give a test that is bad because it rejects too often.
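
To make the near-fixed-total case concrete, here is a small sketch using the one-way ANOVA estimator of the ICC, (MSB - MSW) / (MSB + (k - 1) MSW). Treating that estimator as the "proper definition" is an assumption on my part, and the data and the function name `oneway_icc` are invented for illustration.

```python
def oneway_icc(rows):
    """One-way ANOVA ICC for a list of subjects, each a list of k scores."""
    n, k = len(rows), len(rows[0])
    grand = sum(sum(r) for r in rows) / (n * k)
    means = [sum(r) / k for r in rows]
    # Between-subject mean square: spread of the subject means.
    ms_between = k * sum((m - grand) ** 2 for m in means) / (n - 1)
    # Within-subject mean square: spread of scores around each subject mean.
    ms_within = sum(
        (v - m) ** 2 for r, m in zip(rows, means) for v in r
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)

# Hypothetical data: each subject's three scores add to exactly 30.
rows = [[5, 10, 15], [12, 8, 10], [9, 14, 7], [11, 6, 13]]
icc = oneway_icc(rows)
print(icc)
```

Because every subject has the same total, the between-subject mean square is zero and the estimate sits at its lower bound of -1/(k - 1).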

--
Rich Ulrich

Ryan

Re: Missing values in MIXED

Rich,

You are asking a question unrelated to my general recommendation of avoiding model overfitting with respect to the residual variance-covariance matrix (R-side random effects) in the context of a linear mixed model. I had made reasonable assumptions when making these recommendations.

I will respond to your comment in a new thread, as I see this issue as off-topic enough to deserve its own thread.

Ryan

Kornbrot, Diana

Update on MIXED

In reply to this post by Ryan

This is a really useful procedure.
It is actually simpler to use than GLM univariate + GLM repeated for researchers [psychology, education, biology, marketing, etc.] who are not interested in gory statistical details.
Advantages
1 procedure whether repeated or not
Same look and feel output for within and between groups
Makes fewer assumptions, e.g. can take care of factorial unequal variance
Useful information for dialog users, and possibly script also

the sub-command ‘repeated’ is misleadingly named. It really should be named ‘covariance’, since if a variable is included in repeated then one can ask for a covariance structure. In particular, for a between-group factor this enables different variances for each group. [discovered this when Jon Peck gave the syntax]
personally I recommend always specifying ‘unstructured’ for covariance, as this makes fewest assumptions, but other options are easily available
it IS possible to have pivot table rather than model viewer output in MIXED generalized linear. This enables export of results in a single operation. However this option is in the preferences [mac], presumably options [windows] under the OUTPUT tab. [as far as I can see it is not in the syntax for print options]

Useful locations, so don’t waste time on ghastly ibm web site
http://www-933.ibm.com/support/fixcentral/options for patches