BUT using MIXED, there is then a non-integer error df
How is SPSS actually handling the missing values?
Nb Am using unstructured covariance matrix

Thanks for help
Best
Diana

Emeritus Professor Diana Kornbrot
email: d.e.kornbrot@...
web: http://dianakornbrot.wordpress.com/
Work
Department of Psychology
School of Life and Medical Sciences
University of Hertfordshire
College Lane, Hatfield, Hertfordshire AL10 9AB, UK
voice: +44 (0) 170 728 4626
Home
19 Elmhurst Avenue
London N2 0LT, UK
voice: +44 (0) 208 444 2081
mobile: +44 (0) 740 318 1612
|
Administrator
|
Hello Diana. I don't have a direct answer to your question, but I do have some pointers to material that may be helpful.
1. Singer & Willett (Applied Longitudinal Data Analysis, Chapter 5) talk about missing data in the multilevel model for change.

2. Twisk (Applied Multilevel Analysis, 2006, p. 107) says this:

"However, when applying multilevel analysis to longitudinal data, there is no need to have a 'complete' dataset, and furthermore, it has been shown that multilevel analysis is very flexible in handling missing data. It has even been shown that applying multilevel analysis to an incomplete dataset is even better than applying imputation methods (Twisk and de Vente, 2002; Twisk, 2003)."

Twisk & de Vente (2002): http://europepmc.org/abstract/MED/11927199
Twisk (2003): http://books.google.ca/books?hl=en&lr=&id=TCg02e-tI_cC&oi=fnd&pg=PR15&dq=Twisk+2003&ots=2GfodRIiu9&sig=z8BSBQoRaZNavIzj_QOeATBP_nw#v=onepage&q=Twisk%202003&f=false

HTH.

Cheers,
Bruce
--
Bruce Weaver bweaver@lakeheadu.ca http://sites.google.com/a/lakeheadu.ca/bweaver/ "When all else fails, RTFM." PLEASE NOTE THE FOLLOWING: 1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above. 2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/). |
In reply to this post by Kornbrot, Diana
Diana,

To fit a linear mixed model in SPSS, the dataset must be in vertical (long) format: there are "k" cases (rows) per subject, with an identification variable whose value is unique to each subject and repeated on every row belonging to that subject. Assuming the within-subjects variable is nominal, ordinal, or composed of equally spaced intervals, it is common practice to code it as a numeric integer variable with sequential values from 1 through "k", the number of levels of the within-subjects variable. Finally, the response variable must be stacked vertically, with each measurement linked to the appropriate ID and level of the within-subjects variable.

Here is an illustration:

ID Time y
1 1 34
1 2 22
1 3 12
1 4 11
2 1 33
2 2 32
2 3 .
2 4 22
3 1 38
3 2 37
3 3 34
3 4 30
.
.
.
.

As you can see above, the second subject was not measured at time 3. As a result, that case (row) will be excluded from the linear mixed model analysis. However, data obtained at the other time points for that subject will be included in the analysis. The assumption we must make in order to obtain unbiased estimates from a linear mixed model is that the data are missing randomly. With that said, the MIXED procedure in SPSS calculates degrees of freedom using Satterthwaite's approximation:

http://publib.boulder.ibm.com/infocenter/spssstat/v20r0m0/index.jsp?topic=%2Fcom.ibm.spss.statistics.help%2Falg_mixed_custom-tests_satterthwaite.htm

This approximation has been shown to be valid for balanced and unbalanced designs.

Besides not having to exclude, for parameter estimation, all data from subjects who happen to have data missing randomly, the MIXED procedure allows for modeling of continuous response variables using various hierarchical designs and residual covariance structures.

Ryan

On Fri, Mar 15, 2013 at 11:46 AM, Kornbrot, Diana <[hidden email]> wrote:
|
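A small sketch of the two missing-data behaviours described in the post above, using the illustration's twelve rows. It is written in Python only to make the row counts concrete; it is not how SPSS is implemented, and the function names are made up for this example. "Row-wise" stands for what the post attributes to MIXED (only the row with the missing response is dropped), "listwise" for what the thread attributes to GLM REPEATED (the whole subject is dropped).

```python
# Toy long-format data copied from the illustration; None marks the missing y.
rows = [
    (1, 1, 34), (1, 2, 22), (1, 3, 12), (1, 4, 11),
    (2, 1, 33), (2, 2, 32), (2, 3, None), (2, 4, 22),
    (3, 1, 38), (3, 2, 37), (3, 3, 34), (3, 4, 30),
]


def rowwise_complete(data):
    """Keep every row whose response is observed (hypothetical helper)."""
    return [r for r in data if r[2] is not None]


def listwise_complete(data):
    """Drop all rows of any subject with at least one missing response."""
    incomplete = {subject for subject, _, y in data if y is None}
    return [r for r in data if r[0] not in incomplete]


mixed_rows = rowwise_complete(rows)
glm_rows = listwise_complete(rows)

print("rows kept row-wise :", len(mixed_rows))
print("rows kept listwise :", len(glm_rows))
print("subjects kept row-wise :", sorted({r[0] for r in mixed_rows}))
print("subjects kept listwise :", sorted({r[0] for r in glm_rows}))
```

Subject 2 still contributes three observations under the row-wise rule, which is the practical difference the original question was asking about.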
Thanks

Done all that. Converting horizontal to vertical is straightforward using the data structuring wizard [don’t need syntax], once one gets the hang of it.

My ACTUAL question was:
MIXED with data in long form can cope with missing data, with correction for denominator df
GLM REPEATED insists on NO missing data
So what is the difference?

With the help of Bruce Weaver, I have NOW worked out that the difference lies in the covariance matrix used for estimation of parameters.
REPEATED applies listwise deletion and so discards any subjects that do not have values for all variables.
MIXED applies pairwise deletion.

Suspect the reduced df is harmonic mean of df for relevant groups, but do not know.

Bruce provides following useful refs that suggest that using MIXED may actually be less biased than any of a whole slew of complicated imputation procedures:
Twisk & de Vente (2002): http://europepmc.org/abstract/MED/11927199
Twisk (2003): http://books.google.ca/books?hl=en&lr=&id=TCg02e-tI_cC&oi=fnd&pg=PR15&dq=Twisk+2003&ots=2GfodRIiu9&sig=z8BSBQoRaZNavIzj_QOeATBP_nw#v=onepage&q=Twisk%202003&f=false
Singer & Willett (/Applied Longitudinal Data Analysis/, Chapter 5).

I NOW recommend MIXED with UNSTRUCTURED covariance matrix across the board. No doubt it will take time to ‘filter down’ to all users.
Output much simpler as all inferential tests in 1 table.
Can do appropriate post hoc or planned comparisons with standard errors correctly estimated from unstructured covariance matrix.

MIXED has limitation of not supplying effect sizes.
Jason Becksted points out that one can calculate partial eta squared = F*df1/(F*df1+df2), where df1 is the hypothesis df and df2 is the error df.

REPEATED, no doubt ground breaking in its time [distant past], is fiddly & potentially misleading. Although the multivariate option uses correct unstructured covariance matrix, the post hocs use SEs based on inappropriate diagonal covariance matrix, with GG corrections. Personally, have never seen a covariance matrix with all pairwise covariances equal; seems improbable in the real world.

Best

Diana

On 16/03/2013 16:49, "R B" <ryan.andrew.black@...> wrote:

Diana,
|
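For readers who want to try the effect-size formula quoted in the post above, here is a minimal Python sketch. The function name and the example numbers (F = 4.0 on 2 and 100 df) are invented for illustration and do not come from any output in this thread. The error df may be non-integer, as it is with the Satterthwaite approximation in MIXED. Elsewhere in the thread Ryan questions whether this formula is valid in all circumstances, so treat it as the arithmetic only.

```python
def partial_eta_squared(f_value, df_hypothesis, df_error):
    """Partial eta squared from an F test: F*df1 / (F*df1 + df2).

    df_error may be fractional (e.g. a Satterthwaite denominator df).
    """
    if f_value < 0 or df_hypothesis <= 0 or df_error <= 0:
        raise ValueError("F must be non-negative and both df positive")
    numerator = f_value * df_hypothesis
    return numerator / (numerator + df_error)


# Hypothetical values, not taken from the thread.
example = partial_eta_squared(4.0, 2, 100.0)
print(round(example, 4))
```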
Diana, See my comments below.
That is a poor recommendation. The goal should be to find the optimal residual variance-covariance structure. You could reduce statistical power by employing an unstructured matrix if there is a less restrictive structure that fits the data equally well (e.g., AR1, TOEP). There may be other aspects of your data to consider as well (e.g., G-side random effects that should be incorporated).
|
Administrator
|
Ryan, I think you meant to say more restrictive below, did you not? I.e., unstructured imposes no restrictions on the covariance matrix, and therefore uses up more df. When you use another structure that fits the data reasonably well, you impose restrictions that buy you some df.
Cheers, Bruce
|
In reply to this post by Ryan
Dear SPSS-L,

Diana made a bold statement that under all circumstances one should employ an unstructured residual variance-covariance structure. Let me dispel that myth immediately. Run the code BELOW, and note that, by a likelihood ratio test, the first-order autoregressive structure fits the data as well as the unstructured residual matrix. If the objective in science is to obtain the most parsimonious model that best explains the phenomenon, why would we not apply the same rule when building statistical models?

Second, by using the more parsimonious model (first-order autoregressive residual structure), for the illustration below, take note that one obtains a statistically more powerful test of the fixed effect of time. In fact, with an unstructured residual matrix the fixed effect of time is not significant at alpha=.05, whereas it is significant at alpha=.05 with the first-order autoregressive matrix.

This is one of many examples I could have simulated where following Diana's general recommendation to use only an unstructured matrix results not only in poor science, but in different conclusions.
Ryan

--
*Generate Data for Mixed Model with AR1 specification.
set seed 98734523.
new file.
input program.
compute subject=-99.
compute time = -99.
compute x1 = -99.
compute x2 = -99.
compute x3 = -99.
compute e1 = -99.
compute e2 = -99.
compute e3 = -99.
compute sigma = 1.
compute rho = 0.50.
* a11 to a33 form the Cholesky factor of the 3 x 3 AR1 correlation matrix.
compute a11 = 1.
compute a21 = rho.
compute a31 = rho**2.
compute a22 = sqrt(1 - rho**2).
compute a32 = rho*sqrt(1 - rho**2).
compute a33 = sqrt(1 - rho**2).
leave subject to a33.
loop subject= 1 to 100.
compute x1 = rv.normal(0,1).
compute x2 = rv.normal(0,1).
compute x3 = rv.normal(0,1).
compute e1 = sigma * a11*x1.
compute e2 = sigma * (a21*x1 + a22*x2).
compute e3 = sigma * (a31*x1 + a32*x2 + a33*x3).
loop time = 1 to 3.
compute y = 1.5 + 0.20*(time=1) + 0.22*(time=2) + e1*(time=1) + e2*(time=2) + e3*(time=3).
end case.
end loop.
end loop.
end file.
end input program.
execute.

delete variables x1 x2 x3 sigma rho a11 a21 a31 a22 a32 a33 e1 e2 e3.

MIXED y BY time
  /FIXED=time | SSTYPE(3)
  /METHOD=REML
  /PRINT=R SOLUTION
  /REPEATED=time | SUBJECT(subject) COVTYPE(UN).

MIXED y BY time
  /FIXED=time | SSTYPE(3)
  /METHOD=REML
  /PRINT=R SOLUTION
  /REPEATED=time | SUBJECT(subject) COVTYPE(AR1).

* Likelihood ratio test on 4 df: UN has 6 covariance parameters and AR1 has 2.
compute deviance_difference = 804.669655 - 801.255252.
compute deviance_p_value = 1 - CDF.CHISQ(deviance_difference,4).
execute.

On Sun, Mar 17, 2013 at 8:40 AM, <[hidden email]> wrote:

> Diana,
>
> See my comments below.
>
> On Mar 17, 2013, at 4:43 AM, "Kornbrot, Diana" <[hidden email]> wrote:
>
>> Ryan
>>
>> Thanks
>> Done all that. Converting horizontal to vertical is straightforward using the data structuring wizard [don’t need syntax], once one gets the hang of it
>>
>> My ACTUAL question was:
>> MIXED with data in long form can cope with missing data, with correction for denominator df
>> GLM REPEATED insists on NO missing data
>> So what is the difference?
>>
>> With the help of Bruce Weaver, I have NOW worked out that the difference lies in the covariance matrix used for estimation of parameters
>> REPEATED applies list wise deletion and so discards any subjects that do not have values for all variables,
>> MIXED applies pair wise deletion.
>
> That is exactly what I showed in the illustration.
>
>> Suspect the reduced df is harmonic mean of df for relevant groups, but do not know
>
> No need to suspect. I provided a link to the formula for df error. I don't know what you mean by reduced.
>
>> Bruce provides following useful refs that suggest that using MIXED may actually be less biased than any of a whole slew of complicated imputation procedures:
>> Twisk & de Vente (2002): http://europepmc.org/abstract/MED/11927199
>> Twisk (2003):
>> Singer & Willett (/Applied Longitudinal Data Analysis/, Chapter 5).
>>
>> I NOW recommend MIXED with UNSTRUCTURED covariance matrix across the board.
>
> That is a poor recommendation. The goal should be to find the optimal residual variance-covariance structure. You could reduce statistical power if you employ an unstructured matrix if there is a less restrictive structure that fits that data equally well (e.g., AR1. TOEP). There may be other aspects to your data as well (G-side random effects that should be incorporated).
>
>> No doubt it will take time to ‘filter down’ to all users
>> Output much simpler as all inferential tests in 1 table
>> Can do appropriate post hoc or planned comparisons with standard errors correctly estimated from unstructured covariance matrix.
>
> That is not only true for the unstructured matrix.
>
>> MIXED has limitation of not supplying effect sizes.
>> Jason Becksted points out that one can calculate partial eta squared = F*df1/(F*df1+df2), where df1 is the hypothesis df and df2 is the error df.
>
> So did I, publicly, when you asked. And I pointed out that one would have to employ ML to use that same formula to obtain partial eta squared from a fully balanced fixed effects only design. But, I would question the validity of using that formula under all circumstances, which is why I provided the alternative. For example, what if you are trying to determine the effect size of a random effect? What if your fixed effect predictor is at a higher level? There have been plenty of discussions on this matter on the multilevel listserve and in multilevel textbooks. I would not simply apply that formula to all circumstances. In fact, I would generally recommend using the second approach I showed.
>
>> REPEATED, no doubt ground breaking in its time [distant past], is fiddly & potentially misleading. Although the multivariate option uses correct unstructured covariance matrix, the post hocs use SEs based on inappropriate diagonal covariance matrix, with GG corrections. Personally, have never seen a covariance matrix with all pair wise covariances equal – seems improbable in the real world.
>
> Again, there are alternatives to both extremes. It is not one versus the other.
>
>> Best
>>
>> Diana
>
|
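The likelihood ratio comparison at the end of Ryan's syntax can be checked by hand. The sketch below, in Python, reuses the two -2 restricted log-likelihood values that appear in his COMPUTE statement and the 4 df difference (6 covariance parameters for UN versus 2 for AR1 with three time points). Which value belongs to which model is not stated in the post; the assumption here is that the larger one is the AR1 fit, since the more restricted model cannot fit better. The helper `chi2_sf_even_df` is written for this example and uses the closed form of the chi-square upper tail that holds for even df only.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability, closed form valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive, even df")
    if x <= 0:
        return 1.0
    half = x / 2.0
    term = 1.0
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k          # builds (x/2)^k / k!
        total += term
    return math.exp(-half) * total


# Values copied from the COMPUTE statement in the post (assumed AR1 minus UN).
deviance_ar1 = 804.669655
deviance_un = 801.255252
df_difference = 6 - 2             # UN parameters minus AR1 parameters, k = 3

deviance_difference = deviance_ar1 - deviance_un
p_value = chi2_sf_even_df(deviance_difference, df_difference)
print(round(deviance_difference, 6), round(p_value, 3))
```

A p-value this far above .05 is what lies behind the claim that AR1 fits these simulated data no worse than UN. Because both models share the same fixed effects, comparing REML deviances is a reasonable basis for the test.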
In reply to this post by Bruce Weaver
Hi Bruce, That's correct. I meant to state "more restrictive." Good catch. I submitted the message before re-reading it. For those interested, see the illustration I provide in a message I just posted that illustrates the point I was making. Note how the more restrictive model (AR1) is estimating 4 fewer parameters than the least restrictive (UN).
I must say that I'm very surprised by a recommendation to always use the unstructured matrix. In my own work with repeated measures data, I have so often found that some type of residual correlation matrix that accounts for the decay of correlations among residuals from observations more distant in time fits the data as well as an unstructured matrix. Of course, I am referring to a design which has only one within-subjects variable.

What if, for example, one were to analyze data collected from a type of randomized controlled trial (RCT)? What types of structures might one consider? I'll leave that question unanswered for people to ponder.

Ryan

On Sun, Mar 17, 2013 at 9:16 AM, Bruce Weaver <[hidden email]> wrote:

Ryan, I think you meant to say *more* restrictive below, did you not? I.e.,
|
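The "4 fewer parameters" remark in the post above follows from counting covariance parameters for k repeated measurements. A short Python sketch of that count, under the usual definitions of these structures (unstructured: k(k+1)/2; first-order autoregressive and compound symmetry: 2; Toeplitz and diagonal: k). The function name and the dictionary are made up for this example and are not part of SPSS.

```python
def n_cov_params(structure, k):
    """Number of residual covariance parameters for k time points."""
    if k < 1:
        raise ValueError("k must be at least 1")
    counts = {
        "UN": k * (k + 1) // 2,   # every variance and covariance is free
        "AR1": 2,                 # one variance, one correlation
        "CS": 2,                  # one variance, one common covariance
        "TOEP": k,                # one parameter per lag, including lag 0
        "DIAG": k,                # one variance per time point
    }
    return counts[structure]


k = 3  # three time points, as in the simulated example
difference = n_cov_params("UN", k) - n_cov_params("AR1", k)
print(n_cov_params("UN", k), n_cov_params("AR1", k), difference)
```

The gap grows quickly with more time points: with k = 6, the unstructured matrix needs 21 parameters against 2 for AR1.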
In reply to this post by Ryan
Since Bruce pointed out a typo I made, I decided to reread my entire response to the OP. I noticed another typo. In this post, I correct both typos and I have decided to add another comment. All changes are ***CAPITALIZED*** in the text BELOW my name. But, I also have another comment to make right here:
The reason I'm taking such an interest in this thread is that I have heard this general recommendation before; that is, to always use an unstructured residual-covariance matrix. I don't know if there is a textbook out there that makes such a silly (at best) or dangerous (at worst) recommendation, but my guess is that it stems from the assumption that the unstructured matrix can never be wrong due to the lack of restrictions. Let me make a somewhat provocative statement... An unstructured residual variance-covariance structure applied to ALL subjects is not always the LEAST restrictive residual variance-covariance structure. I realize that in the past I have even said that an unstructured matrix is the least restrictive, but I should have couched that statement in the context of single group designs only.
Ryan On Sun, Mar 17, 2013 at 8:40 AM, <[hidden email]> wrote:
|
Administrator
|
Off the top of my head, I can't say where I read it (and I don't have my books with me today), but I do think that at least one author I've read recommends always starting with an unstructured residual covariance matrix, and imposing restrictions if/when it makes sense to do so. I wonder if this is the approach Diana was actually promoting.
Cheers, Bruce
|
In reply to this post by Ryan
Ryan,
What you wrote suggests that using the covariances only increases the power, and that we always want more power. In that case, one might conclude that it is always "safe" to ignore the extra power by using the unstructured alternative, since it only sacrifices power.

This bothers me, because I doubt that it is true. It reminds me of the assertion I have heard, that it is always "safe" to use the grouped t-test instead of a paired test, because "you only lose power." And for the t-test, *that* is not true. When the correlation is negative, the error term is larger for the paired t-test, and so the paired t-test is *necessarily* the right one, by virtue of the fact that it has less power than the grouped test.

I don't know how well the simple t-test generalizes to the structure in question, but a negative intra-class correlation is not impossible, when you use the proper definition of ICC. (I have seen a lousy definition in one popular description of hierarchical analysis, which defines its so-called ICC by an inadequate analogy. And it can't be negative, so it is a flawed analogy.) Negative ICCs are not the most common ones, but I did see a lecturer on HA who unwittingly stated an example that featured it.

--
Rich Ulrich
|
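Rich's t-test point can be made concrete with the standard error of a mean difference. The Python sketch below assumes equal n in both conditions; the function names and the numbers (SDs of 1, n = 10, r = -0.5 or 0.5) are invented for illustration. The paired standard error includes the term -2*r*s1*s2, so a negative correlation makes it larger than the grouped one, and a positive correlation makes it smaller.

```python
import math


def se_grouped(s1, s2, n):
    """SE of the mean difference for two independent groups of equal size n."""
    return math.sqrt((s1 ** 2 + s2 ** 2) / n)


def se_paired(s1, s2, r, n):
    """SE of the mean difference for n pairs whose scores correlate r."""
    return math.sqrt((s1 ** 2 + s2 ** 2 - 2.0 * r * s1 * s2) / n)


# Hypothetical values chosen only to show the direction of the effect.
negative = se_paired(1.0, 1.0, -0.5, 10)
positive = se_paired(1.0, 1.0, 0.5, 10)
grouped = se_grouped(1.0, 1.0, 10)
print(round(negative, 4), round(grouped, 4), round(positive, 4))
```

The two tests also differ in error df (n - 1 for the paired test, 2n - 2 for the grouped test), which this sketch leaves out.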
In reply to this post by Bruce Weaver
Hi Bruce,
That certainly makes sense to determine if there is a discernible pattern in the residual covariance matrix. In fact, I almost always begin with an unstructured matrix, and based on the pattern, decide which restrictive structures to test against the unstructured matrix. If there are multiple groups, then one might consider fitting group-specific unstructured matrices. Best wishes, Ryan On Mar 17, 2013, at 3:28 PM, Bruce Weaver <[hidden email]> wrote: > Off the top of my head, I can't say where I read it (and I don't have my > books with me today), but I do think that at least one author I've read > recommends always /starting/ with an unstructured residual covariance > matrix, and imposing restrictions if/when it makes sense to do so. I wonder > if this is the approach Diana was actually promoting. > > Cheers, > Bruce > > > Ryan Black wrote >> Since Bruce pointed out a typo I made, I decided to reread my entire >> response to the OP. I noticed another typo. In this post, I correct both >> typos and I have decided to add another comment. All changes are >> ***CAPITALIZED*** in the text BELOW my name. But, I also have another >> comment to make right here: >> >> The reason I'm taking such an interest in this thread is that I have heard >> this general recommendation before; that is, to always use an unstructured >> residual-covariance matrix. I don't know if there is a textbook out there >> that makes such a silly (at best) or dangerous (at worst) recommendation, >> but my guess is because of the assumption that the unstructured matrix can >> never be wrong due to the lack of restrictions. Let me make a somewhat >> provocative statement...An unstructured >> residual variance-covariance structure applied to ALL subjects is not >> always the LEAST restrictive residual variance-covariance structure. 
I >> realize that in the past I have even said that an unstructured matrix is >> the least restrictive, but I should have couched that statement in the >> context of single group designs only. >> >> Ryan >> On Sun, Mar 17, 2013 at 8:40 AM, < > >> ryan.andrew.black@ > >> > wrote: >> >>> Diana, >>> >>> See my comments below. >>> >>> On Mar 17, 2013, at 4:43 AM, "Kornbrot, Diana" < > >> d.e.kornbrot@.ac > >> > >>> wrote: >>> >>> Ryan >>> >>> Thanks >>> Done all that. Converting horizontal to vertical is straightforward using >>> the data structuring wizard [don’t need syntax], once one gets the hang >>> of >>> it >>> >>> My ACTUAL question was: >>> MIXED with data in long form can cope with missing data, with correction >>> for denominator df >>> GLM REPEATED insists on NO missing data >>> So what is the difference? >>> >>> With the help of Bruce Weaver, I have NOW worked out that the difference >>> lies in the covariance matrix used for estimation of parameters >>> REPEATED applies list wise deletion and so discards any subjects that do >>> not have values for all variables, >>> MIXED applies pair wise deletion. >>> >>> >>> That is exactly what I showed in the illustration. >>> >>> Suspect the reduced df is harmonic mean of df for relevant groups, but do >>> not know >>> >>> >>> No need to suspect. I provided a link to the formula for df error. I >>> don't >>> know what you mean by reduced. >>> >>> >>> Bruce provides following useful refs that suggest that using MIXED may >>> actually be less biased than any of a whole slew of complicated >>> imputation >>> procedures: >>> Twisk & de Vente (2002): *http://europepmc.org/abstract/MED/11927199 >>> *Twisk (2003): >>> * >>> http://books.google.ca/books?hl=en&lr=&id=TCg02e-tI_cC&oi=fnd&pg=PR15&dq=Twisk+2003&ots=2GfodRIiu9&sig=z8BSBQoRaZNavIzj_QOeATBP_nw#v=onepage&q=Twisk%202003&f=false >>> *Singer & Willett (/Applied Longitudinal Data Analysis/, Chapter 5). 
>>> >>> I NOW recommend MIXED with UNSTRUCTURED covariance matrix across the >>> board. >>> >>> >>> That is a poor recommendation. The goal should be to find the optimal >>> residual variance-covariance structure. You could reduce statistical >>> power >>> if you employ an unstructured matrix if there is a ****MORE*** >>> restrictive >>> structure that fits that data equally well (e.g., AR1. TOEP). There may >>> be >>> other aspects to your data as well (G-side random effects that should be >>> incorporated). >>> >>> No doubt it will take time to ‘filter down’ to all users >>> Output much simpler as all inferential tests in 1 table >>> Can do appropriate post hoc or planned comparisons with standard errors >>> correctly estimated from unstructured covariance matrix. >>> >>> >>> That is not only true for the unstructured matrix. >>> >>> >>> MIXED has limitation of not supplying effect sizes. >>> Jason Becksted points out that on can calculate partial eta squared = >>> F*df1/(F*df1+df2), where df1 is the hypothesis df and df2 is the error >>> df. >>> >>> >>> So did I, publicly, when you asked. And I pointed out that one would have >>> to employ ML to use the ***ALTERNATIVE*** formula to obtain partial eta >>> squared from a fully balanced fixed effects only design. But, I would >>> question the validity of using that formula ***ABOVE*** under all >>> circumstances, which is why I provided the alternative. For example, what >>> if you are trying to determine the effect size of a random effect? What >>> if >>> your fixed effect predictor is at a higher level? There have been plenty >>> of >>> discussions on this matter on the multilevel listserve and in multilevel >>> textbooks. I would not simply apply that formula to all circumstances. In >>> fact, I would generally recommend using the second approach I >>> showed. 
***SPEAKING OF EFFECT SIZE, WE MUST ALSO BE CAREFUL TO DEFINE >>> WHAT >>> WE MEAN BY EFFECT SIZE*** >>> >>> >>> REPEATED, no doubt ground breaking in its time [distant past], is fiddly >>> & >>> potentially misleading. Although the multivariate option uses correct >>> unstructured covariance matrix, the post hocs use SEs based on >>> inappropriate diagnonal covraince matrix, with GG corrections. >>> Personally, >>> have never seen a covariance matrix with all pair wise covariances equal >>> – >>> seems improbable in the real world. >>> >>> >>> Again, there are alternatives to both extremes. It is not one versus the >>> other. >>> >>> >>> Best >>> >>> Diana >>> >>> On 16/03/2013 16:49, "R B" < > >> ryan.andrew.black@ > >> > wrote: >>> >>> Diana, >>> >>> In order to employ a linear mixed model in SPSS, one must construct the >>> dataset in vertical format, such that there are "k" cases per subject >>> with >>> an identification variable with non-repeating numbers for cases >>> associated >>> with a particular subject. Assuming the within-subjects variable is >>> either >>> nominal, ordinal, or is composed of equally-spaced intervals, it is >>> common >>> practice for the within-subjects variable to be a numeric integer >>> variable >>> with sequential values from 1 through "k" levels of the within-subjects >>> variable. Finally, the response variable must be concatenated vertically >>> with each measurement linked to the appropriate ID and level of the >>> within-subject variable. >>> >>> Here is an illustration: >>> >>> ID Time y >>> 1 1 34 >>> 1 2 22 >>> 1 3 12 >>> 1 4 11 >>> 2 1 33 >>> 2 2 32 >>> 2 3 . >>> 2 4 22 >>> 3 1 38 >>> 3 2 37 >>> 3 3 34 >>> 3 4 30 >>> . >>> . >>> . >>> . >>> >>> As you can see above, the second subject was not measured at time 3. As a >>> result, that case will be excluded from the linear mixed model analysis. >>> However, data obtained from other times points for that particular >>> subject >>> will be included in the analysis. 
The assumption we must make in order to >>> obtain unbiased estimates derived from a linear mixed model is that the >>> data are missing randomly. With that said, the MIXED procedure in SPSS >>> calculates degrees of freedom using Satterthwaite's Approximation: >>> >>> >>> http://publib.boulder.ibm.com/infocenter/spssstat/v20r0m0/index.jsp?topic=%2Fcom.ibm.spss.statistics.help%2Falg_mixed_custom-tests_satterthwaite.htm >>> >>> This approximation has been shown to be valid for balanced and unbalanced >>> designs. >>> >>> In addition to the benefits of not having to exclude all data from >>> subjects who happen to have data which are missing randomly for parameter >>> estimation, the MIXED procedure allows for modeling of continuous >>> response >>> variables using various hierarchical designs and residual covariance >>> structures. >>> >>> Ryan >>> On Fri, Mar 15, 2013 at 11:46 AM, Kornbrot, Diana < > >> d.e.kornbrot@.ac > >>> wrote: >>> >>> If one uses repeated in procedure GLM then it appears that all subjects >>> must have vlaues for all combinations of the rpeated measures >>> BUT using MIXED, there is then a non-integer error df >>> How is SPSS actually handling the missing values? 
>>> Nb Am using unstructured covariance matrix
>>>
>>> Thanks for help
>>> Best
>>> Diana
|
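The non-integer error df asked about above is what a Satterthwaite-type approximation typically produces. The sketch below is not the formula MIXED uses (that one is given at the IBM link quoted above and works from the estimated covariance parameters); it is the simpler two-variance Welch-Satterthwaite special case, shown only to illustrate why the result is fractional when variances or sample sizes differ. The function name and the example numbers are made up for illustration.

```python
def satterthwaite_df(var1, n1, var2, n2):
    """Welch-Satterthwaite df for the difference of two independent means.

    var1, var2: sample variances; n1, n2: sample sizes.
    This is the textbook two-group special case, not the SPSS MIXED formula.
    """
    a = var1 / n1  # estimated variance of the first mean
    b = var2 / n2  # estimated variance of the second mean
    return (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))


# Equal variances and equal n: the approximation recovers n1 + n2 - 2.
balanced = satterthwaite_df(4.0, 10, 4.0, 10)

# Unequal variances and unequal n: the df falls between the smaller
# (n - 1) and n1 + n2 - 2, and is generally not a whole number.
unbalanced = satterthwaite_df(4.0, 10, 9.0, 8)

print(balanced, unbalanced)
```

In the balanced case the df is the familiar integer; once the design is unbalanced, the df becomes a weighted compromise and is fractional.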
Ryan,

' ... If there are multiple groups, then one might consider fitting
group-specific unstructured matrices.'

How would you do that? Just to keep the discussion context clear, we have
been talking about a mixed analysis that includes a repeated statement and
not one that includes only a random statement. True?

Thanks,
Gene Maguin

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of SUBSCRIBE SPSSX-L Anonymous
Sent: Sunday, March 17, 2013 4:44 PM
To: [hidden email]
Subject: Re: Missing values in MIXED

Hi Bruce,

That certainly makes sense to determine if there is a discernible pattern
in the residual covariance matrix. In fact, I almost always begin with an
unstructured matrix, and based on the pattern, decide which restrictive
structures to test against the unstructured matrix. If there are multiple
groups, then one might consider fitting group-specific unstructured
matrices.

Best wishes,
Ryan

On Mar 17, 2013, at 3:28 PM, Bruce Weaver <[hidden email]> wrote:

> Off the top of my head, I can't say where I read it (and I don't have
> my books with me today), but I do think that at least one author I've
> read recommends always /starting/ with an unstructured residual
> covariance matrix, and imposing restrictions if/when it makes sense to
> do so. I wonder if this is the approach Diana was actually promoting.
>
> Cheers,
> Bruce
>
> Ryan Black wrote
>> Since Bruce pointed out a typo I made, I decided to reread my entire
>> response to the OP. I noticed another typo. In this post, I correct
>> both typos and I have decided to add another comment. All changes are
>> ***CAPITALIZED*** in the text BELOW my name. But, I also have another
>> comment to make right here:
>>
>> The reason I'm taking such an interest in this thread is that I have
>> heard this general recommendation before; that is, to always use an
>> unstructured residual-covariance matrix.
>> I don't know if there is a textbook out there that makes such a silly
>> (at best) or dangerous (at worst) recommendation, but my guess is that
>> it stems from the assumption that the unstructured matrix can never be
>> wrong due to the lack of restrictions. Let me make a somewhat
>> provocative statement... An unstructured residual variance-covariance
>> structure applied to ALL subjects is not always the LEAST restrictive
>> residual variance-covariance structure. I realize that in the past I
>> have even said that an unstructured matrix is the least restrictive,
>> but I should have couched that statement in the context of single
>> group designs only.
>>
>> Ryan
>>
>> On Sun, Mar 17, 2013 at 8:40 AM, <ryan.andrew.black@> wrote:
>>
>>> Diana,
>>>
>>> See my comments below.
>>>
>>> On Mar 17, 2013, at 4:43 AM, "Kornbrot, Diana" <d.e.kornbrot@.ac> wrote:
>>>
>>> Ryan
>>>
>>> Thanks
>>> Done all that. Converting horizontal to vertical is straightforward
>>> using the data structuring wizard [don't need syntax], once one gets
>>> the hang of it
>>>
>>> My ACTUAL question was:
>>> MIXED with data in long form can cope with missing data, with
>>> correction for denominator df
>>> GLM REPEATED insists on NO missing data
>>> So what is the difference?
>>>
>>> With the help of Bruce Weaver, I have NOW worked out that the
>>> difference lies in the covariance matrix used for estimation of
>>> parameters. REPEATED applies listwise deletion and so discards any
>>> subjects that do not have values for all variables, MIXED applies
>>> pairwise deletion.
>>>
>>>
>>> That is exactly what I showed in the illustration.
>>>
>>> Suspect the reduced df is harmonic mean of df for relevant groups,
>>> but do not know
>>>
>>>
>>> No need to suspect. I provided a link to the formula for df error. I
>>> don't know what you mean by reduced.
>>>
>>>
>>> Bruce provides following useful refs that suggest that using MIXED
>>> may actually be less biased than any of a whole slew of complicated
>>> imputation procedures:
>>> Twisk & de Vente (2002):
>>> http://europepmc.org/abstract/MED/11927199
>>> Twisk (2003):
>>> http://books.google.ca/books?hl=en&lr=&id=TCg02e-tI_cC&oi=fnd&pg=PR15&dq=Twisk+2003&ots=2GfodRIiu9&sig=z8BSBQoRaZNavIzj_QOeATBP_nw#v=onepage&q=Twisk%202003&f=false
>>> Singer & Willett (/Applied Longitudinal Data Analysis/, Chapter 5).
>>>
>>> I NOW recommend MIXED with UNSTRUCTURED covariance matrix across the
>>> board.
>>>
>>>
>>> That is a poor recommendation. The goal should be to find the
>>> optimal residual variance-covariance structure. You could reduce
>>> statistical power if you employ an unstructured matrix if there is a
>>> ***MORE*** restrictive structure that fits that data equally well
>>> (e.g., AR1, TOEP). There may be other aspects to your data as well
>>> (G-side random effects that should be incorporated).
>>>
>>> No doubt it will take time to 'filter down' to all users
>>> Output much simpler as all inferential tests in 1 table
>>> Can do appropriate post hoc or planned comparisons with standard
>>> errors correctly estimated from unstructured covariance matrix.
>>>
>>>
>>> That is not only true for the unstructured matrix.
>>>
>>>
>>> MIXED has limitation of not supplying effect sizes.
>>> Jason Becksted points out that one can calculate partial eta squared
>>> = F*df1/(F*df1+df2), where df1 is the hypothesis df and df2 is the
>>> error df.
>>>
>>>
>>> So did I, publicly, when you asked. And I pointed out that one would
>>> have to employ ML to use the ***ALTERNATIVE*** formula to obtain
>>> partial eta squared from a fully balanced fixed effects only design.
>>> But, I would question the validity of using that formula ***ABOVE***
>>> under all circumstances, which is why I provided the alternative.
>>> For example, what if you are trying to determine the effect size of
>>> a random effect? What if your fixed effect predictor is at a higher
>>> level? There have been plenty of discussions on this matter on the
>>> multilevel listserv and in multilevel textbooks. I would not simply
>>> apply that formula to all circumstances. In fact, I would generally
>>> recommend using the second approach I showed. ***SPEAKING OF EFFECT
>>> SIZE, WE MUST ALSO BE CAREFUL TO DEFINE WHAT WE MEAN BY EFFECT
>>> SIZE***
>>>
>>>
>>> REPEATED, no doubt ground breaking in its time [distant past], is
>>> fiddly & potentially misleading. Although the multivariate option
>>> uses correct unstructured covariance matrix, the post hocs use SEs
>>> based on inappropriate diagonal covariance matrix, with GG
>>> corrections.
>>> Personally, have never seen a covariance matrix with all pairwise
>>> covariances equal; seems improbable in the real world.
>>>
>>>
>>> Again, there are alternatives to both extremes. It is not one versus
>>> the other.
>>>
>>>
>>> Best
>>>
>>> Diana
>>>
>>> On 16/03/2013 16:49, "R B" <ryan.andrew.black@> wrote:
>>>
>>> Diana,
>>>
>>> In order to employ a linear mixed model in SPSS, one must construct
>>> the dataset in vertical format, such that there are "k" cases per
>>> subject with an identification variable with non-repeating numbers
>>> for cases associated with a particular subject. Assuming the
>>> within-subjects variable is either nominal, ordinal, or is composed
>>> of equally-spaced intervals, it is common practice for the
>>> within-subjects variable to be a numeric integer variable with
>>> sequential values from 1 through "k" levels of the within-subjects
>>> variable. Finally, the response variable must be concatenated
>>> vertically with each measurement linked to the appropriate ID and
>>> level of the within-subject variable.
>>>
>>> Here is an illustration:
>>>
>>> ID  Time  y
>>> 1   1     34
>>> 1   2     22
>>> 1   3     12
>>> 1   4     11
>>> 2   1     33
>>> 2   2     32
>>> 2   3     .
>>> 2   4     22
>>> 3   1     38
>>> 3   2     37
>>> 3   3     34
>>> 3   4     30
>>> .
>>> .
>>> .
>>> .
>>>
>>> As you can see above, the second subject was not measured at time 3.
>>> As a result, that case will be excluded from the linear mixed model
>>> analysis. However, data obtained from other time points for that
>>> particular subject will be included in the analysis. The assumption
>>> we must make in order to obtain unbiased estimates derived from a
>>> linear mixed model is that the data are missing randomly. With that
>>> said, the MIXED procedure in SPSS calculates degrees of freedom using
>>> Satterthwaite's Approximation:
>>>
>>> http://publib.boulder.ibm.com/infocenter/spssstat/v20r0m0/index.jsp?topic=%2Fcom.ibm.spss.statistics.help%2Falg_mixed_custom-tests_satterthwaite.htm
>>>
>>> This approximation has been shown to be valid for balanced and
>>> unbalanced designs.
>>>
>>> In addition to the benefits of not having to exclude all data from
>>> subjects who happen to have data which are missing randomly for
>>> parameter estimation, the MIXED procedure allows for modeling of
>>> continuous response variables using various hierarchical designs and
>>> residual covariance structures.
>>>
>>> Ryan
>>>
>>> On Fri, Mar 15, 2013 at 11:46 AM, Kornbrot, Diana <d.e.kornbrot@.ac> wrote:
>>>
>>> If one uses repeated in procedure GLM then it appears that all
>>> subjects must have values for all combinations of the repeated
>>> measures
>>> BUT using MIXED, there is then a non-integer error df
>>> How is SPSS actually handling the missing values?
>>> Nb Am using unstructured covariance matrix
>>>
>>> Thanks for help
>>> Best
>>> Diana
|
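The illustration quoted in this post (three subjects, four time points, subject 2 missing at time 3) can be used to show which observations each approach keeps. The sketch below only counts rows; it does not fit any model, and the helper name `to_long` and the wide-format dictionary are assumptions made for the example. It mirrors the description given in the thread: GLM REPEATED drops every subject with an incomplete record, whereas the long-format analysis drops only the single missing case.

```python
# Wide format: one entry per subject, one value per time point.
# None marks the missing measurement for subject 2 at time 3.
wide = {
    1: [34, 22, 12, 11],
    2: [33, 32, None, 22],
    3: [38, 37, 34, 30],
}


def to_long(wide_data):
    """Stack wide data into (ID, Time, y) rows, one row per measurement."""
    return [
        (subject_id, time, y)
        for subject_id, values in sorted(wide_data.items())
        for time, y in enumerate(values, start=1)
    ]


long_rows = to_long(wide)

# Listwise deletion: keep only subjects with no missing value at all.
complete_subjects = {sid for sid, values in wide.items() if None not in values}
listwise_rows = [row for row in long_rows if row[0] in complete_subjects]

# Long-format analysis: drop only the rows whose response is missing.
available_rows = [row for row in long_rows if row[2] is not None]

print(len(listwise_rows), "rows kept under listwise deletion")
print(len(available_rows), "rows kept in the long-format analysis")
```

With these data, listwise deletion discards all four rows for subject 2, while the long-format approach loses only the one missing measurement.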
In reply to this post by Rich Ulrich
Rich,

Did you miss my rationale for determining the optimal var-cov matrix as a
rationale for parsimony? Yes, it could increase power, but that was not
the thrust of my point.

Ryan
|
I followed the rationale that it was better science and wiser.

I missed it, if you anywhere suggested that the wrong var-cov matrix can,
when the ICC is negative (one case is when scores add to a near-fixed
total), give a test that is bad because it rejects too often.

--
Rich Ulrich
|
Rich, You are asking a question unrelated to my general recommendation of avoiding model overfitting with respect to the residual variance-covariance matrix (R-side random effects) in the context of a linear mixed model. I had made reasonable assumptions when making these recommendations. I will respond to your comment in a new thread, as I see this issue as off-topic enough to deserve its own thread. Ryan
|
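The parsimony argument above can be made concrete by counting how many covariance parameters each residual structure estimates for k repeated measures. The sketch below uses the standard counts for the homogeneous versions of these structures; the dictionary keys are labels chosen for this example (following the abbreviations used in the thread), not SPSS syntax keywords, and the `groups` argument is a hypothetical way to represent group-specific matrices.

```python
def n_cov_params(structure, k, groups=1):
    """Number of residual covariance parameters for k time points.

    structure: one of the illustrative labels below.
    groups: number of groups that each get their own matrix.
    """
    counts = {
        "UN": k * (k + 1) // 2,  # unstructured: every variance and covariance
        "TOEP": k,               # Toeplitz: one variance plus k - 1 lag covariances
        "DIAG": k,               # diagonal: k variances, no covariances
        "AR1": 2,                # first-order autoregressive: variance and rho
        "CS": 2,                 # compound symmetry: variance and common covariance
    }
    return groups * counts[structure]


k = 4
for name in ("UN", "TOEP", "AR1", "CS"):
    print(name, n_cov_params(name, k))

# Group-specific unstructured matrices for two groups estimate twice as
# many parameters as a single unstructured matrix applied to all subjects.
print("UN by group (2 groups):", n_cov_params("UN", k, groups=2))
```

The count for a single unstructured matrix grows quadratically with k, which is why testing a more restrictive structure against it can pay off. The group-specific case also shows that one unstructured matrix for all subjects is not the least restrictive option available.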
In reply to this post by Ryan
It is actually simpler to use than GLM univariate + GLM repeated for
researchers (psychology, education, biology, marketing, etc.) who are not
interested in gory statistical details.

Advantages
  1 procedure whether repeated or not
  Same look and feel output for within and between groups
  Makes fewer assumptions, e.g. can take care of factorial unequal variance

Useful information for dialogue users, and possibly script also

Useful locations, so don't waste time on ghastly IBM web site
http://www-933.ibm.com/support/fixcentral/options for patches

Emeritus Professor Diana Kornbrot
|
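Earlier in this thread it was noted that MIXED does not report effect sizes, and the formula partial eta squared = F*df1/(F*df1+df2) was quoted, with df1 the hypothesis df and df2 the error df. A minimal sketch of that arithmetic follows. The function name and the example values are made up, and the caution raised in the thread still applies: the formula should not be applied uncritically to every mixed model (for example, to random effects or to higher-level predictors).

```python
def partial_eta_squared(f_value, df_hypothesis, df_error):
    """Partial eta squared from an F statistic and its degrees of freedom.

    Implements F*df1 / (F*df1 + df2) as quoted in the thread; df_error may
    be non-integer, as it is with a Satterthwaite-type approximation.
    """
    if f_value < 0 or df_hypothesis <= 0 or df_error <= 0:
        raise ValueError("F must be non-negative and both df must be positive")
    explained = f_value * df_hypothesis
    return explained / (explained + df_error)


# Hypothetical values: F = 4.0 with 2 hypothesis df and 20 error df.
example = partial_eta_squared(4.0, 2, 20)

# The same calculation accepts a fractional error df.
fractional = partial_eta_squared(4.0, 2, 17.3)

print(round(example, 4), round(fractional, 4))
```

Because the error df sits in the denominator, a smaller fractional df gives a somewhat larger value for the same F, which is one more reason to report the df alongside the effect size.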