Mixed Model Error- overspecified?

kw1130
I am trying to replicate an analysis done by a colleague (who used a different program for data analysis), and I believe a mixed model is the best way to handle the data. However, I'm unfamiliar with mixed models and I get an error every time I run the analysis. I suspect the model is overspecified, but I'm not sure what I'm doing wrong. I've included my syntax at the end of this message.

The experimental design is like this: I have one variable, subjects, that I want to include as a random factor. The subjects saw letter strings on a computer screen and had to decide whether each string was a word or a non-word. I have two within-subjects factors: "wordness" (whether the string was a word or a non-word) and "relatedness" (whether the string was related or unrelated to another stimulus on the screen). Together, I think, this forms a nested repeated measures design. The dependent variable is reaction time.

Here is my syntax for this model:
MIXED Rt2Adj_mean BY Relatedness Wordness Subj
  /CRITERIA=CIN(95) MXITER(100) MXSTEP(10) SCORING(1) SINGULAR(0.000000000001) HCONVERGE(0, ABSOLUTE) LCONVERGE(0, ABSOLUTE) PCONVERGE(0.000001, ABSOLUTE)
  /FIXED=Relatedness Wordness Relatedness*Wordness | SSTYPE(3)
  /METHOD=ML
  /PRINT=DESCRIPTIVES
  /RANDOM=Subj | COVTYPE(VC)
  /REPEATED=Relatedness*Wordness | SUBJECT(Subj) COVTYPE(UN).


I've also tried it with this slight adjustment, adding the intercept as a random effect (which I think may be more correct?):
MIXED Rt2Adj_mean BY Relatedness Wordness Subj
  /CRITERIA=CIN(95) MXITER(100) MXSTEP(10) SCORING(1) SINGULAR(0.000000000001) HCONVERGE(0, ABSOLUTE) LCONVERGE(0, ABSOLUTE) PCONVERGE(0.000001, ABSOLUTE)
  /FIXED=Relatedness Wordness Relatedness*Wordness | SSTYPE(3)
  /METHOD=ML
  /PRINT=DESCRIPTIVES
  /RANDOM=INTERCEPT Subj | COVTYPE(VC)
  /REPEATED=Relatedness*Wordness | SUBJECT(Subj) COVTYPE(UN).



Either way, when I run the analysis, I get the following warning: "Iteration was terminated but convergence has not been achieved. The MIXED procedure continues despite this warning. Subsequent results produced are based on the last iteration. Validity of the model fit is uncertain." Output is produced, but it doesn't come close to the output my colleague got. I am wondering if the problem is due to specifying subjects as a random effect but also as a subjects variable (using the GUI). However, I'm not sure how to get around this, as I do need subjects to be a random effect.

It seems like this should be a simple enough design to analyze but I think my unfamiliarity with mixed models is causing me to run into problems. I've been trying to read up on them, but nothing has jumped out at me as a solution (although I do think that overspecification is potentially my problem). Any help would be greatly appreciated!

Re: Mixed Model Error- overspecified?

Ryan
I could certainly think through how the model could be parameterized, but since your goal is to obtain a model equivalent to the one your colleague specified, let's work from his/her code.

I assume your colleague used a well-known software package (e.g., SAS, HLM, MPLUS, STATA). Please tell us which software your colleague used, and post his/her code.

Ryan

Re: Mixed Model Error- overspecified?

kw1130
Thanks for your response, Ryan. Unfortunately I don't have his code, only his output-- otherwise the problem might be much more easily solved! I can post the output if you'd like, but I'm not sure it will help.

I ended up reading about a similar design and dilemma last night, and the suggested solution was to use GLM and "trick" it into running a repeated measures analysis. I'm not sure why this was suggested over the mixed model, but I tried this analysis as well with the syntax below and got results identical to my MIXED output:

UNIANOVA Rt2Adj_mean BY Relatedness Wordness Subj
  /RANDOM=Subj
  /METHOD=SSTYPE(3)
  /INTERCEPT=INCLUDE
  /EMMEANS=TABLES(Subj)
  /PRINT=DESCRIPTIVE
  /PLOT=PROFILE(Relatedness*Wordness)
  /CRITERIA=ALPHA(.05)
  /DESIGN=Subj
          Relatedness Relatedness*Subj
          Wordness Wordness*Subj
          Relatedness*Wordness.


But no error message this time.



The only other idea I have about why my results differ from my colleague's is this: the descriptives printed by both the GLM and MIXED analyses show means for the two levels of Relatedness (my primary variable of interest, and the variable whose values I most want to confirm against my colleague's) that do not match what I get if I compute descriptives for that variable on its own. If I take my original database and aggregate Rt2Adj with Subj and Relatedness as break variables and then look at the descriptives, I get means that are about 40 ms different from the ones I get when I run MIXED or GLM (and thus aggregate Rt2Adj with Subj, Relatedness, and Wordness as break variables). I'm not sure why this would be -- I could understand small rounding differences, but the differences I'm seeing are too big to be explained by rounding.

Re: Mixed Model Error- overspecified?

Bruce Weaver
kw1130 wrote
I'm not sure why [tricking GLM into running a repeated measures analysis] was suggested over the mixed model...
This reminds me that I suggested the "tricking GLM" method myself a few years ago.  E.g.,

  http://www.angelfire.com/wv/bwhomedir/spss/repmeas_ANOVA_with_long_file.SPS

But as I recall, I made that suggestion before MIXED was an option, or at least before I had become familiar with it!  ;-)

I really ought to update that syntax file and show how to perform the analysis via MIXED.  Some day I might find the time.  


--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."


Re: Mixed Model Error- overspecified?

kw1130
Bruce -- yes, it was your suggestion about using GLM to do repeated measures that I had read! I didn't know when SPSS added the mixed models procedure, so that makes a lot of sense.

I am happy to see that the GLM method produces the same results as the mixed models output. I guess the only thing that still puzzles me is why the means come out differently when I get descriptives via the GLM versus running descriptives on their own. The format of the descriptives printed by the ANOVA is this:


Relatedness   Wordness   Subj    Mean
0             0          1       x
                         2       x
                         ...     ...
                         Total   x
              -------------------------
              1          1       x
                         2       x
                         ...     ...
                         Total   x


It continues like this for level '1' of Relatedness. When I run the descriptives on their own (outside of the ANOVA), I get means for:
                 Relatedness 0, Wordness 0
                 Relatedness 0, Wordness 1
                 Relatedness 1, Wordness 0
                 Relatedness 1, Wordness 1

I thought these should match up with the individual "Total" means under each category in the ANOVA descriptives, but they don't. Am I misreading the descriptives output from the ANOVA? Which means are the correct means? I had used the means generated by the stand-alone descriptives to create a few graphs, but now I'm concerned about using them because they don't seem to match.

Re: Mixed Model Error- overspecified?

Ryan
For starters, there is no reason hand-calculated means must be the same as model-predicted means from a linear mixed model (MIXED procedure). In fact, I would expect that to occur only under special circumstances. 
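
To see how aggregation choices alone can shift means (hypothetical numbers): suppose that within Relatedness = 0 a subject has 10 word trials averaging 800 ms and 30 non-word trials averaging 900 ms. Aggregating by Subj and Relatedness weights every trial equally: (10*800 + 30*900)/40 = 875 ms. Aggregating by Subj, Relatedness, and Wordness first and then averaging the two cell means weights the cells equally instead: (800 + 900)/2 = 850 ms. With unequal cell counts, a gap of the size you describe is entirely plausible.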

Second, you have access to your colleague's output but not syntax? That is bizarre. I will ask again. Please tell us which software package he used, which procedure he used, syntax (if available), and output. If you are trying to replicate what he did, we need to know what he did. If you are trying to perhaps improve upon what he did, that is a separate story. 

Third, I would avoid using a general linear model to estimate random effects models now that MIXED is available, especially if you have missing data and/or your random effects/residual covariance matrix is complex. 

Fourth, why is your DV a mean? If you have multiple measurements on each subject, why not model them jointly and allow for the expected correlation (or perhaps separately)? I need to understand your design and objectives before providing any specific advice regarding syntax. 
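
Just to illustrate the general idea (a rough sketch only -- the variable names come from your posts, and the random-effects structure is illustrative, not a recommendation for your data), trial-level syntax could look something like this:

* Sketch: model the trial-level RTs directly, with a random intercept
  per subject to absorb between-subject differences in overall speed.
MIXED Rt2Adj BY Relatedness Wordness Subj
  /FIXED=Relatedness Wordness Relatedness*Wordness | SSTYPE(3)
  /METHOD=REML
  /RANDOM=INTERCEPT | SUBJECT(Subj) COVTYPE(VC)
  /PRINT=SOLUTION TESTCOV.

Note that with multiple trials per cell, a REPEATED specification like the one in your earlier syntax would no longer apply as written, since the levels of a repeated effect must be unique within each subject.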

Ryan

Re: Mixed Model Error- overspecified?

kw1130
Ryan, to address points 2 and 4:

2. Yes, I was sent the output but not the syntax. The reason is that the original plan was for me to present the results of HIS analysis and then eventually conduct follow-up work of my own, so he only sent his output; I was not originally intending to redo his work. He had conducted three experiments that were run exactly the same way but used different stimuli. However, in preparing to present his results I noticed some errors/inconsistencies between the analyses for the three experiments (mainly, some inconsistent scaling/correction of the outcome variable), so my goals at the moment are to 1) correct those inconsistencies and 2) redo the analyses. I do not believe he used a standard statistical package; I think he used a number of scripts he programmed in C++. I have provided an example of his output at the bottom of this message in case it helps clarify his procedure or the design. I am really most interested in doing the right analysis for the data (not necessarily replicating his exact procedure, which may be difficult without the syntax or other clues as to how the analysis was run).

4. I am not entirely sure I understand your fourth point, but I have two answers. The first is that in the original analysis a mean was calculated for each variable combination (Relatedness 0 Wordness 0, Relatedness 0 Wordness 1, Relatedness 1 Wordness 0, Relatedness 1 Wordness 1), and those means were then used in the analysis; so, to stay as close to the original analysis as possible, I created the same means. The second is that while there were multiple measurements for each variable combination, we are not interested in the RT of any particular observation but rather in RTs for a category -- e.g., are RTs for related words faster than RTs for unrelated words? I think this question lends itself to using means rather than individual observations, although I may be mistaken or misinterpreting your suggestion.

Output from one of the analyses:
SOURCE: grand mean
Scene      Words        N      MEAN        SD        SE
                       60  885.2785  255.8844   33.0345

SOURCE: Scene
Scene      Words        N      MEAN        SD        SE
Related                30  882.0898  284.8639   52.0088
Unrelat                30  888.4673  228.1293   41.6505

SOURCE: Words
Scene      Words        N      MEAN        SD        SE
           Non-words   30  952.4472  302.0184   55.1408
           Words       30  818.1098  180.7262   32.9959

SOURCE: Scene Words
Scene      Words        N      MEAN        SD        SE
Related    Non-words   15  945.0256  336.5531   86.8976
Related    Words       15  819.1540  215.2568   55.5791
Unrelated  Non-words   15  959.8689  274.8810   70.9740
Unrelated  Words       15  817.0657  146.0106   37.6998

FACTOR  :  Subject   Scene   Words     RT
LEVELS  :       15       2       2     60
TYPE    :   RANDOM  WITHIN  WITHIN   DATA

SOURCE             SS   df             MS        F      p
==============================================================
mean    47023084.6887    1  47023084.6887  336.528  0.000 ***
S/       1956223.0686   14    139730.2192

Scene        610.0812    1       610.0812    0.025  0.877
SS/       341315.7793   14     24379.6985

Words     270698.0773    1    270698.0773    4.231  0.059
WS/       895804.6322   14     63986.0452

SW          1075.0447    1      1075.0447    0.038  0.848
SWS/      397406.3454   14     28386.1675



Also, here is a little snippet of the data to show you what I'm working with, in case it helps clarify the design and research question.
Subj  Time2  Word     RT2   Relatedness  Wordness  Rt2Adj
4     1604   diamond  4404  0            1         2800.61
4     1603   library  4410  1            1         2806.47
4     1603   alert    4760  0            1         3156.29
4     1603   denk     4412  0            0         2808.27
4     1604   prandy   5181  0            0         3577.52
4     1603   pool     4267  1            1         2663.66

This is how the dataset was originally structured. I used SPSS's AGGREGATE command with Subj, Relatedness, and Wordness as break variables to prepare the data for analysis with MIXED (a sketch of that step follows the coding key below). Subjects is a random factor; Relatedness and Wordness are within-subjects factors. We want to know whether reaction time (Rt2Adj) differs as a function of Relatedness, Wordness, or their interaction. Relatedness and Wordness are coded like this:
Relatedness 0 = Unrelated
Relatedness 1 = Related
Wordness 0 = Non-word
Wordness 1 = Word
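
The aggregation step was essentially this (a reconstructed sketch -- variable names as in the snippet above, not necessarily my exact original command):

* Collapse the trial-level RTs to one mean per subject-by-condition cell.
AGGREGATE
  /OUTFILE=*
  /BREAK=Subj Relatedness Wordness
  /Rt2Adj_mean=MEAN(Rt2Adj).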


I hope this helps to clarify things.

Re: Mixed Model Error- overspecified?

Karen Grace-Martin
First, I want to say I agree with all of Ryan's points. The reason you don't want to use a mean of multiple trials is that SPSS then treats it as a single data point, with no variation. Since there is variation across trials, and you're "pretending" there isn't, you're underestimating your standard errors.

Before MIXED was around, you had to do this because of the way Repeated Measures GLM was set up in the wide format. But it's not necessary anymore.

And to answer your original question: yes, the way you specified the model in MIXED is overspecified.

It's because you have both a RANDOM statement and a REPEATED statement with an unstructured covariance structure.

Try removing one or the other and it should work. Note that a RANDOM statement with only a random intercept and a VC structure and a REPEATED statement with a CS structure are mathematically equivalent.
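
For example (untested sketches based on the syntax posted earlier in the thread -- keep whichever covariance assumptions you actually want):

* Option 1: drop the RANDOM line and let the unstructured REPEATED
  block carry all of the within-subject covariance.
MIXED Rt2Adj_mean BY Relatedness Wordness Subj
  /FIXED=Relatedness Wordness Relatedness*Wordness | SSTYPE(3)
  /METHOD=ML
  /PRINT=DESCRIPTIVES
  /REPEATED=Relatedness*Wordness | SUBJECT(Subj) COVTYPE(UN).

* Option 2: keep only a random intercept per subject and drop the
  REPEATED line; this is the specification that is equivalent to a
  REPEATED statement with COVTYPE(CS).
MIXED Rt2Adj_mean BY Relatedness Wordness Subj
  /FIXED=Relatedness Wordness Relatedness*Wordness | SSTYPE(3)
  /METHOD=ML
  /PRINT=DESCRIPTIVES
  /RANDOM=INTERCEPT | SUBJECT(Subj) COVTYPE(VC).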

In some designs and data sets you will get results identical to the "tricked" UNIANOVA results, but wherever they differ, the MIXED results are more accurate.