SPSSX Discussion

GenLinMixed, GEE, or something else for unbalanced repeated measures data with dichotomous DV?

kw1130

I am trying to analyze some unbalanced repeated measures data with a dichotomous dependent variable. I have already searched for the best way to analyze these data, and there seems to be some disagreement.

My design is as follows: 16 participants viewed three sets of twenty images. The same twenty images were present in each set; the only difference was a consistent color manipulation applied within each set. Therefore, I have two variables: ColorType (3 levels) and ImageType (20 levels). Participants were instructed to identify each image as it was presented, and each response was coded as correct (1) or incorrect (0). So each participant gave 60 responses: 20 per ColorType, or 3 per ImageType. Primarily, I'm interested in determining whether identification accuracy varied by ColorType. I'm also interested in whether accuracy varied by ImageType, and whether there is a ColorType by ImageType interaction. My dataset is currently set up like this:

Subj ColorType ImageType Accuracy
1 1 1 0
1 1 2 0
1 1 3 1
1 1 4 0

etc. The data are unbalanced: two subjects gave no responses for the images of one color type.
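As a quick check on the arithmetic in the design description, here is a minimal Python sketch (Python is used only for illustration; nothing here is SPSS) that builds the Subj by ColorType by ImageType layout and counts rows. Which two subjects are missing, and which color type they skipped, are hypothetical choices (`MISSING` below); the thread does not say.

```python
from collections import Counter

N_SUBJECTS = 16
COLOR_TYPES = [1, 2, 3]
IMAGE_TYPES = list(range(1, 21))

# HYPOTHETICAL: subjects 15 and 16 gave no responses for ColorType 3.
MISSING = {(15, 3), (16, 3)}

def build_rows():
    """Return one (Subj, ColorType, ImageType) tuple per observed response."""
    rows = []
    for subj in range(1, N_SUBJECTS + 1):
        for color in COLOR_TYPES:
            if (subj, color) in MISSING:
                continue  # the whole block of 20 images is absent
            for image in IMAGE_TYPES:
                rows.append((subj, color, image))
    return rows

rows = build_rows()
per_subject = Counter(r[0] for r in rows)        # responses per subject
per_cell = Counter((r[1], r[2]) for r in rows)   # responses per ColorType x ImageType cell

print(len(rows))                                 # 16*60 - 2*20 = 920
print(per_subject[1], per_subject[15])           # 60 40
print(per_cell[(1, 1)], per_cell[(3, 1)])        # 16 14
```

So the long file has 920 rows, 60 distinct ColorType-by-ImageType cells, and at most 16 binary responses in any one cell.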

Someone suggested that I use the GENLINMIXED procedure, which I have tried. I have been running it from the GUI, and I have pasted the resulting syntax at the end. I specified Subj as the subject variable, both ColorType and ImageType as repeated-measures variables, and Accuracy as the outcome (binomial distribution, logit link). However, the procedure runs essentially indefinitely without producing any output: I started it, came back 8 hours later, and found no output and "RUNNING GENLINMIXED" still in the status bar. I'm not sure whether there is a mistake in my syntax or in the way I've set up my data, or whether this is just a flaw in GENLINMIXED when the dataset has many rows.

Is GENLINMIXED the best procedure for these data? Is GEE an alternative? Or is there a different method altogether that I should be using for data with these characteristics (unbalanced, repeated measures, dichotomous DV)?

Any tips would be greatly appreciated.

Here is the syntax:
GENLINMIXED
/DATA_STRUCTURE SUBJECTS=Subj REPEATED_MEASURES=ColorType*ImageType COVARIANCE_TYPE=DIAGONAL
/FIELDS TARGET=Accuracy TRIALS=NONE OFFSET=NONE
/TARGET_OPTIONS DISTRIBUTION=BINOMIAL LINK=LOGIT
/FIXED EFFECTS=ColorType ImageType ColorType*ImageType USE_INTERCEPT=TRUE
/RANDOM EFFECTS=Subj USE_INTERCEPT=FALSE COVARIANCE_TYPE=VARIANCE_COMPONENTS
/BUILD_OPTIONS TARGET_CATEGORY_ORDER=ASCENDING INPUTS_CATEGORY_ORDER=ASCENDING MAX_ITERATIONS=100
CONFIDENCE_LEVEL=95 DF_METHOD=RESIDUAL COVB=MODEL
/EMMEANS_OPTIONS SCALE=ORIGINAL PADJUST=LSD
/OUTFILE MODEL = 'ImageIDOutput.zip'.
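One thing that may be worth checking before fitting the full factorial model is how thin the data are per parameter. The Python sketch below is an illustration under stated assumptions, not a diagnosis of why this particular run stalled: it counts the non-redundant fixed-effect parameters implied by `/FIXED EFFECTS=ColorType ImageType ColorType*ImageType` under reference (dummy) coding, and flags cells whose responses are all correct or all incorrect. The helper names and the example responses are hypothetical.

```python
# Hypothetical pre-fit checks for the full-factorial logistic model above.
# Nothing here is SPSS output; the example responses are made up.

def fixed_effect_params(levels_a, levels_b, interaction=True):
    """Non-redundant fixed-effect columns under reference coding:
    intercept + (a - 1) + (b - 1), plus (a - 1) * (b - 1) for the interaction."""
    n = 1 + (levels_a - 1) + (levels_b - 1)
    if interaction:
        n += (levels_a - 1) * (levels_b - 1)
    return n

def constant_cells(responses_by_cell):
    """Return cells whose binary responses are all 0 or all 1. In a model with
    a separate parameter for every cell, such a cell's log-odds has no finite
    maximum-likelihood estimate (separation)."""
    return sorted(cell for cell, ys in responses_by_cell.items()
                  if ys and len(set(ys)) == 1)

print(fixed_effect_params(3, 20))                     # 60: one per cell
print(fixed_effect_params(3, 20, interaction=False))  # 22

# Made-up responses for three (ColorType, ImageType) cells.
example = {
    (1, 1): [1] * 16,             # every subject correct
    (1, 2): [1] * 10 + [0] * 6,
    (3, 1): [0] * 14,             # every subject incorrect
}
print(constant_cells(example))    # [(1, 1), (3, 1)]
```

Because the full factorial has one parameter per ColorType-by-ImageType cell, each cell's log-odds rests on at most 16 binary responses (14 for the color type the two subjects skipped).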

Bruce Weaver

Re: GenLinMixed, GEE, or something else for unbalanced repeated measures data with dichotomous DV?

I think GEE via GENLIN is a legitimate option for your analysis. Here's an example I worked out a few years ago with a colleague who does smoking cessation work. In that field, people had been (and may still be) running up to 3 separate binary logistic regression models, one for each time point. I suggested that we run a single model instead, using GEE to handle the correlation among the repeated measures. Here's a bit of the syntax from one of my files. You'll notice that I used my own indicator variables for the 3 time points rather than including variable Time as a factor. I can't remember right now why I did that. (Perhaps I found it easier to check that the estimates from the overall model matched those from the 3 separate models.)

Anyway, this might give you some ideas about how to specify a model for your situation.

Finally, someone from the mailing list (Rick Oliver, I think?) helped me sort out a problem I was having with the /REPEATED subcommand when I was first playing around with this stuff. So thanks again to him (or whoever it was).

compute t1 = (time EQ 1).
compute t2 = (time EQ 2).
compute t3 = (time EQ 3).
execute.
format t1 to t3 (f1.0).

****************************************************************** .
*** Binary logistic regression: Enter TREAT 1st, then add CABG *** .
****************************************************************** .
temp.
select if t1. /* Use 3-month data only .
LOGISTIC REGRESSION VAR=quit
/METHOD=ENTER treat /ENTER cabg
/PRINT= CI(95)
/CRITERIA PIN(.05) POUT(.10) ITERATE(20) CUT(.5) .

temp.
select if t2. /* Use 6-month data only .
LOGISTIC REGRESSION VAR=quit
/METHOD=ENTER treat /ENTER cabg
/PRINT= CI(95)
/CRITERIA PIN(.05) POUT(.10) ITERATE(20) CUT(.5) .

temp.
select if t3. /* Use 12-month data only .
LOGISTIC REGRESSION VAR=quit
/METHOD=ENTER treat /ENTER cabg
/PRINT= CI(95)
/CRITERIA PIN(.05) POUT(.10) ITERATE(20) CUT(.5) .

****************************************************************** .
*** GEE: Quit = const + treat + t2 + t3 + t2*treat + t3*treat  *** .
****************************************************************** .

* Generalized Estimating Equations.
GENLIN quit (REFERENCE=FIRST) WITH treat t2 t3
/MODEL treat t2 t3 treat*t2 treat*t3 INTERCEPT=YES
DISTRIBUTION=BINOMIAL LINK=LOGIT
/CRITERIA METHOD=FISHER(1) SCALE=1 MAXITERATIONS=100 MAXSTEPHALVING=5
PCONVERGE=1E-006(ABSOLUTE)
SINGULAR=1E-012 ANALYSISTYPE=3(WALD) CILEVEL=95 LIKELIHOOD=FULL
/REPEATED SUBJECT=mrnum WITHINSUBJECT=t2*t3 SORT=YES CORRTYPE=unstructured
ADJUSTCORR=YES
COVB=ROBUST MAXITERATIONS=100 PCONVERGE=1e-006(ABSOLUTE) UPDATECORR=1
/MISSING CLASSMISSING=EXCLUDE
/PRINT CPS DESCRIPTIVES MODELINFO FIT SUMMARY SOLUTION (EXPONENTIATED).

* For TREAT and the Constant, I duplicate the results seen at 3 months
* in the binary logistic regression.

I then ran two similar models, one using t1 and t3 (so that 6 months was the reference time), the other using t1 and t2 (12 months as the reference). In those models, I duplicated the TREAT and Constant estimates seen at 6 and 12 months, respectively. That gave me some confidence that the model made sense.
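The equivalence described above can be sketched in Python under a simplifying assumption: if the data are complete and the model is saturated in time by treat, the fitted probabilities equal the observed cell proportions, so the coefficients are simple differences of logits. Omitting a different indicator changes the reference time (ref 1 corresponds to using t2 and t3, ref 2 to t1 and t3, ref 3 to t1 and t2), and the Constant and TREAT terms then match a TREAT-only logistic regression fitted to that time point alone. The proportions in `P` are hypothetical, not from the smoking-cessation data, and the sketch ignores the block that adds CABG.

```python
import math

def logit(p):
    return math.log(p / (1 - p))

# HYPOTHETICAL quit proportions by (time, treat); NOT data from the thread.
P = {
    (1, 0): 0.20, (1, 1): 0.35,   # 3 months
    (2, 0): 0.15, (2, 1): 0.30,   # 6 months
    (3, 0): 0.10, (3, 1): 0.25,   # 12 months
}

def saturated_coefs(ref_time):
    """Coefficients of logit P(quit) = const + treat + sum over the two
    non-reference times k of (tk + tk*treat), assuming fitted probabilities
    equal the cell proportions in P."""
    const = logit(P[(ref_time, 0)])
    coefs = {"const": const, "treat": logit(P[(ref_time, 1)]) - const}
    for k in (1, 2, 3):
        if k == ref_time:
            continue  # the omitted indicator defines the reference time
        coefs[f"t{k}"] = logit(P[(k, 0)]) - const
        coefs[f"t{k}*treat"] = logit(P[(k, 1)]) - logit(P[(k, 0)]) - coefs["treat"]
    return coefs

def linear_predictor(coefs, time, treat):
    """Rebuild the log-odds for one (time, treat) cell from the coefficients."""
    eta = coefs["const"] + coefs["treat"] * treat
    if f"t{time}" in coefs:
        eta += coefs[f"t{time}"] + coefs[f"t{time}*treat"] * treat
    return eta

def separate_fit(time):
    """Constant and TREAT from a TREAT-only logistic regression on one time point."""
    const = logit(P[(time, 0)])
    return const, logit(P[(time, 1)]) - const

for ref in (1, 2, 3):
    c = saturated_coefs(ref)
    const, treat = separate_fit(ref)
    print(f"reference t{ref}: const {c['const']:.4f} vs {const:.4f}, "
          f"treat {c['treat']:.4f} vs {treat:.4f}")
```

With incomplete data or a non-saturated model, the GEE estimates need not reproduce the separate fits exactly, so this is a sanity check on the parameterization rather than on GENLIN itself.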

HTH.

--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING:
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).

Ryan

Re: GenLinMixed, GEE, or something else for unbalanced repeated measures data with dichotomous DV?

In reply to this post by kw1130

If you want a subject-specific interpretation of the effects, you ought to use a random-effects model via GENLINMIXED. If all you need is a population-averaged interpretation, then a GEE model via GENLIN should suffice.

Both types of models can account for correlation due to repeated measures on the same subject.
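To make the subject-specific versus population-averaged distinction concrete, here is a small numerical sketch in Python. It assumes a random-intercept logistic model with hypothetical values for the intercept `b0`, the effect `b1` of a binary predictor `x`, and the random-intercept SD `sigma`, then averages over subjects to get population-averaged probabilities. The integration is a plain grid over plus or minus 8 SD, chosen for simplicity rather than efficiency.

```python
import math

def expit(z):
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    return math.log(p / (1.0 - p))

def marginal_prob(eta, sigma, n=4001):
    """Population-averaged P(y=1): the average of expit(eta + u) over
    u ~ Normal(0, sigma^2), approximated on an evenly spaced grid (sigma > 0)."""
    lo, hi = -8.0 * sigma, 8.0 * sigma
    step = (hi - lo) / (n - 1)
    num = den = 0.0
    for i in range(n):
        u = lo + i * step
        w = math.exp(-0.5 * (u / sigma) ** 2)  # unnormalised normal density
        num += w * expit(eta + u)
        den += w
    return num / den

# HYPOTHETICAL random-intercept model: logit P(y=1 | u) = b0 + b1*x + u
b0, b1, sigma = -0.5, 1.0, 2.0
pa = logit(marginal_prob(b0 + b1, sigma)) - logit(marginal_prob(b0, sigma))
print(f"subject-specific log-odds ratio:    {b1:.3f}")
print(f"population-averaged log-odds ratio: {pa:.3f}")
```

With `sigma > 0` the population-averaged log-odds ratio comes out smaller in magnitude than `b1`; as `sigma` shrinks toward 0 the two coincide. That attenuation is one reason coefficients from a GENLINMIXED random-effects model and a GENLIN GEE model fitted to the same data are not expected to agree numerically.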

Ryan

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Bruce Weaver

Re: GenLinMixed, GEE, or something else for unbalanced repeated measures data with dichotomous DV?

In reply to this post by Bruce Weaver

After a little digging, I found the old thread I was thinking about (see the link below). The context was my attempt to run a logistic regression model using GEE that would be comparable to a conditional logistic regression with 3:1 matching (which I had run using Stata). And it was Dave Matheson, not Rick Oliver, who helped me get over the final hurdle. (Sorry Rick!) ;-)

https://groups.google.com/forum/#!topic/comp.soft-sys.stat.spss/zkxR016mZxM

HTH.
