SPSSX Discussion

Using GLM for Repeated Measures Logistic Regression

mdb

Using GLM for Repeated Measures Logistic Regression

We have a dichotomous dependent variable DRUG. 36 individuals, in random order, received drug A at one timepoint and drug B at a later timepoint. We have a number of independent variables (e.g. blood pressure) measured at both timepoints. We're interested in determining how well all of the independent variables together predict which drug the subject received at a given timepoint.

We were initially using the Binary Logistic Regression in SPSS 17 to do this, but realized we needed to account for the fact that the measures of an independent variable for a single subject wouldn't be independent across timepoints (e.g. if blood pressure is high under drug A, blood pressure may be more likely to be high under drug B). So we moved to using the Generalized Linear Models, selecting Binomial distribution and Logit link function. However, I can't figure out how to add the within-subject piece of it through the user interface. I think I can do it through the syntax window using the "/Repeated Subject=name" line of code, but then the omnibus table disappears. (This happens in both SPSS 17 and PASW 18.) The code I'm using (simplified for one independent variable) is copied at the end of this post.

Is there a way to avoid losing the omnibus table? Or a way to run it through the user interface?

And one follow-up question - the nice part of using the Binary Logistic Regression rather than the Generalized Linear Models is that the former gives you that nice classification table showing how many cases the model correctly predicted. Is there any way to get something similar in the Generalized Linear Models?

Thanks,
mdb

------------------------------------------------------------------
GENLIN flag (REFERENCE=LAST) WITH BloodPressure
/MODEL BloodPressure INTERCEPT=YES
DISTRIBUTION=BINOMIAL LINK=LOGIT
/Repeated Subject=name
/CRITERIA METHOD=FISHER(1) SCALE=1 COVB=MODEL MAXITERATIONS=100 MAXSTEPHALVING=5
PCONVERGE=1E-006(ABSOLUTE) SINGULAR=1E-012 ANALYSISTYPE=3(WALD) CILEVEL=95 CITYPE=WALD
LIKELIHOOD=FULL
/MISSING CLASSMISSING=EXCLUDE
/PRINT CPS DESCRIPTIVES MODELINFO FIT SUMMARY SOLUTION.

Alex Reutter

Re: Using GLM for Repeated Measures Logistic Regression

The repeated measures piece can be done in the GUI through the Analyze > Generalized Linear Models > Generalized Estimating Equations dialog.

Good question about the omnibus tests; the answer is in the GENLIN algorithms, in the section on Generalized Estimating Equations: "Since GEE is not a likelihood-based method of estimation, the inferences based on likelihoods are not possible for GEEs. Most notably, the Lagrange multiplier test, goodness-of-fit tests, and omnibus tests are invalid and will not be offered." The algorithms are in PDF format on the installation disks, or as part of the help system (Help > Algorithms).

I'm afraid GENLIN doesn't have a classification table as part of the output, so you would need to /SAVE PREDVAL in GENLIN and then run CROSSTABS.
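
A rough sketch of that two-step approach, using the variable names from the original post (flag, BloodPressure, name). The name pred_flag for the saved variable is an assumption chosen for illustration, and the other subcommands are trimmed to a minimum rather than copied from the poster's full specification:

```spss
* Fit the GEE model and save the predicted category for each case.
* pred_flag is a hypothetical name for the saved variable.
GENLIN flag (REFERENCE=LAST) WITH BloodPressure
  /MODEL BloodPressure INTERCEPT=YES
    DISTRIBUTION=BINOMIAL LINK=LOGIT
  /REPEATED SUBJECT=name
  /SAVE PREDVAL(pred_flag)
  /PRINT SOLUTION.

* Cross-tabulate observed by predicted category.
CROSSTABS
  /TABLES=flag BY pred_flag
  /CELLS=COUNT ROW.
```

The row percentages on the diagonal of the resulting table give the share of each observed category that the model predicted correctly, which is roughly what the classification table in Binary Logistic Regression reports.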

Also note that in v19, Generalized Linear Mixed Models provides an alternative to GEE for fitting repeated measures.

Alex

From:	mdb <[hidden email]>
To:	[hidden email]
Date:	05/20/2011 12:04 PM
Subject:	Using GLM for Repeated Measures Logistic Regression
Sent by:	"SPSSX(r) Discussion" <[hidden email]>


Maguin, Eugene

Re: Using GLM for Repeated Measures Logistic Regression

In reply to this post by mdb

MDB,

I might be misunderstanding your purpose, but this analysis seems pointless.
Except for random variation, nothing should be significant and every
B coefficient should be 0.00. I say that because at time 1 people were
randomly assigned to receive drug A or drug B. Using time 1 data you could
predict Drug A receipt (yes/no) in a straightforward logistic regression,
but all you'd be checking is whether your randomization worked.

I must be missing something, but what is it?

Gene Maguin

--
View this message in context:
http://spssx-discussion.1045642.n5.nabble.com/Using-GLM-for-Repeated-Measures-Logistic-Regression-tp4413047p4413047.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD


Bruce Weaver

Re: Using GLM for Repeated Measures Logistic Regression

In reply to this post by mdb

I don't see a TIME variable in your syntax. Here's an example I used for a situation with repeated measures at 3 time points. Rather than include a single TIME variable as a "factor", I included two indicator variables for times 2 and 3 in this instance.

GENLIN quit (REFERENCE=FIRST) WITH treat t2 t3
/MODEL treat t2 t3 treat*t2 treat*t3 INTERCEPT=YES
DISTRIBUTION=BINOMIAL LINK=LOGIT
/CRITERIA METHOD=FISHER(1) SCALE=1 MAXITERATIONS=100 MAXSTEPHALVING=5 PCONVERGE=1E-006(ABSOLUTE)
SINGULAR=1E-012 ANALYSISTYPE=3(WALD) CILEVEL=95 LIKELIHOOD=FULL
/REPEATED SUBJECT=mrnum WITHINSUBJECT=t2*t3 SORT=YES CORRTYPE=unstructured ADJUSTCORR=YES
COVB=ROBUST MAXITERATIONS=100 PCONVERGE=1e-006(ABSOLUTE) UPDATECORR=1
/MISSING CLASSMISSING=EXCLUDE
/PRINT CPS DESCRIPTIVES MODELINFO FIT SUMMARY SOLUTION (EXPONENTIATED).

Applying that approach to your situation gives something like this:

GENLIN flag (REFERENCE=LAST) WITH BloodPressure t2
/MODEL BloodPressure t2 BloodPressure*t2 INTERCEPT=YES
DISTRIBUTION=BINOMIAL LINK=LOGIT
/CRITERIA METHOD=FISHER(1) SCALE=1 MAXITERATIONS=100 MAXSTEPHALVING=5 PCONVERGE=1E-006(ABSOLUTE)
SINGULAR=1E-012 ANALYSISTYPE=3(WALD) CILEVEL=95 LIKELIHOOD=FULL
/REPEATED SUBJECT=name WITHINSUBJECT=t2 SORT=YES CORRTYPE=unstructured ADJUSTCORR=YES
COVB=ROBUST MAXITERATIONS=100 PCONVERGE=1e-006(ABSOLUTE) UPDATECORR=1
/MISSING CLASSMISSING=EXCLUDE
/PRINT CPS DESCRIPTIVES MODELINFO FIT SUMMARY SOLUTION (EXPONENTIATED).

Variable t2 is an indicator for the second time point (i.e., t2=0 for time 1, t2=1 for time 2), so the first time point is the reference category for odds ratios. If there is no evidence for a BloodPressure*t2 interaction, you might want to remove that term.
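
If the data do not already contain such an indicator, it could be built along these lines. This is only a sketch: it assumes the file is in long format (one row per subject per time point) and that a numeric variable called time, coded 1 and 2, exists. Neither assumption comes from the original post.

```spss
* Assumes long-format data, one row per subject per time point.
* "time" is a hypothetical numeric variable coded 1 and 2.
* The logical expression evaluates to 1 when true and 0 when false.
COMPUTE t2 = (time = 2).
EXECUTE.
```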

This does not address all your questions, but I hope it helps.

--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING:
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).

mdb

Re: Using GLM for Repeated Measures Logistic Regression

Alex - Your pointer to saving PREDVAL and then running CROSSTABS was excellent. Works perfectly. But with respect to the lack of an omnibus table, then how does one determine the significance of the overall model (as opposed to the individual coefficients)?

Gene - We're actually interested in checking how good our measures (blood pressure was a simple example, but we have hundreds obtained using different tools), taken together, are at predicting whether A or B was administered; the timepoint is largely irrelevant to us. Maybe my initial phrasing contributed to the confusion.

Bruce - Thanks for the suggested syntax - I am still playing with it. But I'm not sure I understand the logic of including the time variable in WITHINSUBJECT. We are actually not concerned with the timepoint at which someone received A or B. Each person got both A and B, and we don't expect order to matter. All we're trying to do is account for the fact that they were repeated measures - as above, maybe my initial description was fuzzy on this.

Given all that, is it possible to just ignore the fact that these are repeated measures and run a standard binomial logistic regression (not through GLM), perhaps by first running another test to get comfortable that there is some independence between the repeated measures for a subject?

Thanks, all, for your help.

mdb

Alex Reutter

Re: Using GLM for Repeated Measures Logistic Regression

There isn't a significance test for this, but you can use the information criteria reported in the goodness-of-fit table to compare models. From Help > Case Studies, then Advanced Statistics > Generalized Linear Models > Generalized Estimating Equations:

* The Quasi-likelihood under Independence Model Criterion (QIC) can be used to help you choose between two correlation structures, given a set of model terms. The structure that obtains the smaller QIC is "better" according to this criterion.
* The Corrected Quasi-likelihood under Independence Model Criterion (QICC) can be used to help you choose between two sets of model terms, given a correlation structure. The model that obtains the smaller QICC is "better" according to this criterion. The computation of the QICC assumes that the distribution, link function, and working correlation matrix specifications are all "correct" for the dataset.

You could compare the QICC for your model with one for a "null" model (one with no predictors) and the same correlation structure. If the QICC for your model is lower, then at least you know you're doing better than guessing.
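
A minimal sketch of that comparison, again using the variable names from the original post. CORRTYPE=EXCHANGEABLE is an assumption made here only so that both runs share one working correlation structure; substitute whichever structure you actually use, and keep it identical in both commands.

```spss
* Intercept-only ("null") model with no predictors.
GENLIN flag (REFERENCE=LAST)
  /MODEL INTERCEPT=YES
    DISTRIBUTION=BINOMIAL LINK=LOGIT
  /REPEATED SUBJECT=name CORRTYPE=EXCHANGEABLE
  /PRINT FIT.

* Model with the predictor, same correlation structure.
GENLIN flag (REFERENCE=LAST) WITH BloodPressure
  /MODEL BloodPressure INTERCEPT=YES
    DISTRIBUTION=BINOMIAL LINK=LOGIT
  /REPEATED SUBJECT=name CORRTYPE=EXCHANGEABLE
  /PRINT FIT SOLUTION.
```

The QICC values appear in the Goodness of Fit table of each run; the comparison is descriptive rather than a formal significance test.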

Alex

From:	mdb <[hidden email]>
To:	[hidden email]
Date:	05/20/2011 01:56 PM
Subject:	Re: Using GLM for Repeated Measures Logistic Regression
Sent by:	"SPSSX(r) Discussion" <[hidden email]>
