SPSSX Discussion

Genlinmixed

David Fine

Genlinmixed

I've used SPSS logistic regression extensively, but I now have a data set
with a dichotomous outcome (lab test result) and some individual patient
risk factors that could be predictors. I have 100+ clinic sites and
30K test result records. Given that this is a sample of clinics, I believe
the correct approach is to use GENLINMIXED and include a random effect
for clinic, instead of logistic regression. I have 2 questions and will
start with the simpler one:
Should I be using GENLINMIXED and specifying clinic as a random effect, as
follows (for looking at race/ethnicity (nominal) and test result)? Or
should I be using a different procedure?
GENLINMIXED
  /DATA_STRUCTURE SUBJECTS=clinic*patientID
  /FIELDS TARGET=testresult TRIALS=NONE OFFSET=NONE
  /TARGET_OPTIONS DISTRIBUTION=BINOMIAL LINK=LOGIT
  /FIXED EFFECTS=racethnicity USE_INTERCEPT=TRUE
  /RANDOM USE_INTERCEPT=TRUE SUBJECTS=clinic
    COVARIANCE_TYPE=VARIANCE_COMPONENTS
  /BUILD_OPTIONS TARGET_CATEGORY_ORDER=DESCENDING
    INPUTS_CATEGORY_ORDER=DESCENDING
    MAX_ITERATIONS=100 CONFIDENCE_LEVEL=95 DF_METHOD=RESIDUAL COVB=MODEL
  /EMMEANS_OPTIONS SCALE=ORIGINAL PADJUST=LSD.

Here's the 2nd question. The data set and analysis are actually a bit more
complicated. This is a multi-level analysis where, besides patient-level
measures, I have some clinic characteristics as well as some areal
measures for each patient based on their residence's zip code. So when
doing bivariate analyses to assess the impact of patient- or clinic-level
characteristics, I have included a random effect for clinic. When looking
at areal measures (SES, population density, etc.) in bivariate analyses, I
have used zip code as the random effect. I believe this is the correct way
to go (but welcome feedback).
But now that I want to do multivariate runs looking at individual, clinic,
and areal predictors, I do not see how I can include 2 random effects
(clinic and zip) using GENLINMIXED. Should I be using a different SPSS
procedure, or what am I missing in terms of syntax within GENLINMIXED?
Thanks
dave
PS: This is my first post (and a long one). If I've done something wrong
in terms of posting, please let me know and I will modify future posts.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Ryan

Re: Genlinmixed

David,

Generalized linear mixed model procedures, such as GENLINMIXED, are designed to model data whose distribution belongs to the exponential family, conditional on normally distributed random effects. Your data appear to fall under that umbrella. Admittedly, some of your terminology (e.g., your use of the terms bivariate and multivariate) is a bit confusing to me, but I will simply chalk it up to semantics.

I see merit in treating both clinic and patient zip code as random effects. Whether it is possible to do so using the GENLINMIXED procedure is something you will have to test. (I would use the GLIMMIX procedure in SAS for this type of analysis because it can employ a superior estimation method; if that proved too computationally intensive, I would likely use the MCMC procedure in SAS.) If GENLINMIXED permits it, you should add a second RANDOM subcommand where the subject is "zipcode."

Side note: Since I can't see your dataset, let me just say that you want your subject specification to give each subject a unique identifier. For example, you don't want GENLINMIXED to treat the first person in the first zipcode as the same person who happens to be the first person in the second zipcode. I don't think that will happen given your clinic*patientID subject specification, but make sure that is true. If I were you, I would just create a unique identifier for each subject that runs from 1 to N, the number of subjects.
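A minimal sketch of one way to build such a running identifier in SPSS syntax. It assumes that clinic and patientID have no missing values and that the combination of the two defines a subject. The variable name subject_uid is hypothetical.

```spss
* Sort so that all records for one clinic*patientID pair are adjacent.
SORT CASES BY clinic patientID.
* Start the counter at 1 for the first record.
COMPUTE subject_uid=1.
* Carry the counter forward, adding 1 whenever clinic or patientID changes.
IF ($CASENUM > 1) subject_uid=LAG(subject_uid) +
    (clinic NE LAG(clinic) OR patientID NE LAG(patientID)).
EXECUTE.
```

The resulting subject_uid could then replace clinic*patientID wherever a single subject field is more convenient. Whether that is appropriate depends on how patient IDs were assigned across clinics.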

With the inclusion of the second RANDOM subcommand, it would be acceptable to add fixed-effect predictors at the patient level, the clinic level, and now the patient zip code level.
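An untested sketch of the two-random-intercept specification discussed here. It extends the syntax from the original post. The field name zipcode is an assumption, and, as noted above, GENLINMIXED may reject this specification (for example, because zipcode is not listed on DATA_STRUCTURE SUBJECTS) or fail to converge with crossed effects on roughly 30K records.

```spss
* Untested sketch: clinic and zip code as two random intercepts.
* zipcode is a hypothetical field name for the patient's residence zip code.
GENLINMIXED
  /DATA_STRUCTURE SUBJECTS=clinic*patientID
  /FIELDS TARGET=testresult TRIALS=NONE OFFSET=NONE
  /TARGET_OPTIONS DISTRIBUTION=BINOMIAL LINK=LOGIT
  /FIXED EFFECTS=racethnicity USE_INTERCEPT=TRUE
  /RANDOM USE_INTERCEPT=TRUE SUBJECTS=clinic
    COVARIANCE_TYPE=VARIANCE_COMPONENTS
  /RANDOM USE_INTERCEPT=TRUE SUBJECTS=zipcode
    COVARIANCE_TYPE=VARIANCE_COMPONENTS
  /BUILD_OPTIONS TARGET_CATEGORY_ORDER=DESCENDING
    INPUTS_CATEGORY_ORDER=DESCENDING
    MAX_ITERATIONS=100 CONFIDENCE_LEVEL=95 DF_METHOD=RESIDUAL COVB=MODEL.
```

Clinic-level and zip-code-level predictors would be added to the FIXED EFFECTS list alongside racethnicity.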

Good luck.

Ryan

On Sat, Apr 6, 2013 at 1:05 PM, David Fine <[hidden email]> wrote:

Ryan

Re: Genlinmixed

It just hit me...When speaking about unique identification, I was actually referring to the possibility of 2 or more subjects in the same zipcode inadvertently having the same ID #. Bottom line is to make sure subjects are not being mixed up.
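One possible way to screen for this, sketched under assumptions: zipcode is taken to be a numeric field, and n_records, zip_min, zip_max, and id_conflict are illustrative names. A flagged ID may also simply be a patient who moved, so this is a screen, not proof of a mix-up.

```spss
* Attach per-subject summaries to every record of that clinic*patientID pair.
AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /BREAK=clinic patientID
  /n_records=N
  /zip_min=MIN(zipcode)
  /zip_max=MAX(zipcode).
* Flag IDs whose records span more than one zip code.
COMPUTE id_conflict=(zip_min NE zip_max).
FREQUENCIES VARIABLES=id_conflict.
```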

Best,

Ryan

On Apr 6, 2013, at 3:01 PM, R B <[hidden email]> wrote:

Maguin, Eugene

Re: Genlinmixed

Yeah, multilevel analysis with
Level 1: Y = b0 + b1*race + e
Level 2: b0 = g0 + ??? ... ?? + r
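
For a binary outcome with a logit link, the two levels above could be written as follows. This is a sketch under assumed notation: i indexes test records, j indexes clinics, and the variance symbol \tau^2 is not from the original post. The level 1 residual e does not appear as a separate term because the binomial distribution carries the record-level variation. The elided clinic-level predictors are left elided.

```latex
\begin{aligned}
\text{Level 1:}\quad & \operatorname{logit}\Pr(Y_{ij}=1) = b_{0j} + b_{1}\,\mathrm{race}_{ij} \\
\text{Level 2:}\quad & b_{0j} = g_{0} + \dots + r_{j}, \qquad r_{j} \sim N(0,\tau^{2})
\end{aligned}
```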

Is the patient ID specific to the patient across clinics, or might patients (enough of them to matter) have gone to multiple clinics, thereby acquiring multiple patient IDs? Would you know if they did?

Race, coded as a percentage of clinic patients, could be a level 2 predictor.
You could also evaluate whether the level 1 b1 coefficient is fixed or random, i.e., whether the odds ratio relating race to test result varies across clinics.
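
A sketch of both ideas in SPSS syntax, under stated assumptions: racethnicity is numeric and code 1 is the category of interest, and pct_race1 is a hypothetical variable name. The first command adds the clinic-level percentage to every record. The second adds that percentage as a fixed effect and lets the race effect vary across clinics.

```spss
* Clinic-level percentage of records in racethnicity category 1.
AGGREGATE
  /OUTFILE=* MODE=ADDVARIABLES
  /BREAK=clinic
  /pct_race1=PIN(racethnicity,1,1).

* Random intercept plus random race effect across clinics.
GENLINMIXED
  /DATA_STRUCTURE SUBJECTS=clinic*patientID
  /FIELDS TARGET=testresult TRIALS=NONE OFFSET=NONE
  /TARGET_OPTIONS DISTRIBUTION=BINOMIAL LINK=LOGIT
  /FIXED EFFECTS=racethnicity pct_race1 USE_INTERCEPT=TRUE
  /RANDOM EFFECTS=racethnicity USE_INTERCEPT=TRUE SUBJECTS=clinic
    COVARIANCE_TYPE=VARIANCE_COMPONENTS.
```

Comparing this fit with the random-intercept-only model is one way to judge whether the random race effect is needed. With a multi-category nominal factor, estimation may be slow or unstable.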

Zip codes make for a much more complicated model, I think, because a clinic might well serve patients from multiple zip codes, and patients from a given zip code share the zip code-level data.

I'm curious: what is (are) the question(s) you want to answer using this dataset?

Gene Maguin

David Fine

Re: Genlinmixed

Gene et al.,
See answers (as best as possible) below.
dave

________________________________________
From: SPSSX(r) Discussion [[hidden email]] on behalf of Maguin, Eugene [[hidden email]]
Sent: Sunday, April 07, 2013 12:19 PM
To: [hidden email]
Subject: Re: Genlinmixed

Yeah, multilevel analysis with
Level 1: Y = b0 + b1*race + e
Level 2: b0 = g0 + ??? ... ?? + r

Is the patient ID specific to the patient across clinics, or might patients (enough of them to matter) have gone to multiple clinics, thereby acquiring multiple patient IDs? Would you know if they did?

Ans: it depends. Some agencies with multiple clinics use a unique patient ID across sites. But, in general we cannot determine patient mobility. The record of interest in these data is an STD test (chlamydia)--rather than a patient-level record. Also, in general there are relatively few patients with multiple test visits per year (based on some other work).

Race, coded as a percentage of clinic patients, could be a level 2 predictor.
You could also evaluate whether the level 1 b1 coefficient is fixed or random, i.e., whether the odds ratio relating race to test result varies across clinics.

Ans: Yes. Similarly, there are individual- and clinic-level analogues for a few other measures, e.g., age.

Zip codes make for a much more complicated model, I think, because a clinic might well serve patients from multiple zip codes, and patients from a given zip code share the zip code-level data.

Ans: The zip code is used to create ordered categorical measures of areal socioeconomic position (SEP), i.e., area-based socioeconomic measures (ABSMs).

I'm curious: what is (are) the question(s) you want to answer using this dataset?

Ans: What are the predictors of chlamydial infection among young (15-24 year old) female family planning clients in the Northwest? Possible predictors include demographics, sexual risk behaviors, clinical measures, clinic characteristics, and areal SEP variables. It's an attempt to address some elements of (implied) multi-level models of STI acquisition (e.g., Hogben and Leichliter 2008 and others working on social determinants of health).

Gene Maguin
