|
* Brief review of my project: I'm investigating MZ & DZ twin pairs and using two dichotomous categorical variables to examine differences on several different IVs (some categorical). My analytic strategy is to use SPSS GEE to account for non-independence of twin pairs.

* Question 1: What is the appropriate "working correlation matrix"? I have a variable that identifies each individual as belonging to one dyad. I am using this variable as the "Subject" variable on the GEE "Repeated" tab. I have another variable that arbitrarily designates one twin as Twin 1 and one as Twin 2. I have added this variable to "Within-Subjects." What should I specify as the working correlation matrix? Someone advised me that "robust estimator" is appropriate for the covariance matrix, as well as the "independent working correlation matrix", but I am not certain that this is correct. Obviously, the "within-subjects" variable is arbitrarily designating one half of the sample as "1" and one half as "2". Does the independent working correlation matrix account for this arbitrary designation? (In simple terms, what assumptions does it make?)

* Question 2: Would I need to consider changing the nature of the working correlation matrix depending on the type and distribution of my IV?

* Question 3: The distribution of several of my IVs is extremely, extremely skewed and is comparable to the Poisson distribution. Would selecting the Poisson log be more robust than log-transforming the IV and using a linear distribution? (I realize this is a very general question; I just wondered how careful I need to be.)

=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
|
Simon,
I do not have time to read your entire post carefully, but I think I have read enough to provide some [hopefully] useful feedback to help you get started. Suppose your data set is structured as:
ID Twin X1 Y
1 1 24 0
1 2 36 1
2 1 16 1
2 2 14 1
3 1 22 0
3 2 10 1
.
.
.
ID = Dyad Identification Number
Twin = Twin Indicator
X1 = Continuous Predictor
Y = Binary Dependent Variable
If you wanted to test for the effect of X1 on the binary dependent variable, Y, while accounting for correlation of residuals obtained from Twins, then you could fit a generalized linear model using the following code:
GENLIN Y (REFERENCE=FIRST) WITH X1
  /MODEL X1 INTERCEPT=YES DISTRIBUTION=BINOMIAL LINK=LOGIT
  /REPEATED SUBJECT=ID WITHINSUBJECT=Twin SORT=YES CORRTYPE=EXCHANGEABLE ADJUSTCORR=YES COVB=ROBUST
  /PRINT CPS DESCRIPTIVES MODELINFO FIT SUMMARY SOLUTION.

Note that ID is the SUBJECT variable, Twin is the WITHINSUBJECT variable, and the specified correlation type is EXCHANGEABLE. The EXCHANGEABLE type assumes there is residual correlation within a dyad, while the default INDEPENDENT type does not assume any such correlation.
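For comparison with the "independent working correlation matrix" advice mentioned in the original post, the same model can be refit with only the CORRTYPE keyword changed. This is a sketch that reuses the variable names from the hypothetical data layout above (ID, Twin, X1, Y); it is not a recommendation of one structure over the other.

```spss
* Same model as above, refit with the independence working correlation.
* COVB=ROBUST still requests standard errors that allow for pairing within ID.
GENLIN Y (REFERENCE=FIRST) WITH X1
  /MODEL X1 INTERCEPT=YES DISTRIBUTION=BINOMIAL LINK=LOGIT
  /REPEATED SUBJECT=ID WITHINSUBJECT=Twin SORT=YES CORRTYPE=INDEPENDENT ADJUSTCORR=YES COVB=ROBUST
  /PRINT CPS DESCRIPTIVES MODELINFO FIT SUMMARY SOLUTION.
```

Running both versions and comparing the parameter estimates and standard errors for X1 shows how much the choice of working correlation matters for a given data set. With only two members per dyad, the arbitrary Twin 1 / Twin 2 labelling does not affect either structure, because both treat the two twins symmetrically.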
If your dependent variable is continuous, then I suggest you consider fitting a linear mixed model via the MIXED procedure. I prefer not to comment any further in this particular post.
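A minimal sketch of the MIXED alternative, assuming that Y in the layout above is a continuous outcome (an assumption made only for this illustration). A random intercept for each dyad is one common way to let the two twins in a pair share variance; other covariance specifications are possible.

```spss
* Linear mixed model with a random intercept for each twin pair.
* Assumes Y is continuous here and ID identifies the dyad.
MIXED Y WITH X1
  /FIXED=X1 | SSTYPE(3)
  /METHOD=REML
  /PRINT=SOLUTION TESTCOV
  /RANDOM=INTERCEPT | SUBJECT(ID) COVTYPE(VC).
```

The ratio of the intercept variance to the total variance (intercept plus residual) in the covariance parameter table gives the estimated within-pair correlation implied by this model.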
HTH,
Ryan
On Tue, Sep 14, 2010 at 11:44 PM, Simon - [hidden email] <[hidden email]> wrote: * Brief review of my project: I'm investing MZ & DZ twin pairs and using |
|
I read the OP a little differently. I believe there should be an
additional IV, nZygotes, to distinguish dizygotic and monozygotic
twins.
Art
On 9/15/2010 10:38 AM, R B wrote:
Art Kendall
Social Research Consultants |
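One way Art's suggestion could be sketched in GENLIN syntax is to add the zygosity indicator as a factor. The variable name nZygotes comes from the post above; its coding (1 = monozygotic, 2 = dizygotic) is an assumption made for this example, as are the other variable names, which follow the hypothetical layout in the earlier reply.

```spss
* Zygosity added as a between-dyad factor.
* Assumed coding for nZygotes is 1 = MZ and 2 = DZ.
GENLIN Y (REFERENCE=FIRST) BY nZygotes (ORDER=ASCENDING) WITH X1
  /MODEL nZygotes X1 INTERCEPT=YES DISTRIBUTION=BINOMIAL LINK=LOGIT
  /REPEATED SUBJECT=ID WITHINSUBJECT=Twin SORT=YES CORRTYPE=EXCHANGEABLE ADJUSTCORR=YES COVB=ROBUST
  /PRINT CPS DESCRIPTIVES MODELINFO FIT SUMMARY SOLUTION.
```

If the effect of X1 is expected to differ between MZ and DZ pairs, the term nZygotes*X1 could be added to the /MODEL list. Note that this changes only the mean model; the working correlation is still a single value shared by both twin types.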
|
In reply to this post by Simon - slmartys@gmail.com
I am up on design of experiments but not on the GEE procedure (i.e., with a dichotomous DV). Perhaps someone else is more familiar with GEE.

BTW I see this as repeated measures. I am familiar with "clustering" in the context of cluster analyses (pattern detection) and in the context of clusters of cases in complex samples, but may be missing something in the subject of the OP. Do GEE people use "clustering" to refer to repeated measures, i.e., the same measurement method on the same case/dyad under different conditions (doses, times, member of a dyad, litter mates, etc.)?

Also, please explicate what you mean about the twins being more correlated rather than the measurements of the twins being more correlated. Or were you just using shorthand?

Also, if you are trying to tease out heredity vs environment, the difference between the adjusted correlations for measurements within each kind of twin would be an important aspect.

Art Kendall

On 9/15/2010 4:16 PM, Simon - [hidden email] wrote:
> Hmmm, I hadn't thought much about the MZ-DZ distinction because I was so
> concerned with the other questions. My guess is that it wouldn't matter
> for this, but if I wanted to account for it, would I add it as another IV
> factor...?
>
> There is the problem that MZ twins are more correlated than DZ twins, so
> we would guess that their residuals will be as well. Using the
> exchangeable covariance matrix will provide the most rigorous correction
> for clustering, correct? I'm bamboozled why someone told me independent
> correlation matrix...he said it was "completely counterintuitive" but the
> correct option.
Art Kendall
Social Research Consultants |
|
Hi Art. You, and others who are interested, may find this introduction to GEE useful:

http://aje.oxfordjournals.org/content/157/4/364.short

If you don't have access, drop me a line off-list (at the e-mail given in my sig file below), and I'll send you a copy.

Cheers,
Bruce
--
Bruce Weaver bweaver@lakeheadu.ca http://sites.google.com/a/lakeheadu.ca/bweaver/ "When all else fails, RTFM." PLEASE NOTE THE FOLLOWING: 1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above. 2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/). |
|
In reply to this post by Simon - slmartys@gmail.com
Bruce has posted a link that I downloaded and that looks useful.

In the social/behavioral sciences "correlation" most commonly refers to a relation of variables. Many years ago I suggested to the people at SUDAAN that they would do well to clarify that they were using "correlated" in the sense that the inclusion of one case is related to the inclusion of another (in your application, the other twin). Another instance of what I think of as "correlated inclusion" is when there is complex sampling, like schools within districts, etc. It will be interesting to see whether the article recognizes the distinctive way it uses the word "correlated" for data.

In DOE terminology any factor, between or within (aka repeated), can be fixed (representing a stratification, having all the possible values that the variable representing the factor can take) or random (a random subset of the possible values the factor could take).

WRT twins' measurements being more correlated, the extreme example would be that both members of an MZ dyad would have the same value for gender, whereas many DZ twins would have different values.

Art

On 9/15/2010 5:02 PM, Simon - [hidden email] wrote:
> I'm still learning this stuff, but I *believe* that people sometimes use the
> term "clustering" in GEE to refer to correlated data in which people belong
> to a dyad, litter, family, group, as opposed to longitudinal data involving
> multiple time points. But yes, I think it all falls under the general
> repeated measures rubric.
>
> Sorry for the confusion -- yes, I meant that measurements between MZ twins
> are more correlated than measurements between DZ twins.
Art Kendall
Social Research Consultants |
|
In reply to this post by Bruce Weaver
Thanks. I was able to download it.
Art Kendall
On 9/15/2010 5:21 PM, Bruce Weaver wrote:
Art Kendall
Social Research Consultants |
|
In reply to this post by Art Kendall
Specifying a generalized estimating equation (GEE) via the GENLIN procedure allows one to account for residual correlation due to repeated measures. Thus, the model I presented can be thought of as a repeated measures binary logistic regression model. If the residual covariance structure were assumed to be independent, we would be back at the standard binary logistic regression model. Really, the only point I was trying to make in my previous post was to show how to account for residual correlation due to repeated measures.
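The relationship between the two working correlation structures can be written out for a twin pair. The notation below is introduced only for this sketch and does not appear in the posts: $y_i$ is the pair of responses for dyad $i$, $\mu_i$ their fitted probabilities, $X_i$ the design matrix for the pair, and $\alpha$ the working correlation.

```latex
% GEE estimating equations for N dyads
\sum_{i=1}^{N} D_i^{\top} V_i^{-1} \bigl( y_i - \mu_i(\beta) \bigr) = 0,
\qquad
D_i = \frac{\partial \mu_i}{\partial \beta^{\top}},
\qquad
V_i = A_i^{1/2} \, R(\alpha) \, A_i^{1/2},
\qquad
A_i = \operatorname{diag}\bigl( \mu_{ij} (1 - \mu_{ij}) \bigr)

% Working correlation for a pair: exchangeable versus independent
R_{\text{exch}}(\alpha) = \begin{pmatrix} 1 & \alpha \\ \alpha & 1 \end{pmatrix},
\qquad
R_{\text{ind}} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}

% With the logit link, D_i = A_i X_i, so under R_ind the equations reduce to
% the ordinary logistic regression score equations
\sum_{i=1}^{N} X_i^{\top} \bigl( y_i - \mu_i(\beta) \bigr) = 0

% Robust (sandwich) covariance of the estimated coefficients
\widehat{\operatorname{Cov}}(\hat\beta) = B^{-1} M B^{-1},
\quad
B = \sum_{i} D_i^{\top} V_i^{-1} D_i,
\quad
M = \sum_{i} D_i^{\top} V_i^{-1} (y_i - \hat\mu_i)(y_i - \hat\mu_i)^{\top} V_i^{-1} D_i
```

Under the independence structure the coefficient estimates therefore match those of a standard binary logistic regression, while the robust covariance option still bases the standard errors on the observed within-pair residual products. That may be the sense in which the independence choice was described to Simon as counterintuitive but workable.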
If the residual correlation among MZ twins is different than the residual correlation among DZ twins, then one would need to allow for that heterogeneity in the covariance matrix. It seems to me that to determine the most appropriate model (fixed effects portion and residual covariance portion), one would require more information--at least more than I absorbed when reading the original post!
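One exploratory way to check whether the residual correlation differs between MZ and DZ pairs is to fit the model separately for each twin type and compare the estimated working correlations. This is only a sketch under stated assumptions: nZygotes is a hypothetical zygosity variable (taken from Art's suggestion), and the other names follow the hypothetical layout in the earlier reply. It is a diagnostic, not a single model with a heterogeneous covariance matrix.

```spss
* Fit the exchangeable GEE separately for MZ and DZ pairs.
* WORKINGCORR prints the estimated working correlation matrix for each group.
SORT CASES BY nZygotes.
SPLIT FILE LAYERED BY nZygotes.
GENLIN Y (REFERENCE=FIRST) WITH X1
  /MODEL X1 INTERCEPT=YES DISTRIBUTION=BINOMIAL LINK=LOGIT
  /REPEATED SUBJECT=ID WITHINSUBJECT=Twin SORT=YES CORRTYPE=EXCHANGEABLE ADJUSTCORR=YES COVB=ROBUST
  /PRINT MODELINFO SOLUTION WORKINGCORR.
SPLIT FILE OFF.
```

Because the split also lets the intercept and the X1 coefficient differ by zygosity, the two fits are not directly comparable to a pooled model; the output is mainly useful for seeing how far apart the two within-pair correlations are.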
I think Art's comments are certainly helping the OP get closer to a more appropriate model.
Ryan
On Wed, Sep 15, 2010 at 4:34 PM, Art Kendall <[hidden email]> wrote: I am up on design of experiments but not on the GEE procedure (i.e., |
