Interpretation of a valid negative variance component / ICC

Ryan
All,
 
A comment was made regarding the possibility of fitting an incorrect variance-covariance structure, which could result in a negative variance component. While it is true that an incorrect variance-covariance structure could result in a negative variance component, it should be noted that, within a linear mixed model, when this occurs the variance-covariance structure is often overly complex given the data at hand. In fact, this is so common that some multilevel textbooks recommend that, when a negative variance component is observed, simplifying the variance-covariance model be considered as the next logical step.
 
However, if one fits a simple random intercept model, assuming a reasonably balanced design with adequate sample sizes at both levels, then the possibility does exist that the negative variance component is not only accurate but also interpretable. Moreover, if one forces a valid negative variance component to be zero (e.g., the MIXED procedure in SPSS constrains a random-intercept variance to be non-negative), simulation studies have shown that the Type I error rate is inflated. OTOH, if one allows the variance component to be negative, the Type I error rate is maintained (reference: SAS for Mixed Models, 2nd ed., by Littell et al.).
 
So, the question some might be pondering is: what type of naturally occurring environment would lead to a negative variance component? Littell et al., in SAS for Mixed Models (2nd ed.), discuss how a naturally competitive environment within a plot can result in a negative variance component.
 
Competitive Environment Example: Suppose there are 50 cages, and on day 1 three adult dogs of the same breed, health status, gender, age, and weight are placed in each cage. Every day for 30 days, a single large bowl holding the same amount of food is placed in each cage. Given that some dogs tend to be dominant while others tend to be submissive, would we be surprised to see a great deal of within-cage weight variability at the end of 30 days? Probably not. On the other hand, the cage means would be expected to be quite similar, since every cage received the same amount of food every day. This could easily produce greater within-cage variability than between-cage variability. Under such a scenario, it would not be surprising to obtain a negative between-cage variance component and, ultimately, a negative intraclass correlation coefficient. (Note that this example was adapted from an example provided on SAS-L.)
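To see why a fixed, shared food supply pushes the within-cage correlation below zero, here is a minimal Python sketch of a hypothetical toy model (not the simulation used in this post): each cage's bowl is split among the dogs by randomly drawn "dominance" shares that sum to 1, and each dog's weight gain is its share of the bowl plus independent individual noise. The function names, the gamma-distributed shares, and all parameter values are assumptions chosen only for illustration. The within-cage correlation is estimated with the classic pairwise intraclass correlation.

```python
import random


def simulate_shared_bowl(n_cages, n_dogs=3, food=30.0, noise_sd=1.0, seed=1):
    """Hypothetical toy model: the dogs in a cage split one fixed bowl of food."""
    rng = random.Random(seed)
    cages = []
    for _ in range(n_cages):
        # Random "dominance" weights, normalized so the shares sum to 1 within a cage.
        raw = [rng.gammavariate(2.0, 1.0) for _ in range(n_dogs)]
        total = sum(raw)
        shares = [w / total for w in raw]
        # Weight gain = share of the fixed bowl + independent individual noise.
        cages.append([food * s + rng.gauss(0.0, noise_sd) for s in shares])
    return cages


def pairwise_icc(cages):
    """Pairwise intraclass correlation: cross-products of deviations from the
    grand mean over all ordered pairs of distinct dogs in the same cage,
    divided by the number of pairs times the overall variance.
    This estimator can never fall below -1/(n_dogs - 1)."""
    values = [y for cage in cages for y in cage]
    grand = sum(values) / len(values)
    var = sum((y - grand) ** 2 for y in values) / len(values)
    cross, pairs = 0.0, 0
    for cage in cages:
        for i, yi in enumerate(cage):
            for j, yj in enumerate(cage):
                if i != j:
                    cross += (yi - grand) * (yj - grand)
                    pairs += 1
    return cross / (pairs * var)


print(round(pairwise_icc(simulate_shared_bowl(2000)), 3))
```

With these made-up settings the population within-cage correlation of the toy model works out to about -0.48, so the printed estimate should come out clearly negative; shrinking `food` relative to `noise_sd` moves it toward zero.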
 
Using the simulated example I provide BELOW, the between-cage variance component is -.310601.
 
Interpretation: The variance among the cage means is .310601 smaller than would be expected from within-cage variability alone.
 
How do we calculate this value using the data simulated by the code below? In addition to reading the output from the VARCOMP procedure or the MIXED procedure (after changing the structure from CSR to CS), we could subtract the between-cage variance that would be expected from within-cage noise alone (within-subject variance / number of dogs per cage = 1.266169/3 = .422056) from the observed between-cage variance (the variance of the cage means = .111455):
 
between-cage variance component = .111455 - .422056 = -.310601
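The subtraction can be checked with a few lines of Python. This is a minimal sketch that uses only the rounded summary values quoted in this post (variance of cage means .111455, within-cage variance 1.266169, 3 dogs per cage); the function name is hypothetical. Because the inputs are rounded, the result agrees with -.310601 to about five decimal places.

```python
def between_cage_component(var_of_cage_means, within_var, n_per_cage):
    """Method-of-moments between-cage variance component: the observed variance
    of the cage means minus the part expected from within-cage noise alone."""
    expected_from_noise = within_var / n_per_cage
    return var_of_cage_means - expected_from_noise


print(between_cage_component(0.111455, 1.266169, 3))  # negative, about -0.3106
```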
 
Note: If we wanted to obtain the within-subject variability estimate (1.266169) without employing VARCOMP or MIXED, we could also use the MEANS procedure.
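For readers without SPSS, the within-cage variance (the within-groups mean square, MSW) and the between-groups mean square (MSB) can be computed from raw data with a balanced one-way ANOVA. The Python sketch below is a hypothetical stand-in for that ANOVA table, not SPSS's own code; it also returns the F ratio and the between-cage component (MSB - MSW) / n.

```python
def one_way_anova(groups):
    """Balanced one-way ANOVA. Returns (MSB, MSW, F, between_component)."""
    k = len(groups)
    n = len(groups[0])
    if any(len(g) != n for g in groups):
        raise ValueError("this sketch assumes equal group sizes")
    means = [sum(g) / n for g in groups]
    grand = sum(means) / k  # equals the grand mean when groups are balanced
    ss_between = n * sum((m - grand) ** 2 for m in means)
    ss_within = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    msb = ss_between / (k - 1)
    msw = ss_within / (k * (n - 1))
    return msb, msw, msb / msw, (msb - msw) / n


# Tiny made-up example: two cages of three dogs.
print(one_way_anova([[1, 2, 3], [2, 3, 4]]))
```

For the toy input this gives MSB = 1.5, MSW = 1.0, F = 1.5 and a between-cage component of 1/6.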
 
Let's go a step further now and calculate the ICC = (between-cage variance component) / (between-cage variance component + within-cage variance component):
 
ICC  = -.310601 / (-.310601 + 1.266169) = -.325043
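The ICC arithmetic in Python, as a minimal sketch with hypothetical function names. The second function is the standard ANOVA form of the same estimator, obtained by substituting between = (MSB - MSW)/n and within = MSW; taking MSB as 3 times the variance of the cage means (which holds for balanced cages) reproduces the post's value up to rounding.

```python
def icc_from_components(between, within):
    """ICC = between / (between + within); negative whenever between < 0."""
    return between / (between + within)


def icc_from_mean_squares(msb, msw, n):
    """Same quantity from ANOVA mean squares: (MSB - MSW) / (MSB + (n - 1) * MSW)."""
    return (msb - msw) / (msb + (n - 1) * msw)


print(icc_from_components(-0.310601, 1.266169))          # about -0.325043
print(icc_from_mean_squares(3 * 0.111455, 1.266169, 3))  # about -0.32504
```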
 
Interpretation: This negative ICC is the estimated within-cage correlation, -.325043. Again, under a naturally competitive environment with a mix of dominant and submissive animals, would we be surprised to observe a negative within-cage correlation in dog weight? Of course not! It makes perfect sense, and the model is producing estimates that are interpretable.
 
The information provided above comes directly from various sources, including messages posted on SAS-L over the years, along with a section in the book, SAS for Mixed Models, 2nd Edition, dedicated to interpreting negative variance components. I will admit that I was stuck trying to figure out the simulation code for quite a while, which delayed my posting this message. As some of you can tell, I prefer to provide simulated data to illustrate (and to some extent, validate) what I state. I was fortunate to obtain help creating the simulation code from someone off-list. I am grateful to that person. This can, at times, slow me down from responding to a message, but it's a preference of mine.
 
Before I provide the code, I'd like to make one final comment. I understand that a negative variance component can be disconcerting, to say the least. However, when interpreted correctly, it no longer seems problematic [to me]. For those who are uncomfortable observing negative variance components, there is an alternative parameterization that directly estimates the within-subject correlation! As you will see, the MIXED code below estimates the within-subject correlation under the compound-symmetric residual correlation structure (COVTYPE(CSR)). After fitting the MIXED model, I use the VARCOMP procedure to estimate the negative variance component. Note that one could also simply change the residual variance-covariance structure in the MIXED code from CSR to CS to obtain the negative variance component.
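The two parameterizations describe the same 3 x 3 covariance matrix. Below is a minimal Python sketch, assuming the generic textbook forms: a compound-symmetry (CS) form with a between-cage covariance plus a residual variance on the diagonal, and a correlation (CSR-style) form with a total variance and a common correlation. The function names are hypothetical. It also encodes the standard condition for a compound-symmetric matrix to be a valid covariance matrix, -1/(n - 1) < rho < 1, which is why a negative within-cage correlation is bounded below (by -0.5 with three dogs per cage).

```python
def cs_to_correlation(between_cov, residual_var):
    """CS form (between covariance, residual variance) -> (total variance, correlation)."""
    total = between_cov + residual_var
    return total, between_cov / total


def correlation_to_cs(total_var, rho):
    """(total variance, correlation) -> CS form (between covariance, residual variance)."""
    return rho * total_var, (1 - rho) * total_var


def is_valid_cs(rho, n):
    """With a positive total variance, compound symmetry for clusters of size n
    is positive definite exactly when -1/(n - 1) < rho < 1."""
    return -1.0 / (n - 1) < rho < 1.0


total, rho = cs_to_correlation(-0.310601, 1.266169)
print(total, rho)                     # about 0.955568 and -0.325043
print(correlation_to_cs(total, rho))  # recovers the two CS parameters
print(is_valid_cs(rho, 3))            # True: -0.325 lies above -1/2
```

The eigenvalues of sigma^2 * [(1 - rho) I + rho J] are sigma^2 (1 - rho) and sigma^2 (1 + (n - 1) rho), which is where the bound comes from.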
 
I will stop at this point and simply provide the code to estimate the model.
 
Hope this is of interest to others.
 
Ryan
--
 
set seed 98734523.
 
new file.

 inp pro.
 * Initialize working variables.
 compute plot = -99.
 compute subject = -99.
 compute x1 = -99.
 compute x2 = -99.
 compute x3 = -99.
 compute e1 = -99.
 compute e2 = -99.
 compute e3 = -99.
 * sigma is the residual SD and rho the true within-plot correlation.
 compute sigma = 1.
 compute rho = -0.35.
 * Closed-form Cholesky factor of the 3 x 3 compound-symmetric correlation matrix.
 compute a11 = 1.
 compute a21 = rho.
 compute a31 = rho.
 compute a22 = sqrt(1 - rho**2).
 compute a32 = (rho/(1+rho))*sqrt(1 - rho**2).
 compute a33 = sqrt(((1 - rho)*(1 + 2*rho))/(1 + rho)).
 
 leave plot to a33.
 
 * For each of 50 plots (cages), turn 3 independent N(0,1) draws into 3 correlated errors.
 loop plot = 1 to 50.
   compute x1 = rv.normal(0,1).
   compute x2 = rv.normal(0,1).
   compute x3 = rv.normal(0,1).
   compute e1 = sigma * a11*x1.
   compute e2 = sigma * (a21*x1 + a22*x2).
   compute e3 = sigma * (a31*x1 + a32*x2 + a33*x3).
 
   * Write one case per subject (dog) within the plot.
   loop subject = 1 to 3.
     compute y = e1*(subject=1) + e2*(subject=2) + e3*(subject=3).
     end case.
   end loop.
 end loop.
 end file.
 end inp pro.
 exe.
 
delete variables x1 x2 x3 sigma rho a11 a21 a31 a22 a32 a33 e1 e2 e3.
 
MIXED y
   /FIXED= | SSTYPE(3)
   /METHOD=REML
   /PRINT=R SOLUTION
   /REPEATED=subject | SUBJECT(plot) COVTYPE(CSR).
 
VARCOMP y BY plot
  /RANDOM=plot
  /METHOD=SSTYPE(3)
  /PRINT=SS
  /PRINT=EMS
  /DESIGN
  /INTERCEPT=INCLUDE.
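The INPUT PROGRAM above relies on the closed-form Cholesky coefficients a11 through a33. The following stdlib-only Python sketch ports that construction so the coefficients can be checked: multiplying the factor by its transpose should give 1 on the diagonal and rho off the diagonal. It then simulates correlated triples and estimates the correlation between dog 1 and dog 2 across plots. Python's random generator differs from SPSS's, so this will not reproduce the post's numbers; the seed reuse, the number of plots, and the function names are assumptions for illustration.

```python
import math
import random


def cholesky_coeffs(rho):
    """Lower-triangular factor of the 3 x 3 equicorrelation matrix, using the
    same closed-form a-coefficients as the SPSS syntax. a33 is real only when
    rho > -1/2, the same bound as for a valid 3-member compound-symmetric matrix."""
    a22 = math.sqrt(1 - rho**2)
    a32 = (rho / (1 + rho)) * math.sqrt(1 - rho**2)
    a33 = math.sqrt(((1 - rho) * (1 + 2 * rho)) / (1 + rho))
    return [[1.0, 0.0, 0.0],
            [rho, a22, 0.0],
            [rho, a32, a33]]


def times_transpose(L):
    """Return L * L^T for a small square matrix stored as nested lists."""
    k = len(L)
    return [[sum(L[i][m] * L[j][m] for m in range(k)) for j in range(k)]
            for i in range(k)]


def simulate(n_plots, rho, sigma=1.0, seed=98734523):
    """One correlated triple (e1, e2, e3) per plot, mirroring the SPSS loop."""
    rng = random.Random(seed)
    L = cholesky_coeffs(rho)
    plots = []
    for _ in range(n_plots):
        x = [rng.gauss(0.0, 1.0) for _ in range(3)]
        plots.append([sigma * sum(L[i][m] * x[m] for m in range(3))
                      for i in range(3)])
    return plots


def pearson(xs, ys):
    """Sample Pearson correlation of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


R = times_transpose(cholesky_coeffs(-0.35))  # should reproduce the target matrix
plots = simulate(20000, -0.35)
print(pearson([p[0] for p in plots], [p[1] for p in plots]))  # should be near -0.35
```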
 

Re: Interpretation of a valid negative variance component / ICC

Rich Ulrich
Ryan,
Thanks for the discussion and the example.
I see this as a response to what I pointed out.

I can say that I never considered a negative ICC as
something to call a "variance component" so I never
was distressed by the fact that it is negative.  Like any
correlation, its variance is positive -- but it adds to the
error for certain tests, rather than subtracting from them.

The problem that I pointed to was that the incorrect,
"indifferent" default var-cov matrix could thus artificially
inflate the rejection rate -- even as you do point out, below. 
Thus, the default is not always "safe" and conservative. 

Thanks, again.

--
Rich Ulrich
(on vacation until Wednesday - possibly not reading the List til then.)


(In reply to Ryan's original post of Thu, 21 Mar 2013 22:08:11 -0400.)

Re: Interpretation of a valid negative variance component / ICC

Mike
In reply to this post by Ryan
Not to get into any religious wars about negative variance components
and whether they are valid or not and/or whether they occur or not,
I'd just like to make a couple of points.
 
In one of the posts made a few days ago, someone pointed out that if
one were to do a correlated-groups t-test using the standard formula,
namely:
 
obtained t = (M1 - M2) / sqrt(VE1 + VE2 - 2*r*SE1*SE2)
 
where M1, M2 are the means at time 1 and time 2 (or of samples 1 and 2),
VE1, VE2 are the error variances (squared standard errors) for time 1 and time 2,
SE1, SE2 are the standard errors for time 1 and time 2,
r is the correlation between values at time 1 and time 2,
and the constant value = 2.
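The formula can be made concrete with a short Python sketch (the function name and the numbers are hypothetical). It computes the denominator as SE1^2 + SE2^2 - 2*r*SE1*SE2, using VE = SE^2, and shows the point under discussion: a positive r shrinks the standard error of the difference and raises t, while a negative r inflates it and lowers t.

```python
import math


def correlated_groups_t(m1, m2, se1, se2, r):
    """t = (M1 - M2) / sqrt(VE1 + VE2 - 2*r*SE1*SE2), where VE = SE**2."""
    var_diff = se1**2 + se2**2 - 2 * r * se1 * se2
    return (m1 - m2) / math.sqrt(var_diff)


for r in (0.5, 0.0, -0.5):
    # Same means and standard errors; only the correlation changes.
    print(r, round(correlated_groups_t(10.0, 9.0, 0.5, 0.5, r), 4))
```

With these inputs the three t values are 2.0, 1.4142 and 1.1547.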
 
I have never come across a situation where r is negative, especially
in a within-subjects design but, for the moment, assume that this is
possible.  To illustrate what r is telling us, assume the formula
r= sum(Z1*Z2)/(N-1)
where Z1 and Z2 are the z-score transforms of values at time 1 and time 2,
sum(Z1*Z2) is the sum of the products of the Z1 and Z2, and N is the
number of pairs.
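A small Python sketch of that formula (names and data are hypothetical): r is the sum of the z-score cross-products divided by N - 1, so it turns negative when the cross-products are predominantly negative, that is, when above-mean values in one variable tend to pair with below-mean values in the other. Every pair need not be reversed.

```python
import math


def z_scores(values):
    """Standardize using the sample SD (N - 1 denominator)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


def r_from_z(x, y):
    """r = sum(Z1*Z2) / (N - 1)."""
    zx, zy = z_scores(x), z_scores(y)
    return sum(a * b for a, b in zip(zx, zy)) / (len(x) - 1)


print(r_from_z([1, 2, 3, 4], [4, 3, 2, 1]))  # every deviation reversed: -1 (up to rounding)
print(r_from_z([1, 2, 3, 4], [4, 2, 3, 1]))  # two cross-products positive, yet r = -0.8
```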
 
For r to be negative, the products Z1*Z2 have to be predominantly
negative, i.e., Z1 and Z2 must tend to have opposite signs.
To beat a dead horse:  if Z1 is a deviation above its mean, then
Z2 is BELOW its mean.  Similarly, if Z1 is a deviation below its
mean, then Z2 is ABOVE its mean.  The pattern of deviations is
reversed, and I cannot think of any situation in, say, human or
animal psychology where this would occur.  If anyone has an
example, I'd like to see it and the explanation for why it occurs.
In human dyads, this would require an "opposites attract" assumption
(for the appropriate submission/compensatory mechanism to work)
and not "like gets like" (i.e., males who are a "9" or "10" get females
who are a "9" or "10").
 
I believe Kenny has argued for something like this in the analysis
of dyads, such as when one member of a couple in a relationship
is high on some attribute or behavior, the other is lower. But
note, they cannot simply be lower, they have to be below the
mean of their group which I find odd since, just from personal
experience, I don't think most couples engage in such inhibitory
and compensatory mechanisms.  In male-female couples,
outgoing males would have to have submissive females who
compensate for their male partners when the males inhibit their
behaviors.
 
In the example provided below involving dogs, my first reaction is
to ask whether this is based on actual research data or just a made-up
example.  With made-up data, of course, any set of constraints
can be imagined and assumed but the more important question is
are these constraints commonly experienced or represent rare instances? 
 
For example, is it always true that if you took 3 dogs at a time, one
would be dominant, and the others submissive (or at least one submissive)? 
What if there were 3 "alpha" dogs who fight it out?  What if there were
3 submissive dogs?  Does one have to make sure that one always selects
a "dominant/alpha" dog, a "submissive" dog, and perhaps one that
falls in-between (i.e., can change roles easily)?
 
I've only worked with rats and pigeons who were tested alone and
the only firm conclusion that I take away from that experience is
that animals rarely behave the way you expect them to.  Your
example below assumes a stable pattern of dominance relationship
(i.e., Dog 1 dominates Dog 2 who dominates Dog 3, or Dog 1
always dominates both Dogs 2 and 3) across cages, which would
produce stable within-cage variability.  But if this dominance pattern
is not constant, then this is likely to NOT be the case. So, this appears
to be an important but unstated assumption. 
 
Among other unstated assumptions:
 
(1)  There is no "treatment effect", that is, if weight after 30 days is
the dependent variable and all dogs are given the same amount of
food, then weight gain/loss should be the same, that is, the means
should only differ because of random factors. 
 
This would suggest that the obtained F should be close to 1.00,
under the ordinary null hypothesis.  But the Means procedure ANOVA
produces an F-value of F(49,100)=0.264, p= 1.00.  Now, I know
the ANOVA F is a one-tailed test but I was taught that whenever
you get an F-value close to 0.00, then there is something that is
reducing the variability among the means relative to your estimate
of random error.  That is, you do have a treatment effect but it is
reducing the variability among means.  The next question is Why? 
For a made-up example, the answer is irrelevant -- it was constructed
that way.  For real data, what factors are operating to reduce variability? 
What did you not control for?
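The F ratio and the negative variance component are two views of the same comparison. For balanced groups of size n, MSB = n * (variance of group means), and the between-group component is (MSB - MSW) / n = MSW * (F - 1) / n, so F < 1 corresponds exactly to a negative estimate. A Python check using the summary values reported earlier in this thread (the function name is hypothetical):

```python
def f_and_component(var_of_means, msw, n):
    """Return (F, between-group variance component) for balanced groups of size n."""
    msb = n * var_of_means
    f = msb / msw
    component = msw * (f - 1) / n  # identical to (msb - msw) / n
    return f, component


f, comp = f_and_component(0.111455, 1.266169, 3)
print(round(f, 3), round(comp, 5))  # 0.264 and about -0.3106
```

So the F(49,100) = 0.264 and the between-cage component of -.310601 are the same finding expressed two ways.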
 
(2)  You also are assuming that all dogs are the same breed and
metabolize food at the same rate.  However, if you have Great Danes
in one cage and Chihuahuas in the next cage, I suspect that the
numbers change.  And if you mix Great Danes or Pitbulls
or German Shepherds and Chihuahuas together, you will get
severely different numbers depending on what breeds are in
the cage.  I know that one can get rats with a known "breed"
but where is one going to get the equivalent in dogs?  But if
you can, then your conclusions apply only to that breed.  If
you use different breeds to increase external validity, I don't
think one gets the numbers you're using, IMO.
 
Okay, bottom line, it seems to me that negative intraclass coefficients
are possible under certain conditions.  If one creates such conditions,
then I think it is fair to use the negative intraclass coefficients as
indicators of those conditions.  However, if one gets negative intraclass
coefficients in naturalistic situations, one probably has to examine those
situations closely for what factors are operating (e.g., submission/compensatory
responding, factors that restrict variability, etc.) and determine why
they are operating.  I could be wrong but I think that these situations
are not that common.
 
-Mike Palij
New York University
 
 
 

Re: Interpretation of a valid negative variance component / ICC

Ryan
Mike,
 
I see you had quite a bit to say, much of which I could spend a great deal of time responding to, but it's been a long day, and truth be told, I agree with your concluding paragraph which I presume summarizes your point.
 
SAS for Mixed Models (2nd ed.), a very highly regarded mixed models textbook, has a section dedicated to addressing this issue in great detail, indicating that when negative within-plot correlations occur, they usually arise from designs in which subjects (e.g., rats, pigs) are placed in competitive or interference environments. The illustration I provided was simply intended to resemble what I have read in textbooks, online forums, and articles that have discussed/reported negative within-plot correlations.
 
I do not have access to my personal library, but if you are interested in reading more on this topic, I know for certain that "SAS for Mixed Models" (2nd ed.) has a very detailed section on this topic, along with an example. It provides a comprehensive evaluation of negative variance components. When I get back to my office and can sift through my library, I can send you other references.
 
For the record, the only dogs I deal with are the ones in my house, both of which are free to roam as they please, including on top of my keyboard...I wonder who the alphas and betas are in my house?!
 
No "religious wars", my friend. :-)
 
Best wishes,
 
Ryan
 
On Fri, Mar 22, 2013 at 9:57 AM, Mike Palij <[hidden email]> wrote:
>
> Not to get into any religious wars about negative variance components
> and whether they are valid or not and/or whether they occur or not,
> I'd just like to make a couple of  points.
>  
> In one of the posts made a few days ago, someone pointed out that if
> one were to do a correlated groups t-test using the standard formula like
> the following:
>  
> obtained t= (M1 - M2) /sqrt (VE1 + VE2 - 2*r*SE1*SE2)
>  
> Where M1, M2 are means of time 1 and time 2 (or sample 1 and 2),
> VE1, VE2 are variance errors for time 1 and time 2,
> SE1, SE2 are standard errors for time 1 and time,
> r is the correlation between values at time 1 and time 2,
> and the constant value=2.
>  
> I have never come across a situation where r is negative, especially
> in a within-subjects design but, for the moment, assume that this is
> possible.  To illustrate what r is telling us, assume the formula
> r= sum(Z1*Z2)/(N-1)
> where Z1 and Z2 are the z-score transforms of values at time 1 and time2,
> sum(Z1*Z2) is the sum of the products of the Z1 and Z2, and N is the
> number of pairs.
>  
> For r to be negative, then one of the Z values has to be negative.
> To beat a dead horse:  if Z1 is a deviation above its mean, then
> Z2 is BELOW its mean.  Similarly, if Z1 is a deviation below its
> mean then Z2 is ABOVE its mean.  The pattern of deviations are
> reversed and I cannot think of any situation in, say, human or
> animal psychology where this would occur.  If anyone has an
> example, I'd like to see it and the explanation for why it occurs.
> In human dyads, this would require an "opposites attract" assumption
> (for the appropriate submission/compensatory mechanism to work)
> and not "like gets like" (i.e., males who are a "9" or "10" get females
> who are a "9" or "10").
>  
> I believe Kenny has argued for something like this in the analysis
> of dyads, such as when one member of a couple in a relationship
> is high on some attribute or behavior, the opposite is lower. But
> note, they cannot be simpler lower, they have to be below the
> mean of their group which I find odd since, just from personal
> experience, I don't think most couples engage in such inhibitory
> and compensatory mechanisms.  In male-female couples,
> outgoing males would have to have submissive females who
> compensate for their male partners when the males inhibit their
> behaviors.
>  
> In the example provided below involving dogs, my first reaction is
> to ask whether this is based on actual research data or just a made up
> example.  With made-up data, of course, any set of constraints
> can be imagined and assumed but the more important question is
> are these constraints commonly experienced or represent rare instances?
>  
> For example, is it always true that if you took 3 dogs at a time, one
> would be dominant, and the others submissive (or at least one submissive)?
> What if there were 3 "alpha" dogs who fight it out?  What if there were
> 3 submissive dogs?  Does one have to make sure that one always selects
> a "dominant/alpha" dog, a "submissive" dog, and perhaps one that
> falls in-between (i.e., can change roles easily)?
>  
> I've only worked with rats and pigeons who were tested alone and
> the only firm conclusion that I take away from that experience is
> that animals rarely behave the way you expect them to.  Your
> example below assumes a stable pattern of dominance relationship
> (i.e., Dog 1 dominates Dog 2 who dominates Dog 3 or Dog 1
> always dominates both Dogs 2 and 3) across cages, which would
> produce stable within-cage variability.  But if this dominance pattern
> is not constant, then this is likely to NOT be the case. So, this appears
> to be an important but unstated assumption.
>  
> Among other unstated assumptions:
>  
> (1)  There is no "treatment effect", that is, if weight after 30 days is
> the dependent variable and all dogs are given the same amount of
> food, then weight gain/loss should be the same, that is, the means
> should only differ because of random factors.
>  
> This would suggest that the obtained F should be close to 1.00,
> under the ordinary null hypothesis.  But the Means procedure ANOVA
> produces an F-value of F(49,100)=0.264, p= 1.00.  Now, I know
> the ANOVA F is a one-tailed test but I was taught that whenever
> you get an F-value close to 0.00, then there is something that is
> reducing the variability among the means relative to your estimate
> of random error.  That is, you do have a treatment effect but it is
> reducing the variability among means.  The next question is Why?
> For a made-up example, the answer is irrelevant -- it was constructed
> that way.  For real data, what factors are operating to reduce variability?
> What did you not control for?
>  
> (2)  You also are assuming that all dogs are the same breed and
> metabolize food at the same rate.  However, if you have Great Danes
> in one cage and Chihuahuas in the next cage, I suspect that the
> numbers change.  And if you mix Great Danes, Pit Bulls,
> or German Shepherds with Chihuahuas, you will get
> severely different numbers depending on which breeds are in
> the cage.  I know that one can get rats with a known "breed"
> but where is one going to get the equivalent in dogs?  But if
> you can, then your conclusions apply only to that breed.  If
> you use different breeds to increase external validity, I don't
> think one gets the numbers you're using, IMO.
>  
> Okay, bottom line, it seems to me that negative intraclass coefficients
> are possible under certain conditions.  If one creates such conditions,
> then I think it is fair to use the negative intraclass coefficients as
> indicators of those conditions.  However, if one gets negative intraclass
> coefficients in naturalistic situations, one probably has to examine those
> situations closely for what factors are operating (e.g., submission/compensatory
> responding, factors that restrict variability, etc.) and determine why
> they are operating.  I could be wrong but I think that these situations
> are not that common.
>  
> -Mike Palij
> New York University
>  
>  
>  
>
> ----- Original Message -----
> From: R B
> Sent: Thursday, March 21, 2013 10:08 PM
> Subject: Interpretation of a valid negative variance component / ICC
>
> All,
>  
> A comment was made regarding the possibility of fitting an incorrect variance-covariance structure, which could result in a negative variance component. While it is true that an incorrect variance-covariance could result in a negative variance component, it should be noted that, within a linear mixed model, often when this occurs, the variance-covariance structure is overly-complex given the data at hand. In fact, it is so common that some multilevel textbooks recommend that when a negative variance component is observed, simplification of the variance-covariance model be considered as the next logical step.
>  
> However, if one fits a simple random intercept model, assuming a reasonably balanced design with adequate sample sizes at both levels, then the possibility does exist that the negative variance component is not only accurate, but interpretable. Moreover, if one forces the valid negative variance component to be zero (e.g. MIXED procedure in SPSS forces a random intercept to be non-negative), simulation studies have shown that the Type I Error Rate is inflated. OTOH, if one allows the variance component to be negative, the Type I Error Rate is maintained (reference: SAS for Mixed Models, 2nd ed., by Littell et al.).
>  
> So, the question some might be pondering is...What type of naturally occurring environment would lead to a negative variance component? Littell et al. in SAS for mixed models (2nd ed.) discuss how a naturally competitive environment within a plot can result in a negative variance component.
>  
> Competitive Environment Example: Suppose there are 50 cages, and on day 1, 3 adult dogs of the same breed, health status, gender, age, and weight are placed in each cage. A single, large bowl containing the same amount of food is placed in each cage every day for 30 days. Given that some dogs tend to be dominant while others tend to be submissive, would we be surprised to see a great deal of within-cage weight variability at the end of 30 days? Probably not. On the other hand, the means across the cages would be expected to be quite similar, since the same amount of food had been placed in each cage every day. This could easily result in greater within-cage variability than between-cage variability. Under such a scenario, it would not be surprising to obtain a negative between-cage variance component and, ultimately, a negative intraclass correlation coefficient. (Note that this example was adapted from an example provided on SAS-L.)
>  
> Using the simulated example I provide BELOW, the between-cage variance component is -.310601.
>  
> Interpretation: The between-cage variance is smaller than would be expected by a value of .310601.
>  
> How do we calculate this value using data simulated from the code below? In addition to reading the output from the VARCOMP procedure or the MIXED procedure (changing the structure from CSR to CS), we could subtract the between-cage variance that would be expected (within-subject variability / number of dogs per cage = 1.266169/3 = .422056) from the observed between-cage variance (variance of cage means = .111455):
>  
> between-cage variance component = .111455 - .422056 = -.310601
>  
> Note: If we wanted to obtain the within-subject variability estimate (1.266169) without employing VARCOMP or MIXED, we could also use the MEANS procedure.
>  
> Let's go a step further now and calculate the ICC = (between-cage variance component) / (between-cage variance component + within-cage variance component):
>  
> ICC  = -.310601 / (-.310601 + 1.266169) = -.325043
>  
> Interpretation: This negative ICC indicates a within-cage correlation of -.325043. Again, under a naturally competitive environment with a mix of dominant and submissive animals, would we be surprised to observe a negative within-cage correlation with respect to dog weight? Of course not! It makes perfect sense, and the model is deriving estimates that are interpretable.
>  
> The information provided above comes directly from various sources, including messages posted on SAS-L over the years, along with a section in the book, SAS for Mixed Models, 2nd Edition, dedicated to interpreting negative variance components. I will admit that I was stuck trying to figure out the simulation code for quite a while, which delayed my posting this message. As some of you can tell, I prefer to provide simulated data to illustrate (and to some extent, validate) what I state. I was fortunate to obtain help creating the simulation code from someone off-list. I am grateful to that person. This can, at times, slow me down from responding to a message, but it's a preference of mine.
>  
> Before I provide the code, I'd like to make one final comment. I understand how a negative variance component is disconcerting, to say the least. However, when interpreted correctly, it no longer seems problematic [to me]. For those who are uncomfortable observing negative variance components, there is an alternative parameterization that directly estimates the within-subject correlation! As you will see, the MIXED code below estimates the within-subject correlation under the compound-symmetric residual correlation structure. After employing the MIXED model, I take advantage of the VARCOMP procedure to estimate the negative variance component. Note that one could also simply change the residual variance-covariance structure in the MIXED code from CSR to CS to obtain the negative variance component.
>  
> I will stop at this point and simply provide the code to estimate the model.
>  
> Hope this is of interest to others.
>  
> Ryan
> --
>  
> set seed 98734523.
>  
> new file.
>
>  inp pro.
>  compute plot=-99.
>  compute subject = -99.
>  compute x1 = -99.
>  compute x2 = -99.
>  compute x3 = -99.
>  compute e1 = -99.
>  compute e2 = -99.
>  compute e3 = -99.
>  compute sigma = 1.
>  compute rho = -0.35.
>  compute a11 = 1.
>  compute a21 = rho.
>  compute a31 = rho.
>  compute a22 = sqrt(1 - rho**2).
>  compute a32 = (rho/(1+rho))*sqrt(1 - rho**2).
>  compute a33 = sqrt(((1 - rho)*(1 + 2*rho))/(1 + rho)).
>  
>  leave plot to a33.
>  
>    loop plot= 1 to 50.
>    compute x1 = rv.normal(0,1).
>    compute x2 = rv.normal(0,1).
>    compute x3 = rv.normal(0,1).
>    compute e1 = sigma * a11*x1.
>    compute e2 = sigma * (a21*x1 + a22*x2).
>    compute e3 = sigma * (a31*x1 + a32*x2 + a33*x3).
>  
>        loop subject = 1 to 3.
>        compute y = e1*(subject=1) + e2*(subject=2) + e3*(subject=3).
>        end case.
>     end loop.
>   end loop.
>  end file.
>  end inp pro.
>  exe.
>  
> delete variables x1 x2 x3 sigma rho a11 a21 a31 a22 a32 a33 e1 e2 e3.
>  
> MIXED y
>    /FIXED= | SSTYPE(3)
>    /METHOD=REML
>    /PRINT=R SOLUTION
>    /REPEATED=subject | SUBJECT(plot) COVTYPE(CSR).
>  
> VARCOMP y BY plot
>   /RANDOM=plot
>   /METHOD=SSTYPE(3)
>   /PRINT=SS
>   /PRINT=EMS
>   /DESIGN
>   /INTERCEPT=INCLUDE.
>  
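The between-cage arithmetic in the quoted message can be checked directly from the summary figures it reports. Below is a minimal Python sketch that assumes only those reported values (1.266169, .111455, 3 dogs per cage). It does not re-run the SPSS simulation, and the variable names are illustrative. It also recovers the F(49,100) = 0.264 mentioned in the reply, because in a balanced one-way layout MS_between = (dogs per cage) x (variance of cage means).

```python
# Summary statistics reported in the quoted message (50 cages, 3 dogs per cage).
n_per_cage = 3
ms_within = 1.266169        # within-cage (residual) variance estimate
var_cage_means = 0.111455   # observed variance of the 50 cage means

# Variance of the cage means expected if the between-cage component were zero.
expected_var_means = ms_within / n_per_cage

# Method-of-moments (ANOVA) between-cage variance component: can be negative.
sigma2_between = var_cage_means - expected_var_means

# ICC = between / (between + within).
icc = sigma2_between / (sigma2_between + ms_within)

# Equivalent one-way ANOVA F ratio, since MS_between = n * var(cage means).
f_ratio = n_per_cage * var_cage_means / ms_within

print(f"expected variance of cage means: {expected_var_means:.6f}")
print(f"between-cage variance component: {sigma2_between:.6f}")
print(f"ICC: {icc:.6f}")
print(f"F ratio: {f_ratio:.3f}")
```

Because the sketch carries full precision through the division, the printed ICC may differ from the quoted -.325043 in the sixth decimal place.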
 
 

Re: Interpretation of a valid negative variance component / ICC

Mike
Hi Ryan and All,
 
You're right that my concluding paragraph summarizes my point.
I've surveyed articles that make use of negative variance estimates
and negative intraclass coefficients and find that most are in biology,
ecology, and similar areas.  In psychology, this situation rarely
comes up (at least as far as I am aware; there could be new situations
that I am not aware of).  I think that it is important to keep in mind
the nature of the actual conditions that give rise to negative
variance components and not to simply treat them as mathematical
abstractions or best solutions to some models. Do these conditions
really make sense (e.g., in married couple dyads, is it really reasonable
to assume that a wife will suppress her response to below the mean
of her group if her husband makes a response)?
 
I thank you for your offer of more readings in this area but I think
I can find them, if I am so inclined.  However, allow me to suggest
one to you that may be of interest though it may also indicate that
things are a bit more complicated when dealing with negative variance
components; see:
 
As for egalitarian dogs, all I can say is: Squirrel! ;-)
 
-Mike Palij
New York University
 
 
 

Re: Interpretation of a valid negative variance component / ICC

Ryan
Mike,

It seems to me like your reaction has been all over the place on this topic. 

First, you questioned whether a negative variance component is ever valid and by the end of your post you concluded that it probably was possible under certain circumstances. Clearly, you were grappling with this issue as anyone would (myself included). So, to be helpful, I offered a reputable textbook that provides great detail, along with an example, with an interpretation from both sides of the coin. Now you tell me you can find your own, if you are so inclined? 

The impetus for starting this thread was a comment from someone else about employing an incorrect var-cov matrix, which could result in a negative variance component. When a negative variance component occurs, my experience has been that 9 times out of 10 the model is mis-specified, and most of the time it is overly complex. This is stated in just about any multilevel textbook one can find. If it is not addressed, the estimates, standard errors, etc. may be invalid.

However, it dawned on me that it was worthwhile to take the opportunity to discuss, and provide a solution for, the case in which a negative variance component is valid. My aim was to help others avoid the trap of employing the wrong parameterization, with its very serious consequence: an inflated Type I error rate.
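This sketch shows why a negative variance component can still describe a valid model. Under the compound-symmetry (CS) parameterization, the covariance matrix for the n dogs in a cage has sigma^2 on the diagonal and rho*sigma^2 off the diagonal. That off-diagonal covariance plays the role of the between-cage variance component. The matrix's eigenvalues are sigma^2*(1 - rho), with multiplicity n - 1, and sigma^2*(1 + (n - 1)*rho). It is therefore positive definite whenever -1/(n - 1) < rho < 1. With three dogs per cage, any rho above -0.5 gives a legitimate covariance matrix, even though the implied "variance" rho*sigma^2 is negative.

Below is a minimal pure-Python check, assuming a balanced design. The helper names are hypothetical and not part of SPSS, SAS, or any library.

```python
import math

# Hypothetical helpers for illustration; not part of SPSS, SAS, or any library.


def cs_matrix(n, sigma2, rho):
    """Compound-symmetric covariance for n dogs in one cage:
    sigma2 on the diagonal, rho * sigma2 (the CS covariance) off the diagonal."""
    return [[sigma2 if i == j else rho * sigma2 for j in range(n)] for i in range(n)]


def is_positive_definite(a):
    """Attempt a Cholesky factorisation; it succeeds only for a positive definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    return False  # non-positive pivot: not positive definite
                low[i][i] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return True


def cs_lower_bound(n):
    """Smallest admissible common correlation for a CS matrix of size n: -1/(n-1)."""
    return -1.0 / (n - 1)


if __name__ == "__main__":
    n = 3  # dogs per cage
    print("lower bound on rho:", cs_lower_bound(n))
    for rho in (-0.35, -0.6):
        print(rho, is_positive_definite(cs_matrix(n, 1.0, rho)))
```

With n = 3, the value rho = -0.35 used in the simulation syntax earlier in the thread passes the check, while rho = -0.6 does not. The factorisation is written out by hand rather than calling a linear-algebra library, so the check stays self-contained.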

Now you are discussing the viability of a negative variance component with human dyads? Before that, you discussed the use of independent animal models. These issues are off-topic. Modeling data from human dyads via mixed models has received a great deal of attention because it is unique in several ways. Again, that is quite different from the example I provided.

Clearly one needs to understand their field of research to know whether a negative ICC is reasonable. I have provided a situation where it has been found to be a real possibility (competitive environments in animal modeling within plots).
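The following sketch makes the competitive-environment case concrete. It re-creates the logic of the SPSS simulation posted earlier in the thread: 50 cages, 3 dogs per cage, within-cage correlation rho = -0.35, and the same Cholesky coefficients. It then estimates the between-cage variance component and the ICC with the one-way ANOVA (method-of-moments) estimator. Like the VARCOMP estimate discussed in the thread, this estimator is not truncated at zero.

This is a minimal Python sketch under stated assumptions. Python's random number generator differs from SPSS's, so the output will not reproduce the -.310601 and -.325043 reported in the thread, only their sign and rough size. The function names and the seed are hypothetical.

```python
import math
import random


def simulate_cages(n_cages=50, rho=-0.35, sigma=1.0, seed=2013):
    """Hypothetical re-creation of the thread's SPSS simulation: three dogs per
    cage whose errors share a compound-symmetric correlation rho, generated with
    the same Cholesky coefficients (a22, a32, a33) as the SPSS syntax."""
    rng = random.Random(seed)
    a22 = math.sqrt(1 - rho ** 2)
    a32 = (rho / (1 + rho)) * math.sqrt(1 - rho ** 2)
    a33 = math.sqrt((1 - rho) * (1 + 2 * rho) / (1 + rho))  # needs rho > -0.5
    cages = []
    for _ in range(n_cages):
        x1, x2, x3 = (rng.gauss(0.0, 1.0) for _ in range(3))
        cages.append([
            sigma * x1,
            sigma * (rho * x1 + a22 * x2),
            sigma * (rho * x1 + a32 * x2 + a33 * x3),
        ])
    return cages


def anova_icc(groups):
    """One-way ANOVA (method-of-moments) estimates for a balanced design.
    The between-group component is NOT truncated at zero."""
    a, n = len(groups), len(groups[0])
    means = [sum(g) / n for g in groups]
    grand = sum(means) / a  # equals the grand mean in a balanced design
    ms_between = n * sum((m - grand) ** 2 for m in means) / (a - 1)
    ms_within = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g) / (a * (n - 1))
    sigma2_between = (ms_between - ms_within) / n
    icc = sigma2_between / (sigma2_between + ms_within)
    return sigma2_between, ms_within, icc


if __name__ == "__main__":
    sb2, sw2, icc = anova_icc(simulate_cages())
    print(f"between-cage component: {sb2:.4f}")
    print(f"within-cage component:  {sw2:.4f}")
    print(f"ICC:                    {icc:.4f}")
```

With rho = -0.35 and sigma = 1, the expected mean squares are about 1 + 2*rho = 0.30 (between cages) and 1 - rho = 1.35 (within cages). The estimated between-cage component should therefore land near (0.30 - 1.35)/3 = -0.35. The ANOVA form is used because it has a closed form. For balanced data, an unbounded REML fit (for example, MIXED with COVTYPE(CS)) should give the same estimate.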

Since I have provided a viable solution for when a valid negative variance component is present, and I have addressed the initial question posed to me, I will discontinue responding to this thread.

Others may obviously continue if they so desire.

Ryan

On Mar 22, 2013, at 10:54 PM, "Mike Palij" <[hidden email]> wrote:

Hi Ryan and All,
 
You're right that my concluding paragraph summarizes my point.
I've surveyed articles that make use of negative variance estimates
and negative intraclass coefficients and find that most are in biology,
ecology, and similar areas.  In psychology, this situation rarely
comes up (at least as far as I am aware; there could be new situations
that I am not aware of).  I think that it is important to keep in mind
the nature of the actual conditions that give rise to negative
variance components and not to simply treat them as mathematical
abstractions or best solutions to some models. Do these conditions
really make sense (e.g., in married couple dyads, is it really reasonable
to assume that a wife will suppress her response to below the mean
of her group if her husband makes a response)?
 
I thank you for your offer of more readings in this area but I think
I can find them, if I am so inclined.  However, allow me to suggest
one to you that may be of interest though it may also indicate that
things are a bit more complicated when dealing with negative variance
components; see:
 
As for egalitarian dogs, all I can say is: Squirrel! ;-)
 
-Mike Palij
New York University
 
----- Original Message -----
Sent: Friday, March 22, 2013 9:37 PM
Subject: Re: Interpretation of a valid negative variance component / ICC

Mike,
 
I see you had quite a bit to say, much of which I could spend a great deal of time responding to, but it's been a long day, and truth be told, I agree with your concluding paragraph which I presume summarizes your point.
 
SAS for Mixed Models (2nd ed.), a very highly regarded mixed models textbook, has a section dedicated to addressing this issue in great detail, indicating that when negative within-plot correlations occur, they usually arise from designs in which subjects (e.g., rats, pigs) are placed in competitive or interference environments. The illustration I provided was simply intended to resemble what I have read in textbooks, online forums, and articles that have discussed/reported negative within-plot correlations.
 
I do not have access to my personal library, but if you are interested in reading more on this topic, I know for certain that "SAS for Mixed Models" (2nd ed.) has a very detailed section on this topic, along with an example. It provides a comprehensive evaluation of negative variance components. When I get back to my office and can sift through my library, I can send you other references.
 
For the record, the only dogs I deal with are the ones in my house, both of which are free to roam as they please, including on top of my keyboard...I wonder who the alphas and betas are in my house?!
 
No "religious wars", my friend. :-)
 
Best wishes,
 
Ryan
 
On Fri, Mar 22, 2013 at 9:57 AM, Mike Palij <[hidden email]> wrote:
>
> Not to get into any religious wars about negative variance components
> and whether they are valid or not and/or whether they occur or not,
> I'd just like to make a couple of  points.
>  
> In one of the posts made a few days ago, someone brought up the
> correlated-groups (paired) t-test, which uses the standard formula:
>  
> obtained t = (M1 - M2) / sqrt(VE1 + VE2 - 2*r*SE1*SE2)
>  
> where M1, M2 are the means at time 1 and time 2 (or of samples 1 and 2),
> VE1, VE2 are the error variances of those means (VE1 = SE1^2, VE2 = SE2^2),
> SE1, SE2 are the standard errors of the means at time 1 and time 2,
> r is the correlation between values at time 1 and time 2,
> and 2 is a constant.
>  
> I have never come across a situation where r is negative, especially
> in a within-subjects design but, for the moment, assume that this is
> possible.  To illustrate what r is telling us, assume the formula
> r= sum(Z1*Z2)/(N-1)
> where Z1 and Z2 are the z-score transforms of values at time 1 and time 2,
> sum(Z1*Z2) is the sum of the products of the Z1 and Z2, and N is the
> number of pairs.
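The z-score form of r can be checked numerically. This is a small Python sketch; the language choice is ours, and the paired scores are hypothetical. It standardizes each set of scores with the sample mean and SD, averages the cross-products with N - 1 in the denominator, and counts how many pairs have z-scores of opposite sign. r comes out negative when high time-1 scores mostly pair with low time-2 scores.

```python
import math

def z_scores(values):
    """Standardize with the sample mean and sample SD (N - 1 denominator)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]

def r_from_z(x1, x2):
    """r = sum(Z1*Z2) / (N - 1), with N the number of pairs."""
    z1, z2 = z_scores(x1), z_scores(x2)
    return sum(a * b for a, b in zip(z1, z2)) / (len(x1) - 1)

# Hypothetical paired scores: high time-1 values mostly go with low time-2 values.
time1 = [12, 15, 9, 14, 10, 13]
time2 = [8, 6, 11, 5, 10, 8]

z1, z2 = z_scores(time1), z_scores(time2)
opposite_signs = sum(1 for a, b in zip(z1, z2) if a * b < 0)
r = r_from_z(time1, time2)
print(f"r = {r:.3f}; pairs with opposite-signed z-scores: {opposite_signs} of {len(time1)}")
```

A pair whose time-2 score sits exactly at its mean contributes a zero product, so it pulls r toward zero without changing its sign.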
>  
> For r to be negative, the products Z1*Z2 have to be negative on balance,
> that is, within most pairs one Z value is positive and the other negative.
> To beat a dead horse:  if Z1 is a deviation above its mean, then
> Z2 is BELOW its mean.  Similarly, if Z1 is a deviation below its
> mean then Z2 is ABOVE its mean.  The pattern of deviations is
> reversed and I cannot think of any situation in, say, human or
> example, I'd like to see it and the explanation for why it occurs.
> In human dyads, this would require an "opposites attract" assumption
> (for the appropriate submission/compensatory mechanism to work)
> and not "like gets like" (i.e., males who are a "9" or "10" get females
> who are a "9" or "10").
>  
> I believe Kenny has argued for something like this in the analysis
> of dyads, such as when one member of a couple in a relationship
> is high on some attribute or behavior, the other is low. But
> note, they cannot simply be lower, they have to be below the
> mean of their group which I find odd since, just from personal
> experience, I don't think most couples engage in such inhibitory
> and compensatory mechanisms.  In male-female couples,
> outgoing males would have to have submissive females who
> compensate for their male partners when the males inhibit their
> behaviors.
>  
> In the example provided below involving dogs, my first reaction is
> to ask whether this is based on actual research data or just a made up
> example.  With made-up data, of course, any set of constraints
> can be imagined and assumed, but the more important question is
> whether these constraints are common or represent rare instances.
>  
> For example, is it always true that if you took 3 dogs at a time, one
> would be dominant, and the others submissive (or at least one submissive)?
> What if there were 3 "alpha" dogs who fight it out?  What if there were
> 3 submissive dogs?  Does one have to make sure that one always selects
> a "dominant/alpha" dog, a "submissive" dog, and perhaps one that
> falls in between (i.e., can change roles easily)?
>  
> I've only worked with rats and pigeons who were tested alone and
> the only firm conclusion that I take away from that experience is
> that animals rarely behave the way you expect them to.  Your
> example below assumes a stable pattern of dominance relationships
> (i.e., Dog 1 dominates Dog 2, who dominates Dog 3, or Dog 1
> always dominates both Dogs 2 and 3) across cages, which would
> produce stable within-cage variability.  But if this dominance pattern
> is not constant, then this is likely to NOT be the case. So, this appears
> to be an important but unstated assumption.
>  
> Among other unstated assumptions:
>  
> (1)  There is no "treatment effect", that is, if weight after 30 days is
> the dependent variable and all dogs are given the same amount of
> food, then weight gain/loss should be the same, that is, the means
> should only differ because of random factors.
>  
> This would suggest that the obtained F should be close to 1.00,
> under the ordinary null hypothesis.  But the ANOVA from the MEANS procedure
> produces an F-value of F(49,100)=0.264, p= 1.00.  Now, I know
> the ANOVA F is a one-tailed test but I was taught that whenever
> you get an F-value close to 0.00, then there is something that is
> reducing the variability among the means relative to your estimate
> of random error.  That is, you do have a treatment effect but it is
> reducing the variability among means.  The next question is Why?
> For a made-up example, the answer is irrelevant -- it was constructed
> that way.  For real data, what factors are operating to reduce variability?
> What did you not control for?
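A short Python sketch makes the link between a small F and a negative variance component concrete. Python is our choice here; the thread's own code is SPSS syntax. In a balanced one-way random-effects ANOVA with n observations per group, the method-of-moments estimate of the between-group component is (MSB - MSW)/n = MSW*(F - 1)/n. That estimate is therefore negative exactly when F < 1. The inputs below are the summary statistics from Ryan's original post, quoted further down: variance of the 50 cage means .111455, within-cage variance 1.266169, and 3 dogs per cage. With MSB = 3 * .111455, the F ratio comes out at about 0.264, in line with the value reported above.

```python
# Summary statistics taken from Ryan's post (50 cages, 3 dogs per cage).
n_per_cage = 3
var_of_cage_means = 0.111455   # sample variance of the 50 cage means
ms_within = 1.266169           # within-cage mean square

# Balanced one-way ANOVA: MSB = n * (variance of the group means).
ms_between = n_per_cage * var_of_cage_means
f_ratio = ms_between / ms_within

def between_component(f, msw, n):
    """Method-of-moments between-group variance component written in terms of F."""
    return msw * (f - 1) / n

sigma2_between = between_component(f_ratio, ms_within, n_per_cage)
print(f"F = {f_ratio:.3f}")                              # F = 0.264
print(f"between-cage component = {sigma2_between:.6f}")  # -0.310601
```

Read this way, an F well below 1 and a negative between-cage component are the same finding expressed on two scales.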
>  
> (2)  You also are assuming that all dogs are the same breed and
> metabolize food at the same rate.  However, if you have Great Danes
> in one cage and Chihuahuas in the next cage, I suspect that the
> numbers change.  And if you mix Great Danes, Pitbulls,
> or German Shepherds with Chihuahuas, you will get
> severely different numbers depending on what breeds are in
> the cage.  I know that one can get rats with a known "breed"
> but where is one going to get the equivalent in dogs?  But if
> you can, then your conclusions apply only to that breed.  If
> you use different breeds to increase external validity, I don't
> think one gets the numbers you're using, IMO.
>  
> Okay, bottom line, it seems to me that negative intraclass coefficients
> are possible under certain conditions.  If one creates such conditions,
> then I think it is fair to use the negative intraclass coefficients as
> indicators of those conditions.  However, if one gets negative intraclass
> coefficients in naturalistic situations, one probably has to examine those
> situations closely for what factors are operating (e.g., submission/compensatory
> responding, factors that restrict variability, etc.) and determine why
> they are operating.  I could be wrong but I think that these situations
> are not that common.
>  
> -Mike Palij
> New York University
>  
>  
>  
>
> ----- Original Message -----
> From: R B
> Sent: Thursday, March 21, 2013 10:08 PM
> Subject: Interpretation of a valid negative variance component / ICC
>
> All,
>  
> A comment was made regarding the possibility of fitting an incorrect variance-covariance structure, which could result in a negative variance component. While it is true that an incorrect variance-covariance structure could result in a negative variance component, it should be noted that, within a linear mixed model, when this occurs the variance-covariance structure is often overly complex for the data at hand. In fact, this is so common that some multilevel textbooks recommend that, when a negative variance component is observed, simplifying the variance-covariance model be considered as the next logical step.
>  
> However, if one fits a simple random intercept model, assuming a reasonably balanced design with adequate sample sizes at both levels, then the possibility does exist that the negative variance component is not only accurate, but interpretable. Moreover, if one forces the valid negative variance component to be zero (e.g. MIXED procedure in SPSS forces a random intercept to be non-negative), simulation studies have shown that the Type I Error Rate is inflated. OTOH, if one allows the variance component to be negative, the Type I Error Rate is maintained (reference: SAS for Mixed Models, 2nd ed., by Littell et al.).
>  
> So, the question some might be pondering is... What type of naturally occurring environment would lead to a negative variance component? Littell et al. in SAS for Mixed Models (2nd ed.) discuss how a naturally competitive environment within a plot can result in a negative variance component.
>  
> Competitive Environment Example: Suppose there are 50 cages, and on day 1, 3 adult dogs of the same breed, health status, gender, age, and weight are placed in each cage. A single large bowl containing the same amount of food is placed in each cage every day for 30 days. Given that some dogs tend to be dominant while others tend to be submissive, would we be surprised to see a great deal of within-cage weight variability at the end of 30 days? Probably not. On the other hand, the cage means would be expected to be quite similar, since the same amount of food had been placed in each cage every day. This could easily result in greater within-cage variability than between-cage variability. Under such a scenario, it would not be surprising to obtain a negative between-cage variance component, and ultimately a negative intraclass correlation coefficient. (Note that this example was adapted from an example provided on SAS-L.)
>  
> Using the simulated example I provide BELOW, the between-cage variance component is -.310601.
>  
> Interpretation: The observed between-cage variance is .310601 smaller than would be expected from within-cage variability alone.
>  
> How do we calculate this value from the data simulated by the code below? In addition to reading the output from the VARCOMP procedure or the MIXED procedure (after changing the structure from CSR to CS), we can subtract the between-cage variance that would be expected (within-subject variance / number of dogs per cage = 1.266169/3 = .422056) from the observed between-cage variance (variance of the cage means = .111455):
>  
> between-cage variance component = .111455 - .422056 = -.310601
>  
> Note: If we wanted to obtain the within-subject variability estimate (1.266169) without employing VARCOMP or MIXED, we could also use the MEANS procedure.
>  
> Let's go a step further and calculate the ICC = (between-cage variance component) / (between-cage variance component + within-cage variance component):
>  
> ICC  = -.310601 / (-.310601 + 1.266169) = -.325043
>  
> Interpretation: This negative ICC is the estimated within-cage correlation, -.325043. Again, under a naturally competitive environment with a mix of dominant and submissive animals, would we be surprised to observe a negative within-cage correlation in dog weight? Of course not! It makes perfect sense, and the model is producing estimates that are interpretable.
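The subtraction and the ICC above can be reproduced in a few lines of Python. Python is our choice; the post itself uses SPSS. This sketch only redoes the arithmetic from the reported summary numbers and does not rerun the simulation. Carrying full precision, instead of the rounded -.310601, changes the ICC only in the sixth decimal.

```python
# Figures from the post: 3 dogs per cage, the observed variance of the cage
# means, and the within-cage (within-subject) variance estimate.
n_per_cage = 3
var_of_cage_means = 0.111455
sigma2_within = 1.266169

# Variance of the cage means expected from within-cage variability alone.
expected_var_of_means = sigma2_within / n_per_cage
# Between-cage component: observed minus expected (negative here).
sigma2_between = var_of_cage_means - expected_var_of_means
# Intraclass correlation = between / (between + within).
icc = sigma2_between / (sigma2_between + sigma2_within)

print(f"expected = {expected_var_of_means:.6f}")   # 0.422056
print(f"between  = {sigma2_between:.6f}")          # -0.310601
print(f"ICC      = {icc:.5f}")                     # -0.32504
```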
>  
> The information provided above comes directly from various sources, including messages posted on SAS-L over the years, along with a section in the book, SAS for Mixed Models, 2nd Edition, dedicated to interpreting negative variance components. I will admit that I was stuck trying to figure out the simulation code for quite a while, which delayed my posting this message. As some of you can tell, I prefer to provide simulated data to illustrate (and to some extent, validate) what I state. I was fortunate to obtain help creating the simulation code from someone off-list. I am grateful to that person. This can, at times, slow me down from responding to a message, but it's a preference of mine.
>  
> Before I provide the code, I'd like to make one final comment. I understand how a negative variance component is disconcerting, to say the least. However, when interpreted correctly, it no longer seems problematic [to me]. For those who are uncomfortable observing negative variance components, there is an alternative parameterization that directly estimates the within-subject correlation! As you will see, the MIXED code below estimates the within-subject correlation under the compound-symmetric residual correlation structure. After fitting the model with MIXED, I use the VARCOMP procedure to estimate the negative variance component. Note that one could also simply change the residual variance-covariance structure in the MIXED code from CSR to CS to obtain the negative variance component.
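Why can a "variance" component be negative while the model stays valid? Under compound symmetry, the 3 x 3 covariance matrix of the dogs in one cage has between + within on the diagonal and between off the diagonal. Its eigenvalues are the within component (twice) and within + 3 * between. The matrix is therefore a legitimate, positive-definite covariance matrix as long as the within-cage correlation exceeds -1/(3 - 1) = -0.5.

The Python sketch below plugs in the estimates from the post and also computes the equivalent CSR-style parameters (common variance and correlation). The helper names are hypothetical. The positive-definiteness check is a plain Cholesky attempt written for illustration, not anything SPSS exposes.

```python
import math

def cs_covariance(s2_between, s2_within, k):
    """k x k compound-symmetry covariance: between + within on the diagonal, between elsewhere."""
    return [[s2_between + (s2_within if i == j else 0.0) for j in range(k)]
            for i in range(k)]

def is_positive_definite(a):
    """Attempt a Cholesky factorization; it succeeds only for positive-definite matrices."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][m] * low[j][m] for m in range(j))
            if i == j:
                if s <= 0.0:
                    return False
                low[i][i] = math.sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return True

k = 3
s2_between, s2_within = -0.310601, 1.266169   # CS estimates from the post
cov = cs_covariance(s2_between, s2_within, k)

# CSR-style parameterization of the same matrix.
common_variance = s2_between + s2_within          # 0.955568
within_cage_corr = s2_between / common_variance   # about -0.325

# Closed-form eigenvalues of a compound-symmetry matrix.
eigenvalues = [s2_within] * (k - 1) + [s2_within + k * s2_between]

print(is_positive_definite(cov), round(within_cage_corr, 6), [round(e, 6) for e in eigenvalues])
```

The smallest eigenvalue, within + 3 * between, is still positive here. With a between component of -0.5, for example, it would go negative and the matrix would no longer be a valid covariance matrix.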
>  
> I will stop at this point and simply provide the code to estimate the model.
>  
> Hope this is of interest to others.
>  
> Ryan
> --
>  
> * Simulate 50 plots (cages) with 3 subjects (dogs) each and a within-plot correlation of -0.35.
> set seed 98734523.
>  
> new file.
>
>  inp pro.
>  compute plot=-99.
>  compute subject = -99.
>  compute x1 = -99.
>  compute x2 = -99.
>  compute x3 = -99.
>  compute e1 = -99.
>  compute e2 = -99.
>  compute e3 = -99.
>  compute sigma = 1.
>  compute rho = -0.35.
>  * Lower-triangular Cholesky factor of the 3 x 3 compound-symmetry correlation matrix.
>  compute a11 = 1.
>  compute a21 = rho.
>  compute a31 = rho.
>  compute a22 = sqrt(1 - rho**2).
>  compute a32 = (rho/(1+rho))*sqrt(1 - rho**2).
>  compute a33 = sqrt(((1 - rho)*(1 + 2*rho))/(1 + rho)).
>  
>  leave plot to a33.
>  
>    loop plot= 1 to 50.
>    compute x1 = rv.normal(0,1).
>    compute x2 = rv.normal(0,1).
>    compute x3 = rv.normal(0,1).
>    * Correlated errors for the 3 subjects in this plot (e = sigma * A * x).
>    compute e1 = sigma * a11*x1.
>    compute e2 = sigma * (a21*x1 + a22*x2).
>    compute e3 = sigma * (a31*x1 + a32*x2 + a33*x3).
>  
>        loop subject = 1 to 3.
>        compute y = e1*(subject=1) + e2*(subject=2) + e3*(subject=3).
>        end case.
>     end loop.
>   end loop.
>  end file.
>  end inp pro.
>  exe.
>  
> delete variables x1 x2 x3 sigma rho a11 a21 a31 a22 a32 a33 e1 e2 e3.
>  
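> * Intercept-only model; CSR estimates the within-plot correlation directly (CS gives the variance component).
> MIXED y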
>    /FIXED= | SSTYPE(3)
>    /METHOD=REML
>    /PRINT=R SOLUTION
>    /REPEATED=subject | SUBJECT(plot) COVTYPE(CSR).
>  
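> * ANOVA-method (Type III SS) variance components; the plot component is reported even when negative.
> VARCOMP y BY plot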
>   /RANDOM=plot
>   /METHOD=SSTYPE(3)
>   /PRINT=SS
>   /PRINT=EMS
>   /DESIGN
>   /INTERCEPT=INCLUDE.
>  
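For readers without SPSS, here is a rough Python analogue of the simulation and of ANOVA-type variance-component estimates. It is a sketch under stated assumptions:

- It uses Python's random module, so even with the same seed value the draws will not match SPSS's, and the estimates will differ from those in the post.
- It reuses the closed-form Cholesky constants a11 to a33 from the syntax above.
- It computes the balanced one-way ANOVA (method-of-moments) estimates that VARCOMP's SSTYPE(3) method should give for this balanced design, without truncating the between-plot component at zero.

The function names are hypothetical.

```python
import math
import random

def cs_cholesky_3(rho):
    """Lower-triangular factor of the 3 x 3 correlation matrix with common
    off-diagonal correlation rho (same constants as a11..a33 in the SPSS syntax)."""
    a22 = math.sqrt(1 - rho ** 2)
    a32 = (rho / (1 + rho)) * math.sqrt(1 - rho ** 2)
    a33 = math.sqrt((1 - rho) * (1 + 2 * rho) / (1 + rho))
    return [[1.0, 0.0, 0.0], [rho, a22, 0.0], [rho, a32, a33]]

def simulate(n_plots=50, rho=-0.35, sigma=1.0, seed=98734523):
    """Return one list of 3 correlated N(0, sigma^2) scores per plot (cage)."""
    rng = random.Random(seed)   # Python's generator: draws will not match SPSS's
    a = cs_cholesky_3(rho)
    data = []
    for _ in range(n_plots):
        x = [rng.gauss(0.0, 1.0) for _ in range(3)]
        data.append([sigma * sum(a[i][j] * x[j] for j in range(3)) for i in range(3)])
    return data

def anova_components(groups):
    """Balanced one-way random-effects ANOVA estimates, with no truncation at zero."""
    a, k = len(groups), len(groups[0])
    means = [sum(g) / k for g in groups]
    grand = sum(means) / a
    ms_between = k * sum((m - grand) ** 2 for m in means) / (a - 1)
    ms_within = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g) / (a * (k - 1))
    s2_between = (ms_between - ms_within) / k
    icc = s2_between / (s2_between + ms_within)
    return s2_between, ms_within, icc

data = simulate()
s2_b, s2_w, icc = anova_components(data)
print(f"between-plot = {s2_b:.4f}, within-plot = {s2_w:.4f}, ICC = {icc:.4f}")
```

With rho = -0.35, the ICC estimate should usually land in the neighborhood of -0.35. With only 50 plots the sampling error is substantial, and this estimator can never fall below -1/(3 - 1) = -0.5.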
 
 

Automatic reply: Interpretation of a valid negative variance component / ICC

dtaylor_elca



I'm out of the office until April 2nd.  Thank you.

 


Re: Interpretation of a valid negative variance component / ICC

Rich Ulrich
In reply to this post by Mike
(I'm back.)

Mike, I brought up the paired t-test example.  Contrary to your
expectation, the example that originally drew my attention was
from psychology ... though not your branch of it. 

The t-test case, which someone brought to the sci.stat.* groups a dozen
or more years ago, was an animal conditioning experiment.  These
goldfish (or planarians, maybe?) faced a forced choice in the simplest
maze: a T junction.  The critter was recorded as going left or going right,
which clearly are negatively correlated.  Occasionally they just stopped
for too long, so that move was "neither" and totals did not add up perfectly
to the number of trials.  If you record Left and Right as Yes/No, you can
analyze this nicely as a t-test of paired responses.  As a one-sample test
(which, for a long time, was the only way that SAS could handle a paired t-test),
you would test the difference scores (L minus R), each -1, 0, or 1, against
zero.  For human psychology, I'm pretty sure that some experiments do entail
competitions that are close to (but not exactly) zero-sum, so that winners
tend to be paired with losers.
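Rich's point can be checked against the correlated-groups formula quoted earlier in the thread. The trial data here are hypothetical, made up for illustration and not taken from the original experiment. The Python sketch does three things:

1. It codes each trial as Left = 1/0 and Right = 1/0.
2. It finds the negative correlation between the two indicators.
3. It shows that t = (M1 - M2) / sqrt(SE1^2 + SE2^2 - 2*r*SE1*SE2) gives the same t as a one-sample t-test of the L minus R scores against zero.

The helper functions are our own and use only the standard library.

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def sd(xs):
    """Sample standard deviation (N - 1 denominator)."""
    m = mean(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))

def pearson_r(xs, ys):
    mx, my = mean(xs), mean(ys)
    cross = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cross / ((len(xs) - 1) * sd(xs) * sd(ys))

# Hypothetical T-maze trials: "L" = went left, "R" = went right, "-" = stopped (neither).
trials = "LLRLL-LRLLRL-LLRLLLR"
left = [1 if t == "L" else 0 for t in trials]
right = [1 if t == "R" else 0 for t in trials]
n = len(trials)

# (1) Correlated-groups formula, using the (negative) correlation between L and R.
r = pearson_r(left, right)
se1, se2 = sd(left) / math.sqrt(n), sd(right) / math.sqrt(n)
t_formula = (mean(left) - mean(right)) / math.sqrt(se1 ** 2 + se2 ** 2 - 2 * r * se1 * se2)

# (2) One-sample t-test of the L minus R scores (-1, 0 or 1) against zero.
diff = [a - b for a, b in zip(left, right)]
t_one_sample = mean(diff) / (sd(diff) / math.sqrt(n))

print(f"r = {r:.3f}, t (formula) = {t_formula:.4f}, t (one-sample) = {t_one_sample:.4f}")
```

The two t values agree because var(L - R) = var(L) + var(R) - 2*r*sd(L)*sd(R). A negative r simply enlarges the standard error of the difference.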

The negative group intra-correlation case was also from animal studies
in psychology.  Certain inbred rats produce litters of very consistent size
and total weight.  The 7 pups (or whatever the number was) do vary from
"big" to "runt", in every litter, producing a good, negative intra-correlation...
though I don't recall what analyses that was leading to.
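A fixed litter total is enough to force a negative intraclass correlation. If every group of k has the same total, the group means are identical and the between-group mean square is zero. The ANOVA estimate of the ICC, (MSB - MSW) / (MSB + (k - 1) MSW), then hits its floor of -1/(k - 1). The Python sketch below uses made-up litters of 7 pups whose weights always sum to 42 g. The weights are hypothetical, not data from the studies mentioned above.

```python
def anova_icc(groups):
    """ANOVA (method-of-moments) ICC for equal-sized groups:
    (MSB - MSW) / (MSB + (k - 1) * MSW)."""
    a, k = len(groups), len(groups[0])
    means = [sum(g) / k for g in groups]
    grand = sum(means) / a
    msb = k * sum((m - grand) ** 2 for m in means) / (a - 1)
    msw = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g) / (a * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)

# Hypothetical litters of 7 pups (weights in grams); every litter totals 42 g,
# but within each litter the pups range from "big" to "runt".
litters = [
    [9.0, 7.5, 6.5, 6.0, 5.0, 4.5, 3.5],
    [8.0, 8.0, 6.0, 6.0, 5.5, 5.0, 3.5],
    [10.0, 7.0, 6.0, 5.5, 5.0, 5.0, 3.5],
    [8.5, 7.0, 7.0, 6.0, 5.0, 4.5, 4.0],
]
k = len(litters[0])
print(anova_icc(litters), -1 / (k - 1))   # the two values agree (about -0.1667)
```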

There was a further example that I saw presented, ignorantly, by one of the
nominal experts in hierarchical analysis.  That one featured a negative correlation
based (IIRC) on analyzing "compositional data" from an
educational experiment, measuring students in classrooms.  (You can get that,
also, any time you rank-score the individuals in classrooms, instead of using
raw scores.)
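The rank-scoring remark can be demonstrated the same way. Ranking students within each classroom gives every classroom the same set of scores 1..k. The classroom means are then identical, and the ANOVA estimate of the ICC is -1/(k - 1), however different the classrooms were on the raw scores. The scores in this Python sketch are hypothetical, and ties are assumed away.

```python
def anova_icc(groups):
    """ANOVA (method-of-moments) ICC for equal-sized groups."""
    a, k = len(groups), len(groups[0])
    means = [sum(g) / k for g in groups]
    grand = sum(means) / a
    msb = k * sum((m - grand) ** 2 for m in means) / (a - 1)
    msw = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g) / (a * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)

def ranks_within(scores):
    """Rank 1..k within one classroom (lowest score gets rank 1; assumes no ties)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0] * len(scores)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return ranks

# Hypothetical raw test scores for three classrooms of four students.
raw = [
    [62, 55, 70, 60],
    [80, 85, 72, 78],
    [93, 88, 97, 90],
]
ranked = [ranks_within(c) for c in raw]

print(round(anova_icc(raw), 3), anova_icc(ranked))   # raw ICC positive; ranked ICC about -1/3
```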

--
Rich Ulrich


Date: Fri, 22 Mar 2013 09:57:58 -0400
From: [hidden email]
Subject: Re: Interpretation of a valid negative variance component / ICC
To: [hidden email]

[... snip, previous post]