Given the paucity of information online, I was wondering if anyone knows of a procedural approach to evaluating sphericity when Mauchly's test is undefined, which is the case when the number of repeated levels is larger than the number of subjects (insufficient df). I am not sure whether sphericity can still be assumed based on reported epsilon values larger than 0.75, whether Greenhouse-Geisser or Huynh-Feldt. In one particular dataset, epsilon is less than 0.1; presumably it can be assumed that sphericity is violated when epsilon is that low.
I am aware of using mixed models to overcome the sphericity assumption. My concern is with GLM in this case. Citations would be welcome.
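For reference, the Greenhouse-Geisser epsilon can still be estimated from the sample covariance matrix even when Mauchly's test itself is undefined. A minimal Python sketch, assuming the data sit in an n-subjects by k-levels array (function names are mine; the Huynh-Feldt line uses the one-group formula, and with fewer subjects than levels the covariance matrix is rank-deficient, so treat the estimates as rough):

import numpy as np

def gg_epsilon(X):
    """Greenhouse-Geisser epsilon-hat from an (n subjects) x (k levels) array."""
    n, k = X.shape
    S = np.cov(X, rowvar=False)           # k x k sample covariance matrix
    M = np.eye(k) - np.ones((k, k)) / k   # projector onto the contrast space
    lam = np.linalg.eigvalsh(M @ S @ M)   # k-1 nonzero eigenvalues (one is ~0)
    return lam.sum() ** 2 / ((k - 1) * (lam ** 2).sum())

def hf_epsilon(X):
    """Huynh-Feldt epsilon (one-group form), capped at 1."""
    n, k = X.shape
    e = gg_epsilon(X)
    return min(1.0, (n * (k - 1) * e - 2) / ((k - 1) * (n - 1 - (k - 1) * e)))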
I don't know of any specific procedure(s) for testing sphericity when there are more variables than subjects, but I would suggest using the RELIABILITY procedure to get descriptive statistics on the two components of sphericity:

(1) What is the ratio of the largest variance to the smallest variance? If this number is large, it provides evidence that sphericity may not be present (i.e., heterogeneity of variance). I know that compound symmetry requires all variances to be the same but sphericity does not (the correlation and variance/SD are involved).

(2) What is the ratio of the largest correlation to the smallest correlation? Again, if the number is large, or there are negative correlations, this would be evidence for lack of sphericity.
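Outside of SPSS, the same two descriptives can be pulled from the raw data in a few lines. A Python sketch, assuming an n-subjects by k-levels array (the function name is mine):

import numpy as np

def sphericity_descriptives(X):
    """Max/min variance ratio and range of off-diagonal correlations."""
    var = X.var(axis=0, ddof=1)             # variance of each level
    R = np.corrcoef(X, rowvar=False)        # k x k correlation matrix
    off = R[~np.eye(len(R), dtype=bool)]    # off-diagonal correlations
    return {"variance_ratio": var.max() / var.min(),
            "largest_r": off.max(),
            "smallest_r": off.min(),
            "any_negative_r": bool((off < 0).any())}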
One could do significance testing between the largest and smallest variances and/or the correlated correlations to determine whether they are "significantly" different, but that will probably depend upon the number of subjects/cases you have.
If you can get a sorted covariance matrix graphic, it could also help in seeing whether there are patterns in the covariances (e.g., banding), but SPSS does not provide this, though one could probably write a macro to do it.
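Outside SPSS the graphic takes only a few lines. A Python/matplotlib sketch (sorting the variables by their variance is one convenient ordering; the function name is mine):

import numpy as np
import matplotlib.pyplot as plt

def sorted_covariance_plot(X):
    """Heatmap of the covariance matrix of an (n subjects) x (k levels)
    array, with variables ordered by variance so that banding or other
    structure is easier to spot."""
    S = np.cov(X, rowvar=False)
    order = np.argsort(np.diag(S))[::-1]      # largest variance first
    plt.imshow(S[np.ix_(order, order)], cmap="coolwarm")
    plt.colorbar(label="covariance")
    plt.title("Covariance matrix, sorted by variance")
    plt.show()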
I would think that the presence of any negative covariance would imply the absence of sphericity.
If others know of more appropriate tests or procedures, I too would like to know. There may be better general alternatives, but the appropriateness for any actual dataset will depend upon the characteristics of that dataset.
-Mike Palij
New York University
In reply to this post by Rudobeck, Emil (LLU)
For small and moderate samples, a non-significant Mauchly's test does not mean much at all. That is why many people will recommend, wisely, that follow-up tests be performed as paired t-tests instead of using some pooled variance term.
What are you measuring? Is it a good measure, with good scaling expected and no outliers observed? I don't like analyses where those corrections are made unless I have a decent understanding of why they are required, such as the presence of excess zeroes.
Would some transformation be worth considering? Analyzing with unnecessarily unequal variances is a way to get into unneeded trouble. If the "levels" represent time, it might be appropriate and proper to test a much more powerful hypothesis that makes use of contrasts (linear for growth, etc.) in order to overcome the inevitable decline in correlations across time.
You say: more levels than subjects -- Is this because you have very small N or because you have moderate N but also have too many levels to test a sensible hypothesis across them all?
State your hypotheses. What tests them? A single-d.f. test is what gives best power, whenever one of those can be used. I favor constructing contrasts -- sometimes in the form of separate variables -- over tests that include multiple d.f. and multiple hypotheses, all at once. And I would rather remove the causes of heterogeneity (variances or correlations) beforehand than have to hope that I have suitably corrected for it.

-- Rich Ulrich
In reply to this post by Mike
There is still a question as to what would be considered a large value for these ratios. I ran RELIABILITY for one of the datasets and the max/min variance ratio is 5.0. The ratio for correlations is -6.5, but it seems you're saying that the presence of any negative correlation or covariance is already evidence of a sphericity violation.

I don't know if this is a widely used approach to sphericity, but at least it's a starting point to get a better idea about the structure of the data. Hopefully others will chime in if there are more rigorous or established methods available.
In reply to this post by Rich Ulrich
When it comes to reporting the findings, despite the shortcomings, Mauchly's test is widely used and understood. I haven't come across running t-tests on variances. Where can I read more about that approach?

The measurement is time - brain responses are sampled from each subject for a period of time (e.g., 72 samples during an hour) after a "learning" stimulus is applied, so change over time is expected biologically. This is essentially a nonlinear growth curve, and I know that there are more advanced approaches (LMM, SEM, etc.), which I use as well, but my concern here is with sphericity in GLM. Transformation is not going to address the issue, nor will a larger N be feasible. It might be possible to average adjacent time points, but this would introduce its own problems. This is a mixed design, since subjects are grouped into different treatments, and it is the differences between treatments that are important.

Your question is more about design than stats. Certainly, if you have any suggestions, I would be interested. The current method is well established and has been used for decades. Whether the individual repeated measures are different or not does not matter too much in this case; it's more important whether the curves themselves differ between treatment groups (the between factor). Using repeated measures overcomes the issue of correlations, since there is no other way around it.
In reply to this post by Rudobeck, Emil (LLU)
Thom Baguley has a nice note on sphericity, which you can view here:
http://homepages.gold.ac.uk/aphome/spheric.html

He defines sphericity as homogeneity of variance for all possible pair-wise differences. So rather than--or at least, in addition to--looking at variances of the original variables, I think you ought to compute all pair-wise differences and determine how homogeneous (or not) they are. I don't have time to attempt any code right now, but a nested loop ought to do the trick. I think you want the outer loop going from i = 1 to k-1 and the inner loop from j = i+1 to k (where k = the number of repeated measures), with a difference score being computed on each pass. The naming of the variables might be a bit tricky in ordinary syntax, but it would likely work in a macro. (Or Python if you're so inclined. Or possibly MATRIX.) HTH.

p.s. - Note that Thom B. is not very keen on Mauchly's test! Scroll down to "A warning about Mauchly's sphericity test".
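Since Bruce mentions Python as an option, here is roughly what that nested loop might look like (a sketch; the function name is mine):

import numpy as np
from itertools import combinations

def pairwise_difference_variances(X):
    """Variance of every pairwise difference among the k repeated measures
    in an (n subjects) x (k levels) array. Under sphericity the k*(k-1)/2
    population variances are all equal, so a wide spread here is
    descriptive evidence against it."""
    n, k = X.shape
    return {(i + 1, j + 1): (X[:, i] - X[:, j]).var(ddof=1)
            for i, j in combinations(range(k), 2)}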
--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/
"When all else fails, RTFM."
In reply to this post by Rudobeck, Emil (LLU)
On Friday, October 07, 2016 1:13 PM, Emil Rudobeck wrote:
>There is still a question as to what would be considered a large
>value for these ratios.

There are statistical tests that you can do, depending upon how small your sample size is. But before I discuss these, let me point out that one way of thinking about sphericity is that it is a combination of the following assumptions: (a) homogeneity of variances, and (b) homogeneity of correlations. This is known as compound symmetry and is a special case of sphericity. On the IBM SPSS website there is a list of variance-covariance matrices and the different patterns that they can take; see:
https://www.ibm.com/support/knowledgecenter/SSLVMB_21.0.0/com.ibm.spss.statistics.help/covariance_structures.htm

The 4th matrix down is the compound-symmetric variance-covariance matrix. Beneath it is the standardized variance-covariance matrix or, because all of the variances and standard deviations are now equal to one, the correlation matrix. Note that all of the population correlations are equal. Beneath that is a variance-covariance matrix with heterogeneous variances but constant correlations. Not shown are other combinations of variance-covariance matrices, such as homogeneous variances but heterogeneous correlations (e.g., the correlations may decrease systematically; some of this is shown in the matrices before the one for compound symmetry).

The Mauchly test for sphericity refers to a more general condition on variance-covariance matrices. Quoting from Cramer, D., & Howitt, D. L. (2004). The Sage dictionary of statistics: a practical resource for students in the social sciences. Sage:

|Mauchly's test of sphericity: like many tests, the analysis of
|variance makes certain assumptions about the data used.
|Violations of these assumptions tend to affect the value of the
|test adversely. One assumption is that the variances of each
|of the cells should be more or less equal (exactly equal is a
|practical impossibility). In repeated-measures designs,
|it is also necessary that the covariances of the differences
|between each condition are equal. That is, subtract condition A
|from condition B, condition A from condition C, etc., and calculate
|the covariance of these difference scores until all possibilities
|are exhausted. The covariances of all of the differences
|between conditions should be equal.

If you want a mathematical presentation of this point, locate a copy of:

Rogan, J. C., Keselman, H. J., & Mendoza, J. L. (1979). Analysis of repeated measurements. British Journal of Mathematical and Statistical Psychology, 32(2), 269-286.

One strategy is to determine whether compound symmetry holds, that is, whether you have homogeneity of variance and homogeneity of correlation. If the difference between the maximum and minimum value for either statistic is large, this implies that compound symmetry is violated. You may still have sphericity, but because you have fewer subjects/cases than variables, you can't use Mauchly's test. You may be able to use other tests of sphericity (not available in SPSS, but they could probably be programmed in syntax), and one source for these other tests is the following reference:

Cornell, J. E., Young, D. M., Seaman, S. L., & Kirk, R. E. (1992). Power comparisons of eight tests for sphericity in repeated measures designs. Journal of Educational and Behavioral Statistics, 17(3), 233-249.

Being able to use the Matrix procedure would be quite helpful. It is unclear, however, how many of these tests would also fail because of the N-to-variables problem.
>I ran reliability for one of the datasets and the max/min variance
>ratio yields 5.0.

You can do a t-test for related variances; see page 170 in Guilford, J. P., & Fruchter, B. (1973). Fundamental statistics in psychology and education. New York: McGraw-Hill. Or page 190 in Walker, H. M., & Lev, J. (1953). Statistical Inference. New York: Holt. The t-test has the following in the numerator: (Var1 - Var2)*sqrt(N-2), and the denominator is 2*SD1*SD2*sqrt(1 - r^2), where * is multiplication, ^ means raised to a power, and r is the correlation between the two samples that the variances/SDs come from. This test has df = N - 2. If the t-test is significant, I believe that one can reject the assumption of sphericity and use the Box epsilon value to get corrected F values.

>The ratio for correlations is -6.5,

Under compound symmetry (CS), the ratio should be equal to 1.00, or close to it, because of the assumption of homogeneity of correlation. Given that the largest correlation is six times the smallest, which is also negative, this raises some doubt about compound symmetry or sphericity being met. However, in general CS will imply that the Pearson r should be positive, but altering the assumptions underlying the estimate of the correlation allows negative correlations (i.e., compound symmetry can be interpreted as containing a between-subjects variance component that can be represented by the variance-covariance matrix G, plus a variance-covariance matrix representing the within-subject covariance structure, which can be represented by the matrix R; if one sets the G matrix = 0 and R as compound symmetric, then negative correlations are possible). For more on this point, see page 1800 in:

Littell, R. C., Pendergast, J., & Natarajan, R. (2000). Tutorial in biostatistics: modelling covariance structure in the analysis of repeated measures data. Statistics in Medicine, 19, 1793-1819.

NOTE: Littell et al. show how the two analyses can be done in SAS PROC MIXED.

>but it seems you're saying that the presence of any negative
>correlation or covariance is already evidence of sphericity violation.

Because the variance-covariance matrix can be specified in two different ways, one that allows only positive correlations and the other that also allows negative correlations, the question is how the statistical software does the calculation; more specifically, what does GLM do? Given Littell et al.'s presentation, I tend to doubt that SPSS GLM was programmed to have the G matrix equal to zero, but maybe someone from SPSS can make the record clear? It may be possible to translate the SAS analysis (specified in their example (10)) into SPSS Mixed with the G matrix set to zero. But as you said originally, you aren't interested in using mixed model analysis. This would suggest that you might just go with the multivariate analysis. One reference to look at is:

O'Brien, R. G., & Kaiser, M. K. (1985). MANOVA method for analyzing repeated measures designs: an extensive primer. Psychological Bulletin, 97(2), 316-333.

Whether the SPSS MANOVA procedure can handle your design and implement what O'Brien & Kaiser suggest will be up to you to determine.

>I don't know if this is a widely used approach to sphericity,

I don't think it is commonly used, but if one gets this information, it might lead one to do a mixed model analysis or structural equation modeling. And, as mentioned above, you can tell sphericity to go screw itself and go with the multivariate results. ;-)

>but at least it's a starting point to get a better idea about the
>structure of the data.
>Hopefully others will chime in if there are more rigorous or
>established methods available.

I'd be interested in finding out as well. The only method that I have not mentioned is a nonparametric analysis, but I'm not sure how that would be done.

-Mike Palij
New York University
[hidden email]
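As a footnote to the Guilford & Fruchter test described above, a Python sketch of that related-variances t-test, with x1 and x2 holding the same subjects' scores at two time points (the function name is mine):

import numpy as np
from scipy import stats

def correlated_variances_t(x1, x2):
    """t = (Var1 - Var2) * sqrt(N - 2) / (2 * SD1 * SD2 * sqrt(1 - r^2)),
    with df = N - 2, for two correlated variances."""
    x1, x2 = np.asarray(x1, float), np.asarray(x2, float)
    n = len(x1)
    v1, v2 = x1.var(ddof=1), x2.var(ddof=1)
    r = np.corrcoef(x1, x2)[0, 1]
    t = (v1 - v2) * np.sqrt(n - 2) / (2 * np.sqrt(v1 * v2) * np.sqrt(1 - r ** 2))
    p = 2 * stats.t.sf(abs(t), df=n - 2)   # two-sided p value
    return t, p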
In reply to this post by Rudobeck, Emil (LLU)
In many instances there are many more repeats than there are cases.
Much depends on what the data are gathered to model. Are there crossed repeat factors? Are the repeats indexed by time? Are the repeats repeated measures on a construct, e.g., spelling items, attitude items? If you were to describe the substantive nature of your questions and what your data look like, it is possible that list members could make useful suggestions.
Art Kendall
Social Research Consultants
In reply to this post by Rudobeck, Emil (LLU)
Yes, my question was about design. If you don't have a design, you don't have anything to talk about. If your method is "well established" and in use for decades, you should hardly have any question for us. If you can't follow their method, you must be doing something different. They never reported Mauchly's test in a situation where it is impossible to compute (I hope).
Mauchly's is a warning that you should be careful about the tests. Well, with 72 periods of measure in an hour, you /should/ be careful about the tests, period.
But the tests that you should be concerned about are tests of hypothesis. I preferred using the old BMDP program (2V? 4V?) because it gave linear trends (say) with the test performed against the actual variation measured for the trend. (I don't know if SPSS can do that testing, but it is not automatic.) Anyway, if you can't state the hypothesis, you can't get started.
If you expect an early response followed by a later decline (which could be reasonable for a brain response to stimulus), the simplest execution might be to break the 72 periods into two or more sets: early response, middle, later. That is especially true if the early response is very strong: look for the early linear trend, then see if it continues or if the means regress back to the start.
-- Rich Ulrich
On Saturday, October 08, 2016 12:53 PM, Rich Ulrich writes:
> Yes, my question was about design. If you don't have a design,
>you don't have anything to talk about.

If I can re-state what Rich is saying: "Design drives Analysis". Designs are set up so that certain variables/factors are allowed to express an effect on an outcome/dependent variable, both alone and in combination with other factors.

>If your method is "well established" and in use for decades,
>you should hardly have any question for us.

It might clarify things if the OP provided a reference to a published article(s) that shows the analysis/analyses he is trying to duplicate.

>If you can't follow their method, you must be doing something
>different. They never reported Mauchly's test in a situation where
>it is impossible to compute (I hope).

After doing a search of the literature, let me try to restate the OP's original question/situation. Let N equal the sample size and P equal the number of repeated measures. Mauchly's test and other likelihood tests are undefined when N < P. Are there tests for sphericity when N < P? Muni S. Srivastava has done most of the work in this area in the past few decades, and one relevant source for the OP is:

Srivastava, M. S. (2006). Some tests criteria for the covariance matrix with fewer observations than the dimension. Acta Comment. Univ. Tartu. Math, 10, 77-93.
A copy can be obtained at: http://www.utstat.utoronto.ca/~srivasta/covariance1.pdf

A scholar.google.com search on Srivastava and sphericity tests will provide a shipload of references by and on Srivastava's work in this area. The next question is whether Srivastava's test is implemented in any of the standard statistical packages or whether one has to roll one's own version. I found one paper dealing with this situation, but it uses SAS IML for a macro called LINMOD to conduct the testing. It is:

Chi, Y.-Y., Gribbin, M., Lamers, Y., Gregory, J. F., & Muller, K. E. (2012). Global hypothesis testing for high-dimensional repeated measures outcomes. Statistics in Medicine, 31(8), 724-742. http://doi.org/10.1002/sim.4435
Available at PubMed: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3396026/

LINMOD and other software goodies are available at: http://samplesizeshop.org/software-downloads/other/

>Mauchly's is a warning that you should be careful about the tests.
>Well, with 72 periods of measure in an hour, you /should/ be careful
>about the tests, period.

The references above appear to deal with situations where P is large (e.g., DNA microarrays) and the traditional methods fail. The OP should be familiar with at least some of this literature. Also, I guess that LINMOD might be translatable into SPSS matrix language, which the OP might consider doing (if doing the analysis in SAS is not an option). Or he can pay Dave Marso to do it. ;-)

>But the tests that you should be concerned about are tests of
>hypothesis. I preferred using the old BMDP program (2V? 4V?)

If you're talking about orthogonal polynomial analysis, it is 2V (I still have the manuals; one specifies "Orthogonal." in the /Design paragraph, along with "Point(j)= .." if the spacing is not constant).

>because it gave linear trends (say) with the test performed
>against the actual variation measured for the trend. (I don't
>know if SPSS can do that testing, but it is not automatic.)

When done in GLM, a repeated measures ANOVA will automatically generate the orthogonal polynomials, or one can specify the degree of the polynomial (one probably doesn't want the output for 71 polynomials).
This is one of the annoying features of GLM because it produces this output even when the within-subject factor is an unordered category.

>Anyway, if you can't state the hypothesis, you can't get started.

Or you can let statistics serve as an "automatic inference engine", as Gerd Gigerenzer calls it when one engages in "mindless statistics".

>If you expect an early response followed by a later decline
>(which could be reasonable for a brain response to stimulus),
>the simplest execution might be to break the 72 periods into
>two or more sets: early response, middle, later.

Or ask for linear, quadratic and cubic polynomials (maybe up to quintic), as well as looking at the profiles. NOTE: In support of Rich's point about using polynomials, Tabachnick & Fidell (6th ed.) make the same point, in fact calling it the "best" solution (see page 332).

-Mike Palij
New York University
[hidden email]

----- Original Message ----- On Friday, October 7, 2016 1:27 PM, Emil Rudobeck wrote:

>When it comes to reporting the findings, despite the shortcomings,
>Mauchly's test is widely used and understood. I haven't come across
>running t-tests on variances. Where can I read more about that approach?

I think that Rich meant paired t-tests between means at two time points. In another post I identify a t-test for testing whether two related variances are equal.

>The measurement is time - brain responses are sampled from each
>subject for a period of time (e.g., 72 samples during an hour) [...]

See the references I provide above.

>Your question is more about design than stats. Certainly,
>if you have any suggestions, I would be interested.

It is typical to describe the design in terms of whether one has within-subject factors, between-subjects factors, or both, and how many levels there are. Given what you say above, one would assume you have a 2 x 72 mixed design with one between-subjects factor with 2 levels and one within-subject factor with 72 levels. But I think that your design might be a little more complicated.

>Whether the individual repeated measures are different or not does
>not matter too much in this case. It's more important whether the
>curves themselves between treatment groups are different (the between
>factor). Using repeated measures overcomes the issue of correlations,
>since there is no other way around it.

I'm not sure I understand the last sentence, but I'd just point out that you can graph the profile (i.e., repeated measures) for each group and see if the profiles are parallel or have different curves -- the latter would be indicated by a significant group by level-of-polynomial effect in the polynomial results.

-MP
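To make the single-df contrast idea concrete: each subject's k measures can be collapsed to one linear-trend score, and the scores compared between groups with an ordinary t-test. A Python sketch (names are mine; equally spaced time points are assumed unless supplied):

import numpy as np
from scipy import stats

def linear_trend_scores(X, times=None):
    """One linear-trend score per subject from an (n subjects) x (k levels)
    array: a single-df orthogonal-polynomial contrast across the levels."""
    n, k = X.shape
    t = np.arange(k, dtype=float) if times is None else np.asarray(times, float)
    w = t - t.mean()                 # centered linear weights
    w /= np.sqrt((w ** 2).sum())     # scale to unit length
    return X @ w

# e.g., compare the linear trend between two treatment groups:
# t, p = stats.ttest_ind(linear_trend_scores(X_drug), linear_trend_scores(X_ctrl))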
Mike, thank you for your explanations and references. Even without a widely used procedure, I think the methods you mentioned are rather thorough for finding out if sphericity is violated and I can use them whenever Mauchly’s test is unavailable (or in some cases to complement or substitute for Mauchly’s test).
I am familiar with the matrices in your link since I use linear mixed models (LMM). As long as we’re on the topic, I wanted to clarify something: I had read a while ago that repeated measures ANOVA assumes the CS structure, and that this is one of its weaknesses compared to LMM, which has flexible covariance structures. With a better understanding of sphericity, I’m curious how ANOVA could be said to require CS when in fact sphericity (of which CS is a special case) is all that’s required to meet the conditions of the test. Maybe I have misunderstood something.

My main question was addressing specifically Mauchly’s test. Design is a separate question; we can certainly look at that. Before proceeding, I should say that I am in fact using LMM to analyze most repeated measures since it has many advantages. However, building models is rather time consuming, and sometimes I still resort to ANOVA for quick calculations. Another quick note: although the measurements I’ve been doing have been employed in neuroscience for decades, the vast majority of statistical analyses have been anything but rigorous (and many are usually considered wrong, such as running individual t-tests for specific repeated measures without adjusting alpha). Bad stats in papers is a well-known issue, and unfortunately neuroscience is prone to incorrect stats much more than the social or geological sciences. Quite often there aren’t enough (or any) details to trace back the statistics.

Bare bones of the design: 30 animals are divided into 2 treatment doses and one control, for a total of 3 groups (sometimes more). Each animal’s hippocampus is sectioned into thin slices. Recordings are collected from 1-2 slices per animal. Essentially, each slice is electrically stimulated and the baseline response is recorded every 50 s for about 15 min (18 repetitions). Then a strong train of pulses is applied, after which the recordings (by now potentiated due to the train) are resumed for another 60 minutes or longer (72+ repetitions). This is known as long-term potentiation and is thought to be the process that helps us learn new information. The final result looks like an exponential curve, as can be seen in Fig. A here: http://www.pnas.org/content/109/43/17651/F1.large.jpg. The responses are normalized to the pre-train input, and only the post-train curves are compared to each other.

The repeated measures are a time-varying covariate, so I have been using LMM with polynomial regression to analyze the data, which I think is perfect for it. However, if there are other suggestions, I’d be curious to hear them. By the way, I have also tried SEM, but SEM is really sensitive to sample size; I cannot analyze this data with SEM unless I drop points or average them, which introduces its own statistical issues. I prefer to use the entire data since interactions can be important.

The hypothesis is that the later phase of the curves will be decreased compared to the control group. Sure, I could just compare the later phases to each other, where the trends are purely linear, but having done the experiments, it would make no sense at all to ignore any possible differences during the early phase. Hence my notion that the entire duration is important; the data is too precious to waste. While the biological mechanisms are different for the early vs. late response, no strict cutoff has been established. I could choose an approximate cutoff and divide the curve into 2 or 3 pieces. I think this would require spline analysis, which SPSS can’t do easily.
Furthermore, alpha would need to be further adjusted for each additional piece that’s created, and I think this “punishment” could be rather severe. That’s why my solution remains LMM, despite it being a pain in the ass to go through all the models.

ER
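For the LMM route in Python rather than SPSS MIXED, a quadratic growth model with a random intercept per animal might look like the sketch below. The file and column names are hypothetical, and a random intercept is only one plausible starting covariance structure; with 1-2 slices per animal, a slice-within-animal component would also be worth modeling:

import pandas as pd
import statsmodels.formula.api as smf

# Long-format data: one row per slice per time point, with columns
# 'response', 'minute', 'group', and 'animal' (hypothetical names).
df = pd.read_csv("ltp_long.csv")
df["minute2"] = df["minute"] ** 2                  # quadratic time term
model = smf.mixedlm("response ~ group * (minute + minute2)",
                    data=df, groups=df["animal"])  # random intercept per animal
result = model.fit()
print(result.summary())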
in the /Design paragraph, along with "Point(j)= .." if the spacing is not constant). >because it gave linear trends (say) with the test performed >against the actual variation measured for the trend. (I don't >know if SPSS can do that testing, but it is not automatic.) When done in GLM, a repeated measures ANOVA will automatically generate the orthogonal polynomial or one can specify the degree of the polynomial (one probably doesn't want the output for 71 polynomials). This is one of the annoying features of GLM because it produces this output even when the within-subject factor is a unordered category. >Anyway, if you can't state the hypothesis, you can't get started. Or, you can use that statistics to serve as an "automatic inference engine" as Gerd Gigerenzer calls it when one engages in "mindless statistics". >If you expect an early response followed by a later decline >(which could be reasonable for a brain-response to stimulus), >the simplest execution might be to break the 72 periods into >two or more sets: early response, middle, later. That is >especially true if the early response is very strong: Look for >the early linear trend, then see if it continues or if the means >regress back to the start. Or ask for linear, quadratic and cubic polynomials (maybe up to quintic) as well as looking at the profiles. NOTE: In support of Rich's point for using polynomials, Tabachnick & Fidell (6th Ed) make the same point, in fact, calling it the "best" solution (see page 332). -Mike Palij New York University [hidden email] >Rich Ulrich ----------- Original Message --------- On Friday, October 7, 2016 1:27 PM, Emil Rudobeck wrote: >When it comes to reporting the findings, despite the >shortcomings, Mauchly's test is widely used and understood. >I haven't come across running t-tests on variances. Where >can I read more about that approach? I think that Rich meant paired t-tests between means at two time points. In another post I identify a t-test for testing whether two related variances are equal. >The measurement is time - brain responses are sampled >from each subject for a period of time (e.g., 72 samples >during an hour) after a "learning" stimulus is applied. >So change over time is expected biologically. This is >essentially a nonlinear growth curve and I know that there >are more advanced approaches (LMM, SEM, etc) which >I use as well, but my concern here is with sphericity in GLM. >Transformation is not going to address the issue, nor will >larger N be feasible. It is possible to perhaps average >adjacent time points, but this would introduce its own problems. >This is a mixed design since subjects are grouped into >different treatments and it's the differences in treatment >that's important. See the references I provide above. >Your question is more about design than stats. Certainly, >if you have any suggestions, I would be interested. It is typical to describe the design in terms of whether one has within-subject factors, between-subjects factors, or both and how many levels there are. Given what you say above, one would assume you have a 2 x 72 mixed design with one between-subjects factor with 2 levels and one within-subject factor with 72 levels. But I think that your design might be a little more complicated. >The current method is well established and has been used >for decades. Whether the individual repeated measures >are different or not does not matter too much in this case. 
>It's more important whether the curves themselves between >treatment groups are different (the between factor). Using >repeated measures overcomes the issue of correlations, >since there is no other way around it. I'm not sure I understand the last sentence but I'd just point out that you can graph the profile (i.e., repeated measures) for each group and see if they are parallel or have different curves -- the latter would be indicated by a significant group by level of polynomial effect in the polynomial results. -MP From: Rich Ulrich [[hidden email]] Sent: Thursday, October 06, 2016 8:06 PM To: [hidden email]; Rudobeck, Emil (LLU) Subject: Re: Undefined Mauchly's Test For small and moderate samples, a non-significant Mauchly's test does not mean much at all. That is why many people will recommend, wisely, that followup test be performed as paired t-tests instead of using some pooled variance term. What are you measuring? Is it a good measure, with good scaling expected and no outliers observed? I don't like analyses where those corrections are made, unless I have a decent understanding of why they are required, such as, the presence of excess zeroes. Would some transformation be thought of, by anyone? Analyzing with unnecessarily-unequal variances is a way to get into unneeded trouble. If the "levels" represent time, it might be appropriate and proper to test a much more powerful hypothesis that makes use of contrasts (linear for growth, etc.) in order to overcome the inevitable decline in correlations across time. You say: more levels than subjects -- Is this because you have very small N or because you have moderate N but also have too many levels to test a sensible hypothesis across them all? State your hypotheses. What tests them? A single-d.f. test is what gives best power, whenever one of those can be used. I favor constructing contrasts -- sometimes in the form of separate variables -- over tests that include multiple d.f. and multiple hypotheses, all at once. And I would rather remove the causes of heterogeneity (variances or correlations) beforehand, than have to hope that I have suitably corrected for it. -- Rich Ulrich From: SPSSX(r) Discussion <[hidden email]> on behalf of Rudobeck, Emil (LLU) <[hidden email]> Sent: Thursday, October 6, 2016 2:20 PM To: [hidden email] Subject: Undefined Mauchly's Test Given the paucity of information online, I was wondering if anyone knows the procedural approach to the evaluation of sphericity when Mauchly's test is undefined, which is the case when the number of repeated levels is larger than the number of subjects (insufficient df). I am not sure if sphericity can still be assumed based on the reported values of epsilon larger than 0.75, whether based on Greenhouse-Geisser or Huynh-Feldt. In one particular dataset, epsilon is less than 0.1. Presumably it can be assumed that sphericity is violated when epsilon is that low. I am aware of using mixed models to overcome the assumptions of sphericity. My concern is with GLM in this case. WARNING: Please be vigilant when opening emails that appear to be the least bit out of the ordinary, e.g. someone you usually don't hear from, or attachments you usually don't receive or didn't expect, requests to click links or log into systems, etc. If you receive suspicious emails, please do not open attachments or links and immediately forward the suspicious email to [hidden email] and then delete the suspicious email. 
Right, it /looks like/ the first 10 or 12 minutes are different from the later minutes. That rather undermines the hope of fitting
a good single, 1-parameter curve to the whole.
Design? The question of adjusting alpha only arises if you are assuming that all the tests are equally important, and have no hierarchy.
It does appear, if those error bars are meaningful, that there is a very clear difference in the latter portion of the curves.
If that is a "primary and most important effect", it seems worth reporting based on its on difference in the linear trend lines,
both mean and slope. Whether the early (and different) part of the curve also differs would obviously be of interest, too, and I would feel comfortable with no correction, no "punishment" at all.
-- Rich Ulrich
From: SPSSX(r) Discussion <[hidden email]> on behalf of Rudobeck, Emil (LLU) <[hidden email]>
Sent: Thursday, October 13, 2016 6:56:38 PM
To: [hidden email]
Subject: Re: Undefined Mauchly's Test

Mike, thank you for your explanations and references. Even without a widely used procedure, I think the methods you mentioned are rather thorough for finding out whether sphericity is violated, and I can use them whenever Mauchly's test is unavailable (or in some cases to complement or substitute for Mauchly's test).
I am familiar with the matrices in your link since I use linear mixed models (LMM). As long as we're on the topic, I wanted to clarify something: I had read a while ago that repeated measures ANOVA assumes the CS structure, and that this is one of its weaknesses compared to LMM, which has flexible covariance structures. With a better understanding of sphericity, I'm curious how ANOVA could be said to assume CS when in fact sphericity (of which CS is a special case) is all that's required to meet the conditions of the test. Maybe I have misunderstood something. My main question was specifically about Mauchly's test.

Design is a separate question; we can certainly look at that. Before proceeding, I should say that I am in fact using LMM to analyze most repeated measures, since it has many advantages. However, building models is rather time consuming, and sometimes I still resort to ANOVA for quick calculations. Another quick note: although the measurements I've been doing have been employed in neuroscience for decades, the vast majority of statistical analyses have been anything but rigorous (and many are usually considered wrong, such as running individual t-tests for specific repeated measures without adjusting alpha). Bad stats in papers is a well-known issue, and unfortunately neuroscience is prone to incorrect stats much more than the social or geological sciences. Quite often there aren't enough (or any) details to trace back the statistics.

Bare bones of the design: 30 animals are divided into 2 treatment doses and one control, for a total of 3 groups (sometimes more). Each animal's hippocampus is sectioned into thin slices. Recordings are collected from 1-2 slices per animal. Essentially, each slice is electrically stimulated and the baseline response is recorded every 50 s, for about 15 min (18 repetitions). Then a strong train of pulses is applied, after which the recordings (by now potentiated due to the train) are resumed for another 60 minutes or longer (72+ repetitions). This is known as long-term potentiation and is thought to be the process that helps us learn new information. The final result looks like an exponential curve, as can be seen in Fig. A here: http://www.pnas.org/content/109/43/17651/F1.large.jpg. The responses are normalized to the pre-train input, and only the post-train curves are compared to each other.

The repeated measures are a time-varying covariate, so I have been using LMM with polynomial regression to analyze the data, which I think is perfect for it. However, if there are other suggestions, I'd be curious to hear them.
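To make that concrete, the kind of MIXED model I mean looks roughly like the sketch below. The names (ltp for the normalized response, animal, group, time) are placeholders, and the quadratic term, random effects, and covariance structure are just one plausible starting point rather than the exact model I end up with:

* Sketch of an LMM with polynomial time; data in long form
* (one row per slice/animal per time point); names are hypothetical.
COMPUTE time2=time**2.
EXECUTE.
MIXED ltp BY group WITH time time2
  /FIXED=group time time2 group*time group*time2
  /RANDOM=INTERCEPT time | SUBJECT(animal) COVTYPE(UN)
  /PRINT=SOLUTION TESTCOV.

Cubic and quartic terms, or a /REPEATED specification with another covariance structure, would then be compared against this via the information criteria.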
By the way, I have also tried SEM, but SEM is really sensitive to sample size. I cannot analyze this data with SEM unless I drop points or average them, which introduces its own statistical issues. I prefer to use the entire data since interactions can be important.

The hypothesis is that the later phase of the curves will be decreased compared to the control group. Sure, I could just compare the later phases to each other, where the trends are purely linear, but having done the experiments, it would make no sense at all to ignore any possible differences during the early phase. Hence my notion that the entire duration is important - the data is too precious to waste. While the biological mechanisms are different for the early vs late response, no strict cutoff has been established. I could choose an approximate cutoff and divide the curve into 2 or 3 pieces. I think this would require spline analysis, which SPSS can't do easily. Furthermore, alpha would need to be further adjusted for each additional piece that's created, and I think this "punishment" could be rather severe. That's why my solution remains LMM, despite the fact that it's a pain in the ass to go through all the models.

ER
I have found that cubic/quartic polynomials, along with the occasional transformation, provide a good fit with LMM - based on both visual examinations and curve fitting tests in
SigmaPlot. In some cases, non-linear mixed models would probably fit better, but SPSS wouldn't help here.
"The question of adjusting alpha only arises if you are assuming that all the tests are equally important, and have no hierarchy. It does appear, if those error bars are meaningful, that there is a very clear difference in the latter portion of the curves." Need some clarification of the above. I always assume if you're publishing a result, then it's important. Without it, this could leave the door open for all kinds of statistical acrobatics. It seems you're also advocating analyzing the later portion since the difference is there. However, here again alpha of 0.05 would be violated if one looks at the graph and analyses the part with the greatest difference. Paramount to visual statistics vs true a priori selection. The curves don't always look so nicely separated in either case: http://anesthesiology.pubs.asahq.org/data/Journals/JASA/931052/17FF5.png. That's also true for some of my datasets. Are you suggesting fitting a line for each individual animal and then running two-way ANOVA comparing the slopes and means between treatments groups? No intercept? And how would the early, non-linear part of the curves be compared? I would be rather curious about references that would allow me to skip adjustments of alpha. I have talked to several statisticians and when they had suggested breaking the graph into several parts, I specifically asked about apha and was told that an adjustment would need to be made. That's why some sort of a reference would be pretty helpful here. Maybe others can chime in. From: Rich Ulrich [[hidden email]]
In reply to this post by Rudobeck, Emil (LLU)
On Thursday, October 13, 2016 6:56 PM, Emil Rudobeck wrote:
>Mike, thank you for your explanations and references. Even
>without a widely used procedure, I think the methods you
>mentioned are rather thorough for finding out if sphericity is
>violated and I can use them whenever Mauchly's test is unavailable
>(or in some cases to complement or substitute for Mauchly's test).

You're welcome. In the situation where N < P, I would suggest checking the current statistical literature whenever you can, because it appears to me to be an active area of development, especially in the context of "big data" (e.g., when there are many more measures than cases).

>I am familiar with the matrices in your link since I use linear
>mixed models (LMM). As long as we're on the topic, I wanted
>to clarify something: I had read a while ago that repeated
>measures ANOVA assumes the CS structure and it's one
>of its weaknesses as compared to LMM, which has flexible
>covariance structures. With a better understanding of sphericity,
>I'm curious as to how ANOVA could use CS where in fact
>sphericity is all that's required to meet the conditions of the
>test. Maybe I have misunderstood something.

First, let me point out that many statistics textbooks, especially in psychology, are terrible at citing sources for the points they make in their text. I think this is one reason there has been the "unholy amalgamation of Fisherian and Neyman-Pearson approaches" that Gerd Gigerenzer has complained about. Textbook authors often do not appear to understand how the Fisherian framework and the Neyman-Pearson framework differ, nor how acrimonious the exchanges became between Fisher and Neyman over time (at one point Fisher compared Neyman's approach to a "communist plot" in statistics -- Fisher went over the edge and/or held some "unreasonable" beliefs even though he was a genius in other areas).

Getting to the point, it is unclear to me when repeated measures ANOVA as we know it was first presented (one could bet that Fisher did so in one of the editions of his "Statistical Methods for Research Workers," but I think a better bet would be one of the editions of Snedecor & Cochran's "Statistical Methods" -- Snedecor was the one who converted what Fisher called his "z-test" into what we now refer to as the "F test"; he provided the first F tables, which by-passed the need to do calculations with logarithms in Fisher's z-test). So it is unclear how the assumption of compound symmetry came to be asserted as necessary for the repeated measures ANOVA. I don't have the reference for Box, circa the 1950s, who showed that if sphericity was violated, his correction to the degrees of freedom could be used to determine the significance of the F-test; this became the basis for the further corrections by Huynh-Feldt and Greenhouse and Geisser. Given that sphericity can be obtained without compound symmetry, the emphasis on sphericity and the downplaying of compound symmetry is understandable.

Second, the concern with compound symmetry may not have originated with ANOVA but in the field of psychometrics and the measurement model being used for the data. I think the following reference is relevant to this point:

Wilks, S. S. (1946). Sample criteria for testing equality of means, equality of variances, and equality of covariances in a normal multivariate distribution. Annals of Mathematical Statistics, 17(3), 257-281.
NOTE: Available at: http://projecteuclid.org/euclid.aoms/1177730940

The paper is concerned with developing likelihood tests of the following hypotheses:

(1) All means are equal: H(m), tested by L(m)
(2) All variances are equal: H(v), tested by L(v)
(3) All covariances are equal: H(cv), tested by L(cv)

The working example used is three variables that are assumed to follow a "parallel" measurement model, which assumes all means are equal, all variances are equal, and all covariances are equal. An omnibus test of all of these conditions, that is, L(m,v,cv), is presented, as well as likelihood tests for specific components, say, whether all variances are equal and all covariances are equal, that is, L(v,cv). Note that if L(v,cv) is nonsignificant, one has compound symmetry and one can validly do a one-way repeated measures ANOVA (or, as Wilks puts it, an "analysis of variance test" for a k by n layout, where k is the number of measures and n is the number of cases) -- see section 1.5 in Wilks's paper for some really ugly math in support of the point that L(m) is equivalent to one-way repeated measures ANOVA.

I think that Wilks is working on extending the traditional assumption of independent groups ANOVA, namely homogeneity of variance, and avoiding the problem that, in the two-group situation, was known as the Fisher-Behrens problem, that is, heterogeneous variances (which wasn't yet solved, but Welch, Brown-Forsythe, and others would provide solutions). One drawback of using L(m) is that it requires a large sample (see page 265).

In his section 1.7, Wilks compares the test L(v,cv) with Mauchly's test for sphericity of a normal multivariate. The difference between the two tests is that Mauchly's test was designed to test the hypothesis that all variances are equal and all covariances are equal to ZERO, regardless of whether the population means differ or not. Wilks designates Mauchly's test as the likelihood L(s), which uses the sample standard deviations and variances -- Wilks points out that the test actually uses sqrt[L(s)], but the two are equivalent. After some really ugly math, Wilks concludes this section with the following:

| Stated in other words, Mauchly's criterion L(s) is a test for the
| hypothesis that contours of equal probability density in the
| multivariate normal population distribution are spheres, while
| L(v,cv) is a test for the hypothesis that the contours of equal
| probability are k-dimensional ellipsoids with k - 1 equal axes, in
| general shorter than the k-th axis, which is equally inclined to
| the k coordinate axes of the distribution function.

What this translates into is unclear, but if one meditates on it like a Zen koan, I'm sure that enlightenment will eventually come. ;-)

Wilks works through an example for his test and provides additional derivations. With respect to ANOVA, Wilks's work in this article focuses on the measurement model for the data one is collecting. Clearly, he shows that the assumed model has an impact on the tests he is presenting, but he does not really connect it to ANOVA beyond pointing out how L(v,cv) differs from Mauchly's test, or how compound symmetry differs from sphericity. Sphericity is a looser criterion to meet, focusing primarily on equality of variances, the traditional assumption made for ANOVA. It seems to me that most researchers don't think about the measurement model for their data and, thus, don't care whether their data meet the requirements for compound symmetry or sphericity.
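To put the two conditions side by side in symbols (my notation, not Wilks's; these are the standard textbook formulations, with p measures and N subjects):

\[
\Sigma_{\mathrm{CS}} = \sigma^2\bigl[(1-\rho)I_p + \rho J_p\bigr]
\quad\text{(compound symmetry: equal variances, equal covariances)}
\]
\[
C = M^{\top}\Sigma M = \lambda I_{p-1}
\quad\text{(sphericity, for } p-1 \text{ orthonormal contrasts } M\text{)}
\]

so compound symmetry implies sphericity but not the converse. Mauchly's criterion and Box's epsilon are both functions of the sample contrast covariance matrix \(\hat{C}\):

\[
W = \frac{\det \hat{C}}{\bigl[\operatorname{tr}\hat{C}/(p-1)\bigr]^{p-1}},
\qquad
\hat{\varepsilon} = \frac{\bigl(\operatorname{tr}\hat{C}\bigr)^{2}}{(p-1)\operatorname{tr}\bigl(\hat{C}^{2}\bigr)} .
\]

Since \(\operatorname{rank}(\hat{C}) \le N-1\), the determinant (and with it Mauchly's W) is exactly zero whenever N < p, while \(\hat{\varepsilon}\) can still be computed -- which is why the epsilons get reported even when Mauchly's test is undefined.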
I believe that there has probably been more work in this area since 1946, but I don't know what that is, outside of the various modifications to the degrees of freedom that adjust the F-tests for violations of sphericity. I'll stop my comments on this point since I have gone on far too long but, hopefully, with some benefit.

HTH.

-Mike Palij
New York University
[hidden email]
A S.W.A.G.
Perhaps the reason that CS was assumed was that it made hand calculation easier. We did not always have calculators that did square roots, let alone the software that we have in this century.
Art Kendall
Social Research Consultants
On Friday, October 14, 2016 2:50 PM, Art Kendall wrote:
>A S.W.A.G.

Uh, okay.

>perhaps the reason that the CS was assumed was that made
>hand calculation easier.
>
>We did not always have calculators that did square roots, let alone
>the software that we have in this century.

This would make sense if one had to compute the variance-covariance matrix in order to do a repeated measures ANOVA but, as someone who has done a repeated measures ANOVA by hand calculator, one does not need it. I have handouts with the definitional and computational formulas for one-way repeated measures ANOVA that I still use, though today I don't have students do the hand computations (they used to, but some take so long as well as make errors). Instead, I show how one can do it via Excel (with the Data Analysis ToolPak: two-way ANOVA without replication; one could go through the process of calculating the sums of squares from the raw data, but again this is time consuming) and then what SPSS GLM provides in addition to the simple output of Excel.

I think that it was easier to assume compound symmetry because it is a simple extension of the homogeneity of variance assumption -- it is the assumption of homogeneity of variance plus homogeneity of correlation. After making these assumptions, wave your hands and say, "Presto! Here are the Rep Meas ANOVA results!"

NOTE: GLM does not even allow one to print out the variance-covariance matrix if one wanted to examine it. One has to obtain it with another procedure, like correlation or reliability or one of the regression procedures and so on. MANOVA allows one to print the covariance matrix and other useful statistics.

-Mike Palij
New York University
[hidden email]
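P.S. For example, either of the following would print the variance-covariance matrix of the repeated measures; t1 to t72 are hypothetical names standing in for the repeated measures:

* Covariances (and cross-products) via CORRELATIONS.
CORRELATIONS /VARIABLES=t1 TO t72
  /STATISTICS=XPROD.

* Or via MANOVA, which GLM does not offer.
MANOVA t1 TO t72
  /WSFACTORS=time(72)
  /PRINT=ERROR(COV COR)
  /DESIGN.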
S.W.A.G.s about why people do things are silly, wild guesses after all.
Another S.W.A.G. is that unfortunately most users do not think about assumptions. They copy what somebody else has done. Copying techniques is also "the path of least resistance." In my experience, in many academic departments the newest faculty member gets stuck with teaching stats and just follows a book.
Art Kendall
Social Research Consultants
In reply to this post by Rudobeck, Emil (LLU)
(Outlook has started presenting these posts in a new way, without ">" indentation. I'm trying to find what works for Replies.) I have labeled the paragraphs below from A through E, and here are comments by paragraph.
For A. "Linear" is easy to understand. The problem with quadratic, cubic, quartic, etc., is that you seldom have just one.
So you have to look at the whole plot. But I'll move on from that. I thought that you could "roll your own" models with tests in SPSS non-linear ML regression, but that wants to start with an obvious model. Which you lack.
For B (mine) and C. Yes, you want to avoid statistical acrobatics. If your study is totally exploratory, you should not be
worried about experiment-wise alpha level - you are generating hypotheses, not testing them. If you know of
similar studies or pilot data, you have expectations, some great and some small. SOMETHING justified spending
the money to collect the data. In the biggest studies I worked on, in psychiatric research of schizophrenic outpatients, the single hypothesis that justified the study was something like, "Are the rates of relapse (re-hospitalization) different?"
If "Yes", then the 20 or more rating scales provided supporting evidence of why. If "No", then the rating scales (hopefully) would supply clues as to why. In either case, I would proceed with a hierarchy of testing -- Test a composite score: If it is "significant" at 5%, then its sub-scores are legitimate to test separately at 5%, more or less, to describe why. what was otherwise showing up in the hierarchy of tests. In my own experience, the largest effects were almost always found where the PIs expected to find large effects, using the best scales, where effects had been seen before. - The journal illustration that you cite shows curves that are /fantastically/ well-separated, contrary to your description. After 5 or 10 minutes, the two groups are 3 or 4 s.e.'s apart, minute by minute, with Ns of 6+7 and 6+9. In both figures (like in your figure A), one group is asymptotic near the 100% baseline for Pre. For D. Yes, fit each animal; except that it is merely a one-way ANOVA (t-test) if you do one variable at a time and
Bonferroni-correct for having tested two variables, slope and mean. Generating the contrast for each animal gets you beyond all that concern with sphericity, etc. And it is clear from the pictures that the different slopes (if different) are not blamed on simple "regression to the mean" ... which is something to consider, whenever initial means differ.
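A sketch of that per-animal fitting in SPSS syntax, for what it's worth; animal, time, response, and the file name are hypothetical, and the data are assumed to be in long form:

* Fit a line per animal and harvest the coefficients with OMS.
OMS /SELECT TABLES
    /IF COMMANDS=['Regression'] SUBTYPES=['Coefficients']
    /DESTINATION FORMAT=SAV OUTFILE='coefs.sav' VIEWER=NO.
SORT CASES BY animal.
SPLIT FILE BY animal.
REGRESSION /DEPENDENT response /METHOD=ENTER time.
SPLIT FILE OFF.
OMSEND.
GET FILE='coefs.sav'.
* Each animal's slope is the unstandardized B on the row for the
* predictor time; merge group back in and compare the slopes,
* e.g., with ONEWAY slope BY group.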
[If I recall correctly, the BMDP2V program I mentioned before had the excellent default of computing its between-S contrasts
based on errors for slopes as actually computed within-S, in place of using the conventional decomposition of SS that is affected by sphericity.]
How you test the early, non-linear part of the curve depends on what you know about it and what you can figure out
to say about it. And that depends, probably, on what you know or suspect about the actual biology or chemistry or
physics that is taking place. My uneducated suggestion, from the pictures, would be to try an exponential decline of the excess over "zero", where the zero is modeled as the lowest value (say) of the latter part of the fitted line. If that is possible, on the basis of single animals.
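As a sketch, in SPSS non-linear regression syntax (response, time, and the starting values are hypothetical; C stands for the floor taken from the latter part of the curve):

* Exponential decline toward a floor C (sketch only).
MODEL PROGRAM A=50 B=0.1 C=100.
COMPUTE PRED_=C + A*EXP(-B*time).
NLR response WITH time
  /PRED=PRED_
  /CRITERIA ITER(100).

Fitting this per animal (SPLIT FILE again) would give parameters that could be compared between groups, in the same spirit as the slopes above.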
For E. This is "Experimental Design", and it may go beyond "experimental design". I never took a course in that, and I don't know how much they say about "replication studies". There is always a little controversy or discussion of what comprises "separate and distinct hypotheses". When do you respect experiment-wise error, family-wise error, or single-test error of 5%? Or 1%? Or whatever. When I say that the question may go beyond design, what I am thinking of is that your own area might have settled on standards for what to control. However, you still must have (I think) the power to say that THIS is what I think is important... and not THAT... The latter part of the line (say) is the Main hypothesis; the early part is Exploratory.

How many hypotheses are you trying to control for? How new are they? How much power do you have to spare? If a study has a bunch of hypotheses - 5? 10? - of equal merit and expectation-to-be-confirmed, are they separate and distinct hypotheses which merit a 5% test, each? Really? And not exploratory? If the pictures tell the story, your /main/ hypothesis of difference should be the latter minutes. If, for other reasons, the first 5 minutes tell the important story, then... What story is that?

It might have seemed inconvenient to some people, but I thought it was fine that the protocol for our grant applications wanted us to state our hypotheses before the study started. In one case, we wrote into a grant that we intended to test one particular interaction with a 10% one-tailed test: because it was very relevant to /extending/ the narrative that we expected, but the statistical power would be too low to draw conclusions from the conventional, two-tailed, 5% test. And a few years later, we got the editor and reviewers to accept the report of the test. It was not cherry-picking, since it was the single such test that we had laid out in advance.

-- Rich Ulrich

From: SPSSX(r) Discussion <[hidden email]> on behalf of Rudobeck, Emil (LLU) <[hidden email]>
Sent: Friday, October 14, 2016 11:28 AM
To: [hidden email]
Subject: Re: Undefined Mauchly's Test

A.
I have found that cubic/quartic polynomials, along with the occasional transformation, provide a good fit with LMM - based on both visual examinations and curve fitting tests in SigmaPlot.
In some cases, non-linear mixed models would probably fit better, but SPSS wouldn't help here.
B. "The question of adjusting alpha only arises if you are assuming that all the tests are equally important, and have no hierarchy. It does appear, if those error bars are meaningful, that there is a very clear difference in the latter portion of the curves." C. Need some clarification of the above. I always assume if you're publishing a result, then it's important. Without it, this could leave the door open for all kinds of statistical acrobatics. It seems you're also advocating analyzing the later portion since the difference is there. However, here again alpha of 0.05 would be violated if one looks at the graph and analyses the part with the greatest difference. Paramount to visual statistics vs true a priori selection. The curves don't always look so nicely separated in either case: http://anesthesiology.pubs.asahq.org/data/Journals/JASA/931052/17FF5.png. That's also true for some of my datasets. D. Are you suggesting fitting a line for each individual animal and then running two-way ANOVA comparing the slopes and means between treatments groups? No intercept? And how would the early, non-linear part of the curves be compared? E. I would be rather curious about references that would allow me to skip adjustments of alpha. I have talked to several statisticians and when they had suggested breaking the graph into several parts, I specifically asked about apha and was told that an adjustment would need to be made. That's why some sort of a reference would be pretty helpful here. Maybe others can chime in. From: Rich Ulrich [[hidden email]]
In reply to this post by Mike
AWESOME Mike! Great historical recap.
"What this translates into is unclear but if one meditates on it like a Zen koan, I'm sure that enlightenment will eventually come. ;-) " Isn't is f'ing obvious? Don't need no skinkin koan. Start with 2D, think about it, do 3D think about it, try 4D if your head doesn't explode then do a recursive thing with 4+, dose on some psychedelics and you'll find it ;-)))).
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"
On Sunday, October 16, 2016 2:42 AM, David Marso wrote:
> AWESOME Mike! Great historical recap.

Thank you for the kind words. But I'm still trying to find when repeated measures ANOVA appeared in its current form (damned OCD! ;-). I do know how the correlated groups t-test got into its current form.

[snipped Wilks's description of multidimensional probability distributions for different tests]

> "What this translates into is unclear but if one meditates on it like
> a Zen koan, I'm sure that enlightenment will eventually come. ;-) "
>
> Isn't it f'ing obvious?
> Don't need no stinkin' koan.
> Start with 2D, think about it, do 3D think about it, try 4D if your
> head doesn't explode then do a recursive thing with 4+, dose on some
> psychedelics and you'll find it ;-)))).

Fisher, who had poor vision but a strong visual-spatial imagination, could probably work it out in his head without the benefit of 'shrooms, but would still have problems translating it into ordinary English. The problem is going beyond the 4th dimension to whatever the k-th dimension is.

In grad school (the 1970s), my experimental design prof claimed to be doing work on the perception of 4D hypercubes -- I say "claimed to be doing work" because I never understood what he meant, and apparently most other grad students felt the same way, which meant that very little research got done -- so he was made director of graduate studies in the psych dept. ;-)

Back then, there were folks who did various drugs, and I think they lost focus while doing psychedelics, but I do know a couple of folks who would toke up before statistical analysis by hand (I don't know how they did it, but one guy who was stoned all the time got his Ph.D. in clinical psych in three years -- I don't recommend being stoned all of the time as a way of getting through grad school). So, though drugs might be useful, meditation or deep thought might be a more useful procedure to use.

-Mike Palij
New York University
[hidden email]