Suppose you have a between-within design: several between-subject
groups and several (3 or more) repeated-measures (= within-subject)
trials. It's all very classic and typical. The nuance, however, is
that the values for every subject sum across the repeated levels to
a **constant**. This is because the data are complementary, i.e.
percentages or fractions, so in this case they sum to 100 for every
individual. For example, with 3 RM levels, one respondent's data might
be 30%, 22%, 48% (sum=100); another respondent's, 25%, 33%, 42%
(sum=100).
I know that I can analyze between-groups X repeated-measures count data via the Generalized Estimating Equations procedure. But I have doubts in this case because the values *sum to a constant*: they are complementary fractions, not counts of successes in repeated independent trials! Can I analyze such data in SPSS, and how? Thanks. |
Hello Kirill. This is not a direct answer to your question. I'm just pointing to a thread from a couple years ago that addressed the same question. One of my posts in it gives a couple of references that may be of interest to you. Both of them suggest that ANOVA generally works quite well with "ipsative" data (or "allocated observations"). You can see the relevant messages here:
http://listserv.uga.edu/cgi-bin/wa?A2=ind1101&L=spssx-l&P=36237 HTH.
--
Bruce Weaver bweaver@lakeheadu.ca http://sites.google.com/a/lakeheadu.ca/bweaver/ "When all else fails, RTFM." PLEASE NOTE THE FOLLOWING: 1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above. 2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/). |
I'd like to chip in to report that since that exchange, I came across real-world data where subjects were asked to rank order items. When faced with these data, I found an elegant solution to this problem:
I'm not suggesting that this article presents a solution to the OP's problem, but the article is relevant to the dependency issue. It's a good read for those who must contend with rank-ordered items.
Best, Ryan |
In reply to this post by Kirill Orlov
There is a literature on "compositional data" which probably will be helpful.
Years ago, I found Aitchison to be readable. I have no idea whether it will work for your model, but I will mention that you escape the absolute linear dependency if you represent each fraction as its log-odds, like log(25/75) in place of 25%.
-- Rich Ulrich |
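To make the log-odds suggestion concrete, here is a minimal Python sketch (not SPSS syntax) applied to the two respondents from the original post. The function name `log_odds` is made up for this illustration. It only shows that the transformed values are no longer tied to a common sum; it says nothing about whether the resulting model is the right one. Note that a fraction of exactly 0 or 100% has no finite log-odds, so such values would need separate handling.

```python
import math

def log_odds(percents):
    """Return log(p / (1 - p)) for each share p of one respondent."""
    total = sum(percents)
    shares = [value / total for value in percents]
    return [math.log(p / (1.0 - p)) for p in shares]

# The two respondents quoted in the original post (percentages summing to 100).
respondents = [
    [30, 22, 48],
    [25, 33, 42],
]

for row in respondents:
    logits = log_odds(row)
    # Raw values always sum to 100; the log-odds sums differ between respondents.
    print(row, [round(v, 3) for v in logits], "sum of log-odds:", round(sum(logits), 3))
```

Because the sums of the transformed values vary from person to person, no column is an exact linear function of the others, which is the dependency the transformation is meant to break.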
In reply to this post by Rich Ulrich
Judging from what I see on the Wikipedia page (http://en.wikipedia.org/wiki/Compositional_data), "compositional data" is another name for what Shaffer called "allocated observations" and Greer & Dunlap called "ipsative data". But it also looks like there are two sets of literature that do not overlap all that much.
--
Bruce Weaver |
The Wikipedia article on "ipsative" tells me that my own use of that term falls under the third type that they mention, where educators may standardize the scores for an individual based only on that individual's previous scores. It seems that you are apt to find several different uses under "ipsative" in addition to the one that resembles "compositional".
-- Rich Ulrich |
In reply to this post by Kirill Orlov
Would the OP mind explaining exactly what the DV is? It might help shed light on how to proceed, notwithstanding other interesting solutions. Speaking of which, the idea of converting the probs to logits is intriguing. I am generally in favor of the logit scale because of its properties, but the fact that it addresses the linear dependence is an added bonus here.
Anyway, I would appreciate it if the OP would be willing to tell us more about the DV.
Ryan |
Ryan,
For example, the DV might be a "how do you spend your typical day?" question:

Work __% of time
Meals __% of time
Stroll __% of time
Comp/TV/Reading __% of time
Else __% of time
[Please check that your answers sum to 100%]

Converting to logits might be an interesting idea, although not necessarily the most correct one. But I wonder whether SPSS (GEE or another procedure) has already foreseen this and provides tools (reference distribution + link function) specifically for a DV that consists of fractions summing to a constant; such a DV isn't uncommon. |
Use either GEE (Generalized Estimating Equations) or generalized linear models. Your model is for COUNTS; I recommend negative binomial with a log link. The % should be converted to number of hours as the response variable, participant is the subject variable, and category [work, meal, etc.] is the repeated measure. The model has category as a predictor, which should also be entered under fixed.
Have fun,
Diana

Emeritus Professor Diana Kornbrot
Department of Psychology, School of Life and Medical Sciences, University of Hertfordshire
web: http://dianakornbrot.wordpress.com/ |
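The percent-to-hours conversion and the long data layout described above (one row per participant and category) might look like the following Python sketch. It is not SPSS syntax, and several things in it are assumptions: that a "typical day" means 24 hours, the category labels, the helper name `to_long_hours`, and the single respondent, whose values are invented purely for illustration.

```python
HOURS_PER_DAY = 24  # assumption: the percentages refer to a full 24-hour day

# Category labels adapted from the example question; spelling is illustrative.
CATEGORIES = ["work", "meals", "stroll", "comp_tv_reading", "else"]

def to_long_hours(subject_id, percents):
    """Turn one respondent's percentages into (subject, category, hours) rows."""
    if sum(percents) != 100:
        raise ValueError("percentages must sum to 100")
    return [
        (subject_id, category, pct * HOURS_PER_DAY / 100)
        for category, pct in zip(CATEGORIES, percents)
    ]

# Hypothetical respondent, invented for illustration only.
rows = to_long_hours(1, [40, 10, 5, 25, 20])
for row in rows:
    print(row)

# The hours still add up to the same fixed total for every respondent.
print("total hours:", sum(hours for _, _, hours in rows))
```

Rescaling changes the unit but not the constraint: every respondent's hours still sum to the same total, so the question of how to handle the fixed sum remains.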
Kirill,
This is a question that has come up on Cross Validated a few times; see http://stats.stackexchange.com/q/24187/1036 for an example. A frequent recommendation seems to be a Stata package by the name of dirifit (see http://maartenbuis.nl/software/dirifit.html) or an equivalent R package, DirichletReg. I do not know if the current GENLIN procedure can be wrangled to produce the same model.

A quick perusal of some of the materials floating around the web related to said packages suggests that a quick and dirty way is to fit separate beta regression models for each of the subsets, although that doesn't constrain the total to be 1. (Smithson & Verkuilen (2006), "A Better Lemon Squeezer", has supplementary material on how to fit beta regression models in SPSS.)

Count data models are not appropriate here because of the ceiling effect. You can look up ways around that (like censored Poisson regression or Tobit models), but those ignore the compositional nature of the data here. Another suggestion on the CV site recommends multinomial models. I see the relationship, but I don't quite understand how you turn this into discrete outcomes to feed into a multinomial logistic regression.

Looks like you will have some (hopefully fun) reading to do to sort through all these disparate recommendations!
Andy |
In reply to this post by Kirill Orlov
Correspondence analysis
is designed for compositional data.
Art Kendall
Social Research Consultants |
In reply to this post by Kornbrot, Diana
Thank you for all the answers that have come so far. I haven't read them carefully yet.

But here is what came to my own mind meanwhile, after a little meditation. It is very simple: I think (PLEASE correct me if I'm mistaken!) that there is no problem at all. The constraint that the repeated measures sum to a constant within individuals *does not* rule out the common RM-ANOVA model. As long as the ANOVA distributional and sphericity assumptions hold, there is no need for GEE or other procedures at all.

Let's have some data: a between-subject grouping factor GROUP and a within-subject factor RM with 3 levels summing to a constant (100).

group  rm1  rm2  rm3  sum
1      50   30   20   100
1      24   42   34   100
1      34   16   50   100
1      61   28   11   100
1      46   46    8   100
1      23   18   59   100
2      55   22   23   100
2      27   39   34   100
2      44   36   20   100
2      28   40   32   100

Run the usual repeated-measures ANOVA:

GLM rm1 rm2 rm3 BY group
 /WSFACTOR= rm 3
 /METHOD= SSTYPE(3)
 /WSDESIGN= rm
 /DESIGN= group.

Summing to a constant just means that upon collapsing the RM levels, all respondents appear to be the same: there is no between-subject variation at all, or in other words, the "respondent ID" factor's effect is zero. Hence, in the table "Tests of Between-Subjects Effects" the Error term is zero. The effect of the GROUP factor is zero too, of course, because the constant sum (100) in our data is the same for both groups 1 and 2.

Now, I'd ask you: are these results invalid in any way? Do we say that ANOVA is misused when the error variation (which is left unexplained) is zero? I would not say so, and so RM-ANOVA *is* an appropriate method for fractions (i.e. values summing to a constant). If I'm wrong, please explain to me why. |
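The claim about the between-subjects table can be checked by hand. The Python sketch below (not SPSS syntax; the function name is made up here) computes the two between-subjects sums of squares of a split-plot ANOVA from the ten respondents posted above, using the textbook formulas based on subject totals. It is only meant to confirm the arithmetic, not to reproduce SPSS output.

```python
K = 3  # number of repeated-measures levels

# (group, rm1, rm2, rm3) for the ten respondents posted above.
DATA = [
    (1, 50, 30, 20), (1, 24, 42, 34), (1, 34, 16, 50),
    (1, 61, 28, 11), (1, 46, 46, 8), (1, 23, 18, 59),
    (2, 55, 22, 23), (2, 27, 39, 34), (2, 44, 36, 20),
    (2, 28, 40, 32),
]

def between_subject_sums_of_squares(rows):
    """Return (SS for GROUP, SS for subjects within groups)."""
    totals_by_group = {}
    for group, *scores in rows:
        totals_by_group.setdefault(group, []).append(sum(scores))
    all_totals = [t for totals in totals_by_group.values() for t in totals]
    grand_mean = sum(all_totals) / len(all_totals)
    ss_group = 0.0
    ss_error = 0.0
    for totals in totals_by_group.values():
        group_mean = sum(totals) / len(totals)
        # Dividing by K converts squared deviations of totals to the ANOVA scale.
        ss_group += len(totals) * (group_mean - grand_mean) ** 2 / K
        ss_error += sum((t - group_mean) ** 2 for t in totals) / K
    return ss_group, ss_error

ss_group, ss_error = between_subject_sums_of_squares(DATA)
print("SS group:", ss_group)                    # 0.0
print("SS subjects within groups:", ss_error)   # 0.0
```

Both sums of squares are exactly zero because every subject total is 100, so the between-subjects F ratio is 0/0 and carries no information; all the testable variation sits in the within-subject part of the design.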
No, you are a bit wrong in concluding that there is no problem.
If you think of the situation of dummy variables, you have provided an "extra" dummy, like entering dichotomies for both Male and Female. There is redundancy. There is over-parameterization. There is, somewhere, the loss of one d.f. for RM when you perform any analysis. A "fixed" zero-effect is not the same as a randomly occurring near-zero-effect.

You retain full information (in the statistical sense) if you set up your model to leave out one of the categories, just as one would for any dummy coding. The others will be most "independent" if you omit the category that has the greatest variance. The drawback might lie in the ease of interpreting your results.
-- Rich Ulrich |
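The redundancy described here shows up directly in the covariance matrix of the three measures. The Python sketch below (not SPSS syntax; helper names are made up) uses the ten respondents from the example data posted earlier in the thread and exact fractions, so that "zero" really means zero rather than rounding noise. Dropping rm3 is an arbitrary choice for illustration, not the greatest-variance rule suggested above.

```python
from fractions import Fraction

# (rm1, rm2, rm3) for the ten respondents from the example data in the thread.
DATA = [
    (50, 30, 20), (24, 42, 34), (34, 16, 50), (61, 28, 11), (46, 46, 8),
    (23, 18, 59), (55, 22, 23), (27, 39, 34), (44, 36, 20), (28, 40, 32),
]

def covariance_matrix(rows):
    """Sample covariance matrix (n - 1 denominator) in exact arithmetic."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [Fraction(sum(col), n) for col in cols]
    k = len(cols)
    return [
        [
            sum((cols[i][r] - means[i]) * (cols[j][r] - means[j]) for r in range(n)) / (n - 1)
            for j in range(k)
        ]
        for i in range(k)
    ]

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion."""
    return (
        m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
        - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
        + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0])
    )

COV = covariance_matrix(DATA)

# Each row sums to zero because Cov(rm_i, rm1 + rm2 + rm3) = Cov(rm_i, 100) = 0.
print("row sums:", [sum(row) for row in COV])

# The full 3x3 matrix is singular: one measure carries no new information.
print("determinant of full matrix:", det3(COV))

# Leaving out one category (here rm3, chosen arbitrarily) removes the redundancy.
REDUCED_DET = COV[0][0] * COV[1][1] - COV[0][1] * COV[1][0]
print("determinant without rm3:", float(REDUCED_DET))
```

The singular 3x3 matrix is the "extra dummy" in matrix form: any one of the three measures can be recovered as 100 minus the other two, so only two of them contribute degrees of freedom.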
The articles by Shaffer (1981) and Greer & Dunlap (1997) say there is no problem. (I've sent both of them to Rich off-list.) Meanwhile, here are some relevant excerpts from Greer & Dunlap.

"Periodically, researchers in the behavioral sciences analyze measures that are ipsative. Ipsative measures are those for which the mean for each level of one or more variables (usually the participants) equals the same constant. Data with these constraints are also referred to as allocated observations (Shaffer, 1981) and compositional data (when the scores are proportions; Aitchison, 1986)." (p. 200)

"The general conclusion is clear: Repeated measures ANOVA with ipsative data works quite well. Although it is known that techniques such as factor analysis are badly affected by ipsative scores, ANOVA is not, particularly if the epsilon correction for nonuniform variance-covariance matrices is used. Fortunately, the epsilon correction for repeated measures ANOVA is readily obtainable from most major computer statistical packages. Therefore, it is hoped that readers will no longer look with suspicion upon ANOVAs with ipsative data, even though the presence of sums of squares equal to zero is disconcerting." (p. 206)

Reference
Greer T, Dunlap WP. (1997). Analysis of variance with ipsative measures. Psychological Methods, 2(2), 200-207.

HTH.
--
Bruce Weaver |
Okay. I pointed out that there was a loss of d.f. The cite from G&D says the analysis is okay if you use the epsilon correction for repeated measures. Now, remember that the epsilon correction makes use of a reduction of d.f.

I haven't yet looked at what Bruce sent me, but I'm willing to accept that the epsilon correction reduces the d.f. appropriately, either 100% of what is needed, or nearly 100%. Epsilon corrections could be large enough.

It has been a long time since I looked at the epsilon correction, but I do remember that descriptions mentioned cells with zero or near-zero for variances. In my recollection, what was discussed were models where zeroes were apt to be due to "basement" or "ceiling" scoring effects. The d.f. corrections were not always small, so I expect that they could work here. (If I were reporting the data, I would be careful to report the epsilons as evidence that the model has accounted properly for the d.f.)
-- Rich Ulrich |
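For readers who want to see how an epsilon correction shrinks the d.f., here is a rough Python sketch (not SPSS syntax) of the Greenhouse-Geisser estimate computed from the pooled within-group covariance matrix of the example data posted earlier in the thread. The helper names are made up, the textbook double-centering formula is used, and the result is not claimed to match any particular SPSS table.

```python
# (group, rm1, rm2, rm3) for the ten respondents from the example data.
DATA = [
    (1, 50, 30, 20), (1, 24, 42, 34), (1, 34, 16, 50),
    (1, 61, 28, 11), (1, 46, 46, 8), (1, 23, 18, 59),
    (2, 55, 22, 23), (2, 27, 39, 34), (2, 44, 36, 20),
    (2, 28, 40, 32),
]

def pooled_covariance(rows):
    """Pooled within-group covariance of the repeated measures and its error d.f."""
    groups = {}
    for group, *scores in rows:
        groups.setdefault(group, []).append(scores)
    k = len(rows[0]) - 1
    sscp = [[0.0] * k for _ in range(k)]
    for members in groups.values():
        means = [sum(col) / len(members) for col in zip(*members)]
        for scores in members:
            dev = [x - m for x, m in zip(scores, means)]
            for i in range(k):
                for j in range(k):
                    sscp[i][j] += dev[i] * dev[j]
    df_error = len(rows) - len(groups)
    return [[v / df_error for v in row] for row in sscp], df_error

def double_center(s):
    """Subtract row and column means and add back the grand mean."""
    k = len(s)
    row_means = [sum(row) / k for row in s]
    grand = sum(row_means) / k
    return [[s[i][j] - row_means[i] - row_means[j] + grand for j in range(k)] for i in range(k)]

def gg_epsilon(s):
    """Greenhouse-Geisser epsilon: trace(C)^2 / ((k - 1) * sum of squared entries of C)."""
    k = len(s)
    c = double_center(s)
    trace = sum(c[i][i] for i in range(k))
    sum_sq = sum(v * v for row in c for v in row)
    return trace ** 2 / ((k - 1) * sum_sq)

S, DF_ERROR = pooled_covariance(DATA)
EPS = gg_epsilon(S)
K = len(S)
print("epsilon:", round(EPS, 3))
print("adjusted d.f. for RM:", round(EPS * (K - 1), 3))
print("adjusted d.f. for error:", round(EPS * (K - 1) * DF_ERROR, 3))
```

With three levels the estimate lies between 0.5 and 1, and multiplying both the numerator and the denominator d.f. by it is the d.f. reduction referred to above. For constant-sum data the rows of the covariance matrix already sum to zero, so the double-centering step leaves the matrix unchanged.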
In reply to this post by Kirill Orlov
'For example, the DV might be "how do you spend your typical day?" question
Work __% of time Meals__% of time ... [Please check that your answers sum to 100%]'

Isn't this what Correspondence Analysis is for? http://sru.soc.surrey.ac.uk/SRU7.html

A related approach: http://igitur-archive.library.uu.nl/fss/2007-1004-201532/heijden_van_der_88_the_analysis.pdf

Ecologists have that kind of data very often, when they compare transects (what percentage of plants sampled are of what species, etc.). Check Legendre & Legendre's Numerical Ecology. |