T-test or ANOVA with a percentage dependent variable


凌楚定

Dear list,

I have a question about t-tests and ANOVA.

A is the independent variable and has two levels (0, 1). B is the dependent variable, and each of its values is a percentage. Using SPSS or another tool, how should I compare the means of B across the levels of A with data like these?

My concern is whether a t-test or ANOVA still applies, given that the percentage dependent variable does not follow a normal distribution.

I would be grateful for any suggestions. Thanks!

Chu-Ding LING
Ph.D. Student of Business Administration
School of Management
Zhejiang University

Re: T-test or ANOVA with a percentage dependent variable

Bruce Weaver
Administrator
For OLS models, the main assumptions are that the *errors* are independently and identically distributed as normal, with mean 0 and a constant variance. Of those assumptions, normality is arguably the least important.

What is the range of percentages in your data?  If they are not too extreme (e.g., if they are in the 20-80% range), you'll probably get a fairly decent model with a t-test or ANOVA.  
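For reference, the equal-variance t-test mentioned here is just the difference in group means divided by a pooled standard error. Below is a minimal Python sketch with hypothetical percentages for the two levels of A; `pooled_t`, `b_a0`, and `b_a1` are illustrative names, not anything from the thread. In SPSS, the T-TEST command reports this statistic in its "Equal variances assumed" row.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(group0, group1):
    """Student's two-sample t statistic with a pooled (equal-variance) standard error."""
    n0, n1 = len(group0), len(group1)
    # Pooled variance: the two sample variances weighted by their degrees of freedom.
    sp2 = ((n0 - 1) * variance(group0) + (n1 - 1) * variance(group1)) / (n0 + n1 - 2)
    se = sqrt(sp2 * (1 / n0 + 1 / n1))
    return (mean(group1) - mean(group0)) / se, n0 + n1 - 2


# Hypothetical percentages (B) for each level of A, all inside the 20-80% range.
b_a0 = [35.0, 42.0, 38.0, 50.0, 45.0]
b_a1 = [55.0, 60.0, 48.0, 62.0, 58.0]
t, df = pooled_t(b_a0, b_a1)
print(f"t = {t:.3f} on {df} df")
```

This treats each percentage as an ordinary continuous score and ignores how many trials lie behind it. That simplification is exactly what the rest of the thread debates.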

But, there are alternatives.  For example, if you have the numerators and denominators for the percentages, you could use GENLIN to run what I would call a binomial (as opposed to a binary) logistic regression.  I don't have SPSS on this computer, but I believe the syntax would look something like this:


* Binomial logistic regression: Numerator = events, Denominator = trials (hypothetical variable names).
* General form: GENLIN events OF trials BY factors WITH covariates; here the only factor is A.
GENLIN Numerator OF Denominator BY A
  /MODEL A INTERCEPT=YES DISTRIBUTION=BINOMIAL LINK=LOGIT
  /MISSING CLASSMISSING=EXCLUDE
  /PRINT CPS DESCRIPTIVES MODELINFO FIT SUMMARY SOLUTION (EXPONENTIATED).

Notice that the outcome for this model is not a single variable, but is specified as Numerator OF Denominator (or Events-Var OF Trials-Var, as it says in the FM).  The Exp(B) column in the table of coefficients yields odds ratios for this type of model.
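If the model has only one two-level factor, such as A, you can check its Exp(B) by hand. The model reproduces each group's pooled proportion (total events / total trials), so Exp(B) is just the ratio of the two pooled odds. Below is a minimal Python sketch with hypothetical counts; `binomial_logit_or` and the data are illustrative. As far as I know, SPSS uses the last category of a factor as the reference. If so, the [A=0] row of the GENLIN output would show the reciprocal of this value.

```python
from math import exp, log


def binomial_logit_or(events0, trials0, events1, trials1):
    """Odds ratio (group 1 vs group 0) implied by a binomial logit model with one binary factor.

    With a single two-level factor the model is saturated for the two group
    probabilities, so the maximum-likelihood fitted probability in each group is
    total events / total trials.
    """
    p0 = sum(events0) / sum(trials0)
    p1 = sum(events1) / sum(trials1)
    b = log(p1 / (1 - p1)) - log(p0 / (1 - p0))  # coefficient B on the log-odds scale
    return exp(b)


# Hypothetical numerators and denominators for each case, split by level of A.
or_a1_vs_a0 = binomial_logit_or([3, 10, 4], [10, 20, 10], [6, 15, 7], [10, 20, 10])
print(f"Exp(B) for A=1 vs A=0: {or_a1_vs_a0:.3f}")
```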

HTH.


--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING: 
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).

Re: T-test or ANOVA with a percentage dependent variable

Rich Ulrich
I will add a little to what Bruce says here about "independently
and identically distributed as normal" with regard to proportions.

Proportions in the 20-80% range have fairly similar variances *when*
they have the same denominator: the binomial variance p(1-p)/n is
0.25/n at p = .5 and still 0.16/n at p = .2 or .8. That is why a t-test
on two groups of such proportions gives a fine test. For a wider range
of proportions, a good test will use a variance-stabilizing transformation
or the equivalent, such as a logit or probit. I think that is the effect
of Bruce's GENLIN solution.

But the formula that says the variances are nearly equal relies on equal
N (the same denominator for every proportion). If the N's vary a lot, the
expected variances vary, too. In that case there is probably NO way to
get an ideal test, and the problem becomes one of finding the best
approximation.
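The two paragraphs above both come down to the binomial sampling variance of an observed proportion, p(1-p)/n. The minimal Python sketch below varies p with n held fixed, and then varies n with p held fixed. The denominators are hypothetical. With n fixed, the variance at p = .2 or .8 is 0.64 of its maximum at p = .5, and at p = .05 it is only 0.19 of that maximum. With p fixed, a proportion based on 5 trials has 20 times the variance of one based on 100 trials.

```python
def prop_variance(p, n):
    """Binomial sampling variance of an observed proportion: p(1 - p) / n."""
    return p * (1 - p) / n


# Same denominator, different p: variance relative to its maximum at p = 0.5.
for p in (0.05, 0.2, 0.5, 0.8, 0.95):
    rel = prop_variance(p, 50) / prop_variance(0.5, 50)
    print(f"p = {p:.2f}, n = 50: relative variance = {rel:.2f}")

# Same p, different denominators (e.g., hypothetical numbers of trials per person).
for n in (5, 20, 100):
    sd = prop_variance(0.5, n) ** 0.5
    print(f"p = 0.50, n = {n:3d}: SD of observed proportion = {sd:.3f}")
```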

When the N's vary a lot, the better tests will make use of the counts
that make up the proportions (testing with Maximum Likelihood, I think).
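Here is one concrete way the counts matter. A t-test on raw proportions gives 1/2 the same weight as 40/100. Now suppose everyone in a group shares one underlying p, with no extra-binomial variation (an assumption). The binomial likelihood is then maximized at total passes / total trials. The minimal Python sketch below uses hypothetical (passes, trials) pairs for a single group.

```python
from statistics import mean

# Hypothetical (passes, trials) for the people in one group; denominators vary a lot.
people = [(1, 2), (0, 3), (40, 100), (45, 100)]

# Each person's proportion weighted equally, as a t-test on the proportions would do.
mean_of_props = mean(x / n for x, n in people)
# Maximum-likelihood estimate of a common binomial p: total passes / total trials.
pooled = sum(x for x, _ in people) / sum(n for _, n in people)
print(f"mean of proportions = {mean_of_props:.3f}, pooled (count-based) = {pooled:.3f}")
```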


On the other hand, a *lot* of people regularly ignore this precaution,
and it does not necessarily hurt them much: when an effect is huge and
systematic, it will show up almost regardless of such problems. Assuming
that outliers with small N's do not create or erase an apparent effect,
the problem of unequal variances shows itself (say, if you were to do
Monte Carlo testing of the circumstances) as a loss of degrees of freedom
for the t-test. So the effect on the nominal test is not large, as long as
you still have moderate d.f., so that the cutoff is still approximately 2.0.
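Rich describes this loss of degrees of freedom in terms of Monte Carlo testing. The Welch-Satterthwaite approximation is an analytic way to see the same thing, and I believe SPSS's T-TEST uses it for its "Equal variances not assumed" row. The minimal Python sketch below uses hypothetical proportions, with one group much more variable than the other (as small denominators would produce). The pooled test claims 10 df. The Welch approximation gives about 5, which moves the two-sided .05 cutoff from roughly 2.23 to roughly 2.57.

```python
from statistics import variance


def welch_df(group0, group1):
    """Welch-Satterthwaite approximate degrees of freedom for two groups with unequal variances."""
    v0 = variance(group0) / len(group0)  # squared standard error of group 0's mean
    v1 = variance(group1) / len(group1)  # squared standard error of group 1's mean
    return (v0 + v1) ** 2 / (v0 ** 2 / (len(group0) - 1) + v1 ** 2 / (len(group1) - 1))


# Hypothetical proportions: group 0 is far more variable than group 1.
g0 = [0.10, 0.60, 0.30, 0.80, 0.20, 0.70]
g1 = [0.45, 0.50, 0.48, 0.52, 0.47, 0.49]
print(f"pooled df = {len(g0) + len(g1) - 2}, Welch df = {welch_df(g0, g1):.2f}")
```

With equal sample sizes and equal variances the approximation returns the full n0 + n1 - 2. The more unequal the variances, the closer it falls toward min(n0, n1) - 1.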

--
Rich Ulrich


Re: T-test or ANOVA with a percentage dependent variable

Kornbrot, Diana
In reply to this post by Bruce Weaver
Logistic regression is the most accurate method for proportions in most situations. Some situations are better served by Poisson. Normal-based methods are NEVER best.
Why use approximations when a more exact method is available?

Use Regression > Logistic if all predictors are between-subjects.
Use Mixed > ... > logistic if at least one predictor is within-subjects.

SPSS worked hard to provide correct methods for proportions.
It seems perverse to encourage people to use normal-based methods (t, ANOVA) for proportions.
Evidence that they are robust, even when .2 < p < .8, is poor, especially for unequal n, as noted in this thread.

As a defender of good practice in statistics to advance science, I am shocked by recommendations of inappropriate methods.
best

diana
On 12 May 2014, at 02:21, Bruce Weaver <[hidden email]> wrote:


___________
Professor Diana Kornbrot
Work
University of Hertfordshire
College Lane, Hatfield, Hertfordshire AL10 9AB, UK
+44 (0) 170 728 4626
[hidden email]
http://dianakornbrot.wordpress.com/
 http://go.herts.ac.uk/Diana_Kornbrot
skype:   kornbrotme
Home
19 Elmhurst Avenue
London N2 0LT, UK
 +44 (0) 208 444 2081

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Re: T-test or ANOVA with a percentage dependent variable

Maguin, Eugene
In reply to this post by Rich Ulrich

Rich,

Would you please elaborate a bit on your comment that equal N's are important? In case I'm misunderstanding the situation, here is what I'm assuming. You are referring to the case where Bruce's suggested analysis is used and the data come from some sort of test or task in which the number of items or trials varies widely from person to person. The data then consist of the number of items/trials and the number of 'passes'. I've never run across this type of data and am completely unfamiliar with it.

Thanks, Gene Maguin

From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Rich Ulrich
Sent: Monday, May 12, 2014 2:26 AM
To: [hidden email]
Subject: Re: T-test or ANOVA with a percentage dependent variable

Re: T-test or ANOVA with a percentage dependent variable

Bruce Weaver
Administrator
In reply to this post by Kornbrot, Diana
Given that Diana was "shocked" by my post, I feel somewhat compelled to defend my honour.  ;-)

I did not intend "you'll *probably* get a *fairly decent* model with a t-test or ANOVA" (if the proportions are not too extreme) to be read as a ringing endorsement of OLS methods for analysis of proportions.

In hindsight, I might have said "there are other BETTER options" rather than just "there are other options".  FWIW, I like the binomial logistic regression (via GENLIN) method that I suggested if the numerators & denominators are available.  



Re: T-test or ANOVA with a percentage dependent variable

Kornbrot, Diana
Bruce's honour is well intact; he's a great asset to the SPSS list.

I also like binomial logistic regression via GENLIN.
I mentioned MIXED because many psychologists think they have to fall back on ANOVA when they have repeated measures.
I am trying to promulgate the good things available in current SPSS.

best

Diana
On 12 May 2014, at 15:10, Bruce Weaver <[hidden email]> wrote:


Re: T-test or ANOVA with a percentage dependent variable

Jon K Peck
In reply to this post by Rich Ulrich
There is also the STATS PROPOR REGR extension command that fits a beta distribution to proportions and might work well here.  It appears under Generalized Linear Models once installed and, of course, does not require numerator and denominator information.
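To show what "fitting a beta distribution to proportions" means without counts, here is a rough Python illustration. It is not how the STATS PROPOR REGR extension estimates its model. As I understand it, beta regression is usually fitted by maximum likelihood with a logit link for the mean. This sketch instead fits a beta distribution to each group's proportions by the method of moments, using the mean/precision parameterization common in beta regression. `beta_moments` and the data are hypothetical. Values of exactly 0 or 1 fall outside the beta distribution's support and would need separate handling.

```python
from statistics import mean, variance


def beta_moments(props):
    """Method-of-moments fit of a beta(alpha, beta) distribution to proportions in (0, 1).

    Returns (mu, phi, alpha, beta): mu is the mean and phi = alpha + beta is the
    precision, so that the beta variance is mu * (1 - mu) / (phi + 1).
    """
    mu, var = mean(props), variance(props)
    phi = mu * (1 - mu) / var - 1  # only meaningful when var < mu * (1 - mu)
    return mu, phi, mu * phi, (1 - mu) * phi


# Hypothetical proportions for each level of A (no numerators or denominators needed).
groups = {"A=0": [0.35, 0.42, 0.38, 0.50, 0.45], "A=1": [0.55, 0.60, 0.48, 0.62, 0.58]}
for level, props in groups.items():
    mu, phi, a, b = beta_moments(props)
    print(f"{level}: mean = {mu:.3f}, precision = {phi:.1f}, alpha = {a:.2f}, beta = {b:.2f}")
```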


Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
phone: 720-342-5621




From:        Rich Ulrich <[hidden email]>
To:        [hidden email],
Date:        05/12/2014 12:27 AM
Subject:        Re: [SPSSX-L] T-test or ANOVA with a percentage dependent variable
Sent by:        "SPSSX(r) Discussion" <[hidden email]>




I will add a little bit to what Bruce says here about "independently
and identically distributed as normal" in regards to proportions.

Proportions in the 20-80% range do have the same variances, very
nearly, *when* they have the same denominator.  That is why a t-test
on two groups across such proportions would give a fine test.  For a
wider range of proportions, a good test will use a variance-stabilizing
transformation or the equivalent - like, a logit or probit.  I think that is
the effect of Bruce's GENLIN solution.

But the formula that says the variances are nearly equal relies on that
equal N.  If the N's vary a lot, then the expectations for the variances
vary, too.  When the N's vary a lot, then there is probably NO way to
get an ideal test, and the problem becomes one of finding the best
approximation.  

When the N's vary a lot, the better tests will make use of the counts
that make up the proportions (testing with Maximum Likelihood, I think).


On the other hand, a *lot* of people regularly do ignore this precaution,
and it does not necessarily hurt them much -- When an effect is huge and
systematic, it will show up almost regardless of such problems.  Assuming
that outliers with small N's do not create or uncreate an apparent effect,
the problem of unequal variances shows itself (say, if you were to do
Monte Carlo testing of the circumstances) as a loss of degrees of freedom
for the t-test.  So, the nominal effect is not large, assuming you still have
a moderate d.f.  so that the cut-off is still approximately 2.0 or so.

--
Rich Ulrich

> Date: Sun, 11 May 2014 18:21:37 -0700
> From: [hidden email]
> Subject: Re: T-test or ANOVA with a percentage dependent variable
> To: [hidden email]
>
> For OLS models, the main assumptions are that the *errors* are independently
> and identically distributed as normal with mean = 0 and some variance. Of
> those assumptions, normality is arguably the least important.
>
> What is the range of percentages in your data? If they are not too extreme
> (e.g., if they are in the 20-80% range), you'll probably get a fairly decent
> model with a t-test or ANOVA.
>
> But, there are alternatives. For example, if you have the numerators and
> denominators for the percentages, you could use GENLIN to run what I would
> call a binomial (as opposed to a binary) logistic regression. I don't have
> SPSS on this computer, but I believe the syntax would look something like
> this:
>
>
> GENLIN Numerator OF Denominator BY { list of factors } WITH { list of
> covariates }
> /MODEL { list of terms in the model} INTERCEPT=YES DISTRIBUTION=BINOMIAL
> LINK=LOGIT
> /MISSING CLASSMISSING=EXCLUDE
> /PRINT CPS DESCRIPTIVES MODELINFO FIT SUMMARY SOLUTION (EXPONENTIATED).
>
> Notice that the outcome for this model is not a single variable, but is
> specified as Numerator OF Denominator (or Events-Var OF Trials-Var, as it
> says in the FM). The Exp(B) column in the table of coefficients yields odds
> ratios for this type of model.
>
> HTH.


Re: T-test or ANOVA with a percentage dependent variable

Ryan
In reply to this post by Bruce Weaver
OT: With some advanced knowledge in math and access to the numerators and denominators for both groups, one could calculate *by hand* the estimated regression coefficient (log-odds ratio) and the respective standard error and Wald statistic. 

It would be an excellent exercise for someone interested in learning the underlying mechanics of maximum likelihood estimation to work out this problem by hand as opposed to using the built-in LOGISTIC REGRESSION procedure.
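A sketch of the exercise Ryan suggests, for a single two-level predictor: the closed-form log odds ratio with its standard error and Wald z, checked against a Newton-Raphson maximum-likelihood fit of logit(p) = b0 + b1*x on the grouped counts. The counts are hypothetical and the code assumes no zero cells. With one binary predictor the model is saturated, so the ML slope equals the sample log odds ratio, and exp(b1) is the odds ratio that an Exp(B) column would show.

```python
import math


def closed_form(events, trials):
    """Log odds ratio (group 1 vs group 0), its SE, and the Wald z.

    events[g], trials[g]: successes and trials in group g (g = 0 or 1).
    Assumes every cell (successes and failures in each group) is nonzero.
    """
    a, b = events[1], trials[1] - events[1]  # group 1: successes, failures
    c, d = events[0], trials[0] - events[0]  # group 0: successes, failures
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return log_or, se, log_or / se


def newton_raphson(events, trials, iterations=25):
    """ML fit of logit(p) = b0 + b1*x for grouped binomial data, x in {0, 1}."""

    def score_and_information(b0, b1):
        g0 = g1 = i00 = i01 = i11 = 0.0
        for x in (0, 1):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            resid = events[x] - trials[x] * p  # observed minus expected events
            w = trials[x] * p * (1.0 - p)      # binomial variance of the count
            g0 += resid
            g1 += resid * x
            i00 += w
            i01 += w * x
            i11 += w * x * x
        return g0, g1, i00, i01, i11

    b0 = b1 = 0.0
    for _ in range(iterations):
        g0, g1, i00, i01, i11 = score_and_information(b0, b1)
        det = i00 * i11 - i01 * i01
        # Newton step: (b0, b1) += inverse(information) @ score
        b0 += (i11 * g0 - i01 * g1) / det
        b1 += (-i01 * g0 + i00 * g1) / det

    # Standard error of b1 from the inverse information at the estimate
    _, _, i00, i01, i11 = score_and_information(b0, b1)
    se_b1 = math.sqrt(i00 / (i00 * i11 - i01 * i01))
    return b1, se_b1, b1 / se_b1


if __name__ == "__main__":
    events = {0: 20, 1: 30}  # hypothetical successes in groups 0 and 1
    trials = {0: 50, 1: 50}  # hypothetical trials in groups 0 and 1
    print(closed_form(events, trials))
    b1, se, z = newton_raphson(events, trials)
    print(b1, se, z, "odds ratio:", math.exp(b1))
```

The closed form is what you would work out with pencil and paper; the Newton-Raphson loop shows the iterative machinery a logistic regression procedure runs, and both give the same estimate and standard error here.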

Ryan


On Mon, May 12, 2014 at 10:10 AM, Bruce Weaver <[hidden email]> wrote:
Given that Diana was "shocked" by my post, I feel somewhat compelled to
defend my honour.  ;-)

I did not intend "you'll *probably* get a *fairly decent* model with a
t-test or ANOVA" (if the proportions are not too extreme) to be read as a
ringing endorsement  of OLS methods for analysis of proportions.

In hindsight, I might have said "there are other BETTER options" rather than
just "there are other options".  FWIW, I like the binomial logistic
regression (via GENLIN) method that I suggested if the numerators &
denominators are available.



Kornbrot, Diana wrote
> Logistic regression is the most accurate method for proportions in most
> situations. Some situations are better served by Poisson. Normal-based
> methods are NEVER best.
> Why use approximations when a more exact method is available?
>
> Use regression > logistic if all predictors are between-subject.
> Use mixed > logistic if at least one predictor is within-subject.
>
> SPSS worked hard to provide correct methods for proportions.
> It seems perverse to encourage people to use normal-based methods (t,
> ANOVA) for proportions.
> Evidence that they are robust, even when .2 < p < .8, is poor, especially
> for unequal n, as noted in this thread.
>
> As a defender of good practice in statistics to advance science, I am
> shocked by recommendations of inappropriate methods.
> best
>
> diana
> ___________
> Professor Diana Kornbrot
> University of Hertfordshire
> College Lane, Hatfield, Hertfordshire AL10 9AB, UK
> http://dianakornbrot.wordpress.com/
> http://go.herts.ac.uk/Diana_Kornbrot

-----
--
Bruce Weaver
[hidden email]
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

NOTE: My Hotmail account is not monitored regularly.
To send me an e-mail, please use the address shown above.

--
View this message in context: http://spssx-discussion.1045642.n5.nabble.com/T-test-or-ANOVA-with-a-percentage-dependent-variable-tp5725967p5725980.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD


Re: T-test or ANOVA with a percentage dependent variable

Rich Ulrich
In reply to this post by Maguin, Eugene
Why is it important that the Ns are equal? - The assumption that
gives you a good F-test (or t-test) says that the variances (of residuals)
should be equal.  When some proportions are based on Ns of 20
others on 2000, then looking at simple averages, ignoring N, can give
you bad inference - both for means and for testing.  I did mention that
the computable variance of any proportion depends on its N.
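A small illustration of this point, using the textbook binomial sampling variance of a proportion, p(1 - p)/N. The proportions and Ns are hypothetical. At a fixed N, moving across the 20-80% range changes the variance by at most a factor of 1.5625 (0.25/0.16), whereas going from N = 20 to N = 2000 changes it by a factor of 100.

```python
def proportion_variance(p: float, n: int) -> float:
    """Binomial sampling variance of a proportion p estimated from n trials."""
    return p * (1.0 - p) / n


if __name__ == "__main__":
    # Same N, proportions across the 20-80% range
    for p in (0.2, 0.35, 0.5, 0.65, 0.8):
        print(f"p={p:.2f}  N=100   var={proportion_variance(p, 100):.5f}")
    # Same p, very different N
    for n in (20, 2000):
        print(f"p=0.50  N={n:<5} var={proportion_variance(0.5, n):.6f}")
```

This is why equal denominators matter more than the exact value of p for the equal-variance assumption behind the t-test or F-test.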

Where it has annoyed me most has been in a couple of meta-analyses.

I think that such studies have improved as a response to heavy criticisms,
but I remember seeing meta-analyses that failed to take any note when
they combined and compared studies with vastly different Ns.  I have seen
a study with an N of something like 30 (*and* a weird study, with unlikely
results) weighted equally with one of 1500.  Modern meta-analyses should test for the consistency
of the separate results as a condition of freely discussing their mean outcome.
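A sketch of what taking N into account can look like: a fixed-effect, inverse-variance weighted mean of proportions, with Cochran's Q as the consistency (homogeneity) check. This is one standard approach, chosen here for illustration; Rich does not name a specific method. The two studies are hypothetical, echoing his example of an N of about 30 weighted equally with 1500. Q is compared with a chi-square on k - 1 d.f. (3.84 is the 5% cut-off for 1 d.f.).

```python
def pooled_proportion(props, ns):
    """Fixed-effect inverse-variance pooling of proportions, with Cochran's Q.

    props[i]: observed proportion in study i (must be strictly between 0 and 1)
    ns[i]:    number of cases in study i
    Returns (simple_mean, weighted_mean, Q, df).
    """
    weights = [n / (p * (1.0 - p)) for p, n in zip(props, ns)]  # 1 / variance
    simple = sum(props) / len(props)
    weighted = sum(w * p for w, p in zip(weights, props)) / sum(weights)
    # Cochran's Q: weighted squared deviations from the pooled estimate
    q = sum(w * (p - weighted) ** 2 for w, p in zip(weights, props))
    return simple, weighted, q, len(props) - 1


if __name__ == "__main__":
    props = [0.90, 0.40]  # hypothetical: a small odd study and a large one
    ns = [30, 1500]
    simple, weighted, q, df = pooled_proportion(props, ns)
    print(f"unweighted mean = {simple:.3f}")
    print(f"weighted mean   = {weighted:.3f}")
    print(f"Cochran's Q     = {q:.1f} on {df} df")
```

Here the unweighted mean is 0.65 while the weighted mean is about 0.425, and a Q of roughly 79 on 1 d.f. says the two results are far from consistent, so their average should not be discussed freely. Note that none of this can be computed from the proportions alone, without the Ns.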

 - If you have a set of proportions, without Ns, there is no way that you can
say that they are "consistent with each other" or not.   One of the best
uses of a mean is to represent a set of data that *are* consistent.  The
average size of a fish in a school of fish describes closely the size of every
fish.  That is (even) true when you only measure a few of them. The average
height or weight of every student in grades 1-12  is a pretty useless number,
when the only thing you know about Ns is that there are generally fewer in
the upper grades.

--
Rich Ulrich


Date: Mon, 12 May 2014 13:40:11 +0000
From: [hidden email]
Subject: Re: T-test or ANOVA with a percentage dependent variable
To: [hidden email]

Rich,

Would you please elaborate a bit on your comment that equal N’s are important? In case I’m misunderstanding the situation: I’m assuming that you are referring to the situation where Bruce’s analysis suggestion is being used and the data come from some sort of test or task where the number of items or trials varies widely from person to person. The data consist of the number of items/trials and the number of ‘passes’. I have never run across this type of data and am completely unfamiliar with it.

Thanks, Gene Maguin

From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Rich Ulrich
Sent: Monday, May 12, 2014 2:26 AM
To: [hidden email]
Subject: Re: T-test or ANOVA with a percentage dependent variable

 

I will add a little bit to what Bruce says here about "independently
and identically distributed as normal" in regards to proportions.

Proportions in the 20-80% range do have the same variances, very
nearly, *when* they have the same denominator.  That is why a t-test
on two groups across such proportions would give a fine test.  For a
wider range of proportions, a good test will use a variance-stabilizing
transformation or the equivalent - like, a logit or probit.  I think that is
the effect of Bruce's GENLIN solution.

But the formula that says the variances are nearly equal relies on that
equal N.  If the N's vary a lot, then the expectations for the variances
vary, too.  When the N's vary a lot, then there is probably NO way to
get an ideal test, and the problem becomes one of finding the best
approximation. 

When the N's vary a lot, the better tests will make use of the counts
that make up the proportions (testing with Maximum Likelihood, I think).


On the other hand, a *lot* of people regularly do ignore this precaution,
and it does not necessarily hurt them much -- When an effect is huge and
systematic, it will show up almost regardless of such problems.  Assuming
that outliers with small N's do not create or uncreate an apparent effect,
the problem of unequal variances shows itself (say, if you were to do
Monte Carlo testing of the circumstances) as a loss of degrees of freedom
for the t-test.  So, the nominal effect is not large, provided you still have
a moderate d.f., so that the cut-off is still approximately 2.0.

--
Rich Ulrich



Re: T-test or ANOVA with a percentage dependent variable

凌楚定
In reply to this post by 凌楚定
Thanks for all your insightful and helpful suggestions!



Chu-Ding Ling


2014-05-13 13:30 GMT+08:00 凌楚定 <[hidden email]>:
---------- Forwarded message ----------
From: "凌楚定" <[hidden email]>
Date: 2014-5-12 8:30 AM
Subject: T-test or ANOVA with a percentage dependent variable
To: <[hidden email]>
Cc: "毛程琦" <[hidden email]>


Dear list,

Now I have a question about t-test or ANOVA:

A is the independent variable and has two levels (0,1). B is the dependent variable, each of whose values is a percentage. By means of SPSS or other instruments, how to compare the means of B under different levels of A, given such a type of data?

My concern is that whether t-test or ANOVA still applies due to the percentage dependent variable which does not follow the normal distribution?

I would be grateful if any of you can give me some suggestions. Thanks!

Chu-Ding LING
Ph.D. Student of Business Administration
School of Management
Zhejiang University


Re: T-test or ANOVA with a percentage dependent variable

jamiecr
In reply to this post by Kornbrot, Diana
I realize this is a fairly old post, but I just came across it, as I have been confused about something.

I understand using logistic regression for binary data (i.e., the response for each subject is either success or failure), but it seems that people are suggesting logistic regression when the dependent variable is a proportion/percentage for *each* subject, and I want to confirm that this is correct. For example, if 2 groups (let's say men and women) of 50 subjects were each given a 20-item test, where each score was the number correct (so each subject can have a score ranging from 0 to 20), is logistic regression still appropriate? It just seems that the data are expressed differently in the two cases. I would greatly appreciate any clarification!

Thanks!

Re: T-test or ANOVA with a percentage dependent variable

Ryan
Look up the difference between binary (event versus non-event) logistic regression and binomial (# events / # trials) logistic regression.
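A sketch of the distinction, applied to jamiecr's 20-item example and assuming each item is an independent trial with a common success probability (between-subject overdispersion is ignored here, and only an intercept-only model is shown). The log-likelihood is computed once in the grouped "Events OF Trials" (binomial) form and once after expanding each subject into 20 binary item records. The scores are hypothetical. The two differ only by a constant, the sum of the log binomial coefficients, so they are maximised at the same estimate.

```python
import math


def binomial_loglik(p, scores, items):
    """Grouped form: one record per subject, 'scores[i] OF items[i]'."""
    return sum(
        math.log(math.comb(n, y)) + y * math.log(p) + (n - y) * math.log(1.0 - p)
        for y, n in zip(scores, items)
    )


def binary_loglik(p, scores, items):
    """Expanded form: one 0/1 record per item answered."""
    total = 0.0
    for y, n in zip(scores, items):
        rows = [1] * y + [0] * (n - y)  # y correct items, n - y incorrect
        total += sum(math.log(p) if r else math.log(1.0 - p) for r in rows)
    return total


if __name__ == "__main__":
    scores = [14, 9, 17, 20, 11]  # hypothetical number correct out of 20
    items = [20] * len(scores)
    for p in (0.3, 0.5, 0.7):
        diff = binomial_loglik(p, scores, items) - binary_loglik(p, scores, items)
        print(f"p={p}: difference = {diff:.6f}")  # same value for every p
    print("ML estimate of p:", sum(scores) / sum(items))
```

Because the difference does not depend on p, a binomial (# events / # trials) logistic regression on per-subject scores and a binary logistic regression on the expanded item-level records give the same coefficients under these assumptions.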

Ryan


On Sun, Aug 24, 2014 at 1:28 PM, jamiecr <[hidden email]> wrote:
I realize this is a fairly old post, but I just came across it, as I have
been confused about something.

I understand using logistic regression for binary data (i.e., response for
each subject is either success or failure), but it seems that people are
suggesting logistic regression when the dependent variable is a
proportion/percentage for *each* subject, and I want to confirm this is
correct. For example, if 2 groups (let's say men and women) of 50 subjects
were each given a 20-item test, where each score was the number correct (so
each subject can have a score ranging from 0 to 20), is logistic regression
still appropriate? It just seems that the data are expressed differently in
the two cases. I would greatly appreciate any
clarification!

Thanks!




RE: T-test or ANOVA with a percentage dependent variable

MaxJasper
In reply to this post by jamiecr

You need to define what counts as success and what counts as failure.


Re: T-test or ANOVA with a percentage dependent variable

jamiecr
In reply to this post by Ryan
Yes, that's it! Thank you!