Minimum N to get significance- paired sample t-test


Minimum N to get significance- paired sample t-test

Anton-24
A client of a colleague wants to know the minimum number of cases it would
take to get a significant result (as close to .050 as possible, I guess)
using a paired sample t-test, given her actual results (the two means, the
standard deviations, and the correlation). She also has the raw data, if
that helps.

Does anyone have a recommendation on where to start?

I've already looked for an online calculator to get the N and considered
programming something in Excel.

We already had the discussion about effect sizes and power analysis, but she
is steadfast in her desire to know the exact N.

Thanks in advance for any suggestions.


Re: Minimum N to get significance- paired sample t-test

Andy W
This is typically called post hoc power analysis, and it is generally frowned upon; see this Q&A on CrossValidated for various references: http://stats.stackexchange.com/a/12127/1036

Given the difference in means and the standard deviation of the differences, you can estimate the sample size needed to detect a particular effect. See below.

********************************************.
input program.
loop #i = 3 to 300.
compute n = #i.
end case.
end loop.
end file.
end input program.
dataset name pow.
exe.

*https://en.wikipedia.org/wiki/Student's_t-test#Dependent_t-test_for_paired_samples.
*Paired Sample T-test.
compute DF = n - 1. /*Degrees of Freedom.

*You fill these in based on sample stats.
compute #X = 1. /*Difference in means.
compute #null = 0. /*Null.
compute #sd = 3. /*Standard deviation of differences.

compute t = (#X - #null)/(#sd/SQRT(n)). /*T-value for your particular values given different sample sizes.
compute p = 1 - CDF.T(t,DF). /*One tailed test - for two tailed just multiply by two.
exe.
********************************************.
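
The OP asked for the minimum N itself, so a small (untested) follow-up to the syntax above would pull that number out directly -- the first case listed is the smallest n whose two-tailed p falls below .05, given these sample statistics.

********************************************.
compute p2 = 2*(1 - CDF.T(ABS(t),DF)). /*Two-tailed p-value.
exe.
temporary.
select if p2 LT .05.
list variables = n t p2 /cases = from 1 to 1. /*Smallest n with two-tailed p < .05.
********************************************.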

I'd suggest checking out G*Power 3 (http://www.psycho.uni-duesseldorf.de/abteilungen/aap/gpower3/), a free tool for conducting a priori power analysis for a variety of experimental designs. I believe SPSS has add-ons for power analysis as well, though I have not used them.
Andy W
apwheele@gmail.com
http://andrewpwheeler.wordpress.com/

Re: Minimum N to get significance- paired sample t-test

henryilian
In reply to this post by Anton-24
I used Russ Lenth's applet for sample size and power (http://homepage.cs.uiowa.edu/~rlenth/Power/) and got an N of 34 for a two-tailed test with an alpha level of 0.05 and an 80% chance of detecting a true difference between means of .5 or greater.  You can use the applet to change some of the conditions and come up with a different sample size if you want.  For example, if the meaningful difference you're trying to detect is larger, the required sample size will be smaller, and vice versa.
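
As a rough cross-check of that number (an untested sketch on my part, and assuming the .5 is a standardized difference, i.e. half a standard deviation of the paired differences), the same search can be done in syntax with the non-central t:

********************************************.
input program.
loop #i = 5 to 100.
compute n = #i.
end case.
end loop.
end file.
end input program.
dataset name chk.
compute df = n - 1.
compute ncp = 0.5*SQRT(n). /*Assumed standardized mean difference of 0.5.
compute tcrit = IDF.T(0.975, df). /*Two-tailed critical value, alpha = .05.
compute power = 1 - NCDF.T(tcrit, df, ncp) + NCDF.T(-tcrit, df, ncp).
exe.
temporary.
select if power GE .80.
list variables = n power /cases = from 1 to 1. /*Smallest n with at least 80% power (about 34).
********************************************.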

Good luck,

Henry


Re: Minimum N to get significance- paired sample t-test

Maguin, Eugene
In reply to this post by Andy W
I'm certainly not an expert, but I was expecting to see a non-central cdf rather than a central cdf. Is a non-central distribution used for power? And, if not, what is it used for?
Thanks Gene Maguin


Re: Minimum N to get significance- paired sample t-test

Andy W
I'm not an expert either (the syntax above should probably use a DO IF, or ABS(), to handle negative mean differences)! I answered the question at hand: given the difference in means, what sample size is needed to obtain a p-value below .05 (assuming a *null* of no difference and known variance)? Given this setup you wouldn't use a non-central t-distribution, as the null is no difference. [Note that what I presented is not a power analysis - at least not in full.]

Based on the Wikipedia page, http://en.wikipedia.org/wiki/Noncentral_t-distribution, you would use a non-central t for a power analysis proper - that is, when the t statistic's distribution is evaluated under some pre-specified non-zero difference rather than under the null. You probably know as well as I do, though.
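
For what it's worth, here is a rough, untested sketch of that non-central t calculation, reusing the illustrative numbers from the earlier syntax (mean difference 1, SD of differences 3, two-tailed alpha of .05) and the "pow" dataset it created:

********************************************.
compute #d = 1/3. /*Assumed effect size: mean difference / SD of differences.
compute ncp = #d*SQRT(n). /*Non-centrality parameter.
compute tcrit = IDF.T(0.975, DF). /*Two-tailed critical value at alpha = .05.
compute power = 1 - NCDF.T(tcrit, DF, ncp) + NCDF.T(-tcrit, DF, ncp). /*Power given the assumed effect.
exe.
********************************************.
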
Andy W
apwheele@gmail.com
http://andrewpwheeler.wordpress.com/

Re: Minimum N to get significance- paired sample t-test

Bruce Weaver
Administrator
In reply to this post by Andy W
Just a bit of terminological nit-picking here.  ;-)  

I would not call what the OP's colleague wants to do post hoc power analysis.  As I understand that term, it would entail plugging in the observed means, SDs and correlation, and computing the observed power.  But observed power is just a transformation of the p-value.  I.e., if p < .05, observed power will be adequate; and if p > .05, observed power will be inadequate.  That is why post hoc power analysis is frowned upon -- it doesn't tell us anything we didn't already know from looking at the p-value.
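
A quick way to see this (an untested sketch with made-up numbers): compute the observed power directly from the observed t and df via the non-central t, treating the observed t as the non-centrality parameter.  When the observed t sits right at the critical value -- i.e., p = .05 -- the observed power comes out at roughly 0.5.

********************************************.
data list free / tobs df.
begin data
2.045 29
end data.
compute tcrit = IDF.T(0.975, df). /*Two-tailed critical value, alpha = .05.
compute obspow = 1 - NCDF.T(tcrit, df, tobs) + NCDF.T(-tcrit, df, tobs). /*Observed power.
exe.
list variables = tobs df tcrit obspow. /*With tobs right at the cutoff, obspow is about .5.
********************************************.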

But that is not what the OP's colleague wants to do.  Rather, she wants to know what sample size would be required (presumably in a future study?) to achieve p = .05 given the same means, SDs and correlation that were observed, and some desired level of power.  That's not the same thing.  It is really just an ordinary sample size estimate where the means, SDs and correlation are taken from the current study.

There's my tuppence, FWIW!

p.s. - Russell Lenth, who is mentioned elsewhere in this thread, has some of this on his Power & Sample Size page (http://homepage.stat.uiowa.edu/~rlenth/Power/) and in this summary of a conference presentation he gave:  http://homepage.stat.uiowa.edu/~rlenth/Power/2badHabits.pdf.



--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."


Re: Minimum N to get significance- paired sample t-test

Mike
Not to start a religious war about statistical practices, but
whether one accepts the legitimacy of doing retroactive
power analysis/sample size estimation/effect size estimation
(as Jack Cohen has shown) or reject this as a legitimate
practice (which Russell Lenth and some others have advocated)
comes down to a fundamental question:

What do you know about the POPULATION situation?

Lenth and others hold that proper power analysis, proper effect
size measures, and additional analyses such as sample size
estimation can ONLY be done with population parameters,
not sample statistics.

Quoting Lenth from his "Bad Habits" paper Bruce links to below:

|Retrospective power analysis comprises a number of different
|practices that involve computing the power of a
|test based on observed data. Personally, I think I can do
|without all such practices; but some of them are more
|understandable than others. The one that I really don't
|like is the idea of computing power using observed data,
|with the observed error variance and the observed effect
|size. (page 2, Section 3)

NOTE: Lenth's argument about p-values containing all the info you need is misleading because the p-value tells you nothing about the population parameters involved.  If the researcher really does not know what the values of the population parameters are or should be, the sample estimates are the only guides they have.  The problem is that these estimates have error (sampling and probably other types as well), so retrospective power analysis and related activities are APPROXIMATE at best unless one can come up with ways of reducing the error.

Ideally, one somehow knows the population parameters that one
wants to use in a (prospective) power analysis.  One obtains such
information from theoretical considerations, maybe from a meta-analysis,
or from voices in one's head, and then does the power analysis.
Actually having data that provide sample estimates of the very parameters one is interested in is, according to Lenth & Co., useless.

So, one has to ask: what are the population parameters involved?  Most researchers have no idea what the answer to that question is, so, according to Lenth & Co., they're SOL.

-Mike Palij
New York University
[hidden email]


Re: Minimum N to get significance- paired sample t-test

Rich Ulrich
In reply to this post by Anton-24
Reporting the N that would be required for significance is somewhat common (I think) in studies published in Europe.  It is a useful reminder of the practical significance of the observed effect.  The term that comes to my mind is "effective N", but I know that I use that term for something else, so I would like to be reminded of what this N is called....

For a fixed effect size, the paired t is proportional to the square root of N, the number of pairs (equivalently, t-squared is proportional to N), so this is a simple enough algebra problem to do by hand -- you do have the paired t-test on the data you are interested in, or can get it.

Since the cutoff for a two-tailed, 0.05 test is always near 2.0, start with this crude approximation --
  If your observed t is about 4.0, you have roughly four times the cases needed;
  if it is about 1.0, you need roughly four times your current N;
  in general, scale your observed N by (2.0/t) squared to get an approximate value of N.

Look up the exact cut-off t-value for that N (presumably something between 1.96 and 2.05), then solve for the exact N by applying the same algebra to the observed t, the exact cutoff t, and the observed N.
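
In syntax, that algebra is only a few lines (an untested sketch with made-up numbers -- say an observed paired t of 1.50 from 20 pairs):

********************************************.
data list free / tobs nobs.
begin data
1.50 20
end data.
compute es = tobs/SQRT(nobs). /*Observed effect size: mean difference / SD of differences.
compute napprox = nobs*((2/tobs)**2). /*Crude first guess using a cutoff of 2.0.
compute tcrit = IDF.T(0.975, RND(napprox) - 1). /*Exact two-tailed cutoff at that approximate N.
compute nreq = (tcrit/es)**2. /*N needed for t to reach the cutoff; round up.
exe.
list variables = tobs nobs napprox tcrit nreq.
********************************************.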

--
Rich Ulrich



Re: Minimum N to get significance- paired sample t-test

Bruce Weaver
Administrator
In reply to this post by Mike
See below.

Mike Palij wrote
--- snip ---

Quoting Lenth from his "Bad Habits" paper Bruce links to below:

|Retrospective power analysis comprises a number of different
|practices that involve computing the power of a
|test based on observed data. Personally, I think I can do
|without all such practices; but some of them are more
|understandable than others. The one that I really don't
|like is the idea of computing power using observed data,
|with the observed error variance and the observed effect
|size. (page 2, Section 3)

NOTE: Lenth's argument about p-values containing all the info
you need is misleading because the p-value tell you nothing about
the population parameters involved.  If the researcher really
does not know what the values of the population parameters
are or should be, the sample estimates are the only guides
they have.  The problem is these estimates have error (sampling
and probably other types as well), so retrospective power
analysis and related activities are APPROXIMATE at best
unless one can come up with ways of reducing the error.
Mike, I find the first couple lines of your note above confusing.  I don't think Lenth was talking about what one *needs* for (proper) power analysis at that point.  Rather, he was talking about what one *gets* when doing post hoc power analysis:  the observed power estimate and the "least significant number" (LSN) don't provide anything new that isn't more or less captured by the p-value.  That is the point of the table shown at the bottom of page 2.  Talking about that table, he says:

"It is immediately obvious that as the P value increases, retrospective power decreases, and least significant number increases. In fact, both are simply transformations of the P values. It can further be shown that when the P value is equal to [alpha], the retrospective power is approximately 0.5. That is true because the empirical effect size is right at the boundary of the critical region, so that about half of the probability falls in the critical region."

When I posted my earlier reply, I had forgotten about Lenth's discussion of the LSN.  This is exactly what the OP's colleague was trying to do--i.e., compute the minimum sample size to achieve statistical significance, given the same means, SDs and correlation.  So Lenth would probably have been harder on the OP's colleague than I was.

I need to digest some of your other points a bit more, but will say this:  I've never read Lenth as suggesting that one needs to know the actual population parameters to compute a proper sample size estimate.  If that were required, we would indeed be "SOL", because parameters are almost always unknowns.  I think what he is suggesting is that rather than using Cohen's d (or other such standardized measures of effect size), we need to plug in estimated values for the parameters that correspond to minimally important differences (for t-tests, for example), or minimally important measures of association, etc, that we want to have power to detect.  If the sample size is large enough to give us 80% (or 90%) power to detect that minimally important difference, we'll have even greater power if the actual difference is larger.

Another general approach to power analysis is to simulate a population with characteristics that correspond to the "minimally important difference" (or other measure of effect size) that one wishes to have power to detect.  (The simulated population would also have the estimated variances, correlations, etc., depending on the nature of the problem.)  Then sample repeatedly from that population (using the same sample size each time), run the proposed analysis, and save the p-value.  Power = the proportion of p-values < alpha.  Note that this approach also requires estimates of the population parameters, not standardized effect size measures.
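
In SPSS syntax, the idea would look roughly like this (an untested sketch with purely illustrative values: a population mean difference of 1, SD of differences of 3, and 1000 simulated studies of n = 30 pairs each):

********************************************.
set seed = 20130716.
input program.
loop #s = 1 to 1000. /*1000 simulated studies.
loop #i = 1 to 30. /*30 pairs per study.
compute study = #s.
compute diff = RV.NORMAL(1, 3). /*Paired difference drawn from the assumed population.
end case.
end loop.
end loop.
end file.
end input program.
dataset name sim.
exe.

*Paired t-test for each simulated study, computed from the study-level summaries.
aggregate outfile=* /break=study /mdiff=MEAN(diff) /sddiff=SD(diff) /npairs=N.
compute t = mdiff/(sddiff/SQRT(npairs)).
compute p = 2*(1 - CDF.T(ABS(t), npairs - 1)).
compute sig = (p LT .05).
exe.
*The mean of sig estimates power at n = 30 for this assumed population.
descriptives variables = sig /statistics = mean.
********************************************.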

I hope I've not fanned the flames excessively.  ;-)

Cheers!
Bruce
--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."


Re: Minimum N to get significance- paired sample t-test

Mike
On Tuesday, July 16, 2013 4:25 PM, Bruce Weaver wrote:

> See below.
> Mike Palij wrote
>> --- snip ---
>> Quoting Lenth from his "Bad Habits" paper Bruce links to below:
>>
>> |Retrospective power analysis comprises a number of different
>> |practices that involve computing the power of a
>> |test based on observed data. Personally, I think I can do
>> |without all such practices; but some of them are more
>> |understandable than others. The one that I really don't
>> |like is the idea of computing power using observed data,
>> |with the observed error variance and the observed effect
>> |size. (page 2, Section 3)
>>
>> NOTE: Lenth's argument about p-values containing all the info
>> you need is misleading because the p-value tell you nothing about
>> the population parameters involved.  If the researcher really
>> does not know what the values of the population parameters
>> are or should be, the sample estimates are the only guides
>> they have.  The problem is these estimates have error (sampling
>> and probably other types as well), so retrospective power
>> analysis and related activities are APPROXIMATE at best
>> unless one can come up with ways of reducing the error.
>
> Mike, I find the first couple lines of your note above confusing.  I don't
> think Lenth was talking about one *needs* for (proper) power analysis at
> that point.  Rather, he was talking about what one *gets* when they do
> post
> hoc power analysis:

I believe you are wrong on this point, because Lenth does not believe in retrospective power analysis.  I assume that LSN is provided by SAS and used by researchers because it tells one how many more subjects one would need for statistical significance.  Look at the entry for Hg (p = 0.0258, power = 0.61, LSN = 157) and then look at BMI (p = 0.11, power = 0.36, LSN = 304) -- Hg is significant but BMI is not: is that because the null hypothesis is true for BMI, or because the effect size for BMI is small?  If BMI is theoretically important, then knowing how many more subjects one would need to run to get statistically significant results is useful.  The remaining variables with LSN = 1002 either have tiny effect sizes or the null hypothesis is true for them, but the LSN lets the researcher know what sample size would be needed if one of them is important, while the p-values for these variables range from 0.48 to 0.72.  Do the p-values and the LSN really provide the same information here?

> [snip]
> When I posted my earlier reply, I had forgotten about Lenth's discussion
> of
> the LSN.  This is exactly what the OP's colleague was trying to do--i.e.,
> compute the minimum sample size to achieve statistical significance, given
> the same means, SDs and correlation.  So Lenth would probably have been
> harder on the OP's colleague than I was.

I believe you are correct on this point and that Lenth would argue that it
was a pointless analysis.

> I need to digest some of your other points a bit more, but will say this:
> I've never read Lenth as suggesting that one needs to know the /actual/
> population parameters to compute a proper sample size estimate.  If that
> were required, we would indeed be "SOL", because parameters are almost
> always unknowns.  [snip]

Here is a quote from the following article:
Faul, F., Erdfelder, E., Lang, A. G., & Buchner, A. (2007). G*Power 3: A flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behavior Research Methods, 39(2), 175-191.

|Post Hoc Power Analyses
|In contrast to a priori power analyses, post hoc power
|analyses (Cohen, 1988) often make sense after a study
|has already been conducted. In post hoc analyses, 1 - beta
|is computed as a function of alpha, the population effect size
|parameter, and the sample size(s) used in a study. It thus
|becomes possible to assess whether or not a published
|statistical test in fact had a fair chance of rejecting an incorrect
|H0. Importantly, post hoc analyses, like a priori
|analyses, require an H1 effect size specification for the
|underlying population. Post hoc power analyses should
|not be confused with so-called retrospective power analyses,
|in which the effect size is estimated from sample
|data and used to calculate the observed power, a sample
|estimate of the true power. (Ref note 1) Retrospective power
|analyses are based on the highly questionable assumption that
|the sample effect size is essentially identical to the effect
|size in the population from which it was drawn (Zumbo &
|Hubley, 1998). Obviously, this assumption is likely to be
|false, and the more so the smaller the sample. In addition,
|sample effect sizes are typically biased estimates of their
|population counterparts (Richardson, 1996). For these
|reasons, we agree with other critics of retrospective power
|analyses (e.g., Gerard, Smith, & Weerakkody, 1998; Hoenig
|& Heisey, 2001; Kromrey & Hogarty, 2000; ****Lenth,
|2001****; Steidl, Hayes, & Schauber, 1997). Rather than use
|retrospective power analyses, researchers should specify
|population effect sizes on a priori grounds. To specify the
|effect size simply means to define the minimum degree
|of violation of H0 a researcher would like to detect with
|a probability not less than 1 - beta. Cohen's definitions of
|small, medium, and large effects can be helpful in such
|effect size specifications (see, e.g., Smith & Bayen, 2005).

The article can be obtained here:
www.its.fsu.edu/index.php/content/download/51987/428157/file/Faul2007.pdf

Given the above, I'll leave it to you to decide whether we're
all SOL. :-)

> Another general approach to power analysis is to simulate a population
> with
> characteristics that correspond to the "minimally important difference"
> (or
> other measure of effect size) that one wishes to have power to detect.
> (The
> simulated population would also have the estimated variances, correlations
> etc, depending on the nature of the problem.  Then sample repeatedly from
> that population (using the same sample size each time), run the proposed
> analysis and save the p-value.  Power = the proportion of p-values <
> alpha.
> Note that this approach also requires estimates of the population
> parameters, not standardized effect size measures.

I don't have a problem with this but I'm not the one making the rules. ;-)

> I hope I've not fanned the flames excessively.  ;-)

Please, it's hot enough in NYC. ;-)

> Cheers!
> Bruce

-Mike Palij
New York University
[hidden email]


Re: Minimum N to get significance- paired sample t-test

Bruce Weaver
Administrator
"I believe you are wrong on this point because Lenth does not believe in retrospective power analysis."

Mike, once again you've confused me.  It is very clear in Lenth's article that he does not promote retrospective power analysis.  But I don't think I made any arguments that depend on Lenth promoting retrospective power analysis.  Hence my confusion over being thought wrong "because Lenth does not believe in retrospective power analysis".

Whenever you post, I generally find myself on pretty much the same page, and I suspect we're actually in pretty close agreement on this issue.  But for some reason, we're having trouble expressing things in ways that the other understands!  (Maybe we can blame it on the heat?)  ;-)

Cheers,
Bruce



--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."


Re: Minimum N to get significance- paired sample t-test

Mike
On Tuesday, July 16, 2013 8:47 PM, Bruce Weaver wrote:
> "I believe you are wrong on this point because Lenth does not believe in
> retrospective power analysis."
>
> Mike, once again you've confused me.  It is very clear in Lenth's article
> that he does not promote retrospective power analysis.  But I don't think
> I
> made any arguments that depend on Lenth promoting retrospective power
> analysis.  Hence my confusion over being thought wrong "because Lenth does
> not believe in retrospective power analysis".

Maybe it is the heat. ;-)   But let me see if I can clear things up.

(1) The LSN calculations that he presents rely on sample information, such as the sample effect size, which, because he does not believe in retrospective power analysis, is invalid from his perspective.  I assume that Lenth was referring to a formula for LSN like the one given by SAS/JMP on the following website:
http://www.jmp.com/support/help/Power_Calculations.shtml
Note that LSN on this website is calculated in the context of a prospective study and requires the specification of an effect size measure.  It seems odd to me that Lenth is trying to make points about why one should use p-values instead of LSNs when he doesn't believe in the LSNs that were calculated.

(2) Lenth makes the following statement:

|Similarly, LSNs don't add new information. True, if
|we collect more data to bring N up to the LSN, and the
|effect size stays the same, then we'll obtain statistical
|significance. Such a strategy is strictly asterisk-hunting:
|let's do whatever it takes to make P<05. Instead, as
|in the preceding section, I recommend consulting with
|subject-matter experts before the data are collected to determine
|absolute effect-size goals that are consistent with
|the scientific goals of the study.

Well, Lenth is to be commended for his high-mindedness in eschewing statistical analysis done just to obtain statistically significant results, but perhaps he is overestimating the resources that a researcher may have.  And does the LSN really not provide any additional information?  Consider the following description from the SAS website for the POWER macro, which does both prospective and retrospective power analysis:

|Least Significant Number (LSN)
|
|Is the number of observations needed to reduce the variance of the
|estimates enough to achieve a significant result with the given values
|of alpha, sigma, and delta. If you need more data to achieve significance,
|the LSN helps tell you how many more. The LSN has the following
|characteristics:
|
|If the LSN is less than the actual sample size N, then the effect is
|significant. This means that you have more data than you need to
|detect the significance at the given alpha level.
|
|If the LSN is greater than the actual sample size N, the effect is
|not significant. In this case, if you believe that more data will show
|the same variance and structural results as the current sample, the
|LSN suggests how much data you would need to achieve significance.
|
|If the LSN is equal to N, then the p-value is equal to the significance
|level, alpha. The test is on the border of significance.
|
|Power calculated when N=LSN is always greater than or equal to 0.5.
|
|Power when N=LSN represents the power associated with using the
|sample size recommended by the LSN.
http://support.sas.com/kb/25/011.html

One could invert Lenth's argument and ask why one needs p-values at all when one can determine whether one has an adequate sample size N by comparing it to the LSN.  At least with N you can always add more subjects if your N is less than the LSN.  What are you going to do if your p-value is greater than your alpha (e.g., 0.05)?

> Whenever you post, I generally find myself on pretty much the same page,
> and
> I suspect we're actually in pretty close agreement on this issue.  But for
> some reason, we're having trouble expressing things in ways that the other
> understands!  (Maybe we can blame it on the heat?)  ;-)

Well, maybe I cleared things up. Or not.  But Mariano is up next
and I have to see the last time he pitches in an All Star Game.

> Cheers,
> Bruce

-Mike Palij
New York University
[hidden email]


Re: Minimum N to get significance- paired sample t-test

Bruce Weaver
Administrator
Thanks Mike.  That is much clearer indeed.  And I do agree that Lenth could have been clearer in his argument about observed power and LSN not adding any new information.  I think he can be read as saying they don't add anything that is not captured by the p-value; but of course it is not the p-value alone that contains the information, but the p-value plus sample means & SDs (or whatever the statistics are, depending on what kind of test it is).  

I think he is right in saying that observed power and LSN are relatively simple transformations of the p-value (and those means & SDs).  But it does not necessarily follow that they are useless because of that.  Sometimes it is helpful to express the information in different ways.  For example, clinicians seem to like the number needed to treat (NNT), which is just the reciprocal of the absolute risk reduction.  Correlation may be preferred over covariance for certain purposes.  And so on.  Are NNT and Pearson's r useless because they are simple transformations of other measures?  Not at all.

Cheers,
Bruce


Mike Palij wrote
On Tuesday, July 16, 2013 8:47 PM, Bruce Weaver wrote:
> "I believe you are wrong on this point because Lenth does not believe in
> retrospective power analysis."
>
> Mike, once again you've confused me.  It is very clear in Lenth's article
> that he does not promote retrospective power analysis.  But I don't think
> I
> made any arguments that depend on Lenth promoting retrospective power
> analysis.  Hence my confusion over being thought wrong "because Lenth does
> not believe in retrospective power analysis".

Maybe it is the heat. ;-)   But let me see if I can clear things up.

(1) The LSN calculations that he presents rely upon sample information
like the sample effect size which, because he does not believe in
retrospective
power analysis, are invalid from his perspective. I assume that Lenth
was referring to a formula for LSN like the one given by SAS/JMP
on the following website:
http://www.jmp.com/support/help/Power_Calculations.shtml
Note that LSN on this website is calculated in the context of a prospective
study and requires the specification of an effect size measure.
It seems odd to me that Lenth is trying to make points about why
one should use p-values instead of LSN when he doesn't believe
in the LSN that were calculated.

(2) Lenth makes the following statement:

|Similarly, LSNs don't add new information. True, if
|we collect more data to bring N up to the LSN, and the
|effect size stays the same, then we'll obtain statistical
|significance. Such a strategy is strictly asterisk-hunting:
|let's do whatever it takes to make P<05. Instead, as
|in the preceding section, I recommend consulting with
|subject-matter experts before the data are collected to determine
|absolute effect-size goals that are consistent with
|the scientific goals of the study.

Well, Lenth is to be commended for his high-mindedness in
eschewing statistical analysis just for obtaining statistically significant
results but perhaps he is overestimating the resources that a
researcher may have.  And does the LSN really not provide
any additional information?  Consider the following description
from the SAS website for POWER macro that does both
prospective and retrospective power analysis:

|Least Significant Number (LSN)
|
|Is the number of observations needed to reduce the variance of the
|estimates enough to achieve a significant result with the given values
|of alpha, sigma, and delta. If you need more data to achieve significance,
|the LSN helps tell you how many more. The LSN has the following
|characteristics:
|
|If the LSN is less than the actual sample size N, then the effect is
|significant. This means that you have more data than you need to
|detect the significance at the given alpha level.
|
|If the LSN is greater than the actual sample size N, the effect is
|not significant. In this case, if you believe that more data will show
|the same variance and structural results as the current sample, the
|LSN suggests how much data you would need to achieve significance.
|
|If the LSN is equal to N, then the p-value is equal to the significance
|level, alpha. The test is on the border of significance.
|
|Power calculated when N=LSN is always greater than or equal to 0.5.
|
|Power when N=LSN represents the power associated with using the
|sample size recommended by the LSN.
http://support.sas.com/kb/25/011.html

One could invert Lenth's argument and ask why one needs p-values
when one can determine whether one has an adequate sample size N
by comparing it to the LSN.  At least with N you can always add more
subjects if your N is less than the LSN.  What are you going to do if your
p-value is greater than your alpha (e.g., 0.05)?
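
For concreteness, here is a rough, untested sketch of that sort of LSN
calculation for a paired t-test, using made-up summary values (a mean
difference of 1 and an SD of the differences of 3) rather than anything
taken from the JMP or SAS pages.  It just increases n until those summary
values would reach two-tailed p <= .05:

set mxloops = 100000.  /*Default limit is only 40 loop iterations.
data list free / mdiff sddiff.
begin data
1 3
end data.
compute lsn = 2.  /*Smallest possible n for a paired t-test.
loop if 2*(1 - cdf.t(abs(mdiff)/(sddiff/sqrt(lsn)), lsn - 1)) > .05.
compute lsn = lsn + 1.
end loop.
execute.
list.

For these made-up values the loop should stop somewhere in the high 30s;
with the actual mean difference and SD of the differences plugged in, lsn
is the kind of number the OP's colleague was after.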

> Whenever you post, I generally find myself on pretty much the same page,
> and
> I suspect we're actually in pretty close agreement on this issue.  But for
> some reason, we're having trouble expressing things in ways that the other
> understands!  (Maybe we can blame it on the heat?)  ;-)

Well, maybe I cleared things up. Or not.  But Mariano is up next
and I have to see the last time he pitches in an All Star Game.

> Cheers,
> Bruce

-Mike Palij
New York University
[hidden email]

>
>
> Mike Palij wrote
>> On Tuesday, July 16, 2013 4:25 PM, Bruce Weaver wrote:
>>> See below.
>>> Mike Palij wrote
>>>> --- snip ---
>>>> Quoting Lenth from his "Bad Habits" paper Bruce links to below:
>>>>
>>>> |Retrospective power analysis comprises a number of different
>>>> |practices that involve computing the power of a
>>>> |test based on observed data. Personally, I think I can do
>>>> |without all such practices; but some of them are more
>>>> |understandable than others. The one that I really don't
>>>> |like is the idea of computing power using observed data,
>>>> |with the observed error variance and the observed effect
>>>> |size. (page 2, Section 3)
>>>>
>>>> NOTE: Lenth's argument about p-values containing all the info
>>>> you need is misleading because the p-value tells you nothing about
>>>> the population parameters involved.  If the researcher really
>>>> does not know what the values of the population parameters
>>>> are or should be, the sample estimates are the only guides
>>>> they have.  The problem is these estimates have error (sampling
>>>> and probably other types as well), so retrospective power
>>>> analysis and related activities are APPROXIMATE at best
>>>> unless one can come up with ways of reducing the error.
>>>
>>> Mike, I find the first couple lines of your note above confusing.  I
>>> don't
>>> think Lenth was talking about what one *needs* for (proper) power analysis at
>>> that point.  Rather, he was talking about what one *gets* when they do
>>> post
>>> hoc power analysis:
>>
>> I believe you are wrong on this point because Lenth does not believe
>> in retrospective power analysis.  I assume that LSN is provided by SAS
>> and used by researchers because it tells one how many more subjects
>> one would need to have for statistical significance.  Look at the entry
>> for Hg (p= 0.0258, power=0.61, LSN=157) and then look at BMI
>> (p= 0.11, power= 0.36, LSN=304) -- Hg is significant but BMI is
>> not: is it because the null hypothesis is true for BMI or that the effect
>> size
>> for BMI is small?  If BMI is theoretically important, then knowing how
>> many more subjects one would need to run to get statistically significant
>> results is useful. The remaining variables with LSN=1002 either have
>> tiny effect sizes or the null hypothesis is true for them, but the LSN
>> lets the researcher know what sample size would be needed if one of them
>> is important, while the p-values for these variables range from 0.48 to 0.72.
>> Do the p-values and LSN really provide the same information here?
>>
>>> [snip]
>>> When I posted my earlier reply, I had forgotten about Lenth's discussion
>>> of
>>> the LSN.  This is exactly what the OP's colleague was trying to
>>> do--i.e.,
>>> compute the minimum sample size to achieve statistical significance,
>>> given
>>> the same means, SDs and correlation.  So Lenth would probably have been
>>> harder on the OP's colleague than I was.
>>
>> I believe you are correct on this point and that Lenth would argue that
>> it
>> was a pointless analysis.
>>
>>> I need to digest some of your other points a bit more, but will say
>>> this:
>>> I've never read Lenth as suggesting that one needs to know the /actual/
>>> population parameters to compute a proper sample size estimate.  If that
>>> were required, we would indeed be "SOL", because parameters are almost
>>> always unknowns.  [snip]
>>
>> Here is a quote from the following article:
>> Faul, F., Erdfelder, E., Lang, A. G., & Buchner, A. (2007). G*Power 3: A
>> flexible statistical power analysis program for the social, behavioral,
>> and biomedical sciences. Behavior Research Methods, 39(2), 175-191.
>>
>> |Post Hoc Power Analyses
>> |In contrast to a priori power analyses, post hoc power
>> |analyses (Cohen, 1988) often make sense after a study
>> |has already been conducted. In post hoc analyses, 1 - beta
>> |is computed as a function of alpha, the population effect size
>> |parameter, and the sample size(s) used in a study. It thus
>> |becomes possible to assess whether or not a published
>> |statistical test in fact had a fair chance of rejecting an incorrect
>> |H0. Importantly, post hoc analyses, like a priori
>> |analyses, require an H1 effect size specification for the
>> |underlying population. Post hoc power analyses should
>> |not be confused with so-called retrospective power analyses,
>> |in which the effect size is estimated from sample
>> |data and used to calculate the observed power, a sample
>> |estimate of the true power. (Ref note 1) Retrospective power
>> |analyses are based on the highly questionable assumption that
>> |the sample effect size is essentially identical to the effect
>> |size in the population from which it was drawn (Zumbo &
>> |Hubley, 1998). Obviously, this assumption is likely to be
>> |false, and the more so the smaller the sample. In addition,
>> |sample effect sizes are typically biased estimates of their
>> |population counterparts (Richardson, 1996). For these
>> |reasons, we agree with other critics of retrospective power
>> |analyses (e.g., Gerard, Smith, & Weerakkody, 1998; Hoenig
>> |& Heisey, 2001; Kromrey & Hogarty, 2000; ****Lenth,
>> |2001****; Steidl, Hayes, & Schauber, 1997). Rather than use
>> |retrospective power analyses, researchers should specify
>> |population effect sizes on a priori grounds. To specify the
>> |effect size simply means to define the minimum degree
>> |of violation of H0 a researcher would like to detect with
>> |a probability not less than 1 - beta. Cohen's definitions of
>> |small, medium, and large effects can be helpful in such
>> |effect size specifications (see, e.g., Smith & Bayen, 2005).
>>
>> The article can be obtained here:
>> www.its.fsu.edu/index.php/content/download/51987/428157/file/Faul2007.pdf
>>
>> Given the above, I'll leave it to you to decide whether we're
>> all SOL. :-)
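
A small aside: post hoc power in Faul et al.'s sense is straightforward to
compute once a population effect size has been specified.  Here is a rough,
untested sketch for a paired t-test, using SPSS's noncentral t functions
and purely hypothetical inputs (population dz = 0.5, n = 30 pairs,
two-tailed alpha = .05):

data list free / dz n alpha.
begin data
.5 30 .05
end data.
compute df = n - 1.
compute ncp = dz*sqrt(n).  /*Noncentrality parameter under H1.
compute tcrit = idf.t(1 - alpha/2, df).  /*Two-tailed critical value under H0.
compute power = 1 - ncdf.t(tcrit, df, ncp) + ncdf.t(-tcrit, df, ncp).
execute.
list variables = dz n power.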
>>
>>> Another general approach to power analysis is to simulate a population
>>> with characteristics that correspond to the "minimally important
>>> difference" (or other measure of effect size) that one wishes to have
>>> power to detect.  (The simulated population would also have the estimated
>>> variances, correlations, etc., depending on the nature of the problem.)
>>> Then sample repeatedly from that population (using the same sample size
>>> each time), run the proposed analysis, and save the p-value.  Power = the
>>> proportion of p-values < alpha.  Note that this approach also requires
>>> estimates of the population parameters, not standardized effect size
>>> measures.
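
To make that recipe concrete, here is a minimal, untested sketch for a
paired design, assuming (purely for illustration) a minimally important
mean difference of 1, an SD of the differences of 3, n = 30 pairs per
simulated sample, and 1,000 simulated samples:

set seed = 20130716.  /*Arbitrary seed so the run is repeatable.
input program.
loop #rep = 1 to 1000.  /*1,000 simulated samples.
loop #i = 1 to 30.  /*n = 30 pairs in each sample.
compute rep = #rep.
compute d = rv.normal(1, 3).  /*Simulated difference score.
end case.
end loop.
end loop.
end file.
end input program.
dataset name simpow.
aggregate outfile=* /break=rep
  /mdiff = mean(d) /sddiff = sd(d) /npairs = n(d).
compute t = mdiff/(sddiff/sqrt(npairs)).
compute p = 2*(1 - cdf.t(abs(t), npairs - 1)).  /*Two-tailed paired t-test.
compute sig = (p < .05).
descriptives variables = sig.  /*The mean of sig estimates the power.

Changing the rv.normal() arguments and the inner loop limit lets you
explore other effect sizes and sample sizes.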
>>
>> I don't have a problem with this but I'm not the one making the rules.
>> ;-)
>>
>>> I hope I've not fanned the flames excessively.  ;-)
>>
>> Please, it's hot enough in NYC. ;-)
>>
>>> Cheers!
>>> Bruce
>>
>> -Mike Palij
>> New York University
>
>> mp26@
>
>>
>
>
>
>
>
> -----
> --
> Bruce Weaver
> [hidden email]
> http://sites.google.com/a/lakeheadu.ca/bweaver/
>
> "When all else fails, RTFM."
>
> NOTE: My Hotmail account is not monitored regularly.
> To send me an e-mail, please use the address shown above.
>
> --
>

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING: 
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).
Reply | Threaded
Open this post in threaded view
|

Re: Minimum N to get significance- paired sample t-test

Anton-24
In reply to this post by Anton-24
I'd like to thank all who made contributions to my post.

Andy, with a couple of periods after the computes, the code worked great.
Thanks also to Henry for the website recommendation, and to Bruce & Mike for
their lively discussion.  Also to Rich for the quick method.

I continue to learn a lot from all of your responses over the years.

Anton




=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD