Type 1 error

Type 1 error

Stats Q
Hello all,

I was wondering if there is a formula (either hand-calculated or via SPSS
syntax) to calculate how many correlations in a correlation matrix are
likely to be a result of Type I error. For example, if I have 35 variables
in a correlation matrix, how do I calculate how many of these are due to
Type I errors using p<.05 or p<.01? I'm sure the calculations are quite
simple.

Thanks. Any help much appreciated.

Re: Type 1 error

Richard Ristow
At 07:02 AM 1/16/2007, Stats Q wrote:

>I was wondering if there is a formula (either hand calculated or via
>spss syntax) to calculate how many correlations in a correlation
>matrix are likely to be a result of type I error? For example, if I
>have 35 variables in a correlation matrix, how do I calculate how many
>of these are due to type I errors using p<.05 or p<.01.? I'm sure the
>calculations are quite simple.

They are fairly simple.

First, the number of correlations: for 35 variables, it's
35*(35-1)/2 = 595.

For p=.05, the expected number of significant results by chance (Type I
error) is 595*.05 = 29.75, or about 30. Use this figure to estimate the
number of expected Type I errors for the p<.05 criterion.

For p=.01, the expected number by chance is 595*.01 = 5.95, or about 6;
use this as the estimate for p<.01.
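The arithmetic above is easy to script outside SPSS. Here is a minimal Python sketch (the function name is my own, not from the thread):

```python
# A minimal sketch of the arithmetic above: number of unique correlations
# among k variables, and the expected count significant by chance alone
# when every null hypothesis is true.

def expected_type_i(k, alpha):
    n_correlations = k * (k - 1) // 2        # lower triangle of the matrix
    return n_correlations, n_correlations * alpha

pairs, expected = expected_type_i(35, 0.05)
print(pairs)     # 595
print(expected)  # 29.75, i.e. about 30 chance 'significant' correlations
```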

Re: Type 1 error

Stats Q
Hi Richard,

I had a feeling it would be quite simple. Thanks very much for that!!!
Best wishes.




Re: Type 1 error

Richard Ristow
In reply to this post by Richard Ristow
At 05:43 AM 1/17/2007, Stats Q asked:

>I take it that if I want to work out the number of Type I errors in a
>set of, say, 24 t-tests, I would simply multiply 24 by .05 = 1.2,

That's the calculation. But that isn't the *actual* number of Type I
errors, which can't be known. It's the mathematically *expected* number
of Type I errors, if the null hypothesis is correct (there is no actual
effect present) and the model assumptions for the t-test are satisfied.

If the distinction between 'actual' and 'expected' number isn't clear,
review the concept of 'expected value' in probability theory. You'll
need that much understanding to make sense of statistics at all.
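To make the 'expected' versus 'actual' distinction concrete, here is a small simulation sketch in Python (numpy/scipy, not SPSS syntax; all names are illustrative). It runs 24 t-tests with no true effect, many times over: the count of Type I errors varies from run to run, while its long-run average sits near 24 × .05 = 1.2.

```python
import numpy as np
from scipy import stats

# Simulation sketch: 24 independent two-sample t-tests with NO true effect,
# repeated many times. The *actual* number of Type I errors differs from
# run to run; only its long-run average equals 24 * .05 = 1.2.
rng = np.random.default_rng(0)
n_tests, n_reps, alpha = 24, 500, 0.05

counts = []
for _ in range(n_reps):
    false_positives = 0
    for _ in range(n_tests):
        a = rng.standard_normal(30)      # both groups drawn from the same
        b = rng.standard_normal(30)      # distribution, so H0 is true
        if stats.ttest_ind(a, b).pvalue < alpha:
            false_positives += 1
    counts.append(false_positives)

counts = np.array(counts)
print(counts.mean())   # close to the expected value 1.2
```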

>but if I was running a hierarchical multiple regression, would it
>simply be the number of predictors in the model multiplied by the sig
>level, e.g., 18 predictors x .05 = 0.9, to work out the Type I error
>rate?

It's not the number of predictors as such; it's the number of
significance tests. Yes, if you have 18 t-tests, that's the arithmetic.

HOWEVER, in a complex single model like a hierarchical multiple
regression, that's not the way to reason. There are tests for the
*model* having a significant relationship to the dependent variable,
and you should be looking at those.

You may be getting yourself into trouble. The meaning of a hierarchical
multiple regression is subtle; indeed, so is the meaning of an ordinary
multiple regression. Perhaps you understand what they mean; but most
people who do, wouldn't have to ask the questions you've asked.

Best, you should find someone who can help you and instruct you in
statistics, a lot more than we can on this list. Second-best, find
books or Web tutorials (I can't recommend a set, but maybe others here
can) and learn carefully, starting with simple analyses.

For a first exercise, before going farther: Describe how to run an
independent-samples t-test, including: What is the null hypothesis?
(And, what does the term 'null hypothesis' mean?) Supposing the test is
not statistically significant, what can you say regarding the null and
alternative hypotheses? What if it is statistically significant?

I'm not offering to go farther, helping with this; I'm not offering to
be a statistical tutor for you. Probably, nobody else on the list will,
either. But you need to be able to answer at least what I've asked,
accurately and with good understanding, before going any farther. I
can't recommend you try to use hierarchical multiple regression, or
even simple linear regression, until you can answer those, and then go
step by step to understanding how the same concepts apply in more
complex models.

>Finally, whilst I'm on the subject, is there a formula for
>calculating how many Type II errors have occurred? Again, I know this
>is something simple and I'm sure I've come across it before.

Yes, it's simple: there's no such formula, and never will be.

Estimating the likelihood of Type II errors - failing to find an
effect, when one is actually present - is called statistical power
analysis. It requires at least one additional assumption: the size of
the effect looked for.

Here, again, to understand the concepts, you should start very simple.
The independent-samples t-test is a good place to start here, too:
learn to do a power analysis for such a test, and what the result
means, before going farther. And to understand why "there's no such
formula, and never will be" in the absence of an estimate of effect
size.
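As a hedged illustration of what a power analysis adds, here is a Python simulation sketch (not SPSS; the effect size, Cohen's d = 0.8, is an assumed input, exactly the extra assumption described above):

```python
import numpy as np
from scipy import stats

# Power-analysis sketch by simulation for an independent-samples t-test.
# The assumed effect size (Cohen's d = 0.8) is exactly the extra input
# that a Type II / power calculation cannot do without.
rng = np.random.default_rng(1)
d, n_per_group, alpha, n_reps = 0.8, 30, 0.05, 2000

rejections = 0
for _ in range(n_reps):
    control = rng.standard_normal(n_per_group)
    treated = rng.standard_normal(n_per_group) + d   # a true effect of size d
    if stats.ttest_ind(control, treated).pvalue < alpha:
        rejections += 1

power = rejections / n_reps    # estimated P(reject H0 | the effect is real)
type_ii_rate = 1 - power       # estimated Type II error rate for THIS d and n
print(round(power, 2))
```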

-With best wishes for your growth in statistics,
  Richard Ristow

Re: Type 1 error

Stats Q
Dear Richard,

You've provided a lot of useful information in your e-mail. I am already
familiar with many of the concepts you outline and have run various
statistical tests.  On the other hand, there are many things I need to brush
up on (mainly the good old  basics of hypothesis testing!) as I haven't
looked at them in a while. You've given me some good advice and some
starting points. I did think that hierarchical multiple regression would be
more complex and would most likely involve knowing other statistics e.g.,
effect size and sample size.

Thank you very much for your useful guide Richard.
Best wishes.



Re: Type 1 error

David Greenberg
In reply to this post by Richard Ristow
There is a complication. Richard's computation assumes that each
correlation is independent of the others. In a correlation matrix that
assumption is not correct. I believe the computation would have to
take this into account.

David Greenberg, Sociology Department, New York University


Re: Type 1 error

statisticsdoc
David raises an important point here.  The standard Bonferroni correction
(dividing alpha by the number of statistical tests that are performed) is
very conservative when the tests are correlated.  This applies to testing
correlations in a matrix, as well as making group comparisons on multiple
correlated variables.  I have seen some journals accept papers that use the
Bonferroni correction to control the overall type-I error rate because they
prefer to err on the side of caution in rejecting the null hypothesis, at
the expense of reducing statistical power.  The Sidak correction is somewhat
less conservative, but also overcorrects alpha when the tests are conducted
on inter-correlated variables.
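For concreteness, the two corrections mentioned here each reduce to one line of arithmetic. A Python sketch, applied to the 595 correlations from the thread's 35-variable example:

```python
# Per-test alpha under the Bonferroni and Sidak corrections for m tests,
# here for the 595 correlations in the 35-variable example above.
m, alpha = 595, 0.05

alpha_bonferroni = alpha / m                  # simplest, most conservative
alpha_sidak = 1 - (1 - alpha) ** (1 / m)      # exact only if tests are independent

print(alpha_bonferroni)   # about 8.40e-05
print(alpha_sidak)        # slightly larger, about 8.62e-05
```

Both are conservative for correlated tests, as noted above; the Sidak cutoff is always a little less strict than the Bonferroni one.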

HTH,

Stephen Brand


For personalized and professional consultation in statistics and research
design, visit
www.statisticsdoc.com



Re: Dependence of correlation coefficients

Richard Ristow
In reply to this post by David Greenberg
At 04:52 PM 1/17/2007, David Greenberg wrote (under subject head "Re:
Type 1 error"):

>There is a complication [with the following].
>
>>For p=.05, the expected number of [correlations significant at the
>>.05 level by chance] (Type I error) is [the number of
>>correlations]*.05, about 30, under the p<.05 criterion.
>
>[This computation] assumes that each correlation is independent of the
>others. In a correlation matrix that assumption is not correct.

Thank you; I need educating, on this one.

The computation itself is OK. It relies on the expectation of the sum
being the sum of the expectations, and that's true regardless of
dependence of the summands.

But you're quite right. I'd blithely assumed the correlation
coefficients were independent; and they can't be. Correlation matrices
are positive semi-definite; that's exact both for exact correlation
matrices in the population, and correlation matrices estimated from
data (assuming no missing data).

My linear algebra is very rusty. I can't rattle off the proof of that,
and I've no idea how large is the subspace of positive semi-definite
matrices within the space of symmetric matrices all of whose elements
have the same sign. (The latter, I think, being the subspace of
matrices that would be spanned by the correlation matrices, if the
correlations were independent.)
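The positive semi-definiteness claim is easy to check numerically. A Python sketch with simulated complete data (numpy, not SPSS):

```python
import numpy as np

# Numerical check of the claim above: a sample correlation matrix computed
# from complete data is positive semi-definite, i.e. has no negative eigenvalues.
rng = np.random.default_rng(2)
data = rng.standard_normal((100, 35))      # 100 cases, 35 variables, no missing
corr = np.corrcoef(data, rowvar=False)     # 35 x 35 sample correlation matrix

eigenvalues = np.linalg.eigvalsh(corr)     # eigvalsh: for symmetric matrices
print(eigenvalues.min() >= -1e-10)         # True, up to floating-point rounding
```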

But yes, you're right.

Thank you,
Richard