Hello all,
I was wondering if there is a formula (either hand calculated or via SPSS syntax) to calculate how many correlations in a correlation matrix are likely to be a result of Type I error? For example, if I have 35 variables in a correlation matrix, how do I calculate how many of these are due to Type I errors using p < .05 or p < .01? I'm sure the calculations are quite simple. Thanks. Any help much appreciated.
At 07:02 AM 1/16/2007, Stats Q wrote:
>I was wondering if there is a formula (either hand calculated or via SPSS syntax) to calculate how many correlations in a correlation matrix are likely to be a result of Type I error? For example, if I have 35 variables in a correlation matrix, how do I calculate how many of these are due to Type I errors using p < .05 or p < .01? I'm sure the calculations are quite simple.

They are fairly simple.

First, the number of correlations: for 35 variables, it's 35*(35-1)/2 = 595.

For p = .05, the expected number of significant results by chance (Type I error) is 595*.05 = 29.75, or about 30. Use this figure as the estimate of the expected number of Type I errors under the p < .05 criterion.

For p = .01, the expected number by chance is 595*.01 = 5.95, or about 6; use this as the estimate for p < .01.
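In code, the arithmetic looks like this (a minimal Python sketch; the function names are my own, not anything from SPSS):

```python
# Expected number of chance-significant correlations among k variables,
# assuming every pairwise test is run at the same alpha level.

def n_correlations(k):
    """Number of distinct pairwise correlations among k variables."""
    return k * (k - 1) // 2

def expected_type_i(k, alpha):
    """Expected count of significant correlations if all true correlations are zero."""
    return n_correlations(k) * alpha

print(n_correlations(35))          # 595
print(expected_type_i(35, 0.05))   # 29.75, i.e. about 30
print(expected_type_i(35, 0.01))   # 5.95, i.e. about 6
```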
Hi Richard,
I had a feeling it would be quite simple. Thanks very much for that!!! Best wishes.
In reply to this post by Richard Ristow
At 05:43 AM 1/17/2007, Stats Q asked:
>I take it that if I want to work out the number of Type I errors in a set of, say, 24 t-tests, I would simply multiply 24 by .05 = 1.2,

That's the calculation. But that isn't the *actual* number of Type I errors, which can't be known. It's the mathematically *expected* number of Type I errors, if the null hypothesis is correct (there is no actual effect present) and the model assumptions for the t-test are satisfied.

If the distinction between the 'actual' and 'expected' numbers isn't clear, review the concept of 'expected value' in probability theory. You'll need that much understanding to make sense of statistics at all.

>but if I was running a hierarchical multiple regression, would it simply be the number of predictors in the model multiplied by the sig level, e.g., 18 predictors x .05 = 0.9, to work out the Type I error rate?

It's not the number of predictors as such; it's the number of significance tests. Yes, if you have 18 t-tests, that's the arithmetic.

HOWEVER, in a complex single model like a hierarchical multiple regression, that's not the way to reason. There are tests for the *model* having a significant relationship to the dependent variable, and you should be looking at those.

You may be getting yourself into trouble. The meaning of a hierarchical multiple regression is subtle; indeed, so is the meaning of an ordinary multiple regression. Perhaps you understand what they mean; but most people who do wouldn't have to ask the questions you've asked.

Best would be to find someone who can help and instruct you in statistics, far more than we can on this list. Second best, find books or Web tutorials (I can't recommend a set, but maybe others here can) and learn carefully, starting with simple analyses.

For a first exercise, before going farther: describe how to run an independent-samples t-test, including: What is the null hypothesis? (And what does the term 'null hypothesis' mean?) Supposing the test is not statistically significant, what can you say regarding the null and alternative hypotheses? What if it is statistically significant?

I'm not offering to go farther in helping with this; I'm not offering to be a statistical tutor for you. Probably nobody else on the list will, either. But you need to be able to answer at least what I've asked, accurately and with good understanding, before going any farther. I can't recommend you try to use hierarchical multiple regression, or even simple linear regression, until you can answer those, and then go step by step to understanding how the same concepts apply in more complex models.

>Finally, whilst I'm on the subject, is there a formula for calculating how many Type II errors have occurred? Again, I know this is something simple and I'm sure I've come across this before

Yes, it's simple: there's no such formula, and never will be.

Estimating the likelihood of Type II errors (failing to find an effect when one is actually present) is called statistical power analysis. It requires at least one additional assumption: the size of the effect looked for.

Here, again, to understand the concepts, you should start very simple. The independent-samples t-test is a good place to start here, too: learn to do a power analysis for such a test, and what the result means, before going farther. And learn why "there's no such formula, and never will be" in the absence of an estimate of effect size.

-With best wishes for your growth in statistics,
Richard Ristow
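To make the expected-versus-actual distinction above concrete, here is a small simulation sketch in Python (the group sizes, replication count, and seed are arbitrary choices of mine): run 24 t-tests many times under a true null and watch the count of 'significant' results vary around its expectation of 24 x .05 = 1.2.

```python
# Simulate 24 independent two-sample t-tests under a true null hypothesis
# and count how many come out 'significant' at alpha = .05 per replication.
# The *expected* count is 24 * .05 = 1.2; the *actual* count varies.

import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
alpha, n_tests, n_reps, n_per_group = 0.05, 24, 1000, 30

counts = []
for _ in range(n_reps):
    significant = 0
    for _ in range(n_tests):
        a = rng.normal(0.0, 1.0, n_per_group)  # both groups drawn from the
        b = rng.normal(0.0, 1.0, n_per_group)  # same population: null is true
        if stats.ttest_ind(a, b).pvalue < alpha:
            significant += 1
    counts.append(significant)

print(np.mean(counts))           # close to the expectation, 1.2
print(min(counts), max(counts))  # but any single run can be 0, 1, 2, 3, ...
```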
Dear Richard,
You've provided a lot of useful information in your e-mail. I am already familiar with many of the concepts you outline and have run various statistical tests. On the other hand, there are many things I need to brush up on (mainly the good old basics of hypothesis testing!) as I haven't looked at them in a while. You've given me some good advice and some starting points. I did think that hierarchical multiple regression would be more complex and would most likely involve knowing other statistics, e.g., effect size and sample size. Thank you very much for your useful guide, Richard. Best wishes.
In reply to this post by Richard Ristow
There is a complication. The computation above assumes that each correlation is independent of the others. In a correlation matrix that assumption is not correct. I believe that the computation would have to take this into account.

David Greenberg, Sociology Department, New York University
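One way to see David's point concretely: in a 3 x 3 correlation matrix, fixing two of the correlations constrains the third, because the matrix must be positive semi-definite (its determinant cannot be negative). A small Python sketch, with illustrative numbers of my own:

```python
# Feasible range for corr(B, C) given corr(A, B) = a and corr(A, C) = b,
# from requiring the 3x3 correlation matrix determinant to be non-negative:
#   det = 1 + 2abc - a^2 - b^2 - c^2 >= 0

import math

def third_correlation_bounds(a, b):
    """Interval of values the third correlation can take."""
    spread = math.sqrt((1 - a**2) * (1 - b**2))
    return a * b - spread, a * b + spread

# If A correlates .9 with both B and C, corr(B, C) is forced to be large:
print(third_correlation_bounds(0.9, 0.9))  # (0.62, 1.0)
```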
David raises an important point here. The standard Bonferroni correction
(dividing alpha by the number of statistical tests that are performed) is very conservative when the tests are correlated. This applies to testing the correlations in a matrix, as well as to making group comparisons on multiple correlated variables. I have seen some journals accept papers that use the Bonferroni correction to control the overall Type I error rate, because they prefer to err on the side of caution in rejecting the null hypothesis, at the expense of reduced statistical power. The Sidak correction is somewhat less conservative, but it also overcorrects alpha when the tests are conducted on inter-correlated variables.

HTH,

Stephen Brand
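For reference, the two corrections Stephen mentions, computed for the 595 correlations from the original question (plain Python; the formulas are the standard ones, the numbers are purely illustrative):

```python
# Per-test alpha under the Bonferroni and Sidak corrections for m tests at
# a familywise error rate of .05. Both assume independent tests, which is
# why they over-correct when the tests are positively correlated.

m, alpha = 595, 0.05

bonferroni = alpha / m              # 8.40e-05
sidak = 1 - (1 - alpha) ** (1 / m)  # 8.62e-05, slightly less strict

print(f"Bonferroni per-test alpha: {bonferroni:.2e}")
print(f"Sidak per-test alpha:      {sidak:.2e}")
```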
In reply to this post by David Greenberg
At 04:52 PM 1/17/2007, David Greenberg wrote (under subject head "Re:
Type 1 error"): >There is a complication [with the following]. > >>For p=.05, the expected number of [correlations significant at the >>.05 level by chance] (Type I error) is [the number of >>correlations]*.05=30. p<.05 criterion. > >[This computation] assumes that each correlation is independent of the >others. In a correlation matrix that assumption is not correct. Thank you; I need educating, on this one. The computation itself is OK. It relies on the expectation of the sum being the sum of the expectations, and that's true regardless of dependence of the summands. But you're quite right. I'd blithely assumed the correlation coefficients were independent; and they can't be. Correlation matrices are positive semi-definite; that's exact both for exact correlation matrices in the population, and correlation matrices estimated from data (assuming no missing data). My linear algebra is very rusty. I can't rattle off the proof of that, and I've no idea how large is the subspace of positive semi-definite matrices within the space of symmetric matrices all of whose elements have the same sign. (The latter, I think, being the subspace of matrices that would be spanned by the correlation matrices, if the correlations were independent.) But yes, you're right. Thank you, Richard |