|
I often have to produce a table of Pearson correlations between a suite of environmental impact variables (water chemistry, etc.) and a suite of biological monitoring variables (number of species, number of organisms, etc.).

Typically, even with some data reduction (PCA or similar), this results in several hundred correlations, even when I reduce the matrix to only those correlations between the 2 suites of variables. I realize each correlation is independent, but potentially there are a worrisome number of falsely significant correlations flagged.

As a screening tool to identify associations of interest, I have sometimes used a pseudo-Bonferroni correction to adjust the significance level according to the number of bivariate correlations in the table, and as well I usually spend quite a bit of time generating scatterplots of possible associations. However, I'm wondering if this approach to screening the correlations is defensible even in a pragmatic -- if not statistical -- sense, or whether there is a better way to consider a large number of possible associations between these variables?

I'd appreciate any thoughts or suggestions.

regards,
Ian

Ian D. Martin, Ph.D.
Tsuji Laboratory
University of Waterloo
Dept. of Environment & Resource Studies

=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD |
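To put the screening problem in numbers, here is a minimal Python sketch (not part of the original post) of how the Bonferroni per-test level and the expected count of chance flags scale with the size of the table. The figure of 300 correlations and the function name `screening_thresholds` are assumptions for illustration; the post only says "several hundred".

```python
def screening_thresholds(n_tests, alpha=0.05):
    """Return (Bonferroni per-test level, expected number of chance flags).

    The second value is the expected count of correlations flagged at the
    uncorrected level alpha if every null hypothesis were true.
    """
    bonferroni_alpha = alpha / n_tests
    expected_false_positives = alpha * n_tests
    return bonferroni_alpha, expected_false_positives


# Hypothetical table size; the thread does not give an exact count.
per_test, expected_fp = screening_thresholds(300)
print(f"Bonferroni per-test level: {per_test:.6f}")
print(f"Expected chance flags without correction: {expected_fp:.1f}")
```

With a table this size, an uncorrected 0.05 level would be expected to flag roughly 15 correlations by chance alone, while the Bonferroni level becomes very strict, which is the trade-off the question is about.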
|
You should consider using the false discovery rate method of Benjamini & Hochberg (1995) or the q-value approach developed by Storey (2002).
Scott R Millis, PhD, ABPP (CN,CL,RP), CStat, CSci
Professor & Director of Research
Dept of Physical Medicine & Rehabilitation
Dept of Emergency Medicine
Wayne State University School of Medicine
261 Mack Blvd
Detroit, MI 48201
Email: [hidden email]
Tel: 313-993-8085
Fax: 313-966-7682 |
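As a rough illustration of the Benjamini & Hochberg step-up procedure suggested in this reply, here is a minimal pure-Python sketch. The function name `benjamini_hochberg`, the FDR level q = 0.05, and the example p-values are hypothetical and do not come from the thread. The procedure is known to control the false discovery rate under independence and under certain forms of positive dependence; whether that fits a given correlation table is a separate question.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans, True where the hypothesis is rejected
    by the Benjamini-Hochberg step-up procedure at FDR level q."""
    m = len(p_values)
    # Indices of the p-values, ordered from smallest to largest p.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank k with p_(k) <= (k / m) * q.
    largest_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            largest_k = rank
    # Reject the hypotheses holding the largest_k smallest p-values.
    rejected = [False] * m
    for idx in order[:largest_k]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values from ten correlations.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
flags = benjamini_hochberg(pvals, q=0.05)
print([p for p, keep in zip(pvals, flags) if keep])
```

Unlike a Bonferroni cut-off, which compares every p-value with q / m, the comparison level here grows with the rank of the p-value, so the procedure is usually less conservative when many associations are real.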
|
Scott, thanks very much for your suggestion. If possible, I'd appreciate a little more detail on the references you mention, so that I can look them up at the library.

regards,
Ian |
|
In reply to this post by Ian Martin-2
Ian,
Here are some references and resources:

Benjamini, Y., Drai, D., Elmer, G., Kafkafi, N., & Golani, I. (2001). Controlling the false discovery rate in behavior genetics research. Behav Brain Res, 125(1-2), 279-284.

Storey, J. D. (2002). A direct approach to false discovery rates. Journal of the Royal Statistical Society, Series B, 64, 479-498.

http://genomics.princeton.edu/storeylab/qvalue/

Scott R Millis, PhD, ABPP (CN,CL,RP), CStat, CSci
Professor & Director of Research
Dept of Physical Medicine & Rehabilitation
Dept of Emergency Medicine
Wayne State University School of Medicine
261 Mack Blvd
Detroit, MI 48201
Email: [hidden email]
Tel: 313-993-8085
Fax: 313-966-7682 |
|
In reply to this post by Ian Martin-2
Ian Martin wrote:
> Scott, thanks very much for your suggestion. If possible, I'd appreciate a little more detail on the references you mention, so that I can look them up at the library.

You can find a lot of references at the end of this message I sent to the list in January:

http://www.listserv.uga.edu/cgi-bin/wa?A2=ind0801&L=spssx-l&P=46191

(I still keep a copy of the program mentioned in the message).

Regards,
Marta García-Granero

--
For miscellaneous statistical stuff, visit: http://gjyp.nl/marta/ |
|
Ruben van den Berg wrote:
> Is it malpractice to use (oneway) ANOVA with a dichotomous dependent variable? The sampling distributions of the conditional (within group) means should follow normal distributions due to the central limit theorems, or am I missing something? Of course you'd normally use a chi2 independence test but there's no post hoc option (like Tukey's HSD) in there.

Try to find the thread concerning the Marascuilo procedure (April 7, or so). A method to perform multiple pairwise comparisons for binary outcomes was presented (syntax provided), and compared with the CTABLES procedure with Bonferroni adjustment.

Although ANOVA is quite robust to departures from normality, binary outcomes tend to present heterogeneity of variances (since the variance is related to the proportion of cases in a group: Var(p)=p*(1-p)/n), and that's a worse problem than lack of normality: it precludes the use of Tukey's HSD method, for instance.

I'd rather get these questions on the list, not privately, since more people might contribute to or benefit from the thread. I will therefore address the answer to the whole list.

Nice weekend to you, too,
marta

> TIA and have a nice weekend!

--
For miscellaneous statistical stuff, visit: http://gjyp.nl/marta/ |
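The variance relationship quoted in this reply, Var(p)=p*(1-p)/n, can be checked with a few lines of Python. This is only a sketch: the group labels, proportions, and common group size of 50 are made-up values for illustration, and `proportion_variance` is a hypothetical helper name.

```python
def proportion_variance(p, n):
    """Variance of a sample proportion: Var(p) = p * (1 - p) / n."""
    return p * (1 - p) / n


# Hypothetical group proportions with a hypothetical common group size.
groups = {"A": 0.50, "B": 0.20, "C": 0.05}
n = 50
for name, p in groups.items():
    print(name, proportion_variance(p, n))

# Ratio of the largest to the smallest variance across the groups.
variances = [proportion_variance(p, n) for p in groups.values()]
print(max(variances) / min(variances))
```

Even with equal group sizes, the variance differs several-fold between a group near 50% and a group near 5%, which is the heterogeneity of variances the reply warns about.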
|
In reply to this post by Ian Martin-2
Have you tried exploring your data with canonical correlations rather than PCA?
IIRC, the OVERALS procedure will do n-set canonical correlations. Macros ship with SPSS to do 2-set canonical correlations.
Art Kendall
Social Research Consultants |
