|
I have a question that is similar to one that you answered on a forum (nabble link). Therefore I sincerely hope that someone can help me with my question too.
I need to run a large set of bivariate Pearson correlations (30000 variables, 79 cases, fully filled table). Do you think this is doable? I am a complete SPSS noob, so if you get back to me with syntax code or a file, could you please explain to me in detail what I should change in it (filename, row numbers, etc.) before running?
Thanks already so much!
spobster
|
|
Why are you performing so many correlations, i.e., what is the nature of the study?
Scott Millis

--- On Wed, 6/24/09, spobster <[hidden email]> wrote:
> From: spobster <[hidden email]>
> Subject: SPSS Bivariate Correlations using Pearsons LIMITED?
> Date: Wednesday, June 24, 2009, 12:14 PM

=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L. For a list of commands to manage subscriptions, send the command INFO REFCARD. |
|
In reply to this post by spobster
The CORRELATIONS command is limited to 500 variables. Why do you want to correlate 30,000 variables? What do you plan to do with the results? |
|
That’s almost 450 million correlations with an n of 79. Do you realize how many spurious correlations you can get by chance? Almost 22.5 million at the .05 level. Not to mention that with 79 cases, the number of spurious correlations will likely be higher because of the small sample.

Dr. Paul R. Swank, Professor and Director of Research
Children's Learning Institute
University of Texas Health Science Center-Houston

From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Oliver, Richard
> The CORRELATIONS command is limited to 500 variables. Why do you want to correlate 30,000 variables? What do you plan to do with the results? |
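
The figures quoted in this thread can be checked with a short calculation. This is only a sketch: the significance level of .05 is an assumption (the thread does not state one, although the quoted numbers match it), and the expected count assumes every null hypothesis is true. The `\text` command assumes the amsmath package.

```latex
\[
\binom{30000}{2} = \frac{30000 \times 29999}{2} = 449{,}985{,}000
\quad\text{distinct pairwise correlations}
\]
\[
0.05 \times 449{,}985{,}000 = 22{,}499{,}250
\quad\text{expected false positives at } \alpha = 0.05
\]
```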
|
In reply to this post by spobster
This is certainly an interesting software capacity question.
Even more interesting is what you will do to control the more than 22 million Type I errors that you can expect from this design.

Art Burke
Phone: 503-275-9592 / 800-547-6339
Fax: 503-275-0450 |
|
In reply to this post by spobster
Wow!!
Please explain the nature of your study. This is a huge number of variables even in gene sequencing. Are there really 30,000 separate constructs, or are the "variables" some form of repeated measure such as time series, electromagnetic spectra, etc.?
What is the meaning of a case in your study? What level of measurement are the variables? What questions are you using the data to answer? What would you do with a correlation matrix that size? With 79 cases the R matrix would be singular.
Art Kendall
Social Research Consultants |
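
The remark about singularity follows from a standard rank argument, sketched here. The symbols are introduced only for illustration: Z is the 79 x 30000 matrix of standardized variables, n the number of cases, p the number of variables. Because every column of Z has mean zero, its rank cannot exceed n - 1. The block assumes the amsmath and amssymb packages.

```latex
\[
R = \frac{1}{n-1} Z^{\top} Z, \qquad
Z \in \mathbb{R}^{n \times p},\quad n = 79,\quad p = 30000
\]
\[
\operatorname{rank}(R) = \operatorname{rank}(Z) \le n - 1 = 78 < p
\quad\Longrightarrow\quad \det R = 0
\]
```

Singularity matters for procedures that need to invert R (factor analysis or regression, for example); the individual pairwise coefficients can still be computed.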
|
In reply to this post by SR Millis-3
As an answer to everybody: I will try to explain. The data are the result of micro-array analysis. Using 30000 different probes specific for the different genes in the human genome, expression of these genes was tested in 79 different tissues. For each of the 79 tissues, every probe has an average intensity value from duplicates.
Here is what I want from the correlations: I want to find out which genes have a similar expression pattern across these 79 tissues. Of course, some genes are of special interest to me, and it is most exciting to see which other genes belong to the group containing these genes. However, I realize that the groups might not be tightly defined, so it is also interesting to see the correlation of borderline genes with each gene within such a group.
If the full correlation analysis is not possible, then I hope that some of you can come up with a nice option to classify the different genes according to their tissue expression.
If you have more questions concerning my approach, please don't hesitate to ask!
Thanks everybody!!
Spobster
|
|
How did you select the tissues? Are there subsets of the cases that have a known characteristic?
Several people at the Classification Society meeting a couple of weeks ago dealt with this kind of problem. I suggest that you clarify your question and post it to http://lists.sunysb.edu/index.cgi?A0=CLASS-L
You could greatly reduce the size of the analysis by correlating the variables you are specifically interested in WITH selected sets of the others. The curse of dimensionality will still be a major consideration.
Art Kendall
Social Research Consultants |
|
In reply to this post by Swank, Paul R
Swank, Paul R wrote:
> That’s almost 450 million correlations with an n of 79. Do you realize
> how many spurious correlations you can get by chance? Almost 22.5
> million. Not to mention that with 79 cases, the number of spurious
> correlations will likely be higher because of the small sample.

Another point of view: even if someone were able to scan and interpret one correlation coefficient per second, 449,985,000 seconds are 14.26 years. That's a lot of time to devote to any research (a lot of marriages don't last that long). What's the point of computing such a huge correlation matrix if the results can't be analyzed in a reasonable time? Besides, I don't think my poor computer could even handle that (each computed correlation coefficient will take several bytes, I assume, plus the corresponding p-value... we should start reserving terabytes of drive space).

Marta GG

> > I have a question that is similar to one that you answered on a forum
> > (nabble link
> > <http://www.nabble.com/Bivariate-Correlation-Variable-Limit-td18004400.html>).

--
For miscellaneous SPSS related statistical stuff, visit: http://gjyp.nl/marta/ |
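
The reading-time figure can be reproduced as follows, together with a rough storage estimate. The storage part is only a sketch under a stated assumption: 8 bytes per stored number (one coefficient plus one p-value per pair), which the thread does not specify. Formatted output tables would likely need considerably more than the raw numbers. The `\text` command assumes the amsmath package.

```latex
\[
\frac{449{,}985{,}000\ \text{s}}{365.25 \times 86{,}400\ \text{s/year}}
= \frac{449{,}985{,}000}{31{,}557{,}600} \approx 14.26\ \text{years}
\]
\[
449{,}985{,}000 \times 2 \times 8\ \text{bytes}
= 7{,}199{,}760{,}000\ \text{bytes} \approx 7.2\ \text{GB}
\]
```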
|
In reply to this post by Art Kendall
Thanks again for everybody's responses. Because my original idea is not feasible, I have changed my approach, and therefore I have a new question.
First, I did not perform the micro-arrays myself; I downloaded the data from the internet (wombat.gnf.org). The RNA of the tissues from different individuals was pooled before the array. Each probe was hybridized only in duplicate on the same pool, and the average of the duplicates is shown in the tables that I downloaded.
I want to use the data as a supplement to my own research in tumor immunology. Following your advice, I have selected part of the dataset, which still has 898 variables. Since I still can't run the full bivariate correlation analysis with this number of variables (100 is the limit according to my version 15), I want to start by correlating one of those variables against the other 897. However, I cannot find how to do this in SPSS.
Does anyone know how to do this? This first analysis should not swamp my computer, which does not have terabytes of memory and hard disk.
Thanks!!
|
|
spobster wrote:
> I want to start by doing correlation analysis of one of those
> variables against the other 897. However, I cannot find in SPSS how to
> perform this.

Split the job into "swallowable" pieces:

CORRELATIONS /VARIABLES=var1 WITH var2 TO var100 /PRINT=TWOTAIL NOSIG.
CORRELATIONS /VARIABLES=var1 WITH var101 TO var199 /PRINT=TWOTAIL NOSIG.

And so on, until you get to the last variable (var898). The keyword TO lets you name consecutive variables (first TO last) in bunches of 99 (plus the one to the left of the WITH keyword, you reach the limit of 100).

HTH,
Marta GG |
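
For anyone who prefers not to type the command ten times, the chunked approach from this reply could be wrapped in a small macro. This is a minimal sketch, not a tested solution: the macro name `!corone` and its arguments `target`, `first` and `last` are hypothetical, and the names var1 to var898 are the placeholders used in the reply, so they must be replaced with the real variable names. TO selects variables by their order in the data file, so the blocks only work if the variables are stored consecutively in that order.

```spss
* Hypothetical helper macro, one target variable against one block of variables.
DEFINE !corone (target = !TOKENS(1)
               /first  = !TOKENS(1)
               /last   = !TOKENS(1))
CORRELATIONS
  /VARIABLES=!target WITH !first TO !last
  /PRINT=TWOTAIL NOSIG.
!ENDDEFINE.

* Ten calls cover var2 through var898 in blocks of at most 99 variables.
!corone target=var1 first=var2   last=var100.
!corone target=var1 first=var101 last=var199.
!corone target=var1 first=var200 last=var298.
!corone target=var1 first=var299 last=var397.
!corone target=var1 first=var398 last=var496.
!corone target=var1 first=var497 last=var595.
!corone target=var1 first=var596 last=var694.
!corone target=var1 first=var695 last=var793.
!corone target=var1 first=var794 last=var892.
!corone target=var1 first=var893 last=var898.
```

The first nine blocks hold 99 variables each and the last holds 6, which adds up to the 897 variables to be correlated with var1. To look at a different gene of interest, only the `target` value needs to change.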
