SPSS Bivariate Correlations using Pearsons LIMITED?

SPSS Bivariate Correlations using Pearsons LIMITED?

spobster
I have a question that is similar to one that you answered on a forum (nabble link: http://www.nabble.com/Bivariate-Correlation-Variable-Limit-td18004400.html), so I sincerely hope that someone can help me with my question too.

I need to compute a large set of bivariate Pearson correlations (30000 variables, 79 cases, no missing values). Do you think this is doable? I am a complete SPSS noob, so if you reply with syntax or a file, could you please explain in detail what I should change (file name, row numbers, etc.) before running it?

Thanks so much in advance!

spobster

Re: SPSS Bivariate Correlations using Pearsons LIMITED?

SR Millis-3
Why are you performing so many correlations, i.e., what is the nature of the study?

Scott Millis



=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Re: SPSS Bivariate Correlations using Pearsons LIMITED?

Oliver, Richard
In reply to this post by spobster

The CORRELATIONS command is limited to 500 variables.

 

Why do you want to correlate 30,000 variables? What do you plan to do with the results?

 




Re: SPSS Bivariate Correlations using Pearsons LIMITED?

Swank, Paul R

That’s almost 450 million correlations with an n of 79. Do you realize how many spurious correlations you can get by chance? At the .05 level, almost 22.5 million. Not to mention that with 79 cases, the number of spurious correlations will likely be higher because of the small sample.
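
The arithmetic behind these figures can be checked with a short sketch. It assumes that every pair of distinct variables is tested once, that the .05 significance level is used (the post does not state a level), and that all null hypotheses are true. The function names are illustrative only.

```python
from math import comb


def pairwise_count(n_variables: int) -> int:
    """Number of distinct variable pairs, i.e. off-diagonal correlations."""
    return comb(n_variables, 2)


def expected_false_positives(n_tests: int, alpha: float = 0.05) -> float:
    """Expected count of 'significant' results if every null hypothesis holds.

    alpha = 0.05 is an assumed significance level, not one stated in the thread.
    """
    return n_tests * alpha


n_pairs = pairwise_count(30_000)
print(f"{n_pairs:,} correlations")
print(f"{expected_false_positives(n_pairs):,.0f} expected by chance at alpha = 0.05")
```

For 30,000 variables this gives 449,985,000 pairs and about 22.5 million chance findings, in line with the numbers quoted above.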

 

Dr. Paul R. Swank,

Professor and Director of Research

Children's Learning Institute

University of Texas Health Science Center-Houston

 



Re: SPSS Bivariate Correlations using Pearsons LIMITED?

Arthur Burke
In reply to this post by spobster
This is certainly an interesting software capacity question. 
 
Even more interesting is what you will do to control the more than 22 million Type I errors that you can expect from this design.  
 
Art
------------------------------------------------------------------

Art Burke
Northwest Regional Educational Laboratory
101 SW Main St, Suite 500
Portland, OR 97204-3213

Phone:  503-275-9592 / 800-547-6339

Fax: 503-275-0450

[hidden email] 



Re: SPSS Bivariate Correlations using Pearsons LIMITED?

Art Kendall
In reply to this post by spobster
Wow!!
Please explain the nature of your study.   This is a huge number of variables even in gene sequencing.

Are there really 30,000 separate constructs, or are the "variables" some form of repeated measure such as time series, electromagnetic spectra, etc.?

What is the meaning of a case in your study?

What level of measurement are the variables?

What questions are you using the data to answer?

What would you do with a correlation matrix that size? With 79 cases the R matrix would be singular.


Art Kendall
Social Research Consultants





Re: SPSS Bivariate Correlations using Pearsons LIMITED?

spobster
In reply to this post by SR Millis-3
As an answer to everybody, I will try to explain.

The data are the result of microarray analysis. Using 30000 different probes, each specific for a gene in the human genome, the expression of these genes was tested in 79 different tissues. For each probe, the value for each of the 79 tissues is the average intensity of duplicate measurements.

What I want from the correlations is to find out which genes have a similar expression pattern across these 79 tissues. Of course, some genes are of special interest to me, and it is most exciting to see which other genes belong to the group containing them. However, I realize that the groups might not be tightly defined, so it is also interesting to see how borderline genes correlate with each gene within such a group.

If the full correlation matrix is not possible, I hope that some of you can suggest a good way to classify the genes according to their tissue expression. If you have more questions about my approach, please don't hesitate to ask!

Thanks everybody!!

Spobster

Re: SPSS Bivariate Correlations using Pearsons LIMITED?

Art Kendall
Do you have two readings of intensity for each locus from the same exact tissue? Or is the average over a larger number of intensities? Is there anything that distinguishes the readings?

How did you select the tissues?
Are there subsets of the cases that have a known characteristic?

Several people at the Classification Society meeting a couple of weeks ago dealt with this kind of problem. I suggest that you clarify your question and post it to
http://lists.sunysb.edu/index.cgi?A0=CLASS-L

You could greatly reduce the size of the analysis by correlating the genes you are specifically interested in WITH selected sets of the others. The curse of dimensionality will still be a major consideration.
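
Outside SPSS, the idea of correlating a few genes of interest against all the others can be sketched as follows. This is a minimal illustration in plain Python, not a method proposed in the thread: the gene names and expression values are made up, and `correlate_with_targets` is a hypothetical helper.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    sxy = sum(a * b for a, b in zip(dx, dy))
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


def correlate_with_targets(profiles, targets):
    """Correlate each target gene with every other gene.

    profiles: dict mapping gene name -> list of values (one per tissue).
    Returns a dict mapping target -> list of (other_gene, r), highest r first.
    """
    result = {}
    for target in targets:
        pairs = [
            (gene, pearson_r(profiles[target], values))
            for gene, values in profiles.items()
            if gene != target
        ]
        pairs.sort(key=lambda pair: pair[1], reverse=True)
        result[target] = pairs
    return result


# Made-up toy data: three hypothetical genes measured in four tissues.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
}
ranked = correlate_with_targets(profiles, ["geneA"])
print(ranked["geneA"])
```

With k genes of interest among p genes, this computes k(p-1) coefficients instead of the p(p-1)/2 needed for the full matrix. The multiple-testing concern raised earlier in the thread still applies to whatever is computed.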

Art Kendall
Social Research Consultants

Re: SPSS Bivariate Correlations using Pearsons LIMITED?

Marta Garcia-Granero
In reply to this post by Swank, Paul R
Swank, Paul R wrote:
>
> That’s almost 450 million correlations with an n of 79. Do you realize
> how many spurious correlations you can get by chance? Almost 22.5
> million. Not to mention that with 79 cases, the number of spurious
> correlations will likely be higher because of the small sample.
>

Another point of view: Even if someone is able to scan and interpret one
correlation coefficient per second, 449,985,000 seconds are 14.26 years.
That's a lot of time to devote to any research (a lot of marriages don't
last that long). What's the point of computing such a huge correlation
matrix, if the results can't be analyzed in a reasonable time? Besides,
I don't even think my poor computer could handle that (each computed
correlation coefficient will take several bytes, I assume, plus the
corresponding p-value... we should start reserving terabytes of drive
space).
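
Both estimates can be reproduced with a rough sketch. The year length (365.25 days) and the storage model (one 8-byte floating-point value each for the coefficient and its p-value) are assumptions made here for illustration; they say nothing about how SPSS actually stores or displays its output, which carries far more overhead than the raw numbers.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60  # assumed convention: Julian year


def years_to_read(n_coefficients: int, seconds_each: float = 1.0) -> float:
    """Years needed to look at every coefficient at a fixed reading pace."""
    return n_coefficients * seconds_each / SECONDS_PER_YEAR


def raw_storage_gib(n_coefficients: int,
                    bytes_per_value: int = 8,
                    values_per_pair: int = 2) -> float:
    """Raw size in GiB, assuming r and p are each stored as an 8-byte double."""
    return n_coefficients * bytes_per_value * values_per_pair / 1024 ** 3


n_pairs = 449_985_000
print(f"{years_to_read(n_pairs):.2f} years at one coefficient per second")
print(f"{raw_storage_gib(n_pairs):.1f} GiB of raw values under the assumed layout")
```

Under these assumptions the reading time comes to about 14.26 years, and the bare numbers occupy several gibibytes; formatted output tables would need considerably more.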

Marta GG

>
> I have a question that is similar to one that you answered on a forum
> (nabble link
> <http://www.nabble.com/Bivariate-Correlation-Variable-Limit-td18004400.html>).
> Therefore I sincerely hope that someone can help me with my question
> too. I need to do large bivariate Pearson correlation calculations
> (30000 variables, 79 cases, fully filled table). Do you think this is
> doable? I am a complete SPSS noob, so if you get back to me with a
> syntax code or file, could you please explain me in detail what I
> should change in this (filename, or row numbers etc.) before running?
> Thanks already so much! spobster
>
> ------------------------------------------------------------------------


--
For miscellaneous SPSS related statistical stuff, visit:
http://gjyp.nl/marta/


Re: SPSS Bivariate Correlations using Pearsons LIMITED?

spobster
In reply to this post by Art Kendall
Thanks again, everybody, for your responses. Because my original idea is not feasible, I have changed my approach, and therefore I have a new question.

First, I did not perform the microarrays myself; I downloaded the data from the internet (wombat.gnf.org). The RNA of the tissues from different individuals was pooled before the array. Each probe was hybridized only in duplicate on the same pool, and the average of the duplicates is shown in the tables that I downloaded. I want to use the data as a supplement to my own research in tumor immunology.

Following your advice, I have selected part of the dataset, which still contains 898 variables. Since I still can't run the full bivariate correlation analysis with this number of variables (100 is the limit according to my version 15), I want to start by correlating one of those variables against the other 897. However, I cannot find how to do this in SPSS. Does anyone know how? This first analysis should not eat up my whole computer, which does not have terabytes of memory and hard disk.

Thanks!!

Re: SPSS Bivariate Correlations using Pearsons LIMITED?

Marta Garcia-Granero
The 100-variable limit will force you to split your analysis into
"swallowable" pieces.

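* Correlate var1 with the first batch of 99 variables (var2 to var100).
CORRELATIONS
  /VARIABLES=var1 WITH var2 TO var100
  /PRINT=TWOTAIL NOSIG.

* Repeat for the next batch of 99 variables (var101 to var199).
CORRELATIONS
  /VARIABLES=var1 WITH var101 TO var199
  /PRINT=TWOTAIL NOSIG.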

And so on, until you get to the last variable (var898). The keyword TO
lets you name consecutive variables (first TO last) in bunches
of 99; together with the one to the left of the WITH keyword, that reaches
the limit of 100.
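
Writing the ten or so commands by hand is error-prone, so they could also be generated. The sketch below is one possible way to do that, in Python. It assumes the variables really are named var1 to var898 and sit in that order in the data file (the TO keyword depends on file order), and it takes the 100-variable limit as reported for version 15. `correlation_batches` is a hypothetical helper, not part of SPSS.

```python
def correlation_batches(target, first, last, limit=100, prefix="var"):
    """Build CORRELATIONS commands pairing `target` with consecutive variables.

    Assumes variables are named prefix + number and are stored consecutively
    in the data file, which is what the TO keyword relies on.
    """
    per_batch = limit - 1  # one slot is taken by the target variable
    commands = []
    start = first
    while start <= last:
        end = min(start + per_batch - 1, last)
        if start == end:
            var_range = f"{prefix}{start}"
        else:
            var_range = f"{prefix}{start} TO {prefix}{end}"
        commands.append(
            "CORRELATIONS\n"
            f"  /VARIABLES={target} WITH {var_range}\n"
            "  /PRINT=TWOTAIL NOSIG."
        )
        start = end + 1
    return commands


commands = correlation_batches("var1", 2, 898)
print("\n\n".join(commands))
```

The printed text can be pasted into a syntax window and run. Real variable names will differ from var1 to var898, so the name pattern would have to be adapted to the actual file.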

HTH,
Marta GG

