rank order (spearman) correlation vs pearson correlations


Maguin, Eugene

Can anyone offer comments or a reference on the conditions under which rank-order correlations will differ from Pearson correlations? By differ, I mean by at least 20%, not differences in the 2nd or 3rd digit. Perhaps high skewness, or outlier points even if they lie on the least-squares regression line. I suppose this touches on robust methods, which I don't know anything about.

Thanks, Gene Maguin


Re: rank order (spearman) correlation vs pearson correlations

Ryan
Much can be said on this topic, but in a nutshell, the Spearman coefficient measures a monotonic relationship, while the Pearson coefficient measures a linear relationship (one type of monotonic relationship). I assume the OP knows how to calculate a Pearson and a Spearman, as this is taught in introductory stats courses. I'll refrain from discussing issues surrounding outliers, skewness, etc. for the moment, but they clearly can affect the difference between the coefficients.

Here is a little simulation I just created which should illuminate the *general* point I made above.

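* Generate 100 cases: x is standard normal, y is a monotonic but nonlinear function of x.
SET SEED 65923454.
NEW FILE.
INPUT PROGRAM.
LOOP ID = 1 TO 100.
COMPUTE x = RV.NORMAL(0,1).
COMPUTE y = x**3.
END CASE.
END LOOP.
END FILE.
END INPUT PROGRAM.
EXECUTE.

* Plot the data to see the curved (cubic) relationship.
GRAPH
  /SCATTERPLOT(BIVAR)=x WITH y.

* Pearson correlation.
CORRELATIONS
  /VARIABLES=x y
  /PRINT=TWOTAIL NOSIG.
* Spearman rank-order correlation.
NONPAR CORR
  /VARIABLES=x y
  /PRINT=SPEARMAN TWOTAIL NOSIG.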



Re: rank order (spearman) correlation vs pearson correlations

David Marso
Administrator
In reply to this post by Maguin, Eugene
Plot the data.
I can envision any number of situations where the ranked data are perfectly correlated yet the Pearson is only moderate.
Consider X = {1, ..., 10} and Y = EXP(X): Spearman's rho = 1, while Pearson's r = 0.716870.
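
One way to check these figures in SPSS is sketched below. It reuses the INPUT PROGRAM pattern from Ryan's post and assumes X is simply the integers 1 through 10; the variable names x and y are illustrative choices, not something specified in the post.

```spss
* Sketch: X = 1 to 10 and Y = EXP(X), as in the example above.
* Variable names x and y are illustrative.
NEW FILE.
INPUT PROGRAM.
LOOP x = 1 TO 10.
COMPUTE y = EXP(x).
END CASE.
END LOOP.
END FILE.
END INPUT PROGRAM.
EXECUTE.

* Pearson r on the raw values.
CORRELATIONS
  /VARIABLES=x y
  /PRINT=TWOTAIL NOSIG.

* Spearman rho on the same pair.
NONPAR CORR
  /VARIABLES=x y
  /PRINT=SPEARMAN TWOTAIL NOSIG.
```

Because EXP is strictly increasing, the ranks of y are identical to the ranks of x, which is why the rank-order coefficient is exactly 1 even though the raw values are far from a straight line.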


Re: rank order (spearman) correlation vs pearson correlations

Maguin, Eugene
In reply to this post by Ryan

Thank you, all of you.

 

 


Re: rank order (spearman) correlation vs pearson correlations

Rich Ulrich
In reply to this post by David Marso
Neatly demonstrated.

Here are a few logical points about Spearman/Pearson.

1) The simple computing relationship is that the Spearman is what you
get when you compute a Pearson r on the rank-transformed versions of
the scores.
2) A difference between the two demonstrates, in my experience, that
one or both measures should be transformed before later statistical
analysis, if that is at all reasonable.
3) When there is extreme skew in both measures, a plot will show you
almost all of the data in one corner of the plot.  In this sort of example,
it should be easy to recognize that the size of the correlation depends
on the few scores away from that corner ... and thus, in effect, *those*
make up all the "degrees of freedom" of the relationship, regardless of
how many scores are in that corner.  This matters because an r with fewer
d.f. has a correspondingly *larger* standard error, and a larger value
is needed for statistical significance.
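
Points 1 and 2 can be tried out with a short piece of syntax. This is only a sketch: it assumes the X = 1 to 10, Y = EXP(X) data from the earlier example, and the names rx, ry and lny are hypothetical names chosen here for the new variables.

```spss
* Sketch: same illustrative data as the X = 1 to 10, Y = EXP(X) example.
NEW FILE.
INPUT PROGRAM.
LOOP x = 1 TO 10.
COMPUTE y = EXP(x).
END CASE.
END LOOP.
END FILE.
END INPUT PROGRAM.
EXECUTE.

* Point 1: rank-transform both variables, then run a Pearson r on the ranks.
* rx and ry are hypothetical names for the new rank variables.
RANK VARIABLES=x y
  /RANK INTO rx ry
  /TIES=MEAN
  /PRINT=NO.
CORRELATIONS
  /VARIABLES=rx ry
  /PRINT=TWOTAIL NOSIG.

* The Spearman coefficient on the raw scores should match the r above.
NONPAR CORR
  /VARIABLES=x y
  /PRINT=SPEARMAN TWOTAIL NOSIG.

* Point 2: a log transform of y makes the relationship linear here.
COMPUTE lny = LN(y).
EXECUTE.
CORRELATIONS
  /VARIABLES=x lny
  /PRINT=TWOTAIL NOSIG.
```

In this constructed case the log transform recovers x itself, so the Pearson r between x and lny is 1. With real data the choice of transformation is a judgment call, and /TIES=MEAN is used so that tied scores get average ranks, the usual convention for Spearman.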

--
Rich Ulrich


Re: rank order (spearman) correlation vs pearson correlations

Marta Garcia-Granero
In reply to this post by Maguin, Eugene
Hi:

I normally teach the differences between Pearson's and Spearman's correlation coefficients using Anscombe's quartet (Google will give a lot of hits, including graphs and datasets).
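
For anyone who wants to try this in SPSS, here is a minimal sketch. The data values are not from this thread; they are the widely published values of the quartet, typed in here, and the variable names (x123 for the x shared by the first three sets, then y1, y2, y3, x4, y4) are illustrative.

```spss
* Sketch: Anscombe's quartet entered inline.
* x123 is the x variable shared by sets 1 to 3 and x4 belongs to set 4.
DATA LIST FREE / x123 y1 y2 y3 x4 y4.
BEGIN DATA
10 8.04 9.14 7.46 8 6.58
8 6.95 8.14 6.77 8 5.76
13 7.58 8.74 12.74 8 7.71
9 8.81 8.77 7.11 8 8.84
11 8.33 9.26 7.81 8 8.47
14 9.96 8.10 8.84 8 7.04
6 7.24 6.13 6.08 8 5.25
4 4.26 3.10 5.39 19 12.50
12 10.84 9.13 8.15 8 5.56
7 4.82 7.26 6.42 8 7.91
5 5.68 4.74 5.73 8 6.89
END DATA.

* Pearson r for each of the four x-y pairs.
CORRELATIONS
  /VARIABLES=x123 WITH y1 y2 y3
  /PRINT=TWOTAIL NOSIG.
CORRELATIONS
  /VARIABLES=x4 WITH y4
  /PRINT=TWOTAIL NOSIG.

* Spearman rho for the same pairs.
NONPAR CORR
  /VARIABLES=x123 WITH y1 y2 y3
  /PRINT=SPEARMAN TWOTAIL NOSIG.
NONPAR CORR
  /VARIABLES=x4 WITH y4
  /PRINT=SPEARMAN TWOTAIL NOSIG.
```

The four sets were built to share nearly the same Pearson r (about 0.82), so any spread in the Spearman coefficients comes from the shape of each scatter (curvature, a single outlier, or one high-leverage point) rather than from the linear fit. Plotting each pair with GRAPH /SCATTERPLOT(BIVAR) makes the contrast obvious.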

Regards,
Marta GG
