IBM Support page on the point-biserial correlation

Bruce Weaver
Administrator
A recent discussion about the point-biserial correlation (r_pb) on ResearchGate got me thinking about this IBM Support page about r_pb:

https://www.ibm.com/support/pages/point-biserial-correlations-spss

I have no real issue with what it says. It is completely true that when X is dichotomous and Y is a metric variable, Pearson r is equivalent to r_pb. However, I think the note should add that the equivalence of the point estimates does not imply that the 95% CI for Pearson r is the correct CI for r_pb. If the help for Stata's -esize- command is correct, the two CIs are not the same: the correct CI for r_pb is computed using the non-central t-distribution. Click on Methods and Formulas here for more info:

https://www.stata.com/manuals/resize.pdf
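
For anyone who wants to check the equivalence of the point estimates for themselves, here is a minimal Python sketch. The data are made up for illustration, and the function names (pearson_r, point_biserial) are mine, not anything from SPSS or Stata. It computes Pearson r on a 0/1 variable and r_pb from the textbook group-means formula, and the two agree up to floating-point error. It says nothing about the CIs, which is the point at issue.

```python
import math

def pearson_r(x, y):
    """Plain Pearson correlation computed from sums of squares."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def point_biserial(group, y):
    """Textbook r_pb: (M1 - M0) / s_n * sqrt(p * q), where s_n is the
    SD of y computed with n (not n - 1) in the denominator."""
    n = len(y)
    y1 = [v for g, v in zip(group, y) if g == 1]
    y0 = [v for g, v in zip(group, y) if g == 0]
    p, q = len(y1) / n, len(y0) / n
    my = sum(y) / n
    s_n = math.sqrt(sum((v - my) ** 2 for v in y) / n)
    return (sum(y1) / len(y1) - sum(y0) / len(y0)) / s_n * math.sqrt(p * q)

# Hypothetical data: group is the dichotomous X, y is the metric Y.
group = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
y = [4.1, 5.0, 3.8, 4.6, 5.2, 4.4, 5.9, 6.3, 5.1, 6.8, 5.5, 6.1]

print(pearson_r(group, y), point_biserial(group, y))
```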

This is not the only example where two different methods can be used to compute exactly the same point estimate, but where the CI from one method is not the correct CI for the statistic of interest.  Consider Pearson r and the slope from a simple linear regression model when both variables have been standardized.  The point estimates are indeed equivalent.  But the 95% CI for the slope from that regression model is not the correct CI for Pearson r, despite the advice given in this YouTube video:

https://www.youtube.com/watch?v=-dSoWqDyT4E

The correct CI for Pearson r is symmetrical only when the observed value of r = 0; otherwise, it is asymmetrical. And the limits are always within the range -1 to 1. The CI for the slope from an OLS model, on the other hand, is always symmetrical, and its limits can fall outside the range -1 to 1.
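
To make the contrast concrete, here is a small Python sketch. It assumes that "the correct CI for Pearson r" means the usual Fisher z-transformation interval, and it uses the standard result that the slope for standardized variables equals r with standard error sqrt((1 - r^2)/(n - 2)). The values r = 0.9 and n = 10 are made up, and the t critical value is typed in from a t table rather than computed.

```python
import math
from statistics import NormalDist

def fisher_ci(r, n, level=0.95):
    """CI for Pearson r via the Fisher z transformation."""
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    # Back-transforming with tanh keeps both limits inside (-1, 1).
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

def standardized_slope_ci(r, n, t_crit):
    """CI for the OLS slope when X and Y are both standardized.
    The slope equals r; t_crit must be supplied by the caller."""
    se = math.sqrt((1.0 - r * r) / (n - 2))
    return r - t_crit * se, r + t_crit * se

r, n = 0.9, 10            # hypothetical values
T_CRIT_DF8 = 2.306        # two-sided 95% critical t for 8 df, from a t table

r_lo, r_hi = fisher_ci(r, n)
b_lo, b_hi = standardized_slope_ci(r, n, T_CRIT_DF8)
print("Fisher z CI for r:", r_lo, r_hi)
print("CI for the slope: ", b_lo, b_hi)
```

With these inputs the Fisher interval is lopsided around 0.9 and stays below 1, while the slope interval is symmetrical around 0.9 and its upper limit exceeds 1.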

So, as noted above, I reckon the IBM Support page should be updated to clarify that one cannot simply use the STATS CORRELATION extension command to get the correct CI for r_pb.  

Finally, a question for Jon, who wrote STATS CORRELATION.  Is it feasible to add an option to compute the correct CI for r_pb (using the non-central t-distribution)?  
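
For readers who have not met this kind of interval before, the generic idea can be sketched as below. This is the usual "invert the non-central t cdf" argument written in my own notation, not a transcription of the Stata manual. In particular, the final step that maps the noncentrality limits back to the r scale simply reuses the point-estimate transformation, which is an assumption on my part and should be checked against Methods and Formulas in the -esize- documentation.

```latex
% Observed t statistic implied by r_pb, with df = n - 2
t_{\mathrm{obs}} = \frac{r_{pb}\,\sqrt{df}}{\sqrt{1 - r_{pb}^{2}}}

% Noncentrality limits: F is the cdf of the non-central t distribution
F\!\left(t_{\mathrm{obs}};\, df,\, \delta_{L}\right) = 1 - \alpha/2,
\qquad
F\!\left(t_{\mathrm{obs}};\, df,\, \delta_{U}\right) = \alpha/2

% Assumed back-transformation to the r scale (same form as the point estimate)
r_{L} = \frac{\delta_{L}}{\sqrt{\delta_{L}^{2} + df}},
\qquad
r_{U} = \frac{\delta_{U}}{\sqrt{\delta_{U}^{2} + df}}
```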

Cheers,
Bruce

PS- Anyone interested in the RG and Statalist discussions about this can view them via the links below.

https://www.researchgate.net/post/Q_re_pbis_in_STATA-is_there_a_macro_to_do_multiple_calculations_to_a_table_or_list_or_do_I_have_to_do_them_all_individually

https://www.statalist.org/forums/forum/general-stata-discussion/general/1690078-methods-for-computing-the-point-biserial-correlation
--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING: 
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).

Re: IBM Support page on the point-biserial correlation

jkpeck
Bruce has an interest in this question, since he inspired the STATS CORRELATION extension command :-)

As for adding that alternative CI formula, I need to look back at that code, but since Statistics has a noncentral t cdf, and I have code that can reach into the backend and run the relevant calculation, this is probably doable.
But this command allows for a bootstrapped CI, so doesn't that cover the dichotomous case?

Re: IBM Support page on the point-biserial correlation

Bruce Weaver
Administrator
Here is one possible translation of the first sentence of Jon's post (quoted below): Jon found my !rhoCI macro very clunky and decided he had to build a better mousetrap. (https://www.tqmp.org/RegularArticles/vol10-1/p029/index.html)

Just kidding, Jon. But it was very kind of you to acknowledge me in the help for STATS CORRELATION, as I do not recall making any very substantial contributions.

Yes, bootstrapping the CI is an option.  But my preference would be to also have the method that uses the non-central t cdf.  I suppose I could always revise !rhoCI to include that.  Maybe I'll do that some day.  ;-)  
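
For anyone unfamiliar with the bootstrap option, here is a bare-bones Python sketch of a percentile bootstrap CI for r_pb. It is only an illustration of the general technique and is not how STATS CORRELATION implements it. The data, the seed, the number of replicates, and the function names are all made up. Resamples in which everyone lands in the same group are skipped, because r is undefined there.

```python
import math
import random

def pearson_r(x, y):
    """Pearson correlation, clamped to [-1, 1] to absorb rounding error."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return max(-1.0, min(1.0, sxy / math.sqrt(sxx * syy)))

def bootstrap_ci(x, y, reps=2000, level=0.95, seed=12345):
    """Percentile bootstrap CI: resample cases with replacement,
    recompute r each time, and read off the empirical quantiles."""
    rng = random.Random(seed)
    n = len(x)
    stats = []
    while len(stats) < reps:
        idx = [rng.randrange(n) for _ in range(n)]
        xs = [x[i] for i in idx]
        ys = [y[i] for i in idx]
        if len(set(xs)) < 2 or len(set(ys)) < 2:
            continue  # degenerate resample: r is undefined, so draw again
        stats.append(pearson_r(xs, ys))
    stats.sort()
    lo_idx = round((1.0 - level) / 2 * reps)
    return stats[lo_idx], stats[reps - 1 - lo_idx]

# Hypothetical data: group is the dichotomous X, y is the metric Y.
group = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
y = [4.1, 5.0, 3.8, 4.6, 5.2, 4.4, 5.9, 6.3, 5.1, 6.8, 5.5, 6.1]

lo, hi = bootstrap_ci(group, y)
print("r_pb =", pearson_r(group, y), " percentile bootstrap CI:", lo, hi)
```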

Cheers,
Bruce

PS- STATS CORRELATION is a better mousetrap than !rhoCI, which is why I advise people to use it.


jkpeck wrote
Bruce has an interest in this question, since he inspired the STATS CORRELATION extension command :-)

As for adding that alternative CI formula, I need to look back at that code, but since Statistics has a noncentral t cdf, and I have code that can reach into the backend and run the relevant calculation, this is probably doable.
But this command allows for a bootstrapped CI, so doesn't that cover the dichotomous case?