Incorrect Mahalanobis distances from regression?


Incorrect Mahalanobis distances from regression?

Ruben Geert van den Berg
Dear all,
 
I'd like to obtain Mahalanobis distances to check for multivariate outliers. The easiest option, I guess, is to run regression with the desired contributors to this distance as predictors and some arbitrary dependent variable (since I'm interested in nothing but the Mahalanobis distances). So far so good, but when I validated these distances, it seemed as if SPSS renders squared distances. That is, the square root sign seems to have been omitted from the Wikipedia equation:
 
http://en.wikipedia.org/wiki/Mahalanobis_distance
 
Could anyone please confirm or disconfirm whether SPSS renders SQUARED Mahalanobis distances instead of Mahalanobis distances?
 
TIA and have a shiny happy weekend everyone!



Re: Incorrect Mahalanobis distances from regression?

Art Kendall
Open SPSS.
click <help>
click <algorithms>
click <search>
type "mahalanobis" in the edit box.
click <list topics>


Art Kendall
Social Research Consultants



Re: Incorrect Mahalanobis distances from regression?

SPSS Support
In reply to this post by Ruben Geert van den Berg

Hello Ruben,

  The values saved by the REGRESSION ... /SAVE MAHAL option are indeed squared Mahalanobis distances. It is the squared Mahalanobis distance (which I’ll call MD2) that is applied in outlier detection in the multivariate texts that I’ve consulted (Stevens, 2002; Tabachnick & Fidell, 2007). Both texts use the term Mahalanobis distance in discussing the application to outliers (Stevens denotes it D^2, i.e. D with a superscript 2), but their formulae are clearly for MD2. So, for the purposes of outlier detection, the MD2 values saved by the MAHAL keyword in REGRESSION are likely what you want to use anyway. However, I’ve sent a note to one of our documentation authors to suggest that we explicitly note that it is the squared Mahalanobis distances that are saved.
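
For reference, the relation between the two quantities can be written out as below. The notation is an assumption for illustration and is not taken from the SPSS documentation: x_i is the vector of predictor values for case i, x-bar is the vector of predictor means, and S is the sample covariance matrix of the predictors.

```latex
\[
  \mathrm{MD2}_i
    = (\mathbf{x}_i - \bar{\mathbf{x}})^{\top}\,\mathbf{S}^{-1}\,(\mathbf{x}_i - \bar{\mathbf{x}}),
  \qquad
  \mathrm{MD}_i = \sqrt{\mathrm{MD2}_i}
\]
```

MD2 is the quadratic form without the square root; MD is the distance as given in the Wikipedia article.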

Having saved the MD2 from REGRESSION, it would be very simple to compute the unsquared distance (MD). If you ran the linear regression procedure from the menu system, the squared Mahalanobis distance is saved with a generic name such as MAH_1. From the Transform->Compute menu, enter the name for the new distance variable (MDist, for example). The numeric expression would be

Sqrt(mah_1)

The equivalent syntax commands are:

compute mdist = sqrt(mah_1).
execute.
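
For anyone who wants to check the saved values outside SPSS, here is a minimal Python sketch of the same calculation for the two-predictor case. It is not SPSS's implementation: the function name `squared_mahalanobis_2d` and the toy data are made up for illustration, and it assumes the covariance matrix is estimated with the n - 1 denominator.

```python
import math
from statistics import mean


def squared_mahalanobis_2d(xs, ys):
    """Squared Mahalanobis distance of each case from the centroid (two variables).

    Hypothetical helper for illustration; assumes the n - 1 denominator.
    """
    n = len(xs)
    mx, my = mean(xs), mean(ys)
    # Sample variances and covariance of the two variables.
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    det = sxx * syy - sxy ** 2
    # Quadratic form (x - m)' S^-1 (x - m), using the closed-form 2x2 inverse.
    return [
        (syy * (x - mx) ** 2
         - 2 * sxy * (x - mx) * (y - my)
         + sxx * (y - my) ** 2) / det
        for x, y in zip(xs, ys)
    ]


# Made-up toy data: five cases, two predictors.
xs = [1.0, 2.0, 3.0, 4.0, 10.0]
ys = [2.0, 1.0, 4.0, 3.0, 9.0]

md2 = squared_mahalanobis_2d(xs, ys)   # squared distances (what is called MD2 here)
md = [math.sqrt(v) for v in md2]       # unsquared distances, as in sqrt(mah_1)

print(md2)
print(md)
```

One sanity check that follows from the algebra: with the n - 1 denominator, the squared distances sum to (n - 1) times the number of variables, so here they should sum to 8 up to rounding. If values computed this way match MAH_1 rather than its square root, that agrees with the saved variable being the squared distance.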

 

Stevens, J. (2002). Applied Multivariate Statistics for the Social Sciences (4th ed.). Mahwah, NJ: Erlbaum.

Tabachnick, B. G., & Fidell, L. S. (2007). Using Multivariate Statistics (5th ed.). Boston: Pearson Education.

 

  I hope this helps.

 

David Matheson

SPSS Statistical Support

 


From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Ruben van den Berg
Sent: Friday, May 01, 2009 5:13 AM
To: [hidden email]
Subject: Incorrect Mahalanobis distances from regression?

 



Re: Incorrect Mahalanobis distances from regression?

Art Kendall
How about including Mahalanobis distances in PROXIMITIES?

Art Kendall
Social Research Consultants
