SPSSX Discussion

Multivariate outliers - different mahalanobis distances between casewise and regression

Andy H

Mar 25, 2013; 7:55pm

Multivariate outliers - different mahalanobis distances between casewise and regression

Hi list folk,

I am attempting to identify multivariate outliers in relation to discriminant analysis. When I select casewise results in the Classify dialog box of discriminant analysis in SPSS, I can inspect the Squared Mahalanobis Distance to Centroid for each case. I've also read that I can save the Mahalanobis distances for each case via linear regression (which I have done). I'm a little confused, though, as the two methods give different values. Even when squared, the regression Mahalanobis distances are not the same as the casewise ones.

I'm assuming the Squared Mahalanobis Distance to Centroid in the casewise statistics is calculated at the group level (I have 4 groups). I have also obtained the Mahalanobis distances via linear regression by selecting only the cases in a given group. Sorry if this is simple stuff, but could someone please explain why the two methods give different values?

Regards
A

David Marso

Mar 25, 2013; 8:10pm

Re: Multivariate outliers - different mahalanobis distances between casewise and regression

Administrator

This post was updated on Mar 25, 2013; 10:04pm.

Andy,
One place to start digging is the SPSS Algorithms available on the IBM website.
Since I don't know the exact answer to your question off the top of my head, I would have to dig out my copy and look it up... Ah, but you can do that too.
Maybe someone else knows off the top of their head and will answer, but I figured I'd show you a place to start in the meantime. When you locate the answer, please return and post what you find.
EDIT-DELETE: My suspicion ? Treatment of Missing Values.

ADDED: A quick look at the algorithms indicates the two "Mahalanobis" distances are completely different beasts and I would not expect them to have any correspondence to one another.

David

Kirill Orlov

Mar 26, 2013; 7:16am

Re: Multivariate outliers - different mahalanobis distances between casewise and regression

In reply to this post by Andy H

Andy,
The Regression procedure computes the squared Mahalanobis distance of every point in the cloud to the centroid of the cloud, in the space of the independent variables. (I have an equivalent easy-to-use MATRIX function, !smahalc, on my web page rivita.ru/spssmacros_en.shtml.)
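
To make the regression-style distance concrete, here is a minimal NumPy sketch. It is not SPSS's implementation and not the !smahalc macro; the function name `sq_mahalanobis_to_centroid` and the random data are made up for illustration. It assumes the usual sample covariance (n - 1 denominator) of the predictors.

```python
import numpy as np

def sq_mahalanobis_to_centroid(X):
    """Squared Mahalanobis distance of each row of X to the centroid of X.

    Hypothetical helper for illustration; uses the sample covariance of X.
    """
    X = np.asarray(X, dtype=float)
    diff = X - X.mean(axis=0)                      # deviations from the centroid
    cov = np.atleast_2d(np.cov(X, rowvar=False))   # sample covariance (n - 1)
    # Solve cov @ z = diff.T rather than forming the inverse explicitly.
    return np.einsum("ij,ji->i", diff, np.linalg.solve(cov, diff.T))

# Illustrative data only: 50 cases, 3 independent variables.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
d2 = sq_mahalanobis_to_centroid(X)
```

A handy sanity check on any such computation: with the sample covariance, the squared distances always sum to (n - 1) times the number of variables.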

The Discriminant procedure computes the squared Mahalanobis distance separately against each class (you are right), i.e. the cloud is this or that class, not the whole sample. But there are more differences. For Discriminant, the space is the space of discriminant functions, *not* of the original independent variables (and if you have fewer functions than variables, this changes the shape of the cloud!). Also, by *default* Discriminant uses the pooled within-class covariance matrix in computing the distance (which is a further difference from your attempt to compute the distance in Regression separately for each class).
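
The three variants mentioned above can be compared side by side in a rough NumPy sketch. This is an illustration under stated assumptions, not SPSS's algorithm: the function name `class_distances`, its `n_functions` parameter, and the simulated data are hypothetical, and the discriminant functions are obtained by a textbook whitening-plus-eigendecomposition route, scaled so that the pooled within-class covariance of the scores is the identity.

```python
import numpy as np

def class_distances(X, y, n_functions=None):
    """Squared distance of each case to its own class centroid, three ways.

    Hypothetical helper: returns (separate-covariance, pooled-covariance,
    discriminant-function-space) squared distances.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    n, p = X.shape
    grand_mean = X.mean(axis=0)

    pooled = np.zeros((p, p))    # pooled within-class scatter, scaled below
    between = np.zeros((p, p))   # between-class scatter
    diff = np.empty_like(X)      # deviations from own class centroid
    d2_separate = np.empty(n)    # each class uses its own covariance matrix
    for c in classes:
        rows = y == c
        Xc = X[rows]
        dc = Xc - Xc.mean(axis=0)
        diff[rows] = dc
        scatter = dc.T @ dc
        pooled += scatter
        m = Xc.mean(axis=0) - grand_mean
        between += len(Xc) * np.outer(m, m)
        cov_c = scatter / (len(Xc) - 1)
        d2_separate[rows] = np.einsum("ij,ji->i", dc, np.linalg.solve(cov_c, dc.T))
    pooled /= n - len(classes)   # pooled within-class covariance

    # Original variable space, pooled within-class covariance.
    d2_pooled = np.einsum("ij,ji->i", diff, np.linalg.solve(pooled, diff.T))

    # Discriminant function space: directions V with V.T @ pooled @ V = I,
    # so the distance there is plain squared Euclidean distance on the scores.
    L = np.linalg.cholesky(pooled)
    Linv = np.linalg.inv(L)
    eigvals, U = np.linalg.eigh(Linv @ between @ Linv.T)
    order = np.argsort(eigvals)[::-1]            # largest eigenvalue first
    k = min(p, len(classes) - 1) if n_functions is None else n_functions
    V = Linv.T @ U[:, order[:k]]
    scores = diff @ V                            # scores relative to class centroid
    d2_functions = (scores ** 2).sum(axis=1)
    return d2_separate, d2_pooled, d2_functions

# Illustrative data only: 2 classes, 3 variables, so just 1 discriminant function.
rng = np.random.default_rng(1)
y = np.repeat([0, 1], 30)
X = rng.normal(size=(60, 3)) * np.where(y[:, None] == 0, 1.0, 2.0) + y[:, None]
d2_sep, d2_pool, d2_fun = class_distances(X, y)
d2_full = class_distances(X, y, n_functions=3)[2]   # keep as many directions as variables
```

In this sketch, keeping as many directions as there are variables reproduces the pooled-covariance distance in the original space, while keeping fewer (here one function for three variables) can only shrink it. The per-class-covariance distances, which correspond to running the computation separately within each group, differ from both.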

Andy H

Mar 26, 2013; 8:43am

Re: Multivariate outliers - different mahalanobis distances between casewise and regression

In reply to this post by Andy H

Thank you for your replies.

@David. A good suggestion for a starting point. Thank you. I've located the info and there are differences. What those differences are is a little confusing, though.
Regression - http://publib.boulder.ibm.com/infocenter/spssstat/v20r0m0/index.jsp?topic=%2Fcom.ibm.spss.statistics.help%2Falg_regression.htm
Discriminant - http://publib.boulder.ibm.com/infocenter/spssstat/v20r0m0/index.jsp?topic=%2Fcom.ibm.spss.statistics.help%2Fidh_disc_value.htm

@Kirill. Thank you for your explanation. Although I have split the cases by group for the regression method, "For Discriminant, the space is the space of discriminant functions, *not* of the original independent variables" explains the differences between the regression and discriminant methods. Do you have any links to literature that discusses this further, please? Also, your website appears to be offline. I can access it via Google cache, but the macros are not accessible that way.

I'm opting to use the regression Mahalanobis distances, with cases split by group, as my method for exploring multivariate outliers. Since identifying outliers is an initial exploratory task, the regression method seems more appropriate than the discriminant Mahalanobis distances, which are calculated later in the process. Does anyone have any thoughts on this, please? Incidentally, I'm also using z-scores via Descriptives to identify univariate outliers.

Regards
A

Kirill Orlov

Mar 26, 2013; 9:13am

Re: Multivariate outliers - different mahalanobis distances between casewise and regression

In reply to this post by Andy H

Andy,

1) Well, on the literature... For Mahalanobis distance in general (= as in Regression), I think Wikipedia is enough. For its use within Discriminant, one should read up on linear discriminant analysis (including SPSS Algorithms, Case Studies, Help). Here are my own answers about how discriminant functions are extracted and how they are used in classifying cases.
2) Oh yeah, my web page is currently unavailable; I don't know why. OK, then, I'm attaching the collection of MATRIX functions to this email.
