Hi list folk,

I am attempting to identify multivariate outliers in relation to discriminant analysis. When I select casewise results in the Classify dialog box of discriminant analysis in SPSS, I can inspect the Squared Mahalanobis Distance to Centroid for each case. I've also read that I can save the Mahalanobis distances for each case via linear regression (which I have done).

I'm a little confused, though, as the two methods give different values. Even when squared, the regression Mahalanobis distances are not the same as the casewise ones. I'm assuming the Squared Mahalanobis Distance to Centroid in the casewise statistics is calculated at the group level (I have 4 groups). I have also obtained the Mahalanobis distances via linear regression by selecting only the cases in a given group.

Sorry if this is simple stuff, but could someone please explain why the values differ between these two methods?

Regards
A
Andy,
One place to start digging is the SPSS Algorithms documentation, available on the IBM website. Since I don't know the exact answer to your question off the top of my head, I would have to dig out my copy and look it up... Ah, but you can do that too. Maybe someone else knows off the top of their head and will answer, but I figured I'd show you a place to start in the meantime. When you locate the answer, please return and post what you find.

EDIT (deleted): My suspicion? Treatment of missing values.

ADDED: A quick look at the algorithms indicates the two "Mahalanobis" distances are completely different beasts, and I would not expect them to have any correspondence to one another.

David
--
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"
In reply to this post by Andy H
Andy,
The Regression procedure gives you the squared Mahalanobis distance of every point of the cloud to the centroid of the cloud, in the space of the independent variables. (I have an equivalent, easy-to-use MATRIX function, !smahalc, on my web page rivita.ru/spssmacros_en.shtml.)

The Discriminant procedure computes the squared Mahalanobis distance separately against each class (you are right), i.e. the cloud is this or that class, not the whole sample. But there are more differences. For Discriminant, the space is the space of discriminant functions, *not* of the original independent variables (and if you have fewer functions than variables, that makes a difference in the shape of the cloud!). Also, by *default* Discriminant uses the pooled within-class covariance matrix in computing the distance (which is further different from your attempt to compute the distance in Regression separately for each class).

(Written in reply to Andy H's post of 25.03.2013 23:55.)
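To make the distinctions above concrete, here is a minimal NumPy sketch (not SPSS code, and not taken from the thread) that computes three squared Mahalanobis distances on made-up data: against the whole-sample centroid and covariance, against each case's own class centroid with the pooled within-class covariance, and against each case's own class centroid with that class's own covariance (the "Regression run separately per group" approach). All function names and the simulated data are hypothetical. The sketch stays in the space of the original variables and does not attempt the discriminant-function step, so it should not be expected to reproduce SPSS's casewise output.

```python
import numpy as np


def sq_mahalanobis(X, center, cov):
    """Squared Mahalanobis distance of each row of X to `center` under `cov`."""
    diff = X - center
    # Solve cov @ Z = diff.T rather than inverting cov explicitly.
    return np.einsum("ij,ji->i", diff, np.linalg.solve(cov, diff.T))


def d2_whole_sample(X):
    """Every case against the centroid and covariance of the whole cloud."""
    return sq_mahalanobis(X, X.mean(axis=0), np.cov(X, rowvar=False))


def d2_own_class_pooled(X, groups):
    """Every case against its own class centroid, pooled within-class covariance."""
    labels = np.unique(groups)
    n, p = X.shape
    scatter = np.zeros((p, p))
    for g in labels:
        centered = X[groups == g] - X[groups == g].mean(axis=0)
        scatter += centered.T @ centered
    pooled = scatter / (n - len(labels))
    d2 = np.empty(n)
    for g in labels:
        mask = groups == g
        d2[mask] = sq_mahalanobis(X[mask], X[mask].mean(axis=0), pooled)
    return d2


def d2_own_class_separate(X, groups):
    """Every case against its own class centroid and that class's own covariance."""
    d2 = np.empty(len(X))
    for g in np.unique(groups):
        mask = groups == g
        d2[mask] = d2_whole_sample(X[mask])
    return d2


# Made-up data: 4 classes of 30 cases, 3 variables, shifted class means.
rng = np.random.default_rng(0)
groups = np.repeat(np.arange(4), 30)
X = rng.normal(size=(120, 3)) + groups[:, None] * np.array([1.0, 0.5, -0.5])

whole = d2_whole_sample(X)
pooled = d2_own_class_pooled(X, groups)
separate = d2_own_class_separate(X, groups)

for name, d2 in [("whole sample", whole), ("pooled", pooled), ("separate", separate)]:
    print(f"{name:>12}: first three cases {np.round(d2[:3], 3)}")
```

The three columns differ case by case even though the data are identical, because the centroid and the covariance matrix change from one definition to the next.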
In reply to this post by Andy H
Thank you for your replies.
@David. A good suggestion for a starting point, thank you. I've located the info and there are differences, though exactly what those differences are is a little confusing.
Regression - http://publib.boulder.ibm.com/infocenter/spssstat/v20r0m0/index.jsp?topic=%2Fcom.ibm.spss.statistics.help%2Falg_regression.htm
Discriminant - http://publib.boulder.ibm.com/infocenter/spssstat/v20r0m0/index.jsp?topic=%2Fcom.ibm.spss.statistics.help%2Fidh_disc_value.htm

@Kirill. Thank you for your explanation. Although I have split the cases by group for the regression method, "For Discriminant, the space is the space of discriminant functions, *not* of the original independent variables" explains the differences between the regression and discriminant methods. Do you have links to any literature that discusses this further, please? Also, your website appears to be offline. I can access it via Google cache, but the macros are not accessible that way.

I'm opting to use the regression Mahalanobis distances, with cases split by group, as my method for exploring multivariate outliers. Since identifying outliers is an initial exploratory task, the regression method seems more appropriate than the discriminant Mahalanobis distances, which are calculated later in the process. Does anyone have any thoughts on this, please? Incidentally, I'm also using z-scores via Descriptives to identify univariate outliers.

Regards
A
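For readers following along outside SPSS, the univariate z-score screen mentioned above can be sketched roughly as follows. This is only an illustration under stated assumptions: the data are simulated, the function names are hypothetical, the z-scores use the sample standard deviation (n - 1 denominator), and the cutoff of 3 is an arbitrary placeholder that is not taken from this thread.

```python
import numpy as np


def zscores(X):
    """Column-wise z-scores using the sample standard deviation (n - 1)."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)


def flag_univariate(X, cutoff=3.0):
    """Boolean array marking values whose |z| exceeds `cutoff` (placeholder default)."""
    return np.abs(zscores(X)) > cutoff


# Made-up data: 50 cases, 2 variables, with one planted extreme value.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 2))
X[0, 0] = 10.0

flags = flag_univariate(X)
print("cases with any flagged value:", np.flatnonzero(flags.any(axis=1)))
```

A screen like this looks at one variable at a time, so a case can pass it and still be unusual in the multivariate sense that the Mahalanobis distance measures.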
In reply to this post by Andy H
Andy,
1) Well, on the literature...
For Mahalanobis distance in general (i.e. as in Regression), I think Wikipedia is enough. For its use within Discriminant, one should read up on linear discriminant analysis (including the SPSS Algorithms, Case Studies, and Help). Here are my own answers about how discriminant functions are extracted and how they are used in classifying cases.
2) Oh yeah, my web page is currently unavailable; I don't know why. OK then, I'm attaching the collection of MATRIX functions to this email.

(Written in reply to Andy H's post of 26.03.2013 12:43.)