
Re: Boxplot (seemingly) does not show outlier

Posted by Ken Belzer on Apr 06, 2007; 3:00pm
URL: http://spssx-discussion.165.s1.nabble.com/Boxplot-seemingly-does-not-show-outlier-tp1074926p1074929.html

Jan,

I've been following this thread with some interest, as I am working with a small data set myself, with many non-normally distributed variables and multiple outliers. I understand and typically use the boxplot approach for detecting outliers, or compute z-scores and treat cases with z beyond +/- 2.0 as suspect. However, I am unfamiliar with the approach you describe below. Specifically, what does "Outlier = case with deleted residual in the null linear model above 15 in absolute value" mean? Could you kindly clarify this, as I would like to understand this approach a little better.

Thanks very much in advance.
Ken
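
A side note on the z-score rule for samples as small as the one in this thread: when z is computed with the sample standard deviation (n - 1 in the denominator), no case can have |z| larger than (n - 1)/sqrt(n), which is 1.5 for n = 4. A cutoff of +/- 2.0 therefore can never flag anything among four scores. The Python sketch below is only an illustration of that arithmetic, not SPSS output; the function names are made up for the example.

```python
import math
import statistics


def z_scores(values):
    """z-scores using the sample standard deviation (n - 1 denominator)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def max_abs_z(n):
    """Largest |z| any single case can reach in a sample of size n."""
    return (n - 1) / math.sqrt(n)


# The four scores from the original post in this thread.
scores = [62, 61, 59, 33]
zs = z_scores(scores)

for score, z in zip(scores, zs):
    print(score, round(z, 3))
print("upper bound on |z| for n = 4:", max_abs_z(4))
```

Even the score of 33 stays just under 1.5 in absolute value, so it would not be marked as suspect by the +/- 2.0 rule.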


-----Original Message-----
From: [hidden email]
To: [hidden email]
Sent: Fri, 6 Apr 2007 4:29 AM
Subject: Re: Boxplot (seemingly) does not show outlier


Tom,

I think that you should first try to define what you consider an
outlier, because the standard definitions are inappropriate in your
case, as I wrote earlier. Once you have a definition, we can probably
implement it in SPSS.

For example you can define "Outlier = case with deleted residual in the
null linear model above 15 in absolute value". Then we can compute the
residuals and mark the outliers:

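* Example scores for one entry (one row per judge).
data list free /id judge .
begin data.
1 100
2 62
3 59
4 33
end data.
form all (f2).

* Null linear model (intercept only); save the deleted residuals.
compute myconst = 1.
UNIANOVA
  judge  BY myconst
  /METHOD = SSTYPE(3)
  /INTERCEPT = INCLUDE
  /SAVE = DRESID (dresid)
  /CRITERIA = ALPHA(.05)
  /DESIGN = myconst .
* Flag cases whose deleted residual exceeds 15 in absolute value.
compute outlier = abs(dresid) > 15.
val lab outlier 1 "Outlier" 0 "Regular case".
exe.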

(The disadvantage: All judges can be outliers in some scenarios, if the
spread of their judgements is huge enough.)
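
To make the definition concrete: in a null (intercept-only) model the fitted value is just the mean, so the deleted residual of a case is its score minus the mean of the remaining scores. The Python sketch below reproduces that arithmetic for the example data; it is an illustration under that assumption, not a replacement for the UNIANOVA run, and the function name is hypothetical.

```python
def deleted_residuals(values):
    """Score minus the mean of all the other scores (leave-one-out)."""
    n = len(values)
    total = sum(values)
    return [v - (total - v) / (n - 1) for v in values]


# Same example data as in the SPSS syntax.
judge = [100, 62, 59, 33]
dresid = deleted_residuals(judge)

# Cutoff of 15 taken from the example definition.
outlier = [abs(d) > 15 for d in dresid]

for score, d, flag in zip(judge, dresid, outlier):
    print(score, round(d, 2), "Outlier" if flag else "Regular case")
```

With these data the scores 100 and 33 are flagged and 62 and 59 are not. For the scores 62, 62, 59, 33 the same calculation flags only the 33.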

HTH

Jan



-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
Tom Werner
Sent: Thursday, April 05, 2007 7:23 PM
To: [hidden email]
Subject: Re: Boxplot (seemingly) does not show outlier

Thank you very much for your reply.

Yes, it's true that 4 points is a very small sample.

Unfortunately, it is in the nature of the real-world situation.

I have an awards program in which each entry is judged by 4 judges.
(Each set of 4 judges is randomly selected from a large pool of judges.)


Right now, for each entry, I review the judges' scores 'by eyeball' and
subjectively identify outliers.

(For example, if four scores were 62, 62, 59, and 33 (on a scale of
7-70), I would have subjectively said that the '33' is from an overly
strict, 'outlier' judge.)


I was wondering whether an SPSS boxplot could be produced for each entry
showing the 4 judges' scores, and thus use the usual boxplot rule (a
score more than 1.5 x the interquartile range beyond the quartiles) as a
statistical definition of an outlier.


Note: If a conclusion here is that 5 data points (5 judges) is a better
approach, that is a possibility. That obviously involves more effort
(more judges), but if it produces more rigor it may be worth it.


Regards,

Tom


Tom Werner
Brandon Hall Research
734-433-1299
[hidden email]


-----Original Message-----
From: Ornelas, Fermin [mailto:[hidden email]]
Sent: Thursday, April 05, 2007 12:05 PM
To: Tom Werner; [hidden email]
Subject: RE: Boxplot (seemingly) does not show outlier

Let me take another shot at this. It is not clear what you are trying to
do in your analysis. Having only 4 data points is not a very meaningful
way to conduct statistical research. In most practical statistics
classes you will be reminded that results are questionable when you have
a small sample size. None of the properties usually referred to in
regression can be verified (normality, constant variance, outliers,
independence). That is what I was referring to indirectly when I said
"why bother if you only have 4 observations".

Fermin Ornelas, Ph.D.
Management Analyst III, AZ DES
Tel: (602) 542-5639
E-mail: [hidden email]


-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
Tom Werner
Sent: Thursday, April 05, 2007 7:38 AM
To: [hidden email]
Subject: Boxplot (seemingly) does not show outlier

In SPSS (12.0 for Windows, Student version) when I attempt to produce a
boxplot of four data points (62, 61, 59, and 33), SPSS generates the
boxplot...

...but does NOT show 33 as an outlier (even though 33 would seem to be
an outlier relative to 62, 61, and 59 to the casual observer).

(I'm analyzing the scores of sets of four judges and would like to use
SPSS to produce boxplots to indicate 'outlier' judge scores.)

Even if I change the value of 33 to 13, it still does not show in a
boxplot as an outlier.

If I add a fifth data point (with a value as low as 50), 33 shows in a
boxplot as an outlier.

Can anyone explain this?

1.  Is it because of the even number of data points (four), thus
requiring that the median be a calculated value?

2.  Are five data points that much more powerful than four data points
at producing a tighter interquartile range (i.e., a tighter box in the
boxplot), and thus generating an outlier?

3.  Is this perhaps a quirk of SPSS?

Much thanks for any help!
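
The behaviour described in this post can be reproduced by hand if one assumes the box is drawn from Tukey's hinges and that a point counts as an outlier when it lies more than 1.5 box lengths beyond a hinge. The Python sketch below follows that assumption; it is not SPSS's own code, and the function names are hypothetical. With four points the lower hinge is the average of the two lowest scores, so the lowest score itself pulls the box and the fence toward it.

```python
import math


def tukey_hinges(values):
    """Lower and upper hinges (medians of the lower and upper halves)."""
    xs = sorted(values)
    n = len(xs)
    depth = (math.floor((n + 1) / 2) + 1) / 2  # 1-based hinge depth

    def at(d):
        # Average the two neighbouring order statistics when d is fractional.
        lo, hi = math.floor(d), math.ceil(d)
        return (xs[lo - 1] + xs[hi - 1]) / 2

    return at(depth), at(n + 1 - depth)


def boxplot_outliers(values, k=1.5):
    """Points lying more than k box lengths beyond either hinge."""
    lower, upper = tukey_hinges(values)
    spread = upper - lower
    lo_fence, hi_fence = lower - k * spread, upper + k * spread
    return [v for v in values if v < lo_fence or v > hi_fence]


print(tukey_hinges([62, 61, 59, 33]))          # hinges for four points
print(boxplot_outliers([62, 61, 59, 33]))      # 33 is not flagged
print(boxplot_outliers([62, 61, 59, 13]))      # nor is 13
print(boxplot_outliers([62, 61, 59, 33, 50]))  # a fifth point changes this
```

Under this rule the four-point cases give hinges of 46 and 61.5 (fence at 22.75) and 36 and 61.5 (fence at -2.25), so neither 33 nor 13 is flagged. Adding a fifth score of 50 moves the hinges to 50 and 61, the lower fence to 33.5, and 33 falls just outside it.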
