Boxplot (seemingly) does not show outlier

Tom Werner
In SPSS (12.0 for Windows, Student version) when I attempt to produce a
boxplot of four data points (62, 61, 59, and 33), SPSS generates the boxplot...

...but does NOT show 33 as an outlier (even though 33 would seem to be an
outlier relative to 62, 61, and 59 to the casual observer).

(I'm analyzing the scores of sets of four judges and would like to use SPSS
to produce boxplots to indicate 'outlier' judge scores.)

Even if I change the value of 33 to 13, it still does not show in a boxplot
as an outlier.

If I add a fifth data point (with a value as low as 50), 33 shows in a
boxplot as an outlier.

Can anyone explain this?

1.  Is it because of the even number of data points (four), thus requiring
that the median be a calculated value?

2.  Are five data points that much more powerful than four data points at
producing a tighter interquartile range (i.e., a tighter box in the
boxplot), and thus generating an outlier?

3.  Is this perhaps a quirk of SPSS?

Much thanks for any help!

Re: Boxplot (seemingly) does not show outlier

Ornelas, Fermin
What I would recommend is just a simple plot of the residual vs fitted
values or residuals against the predictors. If you have an outlier it
will show in the plot. If an error is more than two standard deviations
from zero then it may be an outlier.

A normal probability plot will also show if a residual is an outlier.


Fermin Ornelas, Ph.D.
Management Analyst III, AZ DES
Tel: (602) 542-5639
E-mail: [hidden email]



Re: Boxplot (seemingly) does not show outlier

Spousta Jan
In reply to this post by Tom Werner
Hi,

If you have only four points, both methods (boxplot or mean +- 2 sd) are
worthless because they never show outliers, regardless of the positions
of the points. (Boxplots need at least 5 points, and mean +- 2 sd needs
at least 6 points, to be able to detect a single outlier in some cases.)

And this is in fact OK: the sample of four is too small to estimate the
"normal" behavior of the population correctly - therefore we are not
able to tell the regular points from outliers.

Of course, if you have specific prior information about the
distribution (Bayesian approach), you can sometimes detect an outlier
even in a sample of one.

Regards

Jan
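
For readers who want to check the claim above, here is a minimal Python
sketch (not part of the original post). It assumes that the box edges are
Tukey's hinges and that a point is flagged only when it lies strictly
more than 1.5 box lengths beyond a hinge; the function names are
illustrative, not SPSS's. It also computes the largest z-score a single
point can reach when the n - 1 standard deviation is used.

```python
import math


def tukey_hinges(values):
    """Return (lower_hinge, upper_hinge) using Tukey's depth rule."""
    xs = sorted(values)
    n = len(xs)
    median_depth = (n + 1) / 2
    hinge_depth = (math.floor(median_depth) + 1) / 2
    lo = math.floor(hinge_depth)  # 1-based positions straddling the hinge
    hi = math.ceil(hinge_depth)
    lower = (xs[lo - 1] + xs[hi - 1]) / 2
    upper = (xs[n - lo] + xs[n - hi]) / 2
    return lower, upper


def boxplot_outliers(values, k=1.5):
    """Values lying strictly more than k box lengths beyond a hinge."""
    lower, upper = tukey_hinges(values)
    spread = upper - lower
    low_fence = lower - k * spread
    high_fence = upper + k * spread
    return [x for x in values if x < low_fence or x > high_fence]


def max_abs_z(n):
    """Largest |z| one point can reach in a sample of size n (n - 1 SD)."""
    return (n - 1) / math.sqrt(n)


print(boxplot_outliers([62, 61, 59, 33]))      # [] : lower fence is 22.75
print(boxplot_outliers([62, 61, 59, 13]))      # [] : lower fence is -2.25
print(boxplot_outliers([62, 61, 59, 50, 33]))  # [33] : lower fence is 33.5
print(max_abs_z(4), max_abs_z(5), max_abs_z(6))
```

Under these assumptions, the lower hinge of four points is the average of
the two smallest values, so the smallest value drags the hinge and the
fence down with it and can never fall below the fence. With five points
the hinges are the second and fourth values, which is why a fifth score
of 50 is enough to flag 33. The z-score bound is 1.5 for four points and
first exceeds 2 at six points.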


Re: Boxplot (seemingly) does not show outlier

Ornelas, Fermin
I should have read the e-mail more carefully. Obviously, if you have
only 4 data points, why bother to do the analysis in the first place?

Fermin Ornelas, Ph.D.
Management Analyst III, AZ DES
Tel: (602) 542-5639
E-mail: [hidden email]



Re: Boxplot (seemingly) does not show outlier

Ornelas, Fermin
In reply to this post by Tom Werner
Let me take another shot at this. It is not clear what you are trying to
do in your analysis. Having only 4 data points is not a very meaningful
way to conduct statistical research. In most practical statistics
classes you will be reminded that results are questionable when you have
a small sample size. None of the properties usually referred to in
regression can be verified (normality, constant variance, outliers,
independence). That is what I was referring to indirectly when I said
"why bother if you only have 4 observations".

Fermin Ornelas, Ph.D.
Management Analyst III, AZ DES
Tel: (602) 542-5639
E-mail: [hidden email]



Re: Boxplot (seemingly) does not show outlier

Tom Werner
Thank you very much for your reply.

Yes, it's true that 4 points is a very small sample.

Unfortunately, it is in the nature of the real-world situation.

I have an awards program in which each entry is judged by 4 judges. (Each
set of 4 judges is randomly selected from a large pool of judges.)


Right now, for each entry, I review the judges' scores 'by eyeball' and
subjectively identify outliers.

(For example, if four scores were 62, 62, 59, and 33 (on a scale of 7-70), I
would have subjectively said that the '33' is from an overly strict,
'outlier' judge.)


I was wondering whether an SPSS boxplot could be produced for each entry
showing the 4 judges' scores, and thus use the boxplot rule (more than
1.5 IQR beyond the quartiles) as a statistical definition of an outlier.


Note: If a conclusion here is that 5 data points (5 judges) is a better
approach, that is a possibility. That obviously involves more effort (more
judges), but if it produces more rigor it may be worth it.


Regards,

Tom


Tom Werner
Brandon Hall Research
734-433-1299
[hidden email]



Re: Boxplot (seemingly) does not show outlier

Ornelas, Fermin
In reply to this post by Tom Werner
Within the context of this discussion, "eyeballing" seems reasonable. To
justify what you are doing, you could calculate the mean, which is 54,
and the standard deviation, which is about 12, and on that basis say
that the score of 33 is an outlier. But if you kept previous scores, you
could build a series that would allow more robust conclusions, since you
could calculate descriptive statistics for the whole series of scores.

There is another technicality here: outliers are usually referred to in
the context of model estimation such as regression, ANOVA, etc. That is
why one usually plots residuals versus the fitted response.

For 4 data points, using the software seems to be overkill.

Fermin Ornelas, Ph.D.
Management Analyst III, AZ DES
Tel: (602) 542-5639
E-mail: [hidden email]
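
The figures quoted in this reply can be checked with a few lines of
Python (a sketch added for illustration, not part of the original post).
It uses the scores 62, 62, 59 and 33 from the example earlier in the
thread; the helper name `z_scores` is hypothetical.

```python
import statistics


def z_scores(scores, sample=True):
    """Standardize scores with the n - 1 (sample) or n (population) SD."""
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores) if sample else statistics.pstdev(scores)
    return [(x - mean) / sd for x in scores]


scores = [62, 62, 59, 33]
print(statistics.mean(scores))                       # 54
print(round(statistics.pstdev(scores), 2))           # 12.19 (divide by n)
print(round(statistics.stdev(scores), 2))            # 14.07 (divide by n - 1)
print(round(z_scores(scores)[-1], 2))                # -1.49
print(round(z_scores(scores, sample=False)[-1], 2))  # -1.72
```

The "about 12" corresponds to the population formula; the n - 1 formula
gives about 14. Either way the score of 33 sits less than two standard
deviations below the mean, so a strict mean +- 2 sd rule would not flag
it in a set of four scores.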



Re: Boxplot (seemingly) does not show outlier

Spousta Jan
In reply to this post by Tom Werner
Tom,

I think that you should first try to define what you think an outlier
is, because the standard definitions are inappropriate in your case, as
I wrote earlier. Once you have the definition, we can probably implement
it in SPSS.

For example, you can define "Outlier = case with deleted residual in the
null linear model above 15 in absolute value". Then we can compute the
residuals and mark the outliers:

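* Example scores from four judges (id, score).
data list free /id judge .
begin data.
1 100
2 62
3 59
4 33
end data.
* Width 3 so that the score 100 is displayed in full.
formats all (f3).

* Constant factor, so UNIANOVA fits an intercept-only (null) model.
compute myconst = 1.
UNIANOVA
  judge  BY myconst
  /METHOD = SSTYPE(3)
  /INTERCEPT = INCLUDE
  /SAVE = DRESID (dresid)
  /CRITERIA = ALPHA(.05)
  /DESIGN = myconst .
* Flag cases whose deleted residual exceeds 15 in absolute value.
compute outlier = abs(dresid) > 15.
value labels outlier 1 "Outlier" 0 "Regular case".
execute.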

(The disadvantage: All judges can be outliers in some scenarios, if the
spread of their judgements is huge enough.)

HTH

Jan




Re: Boxplot (seemingly) does not show outlier

Ken Belzer
Jan,

I've been following this thread with some interest, as I am working with a small data set myself, with many non-normally distributed variables and multiple outliers. I understand and typically use the boxplot approach for detecting outliers, or compute z-scores and treat those beyond +/- 2.0 as suspect. However, I am unfamiliar with the approach you describe below. Specifically, what does "Outlier = case with deleted residual in the null linear model above 15 in absolute value" mean? Could you kindly clarify this, as I would like to understand this approach a little better?

Thanks very much in advance.
Ken


-----Original Message-----
From: [hidden email]
To: [hidden email]
Sent: Fri, 6 Apr 2007 4:29 AM
Subject: Re: Boxplot (seemingly) does not show outlier


Tom,

I think that you should first try to define what do you think is an
outlier, because the standard definitions are inappropriate in your
case, as I wrote earlier. After you have the definition, we can probably
implement it in SPSS.

For example you can define "Outlier = case with deleted residual in the
null linear model above 15 in absolute value". Then we can compute the
residuals and mark the outliers:

data list free /id judge .
begin data.
1 100
2 62
3 59
4 33
end data.
form all (f2).

compute myconst = 1.
UNIANOVA
  judge  BY myconst
  /METHOD = SSTYPE(3)
  /INTERCEPT = INCLUDE
  /SAVE = DRESID (dresid)
  /CRITERIA = ALPHA(.05)
  /DESIGN = myconst .
compute outlier = abs(dresid) > 15.
val lab outlier 1 "Outlier" 0 "Regular case".
exe.

(The disadvantage: All judges can be outliers in some scenarios, if the
spread of their judgements is huge enough.)

HTH

Jan



-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
Tom Werner
Sent: Thursday, April 05, 2007 7:23 PM
To: [hidden email]
Subject: Re: Boxplot (seemingly) does not show outlier

Thank you very much for your reply.

Yes, it's true that 4 points is a very small sample.

Unfortunately, it is in the nature of the real-world situation.

I have an awards program in which each entry is judged by 4 judges.
(Each set of 4 judges is randomly selected from a large pool of judges.)


Right now, for each entry, I review the judges' scores 'by eyeball' and
subjectively identify outliers.

(For example, if four scores were 62, 62, 59, and 33 (on a scale of
7-70), I would have subjectively said that the '33' is from an overly
strict, 'outlier' judge.)


I was wondering whether an SPSS boxplot could be produced for each entry
showing the 4 judges' scores, and thus use the InterQuartile Range + 1.5
IQR as a statistical definition of an outlier.


Note: If a conclusion here is that 5 data points (5 judges) is a better
approach, that is a possibility. That obviously involves more effort
(more judges), but if it produces more rigor it may be worth it.


Regards,

Tom


Tom Werner
Brandon Hall Research
734-433-1299
[hidden email]


-----Original Message-----
From: Ornelas, Fermin [mailto:[hidden email]]
Sent: Thursday, April 05, 2007 12:05 PM
To: Tom Werner; [hidden email]
Subject: RE: Boxplot (seemingly) does not show outlier

Let me take another shot at this. It is not clear what you are trying to
do in your analysis. Having only 4 data points is not a very meaningful
way to conduct statistical research. In most practical statistical
classes you will be reminded of questionable results when you have a
small sample size. None of the properties usually referred in regression
can be verified (normality, constant variance, outliers, independence).
That is what I was referring indirectly when I said "why bother if you
only have 4 observations".

Fermin Ornelas, Ph.D.
Management Analyst III, AZ DES
Tel: (602) 542-5639
E-mail: [hidden email]



Re: Boxplot (seemingly) does not show outlier

Spousta Jan
In reply to this post by Tom Werner
Ken,

It is a complicated way of saying that the suspect case lies more than
15 points from the mean of the other cases. There is no deep science
behind it; it was only an example of what a definition of outliers can
look like.
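
As a rough illustration of that reading (a sketch only, not the SPSS computation itself), the deleted residual in an intercept-only model is each score minus the mean of the remaining scores. The function name `deleted_residuals` is made up for this example; the data and the threshold of 15 are the ones used in the syntax quoted further down this thread.

```python
def deleted_residuals(scores):
    """Return each score minus the mean of the other scores (leave-one-out)."""
    total = sum(scores)
    n = len(scores)
    return [x - (total - x) / (n - 1) for x in scores]


# Example data from the SPSS syntax quoted later in this thread.
scores = [100, 62, 59, 33]
residuals = deleted_residuals(scores)

# Apply the example rule: outlier if the deleted residual exceeds 15 in absolute value.
flags = [abs(r) > 15 for r in residuals]

for score, resid, flag in zip(scores, residuals, flags):
    print(f"{score:>4}  deleted residual = {resid:7.2f}  outlier = {flag}")
```

With these four scores both 100 and 33 are flagged, which shows how strongly the result depends on the chosen threshold rather than on the data alone.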

From a broader perspective: If you have only 4 cases, priors (=your
knowledge/expectations about the behavior of judges) are of paramount
importance.

* Either you have no specific idea about it (= flat, non-informative
priors). Then it is impossible to estimate the distribution parameters
from the data with reasonable precision. From the classical point of
view, you are forced to accept all judgements as inliers.

* Or you have more specific expectations. Then you should formulate them
and derive a rule or definition of outliers (like mine, which says that
good judges agree within a 15-point interval) or compute the posterior
probabilities directly using the Bayesian framework.

Regards,

Jan

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
Ken Belzer
Sent: Friday, April 06, 2007 4:01 PM
To: [hidden email]
Subject: Re: Boxplot (seemingly) does not show outlier

Jan,

I've been following this thread with some interest as I am working with
a small data set myself with many non-normally distributed variables and
multiple outliers. I understand and typically use the boxplot approach
for detecting outliers, or compute z-scores and consider those with z
beyond +/- 2.0 as suspect. However, I am unfamiliar with the approach
you describe below. Specifically, what does "Outlier = case with deleted
residual in the null linear model above 15 in absolute value" mean?
Could you kindly clarify this, as I would like to understand this
approach a little better.
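
A side note on the z-score rule for a sample as small as the four judge scores in this thread: when z is computed with the sample standard deviation (n - 1 in the denominator), no value can have |z| above (n - 1) / sqrt(n), which is 1.5 for n = 4. A cutoff of 2.0 can therefore never flag anything in a set of four. The sketch below checks this on the scores from the original post; the function name `z_scores` is only illustrative.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardize values using the sample mean and sample standard deviation."""
    m = mean(values)
    s = stdev(values)  # sample SD, n - 1 in the denominator
    return [(x - m) / s for x in values]


scores = [62, 61, 59, 33]  # the four scores from the original post
z = z_scores(scores)

# Largest |z| attainable in a sample of size n when the sample SD is used.
bound = (len(scores) - 1) / len(scores) ** 0.5

for score, value in zip(scores, z):
    print(f"{score:>4}  z = {value:6.3f}")
print(f"largest possible |z| for n = {len(scores)}: {bound}")
```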

Thanks very much in advance.
Ken


-----Original Message-----
From: [hidden email]
To: [hidden email]
Sent: Fri, 6 Apr 2007 4:29 AM
Subject: Re: Boxplot (seemingly) does not show outlier


Tom,

I think that you should first define what you consider to be an outlier,
because the standard definitions are inappropriate in your case, as I
wrote earlier. Once you have the definition, we can probably implement
it in SPSS.

For example you can define "Outlier = case with deleted residual in the
null linear model above 15 in absolute value". Then we can compute the
residuals and mark the outliers:

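* Read one score per judge (free format).
data list free /id judge .
begin data.
1 100
2 62
3 59
4 33
end data.
formats all (f2).

* A constant factor makes UNIANOVA fit the null (intercept-only) model.
compute myconst = 1.
UNIANOVA
  judge  BY myconst
  /METHOD = SSTYPE(3)
  /INTERCEPT = INCLUDE
  /SAVE = DRESID (dresid)
  /CRITERIA = ALPHA(.05)
  /DESIGN = myconst .
* Flag cases whose deleted residual exceeds 15 in absolute value.
compute outlier = abs(dresid) > 15.
value labels outlier 1 "Outlier" 0 "Regular case".
execute.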

(The disadvantage: All judges can be outliers in some scenarios, if the
spread of their judgements is huge enough.)

HTH

Jan



-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of
Tom Werner
Sent: Thursday, April 05, 2007 7:23 PM
To: [hidden email]
Subject: Re: Boxplot (seemingly) does not show outlier

Thank you very much for your reply.

Yes, it's true that 4 points is a very small sample.

Unfortunately, it is in the nature of the real-world situation.

I have an awards program in which each entry is judged by 4 judges.
(Each set of 4 judges is randomly selected from a large pool of judges.)


Right now, for each entry, I review the judges' scores 'by eyeball' and
subjectively identify outliers.

(For example, if four scores were 62, 62, 59, and 33 (on a scale of
7-70), I would have subjectively said that the '33' is from an overly
strict, 'outlier' judge.)


I was wondering whether an SPSS boxplot could be produced for each entry
showing the 4 judges' scores, and thus use the boxplot rule (a value
more than 1.5 IQR beyond the edge of the box) as a statistical
definition of an outlier.


Note: If a conclusion here is that 5 data points (5 judges) is a better
approach, that is a possibility. That obviously involves more effort
(more judges), but if it produces more rigor it may be worth it.
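
The difference between 4 and 5 scores can be checked by hand. The sketch below assumes the box edges are Tukey's hinges (which, as far as I know, is what SPSS boxplots use) and that an outlier is any value more than 1.5 box lengths beyond a hinge. The function names are made up for the illustration. Under these assumptions a set of four values can never produce an outlier: the lower hinge is the midpoint of the two smallest values, and the box is always at least as long as the gap between them, so the lowest value cannot fall below the fence.

```python
from statistics import median


def tukey_hinges(values):
    """Lower and upper hinge: medians of the lower and upper halves,
    with the overall median included in both halves when n is odd."""
    xs = sorted(values)
    n = len(xs)
    half = (n + 1) // 2
    return median(xs[:half]), median(xs[n - half:])


def boxplot_outliers(values, k=1.5):
    """Values lying more than k box lengths beyond either hinge."""
    lower, upper = tukey_hinges(values)
    box = upper - lower
    return [x for x in values if x < lower - k * box or x > upper + k * box]


print(boxplot_outliers([62, 61, 59, 33]))      # four judges
print(boxplot_outliers([62, 61, 59, 13]))      # 33 replaced by 13
print(boxplot_outliers([62, 61, 59, 50, 33]))  # fifth score of 50 added
```

With four scores the hinges are 46 and 61.5, so the lower fence sits at 22.75 and 33 stays inside it. With the fifth score of 50 the hinges become 50 and 61, the lower fence moves up to 33.5, and 33 falls just outside.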


Regards,

Tom


Tom Werner
Brandon Hall Research
734-433-1299
[hidden email]

