In SPSS (12.0 for Windows, Student version), when I attempt to produce a boxplot of four data points (62, 61, 59, and 33), SPSS generates the boxplot...

...but does NOT show 33 as an outlier (even though 33 would seem to be an outlier relative to 62, 61, and 59 to the casual observer).

(I'm analyzing the scores of sets of four judges and would like to use SPSS to produce boxplots to indicate 'outlier' judge scores.)

Even if I change the value of 33 to 13, it still does not show in a boxplot as an outlier. If I add a fifth data point (with a value as low as 50), 33 shows in a boxplot as an outlier.

Can anyone explain this?

1. Is it because of the even number of data points (four), thus requiring that the median be a calculated value?
2. Are five data points that much more powerful than four data points at producing a tighter interquartile range (i.e., a tighter box in the boxplot), and thus generating an outlier?
3. Is this perhaps a quirk of SPSS?

Much thanks for any help! |
What I would recommend is just a simple plot of the residuals vs. fitted values, or of the residuals against the predictors. If you have an outlier, it will show in the plot. If a residual is more than two standard deviations from zero, then it may be an outlier. A normal probability plot will also show whether a residual is an outlier.

Fermin Ornelas, Ph.D.
Management Analyst III, AZ DES
Tel: (602) 542-5639
E-mail: [hidden email]

NOTICE: This e-mail (and any attachments) may contain PRIVILEGED OR CONFIDENTIAL information and is intended only for the use of the specific individual(s) to whom it is addressed. It may contain information that is privileged and confidential under state and federal law. This information may be used or disclosed only in accordance with law, and you may be subject to penalties under law for improper use or further disclosure of the information in this e-mail and its attachments. If you have received this e-mail in error, please immediately notify the person named above by reply e-mail, and then delete the original e-mail. Thank you.

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Tom Werner
Sent: Thursday, April 05, 2007 7:38 AM
To: [hidden email]
Subject: Boxplot (seemingly) does not show outlier |
In reply to this post by Tom Werner
Hi,
If you have only four points, both methods (boxplot or mean +- 2 sd) are worthless, because they never show outliers regardless of the positions of the points. (Boxplots need at least 5 points, and mean +- 2 sd needs at least 6 points, to be able to detect a single outlier in some cases.)

And this is in fact OK: a sample of four is too small to estimate the "normal" behavior of the population correctly, so we are not able to tell the regular points from outliers. Of course, if you have specific prior information about the distribution (Bayesian approach), you can sometimes detect an outlier even in a sample of one.

Regards
Jan

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Ornelas, Fermin
Sent: Thursday, April 05, 2007 5:15 PM
To: [hidden email]
Subject: Re: Boxplot (seemingly) does not show outlier |
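As a rough check of the point above about four versus five data points, here is a minimal sketch in Python (used instead of SPSS syntax so that it can be run on its own). It assumes that the box is drawn at Tukey's hinges and that a point is flagged when it lies more than 1.5 box lengths beyond the box; the exact quartile method in a given SPSS version may differ. The function names `tukey_hinges` and `boxplot_outliers` are made up for this illustration.

```python
from statistics import median

def tukey_hinges(values):
    """Lower and upper hinge: medians of the lower and upper halves.

    When n is odd, the overall median is counted in both halves.
    """
    xs = sorted(values)
    half = (len(xs) + 1) // 2
    return median(xs[:half]), median(xs[-half:])

def boxplot_outliers(values, k=1.5):
    """Values lying more than k box lengths beyond either hinge."""
    lower, upper = tukey_hinges(values)
    spread = upper - lower
    low_fence = lower - k * spread
    high_fence = upper + k * spread
    return [x for x in values if x < low_fence or x > high_fence]

print(tukey_hinges([62, 61, 59, 33]))          # (46.0, 61.5)
print(boxplot_outliers([62, 61, 59, 33]))      # []
print(boxplot_outliers([62, 61, 59, 13]))      # []
print(boxplot_outliers([62, 61, 59, 50, 33]))  # [33]
```

Under these assumptions the lower hinge of four points is the average of the two smallest values, so pulling the lowest score down also drags the box and the fence down with it. With five points the lower hinge is the second-smallest value (50 here), the fence sits at 33.5, and 33 falls just outside it.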
I should have read the e-mail more carefully. Obviously, if you have only 4 data points, why bother to do the analysis in the first place?

Fermin Ornelas, Ph.D.
Management Analyst III, AZ DES
Tel: (602) 542-5639
E-mail: [hidden email]

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Spousta Jan
Sent: Thursday, April 05, 2007 8:36 AM
To: [hidden email]
Subject: Re: Boxplot (seemingly) does not show outlier |
In reply to this post by Tom Werner
Let me take another shot at this. It is not clear what you are trying to do in your analysis. Having only 4 data points is not a very meaningful way to conduct statistical research. In most practical statistics classes you will be reminded that results are questionable when the sample size is small. None of the properties usually referred to in regression (normality, constant variance, outliers, independence) can be verified. That is what I was indirectly referring to when I said "why bother if you only have 4 observations".

Fermin Ornelas, Ph.D.
Management Analyst III, AZ DES
Tel: (602) 542-5639
E-mail: [hidden email] |
Thank you very much for your reply.
Yes, it's true that 4 points is a very small sample. Unfortunately, it is in the nature of the real-world situation.

I have an awards program in which each entry is judged by 4 judges. (Each set of 4 judges is randomly selected from a large pool of judges.)

Right now, for each entry, I review the judges' scores 'by eyeball' and subjectively identify outliers. (For example, if four scores were 62, 62, 59, and 33 (on a scale of 7-70), I would have subjectively said that the '33' is from an overly strict, 'outlier' judge.)

I was wondering whether an SPSS boxplot could be produced for each entry showing the 4 judges' scores, and thus use the interquartile range rule (a score more than 1.5 x IQR beyond the quartiles) as a statistical definition of an outlier.

Note: If a conclusion here is that 5 data points (5 judges) is a better approach, that is a possibility. That obviously involves more effort (more judges), but if it produces more rigor it may be worth it.

Regards,
Tom

Tom Werner
Brandon Hall Research
734-433-1299
[hidden email]

-----Original Message-----
From: Ornelas, Fermin [mailto:[hidden email]]
Sent: Thursday, April 05, 2007 12:05 PM
To: Tom Werner; [hidden email]
Subject: RE: Boxplot (seemingly) does not show outlier |
In reply to this post by Tom Werner
It seems that "eyeballing" within the context of the discussion is reasonable. To justify what you are doing, you could calculate the mean, which is 54, and the standard deviation, which is about 12 (about 14 with the n - 1 formula), so you could argue that the score of 33 is an outlier, although at roughly 1.5 to 1.7 standard deviations below the mean it falls short of the two-standard-deviation mark.

But if you kept previous scores, you could build a series that would allow more robust conclusions, since you could calculate descriptive statistics for the whole series of scores.

There is another technicality here: outliers are usually referred to in the context of model estimation, such as regression, ANOVA, etc. That is why one usually plots residuals versus the fitted response. For 4 data points, using the software seems like overkill.

Fermin Ornelas, Ph.D.
Management Analyst III, AZ DES
Tel: (602) 542-5639
E-mail: [hidden email]

-----Original Message-----
From: Tom Werner [mailto:[hidden email]]
Sent: Thursday, April 05, 2007 10:23 AM
To: Ornelas, Fermin; [hidden email]; [hidden email]
Subject: RE: Boxplot (seemingly) does not show outlier |
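The arithmetic in the post above can be checked with a short Python sketch (again Python rather than SPSS syntax, purely so it is self-contained). It uses the example scores 62, 62, 59, 33 from the thread; the helper names `z_scores` and `max_abs_z` are hypothetical. The bound in `max_abs_z` is the standard result that, with the n - 1 standard deviation, no single point among n can lie more than (n - 1)/sqrt(n) standard deviations from the mean.

```python
from math import sqrt
from statistics import mean, pstdev, stdev

scores = [62, 62, 59, 33]  # example scores quoted in the thread

def z_scores(values, sample=True):
    """z-score of each value, using the sample (n - 1) or population (n) SD."""
    m = mean(values)
    sd = stdev(values) if sample else pstdev(values)
    return [(x - m) / sd for x in values]

def max_abs_z(n):
    """Largest |z| a single point can reach among n points (sample SD)."""
    return (n - 1) / sqrt(n)

print(mean(scores))                                    # 54
print(round(pstdev(scores), 2))                        # 12.19
print(round(stdev(scores), 2))                         # 14.07
print(round(z_scores(scores)[-1], 2))                  # -1.49
print(round(z_scores(scores, sample=False)[-1], 2))    # -1.72
print(max_abs_z(4), max_abs_z(5) < 2 < max_abs_z(6))   # 1.5 True
```

So the "about 12" figure corresponds to the population formula, and with either formula the score of 33 stays inside mean +- 2 sd. The last line is consistent with the earlier remark that the 2 sd rule needs at least six points before it can flag anything.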
In reply to this post by Tom Werner
Tom,
I think that you should first try to define what you think an outlier is, because the standard definitions are inappropriate in your case, as I wrote earlier. After you have the definition, we can probably implement it in SPSS.

For example, you can define "Outlier = case with deleted residual in the null linear model above 15 in absolute value". Then we can compute the residuals and mark the outliers:

    data list free /id judge .
    begin data.
    1 100
    2 62
    3 59
    4 33
    end data.
    form all (f2).
    compute myconst = 1.
    UNIANOVA judge BY myconst
      /METHOD = SSTYPE(3)
      /INTERCEPT = INCLUDE
      /SAVE = DRESID (dresid)
      /CRITERIA = ALPHA(.05)
      /DESIGN = myconst .
    compute outlier = abs(dresid) > 15.
    val lab outlier 1 "Outlier" 0 "Regular case".
    exe.

(The disadvantage: all judges can be outliers in some scenarios, if the spread of their judgements is large enough.)

HTH
Jan

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Tom Werner
Sent: Thursday, April 05, 2007 7:23 PM
To: [hidden email]
Subject: Re: Boxplot (seemingly) does not show outlier |
Jan,
I've been following this thread with some interest, as I am working with a small data set myself, with many non-normally distributed variables and multiple outliers. I understand and typically use the boxplot approach for detecting outliers, or compute z-scores and consider those with z beyond +/- 2.0 as suspect. However, I am unfamiliar with the approach you describe below. Specifically, what does "Outlier = case with deleted residual in the null linear model above 15 in absolute value" mean? Could you kindly clarify this, as I would like to understand this approach a little better.

Thanks very much in advance.
Ken

-----Original Message-----
From: [hidden email]
To: [hidden email]
Sent: Fri, 6 Apr 2007 4:29 AM
Subject: Re: Boxplot (seemingly) does not show outlier |
In reply to this post by Tom Werner
Ken,
It is a complicated way how to say that the suspect case is more than 15 points from the mean of other cases. There is no big science behind it, it was only an example how can a definition of outliers look like. From a broader perspective: If you have only 4 cases, priors (=your knowledge/expectations about the behavior of judges) are of paramount importance. * Either you have no specific idea about it (= flat, non-informative priors). Then it is impossible to estimate the distribution parameters from the data with a reasonable degree of exactness. From the classical point of view, you are forced to accept all judgements as inliers. * Or you have more specific expectations. Then you should formulate them and derive a rule or definition of outliers (like the mine, which says that good judges agree within a 15 points interval) or compute the posterior probabilities directly using the Bayesian framework. Regards, Jan -----Original Message----- From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Ken Belzer Sent: Friday, April 06, 2007 4:01 PM To: [hidden email] Subject: Re: Boxplot (seemingly) does not show outlier Jan, I've been following this thread with some interest as I am working with a small data set myself with many non-normally distributed variables and multiple outliers. I understand and typically use the boxplot approach for detecting outliers, or compute z-scores and consider those with z = +/- 2.0 as suspect. However, I am unfamiliar with the approach you descibe below. Specifically, what does "Outlier = case with deleted residual in the null linear model above 15 in absolute value" mean? Could you kindly clarify this, as I would like to understand this approach a little better. Thanks very much in advance. 
Ken

-----Original Message-----
From: [hidden email]
To: [hidden email]
Sent: Fri, 6 Apr 2007 4:29 AM
Subject: Re: Boxplot (seemingly) does not show outlier

Tom,

I think that you should first try to define what you think is an outlier, because the standard definitions are inappropriate in your case, as I wrote earlier. After you have the definition, we can probably implement it in SPSS. For example, you can define "Outlier = case with deleted residual in the null linear model above 15 in absolute value". Then we can compute the residuals and mark the outliers:

* One case per judge: id, then score.
data list free /id judge .
begin data.
1 100
2 62
3 59
4 33
end data.
form all (f2).
* Constant factor, so the model fits only the overall mean.
compute myconst = 1.
UNIANOVA judge BY myconst
  /METHOD = SSTYPE(3)
  /INTERCEPT = INCLUDE
  /SAVE = DRESID (dresid)
  /CRITERIA = ALPHA(.05)
  /DESIGN = myconst .
* Flag cases whose deleted residual exceeds 15 in absolute value.
compute outlier = abs(dresid) > 15.
val lab outlier 1 "Outlier" 0 "Regular case".
exe.

(The disadvantage: all judges can be outliers in some scenarios, if the spread of their judgements is large enough.)

HTH
Jan

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Tom Werner
Sent: Thursday, April 05, 2007 7:23 PM
To: [hidden email]
Subject: Re: Boxplot (seemingly) does not show outlier

Thank you very much for your reply. Yes, it's true that 4 points is a very small sample. Unfortunately, it is in the nature of the real-world situation. I have an awards program in which each entry is judged by 4 judges. (Each set of 4 judges is randomly selected from a large pool of judges.) Right now, for each entry, I review the judges' scores 'by eyeball' and subjectively identify outliers. (For example, if four scores were 62, 62, 59, and 33 (on a scale of 7-70), I would have subjectively said that the '33' is from an overly strict, 'outlier' judge.) I was wondering whether an SPSS boxplot could be produced for each entry showing the 4 judges' scores, and thus use the usual boxplot rule (a score more than 1.5 times the interquartile range beyond the box) as a statistical definition of an outlier.
Note: If a conclusion here is that 5 data points (5 judges) is a better approach, that is a possibility. That obviously involves more effort (more judges), but if it produces more rigor it may be worth it.

Regards,
Tom

Tom Werner
Brandon Hall Research
734-433-1299
[hidden email]

-----Original Message-----
From: Ornelas, Fermin [mailto:[hidden email]]
Sent: Thursday, April 05, 2007 12:05 PM
To: Tom Werner; [hidden email]
Subject: RE: Boxplot (seemingly) does not show outlier

Let me take another shot at this. It is not clear what you are trying to do in your analysis. Having only 4 data points is not a very meaningful way to conduct statistical research. In most practical statistics classes you will be reminded of questionable results when you have a small sample size. None of the properties usually referred to in regression can be verified (normality, constant variance, outliers, independence). That is what I was referring to indirectly when I said "why bother if you only have 4 observations".

Fermin Ornelas, Ph.D.
Management Analyst III, AZ DES
Tel: (602) 542-5639
E-mail: [hidden email] |
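Jan's rule from earlier in the thread ("deleted residual in the null linear model above 15 in absolute value") reduces, for a model that fits only the overall mean, to comparing each score with the mean of the other scores. The following is a minimal Python sketch of that arithmetic, offered as a cross-check rather than a replacement for the SPSS syntax. The function names and the default threshold argument are made up for illustration; the threshold of 15 is Jan's example value.

```python
def deleted_residuals(scores):
    """Each score minus the mean of the remaining scores (leave-one-out)."""
    total = sum(scores)
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two scores")
    return [s - (total - s) / (n - 1) for s in scores]


def flag_outliers(scores, threshold=15):
    """True where the deleted residual exceeds the threshold in absolute value."""
    return [abs(r) > threshold for r in deleted_residuals(scores)]


if __name__ == "__main__":
    for scores in ([100, 62, 59, 33], [62, 61, 59, 33]):
        residuals = [round(r, 2) for r in deleted_residuals(scores)]
        print(scores, residuals, flag_outliers(scores))
```

For the data in Jan's syntax (100, 62, 59, 33) this flags both 100 and 33, which illustrates the caveat that more than one judge can be marked at once. For the scores in the original question (62, 61, 59, 33), only 33 is flagged: it lies about 27.7 points below the mean of the other three.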