|
Dear List members,
I have been running experiments and analyzing data for two years but still have some basic questions (my field is vision and cognitive psychology):

1) A general question: When do you exclude outliers, and what standards do you use? I understand that excluding outliers normally won't change the mean much but will make the SD smaller. Still, there are cases where we exclude outliers for reasons other than making the SD look better.

2) A specific one: Suppose we have two conditions, A and B, in the study. When excluding outliers by a 3 SD criterion, do you apply it within each condition (i.e., data in A and B are treated separately) or to the whole data set (i.e., A and B combined)?

Any ideas? Many thanks!

Cheers,
Zhicheng

--
******************************************
Zhicheng Lin
Department of Psychology
University of Minnesota
75 East River Rd, Elliott Hall
Minneapolis, MN 55455
Email: [hidden email]
Phone: 612-625-2470
Fax: 612-626-2079
http://zhichenglin.googlepages.com/
******************************************

=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command.
To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD |
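To make question 2 concrete, here is a minimal sketch of how the two choices can disagree. The reaction times, the condition names `cond_a` and `cond_b`, and the helper `outside_k_sd` are all hypothetical, made up purely for illustration; nothing here comes from the poster's data.

```python
from statistics import mean, stdev

def outside_k_sd(values, k=3.0):
    """Return the values lying more than k sample SDs from the mean of `values`."""
    m, s = mean(values), stdev(values)
    return [v for v in values if abs(v - m) > k * s]

# Hypothetical reaction times (ms). Condition B is slower overall than A,
# and each condition contains one value that is extreme for that condition.
cond_a = [400 + 5 * i for i in range(20)] + [700]
cond_b = [600 + 5 * i for i in range(20)] + [900]

# Choice 1: apply the 3 SD rule separately within each condition.
flagged_within = outside_k_sd(cond_a) + outside_k_sd(cond_b)

# Choice 2: apply the 3 SD rule once to the combined data.
flagged_pooled = outside_k_sd(cond_a + cond_b)

print("flagged within conditions:", flagged_within)
print("flagged on pooled data:   ", flagged_pooled)
```

In this made-up example the pooled SD also absorbs the difference between the condition means, so the pooled cutoff is wider and neither extreme value is flagged, while the within-condition rule flags both. Whether either rule should be applied at all is a separate question.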
|
At 09:29 PM 3/2/2008, Zhicheng Lin wrote:

>1) A general question: When do you exclude outliers and what
>standards do you use? I understand that excluding outliers normally
>won't change much about Mean but will make SD smaller. But still
>there are cases we do exclude outliers not just to make SD look better.

There's been a lot of correspondence about 'outliers', and excluding them, on this list, over the years. The general, short answer is: don't do it, except for 'outliers' that can confidently be identified as erroneous data. Otherwise, you're distorting your data and your analysis.

>I understand that excluding outliers normally won't change much
>about Mean but will make SD smaller.

It won't change Mean *only* if the rarer, larger values have the same mean as the more common values. There's not the least reason this need be so. (And remember, the large values have a very heavy weight in computing the mean.)

As for reducing the SD, that's cheating. The SD really is what it is. What if you threw out everything more than 1 SD from the mean? Your SD would look really good, but your data would be nowhere near as precise as that would look.

>Any ideas? Many thanks!

Well, there's a harsh one ...

-Onward, and best wishes,
 Richard |
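The two points above (trimming moves the mean when the large values are not balanced by small ones, and trimming at 1 SD makes the SD look far better than the data warrant) can be checked on a small example. This is only a sketch on a hypothetical right-skewed sample; `trim_k_sd` is an illustrative helper, not anything from SPSS.

```python
from statistics import mean, stdev

def trim_k_sd(values, k):
    """Keep only the values within k sample SDs of the mean of `values`."""
    m, s = mean(values), stdev(values)
    return [v for v in values if abs(v - m) <= k * s]

# Hypothetical right-skewed sample: many small values, a few large ones.
data = [1] * 10 + [2] * 8 + [3] * 5 + [4] * 3 + [6] * 2 + [10, 15]

# Throw out everything more than 1 SD from the mean, as in the thought experiment.
trimmed = trim_k_sd(data, k=1.0)

print(f"all data: n={len(data)}, mean={mean(data):.2f}, sd={stdev(data):.2f}")
print(f"trimmed:  n={len(trimmed)}, mean={mean(trimmed):.2f}, sd={stdev(trimmed):.2f}")
```

Only the two largest values are dropped here, yet both the mean and the SD fall noticeably, because the discarded values all sit on one side of the distribution.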
|
Well, in some areas of specialization (e.g., testing) it is common practice to eliminate outliers. Think about a criterion-related validation study. You are asking incumbents to complete some predictor and their supervisors to make performance ratings on those incumbents. Neither is very excited to spend a couple hours on something they are told has to be done.
As a result, we have many incumbents who don't try very hard. To include those incumbents in the final analysis would distort the picture of how well the predictor does at selecting candidates for the job in question. There are several ways that we look for these folks, but one common standard is to eliminate any data point that lies more than 3.29 SDs above or below the mean. Of course, there are many arguments against this practice; however, if we are being practical rather than academic, I think it is a responsible, defensible choice.

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Richard Ristow
Sent: Tuesday, March 11, 2008 8:48 PM
To: [hidden email]
Subject: Re: outliers detection and exclusion |
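A minimal sketch of the 3.29 SD standard described above, assuming scores are converted to z-scores using the sample mean and SD. The scores and the helper `flag_by_z` are hypothetical. The first lines show where 3.29 plausibly comes from: under a normal distribution, about 0.1% of cases fall beyond that distance in the two tails combined.

```python
from statistics import NormalDist, mean, stdev

CUTOFF = 3.29

# Two-tailed probability of |z| > 3.29 under a standard normal distribution.
two_tailed_p = 2 * (1 - NormalDist().cdf(CUTOFF))
print(f"P(|z| > {CUTOFF}) = {two_tailed_p:.4f}")

def flag_by_z(values, cutoff=CUTOFF):
    """Return (index, value, z) for every value with |z| above the cutoff."""
    m, s = mean(values), stdev(values)
    return [(i, v, (v - m) / s)
            for i, v in enumerate(values)
            if abs((v - m) / s) > cutoff]

# Hypothetical predictor scores: 40 ordinary scores plus one very low score,
# such as might come from a respondent who did not try.
scores = [50 + (i % 7) for i in range(40)] + [5]

flagged = flag_by_z(scores)
for i, v, z in flagged:
    print(f"case {i}: score={v}, z={z:.2f}")
```

The rule only identifies candidates. Whether a flagged case really reflects low effort rather than a genuine low score is a judgment the cutoff cannot make.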
|
At 09:05 AM 3/12/2008, Katkowski, David wrote:

>Well, in some areas of specialization (e.g., testing) it is common
>practice to eliminate outliers. [For example,] you are asking
>incumbents to complete some predictor and their supervisors to make
>performance ratings on those incumbents. Neither is very excited to
>spend a couple hours on something they are told has to be done.
>
>As a result, we have many incumbents who don't try very hard. To
>include those incumbents in the final analysis would distort the
>picture of how well the predictor does at selecting candidates.

An important point. This is the case where "you may have two processes, one of which operates occasionally to produce the [outlier] values, the other of which operates 'normally' but is swamped when the larger process happens."(*) In your case, the 'larger process' is the respondents' decision to blow off the questionnaire. ('Large' means a large effect relative to what's usually seen. It can include values much nearer zero than the usual ones.)

In this case, there is indeed reason to exclude 'outliers', by some reasonable heuristic if you can't observe the "larger process."

It stands, though: In the absence of a specific argument that your 'outliers' are from a different population than the 'normal' cases, it's not good methodology to drop them.

................
(*) I'm quoting myself:
Date: Sun, 20 Aug 2006 16:30:50 -0400
From: Richard Ristow <[hidden email]>
Subject: Re: outliers??
To: [hidden email] |
|
And should outliers be dropped, it is good practice to compare results with and without outliers in the model to demonstrate the effect the outliers have on the results.

Paul R. Swank, Ph.D.
Professor and Director of Research
Children's Learning Institute
University of Texas Health Science Center - Houston

-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Richard Ristow
Sent: Wednesday, March 12, 2008 3:13 PM
To: [hidden email]
Subject: Re: outliers detection and exclusion |
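The with-and-without comparison suggested above might look like the following sketch for a validation-style analysis, where the result of interest is the correlation between a predictor and a performance rating. The data, the `excluded` set, and the `pearson_r` helper are hypothetical; in practice the excluded cases would come from whatever documented rule was used to identify them.

```python
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical predictor scores and supervisor ratings for 15 incumbents.
# The last case has a very low predictor score but a high rating.
predictor = [10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 2]
rating = [3, 3, 4, 4, 5, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9]

# Indices of the cases identified as outliers (assumed given here).
excluded = {14}
keep = [i for i in range(len(predictor)) if i not in excluded]

r_all = pearson_r(predictor, rating)
r_trimmed = pearson_r([predictor[i] for i in keep], [rating[i] for i in keep])

print(f"r with all cases (n={len(predictor)}):   {r_all:.2f}")
print(f"r without outliers (n={len(keep)}): {r_trimmed:.2f}")
```

Reporting both figures side by side lets a reader see how much of the conclusion depends on the exclusion decision.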
