|
Dear Listers, I am cleaning consumption/expenditure data and need help with the following procedure:
How can we identify outlying cases (those with values more than two standard deviations from the mean) using SPSS?
Thanks in advance, Khaing Soe
|
|
Hi
Check Raynald Levesque's site - it has syntax to do this: http://www.spsstools.net/Syntax/FlagOrSelectCases/ExcludeOutliersDefinedAsMeanPlusMinus2SD.txt
Thanks
Ratna
Ratna Wynn, Senior Director, Adelphi Research by Design
|
|
Things to look at:
  EXAMINE command (Analyze>Explore)
  VALIDATE DATA (Data Prep option - more focused on illegal values)
  ADP (Data Prep option - Transform>Prepare Data for Modeling)
  Boxplots (GGRAPH - Graphics>Chart Builder or other graphics commands)

HTH,
Jon Peck
SPSS, an IBM Company
312-651-3435
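For the EXAMINE and boxplot suggestions, a minimal syntax sketch might look like the following. The names var1 and idvar are placeholders, not variables from the original question; substitute your own.

```spss
* Sketch only: var1 and idvar are placeholder names.
EXAMINE VARIABLES=var1
  /ID=idvar
  /PLOT=BOXPLOT HISTOGRAM
  /STATISTICS=DESCRIPTIVES EXTREME(10).
```

EXTREME(10) lists the ten highest and ten lowest values, labelled by idvar. The boxplot marks cases lying more than 1.5 box lengths (interquartile ranges) from the edge of the box, a rule that does not depend on the mean or the standard deviation.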
|
|
If this is a bad solution, I'm sure someone will jump in and explain why there are better alternatives, but I'm not sure there is a more straightforward way to identify values that meet the specific criterion of more than 2 standard deviations:

***create some data***.
input program.
loop idvar=1 to 1000.
compute var1=rv.normal(100,20).
end case.
end loop.
end file.
end input program.

***real code starts here***.
compute breakvar=1. /*not necessary in version 18+.
aggregate outfile=* mode=addvariables
  /break=breakvar /*not necessary in version 18+
  /sdvar=sd(var1) /meanvar=mean(var1).
compute filtervar=var1>meanvar+2*sdvar or var1<meanvar-2*sdvar.
execute.
filter by filtervar.
SUMMARIZE
  /TABLES=idvar var1
  /FORMAT=VALIDLIST NOCASENUM TOTAL
  /TITLE='Cases with values greater than 2 standard deviations from mean'
  /MISSING=VARIABLE
  /CELLS=COUNT.
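The AGGREGATE approach also lends itself to group-wise screening, where each case is compared with the mean and standard deviation of its own group rather than of the whole file. A sketch under that assumption, using a hypothetical grouping variable named region that does not appear in the original post:

```spss
* Sketch only: region is a hypothetical grouping variable.
AGGREGATE OUTFILE=* MODE=ADDVARIABLES
  /BREAK=region
  /grpmean=MEAN(var1)
  /grpsd=SD(var1).
* Flag cases more than 2 group standard deviations from the group mean.
COMPUTE grpflag=ABS(var1-grpmean) GT 2*grpsd.
EXECUTE.
```

Groups with a single valid case have no standard deviation, so grpflag is left missing for them.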
|
|
In reply to Rick Oliver's message of 6/25/2010 12:39 PM:

There is a simpler way to get z-scores.

***create some data***.
input program.
loop idvar=1 to 1000.
compute var1=rv.normal(100,20).
end case.
end loop.
end file.
end input program.

***real code starts here***.
DESCRIPTIVES VARIABLES=VAR1 /STATISTICS=ALL /SAVE.
compute filtervar=ABS(ZVAR1) GT 2.
execute.
filter by filtervar.
SUMMARIZE
  /TABLES=idvar var1
  /FORMAT=VALIDLIST NOCASENUM TOTAL
  /TITLE='Cases with values greater than 2 standard deviations from mean'
  /MISSING=VARIABLE
  /CELLS=COUNT.

Art Kendall
Social Research Consultants

===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L. For a list of commands to manage subscriptions, send the command INFO REFCARD.
|
|
Of course. I completely forgot that Descriptives can create a Z-score variable. I knew there must be a simpler way.
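Because DESCRIPTIVES accepts a variable list, the same idea extends to several variables at once. A sketch, assuming three hypothetical expenditure variables named food, rent and fuel, and assuming the saved z-score variables receive the default names (Z prefixed to the original name):

```spss
* Sketch only: food, rent and fuel are hypothetical variable names.
DESCRIPTIVES VARIABLES=food rent fuel /SAVE.
* Flag each variable whose z-score exceeds 2 in absolute value.
DO REPEAT z=Zfood Zrent Zfuel /flag=out_food out_rent out_fuel.
COMPUTE flag=ABS(z) GT 2.
END REPEAT.
* Count how many of the three variables are flagged for each case.
COUNT n_flagged=out_food out_rent out_fuel (1).
EXECUTE.
```

Cases with n_flagged greater than 0 can then be listed or filtered in the same way as filtervar.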
(Quoted from the message being replied to:) BEWARE: cutting at a z-score is often a questionable way to deal with suspicious values.
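One way to see why a fixed z-score cut can be questionable: even for perfectly normal data with no errors at all, a known share of cases lies beyond 2 standard deviations. A small sketch that computes that share with the CDF.NORMAL function and scales it to the 1000 simulated cases used in this thread (zcut, p_beyond and n_expected are names chosen here for illustration):

```spss
* Sketch only: a one-case dataset holding the cut point.
DATA LIST FREE / zcut.
BEGIN DATA
2
END DATA.
* Two-tailed probability of falling beyond the cut under normality.
COMPUTE p_beyond=2*(1-CDF.NORMAL(zcut,0,1)).
* Expected number of flagged cases among 1000.
COMPUTE n_expected=1000*p_beyond.
FORMATS p_beyond (F6.4) n_expected (F6.1).
LIST.
```

The probability comes out at about .0455, so roughly 45 of 1000 cases would be flagged even when nothing is wrong. The flag is therefore probably better treated as a prompt to inspect cases than as a rule for deleting them.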
|
|
Depending on how many variables you have, you could always do something like

FREQUENCIES VARIABLES=var1 TO varn
  /FORMAT=NOTABLE
  /PERCENTILES=5 95
  /HISTOGRAM.

(or other percentile cut points, e.g. 10 and 90) to see what things look like first.

I've never used Visual Bander, but that might give you some clues as well.
|
