outliers


Khaing Soe-2

Dear Listers,

I am cleaning consumption/expenditure data and need help with the following procedure:

How can we identify outlying cases (those with values more than two standard deviations from the mean) using SPSS?

Thanks in advance,

Khaing Soe

Re: outliers

Ratna Wynn

Hi,

Check Raynald Levesque's site; it has syntax that does this:

http://www.spsstools.net/Syntax/FlagOrSelectCases/ExcludeOutliersDefinedAsMeanPlusMinus2SD.txt

Thanks,

Ratna

Ratna Wynn

Senior Director

Adelphi Research by Design

-----Original Message-----
From: Khaing Soe
Sent: Friday, June 25, 2010 4:23 AM
Subject: outliers

 


Re: outliers

Jon K Peck
In reply to this post by Khaing Soe-2

Things to look at:
EXAMINE command (Analyze>Explore)
VALIDATE DATA (Data Prep option - more focused on illegal values)
ADP (Data Prep option - Transform>Prepare Data for Modeling)
Boxplots (GGRAPH - Graphics>Chart Builder or other graphics commands)
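
As a rough illustration of the first suggestion, EXAMINE can list the most extreme cases and draw a boxplot in one step. This is only a sketch: var1 and idvar are the example variable names used elsewhere in this thread, and the choice of 10 extreme values is arbitrary.

```spss
* Sketch only: var1 and idvar are the example variables used elsewhere in this thread.
EXAMINE VARIABLES=var1
  /ID=idvar
  /PLOT=BOXPLOT
  /STATISTICS=DESCRIPTIVES EXTREME(10).
```

The ID subcommand labels the listed extreme values and boxplot outliers with idvar, which makes it easier to trace suspicious cases back to the data.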

HTH,

Jon Peck
SPSS, an IBM Company
[hidden email]
312-651-3435




Re: outliers

Rick Oliver-3

If this is a bad solution, I'm sure someone will jump in and explain why there are better alternatives, but I'm not sure there is a more straightforward way to identify values that meet the specific criterion of more than 2 standard deviations:

***create some data***.
input program.
loop idvar=1 to 1000.
compute var1=rv.normal(100,20).
end case.
end loop.
end file.
end input program.
***real code starts here***.
compute breakvar=1. /*not necessary in version 18+.
aggregate outfile=* mode=addvariables
  /break=breakvar /*not necessary in version 18+
  /sdvar=sd(var1)
  /meanvar=mean(var1).
compute filtervar=var1>meanvar+2*sdvar or var1<meanvar-2*sdvar.
execute.
filter  by filtervar.
SUMMARIZE
  /TABLES=idvar var1
  /FORMAT=VALIDLIST NOCASENUM TOTAL
  /TITLE='Cases with values greater than 2 standard deviations from mean'
  /MISSING=VARIABLE
  /CELLS=COUNT.
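
Going by the comments in the syntax above, the constant break variable can be dropped in version 18 or later. A minimal sketch of that shorter form, meant to replace the COMPUTE breakvar, AGGREGATE and COMPUTE filtervar lines rather than run after them; it assumes the same var1 as above, and the ABS() comparison is just another way of writing the same two-sided test:

```spss
* Sketch only: assumes version 18 or later and the var1 variable created above.
AGGREGATE OUTFILE=* MODE=ADDVARIABLES
  /sdvar=SD(var1)
  /meanvar=MEAN(var1).
* filtervar is 1 when var1 lies more than 2 SD from the mean and 0 otherwise.
COMPUTE filtervar=ABS(var1 - meanvar) > 2*sdvar.
EXECUTE.
```

The FILTER and SUMMARIZE commands that follow would stay the same.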


From: Jon K Peck/Chicago/IBM@IBMUS
Date: 06/25/2010 10:52 AM
Subject: Re: outliers







Re: outliers

Art Kendall
BEWARE: cutting at a z-score is often a questionable way to deal with suspicious values.

There is a simpler way to get z-scores.
***create some data***.
input program.
loop idvar=1 to 1000.
compute var1=rv.normal(100,20).
end case.
end loop.
end file.
end input program.

***real code starts here***.
DESCRIPTIVES VARIABLES= VAR1 /STATISTICS=ALL /SAVE.
compute filtervar= ABS(ZVAR1)  GT 2.

execute.
filter  by filtervar.
SUMMARIZE
  /TABLES=idvar var1
  /FORMAT=VALIDLIST NOCASENUM TOTAL
  /TITLE='Cases with values greater than 2 standard deviations from mean'
  /MISSING=VARIABLE
  /CELLS=COUNT.


Art Kendall
Social Research Consultants

On 6/25/2010 12:39 PM, Rick Oliver wrote:


=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command.
To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Re: outliers

Rick Oliver-3

Of course. I completely forgot that DESCRIPTIVES can create a z-score variable. I knew there must be a simpler way.




From: Art Kendall
Date: 06/25/2010 12:36 PM
Subject: Re: outliers






Re: outliers

John F Hall
Depending on how many variables you have, you could always do something like:

freq var1 to varn /for not /per 5 95 /his .

(or other percentile cut points, e.g. 10 and 90) to see what things look like first.

I've never used Visual Bander, but that might give you some clues as well.
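
Spelled out with full keywords, the abbreviated FREQUENCIES command above would look roughly like this. It is only a sketch: var1 TO varn is a placeholder for a contiguous run of variables in your own file, and the 5th and 95th percentiles are just the cut points mentioned above.

```spss
* Sketch only: var1 TO varn stands for a contiguous run of variables in the active dataset.
FREQUENCIES VARIABLES=var1 TO varn
  /FORMAT=NOTABLE
  /PERCENTILES=5 95
  /HISTOGRAM.
```

FORMAT=NOTABLE suppresses the frequency tables, so the output is limited to the percentiles and one histogram per variable.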
----- Original Message -----
Sent: Friday, June 25, 2010 7:42 PM
Subject: Re: outliers

