SPSSX Discussion

outliers

Khaing Soe-2

outliers

Dear Listers,

I am cleaning consumption/expenditure data and need help with the following:

How can we identify outlying cases (those with values more than two standard deviations from the mean) using SPSS?

Thanking in advance,

Khaing Soe

Ratna Wynn

Re: outliers

Check Raynald Levesque's site; it has syntax to do this.

http://www.spsstools.net/Syntax/FlagOrSelectCases/ExcludeOutliersDefinedAsMeanPlusMinus2SD.txt

Thanks

Ratna

Ratna Wynn

Senior Director

Adelphi Research by Design

Jon K Peck

Re: outliers

Things to look at:
EXAMINE command (Analyze>Explore)
VALIDATE DATA (Data Prep option - more focused on illegal values)
ADP (Data Prep option - Transform>Prepare Data for Modeling)
Boxplots (GGRAPH - Graphics>Chart Builder or other graphics commands)

HTH,

Jon Peck
SPSS, an IBM Company
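
The boxplot suggestion above can be illustrated outside SPSS. The following is a minimal sketch in Python, assuming the common boxplot convention of flagging values that lie more than 1.5 box lengths (interquartile ranges) beyond the quartiles. The function name `boxplot_outliers` and the sample data are made up for illustration, and SPSS may compute its hinges slightly differently, so exact cutoffs need not match.

```python
import statistics


def boxplot_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    return [v for v in values if v < lower_fence or v > upper_fence]


# Hypothetical expenditure values with one suspicious entry.
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
suspects = boxplot_outliers(sample)
print(suspects)
```

Unlike a mean and standard deviation rule, the quartiles are barely affected by the extreme value itself, which is one reason boxplots are often used for a first look.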

Rick Oliver-3

Re: outliers

If this is a bad solution, I'm sure someone will jump in and explain why there are better alternatives, but I'm not sure there is a more straightforward way to identify values that meet the specific criterion of more than 2 standard deviations:

* Create some example data: 1000 cases drawn from Normal(100, 20).
input program.
loop idvar=1 to 1000.
  compute var1=rv.normal(100,20).
  end case.
end loop.
end file.
end input program.
* Real code starts here.
* Add the overall mean and SD of var1 to every case.
* breakvar is a constant break variable; it is not necessary in version 18+.
compute breakvar=1.
aggregate outfile=* mode=addvariables
  /break=breakvar
  /sdvar=sd(var1)
  /meanvar=mean(var1).
* Flag cases more than 2 SDs above or below the mean (1 = flagged).
compute filtervar=var1>meanvar+2*sdvar or var1<meanvar-2*sdvar.
execute.
* List only the flagged cases.
filter by filtervar.
SUMMARIZE
  /TABLES=idvar var1
  /FORMAT=VALIDLIST NOCASENUM TOTAL
  /TITLE='Cases with values greater than 2 standard deviations from mean'
  /MISSING=VARIABLE
  /CELLS=COUNT.
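
For readers who want to check the logic of the syntax above away from SPSS, here is a minimal sketch of the same mean plus or minus 2 SD rule in Python. The function name `flag_two_sd` and the `spending` values are hypothetical, chosen only for illustration; the sketch assumes the sample standard deviation (n - 1 denominator), which is what SPSS reports.

```python
import statistics


def flag_two_sd(values, k=2.0):
    """Return True for each value more than k sample SDs from the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD, n - 1 denominator
    lower = mean - k * sd
    upper = mean + k * sd
    return [v < lower or v > upper for v in values]


# Hypothetical expenditure values; the last one looks suspicious.
spending = [100, 102, 98, 101, 99, 100, 103, 97, 100, 250]
flags = flag_two_sd(spending)
outliers = [v for v, flagged in zip(spending, flags) if flagged]
print(outliers)
```

The list of flags plays the role of `filtervar` in the SPSS syntax: it marks cases for inspection without deleting anything.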

Art Kendall

Re: outliers

BEWARE: cutting at a z-score is often a questionable way to deal with suspicious values.

There is a simpler way to get z-scores.

***create some data***.
input program.
loop idvar=1 to 1000.
  compute var1=rv.normal(100,20).
  end case.
end loop.
end file.
end input program.
***real code starts here***.
DESCRIPTIVES VARIABLES=VAR1
  /STATISTICS=ALL
  /SAVE.
compute filtervar=ABS(ZVAR1) GT 2.
execute.
filter by filtervar.
SUMMARIZE
  /TABLES=idvar var1
  /FORMAT=VALIDLIST NOCASENUM TOTAL
  /TITLE='Cases with values greater than 2 standard deviations from mean'
  /MISSING=VARIABLE
  /CELLS=COUNT.

Art Kendall
Social Research Consultants

Rick Oliver-3

Re: outliers

Of course. I completely forgot that Descriptives can create a Z-score variable. I knew there must be a simpler way.
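
One reason often given for the caution about z-score cutoffs above, offered here only as an illustration and not necessarily what was meant in the thread, is that an extreme value inflates the very standard deviation used to judge it. The Python sketch below uses made-up data and a hypothetical helper `z_scores`; it assumes the sample SD (n - 1 denominator). With n cases, no |z| can exceed (n - 1) / sqrt(n), which is about 1.79 for five cases, so a cutoff of 2 can never fire in a sample that small.

```python
import statistics


def z_scores(values):
    """Standardize values using the sample mean and sample SD."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD, n - 1 denominator
    return [(v - mean) / sd for v in values]


# Hypothetical small sample with one obviously wrong entry.
small_sample = [10, 11, 9, 10, 1000]
z = z_scores(small_sample)
flagged = [v for v, score in zip(small_sample, z) if abs(score) > 2]
print(flagged)
```

Here the value 1000 is not flagged even though it is clearly out of line, which is why a z-score cut is usually treated as a prompt to look at the cases rather than as a rule for removing them.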

John F Hall

Re: outliers

Depending on how many variables you have, you could always do something like

frequencies variables=var1 to varn /format=notable /percentiles=5 95 /histogram.

(or other percentile cut points, e.g. 10 and 90) to see what things look like first.

I've never used Visual Bander, but that might give you some clues as well.
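
The percentile idea above can be sketched in Python as well. This is only an illustration with made-up data; the helper names `percentile_cutoffs` and `outside_cutoffs` are hypothetical, and the percentile interpolation used by Python's `statistics.quantiles` may differ slightly from the method SPSS applies, so cut points need not agree to the last decimal.

```python
import statistics


def percentile_cutoffs(values, n_tiles=20):
    """Return the lowest and highest cut points (5th and 95th for n_tiles=20)."""
    cuts = statistics.quantiles(values, n=n_tiles)
    return cuts[0], cuts[-1]


def outside_cutoffs(values, low, high):
    """Return the values falling below low or above high."""
    return [v for v in values if v < low or v > high]


# Hypothetical data: the integers 1 to 100.
data = list(range(1, 101))
low, high = percentile_cutoffs(data)
tails = outside_cutoffs(data, low, high)
print(low, high, tails)
```

Passing `n_tiles=10` gives the 10th and 90th percentiles instead. As with the frequency table and histogram, the point is to see what the tails look like before deciding how to treat them.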
