SPSS - Outlier Macro


SPSS - Outlier Macro

Lakshmikanth Makaraju
Dear Listers,

I am having difficulty with outlier treatment. I have some 30
variables that have to be treated for outliers. This is the process I am
following:
I replace each non-missing value with Mean - 3*SD of the variable if
the value is less than Mean - 3*SD, and
with Mean + 3*SD of the variable if the value is greater
than Mean + 3*SD.

For this exercise I have to calculate the mean and SD of each variable
first, then create two variables (M-3SD, M+3SD)
for each of the 30 variables. Then I compare each value against them and do
the outlier treatment.

Is there any way I can do this with a macro, which would reduce the
steps as well as make my syntax easy to follow when others refer to it?

I would be very happy if anybody can help.

Thanks and regards
Lakshmikanth

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Re: SPSS - Outlier Macro

Maguin, Eugene
Lakshmikanth,

A macro might be easier, I'm not sure. Also, a Python routine might be more
elegant. I'm also not sure.

But, I think you can do this with syntax. I'll assume you are familiar with
syntax because I am only going to outline the structure I'd start with (and
then probably modify when I found my first scheme didn't work right). So,
here's how I'd do this.

Descriptives v1 to v30/statistics mean stddev.

Use the OMS command to export the statistics table from the descriptives
command to a file; call this file 'stats'. I can't do this off the top of my
head but it's pretty simple. I'd inspect the stats file to ascertain its
structure and then do whatever restructuring was needed to create a file
having one record with the following structure:

V1Mean V1SD .... V30Mean V30SD.

Probably casestovars would be required along with variable renaming, etc.
Let me add that it would be convenient, but not required, that the variable
order in stats be
V1Mean ... V30Mean  V1SD ... V30SD.

Then, do match files with the table keyword to append the means and SDs
record from stats to every record in the original file. I don't think you
will need a by variable, but I haven't tried this kind of match for quite a
while. However, if a by variable is needed, then create a variable, call it
'link', in both datasets and give it the same value for all records in both
datasets.

There are two ways to go at the next step. One way is via Do repeat; the other
is via Loop-end loop. I'll use do repeat AND I'll assume the 'convenient'
variable order described above.

Do repeat a=v1 to v30/b=V1Mean to V30Mean/c=V1SD to V30SD.
+  do if (a < b-3*c).
+     compute a=b-3*c.
+  else if (a > b+3*c).
+     compute a=b+3*c.
+  end if.
End repeat.
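
For readers who want to check the capping rule outside SPSS, here is a minimal Python sketch of the same logic as the do repeat block above. It is an illustration only, not a tested replacement for the syntax: the function name clamp_to_3sd and the sample data are made up, missing values are assumed to be represented as None, and the SD is the sample SD (n - 1 denominator).

```python
from statistics import mean, stdev


def clamp_to_3sd(values, k=3.0):
    """Cap values outside mean +/- k*SD at the nearest limit.

    Missing values (None) are ignored when computing the mean and SD
    and are left as None in the result.
    """
    observed = [v for v in values if v is not None]
    if len(observed) < 2:
        # The sample SD is undefined with fewer than two values.
        return list(values)
    m, s = mean(observed), stdev(observed)
    lo, hi = m - k * s, m + k * s
    return [v if v is None else min(max(v, lo), hi) for v in values]


# Hypothetical data: twenty ordinary values, one extreme value, one missing.
capped = clamp_to_3sd([10] * 20 + [1000, None])
```

Note that the limits are computed from data that still include the extreme value, exactly as in the syntax above, so a single large value inflates the SD and therefore the limit it is capped at.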

Gene Maguin






Re: SPSS - Outlier Macro

Art Kendall
It might be easier to do something like this untested syntax.

compute constant=1.
aggregate outfile=* mode=addvariables
 /break=constant
 /meanv1 to meanv30=mean(v1 to v30)
 /sdv1 to sdv30=sd(v1 to v30).
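
The idea behind the aggregate step is that every case ends up carrying the overall mean and SD of each variable, so the limits can be computed on the same record. A rough Python sketch of that idea follows; it is an assumption-laden illustration rather than a translation of the syntax: rows are plain dictionaries, the helper name add_summary_columns and the mean_/sd_ column prefixes are hypothetical, and missing values are assumed to be None.

```python
from statistics import mean, stdev


def add_summary_columns(rows, names):
    """Return copies of rows with mean_<name> and sd_<name> added to each.

    The summaries are computed once over all rows, skipping None,
    and the same values are attached to every row.
    """
    summary = {}
    for name in names:
        observed = [r[name] for r in rows if r.get(name) is not None]
        summary["mean_" + name] = mean(observed)
        summary["sd_" + name] = stdev(observed)  # sample SD (n - 1)
    return [{**r, **summary} for r in rows]


# Hypothetical three-case data set with two variables.
rows = [
    {"v1": 1.0, "v2": 10.0},
    {"v1": 2.0, "v2": 20.0},
    {"v1": 3.0, "v2": 30.0},
]
wide = add_summary_columns(rows, ["v1", "v2"])
```

Once the summaries sit on each record, the comparison against mean - 3*SD and mean + 3*SD becomes a simple case-by-case operation.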

Art

Art Kendall
Social Research Consultants

Re: SPSS - Outlier Macro

Jon K Peck

I'm not sure that trimming outliers is a good idea, but if you have Version 18 and the DataPrep option, the ADP dialog/command can do this for you automatically.

Jon Peck
SPSS, an IBM Company
[hidden email]
312-651-3435



From: Art Kendall <[hidden email]>
To: [hidden email]
Date: 11/18/2009 12:16 PM
Subject: Re: [SPSSX-L] SPSS - Outlier Macro
Sent by: "SPSSX(r) Discussion" <[hidden email]>






Re: SPSS - Outlier Macro

Bruce Weaver
Administrator
Jon K Peck wrote:
> I'm not sure that trimming outliers is a good idea, but if you have
> Version 18 and the DataPrep option, the ADP dialog/command can do this for
> you automatically.
>
> Jon Peck
> SPSS, an IBM Company
> peck@us.ibm.com
> 312-651-3435

I agree with Jon's comment that trimming of univariate outliers is not necessarily a good idea.  What are you going to do with the 30 variables?  If they are being used in regression models, I'd be more concerned about bi- or multivariate outliers (e.g., the 5 year old who is 6 feet tall), which often signal data entry errors.  I'd also be more concerned about influential points (as measured by Cook's distance, for example) than univariate outliers.
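
To make the point about influential points concrete, here is a small Python sketch of Cook's distance for a simple one-predictor regression. It is only an illustration under stated assumptions: the function name cooks_distance and the age/height data are invented, and in practice you would get these values from the regression procedure's saved diagnostics rather than compute them by hand.

```python
def cooks_distance(x, y):
    """Cook's distance for each point in the least-squares fit y = a + b*x."""
    n = len(x)
    p = 2  # number of fitted parameters: intercept and slope
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    mse = sum(e ** 2 for e in residuals) / (n - p)
    # Leverage of each point in simple linear regression.
    leverage = [1 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    return [
        (e ** 2 / (p * mse)) * h / (1 - h) ** 2
        for e, h in zip(residuals, leverage)
    ]


# Hypothetical data: age in years and height in cm. The last record is a
# 5-year-old entered as 183 cm (about 6 feet), a likely data entry error.
age = [4, 5, 6, 7, 8, 9, 10, 11, 12, 5]
height = [102, 109, 115, 122, 128, 133, 138, 143, 149, 183]
d = cooks_distance(age, height)
most_influential = d.index(max(d))
```

Neither the age of 5 nor a height of 183 cm is extreme when each variable is looked at across a wider sample of ages, which is why a univariate mean +/- 3*SD rule can miss this kind of record while the regression diagnostic flags it.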

--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING: 
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).

Re: SPSS - Outlier Macro

Marta Garcia-Granero
Trisha Greenhalgh comments on the topic of automatically correcting
outliers as if they were simple errors in one of her papers
(masterpieces, BTW) in BMJ (BMJ 1997;315:364-366. How to read a paper:
Statistics for the non-statistician. I: Different types of data need
different statistical tests):

"... A few years ago, while doing a research project, I measured several
different hormones in about 30 subjects. One subject's growth hormone
levels came back about 100 times higher than everyone else's. I assumed
this was a transcription error, so I moved the decimal point two places
to the left. Some weeks later, I met the technician who had analysed the
specimens and he asked, "Whatever happened to that chap with acromegaly?"

I always mention that story to my students when I explain what an
outlier is and how to deal (correctly) with it.

My two euro-cents.

Marta GG


Bruce Weaver wrote:

> Jon K Peck wrote:
>
>> I'm not sure that trimming outliers is a good idea, but if you have
>> Version 18 and the DataPrep option, the ADP dialog/command can do this for
>> you automatically.
>>
>> Jon Peck
>> SPSS, an IBM Company
>> [hidden email]
>> 312-651-3435
>>
>>
>
> I agree with Jon's comment that trimming of univariate outliers is not
> necessarily a good idea.
>
>


--
For miscellaneous SPSS related statistical stuff, visit:
http://gjyp.nl/marta/


PASW/SPSS18 graphs worse than 16

Kornbrot, Diana
IF you want good graphics do NOT move to SPSS/PASW18,
especially if you are so eccentric as to want to fit a straight line to points.
Chart Builder is the PITS.
PASW have really fouled up the graph by menu.
I want a simple x-y scatter plot with TWO grouping variables, i.e. one by colour, the other by symbol.
OBVIOUSLY I'd like to fit a best straight line.
Meanwhile, of course, linear regression STILL, STILL does not plot dependent v independent.
I also have a 3rd grouping variable for panels, & the simple desire to say that I'd like the arrangement 3 wide by 4 deep.
ALL of this is available in the wonderfully simple and usable legacy i-graph in 16.

IN 18:
  1. the legacy i-graphs have been removed

Chart Builder, YUK YUK YUK:
  1. can only specify 1 grouping variable: colour OR pattern, not both
  2. can't fit a straight line
  3. does not permit specification of panel arrangements
  4. does not allow specification of number format
Once the graph is generated one can change the number format, colours & background, fiddle with chart size to get panels right & save as a template, but the syntax is not saved to the syntax window.

IT'S appallingly frustrating.
NO doubt produced by idiot graphics designers only interested in useless 3-D, not at all interested in data.

I teach students that graphics is key to understanding data. Getting a decent graph in PASW18 is so awful that I am contemplating defecting to Stata. If you've got to do scripts...

I could put all this on their complaints web site – but prefer to communicate with this excellent list. SPSS/PASW people on the list, please note.

Unfortunately, I can’t regress to 16 as I am on MAC Snow Leopard
BUT I can abandon SPSS altogether

Sigh...............SPSS is a valued friend in many incarnations

Diana



Re: PASW/SPSS18 graphs worse than 16

ViAnn Beadle

Chart Builder provides a very simple interface to an incredibly rich language called GPL. GPL will do this for you. IGRAPH will also do what you want but there is no dialog box interface for it in 18.

 

The chart editor is interactive (and always has been so I don’t understand what you think has changed here). You can save the results of your chart editors as a template which you can then apply to subsequent charts.

 

From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of kornbrot
Sent: Thursday, November 19, 2009 9:42 AM
To: [hidden email]
Subject: PASW/SPSS18 graphs worse than 16

 
