|
Dear Listers,
I have run into a difficulty with outlier treatment. I have about 30 variables that need to be treated for outliers. My current process is this: I replace each non-missing value with Mean - 3*SD of the variable if the value is less than Mean - 3*SD, and with Mean + 3*SD if the value is greater than Mean + 3*SD. To do this, I first have to calculate the mean and SD of each variable, then create two new variables (M-3SD and M+3SD) for each of the 30 variables, and then compare each value against them and apply the treatment. Is there any way to do this with a macro that would reduce the number of steps and make my syntax easier to follow when others refer to it? I would be very grateful for any help.
Thanks and regards,
Lakshmikanth
===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
|
Lakshmikanth,
A macro might be easier; I'm not sure. A Python routine might also be more elegant; again, I'm not sure. But I think you can do this with plain syntax. I'll assume you are familiar with syntax, because I am only going to outline the structure I'd start with (and then probably modify once I found my first scheme didn't work right). So, here's how I'd do this.

Descriptives v1 to v30/statistics mean stddev.

Use OMS to export the statistics table from the Descriptives command to a file; call this file 'stats'. I can't write this off the top of my head, but it's pretty simple. I'd inspect the stats file to ascertain its structure and then do whatever restructuring was needed to create a file with one record and the following structure: V1Mean V1SD .... V30Mean V30SD. Casestovars would probably be required, along with variable renaming, etc. Let me add that it would be convenient, but not required, for the variable order in stats to be V1Mean ... V30Mean V1SD ... V30SD.

Then do a match files with the table keyword to append the mean and SD record from stats to every record in the original file. I do not think you will need a by variable, but I haven't tried this kind of match for quite a while. If a by variable is needed, create a variable, call it 'link', in both datasets and give it the same value for all records in both datasets.

There are two ways to go at the next step. One is via Do repeat; the other is via Loop-End loop. I'll use Do repeat AND I'll assume the 'convenient' variable order described above.

Do repeat a=v1 to v30/b=V1Mean to V30Mean/c=V1SD to V30SD.
+ do if (a < b-3*c).
+ compute a=b-3*c.
+ else if (a > b+3*c).
+ compute a=b+3*c.
+ end if.
End repeat.

Gene Maguin |
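For readers who want to see the logic of the Do repeat step outside SPSS, here is a minimal plain-Python sketch of the same idea. It is only an illustration, not SPSS syntax and not anyone's tested solution from this thread: the function name `clamp_to_3sd`, the use of `None` for missing values, and the small data set are all assumptions made up for the example. The limits are computed from the untreated data, using the sample SD.

```python
from statistics import mean, stdev

def clamp_to_3sd(values, k=3.0):
    """Pull values beyond mean +/- k*SD back to those limits.

    None marks a missing value and is left untouched (an assumption
    of this sketch; SPSS has its own missing-value handling).
    """
    present = [v for v in values if v is not None]
    m, s = mean(present), stdev(present)  # stdev uses the n - 1 denominator
    lo, hi = m - k * s, m + k * s
    return [v if v is None else min(max(v, lo), hi) for v in values]

# Hypothetical data standing in for two of the 30 variables.
data = {
    "v1": [10, 11, 9, 10, 12, 10, 11, 9, 10, 11, 10, 9, 11, 10, 60],
    "v2": [5.0, None, 5.5, 4.5, 5.0, 5.2, 4.8, 5.1, 4.9, 5.0, 5.3, 4.7, 5.0, 5.1, 4.9],
}

# One loop over the variables replaces 30 hand-written blocks.
treated = {name: clamp_to_3sd(col) for name, col in data.items()}
```

Only the 60 in `v1` lies beyond the upper limit, so it is the only value that changes; the missing value in `v2` stays missing.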
|
It might be easier to do something like this untested syntax.

compute constant=1.
aggregate outfile=* mode=addvariables
 /break=constant
 /meanv1 to meanv30 = mean(v1 to v30)
 /sdv1 to sdv30 = sd(v1 to v30).
Art Kendall
Social Research Consultants |
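To make clear what the aggregate step is meant to produce, here is a rough plain-Python sketch, again only as an illustration and not as SPSS syntax. Breaking on a constant puts every case in one group, so each case ends up carrying the same mean and SD columns. The function name `add_mean_sd_columns`, the row-as-dictionary layout, and the use of `None` for missing values are assumptions of this sketch.

```python
from statistics import mean, stdev

def add_mean_sd_columns(rows, variables):
    """Return copies of the rows with mean<var> and sd<var> added to every row."""
    stats = {}
    for var in variables:
        present = [r[var] for r in rows if r[var] is not None]
        stats["mean" + var] = mean(present)
        stats["sd" + var] = stdev(present)  # sample SD, n - 1 denominator
    # Every row receives the same statistics, as with a constant break variable.
    return [{**r, **stats} for r in rows]

# Hypothetical three-case file with two variables.
rows = [
    {"v1": 2.0, "v2": 10.0},
    {"v1": 4.0, "v2": None},
    {"v1": 6.0, "v2": 14.0},
]
wide = add_mean_sd_columns(rows, ["v1", "v2"])
```

With the statistics sitting on every row, the comparison against mean - 3*SD and mean + 3*SD can be done case by case, which is what the Do repeat block in the earlier reply relies on.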
|
I'm not sure that trimming outliers is a good idea, but if you have Version 18 and the DataPrep option, the ADP dialog/command can do this for you automatically. Jon Peck SPSS, an IBM Company [hidden email] 312-651-3435 |
|
I agree with Jon's comment that trimming of univariate outliers is not necessarily a good idea. What are you going to do with the 30 variables? If they are being used in regression models, I'd be more concerned about bi- or multivariate outliers (e.g., the 5 year old who is 6 feet tall), which often signal data entry errors. I'd also be more concerned about influential points (as measured by Cook's distance, for example) than univariate outliers.
--
Bruce Weaver bweaver@lakeheadu.ca http://sites.google.com/a/lakeheadu.ca/bweaver/ "When all else fails, RTFM." PLEASE NOTE THE FOLLOWING: 1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above. 2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/). |
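As a small illustration of the point about influential points versus univariate outliers, here is a plain-Python sketch that computes Cook's distance for a simple linear regression using the standard formula D_i = e_i^2 / (p * MSE) * h_i / (1 - h_i)^2. The function name `cooks_distance` and the age/height figures are invented for the example; in practice one would request Cook's distance from the regression procedure rather than compute it by hand.

```python
from statistics import mean

def cooks_distance(x, y):
    """Cook's distance for each point in a simple linear regression of y on x."""
    n, p = len(x), 2  # p = number of fitted coefficients (intercept and slope)
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    slope = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    intercept = ybar - slope * xbar
    resid = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    mse = sum(e ** 2 for e in resid) / (n - p)
    leverage = [1 / n + (xi - xbar) ** 2 / sxx for xi in x]
    return [e ** 2 / (p * mse) * h / (1 - h) ** 2 for e, h in zip(resid, leverage)]

# Hypothetical ages (years) and heights (cm). The last case is a
# 5-year-old recorded as 183 cm tall, i.e. roughly 6 feet.
age = [5, 6, 8, 10, 12, 14, 16, 18, 18, 17, 5]
height = [110, 116, 128, 138, 150, 163, 173, 178, 185, 176, 183]

d = cooks_distance(age, height)
most_influential = d.index(max(d))  # position of the case with the largest distance
```

In this made-up data, 183 cm is an ordinary height when looked at on its own, so a mean +/- 3 SD rule on height would leave it alone, yet it has by far the largest Cook's distance once age is taken into account.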
|
In one of her BMJ papers (masterpieces, BTW), Trisha Greenhalgh comments on the practice of automatically correcting outliers as if they were simple errors (BMJ 1997;315:364-366. How to read a paper: Statistics for the non-statistician. I: Different types of data need different statistical tests):

"... A few years ago, while doing a research project, I measured several different hormones in about 30 subjects. One subject's growth hormone levels came back about 100 times higher than everyone else's. I assumed this was a transcription error, so I moved the decimal point two places to the left. Some weeks later, I met the technician who had analysed the specimens and he asked, "Whatever happened to that chap with acromegaly?""

I always mention that story to my students when I explain what an outlier is and how to deal (correctly) with it.

My two euro-cents.
Marta GG
--
For miscellaneous SPSS related statistical stuff, visit: http://gjyp.nl/marta/ |
|
Especially if you are so eccentric as to want to fit a straight line to points.

Chart Builder is the PITS. PASW have really fouled up graphs by menu.

I want a simple x-y scatter plot with TWO grouping variables, i.e. one by colour, the other by symbol. OBVIOUSLY I'd like to fit a best straight line. Meanwhile, of course, linear regression STILL, STILL does not plot dependent v independent. I also have a 3rd grouping variable for panels, and the simple desire to say that I'd like the arrangement 3 wide by 4 deep. ALL of this is available in the wonderfully simple and usable legacy i-graph in 16.

IN 18:
1 can only specify 1 grouping variable: colour OR pattern, not both
2 can't fit a straight line
3 does not permit specification of panel arrangements
4 does not allow specification of number format

Once the graph is generated one can change the number format, colours & background, fiddle with chart size to get the panels right & save as a template, but the syntax is not saved to the syntax window.

IT'S appallingly frustrating. NO doubt produced by idiot graphics designers only interested in useless 3D, not at all interested in data. I teach students that graphics is key to understanding data. Getting a decent graph in PASW 18 is so awful that I am contemplating defecting to Stata. If you've got to do scripts...

I could put all this on their complaints web site, but prefer to communicate with this excellent list. SPSS/PASW people on the list, please note.

Unfortunately, I can't regress to 16 as I am on Mac Snow Leopard, BUT I can abandon SPSS altogether.

Sigh... SPSS is a valued friend in many incarnations.
Diana |
|
Chart Builder provides a very simple interface to an incredibly rich language called GPL. GPL will do this for you. IGRAPH will also do what you want, but there is no dialog box interface for it in 18.

The chart editor is interactive (and always has been, so I don't understand what you think has changed here). You can save the results of your chart edits as a template, which you can then apply to subsequent charts.

From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of kornbrot
IF you want good graphics do NOT move to SPSS/PASW18 - chart builder YUK YUK YUK |
