Good morning SPSS experts,

I have a question for you. I have a database of 14,506 records containing numeric variables that record expenditure on 28 items, such as accommodation, food and drinks, land transport, maritime transport, etc. For each variable the interviewers write down the expenditure reported by the respondent, but some values lie far from the average: the so-called outliers. I analyze these values with the EXAMINE command, which I think is very good. It produces several output tables (the case processing summary, descriptive statistics, M-estimators, percentiles, extreme values and tests of normality), followed by some charts: histograms and a boxplot. The boxplot marks with a star the extreme values, which lie more than 3 box lengths from the 75th percentile, and also shows the outliers, which lie more than 1.5 box lengths from the 75th percentile. These two groups are the ones that interest me, but when there are many of them they cannot be told apart in the boxplots. I then look at the Extreme Values table, but it shows only the 5 highest and 5 lowest values. How can I get a list of all the extreme values and outliers, similar to the Extreme Values table?

Example syntax:

EXAMINE VARIABLES=G_ALOJAMIENTO BY quarter
  /PLOT BOXPLOT STEMLEAF HISTOGRAM NPPLOT
  /COMPARE GROUPS
  /MESTIMATORS HUBER(1.339) ANDREW(1.34) HAMPEL(1.7,3.4,8.5) TUKEY(4.685)
  /PERCENTILES(5,10,25,50,75,90,95) HAVERAGE
  /STATISTICS DESCRIPTIVES EXTREME
  /CINTERVAL 95
  /MISSING LISTWISE
  /NOTOTAL.

Thank you very much for your help; any comments will be greatly appreciated.

Sincerely,
--
Javier Figueroa
Database Processing and Analysis
Cell: 5927-4748 / 4970-1940
Home: 2289-0184
Javier,

You might find it easier just to select the cases that are outliers by a definition of your choosing (here, cutvalue = 1.5 * the 75th percentile) and view them with the FREQUENCIES command or, better, with case summaries. Suppose that the 75th percentile of VarX is 125.

COMPUTE cutvalue_varx = 1.5*125.
TEMPORARY.
SELECT IF VarX GE cutvalue_varx.
SUMMARIZE
  /TABLES=ID VarX var1 var2 var3
  /FORMAT=VALIDLIST NOCASENUM TOTAL LIMIT=1000
  /TITLE='Case Summaries'
  /MISSING=VARIABLE
  /CELLS=COUNT.

One advantage of the SUMMARIZE command is that you can view a table of values across several variables (in this case ID, VarX, var1, var2 and var3). By reporting a set of variables you get some "context" in which these outliers reside. This can help you decide whether the "outlier" makes sense given other related values. The default limit is 100, but you can change that as I have done above. I think you can also use a filter for this command; that is safer but a little more involved.

I hope you find this helpful,
WD

On Thu, Apr 12, 2018 at 12:13 PM, Javier Figueroa <[hidden email]> wrote:
William N. Dudley, PhD
437-L Coleman Building
Professor - Public Health Education
The School of Health and Human Sciences
The University of North Carolina at Greensboro
Greensboro, NC 27402-6170
See my research on ResearchGate
VOICE 336.256 2475
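For cut points that line up with the boxplot markers rather than a single multiple of the 75th percentile, the box-length rule described in the original question can be written out directly. The following is a minimal sketch, not a tested solution. The quartile values 50 and 125 are placeholders, and q1, q3, box_len, out_flag and show_case are hypothetical names; ID and VarX are the illustrative names used in the reply above. It uses FILTER, the safer alternative mentioned in that reply, so no cases are dropped.

```spss
* Placeholder quartiles, to be replaced with values from the EXAMINE output.
COMPUTE q1 = 50.
COMPUTE q3 = 125.
COMPUTE box_len = q3 - q1.

* Flag codes: 0 is inside the fences, 1 is an outlier, 2 is an extreme value.
COMPUTE out_flag = 0.
IF (VarX < q1 - 1.5*box_len OR VarX > q3 + 1.5*box_len) out_flag = 1.
IF (VarX < q1 - 3*box_len OR VarX > q3 + 3*box_len) out_flag = 2.
EXECUTE.

* Filter on the flag instead of using SELECT IF, so every case stays in the file.
COMPUTE show_case = (out_flag > 0).
FILTER BY show_case.
SUMMARIZE
  /TABLES=ID VarX out_flag
  /FORMAT=VALIDLIST NOCASENUM TOTAL LIMIT=1000
  /TITLE='Flagged outliers'
  /MISSING=VARIABLE
  /CELLS=COUNT.
FILTER OFF.
```

Because the boxplots in the question are drawn BY quarter, each quarter has its own box, so the quartiles would have to be entered per quarter to reproduce the markers exactly. As far as I know, SPSS draws the box from Tukey's hinges, which can differ slightly from the HAVERAGE percentiles, so a few borderline cases may be flagged differently from the chart.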
In reply to this post by Javier Figueroa
You can specify how many outliers are displayed in EXAMINE, although this setting is not supported in the dialog box. The syntax to display 1000 outliers would look like this:

EXAMINE VARIABLES=salary salbegin
  /PLOT NONE
  /STATISTICS EXTREME(1000)
  /NOTOTAL.

If you are interested in multivariate outliers, take a look at Data > Identify Unusual Cases.

On Thu, Apr 12, 2018 at 10:13 AM, Javier Figueroa <[hidden email]> wrote:
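Applied to the variable in the original question, the same option might look like the sketch below. G_ALOJAMIENTO and quarter come from Javier's syntax. The /ID subcommand and the variable name ID are assumptions added for illustration; they label each listed case with an identifier instead of its case number.

```spss
* Sketch only: lists up to 1000 highest and 1000 lowest values per quarter.
* The variable ID is assumed to exist and to identify each record.
EXAMINE VARIABLES=G_ALOJAMIENTO BY quarter
  /ID=ID
  /PLOT NONE
  /STATISTICS EXTREME(1000)
  /NOTOTAL.
```

Note that EXTREME(n) lists the n highest and n lowest values in each group whether or not they fall outside the boxplot fences, so the list still has to be compared against the box-length cut points to separate outliers from ordinary values.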
Thank you very much for your prompt response; it was very helpful.

Sincerely,
Javier Figueroa

2018-04-12 14:23 GMT-06:00 Jon Peck <[hidden email]>: