show extreme values (OUTLIERS)

Javier Figueroa

Good morning, SPSS experts. I have a question. I have a database of 14,506 records containing 28 numeric variables that record the amount spent on different items, such as accommodation, food and drinks, land transport, maritime transport, etc. For each variable, the interviewers write down the expenditure reported by the respondents, but some values lie far from the average: the so-called outliers. I analyze these values with the EXAMINE command, which I think works very well.

This command produces several tables and charts: the Case Processing Summary, Descriptives, M-Estimators, Percentiles, Extreme Values and Tests of Normality, followed by histograms and a box plot.

In the box plot, values more than 3 box lengths beyond the edge of the box (above the 75th percentile, or below the 25th percentile) are marked with an asterisk as extreme values, and values more than 1.5 box lengths beyond it are marked with a circle as outliers. These two groups are the ones I am interested in, but when there are many of them they cannot be read in the box plots. The Extreme Values table, on the other hand, shows only the 5 highest and 5 lowest values. How can I get a complete list of all the extreme values and outliers, similar to the Extreme Values table?
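The box-plot rule described above can be sketched in Python (not SPSS syntax) to make the cut points explicit. This is a minimal sketch, assuming quartiles computed with Python's `statistics.quantiles(..., method="inclusive")` (Python 3.8+); SPSS box plots use Tukey's hinges, so the exact cut points may differ slightly. The helper name `classify_box_plot` and the toy data are hypothetical.

```python
from statistics import quantiles


def classify_box_plot(values):
    """Hypothetical helper: label each value with the box-plot rule.

    Assumption: quartiles come from statistics.quantiles(method="inclusive");
    SPSS box plots use Tukey's hinges, so its cut points can differ slightly.
    Returns (q1, q3, labels), with labels in the same order as `values`.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    box = q3 - q1  # one "box length" = interquartile range
    labels = []
    for v in values:
        if v > q3 + 3 * box or v < q1 - 3 * box:
            labels.append("extreme")  # drawn as an asterisk in SPSS
        elif v > q3 + 1.5 * box or v < q1 - 1.5 * box:
            labels.append("outlier")  # drawn as a circle in SPSS
        else:
            labels.append("")  # inside the whisker range
    return q1, q3, labels


if __name__ == "__main__":
    # Toy data for illustration only, not survey data.
    spend = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 200, 400]
    q1, q3, labels = classify_box_plot(spend)
    print(q1, q3)  # 37.5 92.5
    for value, label in zip(spend, labels):
        if label:
            print(value, label)
```

Here the upper fences are 92.5 + 1.5 × 55 = 175 and 92.5 + 3 × 55 = 257.5, so 200 is labeled an outlier and 400 an extreme value.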

Example syntax:

EXAMINE VARIABLES = G_ALOJAMIENTO BY quarter
  / PLOT BOXPLOT STEMLEAF HISTOGRAM NPPLOT
  / COMPARE GROUPS
  / MESTIMATORS HUBER (1.339) ANDREW (1.34) HAMPEL (1.7,3.4,8.5) TUKEY (4.685)
  / PERCENTILES (5,10,25,50,75,90,95) HAVERAGE
  / STATISTICS DESCRIPTIVES EXTREME
  / CINTERVAL 95
  / MISSING LISTWISE
  / NOTOTAL.

Thank you very much for your help; any comments will be greatly appreciated.

Sincerely,
--
Javier Figueroa
Database Processing and Analysis
Mobile: 5927-4748 / 4970-1940
Home: 2289-0184

===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD

Re: show extreme values (OUTLIERS)

William Dudley
Javier

You might find it easier to select just the cases that are outliers by definition (cut value = 75th percentile + 1.5 times the interquartile range, the same rule the box plot uses)
and view these cases with the FREQUENCIES command or, better, with Case Summaries (SUMMARIZE).

Suppose that the 25th percentile of VarX is 75 and the 75th percentile is 125, so the interquartile range is 50.

Compute cut_value_varx = 125 + 1.5*(125 - 75).

Temporary.
Select IF  VarX  GE cut_value_varx.

SUMMARIZE
  /TABLES=ID VarX var1 var2 var3
  /FORMAT=VALIDLIST NOCASENUM TOTAL LIMIT=1000
  /TITLE='Case Summaries'
  /MISSING=VARIABLE
  /CELLS=COUNT.

One advantage of the SUMMARIZE command is that you can view the values of several variables in one table (in this case ID, VarX, var1, var2 and var3).
By reporting a set of variables, you have some "context" in which these outliers reside.
This can help you decide whether the "outlier" makes sense given other related values.

The default limit is 100 cases, but you can change it as I have done above.
I think you can also use a filter for this command; that is safer but a little more involved.
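The select-then-list idea can be sketched outside SPSS as well. This is a minimal Python sketch, assuming the cases are available as a list of dicts; `list_flagged_cases` and the field names `ID`, `VarX`, `var1`, `var2`, `var3` are hypothetical stand-ins, and `limit` plays the role of SUMMARIZE's LIMIT. The cutoff is passed in, so it can be the box-plot fence computed however you prefer.

```python
def list_flagged_cases(records, var, cutoff, context, limit=1000):
    """Hypothetical helper: cases with `var` >= `cutoff`, largest first.

    Each returned row keeps the ID, the flagged variable and the context
    variables. Cases with a missing (None) value of `var` are skipped, and
    at most `limit` rows are returned.
    """
    flagged = [r for r in records if r.get(var) is not None and r[var] >= cutoff]
    flagged.sort(key=lambda r: r[var], reverse=True)  # biggest values first
    fields = ["ID", var, *context]
    return [{f: r.get(f) for f in fields} for r in flagged[:limit]]


if __name__ == "__main__":
    # Toy records; the field names are illustrative only.
    cases = [
        {"ID": 1, "VarX": 50, "var1": 2, "var2": 1, "var3": 0},
        {"ID": 2, "VarX": 250, "var1": 3, "var2": 7, "var3": 1},
        {"ID": 3, "VarX": 210, "var1": 1, "var2": 5, "var3": 0},
    ]
    for row in list_flagged_cases(cases, "VarX", 200, ["var1", "var2", "var3"]):
        print(row)
```

Sorting from the largest value down is a choice made here for readability; it puts the most suspicious cases at the top of the list.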

I hope you find this helpful,

WD


In reply to Javier Figueroa's message of Thu, Apr 12, 2018, 12:13 PM.
--
William N. Dudley, PhD
Professor - Public Health Education
The School of Health and Human Sciences
The University of North Carolina at Greensboro
437-L Coleman Building
Greensboro, NC 27402-6170
See my research on ResearchGate
VOICE 336.256 2475

Re: show extreme values (OUTLIERS)

Jon Peck
In reply to this post by Javier Figueroa
You can specify how many extreme values are displayed in EXAMINE, although this setting is not supported in the dialog box. To display the 1,000 highest and 1,000 lowest values, the syntax would look like this:

EXAMINE VARIABLES=salary salbegin
  /PLOT NONE
  /STATISTICS EXTREME(1000)
  /NOTOTAL.
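For comparison, here is a rough Python sketch of what EXTREME(n) reports for one variable: the n highest and n lowest valid values together with their case identifiers. It assumes records stored as dicts with a hypothetical `ID` field; `extreme_values` is an illustrative helper, and it does not reproduce SPSS's tie footnotes or table layout.

```python
import heapq


def extreme_values(records, var, n=5, id_field="ID"):
    """Hypothetical helper: n highest and n lowest valid values of `var`.

    Returns two lists of (value, id) pairs. Missing (None) values are
    skipped; tied values keep their original record order.
    """
    valid = [(r[var], r[id_field]) for r in records if r.get(var) is not None]
    highest = heapq.nlargest(n, valid, key=lambda pair: pair[0])
    lowest = heapq.nsmallest(n, valid, key=lambda pair: pair[0])
    return highest, lowest


if __name__ == "__main__":
    # Toy records for illustration; EXTREME(1000) would correspond to n=1000.
    data = [{"ID": i, "salary": v} for i, v in enumerate([5, 1, 9, 3, 7], start=1)]
    high, low = extreme_values(data, "salary", n=2)
    print(high)  # [(9, 3), (7, 5)]
    print(low)   # [(1, 2), (3, 4)]
```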

If you are interested in multivariate outliers, take a look at Data > Identify Unusual Cases.

In reply to Javier Figueroa's message of Thu, Apr 12, 2018, 10:13 AM.
--
Jon K Peck
[hidden email]

Re: show extreme values (OUTLIERS)

Javier Figueroa
Thank you very much for your prompt response; it was very helpful.

Sincerely,

In reply to Jon Peck's message of 2018-04-12 14:23 GMT-06:00.