SPSSX Discussion

IDENTIFY THE QUARTILLES AND CREATE NEW VARIABLES THOSE VALUES

Classic

List

Threaded

10 messages Options

Javier Figueroa

IDENTIFY THE QUARTILLES AND CREATE NEW VARIABLES THOSE VALUES

Hello spss experts !!!!

I hope everyone is well, as far as possible, with this pandemic it affected the whole world a lot and especially the tourism sector that I consider was the most affected by the covid-19 pandemic.

I am writing to you because I have a situation in a database that I have not been able to solve and that may be something simple but I have turned it around in my head and I have not been able to do it.

I need to create a variable that contains the first quartile (Q1) and the 2nd quartile (MEDIAN) and 3rd quartile (Q3) to then calculate the interqualty range (Q3-Q1)

This result will help me to identify the outliers

this is the syntax i am using.

* Encoding: UTF-8.
/* procedimiento crear base de datos PRUEBA.

data list list
/G_ALOJAMIENTO_D imp_paquete_tur tipoviaj TOTAL.
begin data.
0.00 0 1 1
8.93 0 1 1
67.50 0 2 1
0.00 0 1 1
100.00 0 2 1
0.00 0 1 1
0.00 0 2 1
150.00 0 1 1
0.00 0 2 1
0.00 0 1 1
0.00 0 1 1
16.67 0 1 1
20.00 0 1 1
15.00 0 2 1
8.33 0 2 1
0.00 0 1 1
8.33 0 2 1
0.00 0 2 1
0.00 0 2 1
12.50 0 2 1
0.00 0 2 1
13.33 0 2 1
7.50 0 1 1
0.00 0 2 1
0.00 0 1 1
0.00 0 1 1
0.00 0 2 1
23.51 0 1 1
0.00 0 1 1
0.00 0 2 1
end data.
execute.

numeric id(f8.0).
compute id=$casenum.
execute.
variable level all (scale).

COMPUTE x_conteo=1.
EXECUTE.

SORT CASES BY G_ALOJAMIENTO_D(A).

TEMPORARY.
SELECT IF G_ALOJAMIENTO_D>0.
FREQUENCIES VARIABLES=G_ALOJAMIENTO_D
/STATISTICS=MEDIAN
/NTILES=4
/PERCENTILES=25.0 50.0 75.0
/ORDER=ANALYSIS.

TEMPORARY.
SELECT IF G_ALOJAMIENTO_D>0.
AGGREGATE
/OUTFILE=* MODE=ADDVARIABLES
/BREAK=TOTAL
/x_conteo_sum=SUM(x_conteo).

compute x_Q1=0.
compute x_Q2=0.
compute x_Q3=0.
if x_conteo_sum>0 x_Q1= (x_conteo_sum+1)/4. /* Obtener registro o caso del Q1.
if x_conteo_sum>0 x_Q2= (x_conteo_sum+1)/2. /* Obtener registro o caso del Q2.
if x_conteo_sum>0 x_Q3= 3*(x_conteo_sum+1)/4. /* Obtener registro o caso del Q3.
EXECUTE.

* Cuartiles: percentiles 25, 50 y 75, es decir, los valores por
debajo de los cuales se encuentra 25%, 50% y 75% de los
casos respectivamente.
* Q1 = 25%(NVAL+1)=3.50.
* Q2 = 50%(NVAL+1)=15.00.
* Q3= 75%(NVAL+1)=45.50.

/* AQUI PONER PROCEDIMIENTO PARA CREAR VARIABLES NUEVAS CON LOS VALORES DE LOS CUARTILES.

I am very grateful to you in advance and I wish you an excellent start to the week.

Javier Figueroa
Procesamiento y Análisis de bases de datos

Cel: 5927-4748 / 4970-1940

Casa: 2289-0184

===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD

Bruce Weaver

Re: IDENTIFY THE QUARTILLES AND CREATE NEW VARIABLES THOSE VALUES

Administrator

Hello Javier--hola. If I follow what you are trying to do, I think you might
have an error in your computation of Q1. Using EXAMINE, I match your
estimates of Q2 and Q3, but get a different result for Q1. Here is my code
and (selected) output.

TEMPORARY.
SELECT IF G_ALOJAMIENTO_D>0.
EXAMINE VARIABLES=G_ALOJAMIENTO_D
/PLOT NONE
/PERCENTILES(25,50,75) HAVERAGE
/STATISTICS DESCRIPTIVES
/CINTERVAL 95
/MISSING LISTWISE
/NOTOTAL.

Percentiles
Percentiles
25 50 75
Weighted Average(Definition 1) G_ALOJAMIENTO_D
8.6300 15.0000 45.5050
Tukey's Hinges G_ALOJAMIENTO_D
8.9300 15.0000 23.5100

As you see, the Weighted Average (Definition 1) results match yours for Q2
and Q3, but not for Q1.

If you do want the Definition 1 quartiles, you could use EXAMINE in
conjunction with OMS to write them to a new dataset. After that, do
whatever data management is necessary in the new dataset, and then merge it
with the original to bring in the quartiles.

HTH.

Javier Figueroa wrote

> Hello spss experts !!!!
>
> I hope everyone is well, as far as possible, with this pandemic it
> affected
> the whole world a lot and especially the tourism sector that I consider
> was
> the most affected by the covid-19 pandemic.
>
> I am writing to you because I have a situation in a database that I have
> not been able to solve and that may be something simple but I have turned
> it around in my head and I have not been able to do it.
>
> I need to create a variable that contains the first quartile (Q1) and the
> 2nd quartile (MEDIAN) and 3rd quartile (Q3) to then calculate the
> interqualty range (Q3-Q1)
>
> This result will help me to identify the outliers
>
> this is the syntax i am using.
>
> * Encoding: UTF-8.
> /* procedimiento crear base de datos PRUEBA.
>
> data list list
> /G_ALOJAMIENTO_D imp_paquete_tur tipoviaj TOTAL.
> begin data.
> 0.00 0 1 1
> 8.93 0 1 1
> 67.50 0 2 1
> 0.00 0 1 1
> 100.00 0 2 1
> 0.00 0 1 1
> 0.00 0 2 1
> 150.00 0 1 1
> 0.00 0 2 1
> 0.00 0 1 1
> 0.00 0 1 1
> 16.67 0 1 1
> 20.00 0 1 1
> 15.00 0 2 1
> 8.33 0 2 1
> 0.00 0 1 1
> 8.33 0 2 1
> 0.00 0 2 1
> 0.00 0 2 1
> 12.50 0 2 1
> 0.00 0 2 1
> 13.33 0 2 1
> 7.50 0 1 1
> 0.00 0 2 1
> 0.00 0 1 1
> 0.00 0 1 1
> 0.00 0 2 1
> 23.51 0 1 1
> 0.00 0 1 1
> 0.00 0 2 1
> end data.
> execute.
>
> numeric id(f8.0).
> compute id=$casenum.
> execute.
> variable level all (scale).
>
> COMPUTE x_conteo=1.
> EXECUTE.
>
> SORT CASES BY G_ALOJAMIENTO_D(A).
>
> TEMPORARY.
> SELECT IF G_ALOJAMIENTO_D>0.
> FREQUENCIES VARIABLES=G_ALOJAMIENTO_D
> /STATISTICS=MEDIAN
> /NTILES=4
> /PERCENTILES=25.0 50.0 75.0
> /ORDER=ANALYSIS.
>
> TEMPORARY.
> SELECT IF G_ALOJAMIENTO_D>0.
> AGGREGATE
> /OUTFILE=* MODE=ADDVARIABLES
> /BREAK=TOTAL
> /x_conteo_sum=SUM(x_conteo).
>
> compute x_Q1=0.
> compute x_Q2=0.
> compute x_Q3=0.
> if x_conteo_sum>0 x_Q1= (x_conteo_sum+1)/4. /* Obtener registro o caso del
> Q1.
> if x_conteo_sum>0 x_Q2= (x_conteo_sum+1)/2. /* Obtener registro o caso del
> Q2.
> if x_conteo_sum>0 x_Q3= 3*(x_conteo_sum+1)/4. /* Obtener registro o caso
> del Q3.
> EXECUTE.
>
> * Cuartiles: percentiles 25, 50 y 75, es decir, los valores por
> debajo de los cuales se encuentra 25%, 50% y 75% de los
> casos respectivamente.
> * Q1 = 25%(NVAL+1)=3.50.
> * Q2 = 50%(NVAL+1)=15.00.
> * Q3= 75%(NVAL+1)=45.50.
>
> /* AQUI PONER PROCEDIMIENTO PARA CREAR VARIABLES NUEVAS CON LOS VALORES DE
> LOS CUARTILES.
>
>
>
> I am very grateful to you in advance and I wish you an excellent start to
> the week.
>
>
> --
>
> *Javier FigueroaProcesamiento y Análisis de bases de datos*
> *Cel: 5927-4748 / 4970-1940*
>
>
> *Casa: 2289-0184*
>
> =====================
> To manage your subscription to SPSSX-L, send a message to

> LISTSERV@.UGA

> (not to SPSSX-L), with no body text except the
> command. To leave the list, send the command
> SIGNOFF SPSSX-L
> For a list of commands to manage subscriptions, send the command
> INFO REFCARD

-----
--
Bruce Weaver
[hidden email]
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

NOTE: My Hotmail account is not monitored regularly.
To send me an e-mail, please use the address shown above.

--
Sent from: http://spssx-discussion.1045642.n5.nabble.com/

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING:
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).

Jon Peck

Re: IDENTIFY THE QUARTILLES AND CREATE NEW VARIABLES THOSE VALUES

In reply to this post by Javier Figueroa

The EXAMINE procedure (Analyze > Descriptive Statistics > Explore) can give you the IQR, FREQUENCIES can also give you selected percentiles. This example uses FREQUENCIES.

dataset name main.

dataset declare stats.

oms select tables /if subtypes='Statistics'

/destination outfile=stats format=sav

/columns dimnames="Statistics".

FREQUENCIES VARIABLES=salary

/FORMAT=NOTABLE

/NTILES=4.

omsend.

compute const=1.

dataset activate stats.

compute const=1.

dataset activate main.

match files /file=* /table=stats/by const

/drop Command_ to Missing @50 const.

exec.

On Tue, Sep 29, 2020 at 11:13 AM Javier Figueroa <[hidden email]> wrote:

Hello spss experts !!!!

I hope everyone is well, as far as possible, with this pandemic it affected the whole world a lot and especially the tourism sector that I consider was the most affected by the covid-19 pandemic.

I am writing to you because I have a situation in a database that I have not been able to solve and that may be something simple but I have turned it around in my head and I have not been able to do it.

I need to create a variable that contains the first quartile (Q1) and the 2nd quartile (MEDIAN) and 3rd quartile (Q3) to then calculate the interqualty range (Q3-Q1)

This result will help me to identify the outliers

this is the syntax i am using.

* Encoding: UTF-8.
/* procedimiento crear base de datos PRUEBA.

data list list
/G_ALOJAMIENTO_D imp_paquete_tur tipoviaj TOTAL.
begin data.
0.00 0 1 1
8.93 0 1 1
67.50 0 2 1
0.00 0 1 1
100.00 0 2 1
0.00 0 1 1
0.00 0 2 1
150.00 0 1 1
0.00 0 2 1
0.00 0 1 1
0.00 0 1 1
16.67 0 1 1
20.00 0 1 1
15.00 0 2 1
8.33 0 2 1
0.00 0 1 1
8.33 0 2 1
0.00 0 2 1
0.00 0 2 1
12.50 0 2 1
0.00 0 2 1
13.33 0 2 1
7.50 0 1 1
0.00 0 2 1
0.00 0 1 1
0.00 0 1 1
0.00 0 2 1
23.51 0 1 1
0.00 0 1 1
0.00 0 2 1
end data.
execute.

numeric id(f8.0).
compute id=$casenum.
execute.
variable level all (scale).

COMPUTE x_conteo=1.
EXECUTE.

SORT CASES BY G_ALOJAMIENTO_D(A).

TEMPORARY.
SELECT IF G_ALOJAMIENTO_D>0.
FREQUENCIES VARIABLES=G_ALOJAMIENTO_D
/STATISTICS=MEDIAN
/NTILES=4
/PERCENTILES=25.0 50.0 75.0
/ORDER=ANALYSIS.

TEMPORARY.
SELECT IF G_ALOJAMIENTO_D>0.
AGGREGATE
/OUTFILE=* MODE=ADDVARIABLES
/BREAK=TOTAL
/x_conteo_sum=SUM(x_conteo).

compute x_Q1=0.
compute x_Q2=0.
compute x_Q3=0.
if x_conteo_sum>0 x_Q1= (x_conteo_sum+1)/4. /* Obtener registro o caso del Q1.
if x_conteo_sum>0 x_Q2= (x_conteo_sum+1)/2. /* Obtener registro o caso del Q2.
if x_conteo_sum>0 x_Q3= 3*(x_conteo_sum+1)/4. /* Obtener registro o caso del Q3.
EXECUTE.

* Cuartiles: percentiles 25, 50 y 75, es decir, los valores por
debajo de los cuales se encuentra 25%, 50% y 75% de los
casos respectivamente.
* Q1 = 25%(NVAL+1)=3.50.
* Q2 = 50%(NVAL+1)=15.00.
* Q3= 75%(NVAL+1)=45.50.

/* AQUI PONER PROCEDIMIENTO PARA CREAR VARIABLES NUEVAS CON LOS VALORES DE LOS CUARTILES.

I am very grateful to you in advance and I wish you an excellent start to the week.

--
Javier Figueroa
Procesamiento y Análisis de bases de datos
Cel: 5927-4748 / 4970-1940
Casa: 2289-0184

===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD

Jon K Peck
[hidden email]

Javier Figueroa

Re: IDENTIFY THE QUARTILLES AND CREATE NEW VARIABLES THOSE VALUES

Thank you very much for your answers and suggestions

I thank in general this excellent community of experts

I want to apologize if for any reason the mail gave some misunderstanding.

At no time did I want someone to do the procedure for me, much less take advantage of their good deed, I understand that everyone is very busy with their activities and this is something extra, and they do it to help. But in my country Guatemala there are not many experts in SPSS and I believed that I could turn to their knowledge to solve my doubt.

I understand that my English language is not very fluent, much less perfect, so "Bad Interpretations" happen.

I apologize again and I want to express my gratitude for your support because it is not the first time I have received it, since more than once they have saved my life and because of your suggestions I have been able to learn more.

I am not even half her knowledge, but here in Guatemala I am above average and when I can solve a problem I do it with great pleasure, here I am an advanced level! LOL.

I hope you have an excellent day and that your activities are being carried out successfully.

and remember,

if I take care of myself, I take care of others!

#JuntosSaldremosAdelante
#GuatemalaNoSeDetiene

Sincerely,

El mar., 29 sept. 2020 a las 11:50, Jon Peck (<[hidden email]>) escribió:

The EXAMINE procedure (Analyze > Descriptive Statistics > Explore) can give you the IQR, FREQUENCIES can also give you selected percentiles. This example uses FREQUENCIES.

dataset name main.
dataset declare stats.
oms select tables /if subtypes='Statistics'
/destination outfile=stats format=sav
/columns dimnames="Statistics".
FREQUENCIES VARIABLES=salary
/FORMAT=NOTABLE
/NTILES=4.
omsend.
compute const=1.
dataset activate stats.
compute const=1.
dataset activate main.
match files /file=* /table=stats/by const
/drop Command_ to Missing @50 const.
exec.

On Tue, Sep 29, 2020 at 11:13 AM Javier Figueroa <[hidden email]> wrote:
Hello spss experts !!!!

I hope everyone is well, as far as possible, with this pandemic it affected the whole world a lot and especially the tourism sector that I consider was the most affected by the covid-19 pandemic.

I am writing to you because I have a situation in a database that I have not been able to solve and that may be something simple but I have turned it around in my head and I have not been able to do it.

I need to create a variable that contains the first quartile (Q1) and the 2nd quartile (MEDIAN) and 3rd quartile (Q3) to then calculate the interqualty range (Q3-Q1)

This result will help me to identify the outliers

this is the syntax i am using.

* Encoding: UTF-8.
/* procedimiento crear base de datos PRUEBA.

data list list
/G_ALOJAMIENTO_D imp_paquete_tur tipoviaj TOTAL.
begin data.
0.00 0 1 1
8.93 0 1 1
67.50 0 2 1
0.00 0 1 1
100.00 0 2 1
0.00 0 1 1
0.00 0 2 1
150.00 0 1 1
0.00 0 2 1
0.00 0 1 1
0.00 0 1 1
16.67 0 1 1
20.00 0 1 1
15.00 0 2 1
8.33 0 2 1
0.00 0 1 1
8.33 0 2 1
0.00 0 2 1
0.00 0 2 1
12.50 0 2 1
0.00 0 2 1
13.33 0 2 1
7.50 0 1 1
0.00 0 2 1
0.00 0 1 1
0.00 0 1 1
0.00 0 2 1
23.51 0 1 1
0.00 0 1 1
0.00 0 2 1
end data.
execute.

numeric id(f8.0).
compute id=$casenum.
execute.
variable level all (scale).

COMPUTE x_conteo=1.
EXECUTE.

SORT CASES BY G_ALOJAMIENTO_D(A).

TEMPORARY.
SELECT IF G_ALOJAMIENTO_D>0.
FREQUENCIES VARIABLES=G_ALOJAMIENTO_D
/STATISTICS=MEDIAN
/NTILES=4
/PERCENTILES=25.0 50.0 75.0
/ORDER=ANALYSIS.

TEMPORARY.
SELECT IF G_ALOJAMIENTO_D>0.
AGGREGATE
/OUTFILE=* MODE=ADDVARIABLES
/BREAK=TOTAL
/x_conteo_sum=SUM(x_conteo).

compute x_Q1=0.
compute x_Q2=0.
compute x_Q3=0.
if x_conteo_sum>0 x_Q1= (x_conteo_sum+1)/4. /* Obtener registro o caso del Q1.
if x_conteo_sum>0 x_Q2= (x_conteo_sum+1)/2. /* Obtener registro o caso del Q2.
if x_conteo_sum>0 x_Q3= 3*(x_conteo_sum+1)/4. /* Obtener registro o caso del Q3.
EXECUTE.

* Cuartiles: percentiles 25, 50 y 75, es decir, los valores por
debajo de los cuales se encuentra 25%, 50% y 75% de los
casos respectivamente.
* Q1 = 25%(NVAL+1)=3.50.
* Q2 = 50%(NVAL+1)=15.00.
* Q3= 75%(NVAL+1)=45.50.

/* AQUI PONER PROCEDIMIENTO PARA CREAR VARIABLES NUEVAS CON LOS VALORES DE LOS CUARTILES.

I am very grateful to you in advance and I wish you an excellent start to the week.

--
Javier Figueroa
Procesamiento y Análisis de bases de datos
Cel: 5927-4748 / 4970-1940
Casa: 2289-0184

===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD

--
Jon K Peck
[hidden email]

Javier Figueroa
Procesamiento y Análisis de bases de datos

Cel: 5927-4748 / 4970-1940

Casa: 2289-0184

Javier Figueroa

Re: IDENTIFY THE QUARTILLES AND CREATE NEW VARIABLES THOSE VALUES

No puedo dejar de agradecer y comentarles que gracias a sus sugerencias, encontré lo que estaba tratando de hacer.

En efecto Brus, si tengo un error en el Q1 y es que copie el resultado de mi formula que me indica el registro donde esta el Q1 y no el resultado del Q1 que es 8.63

Les dejo el link

https://www.ibm.com/support/knowledgecenter/es/SSLVMB_sub/statistics_mainhelp_ddita/spss/base/oms_sav_examples_intro.html

Nota: Se los escribo en español y una traducción al inglés.

atentamente,

TRADUCCIÓN

I can't help but thank you and comment that thanks to your suggestions, I found what I was trying to do.

Indeed Brus, if I have an error in Q1 and it is that I copy the result of my formula that indicates the register where Q1 is and not the result of Q1 which is 8.63

Using commands in conjunction with OMS is wonderful.

Thank you very much Jon, for the example.

Note: I am writing them in Spanish and an English translation.

Sincerely,

El mar., 29 sept. 2020 a las 12:54, Javier Figueroa (<[hidden email]>) escribió:

Thank you very much for your answers and suggestions

I thank in general this excellent community of experts

I want to apologize if for any reason the mail gave some misunderstanding.

At no time did I want someone to do the procedure for me, much less take advantage of their good deed, I understand that everyone is very busy with their activities and this is something extra, and they do it to help. But in my country Guatemala there are not many experts in SPSS and I believed that I could turn to their knowledge to solve my doubt.

I understand that my English language is not very fluent, much less perfect, so "Bad Interpretations" happen.

I apologize again and I want to express my gratitude for your support because it is not the first time I have received it, since more than once they have saved my life and because of your suggestions I have been able to learn more.

I am not even half her knowledge, but here in Guatemala I am above average and when I can solve a problem I do it with great pleasure, here I am an advanced level! LOL.

I hope you have an excellent day and that your activities are being carried out successfully.

and remember,

if I take care of myself, I take care of others!

#JuntosSaldremosAdelante
#GuatemalaNoSeDetiene

Sincerely,

El mar., 29 sept. 2020 a las 11:50, Jon Peck (<[hidden email]>) escribió:
The EXAMINE procedure (Analyze > Descriptive Statistics > Explore) can give you the IQR, FREQUENCIES can also give you selected percentiles. This example uses FREQUENCIES.

dataset name main.
dataset declare stats.
oms select tables /if subtypes='Statistics'
/destination outfile=stats format=sav
/columns dimnames="Statistics".
FREQUENCIES VARIABLES=salary
/FORMAT=NOTABLE
/NTILES=4.
omsend.
compute const=1.
dataset activate stats.
compute const=1.
dataset activate main.
match files /file=* /table=stats/by const
/drop Command_ to Missing @50 const.
exec.

On Tue, Sep 29, 2020 at 11:13 AM Javier Figueroa <[hidden email]> wrote:
Hello spss experts !!!!

I hope everyone is well, as far as possible, with this pandemic it affected the whole world a lot and especially the tourism sector that I consider was the most affected by the covid-19 pandemic.

I am writing to you because I have a situation in a database that I have not been able to solve and that may be something simple but I have turned it around in my head and I have not been able to do it.

I need to create a variable that contains the first quartile (Q1) and the 2nd quartile (MEDIAN) and 3rd quartile (Q3) to then calculate the interqualty range (Q3-Q1)

This result will help me to identify the outliers

this is the syntax i am using.

* Encoding: UTF-8.
/* procedimiento crear base de datos PRUEBA.

data list list
/G_ALOJAMIENTO_D imp_paquete_tur tipoviaj TOTAL.
begin data.
0.00 0 1 1
8.93 0 1 1
67.50 0 2 1
0.00 0 1 1
100.00 0 2 1
0.00 0 1 1
0.00 0 2 1
150.00 0 1 1
0.00 0 2 1
0.00 0 1 1
0.00 0 1 1
16.67 0 1 1
20.00 0 1 1
15.00 0 2 1
8.33 0 2 1
0.00 0 1 1
8.33 0 2 1
0.00 0 2 1
0.00 0 2 1
12.50 0 2 1
0.00 0 2 1
13.33 0 2 1
7.50 0 1 1
0.00 0 2 1
0.00 0 1 1
0.00 0 1 1
0.00 0 2 1
23.51 0 1 1
0.00 0 1 1
0.00 0 2 1
end data.
execute.

numeric id(f8.0).
compute id=$casenum.
execute.
variable level all (scale).

COMPUTE x_conteo=1.
EXECUTE.

SORT CASES BY G_ALOJAMIENTO_D(A).

TEMPORARY.
SELECT IF G_ALOJAMIENTO_D>0.
FREQUENCIES VARIABLES=G_ALOJAMIENTO_D
/STATISTICS=MEDIAN
/NTILES=4
/PERCENTILES=25.0 50.0 75.0
/ORDER=ANALYSIS.

TEMPORARY.
SELECT IF G_ALOJAMIENTO_D>0.
AGGREGATE
/OUTFILE=* MODE=ADDVARIABLES
/BREAK=TOTAL
/x_conteo_sum=SUM(x_conteo).

compute x_Q1=0.
compute x_Q2=0.
compute x_Q3=0.
if x_conteo_sum>0 x_Q1= (x_conteo_sum+1)/4. /* Obtener registro o caso del Q1.
if x_conteo_sum>0 x_Q2= (x_conteo_sum+1)/2. /* Obtener registro o caso del Q2.
if x_conteo_sum>0 x_Q3= 3*(x_conteo_sum+1)/4. /* Obtener registro o caso del Q3.
EXECUTE.

* Cuartiles: percentiles 25, 50 y 75, es decir, los valores por
debajo de los cuales se encuentra 25%, 50% y 75% de los
casos respectivamente.
* Q1 = 25%(NVAL+1)=3.50.
* Q2 = 50%(NVAL+1)=15.00.
* Q3= 75%(NVAL+1)=45.50.

/* AQUI PONER PROCEDIMIENTO PARA CREAR VARIABLES NUEVAS CON LOS VALORES DE LOS CUARTILES.

I am very grateful to you in advance and I wish you an excellent start to the week.

--
Javier Figueroa
Procesamiento y Análisis de bases de datos
Cel: 5927-4748 / 4970-1940
Casa: 2289-0184

===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD

--
Jon K Peck
[hidden email]

--
Javier Figueroa
Procesamiento y Análisis de bases de datos
Cel: 5927-4748 / 4970-1940
Casa: 2289-0184

Javier Figueroa
Procesamiento y Análisis de bases de datos

Cel: 5927-4748 / 4970-1940

Casa: 2289-0184

Mike

Fwd: Rejected posting to SPSSX-L@LISTSERV.UGA.EDU

In reply to this post by Javier Figueroa

---------- Forwarded message ---------
From: UGA LISTSERV Server (16.0) <[hidden email]>
Date: Tue, Sep 29, 2020 at 4:32 PM
Subject: Rejected posting to [hidden email]
To: <[hidden email]>

Your posting to the SPSSX-L list has been rejected because it contains an
attachment of type 'IMAGE/PNG'. The SPSSX-L list has been configured to reject
such attachments. Please contact the list owner at
[hidden email] for more information.

---------- Forwarded message ----------
From: Michael Palij <[hidden email]>
To: Jon Peck <[hidden email]>, SPSS list <[hidden email]>
Cc: Michael Palij <[hidden email]>
Date: Tue, 29 Sep 2020 16:30:55 -0400
Subject: Re: IDENTIFY THE QUARTILLES AND CREATE NEW VARIABLES THOSE VALUES

I'm a little confused by the request and have a few questions:

(1) Is the solution/SPSS code going to be used with just this data (N=30) or to a larger

sample or multiple samples?

(2) The expression "outliers" is somewhat ambiguous. At first glance, it seems to me

that values less the Q1 and greater than Q3 are to be considered outliers -- is this correct?

This seems a little strange to me (i.e., the most extreme 25% are considered outliers) but

the definition of what constitutes an outlier is not provided.

(3) Are the values of Q1, Q2, Q3 (i.e., 25th, 50th, 75th percentiles) really needed? Or

does one only needs to know if G_ALOJAMIENTO_D falls into the first, second, third, or

fourth quartile?

Since others have shown how to get the values of the interquartile range, let me address

points (2) and (3) above. If one uses the Rank procedure, one can get (a) the cumulative

percentile of each value of G_ALOJAMIENTO_D (below represented by G_percent),

(b) the z-score of each value of G_ALOJAMIENTO_D (below represented by G_norm),

and (c) the quartile the value of G_ALOJAMIENTO_D falls into (below represented by

G_quartiles).

TEMPORARY.
SELECT IF G_ALOJAMIENTO_D>0.
rank var=G_ALOJAMIENTO_D/
percent into G_percent/
normal into G_norm/
ntiles(4) into G_quartiles.

List.

The listing of the data follows:

G_ALOJAMIENTO_D imp_paquete_tur tipoviaj TOTAL id x_conteo G_percent G_norm G_quartiles

0.00 0.00 1.00 1.00 1 1.00 . . .
0.00 0.00 1.00 1.00 4 1.00 . . .
0.00 0.00 1.00 1.00 6 1.00 . . .
0.00 0.00 2.00 1.00 7 1.00 . . .
0.00 0.00 2.00 1.00 9 1.00 . . .
0.00 0.00 1.00 1.00 10 1.00 . . .
0.00 0.00 1.00 1.00 11 1.00 . . .
0.00 0.00 1.00 1.00 16 1.00 . . .
0.00 0.00 2.00 1.00 18 1.00 . . .
0.00 0.00 2.00 1.00 19 1.00 . . .
0.00 0.00 2.00 1.00 21 1.00 . . .
0.00 0.00 2.00 1.00 24 1.00 . . .
0.00 0.00 1.00 1.00 25 1.00 . . .
0.00 0.00 1.00 1.00 26 1.00 . . .
0.00 0.00 2.00 1.00 27 1.00 . . .
0.00 0.00 1.00 1.00 29 1.00 . . .
0.00 0.00 2.00 1.00 30 1.00 . . .
7.50 0.00 1.00 1.00 23 1.00 7.69 -1.673 1
8.33 0.00 2.00 1.00 15 1.00 19.23 -.9929 1
8.33 0.00 2.00 1.00 17 1.00 19.23 -.9929 1
8.93 0.00 1.00 1.00 2 1.00 30.77 -.6020 2
12.50 0.00 2.00 1.00 20 1.00 38.46 -.3879 2
13.33 0.00 2.00 1.00 22 1.00 46.15 -.1903 2
15.00 0.00 2.00 1.00 14 1.00 53.85 0.0000 3
16.67 0.00 1.00 1.00 12 1.00 61.54 0.1903 3
20.00 0.00 1.00 1.00 13 1.00 69.23 0.3879 3
23.51 0.00 1.00 1.00 28 1.00 76.92 0.6020 3
67.50 0.00 2.00 1.00 3 1.00 84.62 0.8484 4
100.00 0.00 2.00 1.00 5 1.00 92.31 1.1619 4
150.00 0.00 1.00 1.00 8 1.00 100.00 1.6729 4

So, is it the case that the variable G_quartiles all that is needed? Or does one watn to compare the raw value

of G_ALOJAMIENTO_D to the value of Q1, Q2, Q3 (why)? One could just as easily compare G_percent against

values of 25, 50, and 75.

Incidentally, it may not be immediately obvious but the distribution of G_ALOJAMIENTO_D is positively skewed.

Using Examine to get the boxplot for G_ALOJAMIENTO_D readily shows this:

TEMPORARY.
SELECT IF G_ALOJAMIENTO_D>0.
examine var=G_ALOJAMIENTO_D/ID=ID/
plot=boxplot/
statistics=extreme.

*NOTE: extreme value in both tails are printed but not shown here.

*******

Sorry, the mailing list software does not allow images embedded in an email (?)

so this bounced back to me. I've removed the image in order to post this email

to the list.

*******

The height of the box used to the interquartile range computed using Tukey's hinges (one

should check this) and the IDs 3, 5, and 8, all in the 4th quartile (see listing above) but according

to SPSS criteria, are not "outliers" -- O would be provided (i.e., 1.5 IQR-3.00 IQR) and if

greater than 3.00 IQRs, an E would be provided.

So, what am I missing?

-Mike Palij

New York University

[hidden email]

On Tue, Sep 29, 2020 at 1:50 PM Jon Peck <[hidden email]> wrote:

The EXAMINE procedure (Analyze > Descriptive Statistics > Explore) can give you the IQR, FREQUENCIES can also give you selected percentiles. This example uses FREQUENCIES.

dataset name main.
dataset declare stats.
oms select tables /if subtypes='Statistics'
/destination outfile=stats format=sav
/columns dimnames="Statistics".
FREQUENCIES VARIABLES=salary
/FORMAT=NOTABLE
/NTILES=4.
omsend.
compute const=1.
dataset activate stats.
compute const=1.
dataset activate main.
match files /file=* /table=stats/by const
/drop Command_ to Missing @50 const.
exec.

On Tue, Sep 29, 2020 at 11:13 AM Javier Figueroa <[hidden email]> wrote:
Hello spss experts !!!!

I hope everyone is well, as far as possible, with this pandemic it affected the whole world a lot and especially the tourism sector that I consider was the most affected by the covid-19 pandemic.

I am writing to you because I have a situation in a database that I have not been able to solve and that may be something simple but I have turned it around in my head and I have not been able to do it.

I need to create a variable that contains the first quartile (Q1) and the 2nd quartile (MEDIAN) and 3rd quartile (Q3) to then calculate the interqualty range (Q3-Q1)

This result will help me to identify the outliers

this is the syntax i am using.

* Encoding: UTF-8.
/* procedimiento crear base de datos PRUEBA.

data list list
/G_ALOJAMIENTO_D imp_paquete_tur tipoviaj TOTAL.
begin data.
0.00 0 1 1
8.93 0 1 1
67.50 0 2 1
0.00 0 1 1
100.00 0 2 1
0.00 0 1 1
0.00 0 2 1
150.00 0 1 1
0.00 0 2 1
0.00 0 1 1
0.00 0 1 1
16.67 0 1 1
20.00 0 1 1
15.00 0 2 1
8.33 0 2 1
0.00 0 1 1
8.33 0 2 1
0.00 0 2 1
0.00 0 2 1
12.50 0 2 1
0.00 0 2 1
13.33 0 2 1
7.50 0 1 1
0.00 0 2 1
0.00 0 1 1
0.00 0 1 1
0.00 0 2 1
23.51 0 1 1
0.00 0 1 1
0.00 0 2 1
end data.
execute.

numeric id(f8.0).
compute id=$casenum.
execute.
variable level all (scale).

COMPUTE x_conteo=1.
EXECUTE.

SORT CASES BY G_ALOJAMIENTO_D(A).

TEMPORARY.
SELECT IF G_ALOJAMIENTO_D>0.
FREQUENCIES VARIABLES=G_ALOJAMIENTO_D
/STATISTICS=MEDIAN
/NTILES=4
/PERCENTILES=25.0 50.0 75.0
/ORDER=ANALYSIS.

TEMPORARY.
SELECT IF G_ALOJAMIENTO_D>0.
AGGREGATE
/OUTFILE=* MODE=ADDVARIABLES
/BREAK=TOTAL
/x_conteo_sum=SUM(x_conteo).

compute x_Q1=0.
compute x_Q2=0.
compute x_Q3=0.
if x_conteo_sum>0 x_Q1= (x_conteo_sum+1)/4. /* Obtener registro o caso del Q1.
if x_conteo_sum>0 x_Q2= (x_conteo_sum+1)/2. /* Obtener registro o caso del Q2.
if x_conteo_sum>0 x_Q3= 3*(x_conteo_sum+1)/4. /* Obtener registro o caso del Q3.
EXECUTE.

* Cuartiles: percentiles 25, 50 y 75, es decir, los valores por
debajo de los cuales se encuentra 25%, 50% y 75% de los
casos respectivamente.
* Q1 = 25%(NVAL+1)=3.50.
* Q2 = 50%(NVAL+1)=15.00.
* Q3= 75%(NVAL+1)=45.50.

/* AQUI PONER PROCEDIMIENTO PARA CREAR VARIABLES NUEVAS CON LOS VALORES DE LOS CUARTILES.

I am very grateful to you in advance and I wish you an excellent start to the week.

--
Javier Figueroa
Procesamiento y Análisis de bases de datos
Cel: 5927-4748 / 4970-1940
Casa: 2289-0184

===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD

--
Jon K Peck
[hidden email]

===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD

Jon Peck

Re: Fwd: Rejected posting to SPSSX-L@LISTSERV.UGA.EDU

Another way to see this would be a boxplot with the point id's showing.

GGRAPH

/GRAPHDATASET NAME="graphdataset" VARIABLES=jobcat salary id MISSING=LISTWISE REPORTMISSING=NO

/GRAPHSPEC SOURCE=INLINE.

BEGIN GPL

SOURCE: s=userSource(id("graphdataset"))

DATA: jobcat=col(source(s), name("jobcat"), unit.category())

DATA: salary=col(source(s), name("salary"))

DATA: id=col(source(s), name("id"))

GUIDE: axis(dim(1), label("Employment Category"))

GUIDE: axis(dim(2), label("Current Salary"))

GUIDE: text.title(label("Simple Boxplot of Current Salary by Employment Category"))

SCALE: cat(dim(1), include("1", "2", "3"))

SCALE: linear(dim(2), include(0))

ELEMENT: schema(position(bin.quantile.letter(jobcat*salary)), label(id))

END GPL.

Not automated, of course, but the EXAMINE outliers list is very crude and pretty much useless.

On Tue, Sep 29, 2020 at 2:45 PM Michael Palij <[hidden email]> wrote:

---------- Forwarded message ---------
From: UGA LISTSERV Server (16.0) <[hidden email]>
Date: Tue, Sep 29, 2020 at 4:32 PM
Subject: Rejected posting to [hidden email]
To: <[hidden email]>

Your posting to the SPSSX-L list has been rejected because it contains an
attachment of type 'IMAGE/PNG'. The SPSSX-L list has been configured to reject
such attachments. Please contact the list owner at
[hidden email] for more information.

---------- Forwarded message ----------
From: Michael Palij <[hidden email]>
To: Jon Peck <[hidden email]>, SPSS list <[hidden email]>
Cc: Michael Palij <[hidden email]>
Date: Tue, 29 Sep 2020 16:30:55 -0400
Subject: Re: IDENTIFY THE QUARTILLES AND CREATE NEW VARIABLES THOSE VALUES

I'm a little confused by the request and have a few questions:

(1) Is the solution/SPSS code going to be used with just this data (N=30) or to a larger
sample or multiple samples?

(2) The expression "outliers" is somewhat ambiguous. At first glance, it seems to me
that values less the Q1 and greater than Q3 are to be considered outliers -- is this correct?
This seems a little strange to me (i.e., the most extreme 25% are considered outliers) but
the definition of what constitutes an outlier is not provided.

(3) Are the values of Q1, Q2, Q3 (i.e., 25th, 50th, 75th percentiles) really needed? Or
does one only needs to know if G_ALOJAMIENTO_D falls into the first, second, third, or
fourth quartile?

Since others have shown how to get the values of the interquartile range, let me address
points (2) and (3) above. If one uses the Rank procedure, one can get (a) the cumulative
percentile of each value of G_ALOJAMIENTO_D (below represented by G_percent),
(b) the z-score of each value of G_ALOJAMIENTO_D (below represented by G_norm),
and (c) the quartile the value of G_ALOJAMIENTO_D falls into (below represented by
G_quartiles).

TEMPORARY.
SELECT IF G_ALOJAMIENTO_D>0.
rank var=G_ALOJAMIENTO_D/
percent into G_percent/
normal into G_norm/
ntiles(4) into G_quartiles.

List.

The listing of the data follows:

G_ALOJAMIENTO_D imp_paquete_tur tipoviaj TOTAL id x_conteo G_percent G_norm G_quartiles

0.00 0.00 1.00 1.00 1 1.00 . . .
0.00 0.00 1.00 1.00 4 1.00 . . .
0.00 0.00 1.00 1.00 6 1.00 . . .
0.00 0.00 2.00 1.00 7 1.00 . . .
0.00 0.00 2.00 1.00 9 1.00 . . .
0.00 0.00 1.00 1.00 10 1.00 . . .
0.00 0.00 1.00 1.00 11 1.00 . . .
0.00 0.00 1.00 1.00 16 1.00 . . .
0.00 0.00 2.00 1.00 18 1.00 . . .
0.00 0.00 2.00 1.00 19 1.00 . . .
0.00 0.00 2.00 1.00 21 1.00 . . .
0.00 0.00 2.00 1.00 24 1.00 . . .
0.00 0.00 1.00 1.00 25 1.00 . . .
0.00 0.00 1.00 1.00 26 1.00 . . .
0.00 0.00 2.00 1.00 27 1.00 . . .
0.00 0.00 1.00 1.00 29 1.00 . . .
0.00 0.00 2.00 1.00 30 1.00 . . .
7.50 0.00 1.00 1.00 23 1.00 7.69 -1.673 1
8.33 0.00 2.00 1.00 15 1.00 19.23 -.9929 1
8.33 0.00 2.00 1.00 17 1.00 19.23 -.9929 1
8.93 0.00 1.00 1.00 2 1.00 30.77 -.6020 2
12.50 0.00 2.00 1.00 20 1.00 38.46 -.3879 2
13.33 0.00 2.00 1.00 22 1.00 46.15 -.1903 2
15.00 0.00 2.00 1.00 14 1.00 53.85 0.0000 3
16.67 0.00 1.00 1.00 12 1.00 61.54 0.1903 3
20.00 0.00 1.00 1.00 13 1.00 69.23 0.3879 3
23.51 0.00 1.00 1.00 28 1.00 76.92 0.6020 3
67.50 0.00 2.00 1.00 3 1.00 84.62 0.8484 4
100.00 0.00 2.00 1.00 5 1.00 92.31 1.1619 4
150.00 0.00 1.00 1.00 8 1.00 100.00 1.6729 4

So, is it the case that the variable G_quartiles all that is needed? Or does one watn to compare the raw value
of G_ALOJAMIENTO_D to the value of Q1, Q2, Q3 (why)? One could just as easily compare G_percent against
values of 25, 50, and 75.

Incidentally, it may not be immediately obvious but the distribution of G_ALOJAMIENTO_D is positively skewed.
Using Examine to get the boxplot for G_ALOJAMIENTO_D readily shows this:

TEMPORARY.
SELECT IF G_ALOJAMIENTO_D>0.
examine var=G_ALOJAMIENTO_D/ID=ID/
plot=boxplot/
statistics=extreme.
*NOTE: extreme value in both tails are printed but not shown here.

*******
Sorry, the mailing list software does not allow images embedded in an email (?)
so this bounced back to me. I've removed the image in order to post this email
to the list.
*******

The height of the box used to the interquartile range computed using Tukey's hinges (one
should check this) and the IDs 3, 5, and 8, all in the 4th quartile (see listing above) but according
to SPSS criteria, are not "outliers" -- O would be provided (i.e., 1.5 IQR-3.00 IQR) and if
greater than 3.00 IQRs, an E would be provided.

So, what am I missing?

-Mike Palij
New York University
[hidden email]

On Tue, Sep 29, 2020 at 1:50 PM Jon Peck <[hidden email]> wrote:
The EXAMINE procedure (Analyze > Descriptive Statistics > Explore) can give you the IQR, FREQUENCIES can also give you selected percentiles. This example uses FREQUENCIES.

dataset name main.
dataset declare stats.
oms select tables /if subtypes='Statistics'
/destination outfile=stats format=sav
/columns dimnames="Statistics".
FREQUENCIES VARIABLES=salary
/FORMAT=NOTABLE
/NTILES=4.
omsend.
compute const=1.
dataset activate stats.
compute const=1.
dataset activate main.
match files /file=* /table=stats/by const
/drop Command_ to Missing @50 const.
exec.

On Tue, Sep 29, 2020 at 11:13 AM Javier Figueroa <[hidden email]> wrote:
Hello spss experts !!!!

I hope everyone is well, as far as possible, with this pandemic it affected the whole world a lot and especially the tourism sector that I consider was the most affected by the covid-19 pandemic.

I am writing to you because I have a situation in a database that I have not been able to solve and that may be something simple but I have turned it around in my head and I have not been able to do it.

I need to create a variable that contains the first quartile (Q1) and the 2nd quartile (MEDIAN) and 3rd quartile (Q3) to then calculate the interqualty range (Q3-Q1)

This result will help me to identify the outliers

this is the syntax i am using.

* Encoding: UTF-8.
/* procedimiento crear base de datos PRUEBA.

data list list
/G_ALOJAMIENTO_D imp_paquete_tur tipoviaj TOTAL.
begin data.
0.00 0 1 1
8.93 0 1 1
67.50 0 2 1
0.00 0 1 1
100.00 0 2 1
0.00 0 1 1
0.00 0 2 1
150.00 0 1 1
0.00 0 2 1
0.00 0 1 1
0.00 0 1 1
16.67 0 1 1
20.00 0 1 1
15.00 0 2 1
8.33 0 2 1
0.00 0 1 1
8.33 0 2 1
0.00 0 2 1
0.00 0 2 1
12.50 0 2 1
0.00 0 2 1
13.33 0 2 1
7.50 0 1 1
0.00 0 2 1
0.00 0 1 1
0.00 0 1 1
0.00 0 2 1
23.51 0 1 1
0.00 0 1 1
0.00 0 2 1
end data.
execute.

numeric id(f8.0).
compute id=$casenum.
execute.
variable level all (scale).

COMPUTE x_conteo=1.
EXECUTE.

SORT CASES BY G_ALOJAMIENTO_D(A).

TEMPORARY.
SELECT IF G_ALOJAMIENTO_D>0.
FREQUENCIES VARIABLES=G_ALOJAMIENTO_D
/STATISTICS=MEDIAN
/NTILES=4
/PERCENTILES=25.0 50.0 75.0
/ORDER=ANALYSIS.

TEMPORARY.
SELECT IF G_ALOJAMIENTO_D>0.
AGGREGATE
/OUTFILE=* MODE=ADDVARIABLES
/BREAK=TOTAL
/x_conteo_sum=SUM(x_conteo).

compute x_Q1=0.
compute x_Q2=0.
compute x_Q3=0.
if x_conteo_sum>0 x_Q1= (x_conteo_sum+1)/4. /* Obtener registro o caso del Q1.
if x_conteo_sum>0 x_Q2= (x_conteo_sum+1)/2. /* Obtener registro o caso del Q2.
if x_conteo_sum>0 x_Q3= 3*(x_conteo_sum+1)/4. /* Obtener registro o caso del Q3.
EXECUTE.

* Cuartiles: percentiles 25, 50 y 75, es decir, los valores por
debajo de los cuales se encuentra 25%, 50% y 75% de los
casos respectivamente.
* Q1 = 25%(NVAL+1)=3.50.
* Q2 = 50%(NVAL+1)=15.00.
* Q3= 75%(NVAL+1)=45.50.

/* AQUI PONER PROCEDIMIENTO PARA CREAR VARIABLES NUEVAS CON LOS VALORES DE LOS CUARTILES.

I am very grateful to you in advance and I wish you an excellent start to the week.

--
Javier Figueroa
Procesamiento y Análisis de bases de datos
Cel: 5927-4748 / 4970-1940
Casa: 2289-0184

===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD

--
Jon K Peck
[hidden email]

===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD

===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD

Jon K Peck
[hidden email]

Javier Figueroa

Re: Fwd: Rejected posting to SPSSX-L@LISTSERV.UGA.EDU

Thank you very much

👍

El mar., 29 sept. 2020 a las 15:47, Jon Peck (<[hidden email]>) escribió:

Another way to see this would be a boxplot with the point id's showing.
GGRAPH
/GRAPHDATASET NAME="graphdataset" VARIABLES=jobcat salary id MISSING=LISTWISE REPORTMISSING=NO
/GRAPHSPEC SOURCE=INLINE.
BEGIN GPL
SOURCE: s=userSource(id("graphdataset"))
DATA: jobcat=col(source(s), name("jobcat"), unit.category())
DATA: salary=col(source(s), name("salary"))
DATA: id=col(source(s), name("id"))
GUIDE: axis(dim(1), label("Employment Category"))
GUIDE: axis(dim(2), label("Current Salary"))
GUIDE: text.title(label("Simple Boxplot of Current Salary by Employment Category"))
SCALE: cat(dim(1), include("1", "2", "3"))
SCALE: linear(dim(2), include(0))
ELEMENT: schema(position(bin.quantile.letter(jobcat*salary)), label(id))
END GPL.

Not automated, of course, but the EXAMINE outliers list is very crude and pretty much useless.

On Tue, Sep 29, 2020 at 2:45 PM Michael Palij <[hidden email]> wrote:

---------- Forwarded message ---------
From: UGA LISTSERV Server (16.0) <[hidden email]>
Date: Tue, Sep 29, 2020 at 4:32 PM
Subject: Rejected posting to [hidden email]
To: <[hidden email]>

Your posting to the SPSSX-L list has been rejected because it contains an
attachment of type 'IMAGE/PNG'. The SPSSX-L list has been configured to reject
such attachments. Please contact the list owner at
[hidden email] for more information.

---------- Forwarded message ----------
From: Michael Palij <[hidden email]>
To: Jon Peck <[hidden email]>, SPSS list <[hidden email]>
Cc: Michael Palij <[hidden email]>
Date: Tue, 29 Sep 2020 16:30:55 -0400
Subject: Re: IDENTIFY THE QUARTILLES AND CREATE NEW VARIABLES THOSE VALUES

I'm a little confused by the request and have a few questions:

(1) Is the solution/SPSS code going to be used with just this data (N=30) or to a larger
sample or multiple samples?

(2) The expression "outliers" is somewhat ambiguous. At first glance, it seems to me
that values less the Q1 and greater than Q3 are to be considered outliers -- is this correct?
This seems a little strange to me (i.e., the most extreme 25% are considered outliers) but
the definition of what constitutes an outlier is not provided.

(3) Are the values of Q1, Q2, Q3 (i.e., 25th, 50th, 75th percentiles) really needed? Or
does one only needs to know if G_ALOJAMIENTO_D falls into the first, second, third, or
fourth quartile?

Since others have shown how to get the values of the interquartile range, let me address
points (2) and (3) above. If one uses the Rank procedure, one can get (a) the cumulative
percentile of each value of G_ALOJAMIENTO_D (below represented by G_percent),
(b) the z-score of each value of G_ALOJAMIENTO_D (below represented by G_norm),
and (c) the quartile the value of G_ALOJAMIENTO_D falls into (below represented by
G_quartiles).

TEMPORARY.
SELECT IF G_ALOJAMIENTO_D>0.
rank var=G_ALOJAMIENTO_D/
percent into G_percent/
normal into G_norm/
ntiles(4) into G_quartiles.

List.

The listing of the data follows:

G_ALOJAMIENTO_D imp_paquete_tur tipoviaj TOTAL id x_conteo G_percent G_norm G_quartiles

0.00 0.00 1.00 1.00 1 1.00 . . .
0.00 0.00 1.00 1.00 4 1.00 . . .
0.00 0.00 1.00 1.00 6 1.00 . . .
0.00 0.00 2.00 1.00 7 1.00 . . .
0.00 0.00 2.00 1.00 9 1.00 . . .
0.00 0.00 1.00 1.00 10 1.00 . . .
0.00 0.00 1.00 1.00 11 1.00 . . .
0.00 0.00 1.00 1.00 16 1.00 . . .
0.00 0.00 2.00 1.00 18 1.00 . . .
0.00 0.00 2.00 1.00 19 1.00 . . .
0.00 0.00 2.00 1.00 21 1.00 . . .
0.00 0.00 2.00 1.00 24 1.00 . . .
0.00 0.00 1.00 1.00 25 1.00 . . .
0.00 0.00 1.00 1.00 26 1.00 . . .
0.00 0.00 2.00 1.00 27 1.00 . . .
0.00 0.00 1.00 1.00 29 1.00 . . .
0.00 0.00 2.00 1.00 30 1.00 . . .
7.50 0.00 1.00 1.00 23 1.00 7.69 -1.673 1
8.33 0.00 2.00 1.00 15 1.00 19.23 -.9929 1
8.33 0.00 2.00 1.00 17 1.00 19.23 -.9929 1
8.93 0.00 1.00 1.00 2 1.00 30.77 -.6020 2
12.50 0.00 2.00 1.00 20 1.00 38.46 -.3879 2
13.33 0.00 2.00 1.00 22 1.00 46.15 -.1903 2
15.00 0.00 2.00 1.00 14 1.00 53.85 0.0000 3
16.67 0.00 1.00 1.00 12 1.00 61.54 0.1903 3
20.00 0.00 1.00 1.00 13 1.00 69.23 0.3879 3
23.51 0.00 1.00 1.00 28 1.00 76.92 0.6020 3
67.50 0.00 2.00 1.00 3 1.00 84.62 0.8484 4
100.00 0.00 2.00 1.00 5 1.00 92.31 1.1619 4
150.00 0.00 1.00 1.00 8 1.00 100.00 1.6729 4

So, is it the case that the variable G_quartiles all that is needed? Or does one watn to compare the raw value
of G_ALOJAMIENTO_D to the value of Q1, Q2, Q3 (why)? One could just as easily compare G_percent against
values of 25, 50, and 75.

Incidentally, it may not be immediately obvious but the distribution of G_ALOJAMIENTO_D is positively skewed.
Using Examine to get the boxplot for G_ALOJAMIENTO_D readily shows this:

TEMPORARY.
SELECT IF G_ALOJAMIENTO_D>0.
examine var=G_ALOJAMIENTO_D/ID=ID/
plot=boxplot/
statistics=extreme.
*NOTE: extreme value in both tails are printed but not shown here.

*******
Sorry, the mailing list software does not allow images embedded in an email (?)
so this bounced back to me. I've removed the image in order to post this email
to the list.
*******

The height of the box used to the interquartile range computed using Tukey's hinges (one
should check this) and the IDs 3, 5, and 8, all in the 4th quartile (see listing above) but according
to SPSS criteria, are not "outliers" -- O would be provided (i.e., 1.5 IQR-3.00 IQR) and if
greater than 3.00 IQRs, an E would be provided.

So, what am I missing?

-Mike Palij
New York University
[hidden email]

On Tue, Sep 29, 2020 at 1:50 PM Jon Peck <[hidden email]> wrote:
The EXAMINE procedure (Analyze > Descriptive Statistics > Explore) can give you the IQR, FREQUENCIES can also give you selected percentiles. This example uses FREQUENCIES.

dataset name main.
dataset declare stats.
oms select tables /if subtypes='Statistics'
/destination outfile=stats format=sav
/columns dimnames="Statistics".
FREQUENCIES VARIABLES=salary
/FORMAT=NOTABLE
/NTILES=4.
omsend.
compute const=1.
dataset activate stats.
compute const=1.
dataset activate main.
match files /file=* /table=stats/by const
/drop Command_ to Missing @50 const.
exec.

On Tue, Sep 29, 2020 at 11:13 AM Javier Figueroa <[hidden email]> wrote:
Hello spss experts !!!!

I hope everyone is well, as far as possible, with this pandemic it affected the whole world a lot and especially the tourism sector that I consider was the most affected by the covid-19 pandemic.

I am writing to you because I have a situation in a database that I have not been able to solve and that may be something simple but I have turned it around in my head and I have not been able to do it.

I need to create a variable that contains the first quartile (Q1) and the 2nd quartile (MEDIAN) and 3rd quartile (Q3) to then calculate the interqualty range (Q3-Q1)

This result will help me to identify the outliers

this is the syntax i am using.

* Encoding: UTF-8.
/* procedimiento crear base de datos PRUEBA.

data list list
/G_ALOJAMIENTO_D imp_paquete_tur tipoviaj TOTAL.
begin data.
0.00 0 1 1
8.93 0 1 1
67.50 0 2 1
0.00 0 1 1
100.00 0 2 1
0.00 0 1 1
0.00 0 2 1
150.00 0 1 1
0.00 0 2 1
0.00 0 1 1
0.00 0 1 1
16.67 0 1 1
20.00 0 1 1
15.00 0 2 1
8.33 0 2 1
0.00 0 1 1
8.33 0 2 1
0.00 0 2 1
0.00 0 2 1
12.50 0 2 1
0.00 0 2 1
13.33 0 2 1
7.50 0 1 1
0.00 0 2 1
0.00 0 1 1
0.00 0 1 1
0.00 0 2 1
23.51 0 1 1
0.00 0 1 1
0.00 0 2 1
end data.
execute.

numeric id(f8.0).
compute id=$casenum.
execute.
variable level all (scale).

COMPUTE x_conteo=1.
EXECUTE.

SORT CASES BY G_ALOJAMIENTO_D(A).

TEMPORARY.
SELECT IF G_ALOJAMIENTO_D>0.
FREQUENCIES VARIABLES=G_ALOJAMIENTO_D
/STATISTICS=MEDIAN
/NTILES=4
/PERCENTILES=25.0 50.0 75.0
/ORDER=ANALYSIS.

TEMPORARY.
SELECT IF G_ALOJAMIENTO_D>0.
AGGREGATE
/OUTFILE=* MODE=ADDVARIABLES
/BREAK=TOTAL
/x_conteo_sum=SUM(x_conteo).

compute x_Q1=0.
compute x_Q2=0.
compute x_Q3=0.
if x_conteo_sum>0 x_Q1= (x_conteo_sum+1)/4. /* Obtener registro o caso del Q1.
if x_conteo_sum>0 x_Q2= (x_conteo_sum+1)/2. /* Obtener registro o caso del Q2.
if x_conteo_sum>0 x_Q3= 3*(x_conteo_sum+1)/4. /* Obtener registro o caso del Q3.
EXECUTE.

* Cuartiles: percentiles 25, 50 y 75, es decir, los valores por
debajo de los cuales se encuentra 25%, 50% y 75% de los
casos respectivamente.
* Q1 = 25%(NVAL+1)=3.50.
* Q2 = 50%(NVAL+1)=15.00.
* Q3= 75%(NVAL+1)=45.50.

/* AQUI PONER PROCEDIMIENTO PARA CREAR VARIABLES NUEVAS CON LOS VALORES DE LOS CUARTILES.

I am very grateful to you in advance and I wish you an excellent start to the week.

--
Javier Figueroa
Procesamiento y Análisis de bases de datos
Cel: 5927-4748 / 4970-1940
Casa: 2289-0184

===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD

--
Jon K Peck
[hidden email]

===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD

===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD

--
Jon K Peck
[hidden email]

===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD

Javier Figueroa
Procesamiento y Análisis de bases de datos

Cel: 5927-4748 / 4970-1940

Casa: 2289-0184

Andy W

Re: Fwd: Rejected posting to SPSSX-L@LISTSERV.UGA.EDU

I have an example on my blog of grabbing the quantiles and calculating
statistics a bit of a different way from OMS as well,
https://andrewpwheeler.com/2015/04/13/extracting-items-from-spss-tables-using-python/.

This trades off some of the data manipulation needed for the OMS and/or RANK
steps via scooping up in the info in python.

-----
Andy W
[hidden email]
http://andrewpwheeler.wordpress.com/
--
Sent from: http://spssx-discussion.1045642.n5.nabble.com/

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Andy W
apwheele@gmail.com
http://andrewpwheeler.wordpress.com/

Javier Figueroa

Re: Fwd: Rejected posting to SPSSX-L@LISTSERV.UGA.EDU

This approach is fantastic.Thanks.

It's just what I need.

Thank you very much.

El mié., 30 sept. 2020 a las 7:47, Andy W (<[hidden email]>) escribió:

I have an example on my blog of grabbing the quantiles and calculating
statistics a bit of a different way from OMS as well,
https://andrewpwheeler.com/2015/04/13/extracting-items-from-spss-tables-using-python/.

This trades off some of the data manipulation needed for the OMS and/or RANK
steps via scooping up in the info in python.

-----
Andy W
[hidden email]
http://andrewpwheeler.wordpress.com/
--
Sent from: http://spssx-discussion.1045642.n5.nabble.com/

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Javier Figueroa
Procesamiento y Análisis de bases de datos

Cel: 5927-4748 / 4970-1940

Casa: 2289-0184