Hello spss experts !!!!
=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
I hope everyone is well, as far as possible, with this pandemic it affected the whole world a lot and especially the tourism sector that I consider was the most affected by the covid-19 pandemic. I am writing to you because I have a situation in a database that I have not been able to solve and that may be something simple but I have turned it around in my head and I have not been able to do it. I need to create a variable that contains the first quartile (Q1) and the 2nd quartile (MEDIAN) and 3rd quartile (Q3) to then calculate the interqualty range (Q3-Q1) This result will help me to identify the outliers this is the syntax i am using. * Encoding: UTF-8. /* procedimiento crear base de datos PRUEBA. data list list /G_ALOJAMIENTO_D imp_paquete_tur tipoviaj TOTAL. begin data. 0.00 0 1 1 8.93 0 1 1 67.50 0 2 1 0.00 0 1 1 100.00 0 2 1 0.00 0 1 1 0.00 0 2 1 150.00 0 1 1 0.00 0 2 1 0.00 0 1 1 0.00 0 1 1 16.67 0 1 1 20.00 0 1 1 15.00 0 2 1 8.33 0 2 1 0.00 0 1 1 8.33 0 2 1 0.00 0 2 1 0.00 0 2 1 12.50 0 2 1 0.00 0 2 1 13.33 0 2 1 7.50 0 1 1 0.00 0 2 1 0.00 0 1 1 0.00 0 1 1 0.00 0 2 1 23.51 0 1 1 0.00 0 1 1 0.00 0 2 1 end data. execute. numeric id(f8.0). compute id=$casenum. execute. variable level all (scale). COMPUTE x_conteo=1. EXECUTE. SORT CASES BY G_ALOJAMIENTO_D(A). TEMPORARY. SELECT IF G_ALOJAMIENTO_D>0. FREQUENCIES VARIABLES=G_ALOJAMIENTO_D /STATISTICS=MEDIAN /NTILES=4 /PERCENTILES=25.0 50.0 75.0 /ORDER=ANALYSIS. TEMPORARY. SELECT IF G_ALOJAMIENTO_D>0. AGGREGATE /OUTFILE=* MODE=ADDVARIABLES /BREAK=TOTAL /x_conteo_sum=SUM(x_conteo). compute x_Q1=0. compute x_Q2=0. compute x_Q3=0. if x_conteo_sum>0 x_Q1= (x_conteo_sum+1)/4. /* Obtener registro o caso del Q1. if x_conteo_sum>0 x_Q2= (x_conteo_sum+1)/2. /* Obtener registro o caso del Q2. if x_conteo_sum>0 x_Q3= 3*(x_conteo_sum+1)/4. /* Obtener registro o caso del Q3. EXECUTE. * Cuartiles: percentiles 25, 50 y 75, es decir, los valores por debajo de los cuales se encuentra 25%, 50% y 75% de los casos respectivamente. * Q1 = 25%(NVAL+1)=3.50. * Q2 = 50%(NVAL+1)=15.00. * Q3= 75%(NVAL+1)=45.50. /* AQUI PONER PROCEDIMIENTO PARA CREAR VARIABLES NUEVAS CON LOS VALORES DE LOS CUARTILES. I am very grateful to you in advance and I wish you an excellent start to the week. Javier Figueroa Procesamiento y Análisis de bases de datos Cel: 5927-4748 / 4970-1940 Casa: 2289-0184 |
Administrator
|
Hello Javier--hola. If I follow what you are trying to do, I think you might
have an error in your computation of Q1. Using EXAMINE, I match your estimates of Q2 and Q3, but get a different result for Q1. Here is my code and (selected) output. TEMPORARY. SELECT IF G_ALOJAMIENTO_D>0. EXAMINE VARIABLES=G_ALOJAMIENTO_D /PLOT NONE /PERCENTILES(25,50,75) HAVERAGE /STATISTICS DESCRIPTIVES /CINTERVAL 95 /MISSING LISTWISE /NOTOTAL. Percentiles Percentiles 25 50 75 Weighted Average(Definition 1) G_ALOJAMIENTO_D 8.6300 15.0000 45.5050 Tukey's Hinges G_ALOJAMIENTO_D 8.9300 15.0000 23.5100 As you see, the Weighted Average (Definition 1) results match yours for Q2 and Q3, but not for Q1. If you do want the Definition 1 quartiles, you could use EXAMINE in conjunction with OMS to write them to a new dataset. After that, do whatever data management is necessary in the new dataset, and then merge it with the original to bring in the quartiles. HTH. Javier Figueroa wrote > Hello spss experts !!!! > > I hope everyone is well, as far as possible, with this pandemic it > affected > the whole world a lot and especially the tourism sector that I consider > was > the most affected by the covid-19 pandemic. > > I am writing to you because I have a situation in a database that I have > not been able to solve and that may be something simple but I have turned > it around in my head and I have not been able to do it. > > I need to create a variable that contains the first quartile (Q1) and the > 2nd quartile (MEDIAN) and 3rd quartile (Q3) to then calculate the > interqualty range (Q3-Q1) > > This result will help me to identify the outliers > > this is the syntax i am using. > > * Encoding: UTF-8. > /* procedimiento crear base de datos PRUEBA. > > data list list > /G_ALOJAMIENTO_D imp_paquete_tur tipoviaj TOTAL. > begin data. > 0.00 0 1 1 > 8.93 0 1 1 > 67.50 0 2 1 > 0.00 0 1 1 > 100.00 0 2 1 > 0.00 0 1 1 > 0.00 0 2 1 > 150.00 0 1 1 > 0.00 0 2 1 > 0.00 0 1 1 > 0.00 0 1 1 > 16.67 0 1 1 > 20.00 0 1 1 > 15.00 0 2 1 > 8.33 0 2 1 > 0.00 0 1 1 > 8.33 0 2 1 > 0.00 0 2 1 > 0.00 0 2 1 > 12.50 0 2 1 > 0.00 0 2 1 > 13.33 0 2 1 > 7.50 0 1 1 > 0.00 0 2 1 > 0.00 0 1 1 > 0.00 0 1 1 > 0.00 0 2 1 > 23.51 0 1 1 > 0.00 0 1 1 > 0.00 0 2 1 > end data. > execute. > > numeric id(f8.0). > compute id=$casenum. > execute. > variable level all (scale). > > COMPUTE x_conteo=1. > EXECUTE. > > SORT CASES BY G_ALOJAMIENTO_D(A). > > TEMPORARY. > SELECT IF G_ALOJAMIENTO_D>0. > FREQUENCIES VARIABLES=G_ALOJAMIENTO_D > /STATISTICS=MEDIAN > /NTILES=4 > /PERCENTILES=25.0 50.0 75.0 > /ORDER=ANALYSIS. > > TEMPORARY. > SELECT IF G_ALOJAMIENTO_D>0. > AGGREGATE > /OUTFILE=* MODE=ADDVARIABLES > /BREAK=TOTAL > /x_conteo_sum=SUM(x_conteo). > > compute x_Q1=0. > compute x_Q2=0. > compute x_Q3=0. > if x_conteo_sum>0 x_Q1= (x_conteo_sum+1)/4. /* Obtener registro o caso del > Q1. > if x_conteo_sum>0 x_Q2= (x_conteo_sum+1)/2. /* Obtener registro o caso del > Q2. > if x_conteo_sum>0 x_Q3= 3*(x_conteo_sum+1)/4. /* Obtener registro o caso > del Q3. > EXECUTE. > > * Cuartiles: percentiles 25, 50 y 75, es decir, los valores por > debajo de los cuales se encuentra 25%, 50% y 75% de los > casos respectivamente. > * Q1 = 25%(NVAL+1)=3.50. > * Q2 = 50%(NVAL+1)=15.00. > * Q3= 75%(NVAL+1)=45.50. > > /* AQUI PONER PROCEDIMIENTO PARA CREAR VARIABLES NUEVAS CON LOS VALORES DE > LOS CUARTILES. > > > > I am very grateful to you in advance and I wish you an excellent start to > the week. > > > -- > > *Javier FigueroaProcesamiento y Análisis de bases de datos* > *Cel: 5927-4748 / 4970-1940* > > > *Casa: 2289-0184* > > ===================== > To manage your subscription to SPSSX-L, send a message to > LISTSERV@.UGA > (not to SPSSX-L), with no body text except the > command. To leave the list, send the command > SIGNOFF SPSSX-L > For a list of commands to manage subscriptions, send the command > INFO REFCARD ----- -- Bruce Weaver [hidden email] http://sites.google.com/a/lakeheadu.ca/bweaver/ "When all else fails, RTFM." NOTE: My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above. -- Sent from: http://spssx-discussion.1045642.n5.nabble.com/ ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD
--
Bruce Weaver bweaver@lakeheadu.ca http://sites.google.com/a/lakeheadu.ca/bweaver/ "When all else fails, RTFM." PLEASE NOTE THE FOLLOWING: 1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above. 2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/). |
In reply to this post by Javier Figueroa
The EXAMINE procedure (Analyze > Descriptive Statistics > Explore) can give you the IQR, FREQUENCIES can also give you selected percentiles. This example uses FREQUENCIES. dataset name main. dataset declare stats. oms select tables /if subtypes='Statistics' /destination outfile=stats format=sav /columns dimnames="Statistics". FREQUENCIES VARIABLES=salary /FORMAT=NOTABLE /NTILES=4. omsend. compute const=1. dataset activate stats. compute const=1. dataset activate main. match files /file=* /table=stats/by const /drop Command_ to Missing @50 const. exec. On Tue, Sep 29, 2020 at 11:13 AM Javier Figueroa <[hidden email]> wrote:
|
Thank you very much for your answers and suggestionsI thank in general this excellent community of expertsEl mar., 29 sept. 2020 a las 11:50, Jon Peck (<[hidden email]>) escribió:
Javier Figueroa Procesamiento y Análisis de bases de datos Cel: 5927-4748 / 4970-1940 Casa: 2289-0184 |
No puedo dejar de agradecer y comentarles que gracias a sus sugerencias, encontré lo que estaba tratando de hacer. En efecto Brus, si tengo un error en el Q1 y es que copie el resultado de mi formula que me indica el registro donde esta el Q1 y no el resultado del Q1 que es 8.63 Les dejo el link Nota:
Se los escribo en español y una traducción al inglés.
atentamente, TRADUCCIÓN I can't help but thank you and comment that thanks to your suggestions, I found what I was trying to do. Indeed Brus, if I have an error in Q1 and it is that I copy the result of my formula that indicates the register where Q1 is and not the result of Q1 which is 8.63 Using commands in conjunction with OMS is wonderful. Thank you very much Jon, for the example. Note: I am writing them in Spanish and an English translation. Sincerely, El mar., 29 sept. 2020 a las 12:54, Javier Figueroa (<[hidden email]>) escribió:
Javier Figueroa Procesamiento y Análisis de bases de datos Cel: 5927-4748 / 4970-1940 Casa: 2289-0184 |
In reply to this post by Javier Figueroa
---------- Forwarded message --------- From: UGA LISTSERV Server (16.0) <[hidden email]> Date: Tue, Sep 29, 2020 at 4:32 PM Subject: Rejected posting to [hidden email] To: <[hidden email]> Your posting to the SPSSX-L list has been rejected because it contains an attachment of type 'IMAGE/PNG'. The SPSSX-L list has been configured to reject such attachments. Please contact the list owner at [hidden email] for more information. ---------- Forwarded message ---------- From: Michael Palij <[hidden email]> To: Jon Peck <[hidden email]>, SPSS list <[hidden email]> Cc: Michael Palij <[hidden email]> Date: Tue, 29 Sep 2020 16:30:55 -0400 Subject: Re: IDENTIFY THE QUARTILLES AND CREATE NEW VARIABLES THOSE VALUES I'm a little confused by the request and have a few questions: (1) Is the solution/SPSS code going to be used with just this data (N=30) or to a larger sample or multiple samples? (2) The expression "outliers" is somewhat ambiguous. At first glance, it seems to me that values less the Q1 and greater than Q3 are to be considered outliers -- is this correct? This seems a little strange to me (i.e., the most extreme 25% are considered outliers) but the definition of what constitutes an outlier is not provided. (3) Are the values of Q1, Q2, Q3 (i.e., 25th, 50th, 75th percentiles) really needed? Or does one only needs to know if
G_ALOJAMIENTO_D falls into the first, second, third, or fourth quartile? Since others have shown how to get the values of the interquartile range, let me address points (2) and (3) above. If one uses the Rank procedure, one can get (a) the cumulative percentile of each value of
G_ALOJAMIENTO_D (below represented by G_percent), (b) the z-score of each value of
G_ALOJAMIENTO_D (below represented by G_norm), and (c) the quartile the value of
G_ALOJAMIENTO_D falls into (below represented by G_quartiles). TEMPORARY. SELECT IF G_ALOJAMIENTO_D>0. rank var=G_ALOJAMIENTO_D/ percent into G_percent/ normal into G_norm/ ntiles(4) into G_quartiles. List. The listing of the data follows: G_ALOJAMIENTO_D imp_paquete_tur tipoviaj TOTAL id x_conteo G_percent G_norm G_quartiles 0.00 0.00 1.00 1.00 1 1.00 . . . 0.00 0.00 1.00 1.00 4 1.00 . . . 0.00 0.00 1.00 1.00 6 1.00 . . . 0.00 0.00 2.00 1.00 7 1.00 . . . 0.00 0.00 2.00 1.00 9 1.00 . . . 0.00 0.00 1.00 1.00 10 1.00 . . . 0.00 0.00 1.00 1.00 11 1.00 . . . 0.00 0.00 1.00 1.00 16 1.00 . . . 0.00 0.00 2.00 1.00 18 1.00 . . . 0.00 0.00 2.00 1.00 19 1.00 . . . 0.00 0.00 2.00 1.00 21 1.00 . . . 0.00 0.00 2.00 1.00 24 1.00 . . . 0.00 0.00 1.00 1.00 25 1.00 . . . 0.00 0.00 1.00 1.00 26 1.00 . . . 0.00 0.00 2.00 1.00 27 1.00 . . . 0.00 0.00 1.00 1.00 29 1.00 . . . 0.00 0.00 2.00 1.00 30 1.00 . . . 7.50 0.00 1.00 1.00 23 1.00 7.69 -1.673 1 8.33 0.00 2.00 1.00 15 1.00 19.23 -.9929 1 8.33 0.00 2.00 1.00 17 1.00 19.23 -.9929 1 8.93 0.00 1.00 1.00 2 1.00 30.77 -.6020 2 12.50 0.00 2.00 1.00 20 1.00 38.46 -.3879 2 13.33 0.00 2.00 1.00 22 1.00 46.15 -.1903 2 15.00 0.00 2.00 1.00 14 1.00 53.85 0.0000 3 16.67 0.00 1.00 1.00 12 1.00 61.54 0.1903 3 20.00 0.00 1.00 1.00 13 1.00 69.23 0.3879 3 23.51 0.00 1.00 1.00 28 1.00 76.92 0.6020 3 67.50 0.00 2.00 1.00 3 1.00 84.62 0.8484 4 100.00 0.00 2.00 1.00 5 1.00 92.31 1.1619 4 150.00 0.00 1.00 1.00 8 1.00 100.00 1.6729 4 So, is it the case that the variable G_quartiles all that is needed? Or does one watn to compare the raw value of G_ALOJAMIENTO_D to the value of Q1, Q2, Q3 (why)? One could just as easily compare G_percent against values of 25, 50, and 75. Incidentally, it may not be immediately obvious but the distribution of
G_ALOJAMIENTO_D is positively skewed. Using Examine to get the boxplot for
G_ALOJAMIENTO_D readily shows this: TEMPORARY. SELECT IF G_ALOJAMIENTO_D>0. examine var=G_ALOJAMIENTO_D/ID=ID/ plot=boxplot/ statistics=extreme. *NOTE: extreme value in both tails are printed but not shown here. ******* Sorry, the mailing list software does not allow images embedded in an email (?) so this bounced back to me. I've removed the image in order to post this email to the list. ******* The height of the box used to the interquartile range computed using Tukey's hinges (one should check this) and the IDs 3, 5, and 8, all in the 4th quartile (see listing above) but according to SPSS criteria, are not "outliers" -- O would be provided (i.e., 1.5 IQR-3.00 IQR) and if greater than 3.00 IQRs, an E would be provided. So, what am I missing? -Mike Palij New York University On Tue, Sep 29, 2020 at 1:50 PM Jon Peck <[hidden email]> wrote:
|
Another way to see this would be a boxplot with the point id's showing. GGRAPH /GRAPHDATASET NAME="graphdataset" VARIABLES=jobcat salary id MISSING=LISTWISE REPORTMISSING=NO /GRAPHSPEC SOURCE=INLINE. BEGIN GPL SOURCE: s=userSource(id("graphdataset")) DATA: jobcat=col(source(s), name("jobcat"), unit.category()) DATA: salary=col(source(s), name("salary")) DATA: id=col(source(s), name("id")) GUIDE: axis(dim(1), label("Employment Category")) GUIDE: axis(dim(2), label("Current Salary")) GUIDE: text.title(label("Simple Boxplot of Current Salary by Employment Category")) SCALE: cat(dim(1), include("1", "2", "3")) SCALE: linear(dim(2), include(0)) ELEMENT: schema(position(bin.quantile.letter(jobcat*salary)), label(id)) END GPL. Not automated, of course, but the EXAMINE outliers list is very crude and pretty much useless. On Tue, Sep 29, 2020 at 2:45 PM Michael Palij <[hidden email]> wrote:
|
Thank you very much 👍 El mar., 29 sept. 2020 a las 15:47, Jon Peck (<[hidden email]>) escribió:
Javier Figueroa Procesamiento y Análisis de bases de datos Cel: 5927-4748 / 4970-1940 Casa: 2289-0184 |
I have an example on my blog of grabbing the quantiles and calculating
statistics a bit of a different way from OMS as well, https://andrewpwheeler.com/2015/04/13/extracting-items-from-spss-tables-using-python/. This trades off some of the data manipulation needed for the OMS and/or RANK steps via scooping up in the info in python. ----- Andy W [hidden email] http://andrewpwheeler.wordpress.com/ -- Sent from: http://spssx-discussion.1045642.n5.nabble.com/ ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
This approach is fantastic.Thanks. Thank you very much. El mié., 30 sept. 2020 a las 7:47, Andy W (<[hidden email]>) escribió: I have an example on my blog of grabbing the quantiles and calculating Javier Figueroa Procesamiento y Análisis de bases de datos Cel: 5927-4748 / 4970-1940 Casa: 2289-0184 |
Free forum by Nabble | Edit this page |