Hello, SPSS experts,

I am turning to your expertise in macros for help modifying a syntax that was originally written to be run once, with a single variable. The same process is now run more often, and with between 2 and 5 variables in different combinations. The variables and their codes are detailed below.

tipoviaj = Type of traveler
  1 Tourist
  2 Hiker
via = Access road
  1 Air
  2 Terrestrial
  3 Maritime
Border = Admission border
  1 Aurora
  2 Mundo Maya
  7 Valle Nuevo
  8 Pedro de Alvarado
  9 San Cristóbal
  10 New Anguatú
  12 The Florido
  15 The Carmen
  16 The Mesilla
  20 Melchor de Mencos
Stay_Average = Days of stay (scale variable, min = 2, max = 260)
Country of residence: there are 9 possible codes

The syntax imputes extreme values and missing values of a scale variable, using the mean of the existing data for each combination of the variables listed above. Any comment would be very helpful. The database I will use has approximately 60,000 records.

/* procedure to create the database
data list list /tipoviaj via Border Stay_Average Contry G_PAQUETE imp_paquete_tur.
begin data.
1 1 07 13 10 280 2
2 2 01 60 02 80000 1
1 1 09 50 3 0.50 1
1 2 02 100 3 100 2
2 2 05 15 2 150 2
2 1 15 30 2 . 1
end data.
execute.
numeric id(f8.0).
compute id=$casenum.
execute.
variable level all (scale).

G_PAQUETE must be imputed with the mean value for all cases that have code 1 in the variable imp_paquete_tur, taking the possible combinations into account.

Syntax to modify:

COMPUTE X=1.
EXECUTE.
TEMPORARY.
SELECT IF (TIPOVIAJ=2 AND G_PAQUETE>0 AND IMP_PAQUETE_TUR=1).
AGGREGATE
  /OUTFILE='C:\Users\jfigueroa\Desktop\TRABAJOS\2016\Base Gastos\2018\Trimestre3\G_PAQUETE.SAV'
  /BREAK=TIPOVIAJ
  /SUMA_G_PAQUETE=SUM (G_PAQUETE)
  /GRUP_G_PAQUETE=SUM (GRUPOGASTO)
  /DIA_G_PAQUETE= SUM (DIA)
  /N_ENCU_G_PAQUETE=SUM (X).
EXECUTE.
GET FILE='C:\Users\jfigueroa\Desktop\TRABAJOS\2016\Base Gastos\2018\Trimestre3\G_PAQUETE.SAV'.
COMPUTE MEDIA_G_PAQUETE_P=SUMA_G_PAQUETE/GRUP_G_PAQUETE.
COMPUTE MEDIA_G_PAQUETE_D=DIA_G_PAQUETE/N_ENCU_G_PAQUETE.
COMPUTE MEDIA_G_PAQUETE_PD=MEDIA_G_PAQUETE_P/MEDIA_G_PAQUETE_D.
EXECUTE.
COMPUTE IMP_PAQUETE_TUR=2.
EXECUTE.
SAVE OUTFILE='C:\Users\jfigueroa\Desktop\TRABAJOS\2016\Base Gastos\2018\Trimestre3\G_PAQUETE.SAV'/COMPRESSED.
GET FILE='C:\Users\jfigueroa\Desktop\TRABAJOS\2016\Base Gastos\2018\Trimestre3\GASTO_GUATE_DEP_IMP_7-8-9-10_2018.SAV'.
SORT CASES BY IMP_PAQUETE_TUR.
MATCH FILES /FILE=*/TABLE='C:\Users\jfigueroa\Desktop\TRABAJOS\2016\Base Gastos\2018\Trimestre3\G_PAQUETE.SAV'/BY IMP_PAQUETE_TUR.
EXECUTE.
IF IMP_PAQUETE_TUR=2 G_PAQUETE_PD=MEDIA_G_PAQUETE_PD.
IF IMP_PAQUETE_TUR=2 G_PAQUETE=G_PAQUETE_PD*GRUPOGASTO*DIA.
EXECUTE.
SAVE OUTFILE='C:\Users\jfigueroa\Desktop\TRABAJOS\2016\Base Gastos\2018\Trimestre3\GASTO_GUATE_IMP3.SAV'
  /DROP SUMA_G_PAQUETE GRUP_G_PAQUETE DIA_G_PAQUETE N_ENCU_G_PAQUETE MEDIA_G_PAQUETE_P MEDIA_G_PAQUETE_D MEDIA_G_PAQUETE_PD.

Javier Figueroa
Procesamiento y Análisis de bases de datos
Cel: 5927-4748 / 4970-1940
Casa: 2289-0184
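For readers following the thread, the arithmetic in the posted syntax can be sketched outside SPSS. What follows is a minimal illustration in plain Python, not the poster's procedure as it is actually run. The function name `impute_by_group`, its parameters, and the example rows are hypothetical. It assumes, as the syntax does, that flag code 1 marks donor cases and code 2 marks cases to impute, and it generalizes the single break variable (TIPOVIAJ) to any combination of break variables, which is the change the post asks about.

```python
from collections import defaultdict


def impute_by_group(rows, break_vars, target="G_PAQUETE", flag="IMP_PAQUETE_TUR",
                    party="GRUPOGASTO", days="DIA", donor_code=1, recipient_code=2):
    """Return copies of rows with `target` imputed for recipient cases."""
    # Per combination of break variables:
    # [sum of spend, sum of party size, sum of days, number of donor cases].
    totals = defaultdict(lambda: [0.0, 0.0, 0.0, 0])
    for row in rows:
        value = row.get(target)
        # Donors: flagged as valid and with a positive, non-missing value.
        if row[flag] == donor_code and value is not None and value > 0:
            acc = totals[tuple(row[v] for v in break_vars)]
            acc[0] += value
            acc[1] += row[party]
            acc[2] += row[days]
            acc[3] += 1

    # Spend per person per day, mirroring MEDIA_G_PAQUETE_PD in the syntax.
    rates = {}
    for key, (spend, people, day_sum, n) in totals.items():
        if people > 0 and day_sum > 0:
            mean_per_person = spend / people   # MEDIA_G_PAQUETE_P
            mean_days = day_sum / n            # MEDIA_G_PAQUETE_D
            rates[key] = mean_per_person / mean_days

    imputed = []
    for row in rows:
        new_row = dict(row)
        key = tuple(row[v] for v in break_vars)
        # Recipients without any donor in their group are left unchanged.
        if row[flag] == recipient_code and key in rates:
            new_row[target] = rates[key] * row[party] * row[days]
        imputed.append(new_row)
    return imputed


# Made-up example rows (not taken from the poster's data).
example = [
    {"id": 1, "TIPOVIAJ": 2, "VIA": 1, "GRUPOGASTO": 2, "DIA": 5,
     "G_PAQUETE": 100.0, "IMP_PAQUETE_TUR": 1},
    {"id": 2, "TIPOVIAJ": 2, "VIA": 1, "GRUPOGASTO": 2, "DIA": 5,
     "G_PAQUETE": 300.0, "IMP_PAQUETE_TUR": 1},
    {"id": 3, "TIPOVIAJ": 2, "VIA": 1, "GRUPOGASTO": 3, "DIA": 2,
     "G_PAQUETE": None, "IMP_PAQUETE_TUR": 2},
    {"id": 4, "TIPOVIAJ": 1, "VIA": 2, "GRUPOGASTO": 1, "DIA": 4,
     "G_PAQUETE": None, "IMP_PAQUETE_TUR": 2},
]
result = impute_by_group(example, ["TIPOVIAJ", "VIA"])
```

Passing the break variables as a list is the design choice that matters here: running the same routine with 2 to 5 variables only changes the second argument, instead of requiring a separate copy of the syntax per combination.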
Sounds like you want us to do all the work for you, right? Mario Giesel Munich, Germany
On Tuesday, 27 November 2018, 22:07:11 CET, Javier Figueroa <[hidden email]> wrote:
In reply to this post by Javier Figueroa
First off, I am not certain of your expectations/intentions.
What would be the inputs to the macro? Probably no one will write it for you without compensation. Perhaps look at some existing macros, try doing it yourself, and post your efforts; someone might then step up and provide helpful advice. The current post cannot be addressed in any meaningful way.

As a start:
Get rid of all EXECUTE statements.
Use datasets rather than disk files.
Pretty sure all of the variables are NOT scale measurement level. Fix that.
Maybe intersperse some comments for the uninitiated home audience.

=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command INFO REFCARD
I agree with David's comments but would like to clarify his datasets comment. Using datasets is good practice, because it eliminates location dependencies, but datasets are actually temporary disk files, so using them does not improve performance. (Eliminating unnecessary EXECUTE commands does improve performance by eliminating unnecessary data passes.)

A few other points.

The MVA procedure (Analyze > Missing Value Analysis) provides a number of single-imputation methods that are more sophisticated than mean substitution, so you might want to look into that. No macro required.

Any single-imputation process distorts the distribution of the imputed variable, so statistical results should be interpreted with caution unless the amount of missing data is small. The MULTIPLE IMPUTATION procedure provides a more sophisticated approach, usable with some analysis procedures, that does a better job of handling the missing data without distorting the results.

If you do need to stick with the mean substitution method you described, I would recommend Python over macro (no surprise to regular readers), because it can produce a better (more robust, more flexible) procedure than is possible with macro.

On Thu, Nov 29, 2018 at 8:45 AM David Marso <[hidden email]> wrote: First off, I am not certain of your expectations/intentions.
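The caution about single imputation distorting the distribution can be seen in a few lines. This is a small sketch in plain Python using only the standard `statistics` module; the helper name `mean_substitute` and the spending figures are made up for illustration and are not from the thread's data.

```python
import statistics


def mean_substitute(values):
    """Replace missing entries (None) with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    mean = statistics.mean(observed)
    return [mean if v is None else v for v in values]


# Made-up spending values with two missing entries.
spend = [100.0, 150.0, None, 300.0, None, 250.0]

observed = [v for v in spend if v is not None]
filled = mean_substitute(spend)

# The mean is preserved, but the spread shrinks because every imputed
# case sits exactly at the centre of the distribution.
sd_observed = statistics.stdev(observed)
sd_filled = statistics.stdev(filled)
print(f"observed sd: {sd_observed:.2f}, after mean substitution: {sd_filled:.2f}")
```

The larger the share of imputed cases, the more the standard deviation is understated, which is why the post suggests caution unless the amount of missing data is small.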
Thank you very much for your contributions. I apologize again to this much-appreciated and professional community of SPSS experts; I know that the translation from Spanish to English can sometimes be confusing. On other occasions you have been a lifesaver, and I never doubted that this time would be the same.

I am aware that imputation is very complicated and delicate work. It is not just a matter of replacing one value with another; it takes a good deal of analysis to choose an appropriate imputation method. An inadequate method can seriously affect the data and, in the end, the analysis, so that you end up with more problems than the imputation was meant to solve.

I will keep working and looking for the best way to streamline the process. Right now it is done manually, which really requires a lot of control.

Thank you, and I hope you have a happy weekend.

Sincerely,

On Thu, 29 Nov 2018 at 14:09, Jon Peck (<[hidden email]>) wrote:
Javier Figueroa
Procesamiento y Análisis de bases de datos
Cel: 5927-4748 / 4970-1940
Casa: 2289-0184