Administrator
|
"Is there another easier way to do this task?"
DATA LIST / nombre (A20).
begin data
Juan Manuel
Alberto
Ana
Teresa Marilu
Alberto2
Te1resa Maril11u
END DATA.
LOOP #=1 TO LENGTH(RTRIM(nombre)).
COMPUTE BADDATA =SUM(BADDATA,(INDEX(UPCASE(SUBSTR(nombre,#,1))
 ,"ABCDEFGHIJKLMNOPQRSTUVWXYZÑ ",1) EQ 0)).
END LOOP.
LIST.

NOMBRE               BADDATA

Juan Manuel              .00
Alberto                  .00
Ana                      .00
Teresa Marilu            .00
Alberto2                1.00
Te1resa Maril11u        3.00

Number of cases read: 6 Number of cases listed: 6
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
Excellent, this definitely solves my problem. Thank you very much, David.

Andrés

Mg. Andrés Burga León
Coordinador de Análisis e Informática
Unidad de Medición de la Calidad Educativa (UMC)
Ministerio de Educación del Perú
Av. de la Arqueología s/n (cuadra 2)
Lima 41
Perú
Teléfono 615-5840 - 6155800 anexo 1212
http://www2.minedu.gob.pe/umc/

ANDRES ALBERTO BURGA LEON wrote
> Hello everybody:
>
> I have a string variable (nombre, A15) whose contents are names of different
> lengths. For example:
>
> Juan Manuel
> Alberto
> Ana
> Teresa Marilu
> ...
>
> I need to check whether there are any typos, such as a number in a name (for
> example, Lu1s instead of Luis).
>
> So far I can only think of creating 15 new variables, each holding one of
> the string's character positions (COMPUTE name1 = CHAR.SUBSTR(nombre,1,1)
> and so on).
>
> Then, for the 15 new variables, count the number of A, B, ..., Z and
> create 27 new variables (I also need to count Ñ). Then sum these 27 new
> variables and check whether the sum equals CHAR.LENGTH(nombre).
>
> Is there another easier way to do this task?
>
> Kindly
>
> Andrés
Administrator
|
Glad that helps.
The key is the last argument to the INDEX function, which chops the search string (the second argument, here the list of valid characters) into single characters.
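A minimal sketch of what that third argument does, using made-up literal strings rather than the thread's data. It uses CHAR.INDEX, which assumes Statistics 16 or later (the older INDEX takes the same arguments); the names dummy, hit_whole, and hit_split are hypothetical.

```spss
* One dummy case so that COMPUTE has something to run on.
DATA LIST FREE / dummy.
BEGIN DATA
1
END DATA.
* No divisor, so the needle "XYZ" must occur as a whole in "ABCZ".
COMPUTE hit_whole = CHAR.INDEX("ABCZ", "XYZ").
* Divisor 1 splits the needle into "X", "Y", "Z" and any one of them may match.
COMPUTE hit_split = CHAR.INDEX("ABCZ", "XYZ", 1).
LIST.
```

Going by the documented behavior of the divisor, hit_whole would be expected to be 0 and hit_split to be 4, the position of Z. In the loop above the first argument is a single character, so the result can only be 1 (the character is in the list) or 0 (it is not).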
I suggest reversing the test and counting digits.

LOOP #=1 TO LENGTH(RTRIM(nombre)).
COMPUTE BADDATA =SUM(BADDATA,(CHAR.INDEX(CHAR.SUBSTR(nombre,#,1),"0123456789",1))).
END LOOP.

That way, any other characters not listed in the original solution, such as accented characters, will not trigger a count.

One other thing: with Statistics 16 or later, use the CHAR. functions. They give the same result whether in Unicode or code page mode. The old, deprecated functions work on bytes, not characters, and the number of bytes per character can differ between Unicode and code page mode. And if the data above had been, say, Japanese, Korean, or Chinese, the original code would have failed in either mode. (Of course, it would also have failed because the list of alpha characters would have been seriously incomplete.)

Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
new phone: 720-342-5621

In reply to: David Marso <[hidden email]>, 12/29/2011 07:52 AM
Subject: Re: [SPSSX-L] Searching for characters different to A, B, ...Z in string
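To make the bytes-versus-characters point concrete, here is a small sketch that puts the old and new length functions side by side. The sample names and the variables nbytes and nchars are hypothetical, and it assumes Statistics 16 or later so that CHAR.LENGTH exists.

```spss
* Two made-up names, one containing a non-ASCII letter.
DATA LIST FIXED / nombre 1-20 (A).
BEGIN DATA
Begoña
Ana
END DATA.
* Old function, which counts bytes.
COMPUTE nbytes = LENGTH(RTRIM(nombre)).
* CHAR. function, which counts characters with trailing blanks removed.
COMPUTE nchars = CHAR.LENGTH(nombre).
LIST.
```

In Unicode mode the two counts would be expected to differ for the name containing Ñ, because that letter takes more than one byte in UTF-8; in a single-byte code page they should agree. A loop that runs to the byte length can therefore step past the last real character.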
Administrator
|
I believe the original question was to detect characters other than <space>, A..Z or Ñ, not just numbers. Perhaps someone fat-fingered the input, and commas, !, @, %, etc. may have entered the field.
Indeed, the CHAR. functions are likely more appropriate; however, my ancient version does not support them, and I usually like to test my code.
Hard to tell. The original said

> I need to check whether there are any typos, such as a number in a name (for
> example, Lu1s instead of Luis).

and

> Then, for the 15 new variables, count the number of A, B, ..., Z and
> create 27 new variables (I also need to count Ñ)

But Spanish orthography also uses vowels with acute accents (á, é, í, ó, ú) and u with dieresis (ü). The best solution might be to use a table of known first and last names and compare against that. That would give some false positives, but it would be much more accurate for other kinds of spelling errors.

Jon Peck

In reply to: David Marso <[hidden email]>, 12/29/2011 08:55 AM
Subject: Re: [SPSSX-L] Searching for characters different to A, B, ...Z in string
Thanks to everybody for the answers. Indeed, the names are not typed but captured by a scanner. The software didn't accept the acute accent or the dieresis, but it could misinterpret a letter as a number or as another character, such as a dot or a comma. In principle, the only valid characters are those specified in David's syntax, so it works well for me (adding the CHAR. prefix).

Andrés
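For reference, David's loop with the CHAR. prefix added might look like the sketch below. It assumes Statistics 16 or later. The scratch variable #i, the explicit initialization of BADDATA to 0, and the column specification on DATA LIST are choices made for this sketch, not taken from the thread. Whether UPCASE maps ñ to Ñ may depend on the encoding and locale, so treat that part as an assumption as well.

```spss
DATA LIST FIXED / nombre 1-20 (A).
BEGIN DATA
Juan Manuel
Alberto
Ana
Teresa Marilu
Alberto2
Te1resa Maril11u
END DATA.
* Count the characters that are not A-Z, Ñ, or a space.
COMPUTE BADDATA = 0.
LOOP #i = 1 TO CHAR.LENGTH(nombre).
COMPUTE BADDATA = BADDATA +
  (CHAR.INDEX(UPCASE(CHAR.SUBSTR(nombre, #i, 1)), "ABCDEFGHIJKLMNOPQRSTUVWXYZÑ ", 1) EQ 0).
END LOOP.
LIST.
```

Commands start in column 1 and only the continuation line is indented, so the sketch should read the same way in interactive and batch mode. Any case with BADDATA greater than 0 contains at least one character outside the valid list.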