Searching for characters different to A, B, ...Z in string

ANDRES ALBERTO BURGA LEON

Hello to everybody:

I have a string variable (nombre, format A15) whose values are names of different lengths. For example:

Juan Manuel
Alberto
Ana
Teresa Marilu
...

I need to check whether there are any typos, such as a number in a name (for example, Lu1s instead of Luis).

So far, the only approach I can think of is to create 15 new variables, each holding one character position of the string (COMPUTE name1 = CHAR.SUBSTR(nombre,1,1), and so on).

Then, for the 15 new variables, count the occurrences of A, B, ..., Z and create 27 new variables (I also need to count Ñ). Then sum these 27 new variables and check whether the sum equals CHAR.LENGTH(nombre).
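The counting plan above reduces to one comparison per name: the number of allowed characters must equal the length of the name. What follows is a minimal plain-Python sketch of that idea, not SPSS syntax. The function name `is_clean` and the exact allowed set are assumptions for illustration. The plan needs one extra detail: the blank must also be allowed. Otherwise "Juan Manuel" (10 letters, length 11) would be flagged.

```python
# Assumed allowed set: the 27 letters counted in the plan (A-Z and Ñ)
# plus a blank, so that two-word names such as "Juan Manuel" are not flagged.
ALLOWED = set("ABCDEFGHIJKLMNOPQRSTUVWXYZÑ ")


def is_clean(nombre: str) -> bool:
    """Return True if every character of the right-trimmed name is allowed.

    Follows the plan: count the allowed characters and compare the total
    with the length of the name. The two are equal only when no other
    character (a digit, a comma, ...) is present.
    """
    trimmed = nombre.rstrip()  # fixed-width strings are padded with blanks
    n_allowed = sum(1 for ch in trimmed.upper() if ch in ALLOWED)
    return n_allowed == len(trimmed)


for nombre in ["Juan Manuel", "Alberto", "Lu1s"]:
    print(nombre, "ok" if is_clean(nombre) else "check")
```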

Is there another easier way to do this task?

Kind regards,

Andrés

Mg. Andrés Burga León
Coordinator of Analysis and Informatics
Unidad de Medición de la Calidad Educativa (UMC)
Ministerio de Educación del Perú
Av. de la Arqueología s/n (cuadra 2)
Lima 41
Perú
Phone 615-5840 - 6155800 ext. 1212
http://www2.minedu.gob.pe/umc/

Re: Searching for characters different to A, B, ...Z in string

David Marso
Administrator
"Is there another easier way to do this task?"

DATA LIST / nombre (A20).
begin data
Juan Manuel
Alberto
Ana
Teresa Marilu
Alberto2
Te1resa Maril11u
END DATA.
* Count the characters that are not A-Z, Ñ or a blank.
LOOP #=1 TO LENGTH(RTRIM(nombre)).
COMPUTE BADDATA =SUM(BADDATA,(INDEX(UPCASE(SUBSTR(nombre,#,1)) ,"ABCDEFGHIJKLMNOPQRSTUVWXYZÑ ",1) EQ 0)).
END LOOP.
LIST.


NOMBRE                BADDATA

Juan Manuel               .00
Alberto                   .00
Ana                       .00
Teresa Marilu             .00
Alberto2                 1.00
Te1resa Maril11u         3.00


Number of cases read:  6    Number of cases listed:  6
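To cross-check the logic outside SPSS, here is a plain-Python sketch that mirrors the loop above character by character. The helper name `baddata` is hypothetical. It reproduces the BADDATA column for the six sample names.

```python
# Whitelist used in the SPSS syntax above: A-Z, Ñ and a blank.
VALID = set("ABCDEFGHIJKLMNOPQRSTUVWXYZÑ ")


def baddata(nombre: str) -> int:
    """Count the characters of the right-trimmed name that are not in VALID.

    Mirrors LOOP #=1 TO LENGTH(RTRIM(nombre)) with the per-character
    INDEX(...) EQ 0 test: each character outside the list adds 1.
    Unlike SPSS, an empty name gives 0 here rather than system-missing.
    """
    return sum(1 for ch in nombre.rstrip().upper() if ch not in VALID)


names = ["Juan Manuel", "Alberto", "Ana", "Teresa Marilu",
         "Alberto2", "Te1resa Maril11u"]
for n in names:
    print(f"{n:<20}{baddata(n):>4}")
```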
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
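---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." ("Do not give what is holy to dogs, nor cast your pearls before swine, lest they trample them under their feet.")
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"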

Re: Searching for characters different to A, B, ...Z in string

ANDRES ALBERTO BURGA LEON

Excellent, this definitely solves my problem. Thank you very much, David.


Andrés

--
View this message in context:
http://spssx-discussion.1045642.n5.nabble.com/Searching-for-characters-different-to-A-B-Z-in-string-tp5107681p5107733.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD



Re: Searching for characters different to A, B, ...Z in string

David Marso
Administrator
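Glad that helps.
The key is the last argument to the INDEX function. It chops the list of valid characters (the second argument, which INDEX calls the needle) into single characters, so a match on any one of them counts.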
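The optional third argument of INDEX(haystack, needle, divisor) splits the second argument (the needle) into pieces of that length. The function then returns the position of the first occurrence of any of those pieces. In the syntax above, the list of valid characters is the needle. A single-character haystack therefore scores 1 if it matches any listed character and 0 otherwise. The hypothetical Python helper `spss_index` below sketches that behaviour. It works on characters (like CHAR.INDEX) rather than bytes, and the divisibility check is an assumption about how SPSS handles uneven splits.

```python
from typing import Optional


def spss_index(haystack: str, needle: str, divisor: Optional[int] = None) -> int:
    """Hypothetical emulation of SPSS INDEX with the optional divisor.

    Works on Python characters, so it behaves like CHAR.INDEX rather than
    the byte-based INDEX. Returns a 1-based position, or 0 if nothing is found.
    """
    if divisor is None:
        pieces = [needle]
    else:
        # Assumption: the needle must split evenly into pieces of `divisor`.
        if divisor <= 0 or len(needle) % divisor != 0:
            raise ValueError("needle length must be a multiple of divisor")
        pieces = [needle[i:i + divisor] for i in range(0, len(needle), divisor)]
    # Earliest position in the haystack at which any piece occurs.
    hits = [pos for pos in (haystack.find(p) for p in pieces) if pos >= 0]
    return min(hits) + 1 if hits else 0


VALID = "ABCDEFGHIJKLMNOPQRSTUVWXYZÑ "
print(spss_index("U", VALID, 1))  # 1: a valid letter
print(spss_index("1", VALID, 1))  # 0: a digit is not in the list
```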

Re: Searching for characters different to A, B, ...Z in string

Jon K Peck
I suggest reversing the test and counting digits.
* Count the digits 0-9 in each name.
LOOP #=1 TO CHAR.LENGTH(RTRIM(nombre)).
COMPUTE BADDATA =SUM(BADDATA,(CHAR.INDEX(CHAR.SUBSTR(nombre,#,1),"0123456789",1))).
END LOOP.

That way, characters not listed in the original solution, such as accented letters, will not trigger a count.
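Here is a small Python sketch contrasting the two tests. The function names are hypothetical. Counting digits ignores accented letters, while the whitelist from the earlier syntax counts them as invalid.

```python
DIGITS = set("0123456789")
VALID = set("ABCDEFGHIJKLMNOPQRSTUVWXYZÑ ")


def count_digits(nombre: str) -> int:
    """Digit-counting variant: only 0-9 are counted."""
    return sum(1 for ch in nombre if ch in DIGITS)


def count_invalid(nombre: str) -> int:
    """Whitelist variant: everything outside A-Z, Ñ and a blank is counted."""
    return sum(1 for ch in nombre.rstrip().upper() if ch not in VALID)


for n in ["José", "Raúl", "Lu1s"]:
    print(n, count_digits(n), count_invalid(n))
```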

One other thing: with Statistics 16 or later, use the CHAR. functions.  They give the same result whether in Unicode or code page mode.  The old, deprecated functions work on bytes, not characters, and the number of bytes per character can be different in Unicode and code page mode.  And if the data above had been, say, Japanese, Korean, or Chinese, the original code would have failed in either mode.  (Of course, it would also have failed because the list of alpha characters would have been seriously incomplete.)
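To illustrate the byte/character distinction, this Python sketch compares character counts with byte counts. It assumes UTF-8 for Unicode mode and Windows-1252 as a typical Western European code page. Under UTF-8, a byte-based loop over a name such as "Peña" visits more positions than there are characters, and a one-byte slice can land in the middle of a character.

```python
def lengths(name: str) -> tuple:
    """Return (characters, UTF-8 bytes, Windows-1252 bytes) for a name."""
    return (len(name),
            len(name.encode("utf-8")),
            len(name.encode("cp1252")))


for name in ["Ana", "Peña", "Ibáñez"]:
    print(name, lengths(name))

# A byte-wise "substring at position 3, length 1" on the UTF-8 bytes
# returns only the first half of the two-byte ñ.
print("Peña".encode("utf-8")[2:3])
```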

Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
new phone: 720-342-5621

Re: Searching for characters different to A, B, ...Z in string

David Marso
Administrator
I believe the original question was to detect characters other than <space>, A..Z or Ñ, not just numbers. Perhaps someone fat-fingered the input and commas, !, @, %, etc. ended up in the field.
Indeed, the CHAR. functions are likely more appropriate; however, my ancient version does not support them, and I like to test my code.
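The whitelist can also be written as a regular-expression character class. That makes the difference pointed out above easy to see: the whitelist reports punctuation, but a digit-only test does not. This is a plain-Python sketch, and the function names are hypothetical.

```python
import re
from typing import List

# Anything that is not A-Z, Ñ or a blank: the complement of the whitelist
# in the SPSS syntax, written as a regular-expression character class.
INVALID = re.compile(r"[^A-ZÑ ]")


def invalid_chars(nombre: str) -> List[str]:
    """Characters of the trimmed, upper-cased name that fall outside the whitelist."""
    return INVALID.findall(nombre.rstrip().upper())


def digit_chars(nombre: str) -> List[str]:
    """Only the digits, as in the digit-counting variant."""
    return re.findall(r"[0-9]", nombre)


for n in ["Ju,an Lu!s", "Lu1s", "Ana"]:
    print(n, invalid_chars(n), digit_chars(n))
```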

Re: Searching for characters different to A, B, ...Z in string

Jon K Peck
Hard to tell. The original said

> I need to check if there are any typos like a number in the names (for example Lu1s instead of Luis).

and

> Then, for the 15 new variables count the number of A, B .... Z. and create 27 new variables (I also need to count Ñ)


But Spanish orthography also uses vowels with an acute accent (á, é, í, ó, ú) and ü with a diaeresis.

The best solution might be to use a table of known first and last names and compare against that.  That would give some false positives, but it would be much more accurate for other kinds of spelling errors.
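Here is a minimal sketch of the lookup-table idea. It uses Python's standard `difflib` to suggest close matches. The short `KNOWN_NAMES` list is a hypothetical stand-in for a real table of first and last names. Words missing from the table are flagged, and this is where the false positives come from: a legitimate surname that is absent from the list, such as Pérez below, is flagged with no suggestion.

```python
import difflib

# Hypothetical reference table; in practice this would be a full list of
# known first and last names.
KNOWN_NAMES = sorted({"JUAN", "MANUEL", "ALBERTO", "ANA",
                      "TERESA", "MARILU", "LUIS"})


def unknown_words(nombre: str):
    """Return (word, suggestions) for every word of the name not in KNOWN_NAMES."""
    flagged = []
    for word in nombre.upper().split():
        if word not in KNOWN_NAMES:
            # Closest known names by difflib's similarity ratio (cutoff 0.6).
            suggestions = difflib.get_close_matches(word, KNOWN_NAMES, n=3, cutoff=0.6)
            flagged.append((word, suggestions))
    return flagged


print(unknown_words("Juan Manuel"))
print(unknown_words("Lu1s"))
print(unknown_words("Ana Pérez"))
```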


Re: Searching for characters different to A, B, ...Z in string

ANDRES ALBERTO BURGA LEON

Thanks to everybody for the answers.

In fact, the names are not typed but captured by a scanner. The scanning software doesn't accept acute accents or the diaeresis, but it can misread a letter as any of a number of other characters, such as periods, commas, etc.

In principle, the only valid characters are those listed in David's syntax, so it works well for me (after adding the CHAR. prefix to the functions).
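Because the errors come from a scanner, reviewers may find it useful to know where the offending character sits. The following Python sketch uses a hypothetical helper, `bad_positions`, to report 1-based character positions with the same whitelist.

```python
VALID = set("ABCDEFGHIJKLMNOPQRSTUVWXYZÑ ")


def bad_positions(nombre: str):
    """Return (1-based position, character) for each character outside VALID.

    Positions count characters, as CHAR.SUBSTR does, so they can be used
    to locate the misread character in the scanned record.
    """
    return [(pos, ch) for pos, ch in enumerate(nombre.rstrip(), start=1)
            if ch.upper() not in VALID]


print(bad_positions("Te1resa Maril11u"))
print(bad_positions("Ana, Luis."))
```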
Andrés
