Address correction


Address correction

Sea, Carleton, VBAVACO

Help!

 

I have a variable, Address, in which the numbers run into the letters.  Is there syntax that will put a space between the last digit of the street number and the first letter that follows it?  Your assistance would be greatly appreciated.

 

 

 

928NW 14TH ST  →  928 NW 14th ST

929NW 15TH STREET  →  929 NW 15th STREET

 

Carleton Sea

 


Re: Address correction

Albert-Jan Roskam

Hi,


Untested, but the regex (regular expression) works:


BEGIN PROGRAM.
import re
def correctAddress(s):
    # Match a leading street number that runs straight into a letter,
    # then put a single space between the two groups.
    regex = r"^(?P<street_no>[0-9]+)(?P<suffix>[^\W\d])"
    return re.sub(regex, r"\g<1> \g<2>", s, flags=re.I | re.U)
END PROGRAM.

SPSSINC TRANS RESULT=address_corrected TYPE=500  /FORMULA = "correctAddress(s)".

regards,
Albert-Jan

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

Re: Address correction

Jon K Peck
Here is a slightly more compact version of Albert-Jan's solution:

spssinc trans result=address_corrected type=500
/formula "re.sub(r'^(\d+)([^\d\s])', r'\1 \2', s)".

It looks for an initial string of digits followed immediately by a character that is neither a digit nor whitespace, and inserts a blank between them.  Excluding digits from the second group keeps an already-correct field such as 928 NW from being split as 92 8 NW.  Any later runs such as 14th, and fields that do not start with digits, are left unmodified.
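
To try the substitution outside SPSS, here is a minimal sketch in plain Python 3. The helper name add_space_after_number is hypothetical, and the second group is written as [^\d\s] rather than \S, which is an assumption made here so that backtracking cannot split a number that is already followed by a blank.

```python
import re

# Leading digits followed by a character that is neither a digit nor whitespace.
PATTERN = re.compile(r'^(\d+)([^\d\s])')

def add_space_after_number(address):
    """Insert one blank between a leading street number and the text run into it."""
    return PATTERN.sub(r'\1 \2', address)

for raw in ['928NW 14TH ST', '929NW 15TH STREET', '928 NW 14TH ST', 'NW 14TH ST']:
    print(repr(raw), '->', repr(add_space_after_number(raw)))
```

Only the first two inputs change; the already-spaced address and the one without a leading number come back as they went in.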


One could write this procedurally in traditional syntax, but regular expression pattern matching is well worth learning if you work with strings a lot.

This requires the Python Essentials and, for older versions of the Essentials, you would also need to download and install the SPSSINC TRANS extension command.  Both are available from the SPSS Community site (www.ibm.com/developerworks/spssdevcentral).

Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
phone: 720-342-5621





Re: Address correction

Albert-Jan Roskam

Nice. I thought using groups in re.sub could only be done with e.g. \g<1>, but you use \1 (like in R). Will this also match Señor Jalapeño Street?
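
For what it's worth, Python's re.sub accepts all three reference styles in the replacement string. A small sketch in Python 3; the group names number and rest are made up for illustration:

```python
import re

address = '928NW 14TH ST'
pattern = r'^(?P<number>\d+)(?P<rest>[^\d\s])'

by_backslash = re.sub(pattern, r'\1 \2', address)             # numbered, short form
by_g_number = re.sub(pattern, r'\g<1> \g<2>', address)        # numbered, \g form
by_g_name = re.sub(pattern, r'\g<number> \g<rest>', address)  # named

print(by_backslash, by_g_number, by_g_name, sep='\n')

# The \g<1> form matters when a literal digit follows the reference:
# r'\g<1>0' is group 1 followed by '0', whereas r'\10' would mean group 10.
print(re.sub(r'^(\d+)', r'\g<1>0', '928'))
```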


Re: Address correction

Maguin, Eugene
In reply to this post by Sea, Carleton, VBAVACO

Here’s a tested but far, far less compact syntax version.

 

data list / addr(a20).
begin data
928NW 14TH ST
929NW 15TH STREET
end data.
execute.

* Scan each address for its first letter and insert a blank in front of it.
string #c(a1).
compute #j=char.length(addr).
loop #i=1 to #j.
compute #c=upcase(substr(addr,#i,1)).
compute #k=char.index(#c,'ABCDEFGHIJKLMNOPQRSTUVWXYZ',1).
if (#k NE 0) addr=concat(substr(addr,1,#i-1),' ',substr(addr,#i)).
end loop if (#k ne 0).
execute.
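
For comparison, the same procedural idea can be sketched in Python 3 without regular expressions. The function name is hypothetical, and unlike the LOOP above this version leaves the string alone when the first letter is the first character or already follows a blank; that guard is an addition, not part of the syntax above.

```python
def split_number_from_letters(addr):
    """Insert a blank before the first letter, mirroring the LOOP above."""
    for i, ch in enumerate(addr):
        if ch.upper() in 'ABCDEFGHIJKLMNOPQRSTUVWXYZ':
            # Added guard: skip the insert when the letter starts the string
            # or is already preceded by a blank.
            if i > 0 and addr[i - 1] != ' ':
                return addr[:i] + ' ' + addr[i:]
            return addr
    # No letter found: nothing to separate.
    return addr

for raw in ['928NW 14TH ST', '929NW 15TH STREET', '928 NW 14TH ST']:
    print(repr(raw), '->', repr(split_number_from_letters(raw)))
```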

 

Gene Maguin

 

 


Re: Address correction

Jon K Peck
In reply to this post by Albert-Jan Roskam
If Señor Jalapeño Street were the whole field, it would be left alone, but that re would work with
123Señor Jalapeño Street
or even
123ñor Jalapeño Street
Named groups are something of a latecomer to regular expressions.  The \x notation is the original form IIRC.  At least that's the form I learned 100 years ago or so.  You just have to be careful to flag the regular expression with a preceding r or double the backslashes in Python.
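
A short sketch of both points, assuming Python 3 (where str patterns are Unicode-aware by default) and using [^\d\s] for the second group as an illustrative choice:

```python
import re

pattern = r'^(\d+)([^\d\s])'

# Raw-string and doubled-backslash replacements are the same string.
raw = re.sub(pattern, r'\1 \2', '123Señor Jalapeño Street')
doubled = re.sub(pattern, '\\1 \\2', '123Señor Jalapeño Street')
print(raw)
print(doubled)

# Without the r prefix or doubling, '\1' is the single control character chr(1),
# so the replacement no longer refers to the groups at all.
assert '\1 \2' == chr(1) + ' ' + chr(2)

# A non-ASCII letter right after the digits is handled the same way,
# and a field with no leading digits is returned unchanged.
print(re.sub(pattern, r'\1 \2', '123ñor Jalapeño Street'))
print(re.sub(pattern, r'\1 \2', 'Señor Jalapeño Street'))
```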

Note also that in this case it was not necessary to pull out the code for the conversion into a separate function in a begin program block, although that is always legal.  At some point an expression becomes too complicated for the SPSSINC TRANS parser to deal with, and you have to write a separate function.







Re: Address correction

Art Kendall
In reply to this post by Jon K Peck
These regular expressions seem reminiscent of TECO and SNOBOL.

Do you have a link to a brief introduction to regular expressions?
Art Kendall
Social Research Consultants

Re: Address correction

Jon K Peck
Regular expressions arose around the same time as TECO and SNOBOL, IIRC.

They vary some from one language to another, mainly in extensions to the basics.  For Python, you can see the help for re in the regular Python help.

There is a more expository explanation on the Python website in the HOWTOs section:
http://docs.python.org/2/howto/regex.html
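
As a very brief taste of what the HOWTO covers, here is a sketch in Python 3 of a few basic building blocks (anchors, character classes, quantifiers, groups), using one of the addresses from this thread. The choice of examples is illustrative only.

```python
import re

address = '928NW 14TH ST'

# \d+ : one or more digits; re.findall returns every non-overlapping match.
print(re.findall(r'\d+', address))        # ['928', '14']

# ^ anchors the match at the start of the string, so only the leading run is found.
print(re.findall(r'^\d+', address))       # ['928']

# [A-Z]+ : a character class with a quantifier; $ anchors at the end.
print(re.findall(r'[A-Z]+$', address))    # ['ST']

# Parentheses capture groups that can be read back from the match object.
m = re.match(r'(\d+)([A-Z]+)', address)
print(m.group(1), m.group(2))             # 928 NW
```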

There is a nice O'Reilly book on the subject.

Googling for "regular expression" reports 24,200,000 hits.  I notice that there is a web site, regular-expressions.info, that boasts of being "The Premier website about Regular Expressions".  I don't know where they learned how to capitalize, but a little work with a regular expression could fix that.







Re: Address correction

Albert-Jan Roskam
In reply to this post by Art Kendall


This is also considered to be a standard work:

http://www.amazon.com/Mastering-Regular-Expressions-Jeffrey-Friedl/dp/0596528124
It is actually quite readable, but the chapter comparing seven (or so) different regular expression engines was a bit too much for me.


Btw, if you use regexes in R, you can use perl=TRUE to get much the same regex behavior as Python (and Perl).
But in R you have grep, regexpr, gsub, sub, gsubfn and strapply, just to make things a *little* less clear.
