Python patterns to extract zip codes from right end of address string


Art Kendall
This can be done with some messy syntax.

However, I thought there were some Python-based transforms that would find out whether there were patterns in strings, but I must not be searching the archives correctly.

How would I use Python character pattern matching to find out whether TailEnd contains a complete ZIP+4, just a 5-digit ZIP, or no ZIP at all?
Where in the GUI are the menus for this?

[ZIP is USA jargon for postal code]

*make up some data.
data list list/address (a131) WantZip (a5) WantPlus4(a5).
begin data.
"123 oak st #4 someplace, RI 02913-1234" "02913" "blank"
"5678 maple lane townname, md 20111" "20111" "blank"
"9011 cedar place villagename" "blank" "blank"
end data.
Var labels
address "input address"
WantZip "What looking for in ZIP5"
WantPlus4 "What looking for in Plus4".
string Zip5(a5) Plus4 (a4) TailEnd (a10).
compute HowLong = char.length(address).
compute TailEnd = char.substr(address,HowLong-9,10).
execute.
Art Kendall
Social Research Consultants

Re: Python patterns to extract zip codes from right end of address string

Andy W
Here is a simplified example of using regular expressions in Python plus the SPSSINC TRANS command. The regex could surely be made more complicated to handle other cases, but hopefully this is sufficient to illustrate the workflow:

**********************************.
*make up some data.
data list list/address (a131) WantZip (a5) WantPlus4(a5).
begin data.
"123 oak st #4 someplace, RI 02913-1234" "02913" "blank"
"5678 maple lane townname, md 20111" "20111" "blank"
"9011 cedar place villagename" "blank" "blank"
end data.

*Using regular expressions to find 5 digit zip codes.
BEGIN PROGRAM python.
import re

#define your own function
def SearchZip(MyStr):
  SearchZ5 = re.compile("\d{5}") #5 digits in a row
  SearchZ4 = re.compile("-\d{4}") #a dash and then 4 digits
  Zip5 = re.search(SearchZ5,MyStr)
  Zip4 = re.search(SearchZ4,MyStr)
  #these return none if there is no match, so just replacing with
  #an empty string if there was no match
  if Zip5:
    Z1 = Zip5.group()
  else:
    Z1 = ""
  if Zip4:
    Z2 = Zip4.group()
  else:
    Z2 = ""
  return [Z1,Z2]

#Lets try a couple of examples.
test = ["5678 maple lane townname, md 20111",
        "123 oak st #4 someplace, RI 02913-1234",
        "9011 cedar place villagename"]

for i in test:
  print(i)
  print(SearchZip(i))
END PROGRAM.

*Now can use TRANS to make new variables.
SPSSINC TRANS RESULT=Zip5 Zip4 TYPE=5 5
/FORMULA SearchZip(address).
**********************************.

Code-golf-wise, with your examples it probably doesn't save much over just doing the text processing directly in SPSS, but you can get quite a bit fancier with regexes than shown here.
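
One caveat with the simplified pattern, shown as a minimal sketch: because the five-digit pattern is not anchored, re.search returns the leftmost run of five digits, which can be a house number rather than the ZIP. The address below is made up for illustration, and taking the last findall hit is only one possible workaround, not something tested inside SPSS.

```python
import re

# Unanchored pattern like the one in the simplified example: any five digits in a row.
five_digits = re.compile(r"\d{5}")

# Hypothetical address whose house number also has five digits.
address = "12345 oak st someplace, RI 02913"

first_hit = five_digits.search(address).group()  # leftmost match
all_hits = five_digits.findall(address)          # every match, left to right
last_hit = all_hits[-1] if all_hits else ""      # rightmost match

print(first_hit)  # 12345 (the house number, not the ZIP)
print(last_hit)   # 02913
```

Anchoring the pattern to the end of the string is another way to avoid picking up the house number.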
Andy W
apwheele@gmail.com
http://andrewpwheeler.wordpress.com/

Re: Python patterns to extract zip codes from right end of address string

Art Kendall
Thank you.
{Head slap.} I could not recall "regex" and "SPSSINC TRANS".

In 1971 I learned about pattern matching in TECO and SNOBOL.
In the early '90s I wrote WordPerfect macros that would select numeric characters, alphabetic characters, white space, punctuation marks, sentences, paragraphs, etc.

But I do not come across regex often enough to remember how to do it.


Art Kendall
Social Research Consultants

Re: Python patterns to extract zip codes from right end of address string

Andy W
In reply to this post by Andy W

Jon made a couple of suggestions off-list about best practices, and I will share with the list:

1) within the function there is no need to recompile the regex.
2) regex strings should pretty much always be written as raw strings, using the [r'stuff here'] form, to prevent Python from accidentally interpreting escape sequences in the string.

And specific to this situation

3) Zip codes are likely to be at the end of the string

An updated example taking these into account is below:


**********************************************.
BEGIN PROGRAM python.
import re

#move compile outside of function, it is globally available 
#same as the defined function will be
SearchZ = re.compile(r"(\d{5})(-\d{4})?$") #5 digits in a row @ end of string
                                           #and optionally dash plus 4 digits

#define your own function
def SearchZip2(MyStr):
  Zip = re.search(SearchZ,MyStr.rstrip()) #need rstrip to remove trailing 
                                          #white space, or could amend regex
  #these return None if there is no match, so just replacing with
  #an empty string if there was no match
  if Zip:
    Z1 = Zip.group(1)
    if Zip.group(2):
      Z2 = Zip.group(2)
    else:
      Z2 = ""
  else:
    [Z1,Z2] = ["",""]
  return [Z1,Z2]
END PROGRAM.

SPSSINC TRANS RESULT=Zip5_2 Zip4_2 TYPE=5 5 
/FORMULA SearchZip2(address).
**********************************************.

This makes for a more complicated search pattern, r"(\d{5})(-\d{4})?$", that uses grouping to return different parts of the match and an optional clause (?) for the dash and four digits at the end of the string ($). I strip the whitespace from the string that SPSS passes in using .rstrip, although the regex could be amended as well to search for that.

The inner if statements could probably be cleaned up a little with some looping (or at least made more general). Zip.groups() does return a tuple that can be iterated over, but only when there is a match; when the search fails, Zip is None and has no groups method, so that case still needs its own branch.
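
As a hedged sketch of that cleanup (not tested inside SPSS): the match object's groups method accepts a default value for groups that did not take part in the match, so only the no-match case needs its own branch. The name SearchZip3 is made up here to avoid clashing with the functions above.

```python
import re

# Same pattern as above: 5 digits at the end, optionally a dash plus 4 digits.
SearchZ = re.compile(r"(\d{5})(-\d{4})?$")

def SearchZip3(MyStr):
    """Return [zip5, plus4], using "" for any part that is missing."""
    Zip = SearchZ.search(MyStr.rstrip())
    if Zip is None:
        # No ZIP at the end of the string at all.
        return ["", ""]
    # groups(default="") substitutes "" for the optional group when it is absent.
    return list(Zip.groups(default=""))

print(SearchZip3("123 oak st #4 someplace, RI 02913-1234  "))  # ['02913', '-1234']
print(SearchZip3("5678 maple lane townname, md 20111"))        # ['20111', '']
print(SearchZip3("9011 cedar place villagename"))              # ['', '']
```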

Andy W
apwheele@gmail.com
http://andrewpwheeler.wordpress.com/

Re: Python patterns to extract zip codes from right end of address string

Art Kendall
Thanks.

I don't know why the address data was entered all in one field.

However, others may have similar situations.

I had cut the last 10 characters into a variable "TailEnd" and had run the function on that.

I'll give this version a try this afternoon.
Art Kendall
Social Research Consultants

Re: Python patterns to extract zip codes from right end of address string

Albert-Jan Roskam-2
In reply to this post by Andy W
Hi Andy,

How do you like this version:

>>> SearchZ = re.compile(r"(\d{5})(?:-)?(\d{4})?\s*$")
>>> SearchZ.search("12345  ").groups()
('12345', None)
>>> SearchZ.search("12345-1234  ").groups()
('12345', '1234')
>>> SearchZ.search("12345-12345  ").groups()
('12345', None)
 
Not sure (no spss here), but maybe TRANS converts None values to "" for you?
>>> zip_pattern = re.compile(r"(\d{5})(-\d{4})?\s*$")
>>> def search_zip(value):
    m = zip_pattern.search(value)
    if m:
        return m.groups()
    return None, None   # not needed?

wrt raw strings: re.escape can occasionally also be handy
>>> re.escape("127.0.0.1")  # the dots won't be seen as meta characters
'127\\.0\\.0\\.1'
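
A small sketch of both points, using made-up strings: without the r prefix Python turns \b into a backspace character before the regex engine ever sees it, and re.escape helps when a piece of literal text (here a hypothetical state abbreviation written with a period) has to be spliced into a larger pattern.

```python
import re

text = "townname, md. 20111"

# Non-raw: "\b" is a backspace character, so this never acts as a word boundary.
not_raw = re.search("\b\\d{5}\b", text)
# Raw: "\b" reaches the regex engine as a word-boundary assertion.
raw = re.search(r"\b\d{5}\b", text)

print(not_raw)      # None
print(raw.group())  # 20111

# re.escape makes literal text safe to embed: the "." must match only a period.
state = "md."
pattern = re.compile(re.escape(state) + r"\s+(\d{5})\s*$")
print(pattern.search(text).group(1))          # 20111
print(pattern.search("townname, mdx 20111"))  # None
```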

Regards,

Albert-Jan



~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

All right, but apart from the sanitation, the medicine, education, wine, public order, irrigation, roads, a

fresh water system, and public health, what have the Romans ever done for us?

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~




Re: Python patterns to extract zip codes from right end of address string

Jon K Peck
It has always annoyed me that a failed search returns None, which then raises an exception when you try to call a match object method on it. It would be nicer if it returned an object that had those methods, with group then returning None.

And, yes, SPSSINC TRANS automatically converts None into "" for a string variable (and sysmis for a numeric).
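
To illustrate the point about failed searches, a minimal sketch: calling groups on the None that a failed search returns raises AttributeError, so the result has to be checked first. The helper name safe_groups is hypothetical, and the None-to-empty-string conversion done by SPSSINC TRANS is not reproduced here.

```python
import re

zip_pattern = re.compile(r"(\d{5})(-\d{4})?\s*$")

def safe_groups(pattern, value):
    """Return the captured groups, or a tuple of None values if nothing matched."""
    m = pattern.search(value)
    if m is None:
        # pattern.groups is the number of capturing groups in the pattern.
        return (None,) * pattern.groups
    return m.groups()

try:
    zip_pattern.search("9011 cedar place villagename").groups()
except AttributeError as err:
    # Raised because the failed search returned None.
    print("failed search:", err)

print(safe_groups(zip_pattern, "9011 cedar place villagename"))  # (None, None)
print(safe_groups(zip_pattern, "townname, md 20111  "))          # ('20111', None)
```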




Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
phone: 720-342-5621





Re: Python patterns to extract zip codes from right end of address string

Andy W
In reply to this post by Albert-Jan Roskam-2
Thanks Jon for the note about None values.

Albert - that works as well. Since the (?:-)? grouping is optional, it would also return a match for 9 digits in a row, which is probably ok for zip codes at the end of the string. You may consider matching if there is a space in between the 5 and 4 digit portions as well.

A lot of possibilities with regex; probably none will work perfectly for a large corpus of hand-typed records. Depending on how lazy I am or how much time I have, I often don't even bother to parse and clean the addresses; I just submit them to a geocoding engine and live with it. (In my experience, the returns diminish pretty quickly the more I putz with them.)
Andy W
apwheele@gmail.com
http://andrewpwheeler.wordpress.com/

Re: Python patterns to extract zip codes from right end of address string

Albert-Jan Roskam-2
Hi Andy,

A group like (?: pattern) is a non-capturing group. And this is an optional non-capturing group:  (?: pattern)?
So in the second meaning ("optional") the question mark is a quantifier, actually a shorthand for {,1} or {0,1}.
If you remove ?: the hyphen will be returned as a separate group, which I figured was not what you want.
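
A short sketch of the difference, using the pattern from earlier in the thread (the variable names are made up for this example):

```python
import re

non_capturing = re.compile(r"(\d{5})(?:-)?(\d{4})?\s*$")
capturing = re.compile(r"(\d{5})(-)?(\d{4})?\s*$")

print(non_capturing.search("12345-1234").groups())  # ('12345', '1234')
print(capturing.search("12345-1234").groups())      # ('12345', '-', '1234')

# {0,1} is the long form of the ? quantifier.
long_form = re.compile(r"(\d{5})(?:-){0,1}(\d{4}){0,1}\s*$")
print(long_form.search("12345-1234").groups())      # ('12345', '1234')
```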


And if you use named groups and/or the VERBOSE flag you can make things more readable. Although in this case it's kind of overkill.
>>> import re
>>> m = re.search(r"""(?P<prefix>\d{5})  # five-digit prefix
                      (?:-)?  # optional hyphen
                      (?P<suffix>\d{4})?  # optional four-digit suffix

                      \s*$  # trailing blanks, if any"""
                  , "12345-1234"
                  , re.VERBOSE)
>>> m.group("prefix")
'12345'


Regards,

Albert-Jan





Re: Python patterns to extract zip codes from right end of address string

Art Kendall
I am not sure how to include Albert-Jan's suggestion.
I made Andy's first suggestion work by putting the last 10 characters into a variable TailEnd and working on that. [My addresses are all for cases in state courts, so they do not have "USA" in them.]

If someone is going to make a more widely applicable ZIP finder syntax available, perhaps the syntax would include comments like this:
* ----------
A USA Post Office address often ends with these kinds of ZIP codes (post codes).
#####
##### USA
#####USA
#####-####
#####-#### USA
#####-####USA
#########
######### USA
#########USA
*------- .



==========================

Parsing US postal addresses is still a developing field.  There have been some recent standards suggested by some colleagues who are geographers specializing in addressing systems.
However, the standards have not been implemented.

In checking the reliability of data entry, I use
www.findlatitudeandlongitude.com/batch-geocode/
to see whether records whose Returned Address and Latitude/Longitude pair match come up as duplicate cases.

As a general rule, if you have control over the data entry, I STRONGLY suggest that you look up the USPS addressing rules and use those as separate fields when entering addresses.
Art Kendall
Social Research Consultants

Re: Python patterns to extract zip codes from right end of address string

Andy W
Hi Art,

FYI I posted a full example at my blog of creating the function and using SPSSINC TRANS, http://andrewpwheeler.wordpress.com/2014/08/22/using-regular-expressions-in-spss/, that incorporates Albert's suggestion.

My first suggestion would be to attempt to learn regexes a bit. I realize they are somewhat complicated, but ultimately you don't want to have to come back to the list asking for advice for every little difference. If you have data of any magnitude, such differences will inevitably occur. Your update with "USA" on the end just adds a little complexity to the already complex search! Basically I just added another group to search for "Whitespace + USA" and made the match optional: (?:\s?USA)? . Everyone's situation with addresses is likely to be slightly different, so I suspect it is impossible to make a perfect zip code extractor. For this example you may want to search for both upper and lower case USA, search with periods (U.S.A.), or get rid of these punctuation marks before you submit the string to Python.

Andy

********************************************.
BEGIN PROGRAM Python.
import re

#SearchZ = re.compile(r"(\d{5})(?:[\s-])?(\d{4})?\s*$") #previous
SearchZ = re.compile(r"(\d{5})(?:[\s-])?(\d{4})?(?:\s?USA)?\s*$")

def SearchZip(MyStr):
  Zip = re.search(SearchZ,MyStr)
  #these return None if there is no match, so just replacing with
  #a tuple of two None's if no match
  if Zip:
    return Zip.groups()
  else:
    return (None,None)

#Lets try a couple of examples.
test = ["5678 maple lane townname, md 20111 USA",
        "5678 maple lane townname, md 20111 \t",
        "123 oak st #4 someplace, RI 02913-1234   ",
        "9011 cedar place villagename",
        "123 oak st #4 someplace, RI 029131234",
        "123 oak st #4 someplace, RI 02913 1234 USA",
        "123 oak st #4 someplace, RI 02913 1234USA"]

for i in test:
  print([i])
  print(SearchZip(i))
END PROGRAM.
********************************************.
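
The idea of cleaning the string before matching might look like the following sketch. The normalize helper and its rules (uppercase, drop periods, turn commas into spaces, collapse whitespace) are assumptions for illustration rather than anything from the thread, and real data will likely need more rules than these.

```python
import re

SearchZ = re.compile(r"(\d{5})(?:[\s-])?(\d{4})?(?:\s?USA)?\s*$")

def normalize(address):
    """Hypothetical pre-cleaning step applied before the regex search."""
    cleaned = address.upper()                    # usa -> USA
    cleaned = cleaned.replace(".", "")           # U.S.A. -> USA
    cleaned = cleaned.replace(",", " ")          # stray commas become spaces
    return re.sub(r"\s+", " ", cleaned).strip()  # collapse runs of whitespace

def search_zip_clean(address):
    m = SearchZ.search(normalize(address))
    return m.groups() if m else (None, None)

print(search_zip_clean("123 oak st #4 someplace, RI 02913-1234 u.s.a."))  # ('02913', '1234')
print(search_zip_clean("5678 maple lane townname, md 20111 Usa"))        # ('20111', None)
print(search_zip_clean("9011 cedar place villagename"))                  # (None, None)
```

The trade-off is that the pattern stays short while the cleaning rules carry the messiness, which may be easier to maintain than one long expression.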
Andy W
apwheele@gmail.com
http://andrewpwheeler.wordpress.com/

Re: Python patterns to extract zip codes from right end of address string

Jon K Peck
For any complex re that will live for a while, I would recommend using verbose mode, which allows you to comment the elements.  While re's are very useful, they can be inscrutable.  Here is an example from the Python help.

a = re.compile(r"""\d +  # the integral part
                  \.    # the decimal point
                  \d *  # some fractional digits""", re.X)
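
Applied to the ZIP pattern from this thread, verbose mode might look like the sketch below. One thing to keep in mind is that whitespace inside a verbose pattern is ignored, so a literal space has to be written as \s or placed inside a character class. The name zip_verbose is made up for this example.

```python
import re

zip_verbose = re.compile(r"""
    (\d{5})          # five-digit ZIP
    (?:[\s-])?       # optional separator: one whitespace character or a hyphen
    (\d{4})?         # optional four-digit suffix
    (?:\s?USA)?      # optional country, with or without a leading space
    \s*$             # trailing blanks, if any, then end of string
    """, re.VERBOSE)

tests = ["5678 maple lane townname, md 20111 USA",
         "123 oak st #4 someplace, RI 02913-1234   ",
         "123 oak st #4 someplace, RI 029131234",
         "9011 cedar place villagename"]

results = []
for t in tests:
    m = zip_verbose.search(t)
    results.append(m.groups() if m else (None, None))
    print(results[-1])
```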



Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
phone: 720-342-5621





Re: Python patterns to extract zip codes from right end of address string

Andy W
Oh who doesn't love examining a one-liner for 5 minutes and still not know what exactly is going on?

SearchZ = re.compile(r"(\d{5})(?:\s*-?)?(\d{4})?(?:\s*[Uu]\s*\.?\s*[Ss]\s*\.?\s*[Aa]?\s*\.?)?\s*$")

;)

It could go on ad nauseam. With periods, a regular transcription error in my experience is to input a comma instead. You might also allow white space between the digits for a rogue click of the space bar.

I will go and petition some streets to be renamed "12345-6789 US" and then we will see what wizardry is needed to parse the addresses.
Andy W
apwheele@gmail.com
http://andrewpwheeler.wordpress.com/

Re: Python patterns to extract zip codes from right end of address string

Art Kendall
In reply to this post by Andy W
Yes, learning some Python and regex is certainly on my 'to do' list. It has been there almost ever since I retired in 2001.

I need
to get more of this current pro bono human rights work finished,
to get some paid work to support my pro bono work,
to figure out how to make sure SPSS continues my access to later versions,
to get my current house fixed up and sold,
to find a disadvantaged academic program in social science and human rights to donate my personal library to,
to move to our new retirement condo in Florida,
to review some journal submissions on terrorism and torture issues,
to learn python and regex,
etc. etc.
Art Kendall
Social Research Consultants

Re: Python patterns to extract zip codes from right end of address string

Jon K Peck
Well, learning Python is done.  Art attended the first Python class ever from SPSS (which I taught) that admitted non-SPSS employees.  But it didn't go into regular expressions or retirement condos.


Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
phone: 720-342-5621





Re: Python patterns to extract zip codes from right end of address string

Bruce Weaver
Administrator
In reply to this post by Art Kendall
I think you left one off your list, Art...

- Run for condo board president of Del Boca Vista phase III.  

For anyone who has NO idea what I'm referring to, take a look at this:  http://seinfeld.wikia.com/wiki/Del_Boca_Vista  ;-)



--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING: 
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).
Reply | Threaded
Open this post in threaded view
|

Re: Python patterns to extract zip codes from right end of address string

King Douglas
In reply to this post by Jon K Peck

So Art’s just pulling our collective leg about learning Python?

King

From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Jon K Peck
Sent: Tuesday, August 26, 2014 5:05 PM
To: [hidden email]
Subject: Re: Python patterns to extract zip codes from right end of address string

 

Well, learning Python is done.  Art attended the first Python class ever from SPSS (which I taught) that admitted non-SPSS employees.  But it didn't go into regular expressions or retirement condos.


Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
phone: 720-342-5621




Reply | Threaded
Open this post in threaded view
|

Re: Python patterns to extract zip codes from right end of address string

Jon K Peck
King attended one of the first such courses, which I taught in Munich, around that same time.  So, King, let's see some code.


Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
phone: 720-342-5621




Reply | Threaded
Open this post in threaded view
|

Re: Python patterns to extract zip codes from right end of address string

King Douglas

Jon’s Python course is/was great, but I fear that I wasn’t the best student.  Python is unbeatable when it’s needed, but I don’t compose code in Python nearly as well or as comfortably as I do in SPSS.  But the Python support community is very helpful.

 

I’ll be happy to post some code, but would it be useful?  That depends.

 

Cheers,

 

King Douglas

 

Reply | Threaded
Open this post in threaded view
|

Re: Python patterns to extract zip codes from right end of address string

Art Kendall
I did take the Python course, and I can often follow posted Python syntax.

I have not written Python code in syntax.  On the few occasions I needed to match patterns, I reverted to what I did in 1990 -- use WordPerfect.

I certainly do not know Python to the extent I know SPSS, program evaluation, survey methods, experimental methods, factor analysis, cluster analysis, federal stat policy, etc. on the methods side.
Art Kendall
Social Research Consultants