More String Address Questions?

classic Classic list List threaded Threaded
10 messages Options
Reply | Threaded
Open this post in threaded view
|

More String Address Questions?

DKUKEC
Several months back David M and Art K were kind enough to help me clean up string addresses in my dataset.  Regrettably, I have tried to revise the syntax below (prepared by David M) to clean up another set of addresses that are a bit more complicated.

For example, the new address data contains street numbers and names, along with city, state and zip in one large string variable.  I would like to parse them into the following variables: address, city, state, zipcode.  I am not sure if this is relevant, however, in this case all of the "Address" (number, direction, streetname, suffix,) are all in upper case, and the city is not.

Current Data:
ADDRESS
1234 N DIXIE HW  West Palm Beach FL- 33401
1234 N DIXIE HW  West Palm Beach FL- 33401
111 N SR 715  Belle Glade FL- 33430
111 N SR 715  Belle Glade FL- 33430
777 13TH ST  West Palm Beach FL- 33401
777 13TH ST  West Palm Beach FL- 33401
561 DOUGLAS Ave  West Palm Beach FL- 33401-
561 DOUGLAS Ave  West Palm Beach FL- 33401-

Desired Data:

ADDRESS CITY STATE ZIP


==================

David M's previous syntax used to fix the data below... worked like a charm.

==================


See RINDEX, RTRIM, LPAD, SUBSTR functions.
<TESTED> 
DATA LIST /VAR1 (A50).
BEGIN DATA

PALM BEACH GARDENS  FL  33418
BOYNTON BEACH  FL  33436
LANTANA  FL  33462
DELRAY BEACH  FL  33483
PALM BEACH GARDENS  FL  33410
LAKE WORTH  FL  33460
BOCA RATON  FL  33428
BOCA RATON  FL  33428
BOCA RATON  FL  33428
BOCA RATON  FL  33428
LAKE WORTH  FL  33467

END DATA.
STRING # (A50) ZIP (A5) STATE (A2) CITY (A30).
COMPUTE #=LPAD(RTRIM(VAR1),50).
DO REPEAT @=1 TO 2 / VAR=ZIP STATE .
COMPUTE VAR=SUBSTR(#,RINDEX(#," ")+1).
COMPUTE #=LPAD(RTRIM(SUBSTR(#,1,RINDEX(#," ")-1) ),50).
END REPEAT.
COMPUTE CITY=LTRIM(#).
LIST.
----------------------------
Modernized: <UNTESTED>.
DATA LIST /VAR1 (A50).
BEGIN DATA

PALM BEACH GARDENS  FL  33418
BOYNTON BEACH  FL  33436
LANTANA  FL  33462
DELRAY BEACH  FL  33483
PALM BEACH GARDENS  FL  33410
LAKE WORTH  FL  33460
BOCA RATON  FL  33428
BOCA RATON  FL  33428
BOCA RATON  FL  33428
BOCA RATON  FL  33428
LAKE WORTH  FL  33467

END DATA.
STRING # (A50) ZIP (A5) STATE (A2) CITY (A30).
COMPUTE #=CHAR.LPAD(RTRIM(VAR1),50).
DO REPEAT @=1 TO 2 / VAR=ZIP STATE .
COMPUTE VAR=CHAR.SUBSTR(#,CHAR.RINDEX(#," ")+1).
COMPUTE #=CHAR.LPAD(RTRIM(CHAR.SUBSTR(#,1,CHAR.RINDEX(#," ")-1) ),50).
END REPEAT.
COMPUTE CITY=LTRIM(#).
LIST
Reply | Threaded
Open this post in threaded view
|

Re: More String Address Questions?

Maguin, Eugene
I'm really curious to see what DM and Art come up with, both being the pros that they are. That said, I wonder if you exploit the structural element that you describe, that being that the cities begin with a capital letter followed by a lower case letter. Starting at col 1, see whether the first letter is between 'A' and 'Z' and whether the following letter is between 'a' and 'z'. I'd plod along like this

Loop #i=1 to 50-1.
Compute #a1=substr(address,#i,1).
Compute #a2=substr(address,#i+1,1).
Do If ((#a1 ge 'A' and #a1 le 'Z') and (#a2 ge 'a' and #a2 le 'z').
*** you've found the street address-city break. You can exploit a modification of DM's code from this point on. You'll need include a Break command in here to jump out of the loop. Although, the While keyword, I think it is, could be used, I think, to exit the loop with the location, which is all you need.

An alternative that very directly exploits DMs code would be to search backwards from position 50 since his code adjusts the text so that the 5th digit of the zip is position 50. Actually, in practice, the search starts at a position to the left of position 50 since, I guess, the state is always FL and there's always that odd '-' following FL

Gene Maguin


-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of DKUKEC
Sent: Monday, February 11, 2013 9:53 AM
To: [hidden email]
Subject: More String Address Questions?

Several months back David M and Art K were kind enough to help me clean up string addresses in my dataset.  Regrettably, I have tried to revise the syntax below (prepared by David M) to clean up another set of addresses that are a bit more complicated.

For example, the new address data contains street numbers and names, along with city, state and zip in one large string variable.  I would like to parse them into the following variables: address, city, state, zipcode.  I am not sure if this is relevant, however, in this case all of the "Address"
(number, direction, streetname, suffix,) are all in upper case, and the city is not.

Current Data:
ADDRESS
1234 N DIXIE HW  West Palm Beach FL- 33401
1234 N DIXIE HW  West Palm Beach FL- 33401
111 N SR 715  Belle Glade FL- 33430
111 N SR 715  Belle Glade FL- 33430
777 13TH ST  West Palm Beach FL- 33401
777 13TH ST  West Palm Beach FL- 33401
561 DOUGLAS Ave  West Palm Beach FL- 33401-
561 DOUGLAS Ave  West Palm Beach FL- 33401-

Desired Data:

ADDRESS CITY STATE ZIP


==================

David M's previous syntax used to fix the data below... worked like a charm.

==================


See RINDEX, RTRIM, LPAD, SUBSTR functions.
<TESTED>
DATA LIST /VAR1 (A50).
BEGIN DATA

PALM BEACH GARDENS  FL  33418
BOYNTON BEACH  FL  33436
LANTANA  FL  33462
DELRAY BEACH  FL  33483
PALM BEACH GARDENS  FL  33410
LAKE WORTH  FL  33460
BOCA RATON  FL  33428
BOCA RATON  FL  33428
BOCA RATON  FL  33428
BOCA RATON  FL  33428
LAKE WORTH  FL  33467

END DATA.
STRING # (A50) ZIP (A5) STATE (A2) CITY (A30).
COMPUTE #=LPAD(RTRIM(VAR1),50).
DO REPEAT @=1 TO 2 / VAR=ZIP STATE .
COMPUTE VAR=SUBSTR(#,RINDEX(#," ")+1).
COMPUTE #=LPAD(RTRIM(SUBSTR(#,1,RINDEX(#," ")-1) ),50).
END REPEAT.
COMPUTE CITY=LTRIM(#).
LIST.
----------------------------
Modernized: <UNTESTED>.
DATA LIST /VAR1 (A50).
BEGIN DATA

PALM BEACH GARDENS  FL  33418
BOYNTON BEACH  FL  33436
LANTANA  FL  33462
DELRAY BEACH  FL  33483
PALM BEACH GARDENS  FL  33410
LAKE WORTH  FL  33460
BOCA RATON  FL  33428
BOCA RATON  FL  33428
BOCA RATON  FL  33428
BOCA RATON  FL  33428
LAKE WORTH  FL  33467

END DATA.
STRING # (A50) ZIP (A5) STATE (A2) CITY (A30).
COMPUTE #=CHAR.LPAD(RTRIM(VAR1),50).
DO REPEAT @=1 TO 2 / VAR=ZIP STATE .
COMPUTE VAR=CHAR.SUBSTR(#,CHAR.RINDEX(#," ")+1).
COMPUTE #=CHAR.LPAD(RTRIM(CHAR.SUBSTR(#,1,CHAR.RINDEX(#," ")-1) ),50).
END REPEAT.
COMPUTE CITY=LTRIM(#).
LIST



--
View this message in context: http://spssx-discussion.1045642.n5.nabble.com/More-String-Address-Questions-tp5718025.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.

=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: More String Address Questions?

Bruce Weaver
Administrator
Gene, I think you could replace

   Do If ((#a1 ge 'A' and #a1 le 'Z') and (#a2 ge 'a' and #a2 le 'z').

with

   DO IF RANGE(#a1,'A','Z') and RANGE(#a2,'a','z').

HTH.


Maguin, Eugene wrote
I'm really curious to see what DM and Art come up with, both being the pros that they are. That said, I wonder if you exploit the structural element that you describe, that being that the cities begin with a capital letter followed by a lower case letter. Starting at col 1, see whether the first letter is between 'A' and 'Z' and whether the following letter is between 'a' and 'z'. I'd plod along like this

Loop #i=1 to 50-1.
Compute #a1=substr(address,#i,1).
Compute #a2=substr(address,#i+1,1).
Do If ((#a1 ge 'A' and #a1 le 'Z') and (#a2 ge 'a' and #a2 le 'z').
*** you've found the street address-city break. You can exploit a modification of DM's code from this point on. You'll need include a Break command in here to jump out of the loop. Although, the While keyword, I think it is, could be used, I think, to exit the loop with the location, which is all you need.

An alternative that very directly exploits DMs code would be to search backwards from position 50 since his code adjusts the text so that the 5th digit of the zip is position 50. Actually, in practice, the search starts at a position to the left of position 50 since, I guess, the state is always FL and there's always that odd '-' following FL

Gene Maguin


-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of DKUKEC
Sent: Monday, February 11, 2013 9:53 AM
To: [hidden email]
Subject: More String Address Questions?

Several months back David M and Art K were kind enough to help me clean up string addresses in my dataset.  Regrettably, I have tried to revise the syntax below (prepared by David M) to clean up another set of addresses that are a bit more complicated.

For example, the new address data contains street numbers and names, along with city, state and zip in one large string variable.  I would like to parse them into the following variables: address, city, state, zipcode.  I am not sure if this is relevant, however, in this case all of the "Address"
(number, direction, streetname, suffix,) are all in upper case, and the city is not.

Current Data:
ADDRESS
1234 N DIXIE HW  West Palm Beach FL- 33401
1234 N DIXIE HW  West Palm Beach FL- 33401
111 N SR 715  Belle Glade FL- 33430
111 N SR 715  Belle Glade FL- 33430
777 13TH ST  West Palm Beach FL- 33401
777 13TH ST  West Palm Beach FL- 33401
561 DOUGLAS Ave  West Palm Beach FL- 33401-
561 DOUGLAS Ave  West Palm Beach FL- 33401-

Desired Data:

ADDRESS CITY STATE ZIP


==================

David M's previous syntax used to fix the data below... worked like a charm.

==================


See RINDEX, RTRIM, LPAD, SUBSTR functions.
<TESTED>
DATA LIST /VAR1 (A50).
BEGIN DATA

PALM BEACH GARDENS  FL  33418
BOYNTON BEACH  FL  33436
LANTANA  FL  33462
DELRAY BEACH  FL  33483
PALM BEACH GARDENS  FL  33410
LAKE WORTH  FL  33460
BOCA RATON  FL  33428
BOCA RATON  FL  33428
BOCA RATON  FL  33428
BOCA RATON  FL  33428
LAKE WORTH  FL  33467

END DATA.
STRING # (A50) ZIP (A5) STATE (A2) CITY (A30).
COMPUTE #=LPAD(RTRIM(VAR1),50).
DO REPEAT @=1 TO 2 / VAR=ZIP STATE .
COMPUTE VAR=SUBSTR(#,RINDEX(#," ")+1).
COMPUTE #=LPAD(RTRIM(SUBSTR(#,1,RINDEX(#," ")-1) ),50).
END REPEAT.
COMPUTE CITY=LTRIM(#).
LIST.
----------------------------
Modernized: <UNTESTED>.
DATA LIST /VAR1 (A50).
BEGIN DATA

PALM BEACH GARDENS  FL  33418
BOYNTON BEACH  FL  33436
LANTANA  FL  33462
DELRAY BEACH  FL  33483
PALM BEACH GARDENS  FL  33410
LAKE WORTH  FL  33460
BOCA RATON  FL  33428
BOCA RATON  FL  33428
BOCA RATON  FL  33428
BOCA RATON  FL  33428
LAKE WORTH  FL  33467

END DATA.
STRING # (A50) ZIP (A5) STATE (A2) CITY (A30).
COMPUTE #=CHAR.LPAD(RTRIM(VAR1),50).
DO REPEAT @=1 TO 2 / VAR=ZIP STATE .
COMPUTE VAR=CHAR.SUBSTR(#,CHAR.RINDEX(#," ")+1).
COMPUTE #=CHAR.LPAD(RTRIM(CHAR.SUBSTR(#,1,CHAR.RINDEX(#," ")-1) ),50).
END REPEAT.
COMPUTE CITY=LTRIM(#).
LIST



--
View this message in context: http://spssx-discussion.1045642.n5.nabble.com/More-String-Address-Questions-tp5718025.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.

=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING: 
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).
Reply | Threaded
Open this post in threaded view
|

Re: More String Address Questions?

David Marso
Administrator
In reply to this post by Maguin, Eugene
Looking closely it appears that there is a double space terminating the address from the city,  If that is indeed the case then the following should be fairly reliable:
--
DATA LIST / address (A100).
BEGIN DATA
1234 N DIXIE HW  West Palm Beach FL- 33401
1234 N DIXIE HW  West Palm Beach FL- 33401
111 N SR 715  Belle Glade FL- 33430
111 N SR 715  Belle Glade FL- 33430
777 13TH ST  West Palm Beach FL- 33401
777 13TH ST  West Palm Beach FL- 33401
561 DOUGLAS Ave  West Palm Beach FL- 33401-
561 DOUGLAS Ave  West Palm Beach FL- 33401-
END DATA.
LIST.
STRING #address (A100).
COMPUTE #address = address .
STRING Addr (A40) City (A40) STATE (A2) ZIP (A5).

COMPUTE #ZipBeg=RINDEX(RTRIM(#address)," ").
COMPUTE Zip=SUBSTR(RTRIM(#address),#zipbeg+1,RINDEX(#address,"0123456789",1)-#zipbeg).
COMPUTE #address=SUBSTR(#address,1,#ZipBeg-1).
COMPUTE STATE=SUBSTR(#address,LENGTH(RTRIM(#address))-2,2).
COMPUTE #address=SUBSTR(#address,1,LENGTH(RTRIM(#address))-3).
COMPUTE #DblSp=INDEX(#address,"  ").
COMPUTE Addr=SUBSTR(#address,1,#Dblsp).
COMPUTE City=LTRIM(SUBSTR(#address,#Dblsp)).
Maguin, Eugene wrote
I'm really curious to see what DM and Art come up with, both being the pros that they are. That said, I wonder if you exploit the structural element that you describe, that being that the cities begin with a capital letter followed by a lower case letter. Starting at col 1, see whether the first letter is between 'A' and 'Z' and whether the following letter is between 'a' and 'z'. I'd plod along like this

Loop #i=1 to 50-1.
Compute #a1=substr(address,#i,1).
Compute #a2=substr(address,#i+1,1).
Do If ((#a1 ge 'A' and #a1 le 'Z') and (#a2 ge 'a' and #a2 le 'z').
*** you've found the street address-city break. You can exploit a modification of DM's code from this point on. You'll need include a Break command in here to jump out of the loop. Although, the While keyword, I think it is, could be used, I think, to exit the loop with the location, which is all you need.

An alternative that very directly exploits DMs code would be to search backwards from position 50 since his code adjusts the text so that the 5th digit of the zip is position 50. Actually, in practice, the search starts at a position to the left of position 50 since, I guess, the state is always FL and there's always that odd '-' following FL

Gene Maguin


-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of DKUKEC
Sent: Monday, February 11, 2013 9:53 AM
To: [hidden email]
Subject: More String Address Questions?

Several months back David M and Art K were kind enough to help me clean up string addresses in my dataset.  Regrettably, I have tried to revise the syntax below (prepared by David M) to clean up another set of addresses that are a bit more complicated.

For example, the new address data contains street numbers and names, along with city, state and zip in one large string variable.  I would like to parse them into the following variables: address, city, state, zipcode.  I am not sure if this is relevant, however, in this case all of the "Address"
(number, direction, streetname, suffix,) are all in upper case, and the city is not.

Current Data:
ADDRESS
1234 N DIXIE HW  West Palm Beach FL- 33401
1234 N DIXIE HW  West Palm Beach FL- 33401
111 N SR 715  Belle Glade FL- 33430
111 N SR 715  Belle Glade FL- 33430
777 13TH ST  West Palm Beach FL- 33401
777 13TH ST  West Palm Beach FL- 33401
561 DOUGLAS Ave  West Palm Beach FL- 33401-
561 DOUGLAS Ave  West Palm Beach FL- 33401-

Desired Data:

ADDRESS CITY STATE ZIP


==================

David M's previous syntax used to fix the data below... worked like a charm.

==================


See RINDEX, RTRIM, LPAD, SUBSTR functions.
<TESTED>
DATA LIST /VAR1 (A50).
BEGIN DATA

PALM BEACH GARDENS  FL  33418
BOYNTON BEACH  FL  33436
LANTANA  FL  33462
DELRAY BEACH  FL  33483
PALM BEACH GARDENS  FL  33410
LAKE WORTH  FL  33460
BOCA RATON  FL  33428
BOCA RATON  FL  33428
BOCA RATON  FL  33428
BOCA RATON  FL  33428
LAKE WORTH  FL  33467

END DATA.
STRING # (A50) ZIP (A5) STATE (A2) CITY (A30).
COMPUTE #=LPAD(RTRIM(VAR1),50).
DO REPEAT @=1 TO 2 / VAR=ZIP STATE .
COMPUTE VAR=SUBSTR(#,RINDEX(#," ")+1).
COMPUTE #=LPAD(RTRIM(SUBSTR(#,1,RINDEX(#," ")-1) ),50).
END REPEAT.
COMPUTE CITY=LTRIM(#).
LIST.
----------------------------
Modernized: <UNTESTED>.
DATA LIST /VAR1 (A50).
BEGIN DATA

PALM BEACH GARDENS  FL  33418
BOYNTON BEACH  FL  33436
LANTANA  FL  33462
DELRAY BEACH  FL  33483
PALM BEACH GARDENS  FL  33410
LAKE WORTH  FL  33460
BOCA RATON  FL  33428
BOCA RATON  FL  33428
BOCA RATON  FL  33428
BOCA RATON  FL  33428
LAKE WORTH  FL  33467

END DATA.
STRING # (A50) ZIP (A5) STATE (A2) CITY (A30).
COMPUTE #=CHAR.LPAD(RTRIM(VAR1),50).
DO REPEAT @=1 TO 2 / VAR=ZIP STATE .
COMPUTE VAR=CHAR.SUBSTR(#,CHAR.RINDEX(#," ")+1).
COMPUTE #=CHAR.LPAD(RTRIM(CHAR.SUBSTR(#,1,CHAR.RINDEX(#," ")-1) ),50).
END REPEAT.
COMPUTE CITY=LTRIM(#).
LIST



--
View this message in context: http://spssx-discussion.1045642.n5.nabble.com/More-String-Address-Questions-tp5718025.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.

=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"
Reply | Threaded
Open this post in threaded view
|

Re: More String Address Questions?

Rich Ulrich
In reply to this post by DKUKEC
You report that the street name is all in upper case, whereas the
city is not. 

In that case, I see a couple of cities named "Ave West Palm Beach".

Preliminary step might convert some specific strings to all-upper-case.

--
Rich Ulrich


> Date: Mon, 11 Feb 2013 06:53:03 -0800

> From: [hidden email]
> Subject: More String Address Questions?
> To: [hidden email]
>
> Several months back David M and Art K were kind enough to help me clean up
> string addresses in my dataset. Regrettably, I have tried to revise the
> syntax below (prepared by David M) to clean up another set of addresses that
> are a bit more complicated.
>
> For example, the new address data contains street numbers and names, along
> with city, state and zip in one large string variable. I would like to
> parse them into the following variables: address, city, state, zipcode. I
> am not sure if this is relevant, however, in this case all of the "Address"
> (number, direction, streetname, suffix,) are all in upper case, and the city
> is not.
>
> Current Data:
> ADDRESS
> 1234 N DIXIE HW West Palm Beach FL- 33401
> 1234 N DIXIE HW West Palm Beach FL- 33401
> 111 N SR 715 Belle Glade FL- 33430
> 111 N SR 715 Belle Glade FL- 33430
> 777 13TH ST West Palm Beach FL- 33401
> 777 13TH ST West Palm Beach FL- 33401
> 561 DOUGLAS Ave West Palm Beach FL- 33401-
> 561 DOUGLAS Ave West Palm Beach FL- 33401-
>
> Desired Data:
>
> ADDRESS CITY STATE ZIP
>
>
... [snip]
Reply | Threaded
Open this post in threaded view
|

Re: More String Address Questions?

DKUKEC
In reply to this post by David Marso
David,

I don't know how you saw that... double space, however, that did the trick.  I'll let you know if that worked on the larger dataset tomorrow.  Amazing.

Thank you yet again!
Damir
Reply | Threaded
Open this post in threaded view
|

Re: More String Address Questions?

Bruce Weaver
Administrator
Would you believe that his middle name is "Hawkeye"?  (That's Hawkeye with an H.)  ;-)


DKUKEC wrote
David,

I don't know how you saw that... double space, however, that did the trick.  I'll let you know if that worked on the larger dataset tomorrow.  Amazing.

Thank you yet again!
Damir
--
Bruce Weaver
bweaver@lakeheadu.ca
http://sites.google.com/a/lakeheadu.ca/bweaver/

"When all else fails, RTFM."

PLEASE NOTE THE FOLLOWING: 
1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above.
2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/).
Reply | Threaded
Open this post in threaded view
|

Re: More String Address Questions?

Andy W
In reply to this post by DKUKEC
A word of advice, you may want to consider geocoding services that return a parsed address for you. Such as through a local GIS, e.g. ESRI, or through online geocoding services such as Google. I assume you will want to map these addresses in the end anyway! If your data are so clean it isn't impossible to parse it yourself, but geocoding services offer a fairly simple solution that will work "out of the box" if your data are really that well behaved as they are in the example you provide.

No advice you receive here will work for your entire database, you will always find some exceptions (the question then becomes how many exceptions exist and how to deal with them). You are better off just trying to understand how the string parsing works through the examples individuals have provided and be able to adapt the code yourself. It is just as likely the double space is an abberation of the 4 example addresses you included here and not consistent in the entire database.

The way ESRI does there address parsing is they split the string into separate elements (by every space), and then do table look-ups and partial string matching (then calculate a proprietary score for every potential match in the look-up tables, and return matches above some pre-specified score). If you have a table of cities, this logic could be fairly easily applied in this situation. For example

******************************************************.
string ADDRESS CITY (A30).
compute #west_palm_beach = INDEX(full_address,"West Palm Beach").
if #west_palm_beach > 0 CITY = "West Palm Beach".
if #west_palm_beach > 0 ADDRESS = SUBSTR(full_address,1,#west_palm_beach - 1).
******************************************************.

It would just be a matter of taking the table of cities and producing the necessary look up code. Which if you only have a limited number of cities in Florida is pretty simple.

Again though, I would highly suggest you look up how geocoding works through GIS's and online API's. They likely provide a convenient solution, My 2cents.

Andy

Andy W
apwheele@gmail.com
http://andrewpwheeler.wordpress.com/
Reply | Threaded
Open this post in threaded view
|

Re: More String Address Questions?

Art Kendall
In reply to this post by DKUKEC
The use of fixed pitch fonts is very useful for spotting patterns in data and syntax.
Art Kendall
Social Research Consultants
On 2/11/2013 8:27 PM, DKUKEC wrote:
David,

I don't know how you saw that... double space, however, that did the trick.
I'll let you know if that worked on the larger dataset tomorrow.  Amazing.

Thank you yet again!
Damir



--
View this message in context: http://spssx-discussion.1045642.n5.nabble.com/More-String-Address-Questions-tp5718025p5718038.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD


===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD
Art Kendall
Social Research Consultants
Reply | Threaded
Open this post in threaded view
|

Re: More String Address Questions?

John F Hall

. . .especially if you deliberately leave a blank column in a fixed position: it shows up as a vertical white line.  Any shift errors show up easily.

 

 

John F Hall (Mr)

[retired academic survey researcher]

 

Email:     [hidden email]

Website: www.surveyresearch.weebly.com

 

 

 

 

 

From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Art Kendall
Sent: 15 February 2013 15:30
To: [hidden email]
Subject: Re: More String Address Questions?

 

The use of fixed pitch fonts is very useful for spotting patterns in data and syntax.

Art Kendall
Social Research Consultants

On 2/11/2013 8:27 PM, DKUKEC wrote:

David,
 
I don't know how you saw that... double space, however, that did the trick.
I'll let you know if that worked on the larger dataset tomorrow.  Amazing.
 
Thank you yet again!
Damir
 
 
 
--
View this message in context: http://spssx-discussion.1045642.n5.nabble.com/More-String-Address-Questions-tp5718025p5718038.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.
 
=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
 

 

===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD