Help with extracting a piece of information from a variable
Hello everyone. I'm looking to extract a specific piece of one variable and create a new variable with just that piece of info. Specifically, I've got full addresses and I want just the zip code. Parsing doesn't work because the addresses are of varying lengths: Address 1244 Aronda St., San Jose CA 95112 ------------------->95112 16 W. St. Unit 12 Rear, San Jose CA 95009 -------------->95009 San Jose, CA 95113 --------------------------------------->95113 8332 B Martin Luther King Avenue, San Jose CA 95112 --->95112 The zip is always five digits in a row. One challenge might be that there could be other parts of the address (for example the house number) that are also 5 digits in a row. Any help is greatly appreciated! Thanks in advance, EH |
Administrator
|
Take a look at the NUMBER, CHAR.RINDEX and RTRIM functions.
--
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
In reply to this post by EH44
It should work to compute the length of the expression in characters (CHAR.LENGTH) and then use CHAR.SUBTR to take the rightmost 5 digits.
On Tue, Feb 26, 2013 at 12:44 PM, EH44 <[hidden email]> wrote: Help with extracting a piece of information from a variable This e-mail transmission and any attachments accompanying it may contain confidential and/or proprietary information and is intended only for the person or entity to whom it was originally addressed. If you are not the intended recipient, you are hereby notified that any disclosure, copying, distribution or other use of this information is strictly prohibited. Any unauthorized interception of this transmission is illegal. If you have received this transmission in error, please notify the sender by reply e-mail, and then destroy all copies of this transmission
|
This seems to work. It could be made more compact but this lets you see the steps. COMPUTE len=CHAR.LENGTH(Address). EXECUTE. STRING Zip (A5).
COMPUTE Zip=CHAR.SUBSTR(Address,len-5). EXECUTE. COMPUTE ZipNum=NUMBER(Zip, F5.0). EXECUTE.
On Tue, Feb 26, 2013 at 1:02 PM, Martha Hewett <[hidden email]> wrote: It should work to compute the length of the expression in characters (CHAR.LENGTH) and then use CHAR.SUBTR to take the rightmost 5 digits. This e-mail transmission and any attachments accompanying it may contain confidential and/or proprietary information and is intended only for the person or entity to whom it was originally addressed. If you are not the intended recipient, you are hereby notified that any disclosure, copying, distribution or other use of this information is strictly prohibited. Any unauthorized interception of this transmission is illegal. If you have received this transmission in error, please notify the sender by reply e-mail, and then destroy all copies of this transmission
|
Administrator
|
Martha, I think that needs to be LEN-4. When I tried it with LEN-5, I lost the last digit.
Also, if you don't need the string version of ZIP for anything, you can do directly to ZipNum by including the NUMBER function on the COMPUTE line that pulls out the desired substring. Here's a modification of your version (ZIP1) and the one David hinted at (ZIP2). data list list / address (A55). begin data "1244 Aronda St., San Jose CA 95112" "16 W. St. Unit 12 Rear, San Jose CA 95009" "San Jose, CA 95113" "8332 B Martin Luther King Avenue, San Jose CA 95112" end data. * Martha's method. COMPUTE #len = CHAR.LENGTH(RTRIM(Address)). COMPUTE ZIP1 = NUMBER(CHAR.SUBSTR(Address,#len-4),F5.0). * David's method. COMPUTE #blank = CHAR.RINDEX(RTRIM(address)," "). COMPUTE ZIP2 = NUMBER(CHAR.SUBSTR(address,#blank+1),F5.0). FORMATS ZIP1 ZIP2 (F5.0). LIST. David probably would not have computed #blank, but I find that easier to follow than this one-liner: COMPUTE ZIP = NUMBER(CHAR.SUBSTR(address,CHAR.RINDEX(RTRIM(address)," ")+1),F5.0). ;-) OUTPUT: address ZIP1 ZIP2 1244 Aronda St., San Jose CA 95112 95112 95112 16 W. St. Unit 12 Rear, San Jose CA 95009 95009 95009 San Jose, CA 95113 95113 95113 8332 B Martin Luther King Avenue, San Jose CA 95112 95112 95112 Number of cases read: 4 Number of cases listed: 4
--
Bruce Weaver bweaver@lakeheadu.ca http://sites.google.com/a/lakeheadu.ca/bweaver/ "When all else fails, RTFM." PLEASE NOTE THE FOLLOWING: 1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above. 2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/). |
Administrator
|
"David probably would not have computed #blank, but I find that easier to follow than this one-liner:
COMPUTE ZIP = NUMBER(CHAR.SUBSTR(address,CHAR.RINDEX(RTRIM(address)," ")+1),F5.0). ;-)" Indeed! However use N5 rather than F5 ;-))) --
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
Administrator
|
N5...good point! (Would you believe I was just checking that you're still awake? No? Didn't think so.)
--
Bruce Weaver bweaver@lakeheadu.ca http://sites.google.com/a/lakeheadu.ca/bweaver/ "When all else fails, RTFM." PLEASE NOTE THE FOLLOWING: 1. My Hotmail account is not monitored regularly. To send me an e-mail, please use the address shown above. 2. The SPSSX Discussion forum on Nabble is no longer linked to the SPSSX-L listserv administered by UGA (https://listserv.uga.edu/). |
Thanks much to you all. When I test each of these solutions I realize that a part of the original address variable (which is actually named "current_location") is throwing a wrench in this somewhere.
The full content of the current_location field looks like the below examples. Every address has this ": OOD" string which seems to be mucking things up. I guess these is something about this string that SPSS doesn't like? 104 Biscane Road, San Jose, CA 95122, CD: OOD 10448 Surprise Ave Unit 8, San Jose, CA 95148, CD: OOD 124 Plaza Dr., San Jose, CA 95127, CD: OOD 11228 Treeline Dr #1, San Jose, CA 95122, CD: OOD San Jose, CA 95116, CD: OOD Thoughts? Thanks again, this is incredibly helpful! EH |
Administrator
|
I shall refer you to the REPLACE function .
I guess it helps to provide the actual contents of what you are attempting to parse! -- DATA LIST / current_location (A100). BEGIN DATA 104 Biscane Road, San Jose, CA 95122, CD: OOD 10448 Surprise Ave Unit 8, San Jose, CA 95148, CD: OOD 124 Plaza Dr., San Jose, CA 95127, CD: OOD 11228 Treeline Dr #1, San Jose, CA 95122, CD: OOD San Jose, CA 95116, CD: OOD END DATA. STRING #strcopy (A100). COMPUTE #strcopy=REPLACE(current_location,", CD: OOD",""). COMPUTE ZIP=NUMBER(CHAR.SUBSTR(#strcopy,CHAR.RINDEX(#strcopy," ")+1),N5). FORMATS ZIP (N5). LIST. -- current_location ZIP 104 Biscane Road, San Jose, CA 95122, CD: OOD 95122 10448 Surprise Ave Unit 8, San Jose, CA 95148, CD: OOD 95148 124 Plaza Dr., San Jose, CA 95127, CD: OOD 95127 11228 Treeline Dr #1, San Jose, CA 95122, CD: OOD 95122 San Jose, CA 95116, CD: OOD 95116 Number of cases read: 5 Number of cases listed: 5
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
Free forum by Nabble | Edit this page |