Stripping Address Field Information

classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

Stripping Address Field Information

Doug Harden
Hi, I’m new to the list and I would appreciate some help. Here is my
problem. I have a file with over 2 million records. I want to modify the
address field. I need to strip out the apartment information and leave the
rest. It appears (though I haven’t checked every record – so I am assuming
that is the case for every address) that the apartment information comes
at the end of the address. The address can be of varying length and the
apartment information can be presented in different ways, using different
spellings and spaces. Apartment, if included is always capitalized.
Sometimes a # is used instead. The address is one variable and it is a
string field.

For example:

55 Oakwood Drive APT # 1400
2930 So 150 W #2
2359 Lowery Lane # 10
8910 Echolake Terrace APT 168
4105 10th Ave S APT3
62 N 27th ST APT A316

Is there any syntax that can strip out all the apartment information?,
APT, # and the associated numbers.

Thanks. I can’t think of an easier way to do this.

Doug

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: Stripping Address Field Information

Maguin, Eugene
Doug,
One way of doing this is to try to exploit the address structure as much as possible. In the examples you show, # always follows APT, and APT is always in capitals. So you can build a do if structure to exploit this. So, search for APT and take everything to the left of APT; if no APT, search for # and take everything to the left of it; if neither, assume the address is not an apartment. Like this.
String extractedaddress(a100).
Do if (index(address,'APT') gt 0).
+  compute extractedaddress=substr(address,1,(index(address,'APT)-2).
else if (index(address,'#') gt 0).
+  compute extractedaddress=substr(address,1,(index(address,'#)-2).
End if.

This won't be completely adequate unless the address data is perfectly structured exactly as you have shown it.

Gene Maguin



-----Original Message-----
From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Doug Harden
Sent: Thursday, January 05, 2012 5:26 PM
To: [hidden email]
Subject: Stripping Address Field Information

Hi, I’m new to the list and I would appreciate some help. Here is my
problem. I have a file with over 2 million records. I want to modify the
address field. I need to strip out the apartment information and leave the
rest. It appears (though I haven’t checked every record – so I am assuming
that is the case for every address) that the apartment information comes
at the end of the address. The address can be of varying length and the
apartment information can be presented in different ways, using different
spellings and spaces. Apartment, if included is always capitalized.
Sometimes a # is used instead. The address is one variable and it is a
string field.

For example:

55 Oakwood Drive APT # 1400
2930 So 150 W #2
2359 Lowery Lane # 10
8910 Echolake Terrace APT 168
4105 10th Ave S APT3
62 N 27th ST APT A316

Is there any syntax that can strip out all the apartment information?,
APT, # and the associated numbers.

Thanks. I can’t think of an easier way to do this.

Doug

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Reply | Threaded
Open this post in threaded view
|

Re: Stripping Address Field Information

David Marso
Administrator
In reply to this post by Doug Harden
COMPUTE #=RINDEX(address,"#").
IF # > 0 Address=SUBSTR(address,1,#-1).
COMPUTE #=RINDEX(address,"APT").
IF # > 0 Address=SUBSTR(address,1,#-1).
LIST.

Doug Harden wrote
Hi, I’m new to the list and I would appreciate some help. Here is my
problem. I have a file with over 2 million records. I want to modify the
address field. I need to strip out the apartment information and leave the
rest. It appears (though I haven’t checked every record – so I am assuming
that is the case for every address) that the apartment information comes
at the end of the address. The address can be of varying length and the
apartment information can be presented in different ways, using different
spellings and spaces. Apartment, if included is always capitalized.
Sometimes a # is used instead. The address is one variable and it is a
string field.

For example:

55 Oakwood Drive APT # 1400
2930 So 150 W #2
2359 Lowery Lane # 10
8910 Echolake Terrace APT 168
4105 10th Ave S APT3
62 N 27th ST APT A316

Is there any syntax that can strip out all the apartment information?,
APT, # and the associated numbers.

Thanks. I can’t think of an easier way to do this.

Doug

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me.
---
"Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis."
Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?"
Reply | Threaded
Open this post in threaded view
|

Re: Stripping Address Field Information

Doug Harden
Thanks to everyone for their help. This syntax is brillant. It took only a few seconds to clean up a 2 million + data set. So Cool.
 
Doug

From: David Marso <[hidden email]>
To: [hidden email]
Sent: Thursday, January 5, 2012 5:16 PM
Subject: Re: Stripping Address Field Information

COMPUTE #=RINDEX(address,"#").
IF # > 0 Address=SUBSTR(address,1,#-1).
COMPUTE #=RINDEX(address,"APT").
IF # > 0 Address=SUBSTR(address,1,#-1).
LIST.


Doug Harden wrote

>
> Hi, Iâm new to the list and I would appreciate some help. Here is my
> problem. I have a file with over 2 million records. I want to modify the
> address field. I need to strip out the apartment information and leave the
> rest. It appears (though I havenât checked every record â so I am
> assuming
> that is the case for every address) that the apartment information comes
> at the end of the address. The address can be of varying length and the
> apartment information can be presented in different ways, using different
> spellings and spaces. Apartment, if included is always capitalized.
> Sometimes a # is used instead. The address is one variable and it is a
> string field.
>
> For example:
>
> 55 Oakwood Drive APT # 1400
> 2930 So 150 W #2
> 2359 Lowery Lane # 10
> 8910 Echolake Terrace APT 168
> 4105 10th Ave S APT3
> 62 N 27th ST APT A316
>
> Is there any syntax that can strip out all the apartment information?,
> APT, # and the associated numbers.
>
> Thanks. I canât think of an easier way to do this.
>
> Doug
>
> =====================
> To manage your subscription to SPSSX-L, send a message to
> LISTSERV@.UGA (not to SPSSX-L), with no body text except the
> command. To leave the list, send the command
> SIGNOFF SPSSX-L
> For a list of commands to manage subscriptions, send the command
> INFO REFCARD
>


--
View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Stripping-Address-Field-Information-tp5124076p5124161.html
Sent from the SPSSX Discussion mailing list archive at Nabble.com.

=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD