Hi, Iâm new to the list and I would appreciate some help. Here is my
problem. I have a file with over 2 million records. I want to modify the address field. I need to strip out the apartment information and leave the rest. It appears (though I havenât checked every record â so I am assuming that is the case for every address) that the apartment information comes at the end of the address. The address can be of varying length and the apartment information can be presented in different ways, using different spellings and spaces. Apartment, if included is always capitalized. Sometimes a # is used instead. The address is one variable and it is a string field. For example: 55 Oakwood Drive APT # 1400 2930 So 150 W #2 2359 Lowery Lane # 10 8910 Echolake Terrace APT 168 4105 10th Ave S APT3 62 N 27th ST APT A316 Is there any syntax that can strip out all the apartment information?, APT, # and the associated numbers. Thanks. I canât think of an easier way to do this. Doug ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
Doug,
One way of doing this is to try to exploit the address structure as much as possible. In the examples you show, # always follows APT, and APT is always in capitals. So you can build a do if structure to exploit this. So, search for APT and take everything to the left of APT; if no APT, search for # and take everything to the left of it; if neither, assume the address is not an apartment. Like this. String extractedaddress(a100). Do if (index(address,'APT') gt 0). + compute extractedaddress=substr(address,1,(index(address,'APT)-2). else if (index(address,'#') gt 0). + compute extractedaddress=substr(address,1,(index(address,'#)-2). End if. This won't be completely adequate unless the address data is perfectly structured exactly as you have shown it. Gene Maguin -----Original Message----- From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Doug Harden Sent: Thursday, January 05, 2012 5:26 PM To: [hidden email] Subject: Stripping Address Field Information Hi, I’m new to the list and I would appreciate some help. Here is my problem. I have a file with over 2 million records. I want to modify the address field. I need to strip out the apartment information and leave the rest. It appears (though I haven’t checked every record – so I am assuming that is the case for every address) that the apartment information comes at the end of the address. The address can be of varying length and the apartment information can be presented in different ways, using different spellings and spaces. Apartment, if included is always capitalized. Sometimes a # is used instead. The address is one variable and it is a string field. For example: 55 Oakwood Drive APT # 1400 2930 So 150 W #2 2359 Lowery Lane # 10 8910 Echolake Terrace APT 168 4105 10th Ave S APT3 62 N 27th ST APT A316 Is there any syntax that can strip out all the apartment information?, APT, # and the associated numbers. Thanks. I can’t think of an easier way to do this. Doug ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
Administrator
|
In reply to this post by Doug Harden
COMPUTE #=RINDEX(address,"#").
IF # > 0 Address=SUBSTR(address,1,#-1). COMPUTE #=RINDEX(address,"APT"). IF # > 0 Address=SUBSTR(address,1,#-1). LIST.
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
Thanks to everyone for their help. This syntax is brillant. It took only a few seconds to clean up a 2 million + data set. So Cool.
Doug
From: David Marso <[hidden email]> To: [hidden email] Sent: Thursday, January 5, 2012 5:16 PM Subject: Re: Stripping Address Field Information COMPUTE #=RINDEX(address,"#"). IF # > 0 Address=SUBSTR(address,1,#-1). COMPUTE #=RINDEX(address,"APT"). IF # > 0 Address=SUBSTR(address,1,#-1). LIST. Doug Harden wrote > > Hi, Iâm new to the list and I would appreciate some help. Here is my > problem. I have a file with over 2 million records. I want to modify the > address field. I need to strip out the apartment information and leave the > rest. It appears (though I havenât checked every record â so I am > assuming > that is the case for every address) that the apartment information comes > at the end of the address. The address can be of varying length and the > apartment information can be presented in different ways, using different > spellings and spaces. Apartment, if included is always capitalized. > Sometimes a # is used instead. The address is one variable and it is a > string field. > > For example: > > 55 Oakwood Drive APT # 1400 > 2930 So 150 W #2 > 2359 Lowery Lane # 10 > 8910 Echolake Terrace APT 168 > 4105 10th Ave S APT3 > 62 N 27th ST APT A316 > > Is there any syntax that can strip out all the apartment information?, > APT, # and the associated numbers. > > Thanks. I canât think of an easier way to do this. > > Doug > > ===================== > To manage your subscription to SPSSX-L, send a message to > LISTSERV@.UGA (not to SPSSX-L), with no body text except the > command. To leave the list, send the command > SIGNOFF SPSSX-L > For a list of commands to manage subscriptions, send the command > INFO REFCARD > -- View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Stripping-Address-Field-Information-tp5124076p5124161.html Sent from the SPSSX Discussion mailing list archive at Nabble.com. ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
Free forum by Nabble | Edit this page |