Help! I have a VAR type Address; however, the numbers are running into the letters. Is there a syntax that will put a space between the last number and the first letter following the last number? Your assistance would be greatly appreciated.

928NW 14TH ST → 928 NW 14th ST
929NW 15TH STREET → 929 NW 15th STREET

Carleton Sea
>Subject: [SPSSX-L] Address correction
Hi,

Untested, but the regex (regular expression) works:

BEGIN PROGRAM.
import re

def correctAddress(s):
    # Insert a blank between a leading house number and the character glued to it.
    regex = r"^(?P<street_no>[0-9]+)(?P<suffix>\S)"
    return re.sub(regex, r"\g<1> \g<2>", s, flags=re.I | re.U)
END PROGRAM.

SPSSINC TRANS RESULT=address_corrected TYPE=500
  /FORMULA = "correctAddress(s)".

regards,
Albert-Jan

=====================
To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
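The desired output in the question also lowercases the ordinal suffix (14TH → 14th), which the function above does not attempt. A minimal sketch of a standalone function that does both steps, assuming only ordinals attached directly to digits (1ST, 2ND, 3RD, 14TH) should change; the name `correct_address` and the two pattern constants are hypothetical, not from the thread. It can be tried in a plain Python 3 session before being pasted into a BEGIN PROGRAM block.

```python
import re

# Hypothetical helper, not from the thread.
# Step 1: a leading house number glued to the next non-blank character.
LEADING_NUMBER = re.compile(r'^(\d+)(\S)')
# Step 2: an ordinal such as 14TH or 2ND (digits immediately followed by the suffix).
ORDINAL = re.compile(r'\b(\d+)(st|nd|rd|th)\b', re.IGNORECASE)

def correct_address(s):
    """Insert a blank after a leading house number, then lowercase ordinal suffixes."""
    s = LEADING_NUMBER.sub(r'\1 \2', s)
    # Only the suffix of an ordinal is lowercased; a bare ST or STREET is untouched.
    return ORDINAL.sub(lambda m: m.group(1) + m.group(2).lower(), s)

if __name__ == "__main__":
    for addr in ["928NW 14TH ST", "929NW 15TH STREET"]:
        print(correct_address(addr))
```

A field that itself starts with an ordinal, such as 14TH ST, would still be split by the first pattern (14 TH ST); a lookahead could exclude that case if it occurs in the data.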
Here is a slightly more compact version of Albert-Jan's solution:

spssinc trans result=address_corrected type=500
/formula "re.sub(r'^(\d+)(\S)', r'\1 \2', s)".

It looks for an initial string of digits followed immediately by a non-whitespace character and inserts a blank between them. Later runs, such as 14th, and fields that do not start with digits are left unmodified.

One could write this procedurally in traditional syntax, but regular expression pattern matching is well worth learning if you work with strings a lot.

This requires the Python Essentials, and, for older versions of the Essentials, you would need to download and install the SPSSINC TRANS extension command. All are available from the SPSS Community site (www.ibm.com/developerworks/spssdevcentral).

Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
phone: 720-342-5621

From: Albert-Jan Roskam <[hidden email]>
To: [hidden email],
Date: 05/21/2013 12:31 PM
Subject: Re: [SPSSX-L] Address correction
Sent by: "SPSSX(r) Discussion" <[hidden email]>
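To check the pattern outside SPSS, the same re.sub call can be run in plain Python 3 on the two sample addresses from the question, plus one field that starts with a letter; the names `addresses` and `fixed` are illustrative only. Note that the case of 14TH is not changed, so the result differs from the 14th shown in the question.

```python
import re

# Sample values taken from the question, plus one field that starts with a letter.
addresses = ["928NW 14TH ST", "929NW 15TH STREET", "NW 14TH ST"]

# Same pattern and replacement as the SPSSINC TRANS formula:
# leading digits (group 1) glued to a non-blank character (group 2).
fixed = [re.sub(r'^(\d+)(\S)', r'\1 \2', s) for s in addresses]

for before, after in zip(addresses, fixed):
    print(f"{before!r} -> {after!r}")
```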
> >Here is a slightly more compact version of Albert-Jan's solution:
> >
> >spssinc trans result=address_corrected type=500
> >/formula "re.sub(r'^(\d+)(\S)', r'\1 \2', s)".

Nice. I thought using groups in re.sub could only be done with e.g. \g<1>, but you use \1 (like in R). Will this also match Señor Jalapeño Street?
In reply to this post by Sea, Carleton, VBAVACO
Here’s a tested but far, far less compact syntax version.

data list / addr(a20).
begin data
928NW 14TH ST
929NW 15TH STREET
end data.
execute.
* Scan addr left to right, insert a blank before the first letter (A-Z), then stop.
string #c(a1).
compute #j=char.length(addr).
loop #i=1 to #j.
compute #c=upcase(substr(addr,#i,1)).
compute #k=char.index(#c,'ABCDEFGHIJKLMNOPQRSTUVWXYZ',1).
if (#k NE 0) addr=concat(substr(addr,1,#i-1),' ',substr(addr,#i)).
end loop if (#k ne 0).
execute.

Gene Maguin
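For comparison, a plain-Python sketch of the same left-to-right scan (the helper name `insert_blank_before_first_letter` is hypothetical, not from the thread). It keeps the A-Z test of the syntax loop; as an assumption of this sketch, it inserts the blank only when that first letter directly follows a digit, so already-spaced and letter-initial fields are returned unchanged.

```python
import string

ASCII_LETTERS = set(string.ascii_uppercase)

def insert_blank_before_first_letter(addr):
    """Scan addr left to right and act on the first A-Z letter (case-insensitive)."""
    for i, ch in enumerate(addr):
        if ch.upper() in ASCII_LETTERS:  # same test as CHAR.INDEX against 'A'..'Z'
            # Assumption of this sketch: only insert when a digit precedes the letter.
            if i > 0 and addr[i - 1].isdigit():
                return addr[:i] + " " + addr[i:]
            return addr  # stop at the first letter, like END LOOP IF (#k NE 0)
    return addr  # no letters at all
```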
In reply to this post by Albert-Jan Roskam
If Señor Jalapeño Street were the whole field, it would be left alone, but that re would work with 123Señor Jalapeño Street or even 123ñor Jalapeño Street.

Named groups are something of a latecomer to regular expressions. The numbered backreference notation (\1, \2, ...) is the original form IIRC. At least that's the form I learned 100 years ago or so. You just have to be careful to flag the re expression with a preceding r (a raw string) or to double the backslash in Python.

Note also that in this case it was not necessary to pull out the code for the conversion into a separate function in a begin program block, although that is always legal. At some point the expression becomes too complicated for the SPSSINC TRANS parser to deal with, and you have to write a separate function.

Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
phone: 720-342-5621

From: Albert-Jan Roskam <[hidden email]>
To: [hidden email],
Date: 05/21/2013 01:41 PM
Subject: Re: [SPSSX-L] Address correction
Sent by: "SPSSX(r) Discussion" <[hidden email]>
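A quick check of both points in plain Python 3, where str is Unicode so \S matches ñ: numbered backreferences (\1), the \g<1> form, and named groups give identical results, and the raw-string prefix is what keeps \1 from being read as a control character. The pattern names `numbered` and `named` are illustrative only.

```python
import re

numbered = r'^(\d+)(\S)'
named = r'^(?P<num>\d+)(?P<rest>\S)'

samples = ["123Señor Jalapeño Street", "123ñor Jalapeño Street", "Señor Jalapeño Street"]

results = []
for s in samples:
    a = re.sub(numbered, r'\1 \2', s)          # classic numbered backreferences
    b = re.sub(numbered, r'\g<1> \g<2>', s)    # same groups, \g<n> syntax
    c = re.sub(named, r'\g<num> \g<rest>', s)  # named groups
    assert a == b == c
    results.append(a)
    print(a)

# Without the r prefix, '\1' is the control character chr(1), not a backreference,
# so a non-raw replacement string needs a doubled backslash instead.
assert '\1' == chr(1)
assert r'\1' == '\\1'
```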
In reply to this post by Jon K Peck
These regular expressions seem reminiscent of TECO and SNOBOL.

Do you have a link to a brief introduction to regular expressions?

Art Kendall
Social Research Consultants

On 5/21/2013 3:07 PM, Jon K Peck [via SPSSX Discussion] wrote:
> Here is a slightly more compact version of Albert-Jan's solution:
re's arose around the same time as TECO and SNOBOL IIRC. They vary somewhat from one language to another, mainly in extensions to the basics.

For Python, you can see the help for the re module in the regular Python help. There is a more expository explanation on the Python website in the HOWTOs section: http://docs.python.org/2/howto/regex.html

There is a nice O'Reilly book on the subject. Googling for "regular expression" reports 24,200,000 hits. I notice that there is a web site, regular-expressions.info, that boasts of being "The Premier website about Regular Expressions". I don't know where they learned how to capitalize, but a little work with a regular expression could fix that.

Jon Peck (no "h") aka Kim
Senior Software Engineer, IBM
[hidden email]
phone: 720-342-5621

From: Art Kendall <[hidden email]>
To: [hidden email],
Date: 05/21/2013 02:13 PM
Subject: Re: [SPSSX-L] Address correction
Sent by: "SPSSX(r) Discussion" <[hidden email]>
In reply to this post by Art Kendall
>
>These regular expressions seem reminiscent of TECO and SNOBOL.
>
>Do you have a link to a brief introduction to regular expressions?

This is also considered to be a standard work: http://www.amazon.com/Mastering-Regular-Expressions-Jeffrey-Friedl/dp/0596528124

It is actually quite readable, but the chapter comparing seven (or so) different regular expression engines was a bit too much for me.

Btw, if you use regexes in R, you can use perl=TRUE to get Perl-compatible regex behavior, much closer to what Python (and Perl) do. But in R you have grep, regexpr, gsub, sub, gsubfn and strapply (the last two from the gsubfn package), just to make things a *little* less clear.