I have a data set with British postcodes which normally consist of a postal district (one or two letters followed by 1 or 2 digits) followed by 3 characters (one digit two letters) signifying address group. Full postcodes consist of 5, 6 or 7 characters, but some of the entries only show the district. These are in a single string variable postcode: NW11RS NW52XB N78HQ N1 N78EU N19DG N18DW N7 Some have spaces separating the district and the address group, but the spaces can be deleted using [CTRL]H N1 9BL N1 9JW N1 9BA A further complication is that some central London postcodes are subdivided by an additional letter: EC1V97Q [This one is impossible, but the district code is OK] EC1N9BB EC1V9BB WC1N1BG WC1N1HD WC1N1HJ To avoid any risk of disclosure, these need to be anonymised by stripping off the last three characters, but leaving the district code if it does not have these. Thus for the above examples I need to have only: NW1 NW5 N7 N1 N7 N1 N1 N7 N1 N1 N1 EC1V EC1N EC1V WC1N WC1N WC1N I’ve done it the long way round with: autorecode var postcode / into district. freq district. . . then visually grouped (ignoring the additional letter for central London) with: recode district (2 thru 96 =1) (97 thru 394 = 2) (395 thru 457 = 3) (458 thru 534 = 4) (535 thru 697 = 5) (702 thru 931 = 6) (932 thru 1011 = 7) (1013 thru 1022 = 8) (1023 thru 1125 = 9) (else = -1) into area. format area (f2.0). missing values area (-1). freq area.
. . but there must be a quicker way using one of the string functions. Any suggestions? John F Hall (Mr) [Retired academic survey researcher] Email: [hidden email] Website: www.surveyresearch.weebly.com SPSS start page: www.surveyresearch.weebly.com/1-survey-analysis-workshop |
This may help to some degree, starting point, to validate the postcode, group into parts and then remove as much of trailing postcode as required. SPSSINC TRANS will be useful here.
However I suspect RegEx will be new to you John...? :-) Well worth looking into.. |
Jignesh
Thanks: interesting. Think I'll stick to my solution, then try to work out what the 11 dodgy codes might be (hasty typing by the client?). I'm doing this because Edinburgh have offered to take some small surveys off my hands which I have preserved for years and want to put somewhere safe. I can't get authorisation to release them for teaching or research because the original clients no longer exist or simply don't respond to requests for authorisation. I just can't bring myself to delete them. John -----Original Message----- From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Jignesh Sutar Sent: 18 May 2015 11:10 To: [hidden email] Subject: Re: Stripping postcodes This <http://stackoverflow.com/questions/164979/uk-postcode-regex-comprehensive> may help to some degree, starting point, to validate the postcode, group into parts and then remove as much of trailing postcode as required. SPSSINC TRANS will be useful here. However I suspect RegEx will be new to you John...? :-) Well worth looking into.. -- View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Stripping-postcodes-tp5729566p 5729567.html Sent from the SPSSX Discussion mailing list archive at Nabble.com. ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
Hi, John, if you've got a Python installation within your SPSS version I can probably give you a syntax to do what Jignesh proposed. GL, Mario John F Hall <[hidden email]> schrieb am 12:08 Montag, 18.Mai 2015: Jignesh Thanks: interesting. Think I'll stick to my solution, then try to work out what the 11 dodgy codes might be (hasty typing by the client?). I'm doing this because Edinburgh have offered to take some small surveys off my hands which I have preserved for years and want to put somewhere safe. I can't get authorisation to release them for teaching or research because the original clients no longer exist or simply don't respond to requests for authorisation. I just can't bring myself to delete them. John -----Original Message----- From: SPSSX(r) Discussion [mailto:[hidden email]] On Behalf Of Jignesh Sutar Sent: 18 May 2015 11:10 To: [hidden email] Subject: Re: Stripping postcodes This <http://stackoverflow.com/questions/164979/uk-postcode-regex-comprehensive> may help to some degree, starting point, to validate the postcode, group into parts and then remove as much of trailing postcode as required. SPSSINC TRANS will be useful here. However I suspect RegEx will be new to you John...? :-) Well worth looking into.. -- View this message in context: http://spssx-discussion.1045642.n5.nabble.com/Stripping-postcodes-tp5729566p 5729567.html Sent from the SPSSX Discussion mailing list archive at Nabble.com. ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD
Mario Giesel
Munich, Germany |
In reply to this post by John F Hall
How about this, where postalcode is the
string variable holding the code.
spssinc trans result=prunedpostal type=10 /formula "re.sub(r'(\d+)\D+$', r'\1',postalcode)". Jon Peck (no "h") aka Kim Senior Software Engineer, IBM [hidden email] phone: 720-342-5621 From: John F Hall <[hidden email]> To: [hidden email] Date: 05/18/2015 02:28 AM Subject: [SPSSX-L] Stripping postcodes Sent by: "SPSSX(r) Discussion" <[hidden email]> I have a data set with British postcodes which normally consist of a postal district (one or two letters followed by 1 or 2 digits) followed by 3 characters (one digit two letters) signifying address group. Full postcodes consist of 5, 6 or 7 characters, but some of the entries only show the district. These are in a single string variable postcode:
NW11RS NW52XB N78HQ N1 N78EU N19DG N18DW N7
Some have spaces separating the district and the address group, but the spaces can be deleted using [CTRL]H
N1 9BL N1 9JW N1 9BA
A further complication is that some central London postcodes are subdivided by an additional letter:
EC1V97Q [This one is impossible, but the district code is OK] EC1N9BB EC1V9BB WC1N1BG WC1N1HD WC1N1HJ
To avoid any risk of disclosure, these need to be anonymised by stripping off the last three characters, but leaving the district code if it does not have these. Thus for the above examples I need to have only:
NW1 NW5 N7 N1 N7 N1 N1 N7 N1 N1 N1 EC1V EC1N EC1V WC1N WC1N WC1N
I’ve done it the long way round with:
autorecode var postcode / into district. freq district.
. . then visually grouped (ignoring the additional letter for central London) with:
recode district (2 thru 96 =1) (97 thru 394 = 2) (395 thru 457 = 3) (458 thru 534 = 4) (535 thru 697 = 5) (702 thru 931 = 6) (932 thru 1011 = 7) (1013 thru 1022 = 8) (1023 thru 1125 = 9) (else = -1) into area.
format area (f2.0). missing values area (-1). freq area.
. . but there must be a quicker way using one of the string functions. Any suggestions?
John F Hall (Mr) [Retired academic survey researcher]
Email: johnfhall@... Website: www.surveyresearch.weebly.com SPSS start page: www.surveyresearch.weebly.com/1-survey-analysis-workshop
===================== To manage your subscription to SPSSX-L, send a message to LISTSERV@... (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
In reply to this post by John F Hall
The programmer's Guide to PAF of Royal Mail
http://www.royalmail.com/sites/default/files/docs/pdf/programmers_guide_edition_7_v5.pdf specifies the British postcode system. The postcode is composed by two sections, outward and inward. Outward seems to hold 2-4 characters, and inward holds 3 characters (A##), in all 5 to 7 characters. Normally these are divided by a space. If you don't have to verify valid postcodes and trust your data you could just drop the inward codes and keep outward (which I believe was your wish). I assume PostCode is in correctly format and properly truncated if <5 characters. HTH, PR DATA LIST list / PostCode (A7). BEGIN DATA NW11RS NW52XB N78HQ N1 N78EU N19DG N18DW N7 N1 9BL N1 9JW N1 9BA EC1V97Q EC1N9BB EC1V9BB WC1N1BG WC1N1HD WC1N1HJ END DATA. * Length is 5-7 characters without the space *. STRING #pc (A7) /PcOutward (A4) /PcInward (A3). * Remove spaces *. COMPUTE #pc=REPLACE(PostCode, ' ', ''). COMPUTE ## =CHAR.LENGTH(#pc). DO IF (## LT 5). /* Assume inward code is missing */. COMPUTE PcOutward=#pc. COMPUTE PcInward =''. ELSE. /* Copy Outward code */. COMPUTE PcOutward=CHAR.SUBSTR(#pc, 1, ##-3). COMPUTE PcInward =CHAR.SUBSTR(#pc, ##-2, 3). END IF. LIST ALL. |
Free forum by Nabble | Edit this page |