Dear SPSSX-L listserve group
I have a large data set N=20,000. Each individual record has a field "zip code" which contains either a 5 character string (ex. 11733, or a 5 plus 4 character string separated by a dash (ex 11733-4567). The zip code field is not a numeric but a string variable. Is there a way to recode the string zip code into a new field zip code numeric, that retains the same order of numbers, but converts them into a numeric variable? Thank you so much for your suggestions. Michael Tunik, MD Associate Professor of Emergency Medicine and Pediatrics NYU Grossman School of Medicine ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
Yes, but you have two cases: zip and zip+4. You haven't told us how you want to treat the zip+4 values. I'll say you want to discard the +4 component. The variable, I'll call it zipcode, should a10 or longer.
Compute zipnumeric=number(char.substr(zipcode,1,5),f5.0). Gene Maguin -----Original Message----- From: SPSSX(r) Discussion <[hidden email]> On Behalf Of Michael Tunik Sent: Saturday, October 10, 2020 2:54 PM To: [hidden email] Subject: converting zipcodes as strings to numeric Dear SPSSX-L listserve group I have a large data set N=20,000. Each individual record has a field "zip code" which contains either a 5 character string (ex. 11733, or a 5 plus 4 character string separated by a dash (ex 11733-4567). The zip code field is not a numeric but a string variable. Is there a way to recode the string zip code into a new field zip code numeric, that retains the same order of numbers, but converts them into a numeric variable? Thank you so much for your suggestions. Michael Tunik, MD Associate Professor of Emergency Medicine and Pediatrics NYU Grossman School of Medicine ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
In reply to this post by Michael Tunik
How about this? It shifts the 5 digit zips over by 4 places to be consistent with the 9-digit zips, but you can just remove the multiplication in that compute for 5-digit zips if you want to leave those values as is. I assigned N format to the result, which eliminates comma separators and decimal points. data list list /zip(a10). begin data. 11733 11733-4567 end data. compute #dash = char.index(zip, "-"). do if #dash gt 0. compute nzip = number(char.substr(zip,1,5), F5.0) * 10000 +number(char.substr(zip, 7), F4.0). else. compute nzip = number(zip, F5.0) * 10000. end if. format nzip(N9). exec. On Sat, Oct 10, 2020 at 12:53 PM Michael Tunik <[hidden email]> wrote: Dear SPSSX-L listserve group |
Just curious.
Others have pointed out ways to do this. I have never seen Zipcodes used arithmetically. Why is it a problem if it is a string field? ----- Art Kendall Social Research Consultants -- Sent from: http://spssx-discussion.1045642.n5.nabble.com/ ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD
Art Kendall
Social Research Consultants |
In reply to this post by Jon Peck
Thank you for your advice. I will try this approach and get back to you as soon as possible with results. Best Mike Tunik On Sat, Oct 10, 2020, 3:43 PM Jon Peck <[hidden email]> wrote:
|
In reply to this post by Art Kendall
Thanks for your question. We will be matching our data base zip codes with others in another local data base. These will be numeric and I did not think they could be matched if the zip code fields were of different types. Best Mike Tunik On Sat, Oct 10, 2020, 5:16 PM Art Kendall <[hidden email]> wrote: Just curious. |
If you are doing the matching in SPSS, you could change either source.
Do both sources have a mix of complete and incomplete ZIPs? There could be a mismatch if one source has a complete zip+4 and the other source has an incomplete code. Creating 2 new temporary variables for the matching process might make help thinking it out. Two variables would help when useing human eyeballs for some quality assurance checking. ----- Art Kendall Social Research Consultants -- Sent from: http://spssx-discussion.1045642.n5.nabble.com/ ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD
Art Kendall
Social Research Consultants |
In reply to this post by Maguin, Eugene
Gene, thank you for the advice. I am not sure why this did not work. I ran this syntax and it converted the 5 and 10 character zip codes to system missing fields. Best Mike Tunik On Sat, Oct 10, 2020 at 3:30 PM Maguin, Eugene <[hidden email]> wrote: Yes, but you have two cases: zip and zip+4. You haven't told us how you want to treat the zip+4 values. I'll say you want to discard the +4 component. The variable, I'll call it zipcode, should a10 or longer. -- Michael G. Tunik
=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
|
In reply to this post by Jon Peck
Hi Jon Thank you for the advice and the SPSS syntax. I ran the syntax and it did convert the 5 and 10 character zip codes to a 5 character zip code with 2 decimal places. Very helpful! Best Mike Tunik On Sat, Oct 10, 2020 at 3:43 PM Jon Peck <[hidden email]> wrote:
-- Michael G. Tunik
=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
|
In reply to this post by Art Kendall
Hi Art Thank you very much for the advice. I do not believe the zip codes will need to be the 4 digit extension,just the 5 digit codes. Once I have the zip codes I need to match with the ones in the patient data set, I will have a better idea of what is needed for the next step. Best Mike Tunik On Mon, Oct 12, 2020 at 11:02 AM Art Kendall <[hidden email]> wrote: If you are doing the matching in SPSS, you could change either source. -- Michael G. Tunik
=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
|
*If you only need the short ZIP this should work.
data list list /zip(a10). begin data. 11733 11733-4567 01234 end data. Numeric nzip (n5). compute nzip = number(char.substr(zip,1,5), N5). LIST. Consider using FIND DUPLICATE CASES a few time to see how well you are locating you duplicates. Or if you are using MATCH Files be sure to check on match failures and on false matches. ----- Art Kendall Social Research Consultants -- Sent from: http://spssx-discussion.1045642.n5.nabble.com/ ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD
Art Kendall
Social Research Consultants |
Hi Art I tried this, and the result is SPSS creates a separate data base, with only 3 zip codes 11733, 11733-4567, 01234. These show up as 2 variables, the first variable is a string variable, and a second variable which is a restricted numeric both have the same 3 zip codes. I am not sure why this syntax is not converting the zip string variable in the primary data base to an nzip numeric. Would appreciate any thoughts. Thank you Best Mike Tunik On Mon, Oct 19, 2020 at 9:45 AM Art Kendall <[hidden email]> wrote: *If you only need the short ZIP this should work. Michael G. Tunik
=====================
To manage your subscription to SPSSX-L, send a message to
[hidden email] (not to SPSSX-L), with no body text except the
command. To leave the list, send the command
SIGNOFF SPSSX-L
For a list of commands to manage subscriptions, send the command
INFO REFCARD
|
In reply to this post by Michael Tunik
How can that possibly be?
Run the transformation code on your own data rather than the sample mockup. Face palm time. ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD |
In reply to this post by Michael Tunik
* this part is only to have some example data to see whether the concept
works. * it only touches the data it creates and not your data. * it only create 1 string variable *If you only need the short ZIP this should work. data list list /zip(a10). begin data. 11733 11733-4567 01234 end data. * this part is suggested syntax to (a) create a numeric variable and (b) COMPUTE a new variable. * when the LIST command passes the data. Numeric nzip (n5). compute nzip = number(char.substr(zip,1,5), N5). * the LIST command simple lists the 2 variables. The string variable read in from the syntax and the * numeric variable that is COMPUTEd. LIST. ----- Art Kendall Social Research Consultants -- Sent from: http://spssx-discussion.1045642.n5.nabble.com/ ===================== To manage your subscription to SPSSX-L, send a message to [hidden email] (not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD
Art Kendall
Social Research Consultants |
Free forum by Nabble | Edit this page |