|
Is there a method in spss or on the web somewhere that reads a file of five digit zip codes and returns (writes back) a file of the distance between them? Somebody has pointed out that this website (http://www.melissadata.com/lookups/zipdistance.asp)
will return a distance for a pair of typed/copied in zip codes (it may do more but in return for something). We have, potentially, several thousand to do.
Thanks, Gene Maguin |
|
I did something like this a few years ago.
If you have a zip codes table with lat/long values, you can use the
SPSSINC TRANS extension command with the extendedTransforms.ellipseDist
function (or use spherical distances) to compute the distances. But
do you really want all by all? That's going to be millions of numbers.
Note also that zip code areas can have funny shapes, so, especially
for close areas, the distances won't be super accurate.
Jon Peck (no "h") aka Kim Senior Software Engineer, IBM [hidden email] phone: 720-342-5621
From: "Maguin, Eugene" <[hidden email]> To: [hidden email] Date: 09/02/2015 11:43 AM Subject: [SPSSX-L] distance between zipcodes Sent by: "SPSSX(r) Discussion" <[hidden email]>
===================== To manage your subscription to SPSSX-L, send a message to LISTSERV@...(not to SPSSX-L), with no body text except the command. To leave the list, send the command SIGNOFF SPSSX-L For a list of commands to manage subscriptions, send the command INFO REFCARD
|
|
Thanks, Jon. No, not all by all. The input file would be id, zip1, zip2 and the output file would be id, zip1, zip2, distance. I understand your point about accuracy but we don’t have street addresses. Ok, so a python routine can do the computation given lat/long numbers. Do there exist files of lat/long numbers for zip code centers (however those centers are defined)?
Gene Maguin
|
|
Census has ZCTAs (ZIP Code Tabulation Areas) with shapefiles for mapping. I also have some old files which have the lat/long centroids of ZCTAs. I do not recall if these appear in more recent issues of this product. ZCTAs do not include all zip codes.
... Mark Miller
On Wed, Sep 2, 2015 at 11:18 AM, Maguin, Eugene <[hidden email]> wrote:
|
|
After checking my own files, I have Zipcode files from 2004 thru 2012 which contain (supposedly) Lat/Long for centroids. At least one of these is a SAS data file, which is easily converted to SPSS. There are 33233 Zipcodes listed in the SAS file.
... Mark Miller
On Wed, Sep 2, 2015 at 11:22 AM, Mark Miller <[hidden email]> wrote:
|
|
In reply to this post by Maguin, Eugene
Here's a snippet example using a zipcode
file I found somewhere on the net and squirreled away.
get file="c:/data/zipcodes.sav".
dataset name zipcodes.
dataset activate zipcodes.
sort cases by zipcode.
dataset activate main.
MATCH FILES /FILE=* /TABLE='zipcodes' /BY zipcode.
spssinc trans result=distance
/formula "extendedTransforms.ellipseDist(latitude, longitude, lat2, long2)".
Jon Peck (no "h") aka Kim Senior Software Engineer, IBM [hidden email] phone: 720-342-5621
From: "Maguin, Eugene" <[hidden email]> To: [hidden email] Date: 09/02/2015 12:21 PM Subject: Re: [SPSSX-L] distance between zipcodes Sent by: "SPSSX(r) Discussion" <[hidden email]>
|
|
Jon,
I ruthlessly avoided having anything to do with python or extension commands. Time to do something different because of this problem.
This is the key: spssinc trans result=distance
So, I’m guessing that this is a Python extension command. I looked in the Python reference and see that Appendix F lists an spssinc trans function. Does that imply that this formula is somewhere on my SPSS install? I see the Run Scripts in Utilities. It wants a file name. What’s the file name? I assume my version, 23, will run this. True assumption?
Thanks, Gene Maguin
|
|
When I run it as is I receive the error:
Warnings
No module named extendedTransforms
There is also nothing I see in the available Extensions on the IBM site when I connect via Utilities>Extension Bundles...
Please reply to the list and not to my personal email.
Those desiring my consulting or training services please feel free to email me. --- "Nolite dare sanctum canibus neque mittatis margaritas vestras ante porcos ne forte conculcent eas pedibus suis." Cum es damnatorum possederunt porcos iens ut salire off sanguinum cliff in abyssum?" |
|
In reply to this post by Maguin, Eugene
If you already have the latitudes and longitudes, all you need is the spherical law of cosines to calculate the distance. This webpage, http://www.movable-type.co.uk/scripts/latlong.html, shows it in Excel, which is pretty easy to port to SPSS. SPSS does not have ACOS, but this tech note shows how to compute it, https://www-304.ibm.com/support/docview.wss?uid=swg21476208.
Given the coarseness of zipcodes, you don't need to worry about problems with calculating small distances, http://gis.stackexchange.com/a/4911/751. I'm not sure about the error introduced by assuming the earth is a perfect sphere, but I imagine it isn't that big either. |
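For readers who want to see the law-of-cosines calculation spelled out, here is a minimal Python sketch. It is an illustration, not code from the linked page or tech note: the function names and the 6,371,000 m mean radius are assumptions. The second function uses the identity acos(x) = pi/2 - asin(x), which is one way to work around a missing arc cosine function.

```python
import math

EARTH_RADIUS_M = 6371000.0  # assumed mean Earth radius in meters


def cosine_term(lat1, lon1, lat2, lon2):
    """Cosine of the central angle between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    x = math.sin(p1) * math.sin(p2) + math.cos(p1) * math.cos(p2) * math.cos(dlon)
    # Rounding can push x slightly outside [-1, 1], so clamp before inverting.
    return max(-1.0, min(1.0, x))


def distance_acos(lat1, lon1, lat2, lon2, radius=EARTH_RADIUS_M):
    """Great-circle distance using the arc cosine directly."""
    return radius * math.acos(cosine_term(lat1, lon1, lat2, lon2))


def distance_asin(lat1, lon1, lat2, lon2, radius=EARTH_RADIUS_M):
    """Same distance using only the arc sine: acos(x) = pi/2 - asin(x)."""
    return radius * (math.pi / 2 - math.asin(cosine_term(lat1, lon1, lat2, lon2)))


if __name__ == "__main__":
    # Arbitrary illustrative coordinates, not real zip code centroids.
    print(distance_acos(42.0, -80.0, 39.0, -70.0))
    print(distance_asin(42.0, -80.0, 39.0, -70.0))
```

The two functions should agree to well under a millimeter; the clamp only matters for points that are nearly identical or nearly antipodal.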
|
In reply to this post by David Marso
extendedTransforms.py is a utility module, not an extension command itself, so it is in the Utilities collection, which is accessible via the Downloads page here.
https://www.ibm.com/developerworks/community/wikis/home?lang=en#/wiki/We70df3195ec8_4f95_9773_42e448fa9029/page/Downloads%20for%20IBM%C2%AE%20SPSS%C2%AE%20Statistics
or directly here
https://www.ibm.com/developerworks/community/files/app?lang=en#/file/abea0af7-da27-4dd1-80f9-958b935eeb48
I think we install it starting with V23, but I'm not positive about that. It would need to be saved to a location on the Python search path such as the python\lib\site-packages directory under the Statistics installation (as of V22).
For those interested, here is a list of the functions in that module
"""Functions designed to be used with the trans module to carry out one or more transformations on casewise data.
search: search a string for a match to a regular expression, case sensitive or not
subs: replace occurrences of a regular expression pattern with specified values
templatesub: substitute values in a template expression
levenshteindistance: calculate similarity between two strings
soundex: calculate the soundex value of a string (a rough phonetic encoding)
nysiis: enhanced sound encoding (claimed superior to soundex for surnames)
soundexallwords: calculate the soundex value for each word in a string and return a blank-separated string
median: median of a list of values
mode: mode of a list of values
multimode: up to n modes of a list of values
matchcount: compare value with list of values and count matches using standard or custom comparison function
strtodatetime: convert a date/time string to an SPSS datetime value using a pattern
datetimetostr: convert an SPSS date/time value to a string using a pattern
lookup: return a value from a table lookup
vlookup: return a value from a table lookup (more convenient than lookup w SPSSINC TRANS)
vlookupinterval: return a value from a table lookup using intervals
sphDist: calculate distance between two points on earth using spherical approximation
ellipseDist: calculate distance between two points on earth using ellipsoidal approximation
jaroWinkler: calculate Jaro-Winkler string similarity measure
extractDummies: extract a set of binary variables from a value coded in powers of 2
packDummies: pack a sequence of numeric and/or string values into a single float
translatechar: map characters according to a conversion table
countWkdays: count number of days between two dates that are not excluded
vlookupgroupinterval: return a value associated with a group and a set of intervals for that group
countDaysWExclusions: count days in interval excluding specified weekdays and other dates
DiceStringSimilarity: compare strings using Dice bigram metric.
Dictdict: find best match of strings using Dice metric
setRandomSeed: initialize random number generator
invGaussian: inverse Gaussian distribution random numbers
triangular: triangular random numbers
Jon Peck (no "h") aka Kim Senior Software Engineer, IBM [hidden email] phone: 720-342-5621
From: David Marso <[hidden email]> To: [hidden email] Date: 09/02/2015 04:59 PM Subject: Re: [SPSSX-L] distance between zipcodes Sent by: "SPSSX(r) Discussion" <[hidden email]> |
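A "No module named extendedTransforms" warning means the module file is not on the search path of the Python interpreter that Statistics is using. The sketch below is one hedged way to check this; `module_available` is a hypothetical helper name, and the code would need to run inside a BEGIN PROGRAM block to inspect the interpreter Statistics actually uses rather than some other Python on the machine.

```python
import importlib
import sys


def module_available(name):
    """Return True if `name` can be imported by this interpreter."""
    try:
        importlib.import_module(name)
    except ImportError:
        return False
    return True


if __name__ == "__main__":
    if not module_available("extendedTransforms"):
        # List the folders Python searched; the .py file must sit in one of them.
        print("extendedTransforms not found; directories searched:")
        for folder in sys.path:
            print("  " + folder)
```

If the module is reported missing, copying extendedTransforms.py into one of the listed directories (for example the site-packages directory mentioned above) should make the import succeed.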
|
In reply to this post by Andy W
Here is a macro that uses the law of cosines I mentioned.
DEFINE !CosDist (Lat1 = !TOKENS(1)
/Lon1 = !TOKENS(1)
/Lat2 = !TOKENS(1)
/Lon2 = !TOKENS(1)
/Rad = !DEFAULT(6371000) !TOKENS(1)
/Res = !TOKENS(1))
COMPUTE #ToRad = ( 4*ARTAN(1) )/180.
COMPUTE #L1R = !Lat1*#ToRad.
COMPUTE #L2R = !Lat2*#ToRad.
COMPUTE #Lo1R = !Lon1*#ToRad.
COMPUTE #Lo2R = !Lon2*#ToRad.
COMPUTE #S = SIN(#L1R)*SIN(#L2R).
COMPUTE #C = COS(#L1R)*COS(#L2R)*COS(#Lo2R-#Lo1R).
COMPUTE !Res = (2*ARTAN(1) - ARSIN(#S + #C))*!Rad.
!ENDDEFINE.
I compared this to a sample of zipcode distances for New York to one location, https://dl.dropboxusercontent.com/u/3385251/Cosine_Distances.sps, to see what the error is between this and the "extendedTransforms.ellipseDist" function. (See this blogpost for background, https://andrewpwheeler.wordpress.com/2014/11/19/using-the-google-distance-api-in-spss-plus-some-eda-of-travel-time-versus-geographic-distance/.)
For that sample, the average error was around 500 meters, but it grew with the distance. The percent error was always less than 0.3% in that sample (I have seen that number somewhere else as well, so it might be a universal rule-of-thumb). So if the typical distances are around 20 kilometers, the error from using the law of cosines above is likely to be around 60 meters. For 500 kilometers, the error would be 1500 meters, etc. Probably reasonable for zipcode distances, as I would guess their coarseness causes around a 1 kilometer average error even in city areas where zips are smaller. |
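The rule-of-thumb arithmetic in the post above can be written as a one-line function. This is only a sketch of that back-of-the-envelope estimate: the function name is hypothetical, and the 0.3% default is the figure observed in the poster's New York sample, not a guaranteed bound.

```python
def spherical_error_estimate(distance_m, pct_error=0.3):
    """Rough size of the sphere-versus-ellipsoid gap, in meters.

    pct_error is the percent error observed in the sample described above;
    it is an empirical rule of thumb, not a proven limit.
    """
    return distance_m * pct_error / 100.0


if __name__ == "__main__":
    for km in (20, 500):
        meters = spherical_error_estimate(km * 1000.0)
        print("%d km -> about %.0f m" % (km, meters))
```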
|
I remember something related was published on Raynald's site long
ago.
http://www.spsstools.net/Syntax/Compute/ComputeDistancesOnEarth.sps |
|
This is the complete syntax for looking
up the zipcode coordinates and calculating the distance using a database
of zipcodes.
* zip code file with variables zipcode, latitude, longitude (and others).
get file="c:/data/zipcodes.zsav".
dataset name zipcodes.
* test data.
data list list/id zip1 zip2(3F5.0).
begin data
1 60093 60090
2 44074 60090
3 07090 60093
4 87506 60093
5 60093 87506
6 87506 87501
7 87506 87506
end data.
dataset name clients.
format zip1 zip2(N5).
* map zip1 and zip2 to latitude, longitude coordinates.
* Note that all references to variables must match the letter case exactly.
* The terms in square brackets list the looked up values to return.
spssinc trans result=lat1 long1
/initial "extendedTransforms.vlookup('zipcode', ['latitude', 'longitude'], 'zipcodes')"
/formula func(zip1).
spssinc trans result=lat2 long2
/initial "extendedTransforms.vlookup('zipcode', ['latitude', 'longitude'], 'zipcodes')"
/formula func(zip2).
* Calculate the distance between the zipcodes using the coordinates.
* Coordinates are in degrees, so inradians is set to false.
spssinc trans result=distance
/formula "extendedTransforms.ellipseDist(lat1, long1, lat2, long2, inradians=False)".
* Just for curiosity, calculate the distance using the spherical approximation.
spssinc trans result=sphdistance
/formula "extendedTransforms.sphDist(lat1, long1, lat2, long2, inradians=False)".
Jon Peck (no "h") aka Kim Senior Software Engineer, IBM [hidden email] phone: 720-342-5621
From: Kirill Orlov <[hidden email]> To: [hidden email] Date: 09/03/2015 08:01 AM Subject: Re: [SPSSX-L] distance between zipcodes Sent by: "SPSSX(r) Discussion" <[hidden email]> |
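The same lookup-then-distance workflow can be sketched in plain Python for anyone working outside Statistics. This is not the extendedTransforms code: it swaps in the haversine formula (a spherical approximation) for ellipseDist, and it assumes CSV inputs with the column names used in the post (zipcode, latitude, longitude and id, zip1, zip2). The function names are hypothetical, and the coordinates in the demo are made-up placeholders, not real centroids. Zip codes are kept as zero-padded strings so that values such as 07090 keep their leading zero.

```python
import csv
import io
import math


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Spherical great-circle distance in km; inputs in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlam = math.radians(lon2 - lon1)
    a = math.sin((p2 - p1) / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(min(1.0, math.sqrt(a)))


def load_centroids(handle):
    """Map zero-padded 5-character zip strings to (latitude, longitude)."""
    return {row["zipcode"].strip().zfill(5): (float(row["latitude"]), float(row["longitude"]))
            for row in csv.DictReader(handle)}


def pair_distances(pairs_handle, centroids):
    """Yield (id, zip1, zip2, distance_km); distance is None for unknown zips."""
    for row in csv.DictReader(pairs_handle):
        z1, z2 = row["zip1"].strip().zfill(5), row["zip2"].strip().zfill(5)
        if z1 in centroids and z2 in centroids:
            dist = haversine_km(*(centroids[z1] + centroids[z2]))
        else:
            dist = None
        yield row["id"], z1, z2, dist


# Placeholder data: the zip codes and coordinates below are invented for the demo.
centroid_csv = io.StringIO("zipcode,latitude,longitude\n00001,40.0,-75.0\n00002,41.0,-75.0\n")
pairs_csv = io.StringIO("id,zip1,zip2\n1,00001,00002\n2,1,00001\n3,00001,99999\n")

centroids = load_centroids(centroid_csv)
results = list(pair_distances(pairs_csv, centroids))
for record in results:
    print(record)
```

Returning None for an unmatched zip code, rather than raising an error, mirrors what a table lookup does with a missing key and keeps one output row per input row.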
|
In reply to this post by Maguin, Eugene
At 04:32 PM 9/2/2015, Maguin, Eugene wrote:
>I ruthlessly avoided having anything to do with python or extension
>commands. Time to do something different because of this problem.
As noted by others, if spherical approximation is good enough, the problem is easily amenable to native SPSS code. Here's a solution that has been posted two or three times in the past. It's not wrapped in a macro, but since the core is a single COMPUTE statement, I don't think it needs to be.
It's from a (probably over-elaborate) post I wrote on the subject, back in 2009(*):
FAQ: Computing distance from latitude and longitude (DRAFT)
Sections below are
===== (1) Solution in native SPSS transformation code
===== (2) Using Python code from Developer Central
===== (3) Test data and test run of native SPSS code
===== (1) Native SPSS code =====================================
Earth-radius values are from http://en.wikipedia.org/wiki/Earth_radius; see also http://nssdc.gsfc.nasa.gov/planetary/factsheet/earthfact.html.
* ............... Initialize constants ................. .
* The following code >REQUIRES< that angles in the base system .
* (SPSS) be in radians, so that the trigonometric distance is .
* in radians, and can be multiplied directly by the Earth's .
* radius. .
DO IF $CASENUM EQ 1.
* These initializations >MUST< be performed: .
* #EarthRad is the Earth's radius in whatever units you please; .
* the calculated distance will be in those units: .
* 6,372.7976 km, .
* 3,959.873 statute miles, .
* 3,441.035 nautical miles. .
. COMPUTE #EarthRad = 3959.873 /* statute miles */.
* #AngleCvt is the number of your angle units (degrees, here) .
* in one of SPSS's angle units (radians). It uses that .
* ARCTAN(1) is PI/4 radians or (in any angle measure) 1/8 circle . .
. COMPUTE #AngleCvt = 360 /* Number of input units in a full circle */
/(8*ARTAN(1)).
END IF.
* ............... Compute distance ................ .
* Compute distance between points with coordinates .
* (lat1,lon1) and (lat2,lon2) .
compute distance = #EarthRad*
(2*artan(1)-arsin( sin(lat1/#AngleCvt) /* (sin(lat1) */
*sin(lat2/#AngleCvt) /* .sin(lat2) */
+ cos(lat1/#AngleCvt) /* +cos(lat1) */
*cos(lat2/#AngleCvt) /* .cos(lat2) */
*cos(lon2/#AngleCvt /* .cos(long2 */
-lon1/#AngleCvt) /* -long1))*/
)).
FORMAT Distance (F7.2).
===== (2) Using Python code ====================================
From Peck, Jon, "Re: Function for arc cosine", to SPSSX-L Thu, 7 Jun 2007 09:53:06 -0500
"In the extendedTransforms module on SPSS Developer Central ( www.spss.com/devcentral), there are two functions that implement distance calculations on Earth latitude and longitude coordinates.
sphDist: calculate distance between two points on earth using spherical approximation
ellipseDist: calculate distance between two points on earth using ellipsoidal approximation
"Here is a simple usage example for just a single distance pair.
.............
begin program.
import spss
import extendedTransforms
fromloc = (41.90, 87.65)
toloc = (41.73, 71.43)
dist1 = extendedTransforms.ellipseDist(fromloc[0], fromloc[1], toloc[0], toloc[1], inradians=False)
dist2 = extendedTransforms.sphDist(fromloc[0], fromloc[1], toloc[0], toloc[1], inradians=False)
print dist1, dist2
end program.
===== (3) Test data, and test run =================================
The 'given' values which are compared with the calculation are,
. Providence to Chicago distance, from Jon Peck's posting "Re: Function for arc cosine", Thu, 7 Jun 2007 09:53:06 -0500
. Others, arbitrary test points with distance calculated at site http://www.movable-type.co.uk/scripts/latlong.html.
It's not clear why tiny discrepancies remain.
DATA LIST LIST
/ City1 lat1 lon1 City2 lat2 lon2 GivenDist
(A4, F6.2, F6.2,A4, F6.2, F6.2, F7.2).
BEGIN DATA
Pvd 41.90 87.65 Chi 41.73 71.43 836.27
A1 42.00 80.00 A2 39.00 70.00 564.33
B1 44.00 70.00 B2 49.00 85.00 791.00
END DATA.
. /*-- LIST /*-*/.
* ............... Initialize constants ................. .
* The following code >REQUIRES< that angles in the base system .
* (SPSS) be in radians, so that the trigonometric distance is .
* in radians, and can be multiplied directly by the Earth's .
* radius. .
DO IF $CASENUM EQ 1.
* These initializations >MUST< be performed: .
* #EarthRad is the Earth's radius in whatever units you please; .
* the calculated distance will be in those units: .
* 6,372.7976 km, .
* 3,959.873 statute miles, .
* 3,441.035 nautical miles. .
. COMPUTE #EarthRad = 3959.873 /* statute miles */.
* #AngleCvt is the number of your angle units (degrees, here) .
* in one of SPSS's angle units (radians). It uses that .
* ARCTAN(1) is PI/4 radians or (in any angle measure) 1/8 circle . .
. COMPUTE #AngleCvt = 360 /* Number of input units in a full circle */
/(8*ARTAN(1)).
END IF.
* ............... Compute distance ................ .
* Compute distance between points with coordinates .
* (lat1,lon1) and (lat2,lon2) .
compute distance = #EarthRad*
(2*artan(1)-arsin( sin(lat1/#AngleCvt) /* (sin(lat1) */
*sin(lat2/#AngleCvt) /* .sin(lat2) */
+ cos(lat1/#AngleCvt) /* +cos(lat1) */
*cos(lat2/#AngleCvt) /* .cos(lat2) */
*cos(lon2/#AngleCvt /* .cos(long2 */
-lon1/#AngleCvt) /* -long1))*/
)).
FORMAT GivenDist(F7.2).
COMPUTE DeltaPct = 100*(Distance/GivenDist-1).
FORMATS DeltaPct (PCT7.2).
LIST. |
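As a cross-check on the test run in the post above, here is a hedged Python sketch of the same calculation: convert degrees to radians, get the central angle from the arcsine form of the law of cosines, and multiply by an Earth radius in the desired unit. The radii and the three test rows are copied from the post; the function and variable names are illustrative assumptions, and no particular output values are claimed here.

```python
import math

# Earth radii quoted in the post; the result comes out in whichever unit is chosen.
RADII = {"km": 6372.7976, "statute miles": 3959.873, "nautical miles": 3441.035}

# (City1, lat1, lon1, City2, lat2, lon2, GivenDist) exactly as in the test data.
TEST_ROWS = [
    ("Pvd", 41.90, 87.65, "Chi", 41.73, 71.43, 836.27),
    ("A1", 42.00, 80.00, "A2", 39.00, 70.00, 564.33),
    ("B1", 44.00, 70.00, "B2", 49.00, 85.00, 791.00),
]


def angular_distance(lat1, lon1, lat2, lon2):
    """Central angle in radians for coordinates given in degrees."""
    la1, lo1, la2, lo2 = (math.radians(v) for v in (lat1, lon1, lat2, lon2))
    x = math.sin(la1) * math.sin(la2) + math.cos(la1) * math.cos(la2) * math.cos(lo2 - lo1)
    x = max(-1.0, min(1.0, x))  # guard against rounding just outside [-1, 1]
    return math.pi / 2 - math.asin(x)  # same form as 2*ARTAN(1) - ARSIN(...)


def check_rows(rows, radius):
    """Return (city1, city2, distance, percent difference from the given value)."""
    out = []
    for c1, la1, lo1, c2, la2, lo2, given in rows:
        dist = radius * angular_distance(la1, lo1, la2, lo2)
        out.append((c1, c2, dist, 100.0 * (dist / given - 1.0)))
    return out


if __name__ == "__main__":
    for c1, c2, dist, delta in check_rows(TEST_ROWS, RADII["statute miles"]):
        print("%s-%s  %8.2f  %+.2f%%" % (c1, c2, dist, delta))
```

Small percent differences are to be expected: the first given value comes from a posting about the ellipsoidal function, and the reference site may use a slightly different Earth radius.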
