Skip to main content

Geolocation Search

Services that allow users to identify nearby points of interest continue to grow in popularity. I'm sure we're all familiar with social websites that let you search for the profiles of people near a postal code, or mobile applications that use geolocation to identify Thai restaurants within walking distance. It's surprisingly simple to implement such functionality, and in this post I will discuss how to do so.

The first step is to obtain the latitude and longitude coordinates of any locations you want to make searchable. In the restaurant scenario, you'd want the latitude and longitude of each eatery. In the social website scenario, you'd want to obtain a list of postal codes with their centroid latitude and longitude.

In general, postal code-based geolocation is a bad idea; their boundaries rarely form simple polygons, the area they cover vary in size, and are subject to change based on the whims of the postal service. But many times we find ourselves stuck on a client project that calls for things we know better than to do, so I'll spare you my rant and instead ask that you keep the following in mind:

  • ZCTAs are not ZIP codes so do not use them as such. Some people recommend using the US Census Bureau's 2000 ZCTA dataset as freely available ZIP code latitude/longitude information for the United States, but the Bureau's own website warns ZCTA codes are designed to label census blocks and do not always correspond to ZIP codes assigned to the same area by the US Postal Service. Unpopulated areas such as lakes and forests may have their own ZCTA code, while smaller ZIP codes may not even "appear in the ZCTA universe."

  • Postal codes are volatile so make sure you understand your dataset provider's update policies. CD Light, LLC offers a line of commercial datasets which they license from the US Postal Service and Canada Post Corporation and then augment and maintain using a variety of additional sources. New codes are added and obsolete codes are removed from their datasets. Quarterly and monthly updates are made available through reasonably priced subscription plans. federalgovernmentzipcodes.us on the other hand offers a free US ZIP code dataset that is irregularly-maintained using a smaller set of additional sources. Obsolete ZIP codes are included but marked "decommissioned." It's not clear however which company or individual is behind the offering.
Regardless if you're hosting your dataset yourself or using a web service such as GeoCoder.ca, it's still not a bad idea to familiarize yourself with what efforts are taken to build it from current information and to maintain its accuracy. It's better to have more-specific/accurate data than not.

The second step is to develop an appropriate strategy to search your data. You don't want to calculate the distance to all target locations' coordinates from a given origin; this naive, brute-force search is rarely desirable, even if the dataset is small and your application/site traffic is low. My approach is to calculate the latitude and longitude coordinates for a search perimeter's north, east, south, and west bounds (bearings of 0°, 90°, 180°, and -90° or 270° respectively). I can then search the subset of records that have locations which falls within this bounding-box.

I use the following formulas to calculate target coordinates and restrict my dataset search:



where:
  • d is distance
  • R is the Earth's radius (WGS84 model semi-major axis is 3,963.19 miles)
  • is latitude
  • λ is longitude
  • θ is bearing

Implemented in code, the formulas look like this:

// calculate destination lat/lon given a starting point, bearing, and distance
function destination($lat,$lon, $bearing, $distance,$units="mi") {
    $radius = strcasecmp($units, "km") ? 3963.19 : 6378.137;
    $rLat = deg2rad($lat);
    $rLon = deg2rad($lon);
    $rBearing = deg2rad($bearing);
    $rAngDist = $distance / $radius;

    $rLatB = asin(sin($rLat) * cos($rAngDist) + 
        cos($rLat) * sin($rAngDist) * cos($rBearing));

    $rLonB = $rLon + atan2(sin($rBearing) * sin($rAngDist) * cos($rLat), 
                           cos($rAngDist) - sin($rLat) * sin($rLatB));

    return array("lat" => rad2deg($rLatB), "lon" => rad2deg($rLonB));
}

which a simple wrapper function can then call repeatedly with different bearings to determine the search bounds:

// calculate bounding box
function bound($lat,$lon, $distance,$units="mi") {
return array("N" => destination($lat,$lon,   0, $distance,$units),
             "E" => destination($lat,$lon,  90, $distance,$units),
             "S" => destination($lat,$lon, 180, $distance,$units),
             "W" => destination($lat,$lon, 270, $distance,$units));
}

Records that are located within this search area still need to be filtered, and I use the following formula to calculate the distance between two coordinates:



Implemented in code, the formula look like this:

// calculate distance between two lat/lon coordinates
function distance($latA,$lonA, $latB,$lonB, $units="mi") {
    $radius = strcasecmp($units, "km") ? 3963.19 : 6378.137;
    $rLatA = deg2rad($latA);
    $rLatB = deg2rad($latB);
    $rHalfDeltaLat = deg2rad(($latB - $latA) / 2);
    $rHalfDeltaLon = deg2rad(($lonB - $lonA) / 2);

    return 2 * $radius * asin(sqrt(pow(sin($rHalfDeltaLat), 2) +
        cos($rLatA) * cos($rLatB) * pow(sin($rHalfDeltaLon), 2)));
}

A search using these functions would look like this (assume $latitude and $longitude are given and provide the origin coordinates, and $distance and $units provide the desired search radius):

$b = bound($latitude,$longitude, $distance,$units);
$query = "SELECT location_name, address, city, state, zip_code,
    latitude, longitude FROM locations WHERE
    latitude BETWEEN {$b["S"]["lat"]} AND {$b["N"]["lat"]} AND
    longitude BETWEEN {$b["W"]["lon"]} AND {$b["E"]["lon"]}";
    $result = $db->query($query);

    $locations = array();
    while ($row = $result->fetch(PDO::FETCH_ASSOC)) {
    $dist = distance($latitude,$lonitude, $row["latitude"],$row["longitude"], $units);
    if ($dist <= $distance) {
         $locations[] = array("name"     => $row["location_name"],
                              "address"  => $row["address"],
                              "city"     => $row["city"],
                              "state"    => $row["state"],
                              "zip_code" => $row["zip_code"],
                              "distance" => round($dist, 2));
    }
}

A deep understanding of the mathematics used in navigation is nice to have, but is not truly necessary. We're lucky to live in an age where finding suitable formulas is just a quick Internet search away, and discussions on the trade-offs between their simplicity and accuracy are easy to find. The formulas I used are based on the Haversine Formula which have been accurate enough for my needs. Others such as Vincenty's Formulae take into account the ellipsoidal nature of the Earth to produce more accurate results, but I'm helping people locate near-by friends or businesses... not programming cruise missiles.

The final step is obtaining origin coordinates. In the postal code example from the beginning you can look up the code's centroid latitude and longitude and use those as your search origin. If your target audience is receptive, you may want to consider taking advantage of the JavaScript Geolocation API defined by the W3C. The API exposes the navigator.geolocation object which can obtain the user's current location. A user will rarely know his or her exact coordinates, but many Internet-enabled devices and smartphones now ship with GPS capabilities which the browser can access. navigator.geolocation.getCurrentPosition() causes the brower to request permission from the user to report his or her location, and if permission is granted will invoke a specified callback function.

<form>
 Search <input id="txtDist" maxlength="3" name="distance" size="3" type="text">
 <select name="units">
  <option value="mi">mi</option>
  <option value="km">km</option>
 </select> from<br>
 <input checked="checked" name="searchBy" type="radio" value="zip">ZIP code
 <input maxlength="5" name="zipCode" size="5" type="text"><br>
 <input id="coords" name="searchBy" type="radio" value="coords">coordinates
 <input id="txtLat" maxlength="10" name="latitude" size="6" type="text"> Lat.
 <input id="txtLon" maxlength="10" name="longitude" size="6" type="text"> Lon.<br>
 <input type="submit" value="Search">
</form>
<script type="text/javascript">
(function () {
    function setCoords(pos) {
        document.getElementById("coords").checked = true;
        document.getElementById("txtLat").value = pos.coordinates.latitude;
        document.getElementById("txtLon").value = pos.coordinates.longitude;
    }

    if (navigator.geolocation) {
        navigator.geolocation.getCurrentPosition(setCoords);
    }
})();
</script>

Hopefully you now seen how relatively easy it is to support searching nearby points of interest in your web-application. First and foremost it is important to maintain a good dataset of target coordinates, since though users will understand the search results to be approximations, you can never return good results with bad data. Implementing algorithms necessary to search the set efficiently is relatively straightforward and I have shown you the approach I use; other algorithms and formulas can be found and implemented if you need more accurate results. And modern APIs like navigator.geolocation make it possible for you to offer new ways to your users to interact with your data. Feel free to share your thoughts and experiences in the comments section below.

Special thanks to Matthew Turland and Davey Shafik who both helped me out immensely when I first got my feet wet with geolocation.

Esperanto translation is available at zaemis.tumblr.com/post/38447430414/geo-loka-sercxado

Comments

  1. Your strategy for distance searching is exactly what I had in mind. Thank you for providing all the formulas and explanation.

    ReplyDelete
  2. Huge thanks for creating an easy to use set of functions!!

    ReplyDelete
  3. Thanks, awasome, keep it up ,mind blowing

    ReplyDelete

Post a Comment

Popular posts from this blog

Composing Music with PHP

I’m not an expert on probability theory, artificial intelligence, and machine learning. And even my Music 201 class from years ago has been long forgotten. But if you’ll indulge me for the next 10 minutes, I think you’ll find that even just a little knowledge can yield impressive results if creatively woven together. I’d like to share with you how to teach PHP to compose music. Here’s an example: You’re looking at a melody generated by PHP. It’s not the most memorable, but it’s not unpleasant either. And surprisingly, the code to generate such sequences is rather brief. So what’s going on? The script calculates a probability map of melodic intervals and applies a Markov process to generate a new sequence. In friendlier terms, musical data is analyzed by a script to learn which intervals make up pleasing melodies. It then creates a new composition by selecting pitches based on the possibilities it’s observed. . Standing on Shoulders Composition doesn’t happen in a vacuum. Bach wa

Learning Prolog

I'm not quite sure exactly I was searching for, but somehow I serendipitously stumbled upon the site learnprolognow.org a few months ago. It's the home for an introductory Prolog programming course. Logic programming offers an interesting way to think about your problems; I've been doing so much procedural and object-oriented programming in the past decade that it really took effort to think at a higher level! I found the most interesting features to be definite clause grammars (DCG), and unification. Difference lists are very powerful and Prolog's DCG syntax makes it easy to work with them. Specifying a grammar such as: s(s(NP,VP)) --> np(NP,X,Y,subject), vp(VP,X,Y). np(np(DET,NBAR,PP),X,Y,_) --> det(DET,X), nbar(NBAR,X,Y), pp(PP). np(np(DET,NBAR),X,Y,_) --> det(DET,X), nbar(NBAR,X,Y). np(np(PRO),X,Y,Z) --> pro(PRO,X,Y,Z). vp(vp(V),X,Y) --> v(V,X,Y). vp(vp(V,NP),X,Y) --> v(V,X,Y), np(NP,_,_,object). nbar(nbar(JP),X,3) --> jp(JP,X). pp(pp(PREP,N

What's Wrong with OOP

Proponents of Object Oriented Programming feel the paradigm yields code that is better organized, easier to understand and maintain, and reusable. They view procedural programming code as unwieldy spaghetti and embrace OO-centric design patterns as the "right way" to do things. They argue objects are easier to grasp because they model how we view the world. If the popularity of languages like Java and C# is any indication, they may be right. But after almost 20 years of OOP in the mainstream, there's still a large portion of programmers who resist it. If objects truly model the way people think of things in the real world, then why do people have a hard time understanding and working in OOP? I suspect the problem might be the focus on objects instead of actions. If I may quote from Steve Yegge's Execution in the Kingdom of Nouns : Verbs in Javaland are responsible for all the work, but as they are held in contempt by all, no Verb is ever permitted to wander about