Saturday, January 8, 2011

Geolocation Search

Services that allow users to identify nearby points of interest continue to grow in popularity. I'm sure we're all familiar with social websites that let you search for the profiles of people near a postal code, or mobile applications that use geolocation to identify Thai restaurants within walking distance. It's surprisingly simple to implement such functionality, and in this post I will discuss how to do so.

The first step is to obtain the latitude and longitude coordinates of any locations you want to make searchable. In the restaurant scenario, you'd want the latitude and longitude of each eatery. In the social website scenario, you'd want to obtain a list of postal codes with their centroid latitude and longitude.

In general, postal code-based geolocation is a bad idea; their boundaries rarely form simple polygons, the area they cover vary in size, and are subject to change based on the whims of the postal service. But many times we find ourselves stuck on a client project that calls for things we know better than to do, so I'll spare you my rant and instead ask that you keep the following in mind:

  • ZCTAs are not ZIP codes so do not use them as such. Some people recommend using the US Census Bureau's 2000 ZCTA dataset as freely available ZIP code latitude/longitude information for the United States, but the Bureau's own website warns ZCTA codes are designed to label census blocks and do not always correspond to ZIP codes assigned to the same area by the US Postal Service. Unpopulated areas such as lakes and forests may have their own ZCTA code, while smaller ZIP codes may not even "appear in the ZCTA universe."

  • Postal codes are volatile so make sure you understand your dataset provider's update policies. CD Light, LLC offers a line of commercial datasets which they license from the US Postal Service and Canada Post Corporation and then augment and maintain using a variety of additional sources. New codes are added and obsolete codes are removed from their datasets. Quarterly and monthly updates are made available through reasonably priced subscription plans. federalgovernmentzipcodes.us on the other hand offers a free US ZIP code dataset that is irregularly-maintained using a smaller set of additional sources. Obsolete ZIP codes are included but marked "decommissioned." It's not clear however which company or individual is behind the offering.
Regardless if you're hosting your dataset yourself or using a web service such as GeoCoder.ca, it's still not a bad idea to familiarize yourself with what efforts are taken to build it from current information and to maintain its accuracy. It's better to have more-specific/accurate data than not.

The second step is to develop an appropriate strategy to search your data. You don't want to calculate the distance to all target locations' coordinates from a given origin; this naive, brute-force search is rarely desirable, even if the dataset is small and your application/site traffic is low. My approach is to calculate the latitude and longitude coordinates for a search perimeter's north, east, south, and west bounds (bearings of 0°, 90°, 180°, and -90° or 270° respectively). I can then search the subset of records that have locations which falls within this bounding-box.

I use the following formulas to calculate target coordinates and restrict my dataset search:



where:
  • d is distance
  • R is the Earth's radius (WGS84 model semi-major axis is 3,963.19 miles)
  • is latitude
  • λ is longitude
  • θ is bearing

Implemented in code, the formulas look like this:

// calculate destination lat/lon given a starting point, bearing, and distance
function destination($lat,$lon, $bearing, $distance,$units="mi") {
    $radius = strcasecmp($units, "km") ? 3963.19 : 6378.137;
    $rLat = deg2rad($lat);
    $rLon = deg2rad($lon);
    $rBearing = deg2rad($bearing);
    $rAngDist = $distance / $radius;

    $rLatB = asin(sin($rLat) * cos($rAngDist) + 
        cos($rLat) * sin($rAngDist) * cos($rBearing));

    $rLonB = $rLon + atan2(sin($rBearing) * sin($rAngDist) * cos($rLat), 
                           cos($rAngDist) - sin($rLat) * sin($rLatB));

    return array("lat" => rad2deg($rLatB), "lon" => rad2deg($rLonB));
}

which a simple wrapper function can then call repeatedly with different bearings to determine the search bounds:

// calculate bounding box
function bound($lat,$lon, $distance,$units="mi") {
return array("N" => destination($lat,$lon,   0, $distance,$units),
             "E" => destination($lat,$lon,  90, $distance,$units),
             "S" => destination($lat,$lon, 180, $distance,$units),
             "W" => destination($lat,$lon, 270, $distance,$units));
}

Records that are located within this search area still need to be filtered, and I use the following formula to calculate the distance between two coordinates:



Implemented in code, the formula look like this:

// calculate distance between two lat/lon coordinates
function distance($latA,$lonA, $latB,$lonB, $units="mi") {
    $radius = strcasecmp($units, "km") ? 3963.19 : 6378.137;
    $rLatA = deg2rad($latA);
    $rLatB = deg2rad($latB);
    $rHalfDeltaLat = deg2rad(($latB - $latA) / 2);
    $rHalfDeltaLon = deg2rad(($lonB - $lonA) / 2);

    return 2 * $radius * asin(sqrt(pow(sin($rHalfDeltaLat), 2) +
        cos($rLatA) * cos($rLatB) * pow(sin($rHalfDeltaLon), 2)));
}

A search using these functions would look like this (assume $latitude and $longitude are given and provide the origin coordinates, and $distance and $units provide the desired search radius):

$b = bound($latitude,$longitude, $distance,$units);
$query = "SELECT location_name, address, city, state, zip_code,
    latitude, longitude FROM locations WHERE
    latitude BETWEEN {$b["S"]["lat"]} AND {$b["N"]["lat"]} AND
    longitude BETWEEN {$b["W"]["lon"]} AND {$b["E"]["lon"]}";
    $result = $db->query($query);

    $locations = array();
    while ($row = $result->fetch(PDO::FETCH_ASSOC)) {
    $dist = distance($latitude,$lonitude, $row["latitude"],$row["longitude"], $units);
    if ($dist <= $distance) {
         $locations[] = array("name"     => $row["location_name"],
                              "address"  => $row["address"],
                              "city"     => $row["city"],
                              "state"    => $row["state"],
                              "zip_code" => $row["zip_code"],
                              "distance" => round($dist, 2));
    }
}

A deep understanding of the mathematics used in navigation is nice to have, but is not truly necessary. We're lucky to live in an age where finding suitable formulas is just a quick Internet search away, and discussions on the trade-offs between their simplicity and accuracy are easy to find. The formulas I used are based on the Haversine Formula which have been accurate enough for my needs. Others such as Vincenty's Formulae take into account the ellipsoidal nature of the Earth to produce more accurate results, but I'm helping people locate near-by friends or businesses... not programming cruise missiles.

The final step is obtaining origin coordinates. In the postal code example from the beginning you can look up the code's centroid latitude and longitude and use those as your search origin. If your target audience is receptive, you may want to consider taking advantage of the JavaScript Geolocation API defined by the W3C. The API exposes the navigator.geolocation object which can obtain the user's current location. A user will rarely know his or her exact coordinates, but many Internet-enabled devices and smartphones now ship with GPS capabilities which the browser can access. navigator.geolocation.getCurrentPosition() causes the brower to request permission from the user to report his or her location, and if permission is granted will invoke a specified callback function.

<form>
 Search <input id="txtDist" maxlength="3" name="distance" size="3" type="text">
 <select name="units">
  <option value="mi">mi</option>
  <option value="km">km</option>
 </select> from<br>
 <input checked="checked" name="searchBy" type="radio" value="zip">ZIP code
 <input maxlength="5" name="zipCode" size="5" type="text"><br>
 <input id="coords" name="searchBy" type="radio" value="coords">coordinates
 <input id="txtLat" maxlength="10" name="latitude" size="6" type="text"> Lat.
 <input id="txtLon" maxlength="10" name="longitude" size="6" type="text"> Lon.<br>
 <input type="submit" value="Search">
</form>
<script type="text/javascript">
(function () {
    function setCoords(pos) {
        document.getElementById("coords").checked = true;
        document.getElementById("txtLat").value = pos.coordinates.latitude;
        document.getElementById("txtLon").value = pos.coordinates.longitude;
    }

    if (navigator.geolocation) {
        navigator.geolocation.getCurrentPosition(setCoords);
    }
})();
</script>

Hopefully you now seen how relatively easy it is to support searching nearby points of interest in your web-application. First and foremost it is important to maintain a good dataset of target coordinates, since though users will understand the search results to be approximations, you can never return good results with bad data. Implementing algorithms necessary to search the set efficiently is relatively straightforward and I have shown you the approach I use; other algorithms and formulas can be found and implemented if you need more accurate results. And modern APIs like navigator.geolocation make it possible for you to offer new ways to your users to interact with your data. Feel free to share your thoughts and experiences in the comments section below.

Special thanks to Matthew Turland and Davey Shafik who both helped me out immensely when I first got my feet wet with geolocation.

Esperanto translation is available at zaemis.tumblr.com/post/38447430414/geo-loka-sercxado

3 comments:

  1. Your strategy for distance searching is exactly what I had in mind. Thank you for providing all the formulas and explanation.

    ReplyDelete
  2. This is awesome...

    ReplyDelete