this is aaronland

things I have written elsewhere #20160219

I Am Here

This post was originally published on the Mapzen weblog, in February 2016.

tl;dr

https://whosonfirst.mapzen.com/iamhere/ is a shiny new version of Simon Willison's classic Get Lat Lon application full of Mapzen-y goodness.

A short history

In late 2007 Simon Willison launched what some people have described as the most useful website on the internet. The website was called Get Lat Lon and its entire purpose was to enable a visitor to find the latitude and longitude of a point on a map.

The website was built using the Google Maps API and had a form for geocoding addresses or place names but the primary interface was a simple map with a set of crosshairs centered in the viewport. Get Lat Lon would simply print the geographic coordinates of whatever location happened to be beneath the crosshairs. Brilliant!

Somewhere between 2007 and today the domain renewal for Get Lat Lon lapsed and now it's... something else entirely, something not worth linking to. You can still get a feeling for the simplicity and elegance of its overall design because there are snapshots of the website in the Wayback Machine except... none of the JavaScript works anymore.

I am not here

In 2009 I decided to write my own version of Get Lat Lon. Instead of using the Google Maps API it would use all open software and data. The map data would be from OpenStreetMap. The map tiles would be from CloudMade using exciting new cartography from Stamen Design. It would use the modestmaps.js library for managing all those tiles. It would support the then still-nascent browser-based Geolocation API to help determine your location. The geocoding would be handled by Flickr and in addition to geocoding it would also try to reverse geocode your location and display the shape of the place contained by a latlon, again using the Flickr API.

old Baghdad

And... it would be so clever and modular that it would support multiple service providers and you could just drop it into any webpage and it would work, as if by magic. It was called I Am Here and I think I was the only person to ever use it but it's still running. In 2014, though, CloudMade got out of the tile business and so there is literally not much to see anymore.

no Baghdad

I am pretty sure that it's exactly one line of code to define a new map provider to make I Am Here work again but to be perfectly honest just looking at all that too-too clever code now, in 2016, is exhausting. Also, see the way the licensing information on the map data hasn't been updated to reflect the switch to the ODbL...

Reverse geocoding

Fast forward to last year (2015) and work has begun in earnest on the Who's On First (WOF) gazetteer at Mapzen. Part of that work has been to build hierarchies for each record in the gazetteer which is something of a chicken-and-egg problem. We've been automating the process with a general purpose point-in-polygon tool that we've written in-house using the Go programming language. It is called go-whosonfirst-pip and it works like this:

That's it. The purpose of the go-whosonfirst-pip code is to do fiddly math across a large and heterogeneous dataset as quickly as possible.

rtree

This is what an R-tree looks like, courtesy Wikimedia user Chire

The go-whosonfirst-pip code only knows about points and the things that contain those points. It does not know about context. For example, consider the following question: What continent is Russia a part of? Europe? Asia? All of the above? There are lots of interesting applications that remain to be built on top of go-whosonfirst-pip but it is important to remember that it is not an inference engine, by design.
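To make "points and the things that contain those points" a little more concrete, here is a minimal sketch in Python of the general technique: a cheap bounding-box check followed by a ray-casting test. This is not the go-whosonfirst-pip code (which is written in Go), and the names find_containing, point_in_ring and bounding_box are made up for illustration. It also assumes simple rings with no holes, and it scans every bounding box in turn, which is exactly the work an R-tree exists to avoid.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (longitude, latitude)


def bounding_box(ring: Sequence[Point]) -> Tuple[float, float, float, float]:
    """Return (min_x, min_y, max_x, max_y) for a ring of points."""
    xs = [x for x, _ in ring]
    ys = [y for _, y in ring]
    return min(xs), min(ys), max(xs), max(ys)


def point_in_ring(point: Point, ring: Sequence[Point]) -> bool:
    """Ray casting: count the edges crossed by a ray heading east from the point."""
    x, y = point
    inside = False
    j = len(ring) - 1
    for i in range(len(ring)):
        xi, yi = ring[i]
        xj, yj = ring[j]
        # Does this edge straddle the horizontal line through the point?
        if (yi > y) != (yj > y):
            # x coordinate at which the edge crosses that horizontal line
            x_cross = xi + (y - yi) * (xj - xi) / (yj - yi)
            if x < x_cross:
                inside = not inside
        j = i
    return inside


def find_containing(point: Point, places: Sequence[Tuple[str, Sequence[Point]]]) -> List[str]:
    """Return the names of every (name, ring) pair whose ring contains the point."""
    x, y = point
    matches = []
    for name, ring in places:
        min_x, min_y, max_x, max_y = bounding_box(ring)
        # Cheap rejection first. An R-tree answers this without scanning every box.
        if not (min_x <= x <= max_x and min_y <= y <= max_y):
            continue
        if point_in_ring(point, ring):
            matches.append(name)
    return matches
```

Note that the function returns every match and makes no attempt to rank or reconcile them, which is the "not an inference engine" part.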

The code includes a simple HTTP server (called wof-pip-server) that you can use to easily load (and then query) one or more meta files containing pointers to different WOF documents. If a WOF document is just a GeoJSON with a few explicit properties then a meta file is just a CSV with a path column containing a relative path to a WOF document.
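As a rough illustration of what "a CSV with a path column" means in practice, here is a small Python sketch that reads a meta file and resolves each relative path against a data directory. The function name wof_paths is hypothetical, and the only column it relies on is path; any other columns in a real meta file are simply ignored.

```python
import csv
import os
from typing import Iterator, TextIO


def wof_paths(meta: TextIO, data_root: str) -> Iterator[str]:
    """Yield the full path of each WOF document listed in an open meta file."""
    for row in csv.DictReader(meta):
        # "path" is relative to the data directory, not to the meta file itself.
        yield os.path.join(data_root, row["path"])


if __name__ == "__main__":
    import sys

    # Usage: python wof_paths.py /path/to/data /path/to/meta.csv
    data_root, meta_file = sys.argv[1], sys.argv[2]
    with open(meta_file, newline="") as fh:
        for path in wof_paths(fh, data_root):
            print(path)
```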

Although the meta files were originally conceived as little more than a simple helper tool (or index) for large volumes of data they have grown into something of a first-class object inside the world of Who's On First, with more and more of the tooling and infrastructure built around them. They are due for a longer, more detailed discussion but not today.

To get started with an instance of wof-pip-server that will query for countries and neighbourhoods you would do:

$> ./bin/wof-pip-server -data /usr/local/mapzen/whosonfirst-data/data/ \
	/usr/local/mapzen/whosonfirst-data/meta/wof-country-latest.csv \
	/usr/local/mapzen/whosonfirst-data/meta/wof-neighbourhood-latest.csv 
[placetype] country 219
[placetype] neighbourhood 49906
		      

Depending on how fast your computer is, the indexing process might take a couple of minutes. By default the wof-pip-server listens for requests on port 8080 on your computer's local loopback network interface, also known as localhost, so the URL for querying the server would be http://localhost:8080. For example:

$> curl 'http://localhost:8080?latitude=40.677524&longitude=-73.987343'
[
	{
		"Id": 102061079,
		"Name": "Gowanus Heights",
		"Placetype": "neighbourhood"
	},
	{
		"Id": 85633793,
		"Name": "United States",
		"Placetype": "country"
	},
	{
		"Id": 85865587,
		"Name": "Gowanus",
		"Placetype": "neighbourhood"
	}
]
			

If you want to limit the result set to a specific placetype simply append placetype=PLACETYPE to your query string, like this:

$> curl 'http://localhost:8080?latitude=40.677524&longitude=-73.987343&placetype=neighbourhood'
[
	{
		"Id": 102061079,
		"Name": "Gowanus Heights",
		"Placetype": "neighbourhood"
	},
	{
		"Id": 85865587,
		"Name": "Gowanus",
		"Placetype": "neighbourhood"
	}
]
		      

Currently it is not possible to filter the result set with multiple placetypes. That's not technically a bug but it's become clear that it's also not a feature.
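In the meantime, one workaround is to ask for everything and filter on the client. The sketch below is a minimal Python client under a few assumptions: the server is running locally on the default port, the response has the shape shown in the examples above (a JSON list of objects with Id, Name and Placetype keys), and the function names are invented for this post rather than part of any Who's On First library.

```python
import json
from typing import Dict, Iterable, List, Optional
from urllib.parse import urlencode
from urllib.request import urlopen


def pip_url(latitude: float, longitude: float, placetype: Optional[str] = None,
            base: str = "http://localhost:8080") -> str:
    """Build a query URL using the latitude, longitude and placetype parameters."""
    params = {"latitude": latitude, "longitude": longitude}
    if placetype:
        params["placetype"] = placetype
    return base + "?" + urlencode(params)


def filter_placetypes(places: List[Dict], placetypes: Iterable[str]) -> List[Dict]:
    """Keep only the results whose Placetype is one of the requested placetypes."""
    wanted = set(placetypes)
    return [place for place in places if place["Placetype"] in wanted]


def query(latitude: float, longitude: float, placetypes: Iterable[str] = ()) -> List[Dict]:
    """Fetch every match for a point, then filter by placetype on the client side."""
    with urlopen(pip_url(latitude, longitude)) as response:
        places = json.loads(response.read().decode("utf-8"))
    wanted = list(placetypes)
    return filter_placetypes(places, wanted) if wanted else places


if __name__ == "__main__":
    # Requires a running wof-pip-server on localhost:8080.
    for place in query(40.677524, -73.987343, ["neighbourhood", "country"]):
        print(place["Id"], place["Placetype"], place["Name"])
```

The trade-off is that the server does slightly more work than it needs to, but for a single point the result set is small enough that it rarely matters.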

The wof-pip-server returns as little information as possible because it stores as little information as possible, mostly for performance reasons. It is left up to applications using wof-pip-server to decide whether and how to look up more information about any given WOF document.
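One way an application might look up more information is to turn an Id into the path of the corresponding GeoJSON file. The sketch below assumes the directory layout used by the whosonfirst-data repository, as I understand it, where the digits of an ID are split into groups of three to form nested folders. That layout is not described anywhere in this post, so treat it (and the function name id_to_relative_path) as an assumption and check it against your own copy of the data.

```python
def id_to_relative_path(wof_id: int) -> str:
    """Map a WOF ID to a relative path, ASSUMING the three-digit folder layout."""
    digits = str(wof_id)
    # Split the ID into chunks of (at most) three digits: 85633793 -> 856, 337, 93
    folders = [digits[i:i + 3] for i in range(0, len(digits), 3)]
    return "/".join(folders + [digits + ".geojson"])


if __name__ == "__main__":
    print(id_to_relative_path(102061079))  # 102/061/079/102061079.geojson
    print(id_to_relative_path(85633793))   # 856/337/93/85633793.geojson
```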

The nice thing about the go-whosonfirst-pip tools is that they are designed to be as agnostic as possible about the data they index and serve. For example, I recently downloaded version 2.0.1 of the Flickr Alpha Shape files and re-jiggled the file structure (but not the actual data) of the alpha shapes and now they will Just Work™ with wof-pip-server.

whosonfirst-www-iamhere

So far, so good. We have an enormous bag of (Who's On First) data and we have a tool for establishing the relationship(s) between those files but any volume of geographic data absent a map is... hard to see.

gowanus

So, I rebuilt I Am Here (or Get Lat Lon) ... again. It's called whosonfirst-www-iamhere or just I Am Here (again) for short.

It does everything that the combination of Get Lat Lon and the original I Am Here did, but is built entirely using Mapzen tools and services.

Aside from an ongoing need to simply know what the coordinates are for any given spot on a map, it seemed like whosonfirst-www-iamhere would be a good and useful tool for visualizing and sanity-checking the results returned by the go-whosonfirst-pip code.

probably a bug

For example, Golden Gate...what?

This time, though, it's been built with two guiding principles in mind. The first is that Mapzen should always be Consumer Zero (of Mapzen services) and the second is to minimize the pain and nuisance of any one piece, of what is actually a pretty complex application, failing or shutting down or otherwise going offline.

Mapzen as Consumer Zero

The latest version of I Am Here uses a bunch of Mapzen services already:

In time, it will also use:

Small pieces, loosely failing

The ultimate goal of whosonfirst-www-iamhere is to work from your own computer in offline-mode (or when you don't have a network connection) without needing to download and install a long list of dependencies. As of this writing:

Here is an example of how you might start whosonfirst-www-iamhere from your computer. This assumes that you have downloaded (or cloned) the whosonfirst-www-iamhere code and have navigated into the root directory.

$> ./bin/start.py -d /path/to/your/whosonfirst-data/data \
	/path/to/your/whosonfirst-data/meta/wof-neighbourhood-latest.csv \
	/path/to/your/whosonfirst-data/meta/wof-locality-latest.csv
			  
baghdad-2

The short version is that once the start.py script has finished setting everything up you can open your web browser up at http://localhost:8001 and start poking around neighbourhoods and localities from Who's On First.

The longer version follows. By default the start.py tool requires a minimum of two arguments: the path to your data and at least one meta file. The first (-d) is the path to where, on your computer, you've stored your raw Who's On First data files. The remaining arguments (there are two in the example above) are the paths to meta files (remember them?) that the wof-pip-server will index. The start.py tool will start three separate servers running on your computer:

All of these port numbers can be changed if necessary. To do so you would pass your own settings as parameters to the start.py tool and as custom settings in the mapzen.whosonfirst.config.js config file.

boston

Bundles

One of the challenges with Who's On First has been balancing our desire for a robust and portable data format (plain text GeoJSON files), the need for an historical audit trail and the mechanics of working with and distributing a large and ever-growing dataset. We have been using Git and GitHub extensively for much of the work to date but as the commit history around the data grows so too does the size of the whosonfirst-data repository and the burden of using it or simply getting started with Who's On First.

quebec

As an alternative we have been working on something called bundles. Bundles are:

...a collection of GeoJSON formatted files (Who's On First data) grouped by a specific property, like placetype. They allow for people to more easily bulk download a subset of the entire Who's On First dataset. Currently there are only bundles by placetype but eventually we will add a variety of different slices of the data as demand and interest require.

Each bundle contains a meta file (see... they just keep popping up all over the place!) and a folder named data which contains the files listed in the meta file. Bundles do not contain any Git history or related metadata but our hunch is that many people don't need or want that information. The startup tool mentioned above does not yet have support for bundles but that will happen shortly. In the meantime you can get started with whosonfirst-www-iamhere and bundles with a few short commands in your terminal.
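Since a bundle is just a meta file plus a data folder, it is easy to sanity-check one after you have unpacked it. Here is a minimal sketch in Python that reports any file listed in the meta file but missing from the data folder. It assumes the meta file has a path column whose values are relative to the data folder, and missing_bundle_files is a name invented for this example rather than an existing tool.

```python
import csv
import os
from typing import List


def missing_bundle_files(bundle_dir: str, meta_name: str) -> List[str]:
    """Return the relative paths listed in the meta file that are not in data/."""
    meta_path = os.path.join(bundle_dir, meta_name)
    data_dir = os.path.join(bundle_dir, "data")
    missing = []
    with open(meta_path, newline="") as fh:
        for row in csv.DictReader(fh):
            # Each row points at one GeoJSON file inside the data folder.
            if not os.path.isfile(os.path.join(data_dir, row["path"])):
                missing.append(row["path"])
    return missing


if __name__ == "__main__":
    import sys

    # Usage: python check_bundle.py /path/to/bundle name-of-meta-file.csv
    for path in missing_bundle_files(sys.argv[1], sys.argv[2]):
        print("missing:", path)
```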

For example, if you just wanted to run a copy of whosonfirst-www-iamhere using only microhoods (which are currently all in San Francisco) you would do the following:

$> cd /path/to/your/whosonfirst-www-iamhere
$> curl -O https://whosonfirst.mapzen.com/bundles/wof-microhood-latest-bundle.tar.bz2
$> tar -xvjf wof-microhood-latest-bundle.tar.bz2
$> ./bin/start.py -d wof-microhood-latest-bundle/data wof-microhood-latest-bundle/wof-microhood-latest.csv
				
this really happened...

All of the details and currently available bundles are listed over at https://whosonfirst.mapzen.com/bundles and... yes, Super Bowl City is a thing that really happened in 2016.

Or you can just use our version

As mentioned at the beginning of this blog post there is a publicly accessible version of I Am Here for you to play with at https://whosonfirst.mapzen.com/iamhere/.

Right now it only displays neighbourhoods but shortly we will add the ability to select different (even multiple) placetypes to display at the same time. And as circumstance permits we will add the additional features (routing and IP lookups) mentioned above. And then all the stuff we haven't even thought of yet.

Enjoy!