prettymaps.local
One of the early design decisions behind prettymaps was: No moving pieces. Which really meant: No tile servers.
Tile servers are yet another point of failure and, unlike most other special-case servers, are usually meant to be exposed directly to the public. Stamen hadn't yet decided to tackle the tile-serving problem in earnest, as they have now with maps.stamen.com, so we had the enforced luxury of choosing not to serve street-level tiles for the entire world.
prettymaps has global coverage from zoom levels two through ten and then goes down to zoom level 15 for something like two dozen individual cities. Which is still a lot of map tiles. Lots and lots of tiles. Gigabytes and gigabytes of tiny little files. But we pre-rendered them all and put them on an EBS volume that was attached to an EC2 server and told the world about the project and things seemed to work out okay.
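For a sense of scale, here's the back-of-the-envelope arithmetic (just the math; no prettymaps specifics assumed):

```shell
# A web-map zoom level z contains 2^z * 2^z = 4^z tiles, so global
# coverage from zoom 2 through zoom 10 is the sum of 4^z over that range.
total=0
p=16  # 4^2, the tile count at zoom 2
for z in $(seq 2 10); do
    total=$((total + p))
    p=$((p * 4))
done
echo "$total tiles for zooms 2-10"  # 1398096, before any zoom-15 city pyramids
```

Call it 1.4 million tiles for the global pyramid alone, and each zoom-15 city adds its own pile on top of that.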
Eventually the site was moved from EC2 to a very big S3 bucket. There is a lot to like about S3 but one of the things I like best is the ability to serve an entire website from a bucket. You are still stuck serving your website from an ugly-ass AWS domain name but you can get around that with CNAMEs. For example:
$> dig prettymaps.stamen.com

; <<>> DiG 9.8.3-P1 <<>> prettymaps.stamen.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 44584
;; flags: qr rd ra; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;prettymaps.stamen.com.    IN    A

;; ANSWER SECTION:
prettymaps.stamen.com.    1023    IN    CNAME    prettymaps.stamen.com.s3-website-us-east-1.amazonaws.com.
prettymaps.stamen.com.s3-website-us-east-1.amazonaws.com.    60    IN    CNAME    s3-website-us-east-1.amazonaws.com.
s3-website-us-east-1.amazonaws.com.    23    IN    A    72.21.215.203

;; Query time: 126 msec
;; SERVER: 10.0.1.1#53(10.0.1.1)
;; WHEN: Sat Mar 2 08:13:35 2013
;; MSG SIZE rcvd: 136
From an operational point of view we were able to more or less remove (or more accurately delegate) all the moving pieces in one fell swoop. It's a little more complicated than that, but not really, and overall it has proven to be a nice and elegant solution to the problem of how you keep an historical project like this running without it interrupting the day-to-day needs of the present and in particular a busy client-services company.
The Internet Archive should really make this possible for their S3-style buckets as well. It's actual work to enable but I reckon that if there were both a way to store things in the Archive and still keep them running on the internet, people would start uploading stuff with tears of joy in their eyes.
Fast-forward to a couple of months ago when the Raspberry Pi I ordered finally arrived. If you haven't heard about the Raspberry Pi yet, it's a credit-card sized Linux computer that costs about $35 and uses an SD card as its hard drive. That's about it on the surface, but it's the room for experimentation and the low cost of failure made possible by the availability of a real and proper computer at such a low price that is novel and exciting.
It makes it possible to think about setting up silly projects like Loopr without also having to think about spending non-silly amounts of money on the hardware necessary to run them.
Almost all the documentation makes it clear that you don't need anything bigger than a 4GB SD card to get started. After going through all the usual wax-on, wax-off exercises to get a basic system up and running I started wondering what you could do with a big honking SD card. You can get a 256GB card for about four hundred dollars so the Pi is still not quite the mythical OpenStreetMap-in-a-box that everyone dreams of. It's both too expensive and too small to manage the raw XML data dump. Maybe the work to transition the data exports to use Google's Protocol Buffer format would make it possible? It's been a while since I've looked at that stuff...
Still, you can get a 64GB SD card for about fifty bucks these days. So I did.
I haven't done a lot of load testing on the Raspberry Pi. You could probably deploy one for a low-to-medium traffic website without worrying too much, but I doubt I would use one to run a public map-tile server even if I still think about it every few days. I don't know how well PostGIS performs on a Pi but I know for a fact that TileStache and gunicorn both work like a charm (although libgevent isn't very happy and needs to be replaced with eventlet).
Eventually I started to think that maybe the most exciting part of the Pi, right now, wasn't running it in public but being able to use it as an archiving tool. Not a suspend-it-in-amber kind of archive but something like, or a variation on, the kind of living breathing shadow copy that things like parallel-flickr or privatesquare try to be. Could I do that for prettymaps?
And I totally can, it turns out. There's not a lot to show for that statement, though. The only thing to show really is this:
Really.
I am not running a second instance of prettymaps on the internet because that would be dumb. prettymaps.stamen.com is still happily chugging away which is awesome and I don't imagine that will change any time soon. Maybe our confidence in Amazon's services and its longevity, not to mention benevolence, as a company is just a kind of collective Stockholm Syndrome but, well... here we are.
But that SD card has a working clone of prettymaps on it, complete with all the browser-crushing interactive bits and all the software needed to run it. I can simply plug it in to a Raspberry Pi and point a web browser at http://prettymaps.local and there it is. Which is kind of exciting.
It's not a silver bullet. From an archival perspective lots of weak links remain. You still need the physical Raspberry Pi hardware. You still need to ensure the physical integrity of the SD card itself (which is fancy talk for keeping multiple copies of it). You still need to ensure that the code itself will continue to run on contemporary web browsers. The Pi has a built-in web browser when you run it in GUI mode but I have not yet tested to see whether prettymaps makes it cry. If it does work then it more or less solves the viewing problem since all the software comes pre-baked with the archive itself. Oh, and also electricity. That, at least, is not a problem I am going to solve in the same margins of the day that I used to work on this project.
But a thing that can be put to sleep and then asked to spring back to life — preserving its functionality — simply by plugging it in, at a cost of less than a hundred dollars, feels like progress to me. And probably even less than that if you assume that any one Raspberry Pi can service multiple projects.
I have not tried to get parallel-flickr to run on a Raspberry Pi but that seems like an obvious next step. It is interesting to consider a pre-baked disk image that could be written on to a self-addressed and stamped SD card, so to speak, with a friendly web-based UI for people who aren't interested in sweating the technical sausage. There's nothing about the bare-bones version of parallel-flickr that should be a problem so that's encouraging. The fancy parts of parallel-flickr rely on the Java-based Solr project, which I have not tested yet. I know that Java will run on the Pi but it is also brutally slow. Maybe that is acceptable for the purposes of a shallow-breathing but still living archive?
Like I said, it feels like forward motion which is a nice way to start a Saturday morning. Especially when you're hungover.
Some technical notes, for those of you who are in to that sort of thing:
- Given that prettymaps is a web application it's just being served up by Apache. It could probably be done as easily and with a little less overhead by using nginx but I am not really worried about that.
- Unlike the S3-backed scenario described above the archive is not serving up map tiles from a humongous bag of static files. One thing the Pi does not do is optimize the file system for use with something like the volume of files that prettymaps has. In other words you run out of inodes long before you've stored all your data on disk. I spent a little bit of time thinking about this and all the solutions involved a lot of low-level steps building and re-building the operating system from scratch and I did not have the stamina for that. Which means:
- The archive is running a tile server. Specifically it is running TileStache (behind gunicorn, which is in turn proxied through Apache) and serving tiles that have been squirted in to a collection of MBTiles databases. Rather than write a custom TileStache provider to account for the fact that prettymaps requests tiles using static safe-disk-cache URLs instead of standard zoom/x/y tile server URLs, I just added a flag to the site Javascript to do the right thing. Which means:
- I've probably broken some cardinal rule of archiving by changing the code but I also signed and numbered — with a Mission Integer of course — my own 20x200 prettymaps print of San Francisco so I can claim precedence for this kind of bad behaviour. Or something like that.
- I've always thought MBTiles was kind of a weird bird but it has proven to be lovely and wonderful for this project. The 3.8 million remaining inodes on my SD card send their thanks.
- The .local part in the prettymaps.local name is a function of running the avahi daemon for doing zeroconf broadcasting. I don't really know if that's the best solution but it was easy and enforces a degree of consistency in naming conventions.
- I would love to get David Blackman's twofishes geocoder running locally as a way to preserve the geocoding functionality. That looks like it might be a pretty involved process, though. Aside from the question of whether or not the Pi has enough ooomph to both load all the data and serve requests, the build process itself requires installing a custom version of mongodb. I had hopes that I might be able to get it all set up in the time it took to write this blog post but after reading that last link, it's a thing that will need to be saved for another day.
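As for the tile server in the notes above, pointing TileStache at an MBTiles file is mostly a matter of configuration. A minimal sketch of a tilestache.cfg, assuming a made-up layer name and file path:

```json
{
    "cache": {"name": "Test"},
    "layers": {
        "prettymaps": {
            "provider": {
                "name": "mbtiles",
                "tileset": "/usr/local/prettymaps/prettymaps.mbtiles"
            }
        }
    }
}
```

With that in place, something like `gunicorn -k eventlet "TileStache:WSGITileServer('tilestache.cfg')"` fronted by Apache's proxy serves the layer at /prettymaps/{z}/{x}/{y}.png.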
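The inode squeeze mentioned in the notes above is easy to check for before you start copying anything. A minimal sketch, assuming a hypothetical tile directory (both paths here are placeholders, not anything prettymaps actually uses):

```shell
# Rough sanity check: will the tiles fit, inode-wise?
# TILE_DIR is illustrative; point it wherever your rendered tiles live.
TILE_DIR="${TILE_DIR:-/var/cache/tiles}"
TARGET_FS="${TARGET_FS:-/}"

# -P keeps each filesystem on one line so awk's column numbers hold up
free_inodes=$(df -Pi "$TARGET_FS" | awk 'NR==2 {print $4}')
tile_count=$(find "$TILE_DIR" -type f 2>/dev/null | wc -l)

echo "free inodes on $TARGET_FS: $free_inodes"
echo "tile files in $TILE_DIR: $tile_count"
```

If the second number is anywhere near the first, you want MBTiles (or some other one-big-file scheme), not a bag of PNGs.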
Onwards. In my case, towards a sink full of dirty dishes.
This blog post is full of links.
#prettymaps.local