this is aaronland

things I have written about elsewhere #20190517

Harvey Milk Plane Has a Permalink – Updated flight data at SFO Museum

This was originally published on the SFO Museum Mills Field weblog, in May 2019.

Today we are happy to announce something that's actually been live for a couple weeks now: Improved and updated data for flights arriving at and departing from SFO. This is the proverbial "version 2" of work that we began way back in January. Since the beginning of May we have been recording the following data for many (but still not all) flights:

Here are all the distinct aircraft (models) we've seen so far:

These are not actually all the aircraft. We are excluding private aircraft but we also know that some flights are missing for reasons-involving-computers that we haven't solved yet. For example, where are all the Embraer planes that we know are operated by airlines at SFO? With any luck, we will find them soon.

Here are all the individual airplanes that have visited the airport:

If you had told me that a little over two weeks of data would have yielded almost 1,400 unique airplanes at SFO I would have been surprised. I am surprised.
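For the curious, here is a minimal sketch of how a count like that could be derived from a pile of flight records. It is not the code we actually run. The field names (`tail_number`, `model`) and the tail numbers themselves are made up for illustration, and it assumes that some records may be missing a tail number entirely.

```python
from collections import defaultdict

# Hypothetical flight records. Field names and values are invented for this example.
flights = [
    {"tail_number": "EX-001", "model": "A380"},
    {"tail_number": "EX-001", "model": "A380"},     # same airplane, second visit
    {"tail_number": "EX-002", "model": "A380"},
    {"tail_number": "EX-003", "model": "737-800"},
    {"tail_number": None, "model": "737-800"},      # tail number not recorded
]


def unique_airplanes(records):
    """Return the set of distinct tail numbers, skipping records without one."""
    return {r["tail_number"] for r in records if r.get("tail_number")}


def airplanes_by_model(records):
    """Group distinct tail numbers by aircraft model."""
    grouped = defaultdict(set)
    for r in records:
        if r.get("tail_number") and r.get("model"):
            grouped[r["model"]].add(r["tail_number"])
    return dict(grouped)


print(len(unique_airplanes(flights)))
print(sorted(airplanes_by_model(flights)))
```

Using sets means an airplane that visits more than once is only counted once, which is the difference between "flights" and "airplanes".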

We have permalinks for every airplane we see identified by its registration number, sometimes called a "tail number". For example the "Harvey Milk Plane", operated by Norwegian Air:

You can see all the individual flights for an airplane or the airports that it has traveled to or arrived from (where one half of the journey is SFO):

It's also possible to search for flights by their tail number:

You can see flights and airports for individual aircraft models:

Did you know that Lufthansa operates at least three different A380s in and out of SFO? I didn't. Here are all the flights for all the different 737 models taking off and landing at the airport:

We have endpoints for the individual airplanes and aircraft models that an airline has flown in and out of SFO:

Or the different aircraft models that visited a particular gate at the airport:

The combinatorial possibilities for slicing and dicing all this data are as dizzying as they are exciting. A few things that should exist but don't, yet, include:

We've also started publishing planned flight routes and the actual flight path recorded, when that data is available.

Both the route and the path are published alongside the principal flight record as "alternate" geometries.

We've talked about alternate geometries at length, in the past, in the context of "where" a gate is located at the airport, so it's like that but for flight data!
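As a rough illustration of the idea, here is a sketch of a principal flight record with a route and a path hanging off of it as separate GeoJSON features. This is not our actual record format. The `example:` property names, the flight identifier and the intermediate coordinates are all hypothetical, and the airport coordinates are approximate.

```python
import json

# Approximate [longitude, latitude] pairs, used only for illustration.
SFO = [-122.379, 37.621]
JFK = [-73.778, 40.641]


def make_flight_record(flight_id, origin, destination):
    """Principal record: a GeoJSON Feature whose geometry is just the two endpoints."""
    return {
        "type": "Feature",
        "id": flight_id,
        # Hypothetical property listing the alternate geometries for this record.
        "properties": {"example:alt_geometries": []},
        "geometry": {"type": "MultiPoint", "coordinates": [origin, destination]},
    }


def make_alternate_geometry(record, label, coordinates):
    """Build a separate Feature that shares the record's id and note it on the record."""
    record["properties"]["example:alt_geometries"].append(label)
    return {
        "type": "Feature",
        "id": record["id"],
        "properties": {"example:alt_label": label},
        "geometry": {"type": "LineString", "coordinates": coordinates},
    }


flight = make_flight_record("example-flight-1", SFO, JFK)

# The planned route and the recorded path, with made-up intermediate points.
route = make_alternate_geometry(flight, "route", [SFO, [-98.0, 40.0], JFK])
path = make_alternate_geometry(flight, "path", [SFO, [-110.2, 39.4], [-85.6, 41.3], JFK])

print(json.dumps(flight["properties"]))
```

The point of keeping the alternates separate is that the principal record stays small and stable while the route and the path can be added, replaced or improved later without touching it.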

As of this writing the flight paths are coarse as well as clipped in and around airports and at the edges of the United States. That's not ideal and we're working on getting better data. Since there was no flight path data available a couple of weeks ago we think it's a good start and qualifies as "better than yesterday".

"Version 2"

As you can see from the diagram above this project is fully "boxes and arrows" compliant. The bits in yellow are actual code that we write ourselves and everything else is the abstraction layers and third-party services that make up "modern" software development in 2019.

The un-numbered boxes at the top are the data sources for the flight data. The un-numbered boxes at the bottom are this website. Boxes 1, 2 and 4 are what we started with back in January when we began publishing flight data.

The boxes numbered 3 and 5 are the newer tools we use to process a recent data source that we are still getting familiar with and where all this new data comes from. In time the newer data consumers (boxes 3 and 5) may become the primary source for all things flight-related but it's early days still.

We are trying to learn by doing and in order for that to work we need to "do" things in small steps, in a way that doesn't mean tearing everything down and starting from scratch every time we have a new idea.

We need to make "version 1" simple enough to launch and "version 2" (and 3 and 4 and so on) simple enough to make revisiting a project worthwhile and, importantly, simple enough that it doesn't break version 1. Sometimes that is easier said than done but that's the work, right?

Box 6 is the openly licensed flight data that we publish every day and boxes 7 and 8 are the tools that we use to wrangle that same data in to this website. Everything you see here has been built using the same raw materials that we’ve made available for you to do something with.

Earlier this year I attended the annual Museums and the Web conference and delivered a talk to accompany a paper about the work we're doing with the Mills Field website. I've excerpted one part of the talk that is relevant here:

Historically the model for most digital or web-based initiatives has been to first export data from an internal collections management system. Second, that data is massaged in to an intermediate form for use by the project at hand and then third, exported again in to a typically bespoke machine-readable format.

We have changed the order of things to publish the open data representation first and then, from there, to build our own websites and services on top of that.

Everything I've described so far has been built using the same raw materials that we've made available for you to do something with. This introduces a non-zero cost in the build process for the public-facing museum efforts but we believe it's worth the cost.

But why, right?

First of all we want other people to build new interfaces and new services, new "experiences" even, on top of our collection so this is a way to keep ourselves honest. If we can't build something with this stuff why should we imagine you will?

Second, we want to ensure that the data we release, and the manner in which it is published, are actually robust and flexible enough to engender a variety of interfaces and uses because we need that variety. It is important to the museum because I don't believe there is, or should be, only one master narrative in to the collection.

We have published five months (and counting) worth of flight data so far and while that data may not seem terribly exciting on any given day its value grows in the aggregate.

There are stories attached to every one of those flights and given that part of the museum's mandate is the history of the airport we have an interest in crafting a space, a dance floor even, for those stories to strut their stuff. There are also places attached to each flight and given that everything we collect and display is also from somewhere we are excited to use any given flight, to and from SFO, as an avenue in to our collection.

Meanwhile, speaking of tail numbers...