this is aaronland

ur:mum=machinetag

I went to space mountain and all I got was [redacted]

I don't usually do year in review style blog posts but a few months ago, on what I guess was a grimmer day than most, I made a list of the things I had managed to get done in 2010 in an effort to try and make myself feel a little better. I did. It's a list I can be happy with. It is also a list whose every member was like pulling teeth. And then stabbing myself in the face with their bloody nubs.

It was that kind of year. The photo above was taken at the end of 2009 and that print was meant to be a cheeky bow around a year with its own special set of ups and downs. Ultimately, it proved to be more of an omen for everything that was lurking just over the horizon. Life is funny-ha-ha that way, right? So, this blog post is mostly for me: To remember that it's still possible to make all the shit life throws at you into a space mountain. Even when it's harder than you'd like it to be.

In between there was a lot of drinking in hotel rooms. And the kindness and patience of friends.

Todd and Naomi's wedding, Edgartown

tag as navelgazing

Screen shot 2010-12-07 at 7.10.09 AM

You may be asking yourself: What's up with those tags, anyway? I'm pretty sure that I managed to accidentally blow away the tag database for all my blog posts around year three and eventually just decided to fix the problem by bulk auto-assigning new ones. It is, after all, my party...

On Sunday I wrote a quick and dirty Perl script to plow through the last five and a half years of blog posts (the long-form posts that I sometimes read to Myles as bedtime stories) and bookmarked them all on pinboard. I also bookmarked all the outbound links, making sure to tag them with pointers back to the original post. I used to do this inline in older versions of the blog. Eventually I disabled the feature for reasons I no longer remember so it's kind of nice to have it back, after a fashion. For example, here are all the links from the last post about enplacification:

Here are all the blog posts about the Papernet:

Here are all the blog posts I've written pointing to other blog posts I've written:

If you look carefully at the source of these blog posts you'll see there have always been semantic-tag-classes assigned to the titles, so now there's a way to actually search them. There is also some broken Javascript floating around that's supposed to display them inline on the page but broken Javascript is broken...

There's nothing here, so far, that you can't already do with delicious. It's just tag intersections and that's been around for a long time. What made it more interesting to do in pinboard (I haven't jumped ship, not yet anyway, if you're wondering) was Maciej's blog post about archiving bookmarks:

However, in 2010 I don't believe it makes any sense to try to archive bookmarks if you're not willing to resolve dependencies. Modern websites are a rich gumbo of javascript, CSS, Flash, images and embedded video, and from a user's perspective an archived copy should behave like the original, no matter what it takes to make that happen.

Though we're still far from reaching this goal ourselves, I think it's important that our users be aware of what's at stake. Whether it's archived bookmarks or your own personal data, there's nothing less fun than believing you had a working backup only to find out after the fact that you were wrong.

This weblog has been around for something like eleven years now and it always sucks to see things I have pointed to over the years vanish off the face of the Interwebs. Saving the larger question of whether or not it's okay (and healthy) to let things go, now and then, I figured I would see what it would be like if I asked pinboard to make a copy of everything I've ever linked to. If nothing else, the archiving stuff is worth it just to see where the linkrot is:

So far as I can tell, poking around the site, there's no way to get the archived versions of your bookmarks out of pinboard or to list them all in the API. I thought maybe there would be a way to use the download for offline reading feature to create a local archive every time I published a new blog post but that functionality seems limited to only recent bookmarks (which means that it's probably possible to shoehorn in what I'd like to do but life is short...) On the other hand if you look at the source of the tag pages (when you're logged in to pinboard) there's this, so I'm not really worried about it yet:

<a href="/cached/ad614164b55e/" class="cached">☑</a>

It would be nice if http://pinboard.in/u:straup/t:aa:ima=link+code:404 also worked but if I learned only one thing in five years at Flickr it was to never ever again say: Why can't you just... Calling out linkrot at all is more than delicious does and I'm willing to spend the twenty-five bucks for the year, just to see what happens, even if the experiment doesn't work in the end.

Finally, there's nothing special about the tags I'm using. They happen to look and feel like machine tags but since pinboard doesn't index them as such they're really just fancy strings that I use to help me organize stuff. This is what I've come up with so far:

Sites like delicious and pinboard have both developed their own internal lexicon of magic words to assign meaning and sometimes action to a tag: Things like code:404 or via:blech or system:medium:audio. Maybe some day there will be an equivalent set of tags (we'll save the discussion of whether or not they're really just machine tags with a different name for another post) for blog-like webpages but for now this is what I'm doing.

See also:

Enplacification

Screen shot 2010-12-04 at 11.41.42 PM

On Thursday night, it was very quietly made known that Flickr had enabled the machine tags extras love for photos with Foodspotting machine tags and that the Foodspotting and Instagram iPhone applications now automagically add foodspotting:place= and foursquare:venue= tags, respectively, to the photos they upload to Flickr.

This is happy making. One of the hopes behind all the machine tags and machine tags extras work is that it would be incentive for other services to build bespoke Flickr uploader applications. Perhaps we should have been more vocal about that but some days it's hard to know whether you beat people over the head about the value of an idea or you let them discover it on their own. I am often wrong about this but I prefer the latter and a little bit of patience. In a world where for a long time there was only the Noticin.gs uploadr now there are three!

I have been working on a little side-project that involves being able to define a location simply by copy-pasting a URL from a third-party provider in to a text field. This isn't about hoovering someone else's database or all their users mostly because I am building this site for me and I don't want to be beholden to any one particular service. It's also about having the site take care of extracting just enough data to store locally for simple search and retrieval and display.

I think the cool kids all call this linked data these days. Isn't that right, Myles?

As it happens, lots of the places that I would like to be able to quickly note down for future use are in Flickr photos, particularly photos of restaurants and food. For a bunch of dull reasons, not all of these photos have been geotagged but many have been machine tagged. Which means that if a photo has a machine tag for a service known to have location information it ought to be possible to reach out across the Internetworks and use the value of that machine tag as a key into their system to retrieve geo data.

Which means you ought to be able to use the photo URL as the key itself. And lo, you can!

Remember: As I wrote this, none of the photos in these examples were geotagged. What follows are examples and recipes for what to do after you've called flickr.photos.getInfo only to discover that a photo has no location information but does have machine tags.
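
Before any of the lookups below can happen, a raw tag has to be split into its namespace, predicate and value. Here is a minimal Python sketch of that step, not the code running on the site. The names `MachineTag`, `parse_machinetag` and `first_place_tag`, and the default list of known services, are all made up for illustration.

```python
from typing import NamedTuple, Optional


class MachineTag(NamedTuple):
    namespace: str
    predicate: str
    value: str


def parse_machinetag(raw: str) -> Optional[MachineTag]:
    """Split 'namespace:predicate=value' into its parts, or return None."""
    # Split on the first '=' only, since a value may itself contain '='.
    nspred, sep, value = raw.partition("=")
    if not sep or not value:
        return None
    # Split on the first ':' to separate the namespace from the predicate.
    namespace, sep, predicate = nspred.partition(":")
    if not sep or not namespace or not predicate:
        return None
    return MachineTag(namespace, predicate, value)


def first_place_tag(raw_tags, known=(("foursquare", "venue"), ("yelp", "biz"))):
    """Return the first tag whose (namespace, predicate) pair is in `known`.

    The default `known` list is an assumption; add whatever services you
    know how to look up.
    """
    for raw in raw_tags:
        tag = parse_machinetag(raw)
        if tag and (tag.namespace, tag.predicate) in known:
            return tag
    return None
```

Plain tags (no `=` or no `:`) come back as `None`, so the same loop can walk a photo's whole tag list without checking first which ones are machine tags.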

foursquare:venue=

# this photo is tagged foursquare:venue=1088273

dumper(places_flickr_enplacify("http://www.flickr.com/photos/55224303@N05/5233199247/"));

array (
  'latitude' => 35.6974783,
  'longitude' => 139.8121268,
  'name' => 'アルカキット錦糸町 (ARCAKIT KINSHICHO)',
  'address' => '錦糸2-2-1',
  'derived_from' => 'flickr',
  'derived_from_id' => '5233199247',
  'url' => 'http://foursquare.com/venue/1088273',
  'phone' => '0338295656',
)

Foursquare is the easiest of the lot because it's got a proper API with methods for looking up venues, so if you've got a photo machine tagged foursquare:venue=1088273 you're basically done. It's worth noting, though, that the individual web pages for venues now appear to be chock full of Facebook OpenGraph meta tags. I haven't bothered to see if it's a one-for-one parity with the API but I mention it because it's relevant to the next example.

yelp:biz=

# this photo is tagged yelp:biz=the-red-door-ale-house-seattle

dumper(places_flickr_enplacify("http://www.flickr.com/photos/jwalsh_/3610758308/"));

array (
  'latitude' => '47.65014',
  'longitude' => '-122.351693',
  'name' => 'Red Door',
  'phone' => '(206) 547-7521',
  'url' => 'http://www.yelp.com/biz/q20FkqFbmdOhfSEhaT5IHg',
  'address' => '3401 Evanston Ave N',
  'derived_from' => 'flickr',
  'derived_from_id' => '3610758308',
)

Meta tags, right? I mean who knew...

Yelp is a bit more complicated because there's no way to look up individual listings in the Yelp API so we need to scrape the listing page from the website proper. The good news is those HTML pages are packed with your website is your API goodness in the form of embedded Facebook OpenGraph meta tags and vCard information stored in semantic CSS classes. It's definitely not pretty and surely prone to breaking but at least now you can get close-enough-to-play-it-on-TV complete location data for a place.

I've not posted actual code for doing any of this generically because I don't have the time to maintain it and it's one of those things that's probably easier and faster for any given Internet Typist to write using their preferred bag of hammers and idioms. Here's some pseudo-code (and some example code for parsing out OpenGraph and vCard data) to demonstrate the basic idea:

flickr_data = { ... }

for tag in tags:

  if not tag.machinetag:
    continue

  # split on the first '=' and the first ':' only
  nspred, value = tag.raw.split('=', 1)
  ns, pred = nspred.split(':', 1)

  if ns == 'yelp' and pred == 'biz':

    # 1) construct a proper URL using 'value' as the business ID
    # 2) fetch that URL
    # 3) parse the body of the URL looking for OpenGraph tags
    # 4) parse the body of the URL looking for vCard classes
    # 5) Return joy

    yelp_data = places_yelp_enplacify('http://www.yelp.com/biz/' + value)

    flickr_data = array_merge(yelp_data, flickr_data)
    break
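
Steps 3 and 4 in the pseudo-code above can be sketched with nothing but Python's standard library. This is an illustration under assumptions, not the example code mentioned earlier: `PlaceParser`, `parse_place` and the list of vCard class names are invented here, and a real listing page may use different classes or sloppier markup (unclosed tags will confuse the depth counting).

```python
from html.parser import HTMLParser

# Class names to collect; this list is an assumption, adjust per site.
VCARD_CLASSES = {"fn", "street-address", "locality", "region",
                 "postal-code", "tel"}
# Elements that never have a closing tag and so must not affect nesting.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class PlaceParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.opengraph = {}
        self.vcard = {}
        self._open = []  # stack of [class_name, depth, text_chunks]

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta":
            prop = attrs.get("property") or ""
            if prop.startswith("og:") and attrs.get("content") is not None:
                self.opengraph[prop[3:]] = attrs["content"]
        if tag in VOID_TAGS:
            return
        # Every element we are already capturing is now one level deeper.
        for entry in self._open:
            entry[1] += 1
        for name in (attrs.get("class") or "").split():
            if name in VCARD_CLASSES and name not in self.vcard:
                self._open.append([name, 1, []])

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        for entry in self._open:
            entry[1] -= 1
        # Innermost captures sit at the end of the stack and close first.
        while self._open and self._open[-1][1] <= 0:
            name, _, chunks = self._open.pop()
            text = " ".join("".join(chunks).split())
            self.vcard.setdefault(name, text)

    def handle_data(self, data):
        for entry in self._open:
            entry[2].append(data)


def parse_place(html):
    """Return the og: meta tags and vCard-classed text found in `html`."""
    parser = PlaceParser()
    parser.feed(html)
    parser.close()
    return {"opengraph": parser.opengraph, "vcard": parser.vcard}
```

Calling `parse_place(body)` on a fetched page gives back two dictionaries, which can then be merged into whatever shape `places_yelp_enplacify` is supposed to return.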

Also, don't forget to account for the fact that Yelp redirects all its pretty URLs to ones with obfuscated business IDs (for example, http://www.yelp.com/biz/humphry-slocombe-ice-cream-san-francisco becomes http://www.yelp.com/biz/47OC_X6KkiDDQ4jwoCUjFg) so unless you're working with an HTTP library that handles 301 responses, you'll need to deal with that yourself.
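
If your HTTP library does not follow redirects for you, the loop is short enough to write by hand. The sketch below assumes a hypothetical `fetch` callable that returns a `(status, headers, body)` tuple with lower-cased header names. That signature is invented so the logic stays independent of any particular HTTP client.

```python
from urllib.parse import urljoin

REDIRECT_CODES = {301, 302, 303, 307, 308}


def resolve_redirects(url, fetch, max_hops=5):
    """Follow redirects by hand and return (final_url, body).

    `fetch` is a hypothetical callable: fetch(url) -> (status, headers, body),
    where `headers` is a dict with lower-cased keys.
    """
    seen = set()
    for _ in range(max_hops + 1):
        if url in seen:
            raise RuntimeError("redirect loop at %s" % url)
        seen.add(url)
        status, headers, body = fetch(url)
        if status not in REDIRECT_CODES:
            return url, body
        location = headers.get("location")
        if not location:
            raise RuntimeError("redirect without a Location header at %s" % url)
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, location)
    raise RuntimeError("gave up after %d redirects" % max_hops)
```

Python's own `urllib.request.urlopen` already follows redirects by default, so there you only need to read the final address from the response's `geturl()`.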

If you're looking at the pseudo-code and asking yourself whether enplacify can enplacify itself that's good because the answer is yes. Ultimately you are bounded only by how long you want to wait for an answer to come back or if you're running in a live production environment how long you can leave your HTTP connections open before the sky starts to fall.

[ insert Semantic Web one true graph joke here ]

foodspotting:place=

# this photo is tagged foodspotting:place=2617

dumper(places_flickr_enplacify("http://www.flickr.com/photos/cynk/5084197983/"));

array (
  'name' => 'Bund Shanghai',
  'derived_from' => 'flickr',
  'derived_from_id' => '5084197983',
  'latitude' => 37.796114,
  'longitude' => -122.405999,
  'address' => '640 Jackson',
)

If Foodspotting has an API I've not been able to find it and the only metadata in their places webpages is some spotty vCard data marked up in semantic CSS. That being said if a listing contains enough address information (I've only included the street address above, but often locality and region will be present) you can feed that data into another geocoding service and pull out the exact (-ish) location for the thing you're looking at. Again, pseudo-code:

rsp = http_get(foodspotting_url)
fs_data = vcard_parse_html(rsp['body'])

if fs_data['street-address'] and fs_data['locality'] and fs_data['region']:

  query = "%s, %s %s" % (fs_data['street-address'], fs_data['locality'], fs_data['region'])
  geo = yahoo_google_whatever_geocode_string(query)

  fs_data = array_merge(geo, fs_data)
  return fs_data

# and then later on:

flickr_data = array_merge(fs_data, flickr_data)

Good!
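
The order of arguments to array_merge in the pseudo-code matters: for string keys the later array wins, so scraped values override geocoded ones and Flickr's own values override both. A rough Python equivalent looks like this. The `geocode` callable is a stand-in for whichever geocoding service you prefer, and `geocode_query` and `enplacify_vcard` are names invented for this sketch.

```python
def array_merge(*dicts):
    """Merge dicts left to right; on duplicate keys the later dict wins."""
    merged = {}
    for d in dicts:
        merged.update(d)
    return merged


def geocode_query(vcard):
    """Build 'street, locality region', or return None if a part is missing."""
    parts = [vcard.get(k) for k in ("street-address", "locality", "region")]
    if not all(parts):
        return None
    return "%s, %s %s" % tuple(parts)


def enplacify_vcard(vcard, geocode):
    """Add coordinates to scraped vCard data when the address is complete.

    `geocode` is a hypothetical callable taking a query string and returning
    a dict (for example with 'latitude' and 'longitude' keys).
    """
    query = geocode_query(vcard)
    if query is None:
        # Not enough address information to geocode; return what we have.
        return dict(vcard)
    # The vCard data goes last so scraped values override geocoder guesses.
    return array_merge(geocode(query), vcard)
```

When the address is incomplete the function returns the scraped data untouched rather than guessing, which matches the `if` guard in the pseudo-code.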

Of course there's nothing here that's technically Earth shattering other than the fact that it's finally possible and relatively easy to do this stuff, which is the exciting part. Other notable services to connect are Facebook Places (someone who's actually got a Facebook account and can log in to see their bloody developer docs, please let me know) and OpenStreetMap or maybe even Dopplr which uh... yeah, anyway... More importantly, it's a pattern that's not exclusive to locations.

We never did get nyt:person= machine tags working properly, at Flickr, because the Times kept changing the way their identifiers move and we never re/built the machine tags infrastructure to account for photos with more than one identical namespace-predicate pair but I loved that it (almost) made possible a completely sideways avenue for people-tagging your photos. What could you do with article specific machine tags for the New York Times? Or The Guardian? You get the idea.

Matt Biddulph bookmarked an article on del.icio.us the other day that nicely expresses the sentiment:

Kinect is making nothing which wasn't already technically possible, possible. It is just making it accessible, not just in terms of price, but also in terms of simplicity and ease. The question should not be "what can you do with kinect that you couldn't do before", but it should be "how much simpler is it (technically) to do something with kinect, which was a lot harder with consumer devices before kinect.

http://www.memo.tv/kinect_why_it_matters

And if you're wondering: Yes, going through all the not-geotagged-but-machine-tagged photos and calling out to those third party services for location information was one of the things that I wanted to do while I was working on all this stuff at Flickr. Sorry, I still feel bad about that... The value of things like machine tags is not simply to be a means of broadcasting information outwards but also to be a kind of homing beacon that can be used to attract, to ingest and finally to dress itself anew in the uses the street finds for it.

Small bridges, indeed!

Screen shot 2010-12-05 at 10.07.42 AM