this is aaronland

The Hammock of Interpretation

In a spinny bar

spinny bar is spinning
No one asked me to do a recap presentation for Museums and the Web but since I enjoyed the one I did for ETech so much I just decided to abuse the format and do a “dog-eared” conference post from my notes and conversations.
compromised by excessive orientation
From Max Anderson’s opening keynote, which was excellent and inspiring and seemed to act as the touchstone for the rest of the conference.
foster projection
never willingly outsource creativity
From Max Anderson’s opening keynote. This rang especially true with me since it was part of the point I was trying to get across when I spoke at Museums and the Web last year and the IMA are proof of what a small team that gives a shit can accomplish with a little bit of support.
#passiveinvitations
This was from the social media session and was a comment by Joe Hoover about placeography and creating projects as passive invitations. I also like it as a way to describe the way Twitter was used as a very public and good-natured hecklebot throughout the conference.
object db wiki
The social media session was heavily focused on the use of wikis in a museum context and two things stuck out: 1) How many people are starting to use MediaWiki and how many of them can already imagine the possibility of using it "in-house"; that is, effectively replacing their existing and highly-specialized CMSes and having curators and the "community" work on the same document. 2) The progress that the Semantic MediaWiki people have made. I say that not because I want to spray triples all over everything but because they are slowly building out the tools to hide the vagaries of entering structured data from people.
oral history, written history, cloud history
Darren Peacock asked about this in the “psychogeography and storytelling” unconference session and it's just a nice thought exercise. I don’t expect there are any answers (yet) but it was an idea that kept cropping up in conversations.
the weird dancing lady and the question of art
One of the things I started getting on about the second night, at the bar covered in moose heads in the building said to be designed by Kurt Vonnegut’s grandfather, was printed FAQs under individual works of art. Why not actually answer the question: Why is this considered art? Why not try to answer the question outside of the formal language of art discourse? Why not see what questions people are really asking and then, at least for some of the questions, let visitors answer some of them themselves?

This seemed all the more interesting to me because there is an LED/video installation near the bar showing a stylized woman swaying her hips back and forth, in an endless loop. It’s a bit creepy really and from a distance looks like a weird and slightly disjoint attempt at public art. It prompted a lot of jokes until someone pointed out that it really was capital-A art and then everyone shut up. Because, you know, it was...art.
deeds of gift
Josh Greenberg, from the NYPL, and I talked about this in the context of the Commons, the desire (and pitfalls) of people wanting to put their work in the public domain and generally the uncharted territory called “backing up the web”.
attaching a scene
This was a phrase Richard Urban mentioned in the context of some semantic markup language for collections whose name I’ve forgotten. I’m less concerned with the mechanics than with the idea of attaching a whole scene, something more than a series of staccato tags or keywords, to a piece. Like a short story or a winter coat, which would have made artists like Francis Bacon cringe but it tickled my “magic words” bell which I always enjoy.

It is also an interesting avenue when applied to maps and Josh (Greenberg) had just finished showing me the work that Schuyler (Erle) has been doing building tools for the NYPL historical maps collection.
they can't make you win but they can intervene to keep you playing
Nina Simon talking about casino theory but applied in a non-creepy way to museums.
return books | return awesome books
Nina Simon had two “library” slides in her talk. One was a fancy-pants RFID-enabled library in the Netherlands where both the books and the drop-off shelves were programmed to automatically tag a book (“good”, “bad”, “sad”, “mad”, etc...) when placed on a particular shelf and the other was a mocked-up photo of an old skool book drop on the side of a library where the labels had been changed to read “return books” and “return awesome books”. I liked Nina’s better.
last.fm for museums
Richard (Urban) suggested this during Nina Simon’s talk and all I can say is: Um... fuck yeah!
follow bacon
This was the other seed that I left in anyone’s ear who cared to listen. I want to be able to “friend”, “follow”, whatever individual works of art in a collection. I want to know when a painting goes on display or back to the storage facility. Never mind big traveling exhibitions, institutions lend out individual works all the time and I want to know when something that I’m interested in is going to be on display nearby. It’s the same principle as Dopplr really: If a work I follow travels to Los Angeles then I might make the effort to visit it too.
museums are the new amusement parks
One of Neb’s comments in his talk at ETech, this year, was that an amusement park afforded you the freedom to side-step some of the thornier issues surrounding ubiquitous (and physical) computing and the sensor world because you were working with “willing participants”. I'm just saying...
an historical sensation
write a confession in front of a mirror
parks canada has a new media department
Who knew? I mean... seriously, who knew? Now they just need a, whadaya call it, a website.
torque vs. power
It turns out the spinny bar, at the top of the conference hotel, runs off of nothing more than a single 5/8 horsepower engine and a pair of gear reducers that push the whole thing around on a rail. There’s a lesson in that.

The other lesson is that so-called private tours, of anything really, are the most interesting and help me belabour the point that curators and Art Professionals are about a million times more interesting when they let their guard down (read: are drunk) and speak simply, rather than in the language of “professional discourse”, about the things that they are passionate about.
start small but start
It’s a good thing Paula’s talk was so interesting because otherwise I’d give her shit for dropping the “switch” and “wookie” slide from the materials she posted online.
critical friends
This was from Brian Kelly’s “stop and make sure we haven’t painted ourselves in to a Web 2.0 corner” talk. I don’t actually remember how he segued from that in to the idea of critical friends but it’s a lovely phrase.
a graph that goes like this ... is just a graph of the internet
do one thing
what would brooklyn do?
No one from the Brooklyn Museum could make it to the conference this year which is doubly sad since they seemed to be present in most people’s conversations and cleaned up at the awards ceremony so I’ll just point to this interview that Mike Ellis did with them:
PLAY ART LOUD

And then I said...

See also: the short version.

py-wsclustr.php

Where place matters more and space matters less...

Benjamin Bratton

I will be at Museums and the Web, this week, to talk about the work we've been doing at Flickr around geotagging photos, reverse-geocoding and shapefiles and more broadly notions of bias in and the interpretation of place. Plus, I get to speak alongside the Philly History crew which is extra-exciting!

The other thing that I'm excited about is being able to talk about how Clustr, the open source tool we use to generate shapefiles, is now bundled as part of the Maps From Scratch Amazon EC2 AMI. There's a long and detailed blog post about all that on the code.flickr blog but the short version is:

We expressly chose to make Clustr an open-source project to share some of the tools we’ve developed with the community but it has also always had a relatively high barrier to entry. Building and configuring a Unix machine is often more than most people are interested in, let alone compiling big and complicated maths libraries from scratch. Clustr on EC2 is not a magic pony factory but hopefully it will make the application a little friendlier.

In that post I talked about wanting to be able to use Clustr by calling a simple web service so eventually I wrote the quickest and dirtiest implementation I could think of: a PHP script that simply shells out to the Clustr application and then returns the output (compressed). I encourage anyone who wants to get hung up on the lack of elegance in that approach to port CGAL to PHP. Your efforts will be amply rewarded, I'm sure, but in the meantime this already works:

$> curl -H 'x-clustr-alpha:0.00001' -v --data-binary '@/path/to/points.txt' \
	http://ec2-xxxxxxxx.compute-1.amazonaws.com/ws-clustr/ > ~/path/to/shapefile.tar.gz
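
For anyone who would rather not shell out to curl, the same request can be sketched in Python. This is only a sketch, assuming Python 3 and nothing but the standard library: the function names build_clustr_request and fetch_shapefile are made up for illustration, and the only things taken from the command above are the x-clustr-alpha header and the fact that the points file travels as the raw request body.

```python
import urllib.request


def build_clustr_request(endpoint, points, alpha):
    """Build (but do not send) a POST request like the curl call above.

    endpoint -- URL of a ws-clustr install
    points   -- the raw bytes of the points file
    alpha    -- the alpha value as a string, e.g. '0.00001'
    """
    headers = {'x-clustr-alpha': alpha}

    # Passing data= makes urllib issue a POST with that body, which is
    # roughly what curl's --data-binary does.
    return urllib.request.Request(endpoint, data=points, headers=headers)


def fetch_shapefile(req, dest):
    """Send the request and write the (compressed) response to dest."""
    with urllib.request.urlopen(req) as rsp, open(dest, 'wb') as fh:
        fh.write(rsp.read())
    return dest
```

The alpha value is passed as a string on purpose: very small floats get rendered in scientific notation by Python and it is not obvious that the server would be happy with that.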

ws-clustr.php is available for anyone to download on GitHub, along with a handy README file for getting it to work with the Maps From Scratch AMI. Which is all good but you still need something to make shapes of. How about all the geotagged photos uploaded to Flickr on March 24, 2009:

$> python flickr-tools/geotagged.for_day.py -c /path/to/flickr.cfg -d '2009-03-24' --clustr

That yields a file with 54,673 points that I can ask ws-clustr to plot. By passing those points to ws-clustr with a variety of alpha sizes (11 times to be exact) I was able to generate the following image in QGIS:

testing py-wsclustr, alpha = (0.001 - 100)

The geotagged.for_day.py script is one of several Flickr-related helper tools available for download on GitHub as part of the flickr-tools package.

So now what? Or rather: What if my mapfromscratch/ws-clustr AMI isn't already up and running and I want to generate hawt shapefile action? EC2 servers are great for doing short, fast tasks but if left running for days or weeks on end they start to incur noticeable fees. Fortunately, starting and stopping EC2 instances can be done programmatically so I wrote a client-side interface, in Python, to (ws) Clustr that starts a new EC2 instance, exchanges a points file for a (compressed) shapefile and then shuts the server down again. The code also checks to see if there is already a running instance of the AMI you want to use and simply uses that one if available.

Like this:

import time

from wsclustr import wsclustr

wsc = wsclustr('amz_access_key', 'amz_secret_key')
wsc.startup('ami-xxxxx')
    
while not wsc.ready() :
    time.sleep(5)

shpfile = wsc.clustr('2009-03-24-geotagged.txt')

wsc.shutdown()
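
The polling loop above will happily spin forever if the instance never comes up, which is its own kind of billing problem. Here is a minimal sketch of the same loop with a timeout. The helper name wait_until_ready and its arguments are hypothetical; the only thing it assumes about the client is the ready() method used above.

```python
import time


def wait_until_ready(wsc, timeout=300, interval=5,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll wsc.ready() until it returns true or timeout seconds pass.

    Returns True if the server came up and False if we gave up.
    sleep and clock are injectable so the loop can be tested
    without actually waiting.
    """
    deadline = clock() + timeout

    while not wsc.ready():
        if clock() >= deadline:
            return False
        sleep(interval)

    return True
```

If it returns False the sensible thing is probably to call wsc.shutdown() anyway, so that a half-started instance isn't left running.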

Which was great, except for the part where I sent the same 1.3MB file across the wire 11 times in order to create all the shapefiles for the image above. EC2 is pretty cheap as far as these things go but sooner or later all that data and traffic is going to add up and Amazon won't hesitate to send you a bill for it. So, now both ws-clustr and py-wsclustr support an equally bare-bones caching layer for the data the client sends to the server. As far as the Python side of things goes, it looks and acts like this:

shpfile1 = wsc.clustr('2009-03-24-geotagged.txt', alpha=0.001, try_cache=1)
shpfile2 = wsc.clustr('2009-03-24-geotagged.txt', alpha=0.01, try_cache=1)
shpfile3 = wsc.clustr('2009-03-24-geotagged.txt', alpha=0.1, try_cache=1)

If the cached version exists on the server then the shapefile will be generated using that without the client having to send all that data again. If the cached version does not exist then the server will return an HTTP 404 error and the client will re-try the request with the data. Caches are stored and referenced with identifiers generated from the contents of the data file. Specifically: clustr- + the value of md5sum(2009-03-24-geotagged.txt). If you look behind the curtain, what's actually being sent to the server is something like this:

$> curl -H 'x-clustr-alpha:0.01' -H 'x-clustr-cache: clustr-c77cae39a4f7e506a9cc8205176f1239' \
	http://ec2-xxxxxxxx.compute-1.amazonaws.com/ws-clustr/ > ~/path/to/shapefile.tar.gz
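
The cache dance described above can be sketched in a few lines of Python. To be clear, this is not the py-wsclustr source, just an illustration under stated assumptions: cache_key and clustr_with_cache are made-up names, send is a hypothetical callable standing in for whatever HTTP client is actually used, and whether the real client repeats the x-clustr-cache header on the retry is a guess on my part.

```python
import hashlib


def cache_key(path, chunk_size=65536):
    """Return 'clustr-' plus the MD5 hex digest of the file's contents."""
    md5 = hashlib.md5()

    with open(path, 'rb') as fh:
        # Read in chunks so a large points file isn't held in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b''):
            md5.update(chunk)

    return 'clustr-' + md5.hexdigest()


def clustr_with_cache(send, path, alpha):
    """Ask for the cached copy first; upload the data only on a 404.

    send(headers, body) -> (status, payload) is a hypothetical stand-in
    for the HTTP layer. alpha is passed as a string, e.g. '0.01'.
    """
    headers = {'x-clustr-alpha': alpha, 'x-clustr-cache': cache_key(path)}

    # First attempt: headers only, no body.
    status, payload = send(headers, None)

    if status == 404:
        # The server has no cached copy, so send the data after all.
        with open(path, 'rb') as fh:
            status, payload = send(headers, fh.read())

    return status, payload
```

Because the key is derived from the contents rather than the filename, renaming the points file doesn't invalidate the cache but changing a single point does.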

The Housekeeping Department would like me to remind you that it is left as an exercise to people running their own ws-clustr servers to take care of cleaning up their system's temporary directories, where the cache files are stored. ws-clustr was built to run on an EC2 instance where it is expected that the server, along with all its data, will be torn down long before disk space becomes an issue but since it's just a PHP script there's nothing to prevent it from being used outside of Amazon's cloud castle. Just something to keep in mind.


Likewise with caching the output, or supporting something like If-Modified tags, which isn't done yet for two reasons. The first is that Clustr is just Really Fast so I'd rather spend my time solving other problems than caching for caching's sake. The second is that there's no (automatic) expectation that the EC2 server running ws-clustr will ever be running long enough to warrant caching shapefiles by their alpha number and the contents of their data. Again, if people start to use the server outside of EC2 then it might be warranted but until then there are problems better solved sooner.

Now that you've sucked down shapefiles in Python it would be useful to do something with them. I like using Zachary Forest Johnson's shpUtils.py library to do the actual parsing (though the ESRI shapefile spec is actually pretty simple if you need to write a specialized one-off). Here is some sample code to parse a shapefile returned by ws-clustr and munge it into a list of Shapely polygon objects. Shapely is useful for doing all sorts of hairy geometry and head-scratchy math but the shorter way to think about it is that it's basically Just Awesome.

The complete code listing is included in the examples directory of the py-wsclustr project on GitHub.

import tarfile

t = tarfile.open(shpfile)
t.extractall()

# Because the tarfile.getnames method always seems to
# return the list of files in random order...

shp = shpfile.replace(".tar.gz", "")
shp = "%s/%s.shp" % (shp, shp)

import shpUtils
from shapely.geometry import Polygon

polys = []

for record in shpUtils.loadShapefile(shp) :
    for part in record['shp_data']['parts'] :

        poly = []

        for pt in part['points'] :
            if 'x' in pt and 'y' in pt :
                poly.append((pt['x'], pt['y']))

        poly = tuple(poly)
        p = Polygon(poly)
        polys.append(p)

                    
cl-interboxes

Or, if you're like me you'll want to display all those shapes using ModestMaps. Here is the code used to generate the image below, modulo the part where the modestMMarkers package is not public yet. This is code still under active development to display the turkishMMap (remember that?) cluster-y bits but that's not really the point. The point is that there are now a few more nubby bits in the toolbox with which to build things. I happen to have a bit of a map fetish.

import ModestMaps
import PIL.Image
import PIL.ImageChops
import PIL.ImageEnhance

alphas = (100, 25, 10, 5, 1, .1, .01, .05, .001, .0005)

swlat = None
swlon = None
nelat = None
nelon = None

shapes = []
    
for a in alphas :

    shpfile = wsc.clustr('2009-03-24-geotagged.txt', alpha=a, try_cache=True)
    
    t = tarfile.open(shpfile)    
    t.extractall()

    shp = shpfile.replace(".tar.gz", "")
    shp = "%s/%s.shp" % (shp, shp)
            
    records = shpUtils.loadShapefile(shp)
    polys = []
        
    for record in records :

        # this is a bit redundant since it only
        # needs to be calculated once but you get
        # the idea...

        data = record['shp_data']

        # compare against None explicitly so that a
        # legitimate coordinate of 0.0 isn't treated as unset

        if swlat is None :
            swlat = data['ymin']
        else :
            swlat = min(swlat, data['ymin'])

        if swlon is None :
            swlon = data['xmin']
        else :
            swlon = min(swlon, data['xmin'])

        if nelat is None :
            nelat = data['ymax']
        else :
            nelat = max(nelat, data['ymax'])

        if nelon is None :
            nelon = data['xmax']
        else :
            nelon = max(nelon, data['xmax'])

        for part in record['shp_data']['parts'] :

            poly = []
            
            for pt in part['points'] :
                if 'x' in pt and 'y' in pt :
                    poly.append({'longitude':pt['x'], 'latitude':pt['y']})

            polys.append(poly)
            
    shapes.append(polys)
            
w = 6000
h = 4000

pr = ModestMaps.builtinProviders['BLUE_MARBLE']()    
sw = ModestMaps.Geo.Location(swlat, swlon)
ne = ModestMaps.Geo.Location(nelat, nelon)
dims = ModestMaps.Core.Point(w, h)
    
mm_obj = ModestMaps.mapByExtent(pr, sw, ne, dims)
map_img = mm_obj.draw()
    
shp_img = PIL.Image.new('RGBA', (w, h), 'white')
    
# Hey look! This is modestMMarkers.py; it has not been released yet!!

poly = modestMMarkers.polylines.polyline(mm_obj)

for polys in shapes :
    shp_img = poly.draw_polylines(shp_img, polys, color=(0,0,0))

mask = shp_img.convert('L')

enh = PIL.ImageEnhance.Contrast(mask)
mask = enh.enhance(2.5)

mask = PIL.ImageChops.invert(mask)

cnv = PIL.Image.new('RGBA', (w, h), 'white')    
cnv.paste(map_img, (0, 0), mask)

py-wsclustr + modestmmarkers (2)

No, really.
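
As the comment in that listing admits, the bounding box bookkeeping is more long-winded than it needs to be. A minimal sketch of a tidier version follows, assuming only that each record carries xmin, ymin, xmax and ymax under 'shp_data' the way the records above do. The function name merge_bounds is made up for illustration.

```python
def merge_bounds(records):
    """Return (swlat, swlon, nelat, nelon) covering every record.

    Each record is expected to look like the dictionaries used in the
    listing above, with xmin/ymin/xmax/ymax keys under 'shp_data'.
    """
    boxes = [record['shp_data'] for record in records]

    if not boxes:
        raise ValueError('no records to measure')

    swlat = min(box['ymin'] for box in boxes)
    swlon = min(box['xmin'] for box in boxes)
    nelat = max(box['ymax'] for box in boxes)
    nelon = max(box['xmax'] for box in boxes)

    return swlat, swlon, nelat, nelon
```

Using min() and max() over the whole list also sidesteps the question of what to do when a perfectly valid coordinate happens to be zero.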

Like everything else, py-wsclustr is available for anyone to play with on the GitHub. At some point in the near future I will make sure that all these packages are also given a home on aaronland.info, filed under Just In Case.

As an aside, I finally made my peace with EC2 and Amazon on the grounds that, at the end of the day, it's just a plain old Unix box with tailored build instructions that can be backed up and re-created like any other server and if you're not already backing up your machines then you've got bigger problems than whether or not Jeff Bezos wants all your base. Compare this to Google's AppEngine which looks really interesting but for some reason requires that you give them your fucking phone number to sign up for a developer's account. It's like a whole new and perverted twist on the honeypot some days...

Meanwhile, come May I will be speaking about Clustr and shapefiles and communities of authority at Where 2.0, in San Jose. In the talk-is-cheap-always-try-to-have-working-code department I had sort of imagined not being able to get the HTTP client libraries for Clustr working so soon; now I'll just have to dream up something new to share with people! If you've been thinking about attending but needed a little more coaxing the nice folks at O'Reilly have given me a 25% discount code (for the registration fee) to pass along: WHR09FSP.

In July, I am looking forward to returning to Vancouver and speaking at GeoWeb 2009 about the idea of nearby, and history boxes and trying to encourage a more nuanced understanding of place that can be read and traveled like a contour map of meaning. Or something like that. There's a lot of twisty in that one so I am pleased to have the chance to try and give a little more form to the idea. Indeed, there are still long and twisty blog posts about nearby and history boxes and the importance of artifacts and the Papernet to be written, each of which will surely feed the talk.

But not tonight.