I just want to share my social network... with myself

Recently, someone (Kellan, I think) said to me:

You know I am right there with you with your "Your Address Book is the Center of your Social Network" Mantra... Surprised no one has created a plugin for Apple's. It would be *A Good Start TM*

And I said:

Back during the days of the Job Search Before Flickr I was slowly trying to teach myself Cocoa for just this reason; essentially the germ that has grown into "restobook" (and del.icio.us maps). The problem is that there is no way to modify the default "panel" for an addressbook entry. You can only add overlays, which made the whole thing kind of suck.

So, I am reduced to writing S60 apps.

Now that I have a working developer's certificate I am revisiting things like the original fl(ickr)addressbook and the nwtracker and poking at the landmarks (the maps app) database. The latter is actually sort of interesting since you can populate it on the fly. It is not so interesting because it doesn't have any sort of hooks into the address book beyond abusing existing fields.

In *theory* you could run a copy of raccoon (web server) on the device and assign the www link fields in both the contacts and landmarks database to point to each other and ... uh ... synergize ?

Which is kind of a round-about way of saying that I wrote a Perl module on top of Christophe Beauregard's Flickr::Upload.pm library to fetch your current location, according to the Dopplr API, and to automagically add tags, machinetags and geotags accordingly.

For example, if you uploaded a photo while you were travelling in Beijing the following extra bits of information would be added:

The tag Beijing would be added to your photo.
The machine tag geonames:locality=1816670 would be added to your photo because Dopplr uses Geonames for their geocoding and that's their unique identifier for Beijing.
If you allowed it to be, the machine tag dopplr:trip=9999999 (read : whatever your actual trip ID is) would be added to your photo. I wonder if you'll ever be able to share trips in Dopplr...
Your photo would be geotagged with a latitude of 39.9289 and a longitude of 116.388 and an accuracy of 11, or city level.
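
For the curious, the list above can be sketched in a few lines. This is Python rather than the Perl the module is actually written in, and it is only an illustration: the shape of the `city` dictionary and the `tags_for_location` function are made up for this example and are not the Dopplr API's response format or the module's interface.

```python
# A sketch only: the dictionary keys and function name are hypothetical,
# not the Dopplr API response or the Flickr::Upload::Dopplr interface.

CITY_ACCURACY = 11  # the "city level" accuracy mentioned above


def tags_for_location(city, trip_id=None):
    """Return (tags, geo) for a photo uploaded while in `city`."""
    tags = [
        city["name"],                                   # plain tag
        "geonames:locality=%d" % city["geoname_id"],    # machine tag
    ]
    # The trip machine tag is only added if the user allows it.
    if trip_id is not None:
        tags.append("dopplr:trip=%d" % trip_id)
    geo = {
        "latitude": city["latitude"],
        "longitude": city["longitude"],
        "accuracy": CITY_ACCURACY,
    }
    return tags, geo


beijing = {
    "name": "Beijing",
    "geoname_id": 1816670,
    "latitude": 39.9289,
    "longitude": 116.388,
}
tags, geo = tags_for_location(beijing, trip_id=9999999)
```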

In the same vein, YBR's ZoneTag has been using Upcoming, to infer location, since last year and the soon-to-be-public FireEagle is the same idea bundled up as a discrete service. So, if we can't ever figure out who is on first we can at least litter the Intercloud with little fortune cookies made of magic words.

Anyway, Flickr::Upload::Dopplr.pm

What the fuck are graduate students doing these days?

                
22:53:47 M: why so many?
22:53:53 M: all combinations?
22:54:00 A: yes
22:54:03 A: leftmost-iness
22:54:33 M: I don't see much of a way around that
22:54:38 A: because you will want to GROUP BY document_id
22:54:57 A: in order to do LIMITS without totally fucking up pagination
22:55:41 A: that was the first lesson of flickr for me : all bets are off the moment you decide that accurate pagination counts are important
22:56:01 M: bad decision ;)
22:57:15 A: but really, it's all INT-y indexes except for the string_value stuff so...all things considered, it's probably worth it
22:58:09 A: but it still begs the question : what the fuck are graduate students doing these days?
22:58:49 A: "I hold a doctorate from the University of DWIM"
22:59:41 M: they're all working on the semantic web
22:59:47 M: and polygon counts
22:59:52 M: and natural language web search
23:00:58 A: "the cloud will save us"
23:02:14 A: "how many polygons does a cloud have"
23:02:48 A: "would you like to buy a fog machine"

In October I am going to do a talk on machine tags, at the Semantic Web Strategies conference, in San Jose. Because talk is cheap, I always try to have something that looks like working code to make my case. I could, of course, simply point to Flickr.

But here's the thing : We have to architect things, including machine tags, differently than you. That is not a value judgement or posturing. It's just true. We have over one and a quarter billion photos.

And databases suck (which I'll get to in a minute).

There's a lot of space between nothing and a billion and that's always been the place where I've hoped that people would play with machine tags. Ideally something you could plug into WordPress or some other publishing system. Nothing fancy. Just a tool to index machine tags and provide a search interface so that you could link up disparate data sources.

In vanilla web-services-speak, you might say :

mt.namespaces.list(predicate='', value='')
mt.predicates.list(namespace='', value='')
mt.values.list(namespace='', predicate='')
mt.documents.search(namespace='', predicate='', value='')
mt.documents.search_by_range(range, namespace='', predicate='')
mt.document.add(uri, tags)
mt.document.remove(uri)
mt.document.tags(uri)

So I wrote one.

Specifically, I wrote a really simple stand-alone machine tag store that does not have ponies and will, hopefully, act as a spark for someone(s) to take it further.

It could be that I am the only person out there who really likes machine tags. I prefer, though, to think that part of the reason no one else has done this is that while machine tags are pretty simple, conceptually, by the time you start thinking about storing and querying them it begins to get ugly and complicated.

Typically, when you start to capital-T think about the problem you find yourself saying things like : "Well, I could install Lucene" or "What about a triple store?" While there's nothing wrong with these approaches they're a little like telling the person who wants a glass of milk that they need to buy a cow.

I opted for making Python 2.5 the only requirement, and to use the built-in sqlite3 database magic.
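
To give a sense of how little is needed to get started, here is a minimal sketch of a machine tag store using nothing but the standard library. It is not the mtdb schema or code: the `machinetags` table, its columns and the function names are all invented for illustration, and it is written for a current Python 3 rather than 2.5.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a toy machine tag store. Hypothetical schema."""
    db = sqlite3.connect(path)
    db.execute(
        """CREATE TABLE IF NOT EXISTS machinetags (
               document_uri TEXT NOT NULL,
               namespace    TEXT NOT NULL,
               predicate    TEXT NOT NULL,
               value        TEXT NOT NULL
           )"""
    )
    return db


def add_document(db, uri, tags):
    """Split each 'namespace:predicate=value' tag and store the pieces."""
    rows = []
    for tag in tags:
        ns_pred, value = tag.split("=", 1)
        namespace, predicate = ns_pred.split(":", 1)
        rows.append((uri, namespace, predicate, value))
    db.executemany("INSERT INTO machinetags VALUES (?, ?, ?, ?)", rows)
    db.commit()


def search_documents(db, namespace="", predicate="", value=""):
    """Find documents matching whichever parts were supplied."""
    clauses, params = [], []
    for column, wanted in (("namespace", namespace),
                           ("predicate", predicate),
                           ("value", value)):
        if wanted:
            clauses.append(column + " = ?")
            params.append(wanted)
    sql = "SELECT DISTINCT document_uri FROM machinetags"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    sql += " ORDER BY document_uri"
    return [row[0] for row in db.execute(sql, params)]


db = open_store()
add_document(db, "http://example.com/a", ["dc:subject=new york", "geo:lat=40.7"])
add_document(db, "http://example.com/b", ["dc:subject=montreal"])
found = search_documents(db, namespace="dc", value="new york")
```

Note that this sketch has no indexes at all, which is fine for a handful of documents and is exactly the part that stops being fine later.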

Here's the rub : There are another 22 indexes on the machinetagstore table, in addition to what is displayed in that diagram. By any measure that is too many, even if they are all mostly just collections of the integers whose order has been re-arranged.

You could use a proper full-text search-engine to do this and not worry (as much) about indexing but then, more than likely, you won't be able to do range queries on your machine tag values. If you don't think you'll ever want to find stuff where, for example, the temperature was between -5 and 28 degrees Celsius then you could build something using a custom Namazu filter in time for dinner.

That said, range queries are easily the biggest feature request people have for machine tags on Flickr.
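
A rough sketch of what a range query could look like in sqlite: store a numeric copy of the value next to the string and use BETWEEN on it. The table, the `numeric_value` column, the `weather:temperature` tags and the function names are assumptions made for this example, not the mtdb schema.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    """CREATE TABLE machinetags (
           document_uri  TEXT,
           namespace     TEXT,
           predicate     TEXT,
           string_value  TEXT,
           numeric_value REAL   -- NULL when the value is not a number
       )"""
)


def to_number(value):
    """Return the value as a float, or None if it does not parse."""
    try:
        return float(value)
    except ValueError:
        return None


def add_tag(uri, namespace, predicate, value):
    db.execute(
        "INSERT INTO machinetags VALUES (?, ?, ?, ?, ?)",
        (uri, namespace, predicate, value, to_number(value)),
    )


def search_by_range(low, high, namespace, predicate):
    """Documents whose numeric value falls between low and high, inclusive."""
    sql = """SELECT DISTINCT document_uri FROM machinetags
             WHERE namespace = ? AND predicate = ?
               AND numeric_value BETWEEN ? AND ?
             ORDER BY document_uri"""
    return [row[0] for row in db.execute(sql, (namespace, predicate, low, high))]


add_tag("http://example.com/cold", "weather", "temperature", "-12")
add_tag("http://example.com/mild", "weather", "temperature", "18.5")
add_tag("http://example.com/hot", "weather", "temperature", "35")
add_tag("http://example.com/odd", "weather", "temperature", "balmy")

in_range = search_by_range(-5, 28, "weather", "temperature")
```

Values that are not numbers get a NULL in the numeric column, so they never match a range and are simply left out.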

If you use a proper relational database then you get ranges but at the cost of leftmost-iness. Specifically, to quote the sqlite documentation : "It is not necessary for every column of an index to appear in a WHERE clause term in order for that index to be used. But there can not be gaps in the columns of the index that are used." This is a problem native to most (all?) SQL databases and the net result for machine tags is that unless you enforce a strict ordering of query parameters, thereby limiting the search-iness of your data, you end up with indexes to match all the various combinations :

All the distinct namespaces
All the distinct values, where the predicate is subject
All the distinct documents, where the namespace is dc and the value is new york
And so on...

Where distinct documents in a database context best means the ability to group results (read : documents) as part of the actual query so that you can reliably define an offset and limit to the number of results you return at once. Given that this isn't meant to scale to the moon, in the context of a home-user, or tinkerer, you could probably safely return all the results for a query and distinct-ify them in memory (read : programming language) but the issues are still the same.
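
Here is a small sketch of the pagination problem, using a made-up `machinetags` table in which every document has two matching tags. Without grouping, LIMIT and OFFSET count rows (tags) rather than documents; with GROUP BY they count documents. The table and function names are hypothetical.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE machinetags "
    "(document_id INTEGER, namespace TEXT, predicate TEXT, value TEXT)"
)

# Seven documents, each with two tags in the 'dc' namespace.
rows = []
for doc_id in range(1, 8):
    rows.append((doc_id, "dc", "subject", "new york"))
    rows.append((doc_id, "dc", "subject", "maps"))
db.executemany("INSERT INTO machinetags VALUES (?, ?, ?, ?)", rows)


def naive_page(namespace, page, per_page):
    """LIMIT applied to tag rows: the same document shows up twice."""
    sql = """SELECT document_id FROM machinetags WHERE namespace = ?
             ORDER BY document_id LIMIT ? OFFSET ?"""
    args = (namespace, per_page, (page - 1) * per_page)
    return [row[0] for row in db.execute(sql, args)]


def grouped_page(namespace, page, per_page):
    """LIMIT applied after grouping: one row per document."""
    sql = """SELECT document_id FROM machinetags WHERE namespace = ?
             GROUP BY document_id
             ORDER BY document_id LIMIT ? OFFSET ?"""
    args = (namespace, per_page, (page - 1) * per_page)
    return [row[0] for row in db.execute(sql, args)]


def count_documents(namespace):
    """The number to base page counts on: documents, not tags."""
    sql = "SELECT COUNT(DISTINCT document_id) FROM machinetags WHERE namespace = ?"
    return db.execute(sql, (namespace,)).fetchone()[0]


first_naive = naive_page("dc", 1, 3)
first_grouped = grouped_page("dc", 1, 3)
last_grouped = grouped_page("dc", 3, 3)
total = count_documents("dc")
```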

All of this is made worse if you want range and/or full-text indexing (or, in my happy magic world, a context id, which is just an arbitrary numeric identifier that users can associate with a document) because you just have to add that many more indexes. Did I mention that people really seem to want to be able to do range queries?
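
You can watch leftmost-iness happen by asking sqlite for its query plan. The sketch below builds one index on (namespace, predicate, value) over a hypothetical table and compares a query that starts at the left edge of the index with one that skips straight to the value. The exact wording of the plan differs between sqlite versions, so the code only checks whether the index name is mentioned.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE machinetags "
    "(document_id INTEGER, namespace TEXT, predicate TEXT, value TEXT)"
)
# A single composite index; the column order is what matters here.
db.execute(
    "CREATE INDEX idx_ns_pred_value "
    "ON machinetags (namespace, predicate, value)"
)


def query_plan(sql, params):
    """Return the human-readable detail column(s) of EXPLAIN QUERY PLAN."""
    plan_rows = db.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(str(row[-1]) for row in plan_rows)


# Starts with the leftmost column of the index: the index can be used.
prefix_plan = query_plan(
    "SELECT document_id FROM machinetags WHERE namespace = ?", ("dc",)
)

# Skips namespace and predicate: the index is no help, so sqlite
# falls back to scanning the whole table.
gap_plan = query_plan(
    "SELECT document_id FROM machinetags WHERE value = ?", ("new york",)
)

uses_index_for_prefix = "idx_ns_pred_value" in prefix_plan
uses_index_for_gap = "idx_ns_pred_value" in gap_plan
```

To make the value-only query fast you need another index that starts with value, and so on for every combination you want to support.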

Or you can brush up on your Java and use Lucene which I'm told is smart enough to do (a convincing imitation of) range queries. Even then, though, you have to abuse the underlying model since you can't store the pieces of a machine tag as attributes. Or at least not more than one machine tag per document.

I would also like nothing more than to be proven wrong about this because it's all a bloody nuisance. But I don't think I am.

So.

Really simple to install. And use.

Tinkeroserability.

How about we just worry about all those indexes if and when they become a problem?

Out of the box mtdb (machinetag database) comes with a standard Python interface for storing and retrieving machine tags associated with a document (read : URL) and a bare-bones HTTP interface, that squirts out XML and JSON, for playing with the database from another language. Say, JavaScript.

The HTTP interface is nothing more than a proof-of-concept; a Python script doubling as a web server and meant to be run on a personal computer or trusted server. Understand that it has no authentication or authorization mechanisms and its input validation is shockingly naive so you should not expose it publicly without suitable modifications.

Improvements aside, it is also something that a person using another publishing system could write (or install) a plugin to use, with almost no additional overhead besides, like, the Interweb : 1) send an HTTP request 2) receive chunks of data formatted as whatever 3) profit!

And it implements all those mt.* API methods I described above.

Ladies and gentlemen, mtdb.py 0.1

You will also need to install the machinetag and simplejson Python libraries as dependencies.

Release notes are boring

Like most point releases, especially in the early stages when you're going from versions 0.1 to 0.2, it can be difficult to keep sounding the thunder. But there you go : del.icio.us maps has been blessed as version 0.2 (complete with a tarball and everything!) and given a more-better permanent home on the Interweb.

The exercise with My Maps reinforced my idea that maps are, metaphorically at least, allelopathic. “The inhibition of growth in one species of plants by chemicals produced by another species” (source). So existing maps (a species?) poison (limit, prevent) a diversity of potential maps (yes I know biological metaphors for social phenomena can be dangerous). This is a particular problem with creative mapping tools (like My Maps) aimed at the general public who have seen mostly maps of the Google/Yahoo!/MapQuest species. What if you could map anything and you just mapped what is on typical maps?

John Krygier

With that in mind, here's the list of things that have been added or changed in version 0.2 :

Magic Words

Restobook

To call restobook magic is being generous since it is just a collection of bastardized machine tags whose syntax is a direct result of the need to abuse the del.icio.us note field, itself limited to 255 characters (hence the informal machine tag syntax).

There's not a lot of formatting going on, as previously discussed here, and here and here. In fact there's not really a spec to format against beyond vague notions of copying what my friend Sarah did for her book on cheap places to eat in Montréal. At the moment there are some very basic rules for displaying street addresses and phone numbers and Yelp links. I will sort out the rest in time and continue to wonder, some more, about the advisability of using the syntax to store addresses for, say, museums.

Anyway : complex data in del.icio.us / pretty data on the map.

Geotudes

Geowhat?

Yeah, I'm still undecided but figured this was as good a place as any to try them out. Geotudes consist of two parts : A major and minor identifier. Every latitude and longitude can be identified by one of the 65,000 major Geotudes, each representing an area of approximately 9,000 square kilometers, and an infinite number of minor Geotudes. Infinite, although by the time you get to 12 points (or 6 pairs, each containing two digits and separated by dots) you are dealing with an area approximately 98 centimeters square. Someone might try to measure England with Geotudes but we'll cross that (tiny) bridge when we get there.

Geotudes are calculated on-the-fly and added automatically (as machine tags) when you save a new location in del.icio.us maps and the major is used to create the nearby-ish links for a place. It's not perfect but everything within the same 80 x 100 (ish) kilometer box is a start and you can more easily fudge things like the San Francisco Bay Area rather than searching for sanfrancisco + oakland + maybesanjose. (Which you can't, anyway, but I'll get to that shortly.)

The website states : "[A] Geotude is permanent and hierarchical." And as a trade-off: "Geotude is less intuitive than address, but more intuitive than latitude/longitude. Geotude is more precise than address, but less precise than latitude/longitude."

Unfortunately, it's not possible to do wild-card tag searches in del.icio.us so there's no way to search for a particular major Geotude and then narrow it down by one or two minor pairs (approximately 10 x 10 and 1 x 1 kilometers, respectively).

Still, it's good enough for cities.

And machines.

If you're wondering, besides writing Geotude functions in JavaScript I also wrote libraries for PHP and Perl. If someone else would like to write the Python bindings, I would be much obliged.

Machine Tags

Like the restobook stuff, there isn't too much happening here. Yet. This is still just a point release so the feature-ness of machine tags is they are recognized, and parsed, as such. And then mostly not displayed.

Except for Geotudes.
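
For reference, recognizing and parsing a machine tag amounts to splitting namespace:predicate=value into its three parts. The sketch below is in Python, not the JavaScript that del.icio.us maps actually uses, and the regular expression is my own approximation of the syntax (namespace and predicate start with a letter and contain only letters, digits and underscores), not a spec.

```python
import re

# Approximate syntax: namespace:predicate=value
MACHINE_TAG = re.compile(
    r"^([a-z][a-z0-9_]*):([a-z][a-z0-9_]*)=(.+)$",
    re.IGNORECASE,
)


def parse_machine_tag(tag):
    """Return (namespace, predicate, value), or None for a plain tag."""
    match = MACHINE_TAG.match(tag)
    if match is None:
        return None
    return match.groups()


parsed = parse_machine_tag("geonames:locality=1816670")
plain = parse_machine_tag("sanfrancisco")
```

Anything that does not match is treated as an ordinary tag and left alone.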

Barcodes

O RLY?

A little silly in the browser, maybe, but they do mostly work and that could be useful for things like phone numbers. Or URLs for — whoa — mobile websites.

At the moment I am using Guido Sohne's SemaFox encoder which creates Semacodes. I generally prefer QR codes since they are better suited for arbitrary text, rather than just URLs. There's a very impressive pure-JavaScript library for generating QR codes but it runs ape-shit over all the default String and Array methods in the language (why do people do that ... it makes Perl hackers look like prudes, by comparison) so I eventually put it on the back-burner.

My hope was that addresses and phone numbers would be small enough to Just Work™ in a Semacode. This is only Sometimes True™.

Both libraries are, however, pretty slow since they use tables to draw all those boxes. Eventually, I may have to write a Canvas rendering widget for barcodes...

Navigation

Tag-surfing

That is, tags for individual locations are displayed in the marker widget and, when clicked, will redefine the search query and redraw the map accordingly.

That's the good news. The bad news is that you can not poke around the intersection of multiple tags. At the moment the del.icio.us JSON feeds are available only on a per-user basis and limited to a maximum of two combined tags. Since one of the tags has to be del:bookmark=geo in order to find stuff with actual geographic coordinates, that only leaves one tag left to play with.

Maybe I will add an optional feature to pull back anything matching a particular set of tags and then loop over them looking for geo tags in the browser. This will produce some weird results for some people but might be a better 80/20 solution for most.
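
That optional feature is not much more than a filter. Here it is sketched in Python for brevity (the real thing would be JavaScript in the browser), assuming each bookmark arrives as a dictionary with its URL under "u" and its list of tags under "t". Treat that shape, and the function name, as assumptions for the example.

```python
GEO_TAG = "del:bookmark=geo"


def geotagged_matches(posts, wanted_tags, geo_tag=GEO_TAG):
    """Keep only the posts that carry every wanted tag and the geo tag."""
    wanted = set(wanted_tags) | {geo_tag}
    return [post for post in posts if wanted.issubset(post.get("t", []))]


# Pretend this came back from a broader, non-geo query.
posts = [
    {"u": "http://example.com/taqueria",
     "t": ["sanfrancisco", "food", "del:bookmark=geo"]},
    {"u": "http://example.com/museum",
     "t": ["sanfrancisco", "art", "del:bookmark=geo"]},
    {"u": "http://example.com/recipe",
     "t": ["sanfrancisco", "food"]},   # matches the tags but has no location
]

mappable = geotagged_matches(posts, ["sanfrancisco", "food"])
```

The weird results come from the third kind of post: it matches the tags, gets downloaded, and is then thrown away because there is nowhere to put it on the map.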

Permalinks

Tag-surfing happens entirely in JavaScript land so that the browser does not need to refresh the page and suck down all the various dependencies. I've also added hooks for the code to read in query parameters in the URL so you can point to a specific tag search or modify it in the location bar.

Ultimately this will need to be expanded to shamelessly copy the Oakland Crimespotting site whose permalinks are updated based on the map's position and zoom level.

Icons

Maybe not better icons. But icons.

ph3ar

All the input from del.icio.us is properly sanitized. It's not that I don't trust the del.icio.us kids to send safe data. It's just that not trusting anything that comes back across the wire is More Safe ®.

I say properly because I think I've covered all the bases but this stuff gets ugly and complicated fast.

Where complicated means it's actually all pretty straightforward except for the part where you spend 80% of your time accounting for things like Wikipedia including literal ' symbols in their URLs....

If you see something I've missed please direct the clue-bats accordingly.
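
As a rough illustration of the idea (in Python, not the JavaScript the site runs), the sketch below escapes untrusted text before it goes into markup and percent-encodes characters like a literal ' in URLs. The function names and the choice of which URL characters to leave alone are mine, not the del.icio.us maps code.

```python
import html
from urllib.parse import quote, urlsplit

# Characters left untouched in URLs; '%' is kept so that escapes
# which are already present are not encoded a second time.
URL_SAFE = ":/?&=#%~+,;@-._"


def safe_text(untrusted):
    """Escape <, >, &, and both kinds of quote for use in HTML."""
    return html.escape(untrusted, quote=True)


def safe_url(untrusted):
    """Allow only http(s) URLs and percent-encode anything unusual."""
    if urlsplit(untrusted).scheme not in ("http", "https"):
        return ""
    return quote(untrusted, safe=URL_SAFE)


title = safe_text("<b>Rock 'n' Roll</b>")
link = safe_url("http://en.wikipedia.org/wiki/Rock_'n'_Roll")
rejected = safe_url("javascript:alert(1)")
```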

Going forward, the ballpark roadmap in my mind looks something like this :

ongoing — Clean up and tighten the design (consider the search interfaces which are, well, bad) and spend some time consolidating the JavaScript code; both of which are starting to look like a neglected squash patch.
0.3 — Replace the standard Y! Maps marker with a custom overlay and ensure that when it is opened and overflows the map container it scrolls in to view. And copy-and-paste which, for some insane reason, is disabled in the standard marker.
0.35 — Start thinking about compressing all the various JavaScript files.
0.4 — Pirate maps! And places more precise than just a marker. That's fancy-talk for polylines. This may wait until version 0.5.
0.5 — Mapstractify and start thinking about how to use available Open Street Maps data, more better. Being able to play nice with OSM would go a long way towards making street names in the pirate maps, uh, possible.
0.6 — Better reverse-geocoding than whatever comes out of the Geonames API. This might be writing something from scratch or just using the clever driving directions hack that bubbled up the other day.
0.7 — Abstractify the data store layer so that you could use something other than del.icio.us. I don't really have any idea what that would look like except to maybe build something simple using Namazu and something using Redland (or maybe LARQ) for the people who want to do more than cow-pathing.
0.8 — More data, specifically private data, compound tag searches and searches not bound to individual users.

Famous last words, really.

The young architect Charles-Edouard Jeanneret, then working in Paris at the offices of the Perret brothers, witnessed the typical response as his employer, Auguste Perret, burst into the atelier, a newspaper in his fist, and shouted: 'Bleriot has crossed the channel! Wars are finished: no more wars are possible! There are no longer any frontiers!' It was not so much that frontiers no longer existed; it was that they were changing, and with them, perceptions and behaviour altered too. In Aircraft (1935), Jeanneret, better known as Le Corbusier, remembered the impact a series of historic flights made on Parisians during 1909:

...from my student's garret on Quai St. Michel I heard a noise which for the first time filled the entire sky of Paris. Until then men had been aware of one voice only from above — bellowing or thundering — the voice of the storm. I craned my neck out of the small window to catch sight of this unknown messenger. The Comte de Lambert, having succeeded in 'taking off' at Juvisy, had descended toward Paris and circled the Eiffel Tower at a height of 300 metres. It was miraculous, it was mad! Our dreams then could become reality, however daring they might be.

— David Pascoe

Considering that barcodes were added at the very last minute, as I was working on packaging version 0.2, who knows what I will actually be working on by then. These things seem to take on a life of their own which is, honestly, where most of the magic comes from.

this is aaronland

“Aware of only one voice from above”