I just want to share my social network... with myself
Recently, someone (Kellan, I think) said to me:
You know I am right there with you with your "Your Address Book is the Center of your Social Network" Mantra... Surprised no one has created a plugin for Apple's. It would be *A Good Start TM*
And I said:
Back during the days of the Job Search Before Flickr I was slowly trying to teach myself Cocoa for just this reason; essentially the germ that has grown into "restobook" (and del.icio.us maps). The problem is that there is no way to modify the default "panel" for an addressbook entry. You can only add overlays, which made the whole thing kind of suck.
So, I am reduced to writing S60 apps.
Now that I have a working developer's certificate I am revisiting things like the original fl(ickr)addressbook and the nwtracker and poking at the landmarks (the maps app) database. The latter is actually sort of interesting since you can populate it on the fly. It is not so interesting because it doesn't have any sort of hooks into the address book beyond abusing existing fields.
In *theory* you could run a copy of raccoon (web server) on the device and assign the www link fields in both the contacts and landmarks database to point to each other and ... uh ... synergize ?
Which is kind of a round-about way of saying that I wrote a Perl module on top of Christophe Beauregard's Flickr::Upload.pm library to fetch your current location, according to the Dopplr API, and to automagically add tags, machinetags and geotags accordingly.
For example, if you uploaded a photo while you were travelling in Beijing the following extra bits of information would be added:
- The tag Beijing would be added to your photo.
- The machine tag geonames:locality=1816670 would be added to your photo because Dopplr uses Geonames for their geocoding and that's their unique identifier for Beijing.
- If you allowed it to be, the machine tag dopplr:trip=9999999 (read : whatever your actual trip ID is) would be added to your photo. I wonder if you'll ever be able to share trips in Dopplr...
- Your photo would be geotagged with a latitude of 39.9289 and a longitude of 116.388 and an accuracy of 11, or city level.
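To make that concrete, here is a rough Python sketch of the tag-building step. It is not the Perl module's actual code: the function name, its arguments and the shape of the return value are all made up for illustration, and the values are just the Beijing numbers from the list above.

```python
def location_tags(city, geonames_id, latitude, longitude, trip_id=None):
    """Build the extra tags and the geotag for a photo taken in `city`.

    Hypothetical helper: a real implementation would get these values
    from the Dopplr API rather than from its arguments.
    """
    tags = [city, "geonames:locality=%d" % geonames_id]
    # The trip machine tag is opt-in, so only add it when a trip ID is given.
    if trip_id is not None:
        tags.append("dopplr:trip=%d" % trip_id)
    # Accuracy 11 is the "city" level mentioned above.
    geo = {"latitude": latitude, "longitude": longitude, "accuracy": 11}
    return tags, geo


tags, geo = location_tags("Beijing", 1816670, 39.9289, 116.388, trip_id=9999999)
print(tags)
print(geo)
```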
In the same vein, YBR's ZoneTag has been using Upcoming, to infer location, since last year and the soon-to-be-public FireEagle is the same idea bundled up as a discrete service. So, if we can't ever figure out who is on first we can at least litter the Intercloud with little fortune cookies made of magic words.
Anyway, Flickr::Upload::Dopplr.pm
This blog post is full of links.
#reduced
What the fuck are graduate students doing these days?
22:53:47 M: why so many?
22:53:53 M: all combinations?
22:54:00 A: yes
22:54:03 A: leftmost-iness
22:54:33 M: I don't see much of a way around that
22:54:38 A: because you will want to GROUP BY document_id
22:54:57 A: in order to do LIMITS without totally fucking up pagination
22:55:41 A: that was the first lesson of flickr for me : all bets are off the moment you decide that accurate pagination counts are important
22:56:01 M: bad decision ;)
22:57:15 A: but really, it's all INT-y indexes except for the string_value stuff so...all things considered, it's probably worth it
22:58:09 A: but it still begs the question : what the fuck are graduate students doing these days?
22:58:49 A: "I hold a doctorate from the University of DWIM"
22:59:41 M: they're all working on the semantic web
22:59:47 M: and polygon counts
22:59:52 M: and natural language web search
23:00:58 A: "the cloud will save us"
23:02:14 A: "how many polygons does a cloud have"
23:02:48 A: "would you like to buy a fog machine"
In October I am going to do a talk on machine tags, at the Semantic Web Strategies conference, in San Jose. Because talk is cheap, I always try to have something that looks like working code to make my case. I could, of course, simply point to Flickr.
But here's the thing : We have to architect things, including machine tags, differently than you. That is not a value judgement or posturing. It's just true. We have over one and a quarter billion photos.
And databases suck (which I'll get to in a minute).
There's a lot of space between nothing and a billion and that's always been the place where I've hoped that people would play with machine tags. Ideally something you could plug into Wordpress or some other publishing system. Nothing fancy. Just a tool to index machine tags and provide a search interface so that you could link up disparate data sources.
In vanilla web-services-speak, you might say :
- mt.namespaces.list(predicate='', value='')
- mt.predicates.list(namespace='', value='')
- mt.values.list(namespace='', predicate='')
- mt.documents.search(namespace='', predicate='', value='')
- mt.documents.search_by_range(range, namespace='', predicate='')
- mt.document.add(uri, tags)
- mt.document.remove(uri)
- mt.document.tags(uri)
So I wrote one.
Specifically, I wrote a really simple stand-alone machine tag store that does not have ponies and will, hopefully, act as a spark for someone(s) to take it further.
It could be that I am the only person out there who really likes machine tags. I prefer, though, to think that part of the reason no one else has done this is that while machine tags are pretty simple, conceptually, by the time you start thinking about storing and querying them it begins to get ugly and complicated.
Typically, when you start to capital-T think about the problem you find yourself saying things like : "Well, I could install Lucene" or "What about a triple store?". While there's nothing wrong with these approaches they're a little like telling the person who wants a glass of milk that they need to buy a cow.
I opted for making Python 2.5 the only requirement, and to use the built-in sqlite3 database magic.
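For a sense of how little is needed to get started, here is a minimal sketch of a machine tag table using nothing but the standard library. It is not the mtdb schema: the table name, the column names and the two helper functions are assumptions made for illustration, and it uses present-day Python 3 syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per machine tag per document.
conn.execute("""
    CREATE TABLE mt_sketch (
        document_uri TEXT NOT NULL,
        namespace    TEXT NOT NULL,
        predicate    TEXT NOT NULL,
        value        TEXT NOT NULL
    )
""")


def add_document(uri, tags):
    """Store each 'namespace:predicate=value' tag as one row."""
    for tag in tags:
        namespace, rest = tag.split(":", 1)
        predicate, value = rest.split("=", 1)
        conn.execute("INSERT INTO mt_sketch VALUES (?, ?, ?, ?)",
                     (uri, namespace, predicate, value))


def search_documents(namespace=None, predicate=None, value=None):
    """Return the distinct URIs matching whichever parts were supplied."""
    clauses, params = [], []
    for column, wanted in (("namespace", namespace),
                           ("predicate", predicate),
                           ("value", value)):
        if wanted is not None:
            clauses.append("%s = ?" % column)
            params.append(wanted)
    where = (" WHERE " + " AND ".join(clauses)) if clauses else ""
    sql = ("SELECT DISTINCT document_uri FROM mt_sketch"
           + where + " ORDER BY document_uri")
    return [row[0] for row in conn.execute(sql, params)]


add_document("http://example.com/a", ["dc:subject=new york", "geonames:locality=1816670"])
add_document("http://example.com/b", ["dc:subject=beijing"])
print(search_documents(namespace="dc"))
print(search_documents(predicate="subject", value="new york"))
```

Note that nothing here is indexed, which is fine for a few hundred rows and is exactly where the trouble starts once you want it to be fast.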
Here's the rub : There are another 22 indexes on the machinetagstore table, in addition to what is displayed in that diagram. By any measure that is too many, even if they are all mostly just collections of the integers whose order has been re-arranged.
You could use a proper full-text search-engine to do this and not worry (as much) about indexing but then, more than likely, you won't be able to do range queries on your machine tag values. If you don't think you'll ever want to find stuff where, for example, the temperature was between -5 and 28 degrees Celsius then you could build something using a custom Namazu filter in time for dinner.
That said, range queries are easily the biggest feature request people have for machine tags on Flickr.
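As a sketch of what a range query looks like once the values live in a relational table, here is the temperature example in sqlite3. The table, the column names and the sample documents are invented for illustration; the one real design point is that the value sits in a numeric column so the comparison is numeric rather than string-based.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: numeric machine tag values get their own REAL column.
conn.execute("""
    CREATE TABLE readings (
        document_uri  TEXT,
        namespace     TEXT,
        predicate     TEXT,
        numeric_value REAL
    )
""")
conn.executemany("INSERT INTO readings VALUES (?, ?, ?, ?)", [
    ("http://example.com/arctic", "weather", "temperature", -30.0),
    ("http://example.com/spring", "weather", "temperature", 12.5),
    ("http://example.com/desert", "weather", "temperature", 41.0),
])


def search_by_range(low, high, namespace, predicate):
    """Return URIs whose numeric value falls between low and high, inclusive."""
    rows = conn.execute(
        "SELECT document_uri FROM readings "
        "WHERE namespace = ? AND predicate = ? "
        "AND numeric_value BETWEEN ? AND ? "
        "ORDER BY numeric_value",
        (namespace, predicate, low, high))
    return [row[0] for row in rows]


print(search_by_range(-5, 28, "weather", "temperature"))
```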
If you use a proper relational database then you get ranges but at the cost of leftmost-iness. Specifically, to quote the sqlite documentation : "It is not necessary for every column of an index to appear in a WHERE clause term in order for that index to be used. But there can not be gaps in the columns of the index that are used." This is a problem native to most (all?) SQL databases and the net result for machine tags is that unless you enforce a strict ordering of query parameters, thereby limiting the search-iness of your data, you end up with indexes to match all the various combinations :
- All the distinct namespaces
- All the distinct values, where the predicate is subject
- All the distinct documents, where the namespace is dc and the value is new york
- And so on...
Where "distinct documents" in a database context best means the ability to group results (read : documents) as part of the actual query so that you can reliably define an offset and limit to the number of results you return at once. Given that this isn't meant to scale to the moon, in the context of a home-user, or tinkerer, you could probably safely return all the results for a query and distinct-ify them in memory (read : programming language) but the issues are still the same.
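Both halves of the problem can be poked at with a few lines of sqlite3. This is a sketch under assumptions: the table and index names are invented, and the exact wording of the query plan differs between SQLite versions, so the code only prints it. Without ANALYZE statistics, I would expect the first plan to be a SEARCH using the index and the second, which skips the leading namespace column, to fall back to a SCAN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table with a single three-column index.
conn.execute("CREATE TABLE mt (document_id INTEGER, namespace TEXT, predicate TEXT, value TEXT)")
conn.execute("CREATE INDEX by_ns_pred_value ON mt (namespace, predicate, value)")
conn.executemany("INSERT INTO mt VALUES (?, ?, ?, ?)", [
    (1, "dc", "subject", "new york"),
    (1, "dc", "subject", "maps"),
    (2, "dc", "subject", "beijing"),
    (3, "dc", "creator", "someone"),
])


def plan(sql, params):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


# Uses the leftmost column of the index.
leftmost = plan("SELECT document_id FROM mt WHERE namespace = ?", ("dc",))
# Leaves a gap: no constraint on namespace, so the index is not much help.
gap = plan("SELECT document_id FROM mt WHERE predicate = ?", ("subject",))
print(leftmost)
print(gap)

# Grouping in the query itself is what makes LIMIT and OFFSET count
# documents rather than individual tag rows.
page = conn.execute(
    "SELECT document_id FROM mt WHERE namespace = ? "
    "GROUP BY document_id ORDER BY document_id LIMIT ? OFFSET ?",
    ("dc", 2, 0)).fetchall()
print(page)
```

Covering the second query properly means another index that starts with predicate, and so on for every other combination, which is how you end up with so many of them.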
All of this is made worse if you want range and/or full-text indexing (or, in my happy magic world, a context id, which is just an arbitrary numeric identifier that users can associate with a document) because you just have to add that many more indexes. Did I mention that people really seem to want to be able to do range queries?
Or you can brush up on your Java and use Lucene which I'm told is smart enough to do (a convincing imitation of) range queries. Even then, though, you have to abuse the underlying model since you can't store the pieces of a machine tag as attributes. Or at least not more than one machine tag per document.
I would also like nothing more than to be proven wrong about this because it's all a bloody nuisance. But I don't think I am.
So.
Really simple to install. And use.
Tinkeroserability.
How about we just worry about all those indexes if and when they become a problem?
Out of the box mtdb (machinetag database) comes with a standard Python interface for storing and retrieving machine tags associated with a document (read : URL) and a bare-bones HTTP interface, that squirts out XML and JSON, for playing with the database from another language. Say, JavaScript.
The HTTP interface is nothing more than a proof-of-concept; a Python script doubling as a web server and meant to be run on a personal computer or trusted server. Understand that it has no authentication or authorization mechanisms and its input validation is shockingly naive so you should not expose it publicly without suitable modifications.
Improvements aside, it is also something that a person using another publishing system could write (or install) a plugin to use, with almost no additional overhead besides, like, the Interweb : 1) send an HTTP request 2) receive chunks of data formatted as whatever 3) profit!
And it implements all those mt.* API methods I described above.
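From the other side of the wire, the three steps might look something like this. Treat every specific here as an assumption: the base URL, the "method" query parameter and the shape of the JSON response are invented for illustration, so check what the server actually expects and returns before relying on any of it. The sketch only builds the request URL and parses a canned response, so it runs without a server.

```python
import json
from urllib.parse import urlencode

# Hypothetical: wherever the HTTP interface happens to be listening.
BASE_URL = "http://localhost:8080/"


def build_request_url(method, **params):
    """Step 1: turn an mt.* method name and its arguments into a URL."""
    query = urlencode([("method", method)] + list(params.items()))
    return BASE_URL + "?" + query


def parse_response(body):
    """Step 2: decode a JSON response body into Python data."""
    return json.loads(body)


url = build_request_url("mt.documents.search", namespace="dc", predicate="subject")
print(url)

# A made-up response body, standing in for whatever the server sends back.
sample_body = '{"documents": ["http://example.com/a"]}'
result = parse_response(sample_body)
print(result["documents"])
```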
Ladies and gentlemen, mtdb.py 0.1
You will also need to install the machinetag and simplejson Python libraries as dependencies.
#mtdb
Release notes are boring
Like most point releases, especially in the early stages when you're going from versions 0.1 to 0.2, it can be difficult to keep sounding the thunder. But there you go : del.icio.us maps has been blessed as version 0.2 (complete with a tarball and everything!) and given a more-better permanent home on the Interweb.
The exercise with My Maps reinforced my idea that maps are, metaphorically at least, allelopathic. “The inhibition of growth in one species of plants by chemicals produced by another species” (source). So existing maps (a species?) poison (limit, prevent) a diversity of potential maps (yes I know biological metaphors for social phenomena can be dangerous). This is a particular problem with creative mapping tools (like My Maps) aimed at the general public who have seen mostly maps of the Google/Yahoo!/MapQuest species. What if you could map anything and you just mapped what is on typical maps?
With that in mind, here's the list of things that have been added or changed in version 0.2 :
Magic Words
Restobook
To call restobook magic is being generous since it is just a collection of bastardized machine tags whose syntax is a direct result of the need to abuse the del.icio.us note field, itself limited to 255 characters (hence the informal machine tag syntax).
There's not a lot of formatting going on, as previously discussed here, and here and here. In fact there's not really a spec to format against beyond vague notions of copying what my friend Sarah did for her book on cheap places to eat in Montréal. At the moment there are some very basic rules for displaying street addresses and phone numbers and Yelp links. I will sort out the rest in time and continue to wonder, some more, about the advisability of using the syntax to store addresses for, say, museums.
Anyway : complex data in del.icio.us / pretty data on the map.
Geotudes
Yeah, I'm still undecided but figured this was as good a place as any to try them out. Geotudes consist of two parts : A major and minor identifier. Every latitude and longitude can be identified by one of the 65,000 major Geotudes representing an area approximately 9,000 square kilometers and an infinite number of minor Geotudes. Infinite, although by the time you get to 12 points (or 6 pairs, each containing two digits and separated by dots) you are dealing with an area approximately 98 centimeters square. Someone might try to measure England with Geotudes but we'll cross that (tiny) bridge when we get there.
Geotudes are calculated on-the-fly and added automatically (as machine tags) when you save a new location in del.icio.us maps and the major is used to create the nearby-ish links for a place. It's not perfect but everything within the same 80 x 100 (ish) kilometer box is a start and you can more easily fudge things like the San Francisco Bay Area rather than searching for sanfrancisco + oakland + maybesanjose. (Which you can't, anyway, but I'll get to that shortly.)
The website states : "[A] Geotude is permanent and hierarchical. And as a trade-off: Geotude is less intuitive than address, but more intuitive than latitude/longitude. Geotude is more precise than address, but less precise than latitude/longitude."
Unfortunately, it's not possible to do wild-card tag searches in del.icio.us so there's no way to search for a particular major Geotude and then narrow it down by one or two minor pairs (approximately 10 x 10 and 1 x 1 kilometers, respectively).
Still, it's good enough for cities.
And machines.
If you're wondering, besides writing Geotude functions in JavaScript I also wrote libraries for PHP and Perl. If someone else would like to write the Python bindings, I would be much obliged.
Machine Tags
Like the restobook stuff, there isn't too much happening here. Yet. This is still just a point release so the feature-ness of machine tags is that they are recognized, and parsed, as such. And then mostly not displayed.
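For what "recognized, and parsed" amounts to, here is a small Python sketch of a machine tag parser. It is not the JavaScript that del.icio.us maps uses, and the regular expression is my approximation of the usual namespace:predicate=value rules (namespace and predicate start with a letter, followed by letters, digits or underscores), so treat the exact pattern as an assumption.

```python
import re

# Approximate pattern for namespace:predicate=value.
MACHINETAG = re.compile(r"^([A-Za-z][A-Za-z0-9_]*):([A-Za-z][A-Za-z0-9_]*)=(.+)$")


def parse_machinetag(tag):
    """Return the three parts of a machine tag, or None for a plain tag."""
    match = MACHINETAG.match(tag)
    if match is None:
        return None
    namespace, predicate, value = match.groups()
    return {"namespace": namespace, "predicate": predicate, "value": value}


print(parse_machinetag("del:bookmark=geo"))
print(parse_machinetag("sanfrancisco"))
```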
Barcodes
O RLY?
A little silly in the browser, maybe, but they do mostly work and that could be useful for things like phone numbers. Or URLs for — whoa — mobile websites.
At the moment I am using Guido Sohne's SemaFox encoder which creates Semacodes. I generally prefer QR codes since they are better suited for arbitrary text, rather than just URLs. There's a very impressive pure-JavaScript library for generating QR codes but it runs ape-shit over all the default String and Array methods in the language (why do people do that ... it makes Perl hackers look like prudes, by comparison) so I eventually put it on the back-burner.
My hope was that addresses and phone numbers would be small enough to Just Work™ in a Semacode. This is only Sometimes True™.
Both libraries are, however, pretty slow since they use tables to draw all those boxes. Eventually, I may have to write a Canvas rendering widget for barcodes...
Navigation
Tag-surfing
That is, tags for individual locations are displayed in the marker widget and, when clicked, will redefine the search query and redraw the map accordingly.
That's the good news. The bad news is that you can not poke around the intersection of multiple tags. At the moment the del.icio.us JSON feeds are available only on a per-user basis and limited to a maximum of two combined tags. Since one of the tags has to be del:bookmark=geo in order to find stuff with actual geographic coordinates, that only leaves one tag left to play with.
Maybe I will add an optional feature to pull back anything matching a particular set of tags and then loop over them looking for geo tags in the browser. This will produce some weird results for some people but might be a better 80/20 solution for most.
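The filtering half of that idea is simple enough to sketch, here in Python rather than in the browser. The bookmark structure (a dict with "url" and "tags" keys) is a stand-in I made up and not the shape of the del.icio.us JSON feed; only the del:bookmark=geo tag comes from the text above.

```python
GEO_TAG = "del:bookmark=geo"


def filter_geotagged(bookmarks, wanted_tags):
    """Keep bookmarks that carry the geo tag and every one of the wanted tags."""
    wanted = set(wanted_tags)
    return [b for b in bookmarks
            if GEO_TAG in b["tags"] and wanted.issubset(b["tags"])]


# Hypothetical bookmarks, already fetched by some broader query.
bookmarks = [
    {"url": "http://example.com/taqueria", "tags": ["del:bookmark=geo", "sanfrancisco", "food"]},
    {"url": "http://example.com/museum", "tags": ["del:bookmark=geo", "sanfrancisco", "art"]},
    {"url": "http://example.com/recipe", "tags": ["sanfrancisco", "food"]},
]

matches = filter_geotagged(bookmarks, ["sanfrancisco", "food"])
print([b["url"] for b in matches])
```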
Permalinks
Tag-surfing happens entirely in JavaScript land so that the browser does not need to refresh the page and suck down all the various dependencies. I've also added hooks for the code to read in query parameters in the URL so you can point to a specific tag search or modify it in the location bar.
Ultimately this will need to be expanded to shamelessly copy the Oakland Crimespotting site whose permalinks are updated based on the map's position and zoom level.
Icons
Maybe not better icons. But icons.
ph3ar
All the input from del.icio.us is properly sanitized. It's not that I don't trust the del.icio.us kids to send safe data. It's just that not trusting anything that comes back across the wire is More Safe ®.
I say "properly" because I think I've covered all the bases but this stuff gets ugly and complicated fast.
Where "complicated" means it's actually all pretty straightforward except for the part where you spend 80% of your time accounting for things like Wikipedia including literal ' symbols in their URLs....
If you see something I've missed please direct the clue-bats accordingly.
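To show the kind of thing involved, here is a Python sketch of escaping a URL and a label before dropping them into markup. It is not the sanitizing code in del.icio.us maps (which is JavaScript), the function name is invented, and it makes no attempt to validate URL schemes; it only illustrates the literal ' problem.

```python
import html
from urllib.parse import quote


def safe_link(url, label):
    """Build an anchor tag from untrusted input (illustrative only)."""
    # Percent-encode anything outside a small set of URL punctuation,
    # which turns a literal ' into %27. Keeping % in the safe set avoids
    # double-encoding escapes that are already there.
    safe_url = quote(url, safe=":/?&=#%")
    return '<a href="%s">%s</a>' % (
        html.escape(safe_url, quote=True),
        html.escape(label, quote=True))


link = safe_link("http://en.wikipedia.org/wiki/Hell's_Kitchen", "Hell's <b>Kitchen</b>")
print(link)
```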
Going forward, the ballpark roadmap in my mind looks something like this :
- ongoing — Clean up and tighten the design (consider the search interfaces which are, well, bad) and spend some time consolidating the JavaScript code; both of which are starting to look like a neglected squash patch.
- 0.3 — Replace the standard Y! Maps marker with a custom overlay and ensure that when it is opened and overflows the map container it scrolls in to view. And copy-and-paste which, for some insane reason, is disabled in the standard marker.
- 0.35 — Start thinking about compressing all the various JavaScript files.
- 0.4 — Pirate maps! And places more precise than just a marker. That's fancy-talk for polylines. This may wait until version 0.5.
- 0.5 — Mapstractify and start thinking about how to use available Open Street Maps data, more better. Being able to play nice with OSM would go a long way towards making street names in the pirate maps, uh, possible.
- 0.6 — Better reverse-geocoding than whatever comes out of the Geonames API. This might be writing something from scratch or just using the clever driving directions hack that bubbled up the other day.
- 0.7 — Abstractify the data store layer so that you could use something other than del.icio.us. I don't really have any idea what that would look like except to maybe build something simple using Namazu and something using Redland (or maybe LARQ) for the people who want to do more than cow-pathing.
- 0.8 — More data, specifically private data, compound tag searches and searches not bound to individual users.
Famous last words, really.
The young architect Charles-Edouard Jeanneret, then working in Paris at the offices of the Perret brothers, witnessed the typical response as his employer, Auguste Perret, burst into the atelier, a newspaper in his fist, and shouted: 'Bleriot has crossed the channel! Wars are finished: no more wars are possible! There are no longer any frontiers!' It was not so much that frontiers no longer existed; it was that they were changing, and with them, perceptions and behaviour altered too. In Aircraft (1935), Jeanneret, better known as Le Corbusier, remembered the impact a series of historic flights made on Parisians during 1909:
...from my student's garret on Quai St. Michel I heard a noise which for the first time filled the entire sky of Paris. Until then men had been aware of one voice only from above — bellowing or thundering — the voice of the storm. I craned my neck out of the small window to catch sight of this unknown messenger. The Comte de Lambert, having succeeded in 'taking off' at Juvisy, had descended toward Paris and circled the Eiffel Tower at a height of 300 metres. It was miraculous, it was mad! Our dreams then could become reality, however daring they might be.
Considering that barcodes were added at the very last minute, as I was working on packaging version 0.2, who knows what I will actually be working on by then. These things seem to take on a life of their own which is, honestly, where most of the magic comes from.
#delmaps_02