this is aaronland


<burden> :of "proof" .

In XML, the burden is on the person with the query to figure out how the elements and attributes in one XML file relate to the elements and attributes in another. Glue-code has to be programmed to mesh the data. With RDF, the burden is on the people with the data to ensure that their identifiers for things overlap with other data sources. The difficulty in RDF is more of a design decision, and design decisions are tough too.

—Joshua Tauberer, GovTrack.us, Public Data, and the Semantic Web

All the News That's Fit to Describe

Or : How to Read the New York Times in 5 Minutes or Less

The daily dumps plotting the relationships, and geographies, of New York Times articles have returned. Version 2.0 does not contain any scrumjax but consolidates a number of different views that used to live at different URLs into one place.

None of this is rocket science and, frankly, I'm stunned that the Times hasn't already done something like this themselves. I wouldn't suggest that they replace their homepage but I find it incredibly useful to see what's going on and get a feel for the pulse of The Man.

The dumps are rendered, primarily, as RDF for this simple reason : (rdf * rdf) = rdf. That said, the RDF is also transformed, and presented on the web, as XHTML with microformats inside. Translation : I think you're so very wrong but that doesn't mean I don't love you anyway.
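To make the (rdf * rdf) = rdf bit concrete, here is a minimal sketch that treats a graph as nothing more than a set of (subject, predicate, object) triples. It is not the code that generates the dumps, and every URI and value in it is invented for illustration; a real RDF library does considerably more. The point is only that merging two graphs gives you another graph, provided the identifiers overlap.

```python
# Model an RDF graph as a plain set of (subject, predicate, object) triples.
# Every URI and literal below is made up for illustration.

def merge_graphs(*graphs):
    """Return the union of any number of triple sets; the result is itself a graph."""
    merged = set()
    for graph in graphs:
        merged |= set(graph)
    return merged

def describe(graph, subject):
    """List the (predicate, object) pairs known about one subject, sorted."""
    return sorted((p, o) for s, p, o in graph if s == subject)

ARTICLE = "http://example.com/articles/1"  # hypothetical identifier

# One graph records how articles relate to each other...
relations = {
    (ARTICLE, "dc:title", "An Example Headline"),
    (ARTICLE, "dc:relation", "http://example.com/articles/2"),
}

# ...and another records where they take place.
places = {
    (ARTICLE, "dc:title", "An Example Headline"),
    (ARTICLE, "dc:coverage", "Brooklyn"),
}

# Because both graphs use the same identifier for the article, no glue
# code is needed: the union is simply a bigger graph about the same thing.
combined = merge_graphs(relations, places)
for predicate, obj in describe(combined, ARTICLE):
    print(predicate, obj)
```

The duplicate title triple collapses on its own, which is the whole trick: the work was done up front by agreeing on the identifier.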

Daily indexes are archived in a simple YYYY/MM/DD series of subdirectories. Older versions currently located at nytimes/knows/related/ will stick around, or be tickled with mod_rewrite-fu. The other pages, notably the Google knows page, will probably be deleted because they are kind of stupid and don't really serve any purpose other than to fill up disk space. The best part about the Google pages, for instance, was having them show up as the first or second query result for the corresponding phrase.
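For anyone scripting against the archive, the YYYY/MM/DD layout can be sketched like this. The root directory name is a placeholder of my own choosing, not the actual path on the server.

```python
import datetime
import posixpath

def daily_index_path(day, root="archive"):
    """Build the YYYY/MM/DD subdirectory for a given date.

    `root` is a hypothetical base directory, used here only as a stand-in.
    """
    return posixpath.join(root, day.strftime("%Y/%m/%d"))

# Zero-padded month and day keep the directories sorting in date order.
print(daily_index_path(datetime.date(2006, 1, 5)))
```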

There are a couple of other ideas floating around, time and resources permitting. In the meantime, if someone wants to deal with writing an XSLT stylesheet to generate JSON from either the RDF or the XHTML, before I do, that would be grand. Personally, I think that the time would be better spent writing and lobbying browser developers for a SafeXMLHttpResponse JavaScript method, but that's a discussion best saved for another day and working code always wins.

Finally, I hope that there is someone at the Times, and other news organizations (insert obligatory weblogging as journalism meme here), who understands what they've got and its value both to themselves and the intarweb in general.

Upcoming : A long and twisty rant about XPath in Python

pyupcoming is a simple Python interface to the Upcoming.org REST API.

It does not auto-create methods for the API or try to render the data returned by Upcoming into any kind of Pythonic model. It does allow you to query the results of an API call using XPath. Sort of.

Sort of, in the sense that you can query stuff using XPath but you'll get back an ElementTree object rather than a proper XML-ish object with its own DOM functionality.
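Here is roughly what that looks like using nothing but the ElementTree module that ships with Python. This is not pyupcoming's own interface, and the response below is a made-up stand-in rather than real Upcoming.org output; it only shows the limited XPath that ElementTree understands and the kind of object you get back.

```python
import xml.etree.ElementTree as ET

# A hypothetical API response; element and attribute names are assumptions.
SAMPLE = """<rsp stat="ok">
  <event id="1" name="Example Show" venue_id="10" />
  <event id="2" name="Another Show" venue_id="11" />
  <event id="3" name="Late Show" venue_id="10" />
</rsp>"""

def find_events(xml_text, venue_id):
    """Return the <event> elements for one venue using an XPath-ish query."""
    root = ET.fromstring(xml_text)
    # ElementTree supports a subset of XPath, including attribute predicates.
    return root.findall(".//event[@venue_id='%s']" % venue_id)

events = find_events(SAMPLE, "10")
for event in events:
    # Each result is an Element, not a DOM node: attributes come from .get()
    # and there is no parentNode, ownerDocument and friends.
    print(event.get("id"), event.get("name"))
```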

I could have used libxml but that introduces a whole other world of dependencies. I wanted a library that could easily run on a variety of platforms with Python support. I started out with another more Pythonic interface to the Upcoming API and then spent most of a morning trying to shoehorn in features that I needed before getting lost in a twisty maze of __getattr__ functions and giving up.

So, this is the 80. Patches for the remaining 20 are welcome.

Subject: lightbox.js “pass through” URL patch