this is aaronland

Things I Have Written Elsewhere #1305615600

Towers of History


One of the most important lessons from five years working at one of those so-called rockstar startups and inside the belly of a large corporation is that, with the exception of the exceptions, there are no conspiracies; Myles and Hammond being the exception. Generally there are just too many eyes and not enough time to allow anything that you could assign capital-M motive to to take hold.

Over the weekend, Tom Armitage discovered that the Twitter account he had created for London's Tower Bridge had disappeared. Specifically, it had been reassigned to another user and all of Tom's messages were deleted. Despite having said some snarky things online I don't want to point fingers at either Twitter or the people at the Tower Bridge Exhibition Corp. here. There's a big difference between cluelessness and malice. I don't think the latter was at play, in this case, and I'm sure there's already plenty of hand-wringing and feeling out the shape of the former going on without any help from me.

A lot has already been written about how the Tower Bridge account changed people's relationship with Twitter (and their idea of what it means to have a relationship with an inanimate object). I will only add that it's an open question whether Stamen would have thought to suggest to the NFL that they create a Twitter account for the ball (and the goal posts and the yard markers and pretty much everything in between) during the Super Bowl — no, really — if Tom hadn't created the Tower Bridge account. Maybe we would have, but Tom did and we sure as Hell cited his work as both example and inspiration.

Stef Lewandowski wrote an excellent blog post about the whole affair in which he says: History doesn't stop. Which is a nice and succinct way of saying something similar to what I wrote at the end of the Buckets and Vessels paper, last year:

At the same time, commercial enterprises like Flickr's parent company Yahoo!, suddenly find themselves in the unusual position of being asked to be time keepers. Whether or not they asked to be, entire communities are now assuming that those companies will not only preserve and protect the works they’ve entrusted or the comments and other metadata they’ve contributed, but also foster their growth and provide tools for connecting the threads.

These are not mandates that most businesses take up willingly, but many now find themselves being forced to embrace them because to do otherwise would be to invite a betrayal of the trust of their users, from which they might never recover. On the other hand, this is exactly what the cultural heritage sector does, does well, and has spent a lot of time thinking about.

Buckets and Vessels, Museums and the Web 2010

The salt in the wound, here, being Twitter's ongoing relationship with the Library of Congress to archive every public Twitter message since 2006. On measure I think this is great but the thing I've always found puzzling is the language that Twitter used to use to describe the service: That the beauty of Twitter messages was their ephemerality. If it was never explicitly stated people were also never discouraged from believing that any given tweet had a very short shelf-life and would eventually fade from the public record the way all memory has in the past. Until one day Twitter turned around and casually mentioned that they had been keeping everything in a great big shoebox out back.

Product features are sometimes little more than engineering decisions, or constraints, wrapped up in a motivational story and God knows Flickr's no better or worse than anyone else in this regard. This strategy appears to be in full effect with the latest crop of photo-sharing applications who, I think, are confusing their perfectly reasonable desire not to deal with the drudgery of storing lots of files with the idea that transience is some kind of world view. And that's what bugs me.

The value of the web is in its history. The value of the web is that it grows over time and that it spiders out making connections, just as often doubling back on itself to find previously unseen patterns and connections. It is not a linear progression through time and space always discarding the near past. Or if it is then I'm sorry for wasting everyone's time because that sounds about as exciting, and about as valuable, as any given season of canned television programming.

Last week I had the opportunity to attend the first Linked Open Data in Libraries Archives and Museums (LOD-LAM) summit in San Francisco. If there was one common refrain to the event it was this: Unique identifiers and stable URLs. Because, honestly, without those the entire project is a farce. This was (remains) the fatal flaw in the specification for the Semantic Web as originally proposed: Identifiers are defined as URIs (URLs) that don't need to resolve to anything on the web or the larger Internet upon inspection. They are, in fact, nothing more than big long strings without either a de jure or de facto way to resolve disputed identifiers (let alone understand what they represent). Which is kind of fantastically stupid when you stop and think about it for a minute or two.

Implementing unique and truly permanent identifiers across the totality of the Internet will always be a mine field of edge-cases and conflicting expectations but that is hardly a reason for individual institutions (and commercial enterprises) not to commit to doing the right thing in their own houses. Which is one reason that Flickr has, historically, always been so fierce about two things:

The first was an early bone of contention when Flickr was acquired by Yahoo! which maintains that the ability to recycle usernames is a feature. For a company with as many users as Yahoo! faced with an ever dwindling number of pretty usernames that's not an unreasonable position. But by choosing to tie every Flickr account to a Yahoo! account it ran straight up against the idea that what was just as important, to Flickr, as any notion of user activity (I think all the cool kids call this engagement now, but whatever...) were the photographs themselves and their place in this crazy magic historybox we were all building together.

The second is just as important even though it's proven to be a constant source of frustration over the years. When you choose your Flickr URL — for example the straup in — that name can never (ever) be re-used by anyone else. It means that the breadcrumbs we all share to find your photos on Flickr (and the larger Internet) will never change and that means that they become a reliable building block for other services to hold hands with. It is a common ground on which we can keep imagining what the Internet makes possible.

Is it inconvenient? Sometimes, yes. The alternative is far worse since it undermines any notion that Flickr is a safe place to put your photos. Or put another way: It can only work if you're comfortable breaking the Internet. One of the most appalling features of Google's new Places database is the fact that the identifiers assigned to any given location are temporal. I'm not really sure why they bothered with them at all because the only sure way to retrieve a place identifier is to query the Google geocoder service. This means that there are no shared identifiers. Instead, there is only Google's gated funnel for naming things (and rewriting history).

Should it be possible for a user to choose another, unused, Flickr URL and have the first one point to the newer one? Sure. I don't remember why we didn't do that during my time. Probably because it wasn't something that most people thought was necessary and therefore not worth the time given everything else we were trying to do. Maybe it is now.
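To make the rule concrete, here is a minimal sketch of a name registry in which a claimed URL name is never handed to anyone else, while the same account may pick a newer name and have the old one keep pointing at it. This is an illustration only, not a description of how Flickr works: the class name `NameRegistry`, its methods, and the account and name strings are all hypothetical.

```python
class NameRegistry:
    """Hypothetical registry: names are claimed once and never reassigned."""

    def __init__(self):
        self._owner = {}    # name -> account id; entries are never deleted
        self._current = {}  # account id -> the name the account uses today

    def claim(self, account_id, name):
        """Claim a name for an account, refusing names owned by anyone else."""
        owner = self._owner.get(name)
        if owner is not None and owner != account_id:
            raise ValueError(f"{name!r} is permanently assigned to another account")
        self._owner[name] = account_id
        self._current[account_id] = name

    def resolve(self, name):
        """Return (account id, current name); old names redirect to the new one."""
        account_id = self._owner[name]
        return account_id, self._current[account_id]


registry = NameRegistry()
registry.claim("account-1", "first-name")    # made-up account and names
registry.claim("account-1", "second-name")   # same account picks a newer name
print(registry.resolve("first-name"))        # the old breadcrumb still resolves
```

The only design decision that matters is that nothing is ever removed from the name-to-owner table, so any link that once resolved keeps resolving to the same account.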

Which is why seeing what happened to Tom's Tower Bridge account was so disheartening.


built works / equals yes


Sometime back in the fall of 2010 when Christine Kuan asked whether I would be interested in being on the advisory board for something called the Built Works Registry (BWR) she also asked if I would help to write a short paper describing why an authoritative catalog of buildings, open to scholars and the public alike, was a worthwhile project. I said yes to both opportunities and what was supposed to be a few paragraphs in an otherwise short introductory piece mushroomed into a six thousand word essay whose first draft (and there were many more to follow) I managed to finish just before the advisory board's first meeting, last January.

I don't honestly remember if I'd already been poking the data before the board meeting or if it was the presentation I was asked to do that prompted me to count the number of buildings in OpenStreetMap. There are, it turns out, over 26 million ways tagged building=yes, where "ways" means 26 million records with complete geographic footprints rather than just points on a map. Which is an important consideration for any project like the BWR. It proves that there is a genuine interest and a shared motivation for something like a registry of buildings even if it's not wrapped up in the language of architecture or scholarship. You know, like this:

What's especially important to understand, though, is that what's really changed in the near-past is the fact that the Internets make it possible for people to self-organize around a common interest, effectively collapsing the costs usually associated with production and distribution, and if communities of amateurs don't feel like they have any avenues to participate in established projects then they can, and will, just do it themselves. Once that happens, once a community project is born in deliberate opposition to the status quo, it stands a real chance of rolling over everything that stands in its path. Which isn't always a good thing. Which is to say: The opportunity we have today is to think about, to stumble around in the dark and find, ways to marry communities of authority and communities of suggestion.

I will save the technical bits of the building=yes site for another day except to say that it uses the same model as the woedb taking advantage of the new built-in spatial hotness in Solr 3.1 as well as the path hierarchy tokenizer for storing both tags and machine tags (or things that could play machine tags on TV). I spent the time to reverse geocode each one of the buildings and so now most of them have between one and four parent Where On Earth (WOE) IDs associated with them. The buildings themselves have two unique identifiers. The first is their OSM way and the second is a 64-bit number guaranteed to be greater than 2^32 so that they won't collide with any existing WOE IDs. Which has the pleasant side-effect of adding 26 million new records to WOE!
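As a rough illustration of the two ideas in that paragraph, here is a small sketch of (a) minting building identifiers that sit above the 32-bit range and (b) the list of prefixes a path hierarchy tokenizer produces for a slash-delimited path. It is not the code behind the site. The function names, the counter-based minting scheme, and the `namespace/predicate/value` encoding of a machine tag are all assumptions made for the example.

```python
from typing import List

# Assumption for this sketch: every existing WOE ID fits below 2^32.
WOE_ID_CEILING = 2 ** 32


def building_id(sequence_number: int) -> int:
    """Map a local counter (0, 1, 2, ...) to an ID strictly greater than 2^32."""
    if sequence_number < 0:
        raise ValueError("sequence_number must be non-negative")
    new_id = WOE_ID_CEILING + 1 + sequence_number
    if new_id >= 2 ** 63:
        raise OverflowError("ID no longer fits in a signed 64-bit integer")
    return new_id


def machine_tag_path(namespace: str, predicate: str, value: str) -> str:
    """Hypothetical encoding of a machine tag as a slash-delimited path."""
    return "/".join((namespace, predicate, value))


def path_tokens(path: str, delimiter: str = "/") -> List[str]:
    """Emit every prefix of a delimited path, shortest first."""
    parts = path.split(delimiter)
    return [delimiter.join(parts[:i]) for i in range(1, len(parts) + 1)]


print(building_id(0))
print(path_tokens(machine_tag_path("osm", "building", "yes")))
```

Indexing every prefix is what lets a single field answer queries at any level of the hierarchy: everything in a namespace, everything with a given namespace and predicate, or one exact tag.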

I'm going to let the whole thing bake for another week or two and collect comments and ideas for future directions but the plan is to release the entire thing as a public dataset that can be downloaded in toto once I figure out the best way to actually do that.

And that six thousand word paper? It's finally been published! If some of it reads a lot like what I was talking about at Museums and the Web this year that's because it is. Attending the BWR board meeting and the whole process of writing the paper were both immensely helpful in articulating the argument and in shaping everything I said in the discussion about unfinished histories. The paper itself has been posted to the BWR blog (and is also available as a nicely formatted PDF file) but I've also included a copy of it, below:

(Get some coffee, Myles. Or maybe a drink. It's long.)

Imagining the Built Works Registry

By Aaron Straup Cope and Christine Kuan (PDF version)

If you want to build a ship, don't drum up the men to gather wood, divide the work and give orders. Instead, teach them to yearn for the vast and endless sea.

Antoine de Saint-Exupery, Citadelle 1

Why is a registry an interesting project for built works? Put simply: Because there is no escaping them. We are, for the most part, surrounded by buildings the entirety of our lives. They are the setting for and the cast of secondary characters— the soundtrack music— that inhabit our histories. Buildings shape our relationship with cities to the point of becoming those features we most cherish (or at least associate) with a place. The built landscape becomes the defining characteristic of a city, cultivating the emotional and cultural ties— connecting people and geography in a way that has historically been reserved for elements in nature like rivers and mountains.

The aim of the built works registry should be to create and maintain reliable or authoritative records for every built work. It should also become comprehensive enough to accommodate the widest range of global “architectures” in order to serve the largest possible audience of users. Such a registry of unique records would enable related digital assets to be easily and accurately linked together to form a large, interconnected body of knowledge about architecture and the built environment in all its shapes and practices.

This desire to create and have access to authoritative records for names, geographic terms, or individual works is nothing new to libraries or research institutions. The Getty Research Institute (GRI) has been developing a number of controlled authority files for cultural heritage information for over four decades 2. More recently, the GRI announced a new endeavor, Cultural Objects Name Authority (CONA) that will provide authority files for moveable works (paintings, sculpture, etc.) and built works.

In fall 2010, the Institute of Museum and Library Services (IMLS) awarded the Avery Architectural and Fine Arts Library at Columbia University, ARTstor, and the Getty Research Institute a three-year grant of nearly $1 million to develop and build infrastructure and tools and to begin seeding the BWR with built work records.3 The editorial policies will be determined by the BWR project administrative team in consultation with the BWR Advisory Board, which comprises international scholars and experts from museums, libraries, and organizations in eight countries (England, Italy, Japan, Germany, India, Italy, Taiwan, and the United States). Other area experts will be consulted as the project progresses. 4

As online communities of interest grow at an exponential rate 5, the collection of and tending to meaning (not to mention “facts”) has expanded beyond academic and scholarly institutions. Powerful community-driven resources are democratizing the ways and means that information is gathered, vetted, and ultimately distributed. Given these trends how do we preserve a culture of scholarly expertise that holds reliable descriptive metadata to be foundational in the pursuit of knowledge while also adapting it to flourish in an increasingly networked world that sees participation from the widest number of actors as a core principle and benefit to the public good?

These questions are increasingly pertinent in a digital age where the seemingly polar (and polarizing) concepts of “authority” and “community” are critical issues for the sharing of information. Drawing upon our experiences from both for-profit, public-access web initiatives and nonprofit, scholarly endeavors at Stamen Design, Flickr, ARTstor, and Grove Art Online, and through our participation on the Advisory Board of the IMLS-funded Built Works Registry project, this paper explores, at a very high level, an approach to the built works registry that promotes collaborations among expert communities and the broader public. 6

The potential of the Built Works Registry

The built works registry (BWR) should provide the means and the tools for people to talk about a physical environment fashioned by our past and our present needs and imagination. Beyond the needs of scholars and experts it should act as a conversational shortcut across space and time: This thing occupied this physical space, for this long. It could be a single reference point— an authority record— for the history of an artifact but also the rough surface that allows it to loosely join 7 an activity to the idea and the memory of a place. Specifically, the BWR might see itself not simply as a means of broadcasting information but as a homing beacon that can be used to attract, to ingest, and to dress itself anew in the understanding of others.

A shared vocabulary of referents (that's fancy-talk for unique identifiers) is the plumbing with which we might create and nurture a kind of architectural geometry 8 for every building and its many histories. The value of the BWR is in providing shared identifiers that can act as the prism between the many ideas and the disagreements that define our understanding of a place.

Unique identifiers, though, are not a panacea. First of all, you need to decide what is being identified. Consider a building that has existed for 60 years, during which time it's been renamed (as in, those times when the owners take trouble to change the text on the plaques or the masonry itself) three times. Should an identifier be created for each derivation of the physical property or should the intent of the building itself— the commitment of time and resources to alter space— be the thing assigned a unique and immutable designation? Even if you decide a priori that only the results of an action, and not the action itself, be assigned a designation how does a property whose physical boundaries evolve over time affect this procedure? What do you do with a shed built in 12th century Italy that grows rooms and entire wings over a span of 700 years to become a 20th century family home? Or built works such as Palace of Versailles which include multiple buildings, gardens, objects, and rooms (e.g. Hall of Mirrors) within those buildings and landscapes, structures since destroyed or lost, and are often also conflated with their geographic location (the city of Versailles)? 9

This is not a problem specific to a registry of built works. Projects like the Getty CONA and the Internet Archive's Open Library 10 face similar questions trying to define a “work” and provide a structured way of describing and organizing complex objects. 11 It is similarly difficult to untangle the meaning and differences between a written work and its many editions in order to create "a webpage for every book". 12 But it is important to start somewhere and, because the nature of the problem ranges from the twisty to the existential, the BWR should start small.

Imagine what the BWR might be if its focus was fiercely narrowed so that it was defined as:

  • Only things you might see from a helicopter.
  • Only buildings: any "container" whose motivation and construction is deliberate.
  • Only the simplest of places contained by buildings: Those that might require an "invitation" and those that don’t; for example, a person's home or a commercial space.

The goal here is to provide just enough guidance to structure the conversation in such a way that it will encourage the participation of non-experts who may lack the time or the inclination to pursue a formal investigation. Beyond the specific needs of academics, librarians, and catalogers, the BWR should aim to act as a "simple tool for self-organization" 13; one that enables people to build a narrative around built works beyond, or in some cases in advance of, the demands of scholarship: A tool that allows people to send postcards from the present to the future. 14

Why a BWR?

It’s really important that social tools of any kind start with the personal. We are not merely social types, we are selfishly social. You can offer me all the network effects and benefits of scale you like, but unless your service is immediately useful for me alone (it has a good one-player mode, say) I’m not going to get it. Great examples of this are Delicious and—I love that I can share and read other peoples’ bookmarks, or check out other peoples’ tastes and interests, but the core benefits, the reason I signed up, are my own online bookmarks and music discovery.

So the first direction is: design for the selfishly social. And the follow-up to this is to design the absolute minimum feature set. ... The flipside to this is casual strategies. By creating a system that is as bare-bones as possible, we allow users to evolve their own strategies to do the things that most interest them.

James Bridle, Selfish vs. Social 15

The simplest answer to the question Why a BWR? is: Persistent and reliable shared identifiers and the infrastructure to host human and machine readable endpoints that contain the full text and metadata about each building. If this is all that the BWR offered it would be a worthwhile project, by itself, but it also presents participating members the opportunity for reaching new and unexpected communities and uses. A short list of possibilities includes:

  • A bridge back (to the rest of us) for architecture. The language of architecture is burdened by its own history and has become foreign and remote to non-experts. The BWR should aim not to dumb down the discourse, but to enable wider and broader conversations between disparate groups that can be quilted back together through the use of unique identifiers. The BWR should be an avenue for participation in the history of a place and, by extension, a gateway drug back to the world of scholars and the language and history of experts.
  • A two-pass approach to adding data, or: Something is better than nothing. Encouraging non-experts to participate in describing and classifying built works can be an important contribution to the BWR by not only beginning to tackle the sheer volume of built works left to catalog but also establishing a framework to keep pace with all the works yet to be built! To be clear: This is not an exercise designed to replace domain-experts with the wisdom of the crowds. It is, instead, an attempt to find one of many small bridges to join communities of interest with a shared motivation too often separated by style and technique. If those entries compiled by amateurs and enthusiasts are clearly marked as such, they can continue to be shared publicly, lessen the traditional concerns around completeness and accuracy, and serve as indicators for records that might have otherwise been overlooked. 16
  • A vehicle for play. For example, some users on the photo-sharing website Flickr have created accounts for individual buildings so that they may be "people-tagged" in photos:
    • The Contemporary Jewish Museum, in San Francisco, a.k.a. The Liebskube 17
    • The TransAmerica Building, in San Francisco, a.k.a. The Pointy Building 18
    • Sutro Tower in San Francisco, a.k.a. The Space Claw 19
    • The Theme Building, at Los Angeles International Airport, a.k.a. The Hand of the Future 20

If nothing else, this becomes a useful tool to search for photographs of built works in community-based websites instead of having to rely on casual indexing techniques like tags or captions. Encouraging informal and playful explorations of built works 21 ensures that the BWR can exist as a living and textured document in its own right and not simply be a mirror of things past.

The success of the BWR will also be measured in how, in real and practical terms, it serves the educational and scholarly community. Of the approximately 4,000+ academic libraries worldwide, and 17,000+ museums in the United States alone, millions of images, publications, multimedia files and archival documents pertain to built works. It is in the care and tending of the metadata collected from each of these works that they become even more discoverable and useful to both the educational and research communities and the general public. Enabling collections managers, registrars, librarians, visual resource curators, scholars, curators, and others to draw upon an open-access registry of built works is precisely the opportunity that the Internet makes possible today.

The challenges of authority

To educational and scholarly communities, authority will be the most critical factor in determining the ultimate usefulness, or even usability, of the BWR. While numerous online websites provide art historical information, encyclopedic resources such as Grove Art Online (the digital version of The Dictionary of Art, 1996, 34 vols.) are indispensable for teaching and research because the articles were written by some 6,800 subject specialists. The reliability of information is central to determining whether a record can be used for teaching, research, cataloging, and publication (e.g. all Grove Art creator names have been incorporated in the Getty Union List of Artist Names). The entire basis of scholarship, connoisseurship, and academic excellence hinges on getting as much reliable data as possible (i.e. it would be hugely problematic if a student looking for information on the Dome of the Rock found the date of completion to be 1691 CE rather than 691 CE). When one goes to catalog an image of the Yas Hotel & Marina in the United Arab Emirates by Asymptote Architecture, it would be just as troubling to a scholar or teacher if the built work was attributed to the Renzo Piano Building Workshop in Osaka, Japan due to drawing upon a BWR record that was full of errors. In order to effectively teach, research, and disseminate information for education and scholarship, users need to be able to trust the accuracy of the information.

At the same time, we know that relying exclusively upon the scholarly community to create built works records would be unsustainable given the millions of built works that exist or are coming into existence every day around the world 22. One of the greatest limiting factors for the creation of authoritative data records has been the absence of a networked environment that allows an unlimited number of contributors to participate in building records. The Library of Congress recommends that the community identify ways to promote wider participation in the distribution of responsibility for creating, enhancing, and maintaining authority data. 23 Without encyclopedic coverage, such a registry would be of limited value for libraries, scholars, curators, enthusiasts, students, and the public.

However, a BWR that promotes open access for communities of amateurs, enthusiasts, and experts risks finding itself with multiple records all referring to the same thing. This is especially true of popular and widely known works or points of interest (for example the Eiffel Tower in Paris or the CCTV building in Beijing) and the efforts required to address this problem should not be underestimated. On the other hand it is a problem that a) already exists and b) would be contained by, and therefore addressable within, the context of the BWR. There is enough evidence in the free-form world of social media and “tagging” to suggest that large and disparate communities of users can and will settle on a consensus of practice for naming things.

The question of how to foster a community of many voices will be central to any BWR open to the general public; but the benefits, be they in increased participation, awareness or broadening the scope and depth of the registry, stand to outweigh any growing pains encountered along the way. Thanks in large part to the collapsing of costs of organizing disparate groups and sharing information on the Internet, what was once considered to be impossible has been made merely difficult.

What might be the real elephant in the room is the editorial work involved in cleaning up these thousands (or eventually, millions) of BWR records to make the BWR actually useful. Major endeavors to create authoritative records are generally sustained by tremendous resources. Ongoing and enormously important projects, such as the Getty vocabularies Union List of Artist Names (ULAN), Thesaurus of Geographic Names (TGN), Art and Architecture Thesaurus (AAT); The Virtual International Authority File (VIAF); and Library of Congress Name Authority File (LCNAF) are community-contributed, but are spear-headed and managed by institutions with considerable staff and funding resources.

Another challenge will be multiple records in multiple languages for built works. How should an authoritative, preferred record name for the Forbidden City or Taj Mahal be handled in the BWR—in Chinese, Hindi, Persian, Urdu, or English? If work records are designed to support multiple languages, is the official record complete or useful if there is no entry for a given language? How does the BWR encourage international participation through multi-lingual tools, but still manage to create a reliable work record that is accessible, discoverable, and useful to a broad range of international users (experts and non-experts alike)? How many languages can the site realistically display or manage? Would we want to include all languages in their original form, such as enabling Maya glyphs to be recorded for Temples in Tikal in BWR records? There are also complexities with transliteration where there are multiple standards as in Arabic or Chinese (pinyin and Wade-Giles).

The VIAF project at OCLC allows national or regional variations in authorized form to co-exist and it supports variations in preferred language, script, and spelling. However, the VIAF project covers personal names only, and there are additional complexities, perhaps, when it comes to work records. The cultural, national, regional, religious, and political sensitivities associated with creating a registry of built work records may have prickly complications. For example, The Great Mosque in Cordoba may be considered by some to be a Christian church, or local names may be assigned to buildings which are in conflict with the various government or “official” titles of the buildings. Disputes over the names, functions, and even locations of buildings may have deep implications within different communities. All neighborhoods are debated and as we know from other community-driven projects such as Wikipedia, publishing information on the Web, whether that data is accurate or not, can incite heated battles over issues of territory, ownership, and meaning, sometimes requiring editorial changes, the “locking” of a record, or the deletion of information by the managers of a site itself.

The challenges of community

While it may not be possible to know, in advance, all the challenges that a BWR designed to support community involvement will yield, a short list of things that deserve close attention includes:

  • Understanding and accepting that public communities can and will build their own BWR in the absence of something they can relate to.
  • Imagining and building tools for both scholars and non-experts alike.
  • Considering what happens when buildings themselves want to participate in the BWR.

This list is neither comprehensive nor preordained but depending on how external communities are approached the first point has the potential to be the largest ongoing challenge to any authoritative registry of built works. The lesson of projects like Wikipedia 24 and OpenStreetMap (OSM) 25 is that the Internet has fostered the tools for distributing, coordinating, and even vetting the work of large, heterogeneous communities of amateurs armed only with individual slices of spare time and a common goal.

Not all projects will enjoy the success of the examples cited here but what those successes demonstrate is that once the need for a particular resource has been identified, whether it is a free and open encyclopedia or a BWR, it does not take long for communities of interest to mass around a problem and, if the past is any guide, uproot any and all closed projects that stand in their way.

These efforts are also almost always messy and chaotic in a way that inspires an understandable measure of doubt about their longevity or scalability. OSM is both a database of geographic vector data (of nodes and ways) as well as the metadata associated with each geographic feature. The metadata is organized using simple key/value pairs with no content restrictions which at first glance might seem doomed to failure. 26
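For readers who have not seen the OSM data model, the sketch below shows roughly what "nodes and ways" plus unrestricted key/value tags look like as a data structure, and how convention (everyone agreeing to write building=yes) is what makes the records countable. It is a simplified illustration, not OSM's actual schema or API: the `Way` class and its field names are assumptions, and the IDs and tags in the sample data are made up.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class Way:
    """Simplified stand-in for an OSM way: ordered node references plus tags."""
    way_id: int
    node_refs: List[int]
    tags: Dict[str, str] = field(default_factory=dict)  # free-form key/value pairs


def count_tag(ways: Iterable[Way], key: str, value: str) -> int:
    """Count the ways carrying exactly this key/value pair."""
    return sum(1 for way in ways if way.tags.get(key) == value)


def tag_usage(ways: Iterable[Way]) -> Counter:
    """Tally every key/value pair in use, which is how conventions become visible."""
    return Counter((k, v) for way in ways for k, v in way.tags.items())


# Made-up sample records; closed ways repeat their first node at the end.
sample = [
    Way(1, [10, 11, 12, 10], {"building": "yes"}),
    Way(2, [20, 21, 22, 20], {"building": "yes", "name": "Example Hall"}),
    Way(3, [30, 31], {"highway": "primary"}),
]

print(count_tag(sample, "building", "yes"))
print(tag_usage(sample).most_common(1))
```

Nothing in the structure stops a contributor from inventing a new key or misusing an existing one, which is exactly the risk the paragraph above describes.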

And yet, OSM has succeeded in spite (or perhaps because) of an ad-hoc structure that emphasizes convention, open debate and the shared responsibility of many eyes watching over the project. Six years ago it was little more than a handful of GPS traces in and around London but today it produces maps whose detail and quality rival those of the UK Ordinance Survey and, in the case of the maps made following the 2010 Earthquake in Haiti, are used by the United Nations and World Bank. 27

Proving that every rule needs breaking occasionally, Blackadder famously threw all standards out of the window to make a map of CERN by tagging the particle accelerator rings as highway=trunk and highway=primary (with tunnel=yes) even though they aren't major roads of any kind - he simply liked the colours and knew that they would show up. Don't follow his example!

OpenStreetMap Wiki, Tagging for the renderer 28
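
As a rough illustration of the data model described above, the sketch below represents nodes, ways and their free-form key/value tags as plain Python dictionaries. It is a simplified stand-in rather than OSM's actual storage format or API: the ids, coordinates and the helper names tag_usage and find_by_tag are all assumptions made for the example, and the tags reuse the highway and tunnel keys mentioned in the quote.

```python
from collections import Counter

# Hypothetical, simplified OSM-style elements. Nodes carry coordinates,
# ways reference nodes by id, and any element may carry free-form tags.
# All ids and coordinates are invented for illustration.
nodes = {
    1: {"lat": 51.5055, "lon": -0.0754, "tags": {}},
    2: {"lat": 51.5058, "lon": -0.0750, "tags": {}},
    3: {"lat": 46.2330, "lon": 6.0550, "tags": {}},
    4: {"lat": 46.2340, "lon": 6.0560, "tags": {}},
}

ways = [
    {"id": 10, "nodes": [1, 2], "tags": {"highway": "primary"}},
    {"id": 11, "nodes": [3, 4], "tags": {"highway": "trunk", "tunnel": "yes"}},
    {"id": 12, "nodes": [1, 2], "tags": {"name": "An example building"}},
]


def tag_usage(elements):
    """Count how often each (key, value) pair appears across elements."""
    counts = Counter()
    for element in elements:
        for key, value in element["tags"].items():
            counts[(key, value)] += 1
    return counts


def find_by_tag(elements, key, value=None):
    """Return elements carrying `key`, optionally restricted to one value."""
    return [
        element
        for element in elements
        if key in element["tags"]
        and (value is None or element["tags"][key] == value)
    ]


if __name__ == "__main__":
    for (key, value), count in tag_usage(ways).most_common():
        print(f"{key}={value}: {count}")
    print([way["id"] for way in find_by_tag(ways, "highway")])
```

Nothing in the structure prevents a contributor from inventing a new key or misusing an existing one. Counting what people actually use, as tag_usage does, is one way a convention can become visible without a schema enforcing it.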

A second challenge will be the design of interfaces for the tools used to create and maintain the records themselves. The BWR has the opportunity to champion the idea of a "two-pass" interface. Specifically: Simplified interfaces designed for non-experts to complement those formal and structured systems for capturing metadata.

Here again, the experience of the Internet Archive's Open Library website is instructive. When the project was first launched the form for adding a new book to the system contained approximately 36 different fields. While each field was potentially important and relevant for the purposes of archiving, this rigorous approach to data collection simply made the page too daunting for most people to even try adding a book. Besides the amount of manual labor required in creating a new record, there was the added problem that many of the terms and labels used were unknown to people not versed in the vernacular of library science. In 2010, these input forms were redesigned 29 and the minimum required information for entering a new book was reduced to just four fields! (It is still possible to catalog with the other 32 fields, but those fields are not the first thing that users of the site are confronted with.)

Another example is the New York Public Library which has been scanning their 400,000 maps and atlases using an open source tool called the Map Warper. 30 The Map Warper is used to register control points between a scanned map and a reference map and then adjust and warp the former to match the geographic and pixel coordinates of the latter. 31 The current user interface of the Map Warper remains very much a tool built by and for expert users and this has limited its wider use and adoption.
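
To make the idea of control points concrete, here is a minimal sketch that derives a mapping from a scanned map's pixel coordinates to longitude and latitude from exactly three control points. It is not how the Map Warper itself is implemented; a plain affine transform solved with Cramer's rule stands in for whatever warping method the tool uses, and the function names and coordinates are assumptions made for the example.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def solve3(m, rhs):
    """Solve the 3x3 linear system m * x = rhs using Cramer's rule."""
    d = det3(m)
    if abs(d) < 1e-12:
        raise ValueError("control points are collinear; pick three others")
    solution = []
    for col in range(3):
        # Replace one column of m with the right-hand side.
        replaced = [
            row[:col] + [rhs[r]] + row[col + 1:] for r, row in enumerate(m)
        ]
        solution.append(det3(replaced) / d)
    return solution


def fit_affine(control_points):
    """Build a pixel -> (lon, lat) function from three control points.

    control_points is a list of three ((px, py), (lon, lat)) pairs.
    """
    if len(control_points) != 3:
        raise ValueError("this sketch needs exactly three control points")
    m = [[float(px), float(py), 1.0] for (px, py), _ in control_points]
    lon_c = solve3(m, [lon for _, (lon, _) in control_points])
    lat_c = solve3(m, [lat for _, (_, lat) in control_points])

    def to_geo(px, py):
        lon = lon_c[0] * px + lon_c[1] * py + lon_c[2]
        lat = lat_c[0] * px + lat_c[1] * py + lat_c[2]
        return lon, lat

    return to_geo


if __name__ == "__main__":
    # Invented control points: three corners of a 1000 x 1000 pixel scan.
    to_geo = fit_affine([
        ((0, 0), (-74.02, 40.80)),
        ((1000, 0), (-73.92, 40.80)),
        ((0, 1000), (-74.02, 40.70)),
    ])
    print(to_geo(500, 500))
```

Three points determine an affine transform exactly. A scan of an old, hand-drawn map will usually need more points and a more flexible fit, which is part of why the expert interface is as involved as it is.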

A hypothetical two-pass interface for the Map Warper might instead include the ability for a grade school student to browse a collection of maps with no existing metadata and then "place" them using a simple drag and drop interface roughly in the area where they are located. In the case of the Map Warper's atlas feature— where users align (or connect) the roads from individual maps to create a single unified canvas— the entire exercise could be re-imagined as a game where students tear the pages out of a book in order to create a wall-sized poster.

These sorts of alternative task-oriented interfaces are valuable for three reasons:

  • In a world where there was previously no data associated with a map, now there is enough data to make an approximate geographic search possible. If a record is flagged as being not fully vetted, the advantages of simply being able to discover a map at all outweigh the disadvantage that the location data associated with it may not be 100% accurate.
  • These rough cuts can serve both as a guide for future work and as a kind of parallel reading of the different interests between scholars and the general public.
  • It yields a tangible sense of participation in the project: Imagine if there was also a print button that students could press to generate a physical copy of their work to take home to their parents.
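
The first of these points can be sketched in a few lines. The example below assumes a hypothetical MapRecord with a rough location and a vetted flag, and a search function that filters by bounding box and lists vetted records first. None of these names come from the Map Warper or any existing system.

```python
from dataclasses import dataclass


@dataclass
class MapRecord:
    """A hypothetical map record placed roughly by a non-expert."""
    title: str
    lat: float
    lon: float
    vetted: bool = False  # True once a scholar has reviewed the placement


def search(records, south, west, north, east, include_unvetted=True):
    """Return records whose rough location falls inside a bounding box.

    Vetted records are listed first; unvetted ones stay discoverable
    unless the caller explicitly excludes them.
    """
    hits = [
        record
        for record in records
        if south <= record.lat <= north
        and west <= record.lon <= east
        and (record.vetted or include_unvetted)
    ]
    # sorted() is stable, so records keep their order within each group.
    return sorted(hits, key=lambda record: not record.vetted)


if __name__ == "__main__":
    # Invented records for illustration.
    records = [
        MapRecord("Roughly placed atlas page", 40.75, -73.98),
        MapRecord("Reviewed harbor chart", 40.70, -74.01, vetted=True),
        MapRecord("Map of somewhere else", 51.50, -0.08),
    ]
    for record in search(records, 40.5, -74.3, 41.0, -73.7):
        status = "vetted" if record.vetted else "not fully vetted"
        print(f"{record.title} ({status})")
```

Keeping the flag on the record, rather than holding unvetted records back entirely, is the design choice the argument above depends on: an approximate answer is returned and labeled as such.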

Rather than seeing the potential for messiness as a problem, the real challenge seems to be how best to design an environment suitable for encouraging controlled messes that can be used to fuel future work. This is also relatively uncharted territory, so it is a facet of the BWR that will require inventiveness and constant iteration.

Another project that combines a community participation model with scholarly vetting is the relatively new SAHARA (Society of Architectural Historians Architecture Resources Archive), a digital image archive project headed by the Society of Architectural Historians (SAH) and built in collaboration with ARTstor. Launched in 2009, and funded by a grant from The Andrew W. Mellon Foundation, SAHARA allows SAH members (its 3,500 members include architectural historians, architects, preservationists, students, professionals in allied fields, and the interested public) to catalog and upload their own digital photographs and panoramic images to a shared online archive as well as to download images from the archive for teaching and research. SAHARA now offers over 25,000 images that were contributed by MIT, Brown University, University of Virginia, the Colonial Williamsburg Foundation, University of Illinois at Urbana-Champaign, and by independent photographers and historians.

The SAHARA cataloging tools offer 40 metadata fields with 11 required fields, and there is an editorial board and review process for warranting and “promoting” images and their associated records into the SAHARA Editor’s Choice collection. As a resource built by and for scholars, SAHARA is an example of the rewards of merging institutional collections with scholar collections through a peer-review process that is, in effect, defining a new approach to qualifying another kind of academic publication. The SAHARA Editor’s Choice collection is also made available in the ARTstor Digital Library for teaching and research at a broad network of more than 1,300 educational institutions and museums in 45 countries. 32 A BWR model that combines the SAHARA model with more open models like OSM could prove incredibly fruitful.

Finally, while it may still seem like science fiction, we have begun to live in a world where more and more sensors and other computing technologies are being embedded in the built environment itself. It is not difficult to imagine a time when a building may also wish to participate in a built works registry. Services like Pachube 33, which is a centralized brokerage for environmental sensors and other collections of “time-series” data, are already morphing into a mirror registry of built works. They do not record the stories we are used to telling about the monuments we construct. The minute by minute tracking of a building's power consumption or temperature readings or the number of passengers in an elevator is not any kind of narrative structure we are accustomed to, but if you look at them a little sideways it is not hard to re-imagine them as an entirely new kind of oral history about a place; the raw source material for research yet to be imagined.

The practical and infrastructure-related difficulties of accommodating so much data remain non-trivial (for all but a limited number of commercial entities). Setting these details aside for the moment, though, allows us to ask the larger question: If one of the goals of the BWR is to open up the process to as large a community as possible why then wouldn't the participation of the buildings themselves also be welcome?

The wildflower garden of history

Of all the possible built works registries to launch and maintain, the one described here is more difficult than most. All registries are, at some stage, disputed territories. Whether the argument is caused by insufficient or competing research or due to core philosophical beliefs whose subtleties are not easily described using a controlled vocabulary, an authority record has to be considered a “living document”.

The advantage of a BWR made of many voices is that it would house not only authoritative work records, but also a history of the effort that went into creating those records. The BWR could serve as a forum that promotes consensus around a built work and also tracks the ebb and flow of the debate. Equally important are all those built works not yet deemed worthy of a scholar’s attention. If nothing else, a BWR that encourages documentary efforts outside the scope of the contemporary zeitgeist creates a zone of safekeeping for historical records and their stories for a time when we are ready to reconsider them. Of Lizzy Oppenheimer’s project to document highway rest stops in the United States, Daniella Jaeger writes:

She’s out to document instances of individuality in a world that’s headed for homogenization. States all over the country have already announced the closing of many rest stops, destined to be replaced by commercialized service stations consisting of identical architecture, identical food options, and identical restrooms. As much as we may all like to pound Cinnabons in the back of a four-wheeler, this is a sad situation, and the unique architecture and authentic Americana that Lizzy captures in her images makes that clear. This project serves not just as sentimental memorialization but as an archive of an endangered cultural species.

Daniella Jaeger, Rest Stops of America 34

The BWR, done right, would provide a kind of "bias knob" with which we might read a built work, not simply as an object but as something whose place we understand in relation to the wildflower garden of history. At the time of writing this paper, there are already a number of open participation architectural resources available on the Web: Open Buildings 35, Archipedia 36, ArchDaily 37 and others. The goal of the BWR is not only to provide information about architecture and the built environment (as numerous websites do), but to enable concordances among records and materials beyond the registry. Ultimately, the registry we imagine would enable the built environment to become part of the network itself.

The need for reliable work records to enable the efficient creation of metadata records and for the effective online retrieval of content has been one of the greatest needs of museums, libraries, archives, and other individual creators in order to digitize and share collections. The hope is that the BWR will eventually enable the many facts and stories about built works to be disseminated online thereby encouraging education, scholarship, and public access to this information worldwide.

Aaron Straup Cope is Design Technologist at Stamen Design; formerly Senior Engineer at Flickr. Christine Kuan is Chief Content Officer and Vice President of External Affairs at ARTstor; formerly Senior Editor of Grove Art Online at Oxford University Press. The Built Works Registry (BWR) project directors are Carole Ann Fabian, Director of the Avery Architectural and Fine Arts Library at Columbia University, and James Shulman, President of ARTstor. The opinions in this paper are the authors’ own and do not necessarily represent the policies or views of their organizations, the IMLS-funded BWR project, or other controlled vocabulary projects.


  2. See Getty’s Art and Architecture Thesaurus (AAT), Union List of Artist Names (ULAN), Thesaurus of Geographic Names (TGN), and Cultural Objects Name Authority (CONA). CONA is scheduled to launch in 2012.
  3. See BWR Press Release:
  4. The data records collected through the BWR project will be contributed to the Getty Cultural Objects Name Authority (CONA)—the official repository for all cultural heritage moveable and built works authority files scheduled to launch in 2012. The BWR project also complements ARTstor’s new endeavor, Shared Shelf (, which will be a web-based image management service with a controlled vocabulary warehouse, and tools for cataloging, digital asset management, and web publishing.
  5. Approximately 6,000 images are uploaded to Flickr every minute; approximately 8.5 million freely available media files in Wikimedia Commons; 24 hours of video are uploaded to YouTube every minute.
  6. This paper is not intended to determine the policies or framework for the IMLS-funded BWR project.
  7. The phrase small pieces, loosely joined is often used to describe the underlying architectural principles, and successes, of the Unix operating system. See also: Tim O'Reilly's The Architecture of Participation (
  8. This is a play on the idea of creating geometry from motion in three-dimensional photography. See also: Deepak Bandyopadhyay’s 3D Photography, Image-based Model Acquisition (
  9. For an extreme imagining of this scenario take a look at Frank Miller's graphic novel “Ronin” ( where the city is constructed from self-replicating bio-mechanical organisms that grow around and eventually absorb everything in their path.
  11. See VRA Core 4.0 metadata schema and Getty Cultural Objects Names Authority (CONA) metadata schema
  12. The first US edition of Clockwork Orange (used as the basis for Stanley Kubrick’s film adaptation) was missing the final chapter of the original UK edition; this was not an oversight but a deliberate editorial decision on the part of the American publishing house.
  13. This is a phrase that Kellan Elliott McCrea ( has used to describe successful projects at the photo-sharing website Flickr.
  14. Consider the photo-sharing website Flickr: In its seven year history the top tag given to photos, uploaded daily, has been "wedding". There is, in most cases, not much that can be said about these photos in the present. In general, they hold little meaning and even less artistic merit for anyone who wasn't present at the event but they remain a tangible artifact for those who were and, conversely, as time goes by they become an increasingly valuable research tool and a lens for understanding the past simply by virtue of there being so many of them.
  16. In March 2010, the Brooklyn Museum published their entire collections database (94,000 objects at the time of writing) using an ingenious Record Completeness Meter to indicate the degree of authority that the museum was willing to assign to each item. See also: Opening the Floodgates (
  21. Writing about his Twitter account for London’s Tower Bridge, Tom Armitage says: I’ve written before about how wonderful Twitter can be as a messaging bus for physical objects. The idea of overhearing machines talking about what they’re doing is, to my mind, quite delightful. (
  22. See Brian E. C. Schottlaender and Linda Barnhart, The Union Catalog of Art Images (UCAI): A Project of the University of California, San Diego Libraries, Final Report to The Andrew W. Mellon Foundation, 27 February 2004. (accessed 23 Feb 2011)
  23., page 20.
  31. The field of geography has, in some ways, been defined by its pursuit of the many different and possible projections (taking a spherical surface such as the Earth and projecting it on a flat surface like a map). The easiest way to think about the NYPL Map Warper is that it allows old maps to be projected into the coordinate system used by contemporary tools like Google Maps, Google Earth or Open Layers.
  32. ARTstor Digital Library ( is a nonprofit image resource that makes available more than 1.3 million images (with another one million in production) in the arts, architecture, humanities, and sciences. At the time of this paper, the Digital Library makes available more than 350,000 images and QuickTime Virtual Reality (QTVR) files related to architecture and the built environment. The ARTstor Digital Library was launched in July 2004.