<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:h="http://www.w3.org/1999/xhtml" version="2.0">
  <channel>
    <title>this is aaronland</title>
    <link>https://www.aaronland.info/weblog</link>
    <lastBuildDate>Wed, 13 May 2026 07:44:37 PDT</lastBuildDate>
    <description>temptation enough</description>
    <item>
      <link>https://www.aaronland.info/weblog/2026/04/09/temptation/#tools</link>
      <guid>https://www.aaronland.info/weblog/2026/04/09/temptation/#tools</guid>
      <title>Weird-shaped tools</title>
      <description><![CDATA[<div xmlns="http://www.w3.org/1999/xhtml">
	<div class="content" datetime="2026-04-09">
	  
	  <h1 class="usf museums">Weird-shaped tools</h1>

	  <div class="slide" id="tools-001">
	    <div class="image640">
	      <a href="/weblog/2026/04/09/temptation#tools-001"><img src="/weblog/2026/04/09/temptation/images/tools-001.jpg" loading="lazy"/></a>

	      <div class="creditline">
		<div><strong>Matchbox case: American Airlines</strong> 1970s</div>
		<div>Gift of Thomas G. Dragges</div>
		<div>Collection of SFO Museum <a href="https://collection.sfomuseum.org/objects/1964167251/">2014.058.002</a></div>
	      </div>
	    </div>

	    <div style="font-style:italic;">
	      
	      <p>Every spring, for the last few years, I have been invited to <a href="https://aaronland.info/weblog/tags/usf/">speak to a class of museum studies students at the University of San Francisco</a>. What follows is this year's talk. It felt, and still feels, like an odd-shaped talk. Did I really want to talk to the class about AI? No, but I did. The topic is sort of inescapable these days both because the marketing campaign which supports it is working in overdrive and because there is clearly something novel that these technologies make possible. From my perspective it's not clear what that is yet but it almost certainly isn't what AI's most ardent supporters think it is. It's definitely not clear that we have settled the accounting (environmental, human and social) that these systems demand. It's complicated and so the goal, more than anything, was to try to demonstrate how the cultural heritage sector might address this brave new world that we've all been thrust into with a measure of agency.</p>

	      <p>These talks rarely benefit from the time to write out my notes, in longhand, in advance. As such this is not exactly what I said but, rather, what I meant to say. As usual, it's long. For <q>fun</q> I asked a large language model to summarize each paragraph into a short pithy statement which was interesting only in that it skipped, in its entirety, the section about the assumptions which underlie the belief that a glorious AI-future is inevitable. Computers, amirite?</p>
	      
	    </div>
	    
	  </div>
	  
	  <div class="slide" id="tools-002">
	    <div class="image640">
	      <a href="/weblog/2026/04/09/temptation#tools-002"><img src="/weblog/2026/04/09/temptation/images/tools-002.jpg" loading="lazy"/></a>
	    </div>

	    <p>I have put together <a href="https://aaronland.info/weblog/2026/04/09/temptation/appendix.html">an appendix of links</a> for this talk. These are things which I have read during the time I've been thinking about what to say today. There is no overarching narrative to these links. Many of them are not directly related to the things I will talk about today. Still, each one of them felt somehow <q>relevant</q> even if only tangentially. I will show this link again at the end of the talk.</p>
	  </div>
	  
	  <div class="slide" id="tools-003">
	    <div class="image640">
	      <a href="/weblog/2026/04/09/temptation#tools-003"><img src="/weblog/2026/04/09/temptation/images/tools-003.jpg" loading="lazy"/></a>
	    </div>

	    <p>I'd like to start with this guy: <a href="https://www.blaiseaguera.com/">Blaise Agüera y Arcas</a>. Blaise is the CTO of Technology &amp; Society and founder of the Paradigms of Intelligence group at Google. This screenshot is from a talk he delivered last year, at the Long Now Foundation, titled <a href="https://longnow.org/talks/02025-aguera-y-arcas/#watch">What Is Intelligence?</a>. The talk’s title was taken from a book he was promoting with the same name.</p>

	    <p>The thrust of the talk picks up on John von Neumann's ideas that <q>life is computational</q> and that <q>if you figure out how to copy yourself you will exist in the future</q>. Given that this was a talk about artificial intelligence you can probably see where things were going. The talk literally ends with the claim that Agüera y Arcas sees <q>no reason to believe that AI is any different</q>.</p>

	    <p>It's a good talk. Even if you don't agree with the conclusions there are compelling, or at least provocative (in the best sense of the word), arguments worth entertaining. I won't try to sum them up here except to mention something which Agüera y Arcas discusses at the end of his talk.</p>
	  </div>
	  
	  <div class="slide" id="tools-004">
	    <div class="image640">
	      <a href="/weblog/2026/04/09/temptation#tools-004"><img src="/weblog/2026/04/09/temptation/images/tools-004.jpg" loading="lazy"/></a>

	      <div class="creditline">
		<div><strong>Photograph: San Francisco International Airport (SFO)</strong> c. 1968</div>
		<div>Collection of SFO Museum <a href="https://collection.sfomuseum.org/objects/1913662775/">2011.068.017</a></div>
	      </div>
	    </div>

	    <p>It is the idea of conditions – specifically machines – which <q>externalize metabolism</q>. That is, they create or make available excess energy. They produce energy that we can offload and use to automate human efforts which, in turn, externalize their own metabolism. And so on.</p>
	    
	    <p>The example that he uses is fossil fuels. Whatever you think about them, and regardless of what you think about replacing them with renewable alternatives or the long-term effects of climate change, it's hard to deny they are the thing that has made most of contemporary life possible. They made it possible because they were the enabling factor which allowed <em>other</em> things to manifest themselves.</p>

	    <p>By extension, we are asked to believe that the <q>externalized metabolism</q> produced by artificial intelligence and machine learning systems will be on par with everything that the discovery of oil made possible. We'll see. The reason I am telling you this is that I started wondering, after the talk, whether there wasn't another parallel that was trying to be made.</p>

	    <p>There are, basically, a dozen enterprises, be they commercial or national, which <em>produce</em> oil. That is, they are the ones who suck it out of the ground and sell it to everyone else. The resources required to produce oil are just so fantastically high that only a very few have the means to do so. Everyone else just pays the proverbial <q>twenty bucks a week</q> to consume it. The contract works like this: <q>twenty bucks a week</q> for access to externalized metabolism at the cost of <q>twenty bucks a week</q> for the rest of time.</p>

	    <p>I would agree with you if you said that sounds like an odd-shaped social contract but it is still a kind of social contract. Living in California, where gas is currently averaging six (sometimes seven) dollars a gallon, it can be hard to recognize that there is a social contract around gasoline prices in the United States but one only has to look at prices, and the politics around those prices, in the rest of the country to see that there is. It might be a fundamentally <em>unfair</em> contract but that's a different issue.</p>
	    
	    <p>So, I came away from this talk wondering if I was being asked to accept a similar trade-off when it comes to artificial intelligence. Are AI companies just the new oil companies? Like I said, it's not yet clear whether the second and third order effects of these technologies are on par with fossil fuels but it's clearly in the interests of the very big players promoting them to make you think they are. We are certainly being made to believe that the infrastructure required to operate artificial intelligence systems is reserved to a select few. It is a rhetoric steeped in undertones of <q><a href="https://www.youtube.com/watch?v=w8KY0vxI8kA">resistance is futile</a></q>-style inevitabilities which are always suspect. The marketing and influence campaign, which has accompanied the AI craze, has been so vast and pervasive that it often dwarfs both understanding and imagination at the same time.</p>

	    <p>With that in mind I'd like to ask you a question: How many feel like they understand the fundamentals, not specifically the maths, of how large language models work? <span style="font-style:italic;">Few, if any, students raised their hand.</span></p>	    
	  </div>
	  
	  <div class="slide" id="tools-005">
	    <div class="image640">
              <a href="/weblog/2026/04/09/temptation#tools-005"><img src="/weblog/2026/04/09/temptation/images/tools-005.jpg" loading="lazy"/></a>
	    </div>

	    <p>How many feel like they understand the ideas around intersectionality? <span style="font-style:italic;">Most people raised their hand.</span></p>

	    <p>I am going to say something which might seem provocative, almost insensitive, at first: If you understand the basic concepts behind intersectionality then you understand the basic concepts behind how AI systems work, in 2026. <a href="https://en.wikipedia.org/wiki/Intersectionality">Wikipedia describes intersectionality</a> this way:</p>

	    <blockquote>
	      <p>Intersectionality is an analytical framework for understanding how groups' and individuals' social and political identities result in unique combinations of discrimination and privilege.</p>
	    </blockquote>
	    
	    <p>My point is not to say that the foundations of large language models and intersectionality are the same. My point is to say that if you understand that things (people) are not static and that their position of influence is dependent on a whole host of inter-related factors and actors, often outside of their control, then you basically understand at a foundational level how large language models work even if the maths and the statistics elude you.</p>

	    <p>I don't want to minimize all the work that underpins large language models because it is genuinely complex and impressive but, fundamentally, both topics are governed by ideas about dimensionality and relationships. I think it is important that you understand this because there are many interests, ranging from the malign to indifferent marketing, which benefit from people feeling like all of this <q>AI stuff</q> is beyond their reach.</p>
	    
          </div>

	  <div class="slide" id="tools-006">
            <div class="image640">
              <a href="/weblog/2026/04/09/temptation#tools-006"><img src="/weblog/2026/04/09/temptation/images/tools-006.jpg" loading="lazy"/></a>
	    </div>

	    <p>So, dimensionality. Arguably the easiest dimension to start with is one but I am going to start with two because all the evidence suggests we (humans) are hard-wired for contrast.</p>
	    
          </div>

	  <div class="slide" id="tools-007">
            <div class="image640">
              <a href="/weblog/2026/04/09/temptation#tools-007"><img src="/weblog/2026/04/09/temptation/images/tools-007.jpg" loading="lazy"/></a>
	    </div>

	    <p>We experience two-dimensionality in our daily lives all the time. It is the ubiquitous X-Y graph.</p>
	    
          </div>

	  <div class="slide" id="tools-008">
            <div class="image640">
              <a href="/weblog/2026/04/09/temptation#tools-008"><img src="/weblog/2026/04/09/temptation/images/tools-008.jpg" loading="lazy"/></a>
	    </div>

	    <p>It is the basis of capitalism. In an ideal world, profit (Y) goes up over time (X).</p>	    
          </div>

	  <div class="slide" id="tools-009">
	    <div class="image640">
	      <a href="/weblog/2026/04/09/temptation#tools-009"><img src="/weblog/2026/04/09/temptation/images/tools-009.jpg" loading="lazy"/></a>
	    </div>

	    <p>There is another two-dimensional space which everyone experiences every day, even more so with the mass adoption of location-aware mobile devices.</p>
	    
	  </div>
	  
	  <div class="slide" id="tools-010">
	    <div class="image640">
	      <a href="/weblog/2026/04/09/temptation#tools-010"><img src="/weblog/2026/04/09/temptation/images/tools-010.jpg" loading="lazy"/></a>
	    </div>

	    <p>The Cartesian grid of geography: longitude (X) and latitude (Y). This modeling has proven super useful for all kinds of things but we have also established that the planet isn't flat. It's not even a sphere. It is egg-shaped or, in technical jargon, an <q>oblate spheroid</q>.</p>

	    <p>This has given rise to a whole other set of maths called <q>map projections</q>; the math to translate a point on a three dimensional object (an oblate spheroid) to a two dimensional plane (a paper, or digital, map). There are literally <a href="https://proj.org/en/stable/index.html">hundreds of different map projections</a> in the world. California alone has six, reflecting how latitudes are calculated on the surface of an egg-shaped object because there are actual, real-life tax dollars associated with those differences. SFO has its own map projection because it turns out that land masses, like the one the airport sits on, move over time and those changes matter when you're trying to land airplanes.</p>
	    
	  </div>
	  
	  <div class="slide" id="tools-011">
	    <div class="image640">
	      <a href="/weblog/2026/04/09/temptation#tools-011"><img src="/weblog/2026/04/09/temptation/images/500px-Worlds_animate.gif" loading="lazy"/></a>	      
	      <!-- <a href="/weblog/2026/04/09/temptation#tools-011"><img src="/weblog/2026/04/09/temptation/images/tools-011.jpg" loading="lazy" /></a> -->
	    </div>

	    <p>This is the <a href="https://en.wikipedia.org/wiki/Mercator_projection">Mercator projection</a>. You can measure the amount of time it will take for any given cartographer to start complaining about the Mercator projection, and its offenses to geographic reality, in seconds. I am showing this to you to illustrate the challenges that arise when <q>facts on the ground</q> are pushed through math. At the risk of providing an intellectual justification for the self-serving fallacy of <q>alternative facts</q>: Facts are facts, until they are interpreted.</p>
	    
          </div>

	  <div class="slide" id="tools-012">
	    <div class="image640">
	      <a href="/weblog/2026/04/09/temptation#tools-012"><img src="/weblog/2026/04/09/temptation/images/tools-012.jpg" loading="lazy"/></a>
	    </div>

	    <p>If the Mercator projection wasn't problematic enough, there is also the Web Mercator projection. This is how Google Maps, and basically every other online map, works. Rather than treating the world as a sphere, the world is simply interpreted as a square which can be infinitely sub-divided into smaller grids using <a href="https://en.wikipedia.org/wiki/Power_of_two">powers of two</a> math.</p>

	    <p>The decision to model the world this way wasn't born out of nefarious intent or even well-intentioned philosophy. It was simply a technical decision reflecting the constraints of 2005: If you make the world a grid of tiles, it makes calculating and rendering the infinite canvas of a global map on consumer-grade hardware both possible and fast enough that people will think it is <q>magic</q>.</p>
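
	    <p>For anyone curious about what the <q>powers of two</q> arithmetic looks like, here is a minimal sketch in Python of the tile-numbering convention commonly used by online maps. It is an illustration and not Google's actual code: the function name <code>lonlat_to_tile</code>, the constant <code>MAX_LATITUDE</code> and the approximate coordinates used for SFO are all assumptions made for the example.</p>

<pre><code>import math

# Approximate latitude limit of the square Web Mercator world.
MAX_LATITUDE = 85.05112878

def lonlat_to_tile(lon, lat, zoom):
    """Return the (x, y) tile containing a longitude/latitude at a zoom level."""
    n = 2 ** zoom  # the world is n tiles wide and n tiles tall
    # Latitudes beyond the limit are clamped to the edge of the square.
    lat = max(min(lat, MAX_LATITUDE), -MAX_LATITUDE)
    lat_rad = math.radians(lat)
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Keep points on the far edges inside the grid.
    return min(x, n - 1), min(y, n - 1)

# Zoom level, total number of tiles in the world, tile containing (roughly) SFO.
for zoom in (0, 1, 2, 10):
    print(zoom, 4 ** zoom, lonlat_to_tile(-122.379, 37.6213, zoom))</code></pre>

	    <p>Each zoom level doubles the number of tiles along each side of the square, so the total number of tiles is four raised to the zoom level. In this sketch anything north or south of roughly 85 degrees is simply clamped to the edge of the square.</p>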
	    
          </div>

	  <div class="slide" id="tools-013">
	    <div class="image640">
	      <a href="/weblog/2026/04/09/temptation#tools-013"><img src="/weblog/2026/04/09/temptation/images/tools-013.jpg" loading="lazy"/></a>
	    </div>

	    <p>However, none of the maths in the Google map projection work especially well at either end of an oblate spheroid. The maths return an answer but that answer has little bearing on the reality that the people who are living in those places experience. Again, this is a side-effect not born out of malice but, rather, mundane business decisions. Most of the people in the world, people likely to be customers of Google's services, happen to live within the borders of the grid. Facts are facts until they are inconvenient.</p>
	    
          </div>

	  <div class="slide" id="tools-014">
	    <div class="image640">
	      <a href="/weblog/2026/04/09/temptation#tools-014"><img src="/weblog/2026/04/09/temptation/images/tools-014.jpg" loading="lazy"/></a>
	    </div>

	    <p>We experience life, for the most part, in a three-dimensional space; width (X), height (Y) and depth (Z). This is a world where each dimension has both a starting point and an ending point. A simpler way to talk about this, relative to the Cartesian grid, is to introduce altitude (Z). An airplane can be said to be flying over a specific longitude (X) and latitude (Y) at a given altitude (Z).</p>
	    
          </div>

	  <div class="slide" id="tools-015">
	    <div class="image640">
	      <a href="/weblog/2026/04/09/temptation#tools-015"><img src="/weblog/2026/04/09/temptation/images/tools-015.jpg" loading="lazy"/></a>
	    </div>

	    <p>Add a fourth dimension, time, and things start to get a bit harder to hold in your mind. Not impossible but there's more stuff going on. It starts to get harder to keep track of things. Now imagine dimensionalities measured in the thousands, millions, billions and occasionally trillions.</p>
	    
	  </div>
	  
	  <div class="slide" id="tools-016">
	    <div class="image640">
	      <a href="/weblog/2026/04/09/temptation#tools-016"><img src="/weblog/2026/04/09/temptation/images/worlds.gif" loading="lazy"/></a>	      
	      <!-- <a href="/weblog/2026/04/09/temptation#tools-016"><img src="/weblog/2026/04/09/temptation/images/tools-016.jpg" loading="lazy" /></a> -->
	    </div>


	    <p>It's hard. That's the problem we are struggling with in 2026. The challenge, for us, is that dimensionalities measured in the millions, billions and trillions are the domain of large language models. They contain more parameters, or dimensions, than we are able to articulate, let alone visualize, in our conscious minds or in shared language.</p>
          </div>

	  <div class="slide" id="tools-017">
	    <div class="image640">
	      <a href="/weblog/2026/04/09/temptation#tools-017"><img src="/weblog/2026/04/09/temptation/images/tools-017.jpg" loading="lazy"/></a>
	    </div>

	    <p>And, of course, dimensionalities don't exist in a vacuum. They all exist in relationship to one another.</p>
	  </div>

	<div class="slide" id="tools-018">
            <div class="image640">
              <a href="/weblog/2026/04/09/temptation#tools-018"><img src="/weblog/2026/04/09/temptation/images/tools-018.jpg" loading="lazy"/></a>
	    </div>

	    <p>In space and time (3D and 4D spaces) we have established rules – for example the Metric system or the 24-hour clock or the Gregorian calendar – to measure the distances between things. We have a consensus and a shared vocabulary for at least agreeing on the measure if not the meaning of the space between two points.</p>
	    
          </div>

	<div class="slide" id="tools-019">
            <div class="image640">
              <a href="/weblog/2026/04/09/temptation#tools-019"><img src="/weblog/2026/04/09/temptation/images/tools-019.jpg" loading="lazy"/></a>
	    </div>

	    <p>And this allows us to develop algorithms to operate on those common understandings. For example, in the case of 2D space, there is something called the <a href="https://en.wikipedia.org/wiki/Hilbert_curve">Hilbert curve</a>. The Hilbert curve allows us to arrange data – for example all the houses in the San Francisco Bay Area – and then to find houses near a person standing on the corner in the Mission neighbourhood. It's not that there aren't other ways to do this. You could just plow through the list checking each item. The point here is that this way, arranging data along a Hilbert curve, makes it <em>faster</em> to find results than checking every candidate in a list. We can debate whether or not those time savings constitute a form of <q>creation</q> but I think that Blaise's notion of <q>externalizing metabolism</q> is a good way to describe the effects, or at least by-products, of algorithms like this.</p>
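
	    <p>As a rough sketch of the idea, assuming a small 16-by-16 grid laid over a city and a handful of made-up houses, the following Python turns each grid cell in to a single number (its distance along the Hilbert curve), sorts the houses by that number and then looks up only a short slice of the sorted list. The names <code>hilbert_index</code>, <code>houses</code> and <code>candidates_near</code> are hypothetical and the conversion follows the well-known textbook algorithm rather than any particular product's implementation.</p>

<pre><code>import bisect

def hilbert_index(n, x, y):
    """Distance along the Hilbert curve of cell (x, y) in an n-by-n grid.

    n is assumed to be a power of two, with x and y in range(n).
    """
    d = 0
    s = n // 2
    while s != 0:
        rx = (x // s) % 2
        ry = (y // s) % 2
        d += s * s * ((3 * rx) ^ ry)
        # Rotate or flip the quadrant so the next, smaller step lines up.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d

# Hypothetical houses placed on a 16-by-16 grid.
houses = {"A": (2, 3), "B": (3, 3), "C": (12, 1), "D": (2, 12), "E": (13, 2)}
ordered = sorted((hilbert_index(16, x, y), name) for name, (x, y) in houses.items())

def candidates_near(x, y, span=8):
    """Names whose position on the curve is within span steps of cell (x, y)."""
    d = hilbert_index(16, x, y)
    lo = bisect.bisect_left(ordered, (d - span, ""))
    hi = bisect.bisect_right(ordered, (d + span, "~"))
    return [name for _, name in ordered[lo:hi]]

print(candidates_near(3, 2))  # prints ['A', 'B']</code></pre>

	    <p>Cells which are close together on the curve are also close together on the grid. The reverse is not always true, so a real system would treat the slice as a list of candidates to double-check rather than a final answer.</p>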
	    
          </div>

	  <div class="slide" id="tools-020">
	    <div class="image640">
	      <a href="/weblog/2026/04/09/temptation#tools-020"><img src="/weblog/2026/04/09/temptation/images/tools-020.jpg" loading="lazy"/></a>
	    </div>
	    
	    <p>The Hilbert curve can even be extended to three dimensional space. Now try to imagine the maths that would allow you to find similar, or adjacent, data in a space with millions, billions or trillions of dimensions.</p>
	    
          </div>
	  
	  <div class="slide" id="tools-021">
	    <div class="image640">
	      <a href="/weblog/2026/04/09/temptation#tools-021"><img src="/weblog/2026/04/09/temptation/images/worlds.gif" loading="lazy"/></a>	    
	      <!-- <a href="/weblog/2026/04/09/temptation#tools-021"><img src="/weblog/2026/04/09/temptation/images/tools-021.jpg" loading="lazy" /></a> -->
	    </div>

	    <p>It's hard. This is, however, the world we live in today. It's not all maths, though. There are still some very impressive maths but one of the things which has happened is that we have advanced computer processing technology to the point where we can, in effect, basically compare every item in a list – even a list with billions of candidate parameters – in near real-time. That basically describes the history of computing which, it should be remembered, was initially developed to help calculate the maths to more-accurately lob missiles and bombs at one another faster than a room full of people doing those same calculations with pencil and paper.</p>
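
	    <p>A toy version of <q>compare every item in a list</q> might look like the following Python sketch. It assumes 2,000 made-up items, each described by 200 random numbers standing in for dimensions, and it simply measures the straight-line distance from a query to every one of them. The sizes, the data and the function name <code>nearest</code> are all assumptions made for the example; it is not how any particular model or product is implemented.</p>

<pre><code>import math
import random

random.seed(1)  # fixed seed so the made-up data is the same on every run

DIMENSIONS = 200
# Hypothetical data: 2,000 items, each a list of 200 random numbers.
items = [[random.random() for _ in range(DIMENSIONS)] for _ in range(2000)]

def nearest(query, candidates):
    """Index of the candidate closest to query, found by checking every one."""
    return min(range(len(candidates)), key=lambda i: math.dist(query, candidates[i]))

# A query that is item 42 nudged very slightly in every dimension.
query = [value + 0.001 for value in items[42]]
print(nearest(query, items))  # prints 42</code></pre>

	    <p>There is nothing clever here. Every candidate is checked, in every dimension, and the only reason that is tolerable is that the hardware is fast.</p>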

	  </div>
	  
	  <div class="slide" id="tools-022">
	    <div class="image640">
	      <a href="/weblog/2026/04/09/temptation#tools-022"><img src="/weblog/2026/04/09/temptation/images/tools-022.jpg" loading="lazy"/></a>

	      <div class="creditline">
		<div><strong>Pension calculator: Connecticut General Life Insurance Company, Pan American Airways System</strong> 1945</div>
		<div>Gift of the Family of Captain Roger J. Sherron, Jr.</div>
		<div>Collection of SFO Museum <a href="https://collection.sfomuseum.org/objects/1762831831/">2013.101.008</a></div>
	      </div>
	    </div>

	    <p>At this point, you might find yourself asking questions like: Okay, but what are the data populating this n-dimensional blob? Where did that data come from? Was that data used with permission? Even if that data was used with permission how are the distances between any two pieces of data being measured? Are those rules being applied universally across all data? Do the same rules apply for chains, or clusters, of relationships? How do those weights and measures affect the meaning and the nature of those data and their interactions with other data? Do the measures themselves change in the same way that the theory of relativity tells us that space and time are variable? How do we even keep track of things in an automated process involving millions, billions and trillions of data?</p>

	    <p>These are all the right questions to ask. If you are asking these questions you may not understand the <em>maths</em> but you understand the <em>issues</em> that these maths raise. The problem isn't that <em>no one</em> knows. The problem is that, with a handful of notable exceptions, only a very few people know and they have a variety of reasons for not sharing that information. Anecdotally the evidence suggests that most of those reasons are self-serving but the larger point is that these systems offer little to no means of <q>introspection</q> to outsiders.</p>

	    <p>This is important because those distances, the space between any two data points, are used as a proxy for <em>relevance</em>. Because these distances are pre-computed, or <q>trained</q>, using mountains of data of questionable or at least unknown provenance it raises the larger, philosophical question of whether these systems presuppose that past success is a guarantee of future results despite what all of recorded history (not to mention our parents) tells us to the contrary?</p>

	    <p>You might be asking yourself: Don't these conditions create an environment ripe for mistakes and false positive results? And the answer would be: Yes. Especially because the nature of large language models is that answers, such as they are, are simply the <em>probabilistic</em> outcome of a question's distribution across a huge n-dimensional blob of data.</p>
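
	    <p>To make <q>probabilistic outcome</q> a little more concrete, here is a deliberately tiny Python sketch. It assumes four candidate words with invented scores, converts those scores in to probabilities using the standard softmax formula and then picks a word at random, weighted by those probabilities, one thousand times. The words, the scores and the seed are made up for illustration and nothing here is taken from an actual model.</p>

<pre><code>import math
import random

def softmax(scores):
    """Turn arbitrary scores into probabilities that add up to one."""
    top = max(scores.values())
    exps = {word: math.exp(s - top) for word, s in scores.items()}
    total = sum(exps.values())
    return {word: e / total for word, e in exps.items()}

# Hypothetical scores for the word following "The airplane landed on the".
scores = {"runway": 4.0, "tarmac": 3.0, "water": 1.5, "hippopotamus": 0.5}
probabilities = softmax(scores)

rng = random.Random(2026)  # fixed seed so the picks are repeatable
picks = rng.choices(list(probabilities), weights=list(probabilities.values()), k=1000)

# Word, its probability, and how many times out of 1,000 it was picked.
for word in probabilities:
    print(word, round(probabilities[word], 3), picks.count(word))</code></pre>

	    <p>The most likely word wins most of the time but the unlikely ones are never ruled out entirely. They are just improbable.</p>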
	    
          </div>

	<div class="slide" id="tools-023">
            <div class="image640">
	      <a href="/weblog/2026/04/09/temptation#tools-023"><img src="/weblog/2026/04/09/temptation/images/tools-023.jpg" loading="lazy"/></a>

	      <div class="creditline">
		<div><strong>Photograph: San Francisco International Airport (SFO), United Air Lines</strong> c. 1970</div>
		<div>Collection of SFO Museum <a href="https://collection.sfomuseum.org/objects/1913662495/">2011.068.102</a></div>
	      </div>
	    </div>

	    <p>The term of art for this phenomenon is to say that a model is <q><a href="https://mitsloanedtech.mit.edu/ai/basics/addressing-ai-hallucinations-and-bias/">hallucinating</a></q> which, while poetic, is a fundamentally <a href="https://susam.net/inverse-laws-of-robotics.html">unhelpful anthropomorphization</a>. On the other hand people also name their cars so it's a practice which is unlikely to change any time soon.</p>

	    <p>To date the most successful approach to dealing with the problems of hallucinations has simply been to <a href="https://blogs.nvidia.com/blog/ai-scaling-laws/">flood the zone</a>. To overwhelm the possibility space, the possibility of hallucinations, with so much data that the probability of an incorrect answer becomes ever smaller in a sea of potentially <em>correct</em>, or at least plausible, answers.</p>

	    <p>There are a couple of nuances to what I am saying that are worth considering. The first is, as I mentioned earlier, that there is real and genuine work being done, by real and genuine and well-intentioned people, to correct for hallucinations in math alongside otherwise questionable, often illegal, practices to harvest as much data as they can without consideration for compensation or consequence. The second is that, as communities and societies, we tolerate systems and processes with an allowable failure rate all the time. That things fail is not really the issue. The issue is that we typically have corresponding systems of monitoring, recovery and consequences. We don't have any of these guardrails for AI systems yet.</p>

	    <p>We have not, collectively, established what the acceptable rate of failure is for systems which claim to be able to do <q>all the things</q>. The negative consequences of even a fraction of a percent of <q>everything</q> being wrong stop being abstract when they start affecting people in their day-to-day lives.</p>
	    
          </div>

	  <div class="slide" id="tools-024">
	    <div class="image640">
	      <a href="/weblog/2026/04/09/temptation#tools-024"><img src="/weblog/2026/04/09/temptation/images/tools-024.jpg" loading="lazy"/></a>
	    </div>

	    <p>Everyone has their own personal set of <a href="https://simonwillison.net/2026/Apr/16/qwen-beats-opus/">tests and measures</a> for evaluating the output of a machine-learning model. Personally, I like to ask them to create images of <q><a href="https://www.aaronland.info/weblog/2025/10/22/mirror/#dancing-012">a pygmy hippopotamus holding an airplane</a></q> although the prompt for the image in the middle of this slide was actually for <q>an airplane landing on a hippopotamus</q>.</p>
          </div>

	  <div class="slide" id="tools-025">
	    <div class="image640">
	      <a href="/weblog/2026/04/09/temptation#tools-025"><img src="/weblog/2026/04/09/temptation/images/tools-025.jpg" loading="lazy"/></a>
	    </div>

	    <p>I also like to take these images, created by one AI model, and derive 3D models from them using another AI model. Why does the second AI model not know to extend the wings on this airplane to their full length? Has it been trained on a data set of a secret fleet of airplanes flying with short and stubby wings? Is the model trying to tell us that the airplane manufacturing industry has been in the pocket of the <q>Big Wing</q> cabal this whole time? Also why does it think hippopotamuses have six legs? Is this a foreshadowing of things to come?</p>

	    <p><em><a href="https://www.anthropic.com/research/natural-language-autoencoders">No one knows</a></em>.</p>

	    <p>The reality is that we have learned a number of tricks to coax these systems into doing things but our understanding of <em>how or why</em> they do is still basically on par with that of a rat in a <a href="https://en.wikipedia.org/wiki/Skinner_box">Skinner box</a>.</p>
	  </div>
	  
	  <div class="slide" id="tools-026">
	    <div class="image640">
	      <a href="/weblog/2026/04/09/temptation#tools-026"><img src="/weblog/2026/04/09/temptation/images/tools-026.jpg" loading="lazy"/></a>
	    </div>

	    <p>More recently I have been asking a variety of different models, and in particular open-weight models, two, sometimes three, very carefully crafted questions:</p>

	    <ol class="rel">
	      <li>Tell me about Madeleine Thien's 2016 novel <q><a href="https://en.wikipedia.org/wiki/Do_Not_Say_We_Have_Nothing">Do Not Say We Have Nothing</a></q>.</li>
	      <li>Tell me about the role of <a href="https://en.wikipedia.org/wiki/Tiannamen_Square">Tiananmen Square</a> in the novel.</li>
	    </ol>

	    <p>It is well documented that the open-weight models coming out of China are <a href="https://www.theguardian.com/technology/2025/jan/28/we-tried-out-deepseek-it-works-well-until-we-asked-it-about-tiananmen-square-and-taiwan">notoriously unwilling</a> to discuss the 1989 military crackdown by the Chinese government on the student protests in Tiananmen Square.</p>

	    <p>Thien's novel spans three generations in, or from, China starting before the Cultural Revolution and ending at the beginning of the 21st century. What I like about the questions I've been asking these models is that while the events of the Cultural Revolution are central to the novel, to say the novel doesn't focus on Tiananmen Square or the protests in 1989, and their subsequent crackdown, would be a fundamental misreading. If measured in word or page count those events may seem like a parenthetical to larger questions of collective action, collective guilt and self-preservation. By just about any other measure of interpretation, though, to not recognize the importance of Tiananmen Square to the narrative arc of the novel would be a basic failure of reading and comprehension.</p>

	    <p>I have tested these questions with about half a dozen of the largest and most popular open-weight models. I have tested these questions on models which were released as recently as last week. While most are able to capture some of the broad, thematic concerns of the novel none of them – and I mean, literally, none of them – were able to recap the narrative, principal characters or even documented factual information about the author correctly. I will publish the transcripts of these question-and-answer sessions shortly because they are an illuminating illustration of the way large language models work.</p>

	    <p>Given an n-dimensional blob of <q>all the things</q> they are able to quickly find the area of the blob where most of an answer mostly resolves. That kind of operational efficiency at basic information retrieval is genuinely novel. From there, though, you can see all the problems of trying to shape a final answer derived from the probability space – the relationships and their relative weights to one another – of all the possible answers. Or, more specifically, the possible answers derived from the data on which the model was trained.</p>

	    <p>I once saw <a href="https://scpnt.stanford.edu/scpnt-photo-and-video-gallery-overview/scpnt-video-albums/2023-pnt-symposium-day-2-invited-4">a presentation on the history of the Global Positioning System (GPS)</a> by one of the principals involved in its creation. The speaker, being a Navy veteran, was quick to point out that the GPS system was created by the US Navy and that, in its beginnings, the Air Force wanted nothing to do with it. The Navy had a vested interest in the kind of accuracy that GPS suggested because they had long been in the business of accuracy. The ocean is a big place, especially when compared to a single enemy ship, so there is a lot at stake in being able to precisely target another vessel and not waste munitions which, by virtue of being at sea, are in limited supply. At the time the GPS system was being developed the Air Force's acceptable degree of accuracy, when dropping bombs from the sky, was still <em>an entire city-sized square block</em>. Airplanes, by virtue of being in the sky, also have a limited supply of munitions so the solution to successfully bombing one thing on the ground was simply to bomb everything around it. Eventually the Air Force adopted GPS with gusto once its promise was proven, if only as a cost-saving measure since bombing one building is usually cheaper than bombing a dozen of them. I find this a helpful way to think about the accuracy of large language models, although it remains unclear whether there is a GPS-shaped equivalent for any of these systems. I'll come back to this in a minute.</p>
	    
          </div>

	<div class="slide" id="tools-027">
            <div class="image640">
	      <a href="/weblog/2026/04/09/temptation#tools-027"><img src="/weblog/2026/04/09/temptation/images/tools-027.jpg" loading="lazy"/></a>

	      <div class="creditline">
		<div><strong>Photograph: Graf Zeppelin airship</strong> 1930s</div>
		<div>Gift of Thomas G. Dragges</div>
		<div>Collection of SFO Museum <a href="https://collection.sfomuseum.org/objects/1762831553/">2015.166.2575</a></div>
	      </div>
	    </div>

	    <p>I mentioned before that I did most of these tests using open-weight models. These are models, the files which encode the n-dimensional blobs I've been talking about, which I can download and run locally. I am very interested in open-weight models for two reasons. First, they preserve a degree of privacy in their use. It's nice not having all your questions routed through a third-party service where the contract around their use is often ambiguous, at best. Second, the laws of economics dictate that by virtue of being free, and typically much smaller than the so-called <q>frontier</q> models used by commercial vendors, these models are what will find their way in to most other consumer products.</p>

	    <p>By virtue of that ubiquity it is worth trying to understand the shape of what we're about to set ourselves up for. Although these models have proven useful for specific, targeted applications their widespread adoption for <em>everything</em> suggests that all of life is about to become as pleasant, which is to say unpleasant, as your car's default, built-in, navigation and entertainment system. If you think all of human life can be reduced to a series of customer support call-center playbooks then the future probably looks promising. If not, get ready to do a lot of shouting.</p>

	    <p>I also asked a couple of the large commercial AI providers the same questions about Thien's novel. Specifically Microsoft Copilot, which is basically OpenAI, and Google's Gemini tool. Both of them were, for all intents and purposes, correct. They got the factual aspects of Thien's life correct. They were able to correctly identify the principal characters. There was even an accurate understanding of the role that both Tiananmen Square and the student protests play in the novel. So what explains why these tools are so much better than the open-weight models and even the open-weight models produced by Google or OpenAI?</p>

	    <p>I think the <q>tell</q> is in one of the user-interface elements found in the big commercial offerings: Citations. What you see happening, first, is that a pool of raw materials from which to approximate an answer is being retrieved from a really, really, <em>really</em> big n-dimensional blob of inter-related data. Again, take a moment to acknowledge how impressive that alone is. But the next thing which is (or appears to be) happening is something like bog-standard web search. I am not trying to minimize bog-standard web searches, either. Doing them well is still hard and that's sort of the point I am getting at.</p>

	    <p>Once you have a set of candidate documents which can be treated as a kind of ground truth, the models are then being asked to formulate an answer derived from the initial pool of answer-mush but scoped to the boundaries of this second set of documents. The answers, which are a probabilistic reconstruction of all the possible answers the model has been trained on, are <em>gated</em> against another set of truths which may or may not have been seen before.</p>
	    
          </div>

	<div class="slide" id="tools-028">
            <div class="image640">
	      <a href="/weblog/2026/04/09/temptation#tools-028"><img src="/weblog/2026/04/09/temptation/images/tools-028.jpg" loading="lazy"/></a>

	      <div class="creditline">
		<div><strong>Sign: Pan American Airways, maximum baggage weight</strong> 1930s</div>
		<div>Gift of Jon E. Krupnick</div>
		<div>Collection of SFO Museum <a href="https://collection.sfomuseum.org/objects/1846765543/">2022.124.784</a></div>
	      </div>
	    </div>

	    <p>What this tells me is that the models alone are not enough. This is important because it has become orthodoxy to believe that the capabilities of free and open models will meet or exceed those of commercial models after 18 months if not sooner.</p>

	    <p>It also tells me that there is an entirely separate infrastructure necessary to make these models <q>succeed</q> more often than they fail. That infrastructure is bog-standard web search which, we know, was already hard to do even before the introduction of AI and large language models. I think this is one reason why, after about a year of panicking, Google finally realized that they were incredibly well-positioned to provide AI-based offerings.</p>

	    <p>The challenge for everyone else is that for twenty years no one tried, seriously, to build anything to rival Google's search offerings. It's a hard problem and it was made harder by the promise, which we all wanted to believe, that Google could basically be trusted to operate what is – or should be – a public utility. That trust is less and less certain these days but it doesn't change the fact that most of us would be lost at sea, in contemporary life, without good, web-scale search. Not artificial intelligence but basic search and retrieval.</p>

	    <p>It is fashionable to talk about government and civil society offering subsidized <q>compute</q> to its citizens. I have nothing against the idea, in principle, but I think it is premature and misses the point. The cost of doing the maths to query an n-dimensional blob of answers is high. The cost of creating those n-dimensional blobs is even higher. There is absolutely a place for those services but I am not sure they will amount to much absent an equivalent service which can be used as a gating function. Personally I would like to see plain-vanilla search offered as a public-broadcasting style service, complete with rules around financing, governance and standards. Something like the BBC but for search. This would be a complicated endeavour but that doesn't mean it isn't necessary or that we shouldn't do it.</p>

	    <p>The reason I think this is important is that it means the rhetoric around open-weight models acts as a kind of bait and switch. It is not wrong to say that those models get better, faster and cheaper with each passing year. The functionality – the externalized metabolisms – that they create is not insignificant either but, on their own, they don't really address the ever-growing disparity in access to the means for all the extra, behind-the-curtains, infrastructure necessary for these technologies to serve as the general purpose utilities that the marketing department claims they will be.</p>
	    
	    <p>In the meantime do you know what else lends itself well to large language models paired with a gating function? Writing computer programs.</p>
	    
          </div>

	<div class="slide" id="tools-029">
            <div class="image640">
	      <a href="/weblog/2026/04/09/temptation#tools-029"><img src="/weblog/2026/04/09/temptation/images/tools-029.jpg" loading="lazy"/></a>

	      <div class="creditline">
		<div><strong>Slide: Pan American World Airways, Wake Island, crew layover</strong> 1956</div>
		<div>Gift of Evelyn R. David</div>
		<div>Collection of SFO Museum <a href="https://collection.sfomuseum.org/objects/1897902521/">2012.144.064</a></div>
	      </div>
	    </div>

	    <p>Computer programming languages are sort of the ultimate ground truth documents in that they are, out of necessity, a rigid and formalized representation of how systems should behave. They are also, in most cases, very well documented. Computer programs themselves, with their many bugs and unanticipated circumstances, are maybe less <q>truth-y</q> than the languages they implement but can, with some effort, be ranked and ordered in such a way that they are reliable fence-posts if not actual gates.</p>

	    <p>All of which has given rise to something affectionately known as <q>vibe-coding</q>. This is the ability to describe <em>what</em> a computer program should do in natural language and then leave the <em>how</em> of that program's implementation details to a large language model. This in turn has led people to vibe-code in to an automated loop, often called <q>agentic</q> programming, where a model's output is measured against one or more tests and left to run ad infinitum until those tests pass. Unsurprisingly, this has led some people to adopt a borderline irresponsible <q>Look ma, no hands!</q> approach to creating and deploying software. I have observed that many of the people who choose this approach – sometimes referred to as <q>dark factory</q> programming after the fully-automated factories which have no lights because there are no human workers and robots don't need light to see – are also those who have achieved a level of social and financial status and stability where the cost and sometimes consequences of their actions have a limited impact.</p>
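
	    <p>For readers who have not seen one of these loops, here is a minimal sketch of the pattern in Python. Everything in it is hypothetical: <code>make_fake_model</code> stands in for a real language model by replaying canned candidates, and the <code>max_attempts</code> cap is an addition of this sketch rather than a feature of the <q>ad infinitum</q> loops described above.</p>

```python
def run_until_tests_pass(propose, tests, max_attempts=5):
    """Ask for candidates until one passes every test or attempts run out.

    propose: hypothetical callable taking the last list of failed test
             names and returning a new candidate.
    tests:   list of (name, check) pairs; check(candidate) returns a bool.
    Returns (candidate, attempts_used), or (None, max_attempts) on failure.
    """
    feedback = []
    for attempt in range(1, max_attempts + 1):
        candidate = propose(feedback)
        # The tests act as the gate: collect the name of every failing one.
        failures = [name for name, check in tests if not check(candidate)]
        if not failures:
            return candidate, attempt
        # Failed test names are fed back in as the next round's "prompt".
        feedback = failures
    return None, max_attempts


def make_fake_model(candidates):
    """Stand-in for a language model: replays canned candidates in order."""
    remaining = iter(candidates)

    def propose(feedback):
        return next(remaining)

    return propose


# Two tests describing *what* the program should do: add two numbers.
TESTS = [
    ("adds small numbers", lambda f: f(2, 3) == 5),
    ("zero is identity", lambda f: f(7, 0) == 7),
]

# Three canned attempts at the *how*; only the last one is correct.
ATTEMPTS = [
    lambda a, b: a - b,
    lambda a, b: a * b,
    lambda a, b: a + b,
]

if __name__ == "__main__":
    winner, attempts_used = run_until_tests_pass(make_fake_model(ATTEMPTS), TESTS)
    print("passed after", attempts_used, "attempts")
```

	    <p>Note that the loop only knows what the tests measure. A candidate that passes both checks and is wrong in every other respect would be accepted just the same.</p>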

	    <p>While there are efforts to apply these patterns to open-weight models you really only see it happening with the large commercial <q>frontier</q> models precisely because, I suspect, those are the only organizations with the infrastructure to apply the gating functions I've been describing. But, and it is important to be honest about this, when it comes to large language models and writing software there is clearly <em>something</em> there.</p>

	    <p>There is ample evidence, at this point, that these systems do make it possible for people – notably people who don't already live and breathe software engineering – to make computers <em>do things</em> which previously they didn't. That is a meaningful kind of progress. Does that mean these incremental, bespoke achievements apply globally? I don't think so.</p>

	    <p>I am pretty sure that these tools, and the tools they produce, are something which should not be left unsupervised. It's not just that these systems, gating functions and all, still produce garbage. Rather, the problem is the speed with which that garbage is produced, the harms said garbage may cause and the volume of garbage produced being so great that it overwhelms our ability to see or reason about everything going on.</p>

	    <p>But, there is still <em>something</em> there which is worth paying attention to. The risk, as human history shows us over and over again, is that we will treat these systems as a hammer and see every situation as a nail. The results will eventually, after the shock has worn off, be hilarious assuming they aren't catastrophic first. But, and this is also important to remember, we still have hammers but we have them for a <em>reason</em>.</p>

	    <p>From a museum perspective all of these changes have signaled a shift in how we think about digital preservation, notably for software systems. Previously it was all about emulation. How do we recreate the entirety of the historical computing environment, inside of a contemporary computing environment, that a digital artifact operated in? The ability to automate software production has led a number of people to suggest that, in 2026, it is both easier and more efficient to simply have a coding agent rewrite a digital artifact from scratch. Which has had the interesting side-effect of validating the Cooper Hewitt's approach to collecting software in 2013.</p>
	    
          </div>

	  <div class="slide" id="tools-030">
	    <div class="image640">
	      <a href="/weblog/2026/04/09/temptation#tools-030"><img src="/weblog/2026/04/09/temptation/images/tools-030.jpg" loading="lazy"/></a>
	    </div>

	    <p>In 2013, I was part of the <a href="https://labs.cooperhewitt.org/">Digital and Emerging Media team</a> at the Cooper Hewitt Smithsonian National Design Museum. That year <a href="https://www.cooperhewitt.org/2013/08/26/planetary-collecting-and-preserving-code-as-a-living-object/">we acquired an iOS application</a>, a music player and visualization tool, called <a href="https://collection.cooperhewitt.org/objects/35520989/">Planetary</a>. One of the notable aspects of the acquisition was that we knew, in advance of the acquisition, that the application would stop working as soon as the next major version of the iOS operating system (iOS 7) was released.</p>

	    <p>As such, we adopted <a href="https://aaronland.info/weblog/2014/04/03/things/#mw2014">a very deliberate acquisition strategy</a>: We, being the Smithsonian, acquired the software from the company which produced it, Bloom Studios, and then the Smithsonian, rather than Bloom, relicensed that code as an open-source project. We, of course, also printed out that code and put it in a box on a shelf because that is historically what museums have done with digital acquisitions. You can see it there on your left: <code>DIG001</code>, the Cooper Hewitt's first digital acquisition. We chose to open source Planetary for two reasons.</p>

	    <p>The first was as a well-intentioned provocation to encourage the broader cultural heritage sector to think about what it means to acquire complex systems like this which comprise not just the software itself but the operating system it runs on in addition to the hardware that the operating system in turn runs on. All of these are dynamic systems produced by commercial interests which rarely have the inclination or the means to consider historical preservation. Systems which most cultural heritage organizations have neither the technical knowledge nor the infrastructural capacity to reproduce. Mostly, we have done little more than preserve literal snapshots (photos) of these systems. If we can convert something into 2D media, be it photographs or other printed material, we know how and what to do to keep it safe for the future. That, however, feels wholly inadequate for digital systems which have become one of the defining characteristics of modern life.</p>

	    <p>Which led us to the second reason: That <q>design</q>, the bread and butter of the museum, had already become increasingly ephemeral and intangible in its manifestations and we, as a museum, really didn't have any idea how to collect things for which there isn't always a physical object. What does it mean to collect service design or experience design? What does it mean to collect an iPhone solely as an industrial design object but absent any way to convey what the introduction of <q>pinching and zooming</q> represented? The list goes on.</p>

	    <p>The acquisition of the Planetary application, then, was not about the specific Objective-C code running on an iPad. It was the acquisition of the interaction design and the visualizations those two things made possible. The fact that these were executed in a particular programming language and on a particular hardware device tells you something about the context, and the constraints, in which the application was created but they do not define the application itself. Our hope was that one or more people would port the iOS code to one or more <em>other</em> platforms as a way to demonstrate the value and artistry of the design choices that Bloom had made.</p>

	    <p>For many years no one did. A lot of people thought we were crazy, <a href="https://www.cooperhewitt.org/2019/05/16/planetary-cooper-hewitts-first-ios-app/">perhaps even irresponsible</a>, in how we had gone about the acquisition. Then one day, <a href="https://aaronland.info/weblog/2020/09/10/story/#software">seven years later</a>, an Australian software developer named Kemel Enver published <a href="https://apps.apple.com/us/app/planetary-remastered/id1473561807">Planetary Remastered</a> to the iOS App Store. Enver, who had simply found the <a href="https://github.com/cooperhewitt/Planetary/">source code on GitHub</a> and who had not read any of the many talks and essays underpinning its release, had done exactly <a href="https://www.cooperhewitt.org/2022/02/16/a-love-letter-to-planetary/">the thing we had always hoped would happen</a>.</p>

          </div>

	  <div class="slide" id="tools-031">
	    
	    <div class="image640">
	      <a href="/weblog/2026/04/09/temptation#tools-031">
	      <video controls="true" style="max-width:100%">
		<source src="/weblog/2026/04/09/temptation/images/planetary.mp4" type="video/mp4"/>
		<!-- <img src="/weblog/2026/04/09/temptation/images/tools-031.jpg" loading="lazy" /> -->
	      </video>
	      </a>	      
	    </div>

	    <p>This is the video that Cooper Hewitt posted to announce Enver's contributions. I am going to play it in its entirety so you can see the breadth and depth of the application and the scope of Enver's efforts. Six years after his updates, is it possible to imagine that his work could have been completed, or at least meaningfully aided, by a sufficiently robust large language model? I am not sure but I don't think the answer is a categorical <q>No</q>. It is probably something like a cautious <q>Maybe?</q></p>

          </div>

	<div class="slide" id="tools-032">
            <div class="image640">
              <a href="/weblog/2026/04/09/temptation#tools-032"><img src="/weblog/2026/04/09/temptation/images/tools-032.jpg" loading="lazy"/></a>
	    </div>

	    <p>And in fairness, I had to download <a href="https://www.youtube.com/watch?v=tB_28-56Hnc">that video</a> from YouTube which produced something called a <q>mkv</q> file which... <a href="https://www.adobe.com/creativecloud/file-types/video/container/mkv.html">I don't know what that is either</a>. So I asked a large language model running on my laptop how to convert it in to a plain-vanilla movie file. And it worked. This stuff <em>does</em> work sometimes. At least until it doesn't.</p>
	    
          </div>

	<div class="slide" id="tools-033">
            <div class="image640">
	      <a href="/weblog/2026/04/09/temptation#tools-033"><img src="/weblog/2026/04/09/temptation/images/tools-033.jpg" loading="lazy"/></a>

	      <div class="creditline">
		<div><strong>Negative: San Francisco International Airport (SFO), bond issue</strong> 1967</div>
		<div>Collection of SFO Museum</div>
		<div>Collection of SFO Museum <a href="https://collection.sfomuseum.org/objects/1511946817/">2011.032.1539</a></div>
 	      </div>
	    </div>

	    <p>It is important to point out that much, if not all, of the boosterism around AI depends on three things. It is also important to point out that these premises are not necessarily ill-founded. They all rely on evidence of having overcome similar challenges in the past or believing that these are consequences which can still be externalized. You and I may not agree that these analogues hold true in the current moment but a person might be forgiven for entertaining the idea that they do.</p>
	    
          </div>

	<div class="slide" id="tools-034">
            <div class="image640">
	      <a href="/weblog/2026/04/09/temptation#tools-034"><img src="/weblog/2026/04/09/temptation/images/tools-034.jpg" loading="lazy"/></a>

	      <div class="creditline">
		<div><strong>Negative: San Francisco International Airport (SFO), United Air Lines, Douglas DC-8</strong> 1959</div>
		<div>Collection of SFO Museum</div>
		<div>Collection of SFO Museum <a href="https://collection.sfomuseum.org/objects/1511944311/">2011.032.0484</a></div>
	      </div>
	    </div>

	    <p>The first assumption is that we will solve not only the environmental costs of these systems but also the human and social costs these technologies incur. These are already well-documented elsewhere so I am not going to go through the laundry list here. While some of the data is hard to come by, difficult to verify independently and sometimes subject to interpretation I think when you tally up just those costs across the board – environmental, human and social – it is clear this technology is unsustainable without some serious remediation or a complete disregard for others.</p>
	    
          </div>

	<div class="slide" id="tools-035">
            <div class="image640">
	      <a href="/weblog/2026/04/09/temptation#tools-035"><img src="/weblog/2026/04/09/temptation/images/tools-035.jpg" loading="lazy"/></a>

	      <div class="creditline">
		<div><strong>Complimentary cocktail voucher: Hughes Airwest</strong> c. 1977</div>
		<div>Gift of Trudy and Henry Sandoval</div>
		<div>Collection of SFO Museum <a href="https://collection.sfomuseum.org/objects/1762830923/">2010.012.059</a></div>
	      </div>
	    </div>

	    <p>The second assumption is that all of this technology will be commoditized. This is not a crazy idea on the face of it. It is an idea which is central to most of contemporary life. That the cutting edge will be paid for by investors and early adopters and once the shape of its commercialization is worked out it will be standardized and mass-produced, making it ubiquitous and low-cost. Profits, in turn, are recouped in volume. Everyone wins, for some definition of winning. It is a model that lacks romance but it is a model which has proven itself to work over and over again so it's not completely far-fetched to imagine it could happen for AI.</p>

	    <p>The problem is that I am not sure this model applies to AI. Partly for reasons already discussed, namely that the very infrastructure which makes general-purpose AI useful does not lend itself to commodification. Partly because I think when you look at the stated motivations, past and present, of many of the actors involved in promoting these technologies it is clear they are trying to create an all-encompassing framework where every interaction passes through one or more AI systems and each one of those events is both metered and billed.</p>

	    <p>We are desperate to believe that these technologies will be commodified because when they <em>do</em> work their promise is obvious but no one is bearing their true cost yet. None of the big commercial AI vendors are turning a profit yet and continue to burn through money at a spectacular rate. All of these interactions are, in effect, subsidized. That can be <a href="https://craigmod.com/roden/112/">a golden spot to find oneself in</a>, for as long as it lasts, particularly if you are using these tools to build things which previously didn't exist and, importantly, which don't depend on those tools to continue working.</p>

	    <p>People aren't excited about this stuff for nothing. They may be excited mostly because this stuff is basically free right now but that is not nothing. The problem is that I guarantee you the bill will come due soon. We are already starting to see that happen in the ways that these services are being marketed. It is gradual because there is a commercial incentive to not spook people but the real price is probably coming faster than anyone expects, or wants, and it will probably be ugly.</p>
	    
          </div>

	<div class="slide" id="tools-036">
            <div class="image640">
	      <a href="/weblog/2026/04/09/temptation#tools-036"><img src="/weblog/2026/04/09/temptation/images/tools-036.jpg" loading="lazy"/></a>
	      <div class="creditline">
		<div><strong>Watch band calendar set: Cathay Pacific Airways</strong> 1971</div>
		<div>Gift of Vincent Ma</div>
		<div>Collection of SFO Museum <a href="https://collection.sfomuseum.org/objects/1796996915/">2021.020.0766</a></div>
	      </div>
	      
	    </div>

	    <p>The third assumption is that we will simply adapt to the needs of these systems. Ruby Justice, in his talk <a href="https://www.youtube.com/watch?v=kNDnXadgvqo">Is AI Making Us Cyborgs?</a> talks about this as a kind of <q>numerical facsimile</q>. You already see this happening in vibe-coding circles where people are twisting themselves in knots to decipher the best language and phraseologies to use when providing instructions to a large language model. This has already prompted others to point out that any <q><a href="https://haskellforall.com/2026/03/a-sufficiently-detailed-spec-is-code">sufficiently detailed specification is code</a></q> but it wouldn't be the first time we have tried to mold our collectivity around ways of being. It's worth pointing out that those efforts have rarely ended well. They often yield at least a generation's worth of atrocities and traumas but it can be done.</p> 
	    
          </div>

	<div class="slide" id="tools-037">
            <div class="image640">
	      <a href="/weblog/2026/04/09/temptation#tools-037"><img src="/weblog/2026/04/09/temptation/images/tools-037.jpg" loading="lazy"/></a>

	      <div class="creditline">
		<div><strong>Negative: San Francisco International Airport (SFO), artwork display</strong> 1968</div>
		<div>Collection of SFO Museum</div>
		<div>Collection of SFO Museum <a href="https://collection.sfomuseum.org/objects/1511947033/">2011.032.1636</a></div>
	      </div>
	    </div>

	    <p>So, what does this mean for museums? Honestly, no one knows.</p>
	    
          </div>

	<div class="slide" id="tools-038">
            <div class="image640">
              <a href="/weblog/2026/04/09/temptation#tools-038"><img src="/weblog/2026/04/09/temptation/images/tools-038.jpg" loading="lazy"/></a>
	    </div>

	    <p>On good days, I like to believe that all these new-found externalized metabolisms will be the catalyst that allows the sector to move beyond decades worth of learned helplessness and disbelief about the kinds of things we can do with limited staff and limited budget.</p>
          </div>

	<div class="slide" id="tools-039">
            <div class="image640">
              <a href="/weblog/2026/04/09/temptation#tools-039"><img src="/weblog/2026/04/09/temptation/images/tools-039.jpg" loading="lazy"/></a>
	    </div>

	    <p>On bad days, I fear it will only reinforce the existing organizational and power dynamics in museums: That, at least in North America, there has only ever been a thin crust of well-paid senior executives operating museums built atop the bones of an army of replaceable grad students (or recently graduated students saddled with debilitating debt) crushed underfoot.</p>

	    <p>Will the curatorial department use these systems to finally do away with the education department? Will the director's office use these systems to replace the curatorial department? Will the board simply do away with the entire staff, director included, using these systems to create on-demand programming as their whim and folly suit them? Will all of a museum's technology dreams finally come true because those dreams were always <a href="https://www.aaronland.info/weblog/2020/04/06/futures/#capacity">about ephemeral flash rather than integrating technology in to the museum practice</a> and the cost of an <q>AI solution</q> is simply less than the amount already spent on external vendors?</p>

	    <p>I don't know but I do expect at least one museum will try one, if not all, of these things in the next few years.</p>
	    
          </div>

	<div class="slide" id="tools-040">
            <div class="image640">
              <a href="/weblog/2026/04/09/temptation#tools-040"><img src="/weblog/2026/04/09/temptation/images/tools-040.jpg" loading="lazy"/></a>
	    </div>

	    <p>The one thing I do know is that we – the cultural heritage sector – are little more than passive consumers right now. What little agency we have consists only of which commercial services we choose to use. We are able to choose who we give the proverbial <q>twenty bucks a week</q> to but that's about it.</p>
          </div>

	<div class="slide" id="tools-041">
	  <div class="image640">
	    <a href="/weblog/2026/04/09/temptation#tools-041"><img src="/weblog/2026/04/09/temptation/images/tools-041.jpg" loading="lazy"/></a>
	  </div>

	  <p>Ideally, we – the community of cultural heritage organizations – should be working to develop our own large language models. Models that strive to match the comprehensiveness of the commercial <q>frontier</q> models and which best them at the pesky details like provenance, governance, remuneration, bias and privacy. If that sounds hard and daunting that's because it is. In 2026, it is not just hard but effectively impossible given the difficult corner that the cultural heritage sector has painted itself into on the question of a sustainable technology practice. It is still hard for many museums to operate a single website for more than five years so it would be unrealistic to think they have the capacity to support, let alone develop, capacities involving machine learning.</p>

	  <p>And yet, if we don't – if we don't even <em>try</em> – to address these deficits then we are probably assured a cultural heritage born of mystery-meat black-box systems whose background and biases are as confusing and opaque as their answers. Or only available to those who can afford to pay for the <q>correct</q> answer. If we are comfortable going back to a world where the knowledge, the libraries, of the world are only available to the aristocracy and everyone else struggles to get by on rumour and innuendo then I guess our work is done and we can just go for a drink now.</p>

	  <p>Which is a bit of a grim note to end on so, instead, I am going to tell you about some of the work we're doing involving large language models at SFO Museum and about the larger <q>meta</q> project to try and use that work to help us become producers, or at least something which tilts towards producing, rather than mere consumers. These are very much baby steps and it is still not clear whether they will amount to much but they are <em>something</em>.</p>
	  
	</div>
	
	<div class="slide" id="tools-042">
            <div class="image640">
	      <a href="/weblog/2026/04/09/temptation#tools-042"><img src="/weblog/2026/04/09/temptation/images/tools-042.jpg" loading="lazy"/></a>
	      <div class="creditline">
		<div>Behind Ted McMann's Garage, James Torlakson</div>
		<div>San Francisco Arts Commission</div>
		<div><a href="https://www.sfomuseum.org/public-art/public-collection/behind-ted-mcmanns-garage">SFAC 7</a></div>
	      </div>
	    </div>

	    <p>This is what James Torlakson's painting <a href="https://www.sfomuseum.org/public-art/public-collection/behind-ted-mcmanns-garage">Behind Ted McMann's Garage</a>, on display at SFO in Terminal 3, <q>looks</q> like in a 512-dimensional space. This giant list of numbers is the kind of thing that large language models convert text and images in to in order to find similar things in an n-dimensional blob of <q>all the things</q> which more accurately, because it bears repeating, is really a blob of <q>all the things the model was trained on</q>.</p>

	    <p>These big lists of numbers are called vector embeddings and, so far, we have created <a href="https://millsfield.sfomuseum.org/blog/2026/01/09/similar/">three different sets of embeddings derived from three different large language models</a> for every object image in our collection, every Instagram photo we've posted and all the exhibition photos we've produced. We generate these once and store those data locally on our servers. Then we put all those data into <a href="https://github.com/sfomuseum/go-embeddingsdb">a special database designed to index and query vector embeddings</a>. Importantly, the database doesn't know anything about the models which created the embeddings. It only knows about the embeddings and the math necessary to find other similar embeddings.</p>
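	    <p>To give a sense of what <q>the math necessary to find other similar embeddings</q> can look like, here is a minimal sketch in Python. It assumes cosine similarity as the measure, which is a common choice but not necessarily the one our database uses, and it is not that database's API. The function names, the object identifiers and the tiny four-dimensional vectors are all made up for illustration; real embeddings have hundreds of dimensions.</p>

<pre>
```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(query_id, embeddings, k=3):
    """Rank every other embedding by its similarity to the query embedding.

    Nothing here knows which model produced the vectors. It only
    compares lists of numbers.
    """
    query = embeddings[query_id]
    scored = [
        (other_id, cosine_similarity(query, vector))
        for other_id, vector in embeddings.items()
        if other_id != query_id
    ]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical, hand-written vectors standing in for real embeddings.
embeddings = {
    "fan": [0.9, 0.1, 0.0, 0.2],
    "matchbox": [0.8, 0.2, 0.1, 0.3],
    "poster": [0.1, 0.8, 0.3, 0.0],
    "safety-card": [0.2, 0.7, 0.4, 0.1],
}

for object_id, score in most_similar("fan", embeddings):
    print(object_id, round(score, 3))
```
</pre>

	    <p>A brute-force comparison like this is fine for a handful of vectors. A database built for embeddings exists, in part, so that the same question can be answered quickly across an entire collection.</p>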
	    
          </div>

	  <div class="slide" id="tools-043">
	    <div class="image640">
	      <a href="/weblog/2026/04/09/temptation#tools-043"><img src="/weblog/2026/04/09/temptation/images/tools-043.jpg" loading="lazy"/></a>
	      <div class="creditline">
		<div><strong>Fixed fan: Japan Air Lines</strong> 1954</div>
		<div>Gift of Thomas G. Dragges</div>
		<div>Collection of SFO Museum <a href="https://collection.sfomuseum.org/objects/1913842055/">2002.035.186</a></div>
	      </div>
	    </div>

	    <p>Here's an example of what that looks like. These are the most <q>similar</q> objects in our collection to this Japan Air Lines fan from 1954.</p>
	    
          </div>

	<div class="slide" id="tools-044">
	  <div class="image640">
	    <a href="/weblog/2026/04/09/temptation#tools-044"><img src="/weblog/2026/04/09/temptation/images/tools-044.jpg" loading="lazy"/></a>
	    <div class="creditline">
	      <div><strong>Poster: Swissair, fleet history</strong> 1970s</div>
	      <div>Gift of the William Hough Collection</div>
	      <div>Collection of SFO Museum <a href="https://collection.sfomuseum.org/objects/1779488339/">2006.010.431</a></div>
	    </div>
	  </div>
	  
	  <p>Or for this poster of the Swissair fleet history. If you look carefully you'll notice that the list of similar objects – other fleet history posters – includes a flight-safety card. Why? As I've said before: <em>No one really knows</em>. With that in mind, part of the larger <q>meta</q> project I mentioned earlier becomes producing vector embeddings for as much imagery as we can for as many different models as we can. The goal is to compile as large a dataset of inputs and outputs as possible with which we might be able to test those models and embeddings with data we can recognize.</p>
	  
	</div>

	<div class="slide" id="tools-045">
	  <div class="image640">
	    <a href="/weblog/2026/04/09/temptation#tools-045"><img src="/weblog/2026/04/09/temptation/images/tools-045.jpg" loading="lazy"/></a>
	    <div class="creditline">
	      <div><strong>Photograph: Pan American World Airways, Atlantic Division</strong> c. 1950</div>
	      <div>Gift of William Craig</div>
	      <div>Collection of SFO Museum <a href="https://collection.sfomuseum.org/objects/1998536701/">2023.093.130 a b</a></div>
	    </div>
	  </div>

	  <p>I've spent a lot of time talking about the idea of gating functions to correct the potential for weird or incorrect responses generated by large language models. By contrast, the similar objects feature, which is enabled for every object on <a href="https://collection.sfomuseum.org/">our collection website</a> now, relies on a measure of weirdness in the results as a way to make things interesting. It depends on relationships between objects which are <em>similar but not the same</em> as a way to invite discovery and to surface things which would otherwise never be found.</p>
	  
	</div>

	<div class="slide" id="tools-046">
            <div class="image640">
	      <a href="/weblog/2026/04/09/temptation#tools-046"><img src="/weblog/2026/04/09/temptation/images/tools-046.jpg" loading="lazy"/></a>
	      <div class="creditline">
		<div><strong>Luggage label: Sabena Belgian World Airlines, Canada</strong> c. 1960</div>
		<div>Gift of the Captain John B. Russell Family</div>
		<div>Collection of SFO Museum <a href="https://collection.sfomuseum.org/objects/1762888613/">2012.149.1296</a></div>
	      </div>
	    </div>

	    <p>I've also spent a lot of time talking about the scale of infrastructure necessary to operate large language models so I also want to make sure to point out that all the data processing which makes the similar objects feature possible is done <a href="https://millsfield.sfomuseum.org/blog/2026/02/10/docent/">offline on consumer-grade hardware</a> (a Mac Mini) in the closet. So, again, the larger <q>meta</q> project becomes about identifying and demonstrating ways and means by which we (and other museums) might entertain, or at least investigate, the use of these technologies without a third-party intermediary or a recurring monthly service fee.</p>
	    
          </div>

	  <div class="slide" id="tools-047">
	    <div class="image640">
	      <a href="/weblog/2026/04/09/temptation#tools-047"><img src="/weblog/2026/04/09/temptation/images/tools-047.jpg" loading="lazy"/></a>
	    </div>

	    <p>I mentioned that we had created vector embeddings not just for images of objects in our collection but for other sources, like <a href="https://millsfield.sfomuseum.org/instagram">photos posted to Instagram</a>. Obviously, the next thing we did was create vector embeddings for <em>other</em> museum collections. This is a screenshot of an object, and similar objects, from the <a href="https://www.nga.gov/">National Gallery of Art</a> (NGA).</p>
	    
          </div>
	  
	  <div class="slide" id="tools-048">
	    <div class="image640">
	      <a href="/weblog/2026/04/09/temptation#tools-048"><img src="/weblog/2026/04/09/temptation/images/tools-048.jpg" loading="lazy"/></a>
	    </div>

	    <p>This is a screenshot of a photograph by Jeff Divine, part of the <a href="https://www.sfomuseum.org/exhibitions/jeff-divine-1970s-surf-photography">1970s Surf Photography</a> exhibition, next to similar objects from the NGA. Cross-institutional collections search continues to be something of a <q>holy grail</q> in museums and other cultural heritage institutions. Vector-based image similarity, while imperfect, offers a meaningful first step to finally achieving it.</p>
	  </div>

	<div class="slide" id="tools-049">
	  <div class="image640">
	    <a href="/weblog/2026/04/09/temptation#tools-049"><img src="/weblog/2026/04/09/temptation/images/tools-049.jpg" loading="lazy"/></a>
	  </div>

	  <p>It is a messy, or at least fuzzy, solution and tilts more towards the <q>discovery and surprise</q> end of the spectrum than towards informed and academic research. It is important to be realistic about what it affords and what it doesn’t. While it may not be a tool for scholars and experts it can still provide meaningful avenues for non-experts to investigate and discover the relationship between different collections. This was still the stuff of fantasy a few years ago and now it is within our reach using nothing more than consumer-grade computer hardware.</p>

	  <p>This is a screenshot showing the James Torlakson painting and similar images from the Flickr <a href="https://www.flickr.com/groups/airports-sfo/">SFO airport</a> group photo pool. These results may not be <q>right</q> but they aren't wrong either.</p>
	</div>

	<div class="slide" id="tools-050">
	  <div class="image640">
	    <a href="/weblog/2026/04/09/temptation#tools-050"><img src="/weblog/2026/04/09/temptation/images/tools-050.jpg" loading="lazy"/></a>
	  </div>

	  <p>This is another photo from that same Flickr group pool paired with similar photos from the Museum's Instagram account. We have not integrated similar images from other sources with our collection yet. There are a whole bunch of considerations to address but there are many things to like about the idea.</p>
	</div>

	<div class="slide" id="tools-051">
	  <div class="image640">
	    <a href="/weblog/2026/04/09/temptation#tools-051"><img src="/weblog/2026/04/09/temptation/images/tools-051.jpg" loading="lazy"/></a>
	  </div>

	  <p>The other thing we've started doing is <a href="https://static.sfomuseum.org/embeddings/">publishing these vector embeddings</a> for others to download and use as their circumstances merit. We have also posted the embeddings we have generated for other museums, derived from their respective open data releases, and are encouraging them, and others, to do the same. Vector embeddings still take time and resources to create. Given that any given organization is already producing these products for their own internal use <a href="https://millsfield.sfomuseum.org/blog/2026/04/06/shared-embeddings/">why not also share them with others</a> and save everyone the duplicated effort?</p>

	  <p>The goal is not to be prescriptive about the use of these embeddings. It is too soon for that. The goal is to produce enough data that we, and others, might figure out not just what we want to do with these data but also understand the tolerances and affordances that vector embeddings, and the machine learning tools which produce them, allow. This work will not solve the larger issues we face when it comes to AI technologies but perhaps it will help us, the collective <q>us</q>, to develop the skills and to demonstrate the ability to use these tools on our own terms.</p>
	  
	</div>

	<div class="slide" id="tools-052">
	  <div class="image640">
	    <a href="/weblog/2026/04/09/temptation#tools-052"><img src="/weblog/2026/04/09/temptation/images/tools-052.jpg" loading="lazy"/></a>
	  </div>
	  
	  <p>In closing, here is a screenshot showing the AI-generated hippo <q>holding</q> an airplane, the one I mentioned at the beginning of this talk, with similar objects from the NGA. Again, not right but, somehow, not wrong either.</p>
	  
	</div>
	
	<div class="slide" id="tools-053">
	  <div class="image640">
	    <a href="/weblog/2026/04/09/temptation#tools-053"><img src="/weblog/2026/04/09/temptation/images/tools-053.jpg" loading="lazy"/></a>
	  </div>
	  
	  <p>Thank you.</p>
	</div>
	
      </div>
    </div>]]></description>
    </item>
  </channel>
</rss>
