this is aaronland

things I have written about elsewhere #20231129

Experiments in Photogrammetry

Negative: San Francisco International Airport (SFO), film crew. Negative. Transfer from San Francisco International Airport, SFO Museum Collection. 2011.032.2109

This was originally published on the SFO Museum Mills Field weblog, in November 2023.

This is a blog post about using consumer-grade hardware and a suite of open-source software tools to generate high-quality three-dimensional (3D) models from photographic imagery, a process commonly referred to as "photogrammetry", that can be viewed and interacted with in a web browser. This is a fairly technical blog post but the non-technical takeaways are: it's possible; it's affordable, or at least meaningfully cheaper than it used to be; it's harder than it should be; we have developed tooling to make things easier; and we would love your help to make it all better.

The catalyst for this work was the Meet Object Capture for iOS session at Apple's 2023 developer conference. This session introduced the ability for iOS devices equipped with a LiDAR camera not only to capture the imagery used to derive 3D models but also to render those models "on device". In 2023 an "iOS device equipped with a LiDAR camera" can cost as little as $600 for a refurbished iPhone 12 Pro or $800 for an entry-level iPad Pro. As part of the developer session Apple provided a sample application demonstrating how to use these new features. We have taken that application and used it as the first building block in what we imagine could be an efficient pipeline for capturing and publishing 3D models within the operational and financial constraints of a museum. This blog post outlines that work, describing the hiccups we encountered along the way and how we addressed them.

Before going any further I want to mention that the object used to test photogrammetry tools in this blog post is not an object from the SFO Museum Aviation Collection. It's an object I brought in from home and it was chosen because it is a complex shape with lots of textures and odd surfaces. In short, it's an object that is more like a lot of the things in a museum collection than not, and it highlights some of the challenges (or at least considerations) that remain when creating 3D objects. In fact, the final model used for this blog post is incomplete precisely because I wasn't very careful when considering which angles and surfaces to scan. I chose to leave the incomplete model as-is to demonstrate the results of that inattention.

Capturing

As mentioned, capturing imagery to create 3D models is done using a fork of Apple's Scanning objects using Object Capture example application included with WWDC23 session 10191: Meet Object Capture for iOS. This is a very good application, providing an intuitive interface, useful feedback, and helpful prompts about lighting and about which parts of an object still need to be scanned in order to produce a good model.
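For context, the sample application is built on top of the ObjectCaptureSession and ObjectCaptureView APIs that RealityKit added in iOS 17. What follows is a minimal sketch of that pattern, not the sample application itself; the directory paths are placeholders and all of the sample's state management and guidance UI is omitted.

import SwiftUI
import RealityKit

// A minimal sketch of the RealityKit Object Capture APIs (iOS 17+).
// The paths below are placeholders, not the sample application's own.
struct GuidedCaptureView: View {

	@State private var session = ObjectCaptureSession()

	var body: some View {
		// ObjectCaptureView renders the camera feed along with the
		// session's on-screen guidance and feedback.
		ObjectCaptureView(session: session)
			.onAppear {
				var configuration = ObjectCaptureSession.Configuration()
				configuration.checkpointDirectory = URL(fileURLWithPath: "/path/to/Checkpoints/", isDirectory: true)

				// Images captured during the scan are written to this folder;
				// these are the files that later get transferred to a Mac
				// for higher-quality rendering.
				session.start(
					imagesDirectory: URL(fileURLWithPath: "/path/to/Images/", isDirectory: true),
					configuration: configuration
				)
			}
	}
}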

The motivation behind our fork is to use the existing application for scanning objects but to update it and add functionality for transferring (scan) data to a macOS device for rendering higher-quality models, described below. As of this writing our application contains the following changes:

We have not published a "finished" version of this application so if you want to try it out you will need to download the code and compile it locally.

Rendering

After running the "guided capture" application I have a finished model that I can view and interact with on-device. I can even export the finished model but because it was rendered on an iOS device it is created with a lower resolution ("preview") than the museum would want for this kind of work.

We have developed a Swift library and command-line tool to render a set of scanned images at any of the available resolutions. There is also a good macOS application called Photogrammetry for doing the same thing with a graphical user interface but we developed our own library and command-line tool because we imagine integrating it into an automated (and "headless") pipeline for processing 3D models.
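For the curious, Apple exposes this functionality on macOS through RealityKit's PhotogrammetrySession API. What follows is a simplified sketch of how that API can be driven from a command-line program, and not the sfomuseum library itself; the paths are placeholders.

import Foundation
import RealityKit

// A simplified sketch of driving RealityKit's PhotogrammetrySession API
// (macOS 12+) from a command-line program. The paths are placeholders.
let imagesFolder = URL(fileURLWithPath: "/path/to/Images/", isDirectory: true)
let modelURL = URL(fileURLWithPath: "/path/to/model.usdz")

do {
	let session = try PhotogrammetrySession(input: imagesFolder)

	// Watch the session's output stream for progress, errors and completion.
	Task {
		do {
			for try await output in session.outputs {
				switch output {
				case .requestProgress(_, let fractionComplete):
					print("progress: \(fractionComplete)")
				case .requestError(_, let error):
					print("error: \(error)")
					exit(1)
				case .processingComplete:
					print("wrote \(modelURL.path)")
					exit(0)
				default:
					break
				}
			}
		} catch {
			print("output stream failed: \(error)")
			exit(1)
		}
	}

	// Ask for a single USDZ model at "medium" detail; other detail levels
	// include .preview, .reduced, .full and .raw.
	try session.process(requests: [
		.modelFile(url: modelURL, detail: .medium)
	])
} catch {
	print("unable to start session: \(error)")
	exit(1)
}

// Keep the process alive while the session does its work.
RunLoop.main.run()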

The command-line tool takes as its input the path to a folder containing images of a scanned object, the path where the 3D model derived from those images will be written and a set of optional flags to specify how the model should be created. In this example I am creating a model with a resolution of "medium", which will produce a 27MB file; by comparison, a model rendered at "preview" resolution will be about 2.5MB and a model rendered at "full" resolution will be over 70MB.

$> cd swift-photogrammetry-render-cli
$> swift build

$> ./.build/debug/render \
	/usr/local/data/Scans/2023-11-20T19:18:18Z/Images/ \
	/usr/local/data/bollard.usdz \
	--detail medium

Remember: After completing the image capture and rendering the object on iPad (shown above) I have connected that iPad to a macOS device and transferred the images manually into the /usr/local/data/ folder.

After the model is created I can view and interact with it in the macOS Finder application.

Here's that model zoomed in on a detail in Preview.app. This model was derived from images captured using the camera on an iPad. The fact that a (lower-resolution) model can also be generated on that same iOS device suggests that the upcoming Vision Pro headset will be capable of continuously "photogrammetry-ing" everything around it, which is both fascinating and a little terrifying.

These models are gaining support throughout Apple's operating systems and applications. For example, I can load these models into Keynote presentations or even into visionOS applications developed for the Vision Pro headset. Here's a screenshot of the latter running in Xcode's Vision Pro simulator, which conveniently ships with a "museum" environment:

On the other hand, I cannot yet view these models in a web browser "as-is".

USDZ

There are a lot of different file formats for 3D models but this is not the time or the place to discuss their relative merits. Apple, for its own reasons, has chosen the USDZ format, a zip-packaged variant of Pixar's Universal Scene Description (USD) format.

Likewise, there are a number of different frameworks for rendering 3D models in a web browser but, for the purposes of this blog post, we started with a framework called three.js, which has been around for a number of years and supports a variety of 3D model formats, including USDZ. For example, here is some abbreviated JavaScript code to do just that:

// Assumes an import map that resolves "three/addons/" to three.js's examples/jsm/ folder.
import { USDZLoader } from 'three/addons/loaders/USDZLoader.js';

const loader = new USDZLoader();

loader.load('./bollard.usdz', function (usd){
	scene.add(usd);
});

But when I try to run that code in a web browser this is what I see:

Specifically, this error message:

THREE.USDZLoader: No usda file found.

It turns out that USDZ files can contain either a "USDA" or a "USDC" mesh. For example, here are the contents of a "USDZ" model produced by Apple:

M    Mode       Size         Date&time         Filename
- ----------  --------  --------------------  ----------------
  -rw-rw-rw-   2690955  21-Nov-2023 11:22:22  baked_mesh.usdc
  -rw-rw-rw-  26301765  21-Nov-2023 11:22:18  0/baked_mesh_disp0.exr
  -rw-rw-rw-  22684331  21-Nov-2023 11:22:18  0/baked_mesh_tex0.png
  -rw-rw-rw-   9811748  21-Nov-2023 11:22:18  0/baked_mesh_norm0.png
  -rw-rw-rw-   4326353  21-Nov-2023 11:22:18  0/baked_mesh_ao0.png
  -rw-rw-rw-   9092442  21-Nov-2023 11:22:18  0/baked_mesh_roughness0.png
- ----------  --------  --------------------  ----------------
              74907594                         6 files

At this point, I'll confess that I don't really understand what the difference is between a "USDA" and a "USDC" mesh except that Apple uses the latter and everyone else, it seems, expects the former. During my investigations, I did discover that Pixar has released command line tools for converting between the two. These tools require time and patience to install so what follows is not a recommendation to use them (for reasons that will become clear shortly) so much as a demonstration that it is, technically, possible to convert a USDZ model generated by Apple into a USDZ model that can be consumed by everyone else.

$> mkdir bollard
$> cp bollard.usdz bollard

$> cd bollard
$> unzip bollard.usdz

$> /opt/local/USD/bin/usdcat baked_mesh.usdc -o baked_mesh.usda

$> zip -r ../bollard-a.usdz ./

Which all works fine until I view the new model, which ends up being rendered like this:

While this model has a certain quality, and could be an interesting jumping-off point for some larger discussions about representation, interpretation and mediation, it's really not going to be suitable for the purposes of a museum collection:

glTF

Did I mention that there are a lot of 3D object model formats? In addition to USDZ there is another one called glTF, which is becoming a common standard for, it seems, everyone except Apple and is well supported in web browsers. But how do we convert our USDZ model into a glTF model? By benefiting from the hard work of all the people who've contributed to the Blender project and of Rob Crosby, who wrote a Blender plugin to import and export a wide range of USDZ models.

Blender is an open-source 3D modeling application that supports plugins and scripting and can be run in "headless" mode from the command line. In short, we can combine all of these things alongside a custom script to import a USDZ model and then export it as a glTF model, like this:

$> blender --python \
	/usr/local/sfomuseum/BlenderUSDZ/scripts/usdz2glb.py \
	/usr/local/data/bollard.usdz \
	/usr/local/data/bollard.glb

A quick reminder: The reason we want to be able to script these things and run them from the command line is that our larger goal is to automate this entire process so that the output of the render tool, described above, can be fed to our usdz2glb.py Blender script to produce a glTF model without any manual intervention.

And then if we update the three.js code to import and render a glTF model (instead of a USDZ model) like this:

import { GLTFLoader } from 'three/addons/loaders/GLTFLoader.js';

const loader = new GLTFLoader();

loader.load('./bollard.glb', function (gltf){
	scene.add(gltf.scene);
});

Our model now appears in a web browser:

Using Blender to do this work feels a little bit like lighting a match with a tank but it does seem to "just work" and it (Blender) is supported in both Linux and Windows environments, which means this piece of the pipeline we're imagining could happen in the "cloud" or on some other set of computing resources not limited to Apple hardware.

Model Viewer

Recently, Google introduced the model-viewer Web Component for declaring and rendering 3D models (or at least glTF models) in just a couple of lines of HTML and JavaScript code. For example, this code:

<html>
    <head>
	<script type="module" src="model-viewer.min.js"></script>
	<style type="text/css">
	 model-viewer {
		 height: 100%;
		 width: 100%;
	 }
	</style>
    </head>
    <body>
	<model-viewer src="bollard.glb" ar environment-image="" shadow-intensity="1" camera-controls touch-action="pan-y"></model-viewer>
    </body>
</html>

Yields the following in a web browser:

Which is pretty great! three.js is an incredibly powerful framework for working in three dimensions in a web browser but, because it can do so many things, it is also a bit "too much" for simply rendering models in place; the model-viewer Web Component makes everything just a little bit simpler. So much so that we've included a model-viewer element, below, for the model discussed throughout this blog post:

Note: For reasons I don't completely understand I cannot get this model to load on my own site. For the time being you can see it over here. Computers, amirite?

Note: The W3C is developing the specification for a standard model HTML element to display 3D content across all browsers but it might still be a while before that work is completed.

Next Steps (Help Wanted)

This blog post describes everything we've done so far. There are a number of things to be improved, including but not limited to:

ios-guided-capture

swift-photogrammetry-render (-cli)

BlenderUSDZ scripts

The Code

We would welcome pointers, suggestions and contributions for any of these items. Our goal is, first, to have a suite of tools that SFO Museum can use to quickly, easily and affordably produce 3D models of its collection objects but also, as part of our commitment to "small focused tools", to publish this work as discrete, reusable (and rearrangeable) components for use in a variety of circumstances by a variety of cultural heritage organizations.

The cultural heritage sector needs as many small, focused tools as it can produce. It needs them in the long term to finally reach the goal of a common infrastructure that can be employed sector-wide. It needs them in the short term to develop the skill and the practice required to make those tools successful. We need to learn how to scope the purpose of, and our expectations for, any single tool so that we can be generous about, and learn from, the inevitable missteps and false starts that will occur along the way.

All of the code described in this blog post is available from the sfomuseum GitHub account: