In this episode, Dr. Austin Madson interviews Chris Crosby, the co-principal investigator and director of operations and strategy for the OpenTopography project. OpenTopography facilitates efficient access to topographic data, tools, and resources to advance our understanding of the Earth’s surface, vegetation, and the built environment. Funded by the National Science Foundation, it’s arguably the most comprehensive source of topographic data on the internet. During our session, we review the different resources available at OpenTopography.org (and who can/can’t access them) as well as the varied landscape of modern topographic data access and management.
Episode Transcript:
#8 – Chris Crosby
May 14th, 2024
Announcer: Welcome to the LIDAR Magazine Podcast, bringing measurement, positioning and imaging technologies to light. This event was made possible thanks to the generous support of rapidlasso, producer of the LAStools software suite.
Dr. Austin Madson: Alright, everyone. Welcome to the LIDAR Magazine podcast series. My name is Austin Madson, I'm an Associate Editor at LIDAR Magazine and I recently joined LIDAR Mag to help with these podcasts. I'm a big fan of the format here and there are just too many exciting topics for the magazine and the podcast to cover. So, I'll be supporting Stewart as an alternate host moving forward.
Just briefly, my background is in the application of remote sensing and earth models to answer science questions within the biosphere, the hydrosphere, the cryosphere and the lithosphere. But rather than share my full story now, look for an introductory episode with Stewart and myself in the coming weeks.
So, that said, let’s go ahead and get to it. Our guest today is Chris Crosby, someone we’ve wanted to talk to for a while now. Chris is the Co-PI and Director of Operations and Strategy for the really awesome NSF-funded OpenTopography project.
OpenTopography itself facilitates efficient access to topographic data, tools and resources to advance our understanding of the earth's surface, vegetation and the built environment. OpenTopo is based at the San Diego Supercomputer Center at the University of California San Diego, and Chris manages the day-to-day operations of OpenTopo. He is trained as a geologist who's worked on paleoseismology, earthquake geology and active tectonics research from the Western United States to the Caribbean and over to Central Asia.
Chris is really interested in utilization of cyberinfrastructure to manage and improve access to earth science data and processing tools. He’s an expert in the management, distribution, processing and application of high-resolution topographic data acquired via lidar and has guided the development of OpenTopography since its inception, really.
In addition to his work on OpenTopo, Chris also manages the Geodetic Imaging facility within the EarthScope Consortium, which was formerly known as UNAVCO. I actually met Chris several years ago when I was a data processing intern at UNAVCO during graduate school, so it's really great to come full circle and talk to Chris today.
So, that’s enough from me. Let’s go ahead and hear from Chris. Chris, why don’t you start by providing an overview of what OpenTopo is for our listeners that may not be super familiar with it and how the project even came about?
Chris Crosby: Sure. Thanks, Austin. Thanks for the invitation to be on the podcast. This is fun. As you said, it's nice to catch up, since we've known each other for quite a few years now.
Yes. So, OpenTopography is essentially an online clearinghouse for topographic data of all different types, so lots of lidar data in particular but also global types of topographic datasets that might be a 30-meter pixel resolution or even lower-resolution datasets. These are datasets that many people are familiar with, things like SRTM and the Copernicus DEMs and those kinds of products.
But OpenTopography is really this centralized place to discover those kinds of data and then to access them. So, we provide both centralized search and discovery of data but also a suite of tools that sit on top of these datasets that allow users to derive products dynamically, so things like “I want to make a digital elevation model of a specific resolution using a specific algorithm,” we will run that job for you in real time through the browser, effectively.
Then we also do things like data visualization, so derived products from digital elevation models—hillshades, slope, aspect, all those kinds of pretty standard topographic derivatives—and then also visualization of point clouds in the browser.
There's API access to a lot of these datasets for people who want programmatic access, and we have Jupyter Notebooks and other tools for larger-scale data processing.
And then the other thing that we’ve done a lot of is training, so we spend a lot of time teaching short courses, everything from intensive one-week short courses to quick-and-dirty webinar type of things on specific topics. I think we’ve done 35 or 40 of these types of training activities over the years.
The history of OpenTopography: we were founded officially and first funded by the National Science Foundation in 2009, but a lot of the work that we did to build the system goes back to my graduate work at Arizona State University in the mid-2000s, when lidar in particular was still a pretty emerging technology. There was a lot of enthusiasm around the data and a lot of interest in the scientific community in collecting these kinds of datasets for mapping of geologic features, like the San Andreas Fault, for example.
And these were datasets that were being collected with funding from the National Science Foundation and were meant to be community resources, so that multiple researchers, graduate students and others could work with these datasets to answer scientific questions.
But for anybody who has dealt with lidar point cloud data, you know that these datasets are pretty unwieldy, they’re large. Back in those days there wasn’t a lot of software support for the data. Things like the LAS file format were still just emerging and sometimes datasets were delivered in ASCII format and not in a binary format.
So, the data management around these kinds of data was really challenging, sharing these data was really hard, and so we started playing with this idea of putting lidar data in the browser—what does that look like?—and building tools that sit on top of those datasets.
So, this idea of colocation of data storage with processing resources is something we've been doing for 20 years now. And that, I think, is really the power of OpenTopo: we don't just deliver you tiles of data, we deliver products that people actually need. So, "I want contour lines derived from high-resolution topography," we'll make those for you dynamically.
Dr. Austin Madson: Well, speaking of dynamic processing and hosting data in the cloud and then processing it on the supercomputer down at UCSD, what kind of tools do you all use for all this heavy lifting? I noticed you're generating surface models and gridding out these massive point clouds and that you're creating HTML-based interactive viewers. Are you guys using Cesium and Potree? Can you talk a little bit about what you're using in the background for some of these core processing workflows?
Chris Crosby: Sure. Yes, so, in terms of hardware, OpenTopo is based at the San Diego Supercomputer Center, so we have access to both commodity compute resources and some high-performance computing. We also have access to things like cloud storage and other resources.
So, OpenTopo is a hybrid system that’s what we call fit-for-purpose in terms of hardware; so some jobs run on commodity machines, other jobs we spawn out to high-performance compute infrastructure. Some datasets we are accessing via the commercial cloud and so we run our processing in the commercial cloud so we’re adjacent to the data.
So, the architecture is pretty distributed as a function of what we think we need or where the data are located, and also in a way that tries to optimize the cost associated with running the facility, because it gets expensive.
So, that's the compute infrastructure, and then the software stack is based largely on the open source geospatial packages that many people listening to the podcast are probably familiar with. So, we're built on top of PDAL, the Point Data Abstraction Library, and GDAL, the Geospatial Data Abstraction Library; these are very common tools in the open source geospatial domain.
We have also historically used some LAStools under the hood for specific kinds of things, because LAStools is very fast and well-optimized for certain kinds of processing, and then we're also using PostGIS and a lot of the other open source geospatial tools in that stack.
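A minimal sketch of the kind of job this stack runs, using PDAL's Python bindings to grid a LAZ point cloud into a DEM. The filenames and parameter choices here are illustrative, not OpenTopography's actual internal workflow:

```python
# Illustrative PDAL pipeline: read a LAZ point cloud, keep ground-classified
# returns, and grid them into a GeoTIFF DEM. Filenames and parameters are
# hypothetical; this shows the style of job described, not OpenTopography's
# internal code.
import json
import pdal

pipeline_def = [
    "input_points.laz",                  # readers.las inferred from extension
    {
        "type": "filters.range",
        "limits": "Classification[2:2]", # ASPRS class 2 = ground
    },
    {
        "type": "writers.gdal",
        "filename": "dem.tif",
        "resolution": 1.0,               # 1 m grid cells
        "output_type": "idw",            # inverse-distance-weighted gridding
        "gdaldriver": "GTiff",
    },
]

pipeline = pdal.Pipeline(json.dumps(pipeline_def))
count = pipeline.execute()
print(f"Gridded {count} points into dem.tif")
```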
So, there's nothing about the OpenTopography software stack that's particularly revolutionary. What I think is really interesting about OpenTopography is the orchestration layer: when you're interacting with the portal, portal.opentopography.org, selecting data and parameterizing a workflow, we execute that workflow for you.
So, it's really the scheduling and queuing and the durability we build into those workflows that is the magic of OpenTopography in many ways, not the specific geospatial modules under the hood, so to speak.
Dr. Austin Madson: Totally, yes. It's really incredible what you all have done there over the years. I teach geospatial courses in my day job, and a lot of times I'll point students to OpenTopo to get a sense for processing some of this data and getting some of these really nice outputs, so there's a lot of really cool stuff that OpenTopo does. And I know, Chris, you touched a little bit on the APIs and notebooks and learning materials. Can you talk a little bit more about them?
I was browsing the website the other day and I came across the OpenLandform catalog, for example. My point here is that there are all kinds of cool little gems inside the OpenTopo website, ranging from the API access to large-scale elevation models to all these really great learning notebooks and things.
Can you talk a little bit about those and how that fits into the whole OpenTopo mantra?
Chris Crosby: Your example of pointing students towards OpenTopography is a good example of this, in that the system was built, really, to enable research and education. It gets used a lot in classes. We've got tons of instructors in earth sciences and geospatial classes at the community college level, at the undergraduate level and at the graduate level who have whole exercises built around using OpenTopography, because it's browser-based, it's pretty easy, and we actually just make it a lot faster to get to the product you want relative to going and downloading a bunch of LAS files and having to build that workflow.
Even if you have the expertise to do so yourself, it is almost always faster to get the data from OpenTopo than to sit down and download it and build that processing workflow inside of whatever geospatial environment you prefer.
So, that’s one of the angles here, but we’ve tried to build the system such that it allows people to get to the data at whatever level of expertise they have, so we have bulk data access so you can just do a curl or a wget and pull thousands and thousands of LAS files off our cloud servers if you’re interested in doing so. So, that’s like serving a power user.
And then in the middle of this distribution of users is the average user, who I would think is using the portal primarily. But we have APIs, and those APIs provide access primarily, at this point, to digital elevation products, so not point clouds so much as raster datasets. It's a simple RESTful API, essentially a URL call where you can pass in a bounding box and a data type and get back an SRTM dataset, for example, for your area of interest.
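As a concrete sketch of what such a URL call looks like, here is a minimal Python example against OpenTopography's global DEM API; the endpoint and parameter names follow the API documentation published on the OpenTopography site, but check the current docs before relying on them:

```python
# Fetch a small SRTM subset for a bounding box via OpenTopography's
# global DEM API. Endpoint and parameter names are as documented on
# opentopography.org at the time of writing; verify against current docs.
import requests

params = {
    "demtype": "SRTMGL1",           # 30 m SRTM
    "south": 36.7, "north": 36.9,   # illustrative bounding box
    "west": -118.4, "east": -118.2,
    "outputFormat": "GTiff",
    "API_Key": "YOUR_API_KEY",      # free key from the OpenTopography portal
}
resp = requests.get("https://portal.opentopography.org/API/globaldem",
                    params=params, timeout=120)
resp.raise_for_status()

with open("srtm_subset.tif", "wb") as f:
    f.write(resp.content)
```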
And these APIs have become incredibly popular; the number of users that interact with OpenTopography specifically because they want to use that API is very high, so that's interesting. Millions of jobs are run against our APIs.
Dr. Austin Madson: Oh, amazing.
Chris Crosby: A lot of those people are pulling data into things like BlenderGIS or other toolkits where the API has actually been baked into the software package. We provide a free key that you need to use so we can keep track of utilization.
But once you have that key you can hit this API, and people are using it for all kinds of really interesting applications, many of which are not in the research or education domain: people doing AR and VR, developing virtual worlds, 3D printing, and just cartography and art and those kinds of things. There are a lot of interesting use cases that sit on top of this infrastructure that we've built.
They're, frankly, pretty far outside of the original motivating factor for OpenTopography, which was access for earth science research and education.
So, that's the API layer, the data access layer. Then, continuing on the tour of the resources, yes, we've got quite a few Jupyter Notebooks that we've developed over the years that are designed to teach people how to interact with various types of software. We have a whole notebook on how to use PDAL, for example, and other notebooks that demonstrate how to interact with, say, cloud optimized GeoTIFFs: the kinds of things you can do with those resources that we provide, and a notebook to teach people how to use them.
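To illustrate what cloud optimized GeoTIFFs buy you, here is a minimal sketch using rasterio (which wraps GDAL) to read just a window of a COG over HTTP rather than downloading the whole file; the URL is a placeholder:

```python
# Windowed read from a cloud optimized GeoTIFF over HTTP. GDAL fetches only
# the internal tiles covering the requested bounds via range requests.
# The URL is hypothetical; substitute a real COG endpoint.
import rasterio
from rasterio.windows import from_bounds

cog_url = "https://example.com/some_dem_cog.tif"  # placeholder COG URL

with rasterio.open(cog_url) as src:
    # Bounds are given in the dataset's own CRS.
    window = from_bounds(-118.4, 36.7, -118.2, 36.9, transform=src.transform)
    elevation = src.read(1, window=window)

print(elevation.shape, float(elevation.min()), float(elevation.max()))
```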
And then all the way over at the other end we have basic tutorials on all kinds of different things related to topographic data. So, we just published a 24-video series on how to basically make a geologic map in ArcGIS Pro using lidar data, like a bunch of 5- to 10-minute short videos on various steps in that workflow. We have classroom resources that can be picked up and used by, say, an undergraduate level instructor.
So, our funding comes primarily from the National Science Foundation, and the National Science Foundation is really interested in that broader impacts piece, which is the education, training and capacity building piece of what we do.
So, that’s an important part of OpenTopo; it’s not just wrangling data and making data available online but it’s this supporting infrastructure and training resources and curriculum that sit around those datasets that we spend a fair amount of time on.
Dr. Austin Madson: And this is a really great segue to my next question. You touched a little bit on different user bases and things. So, can you talk a little bit more about OpenTopo's user bases and what they do and don't have access to, and why that is? And I think you touched on that with the NSF funding, for example.
Chris Crosby: Yes. So, there’s something like 200,000 registered users of OpenTopography; that’s a lot of people interacting with a scientific portal {laughter} which is by itself kind of mind-blowing to me sometimes. And those users come from all kinds of different sectors: there’s our core academic community, that’s certainly who NSF is funding us to support. So, these are people downloading data for research applications, writing scientific papers.
We pay close attention to publications that cite OpenTopography as a basic metric of our impact and our justification for existence, frankly.
And then classroom use is another really important piece of OpenTopography's impact: developing the next generation of geospatial scientists who are going to be familiar with these datasets and understand how to process them and work with them.
But topographic data are wildly applicable to all kinds of different things, whether it's industry or the governmental sector or recreation. As I was saying, there are people that hit OpenTopography from all over the world; we have a global user base. A lot of those people are interested not necessarily in the high-resolution lidar data but in these nearly global topographic datasets, the SRTM-type products, and a lot of people are grabbing those data from OpenTopo.
And then you have lots of governmental users coming from different agencies and a lot of industry use, so it's a great demonstration of the success of OpenTopography to have this many people using the system for all kinds of really interesting things that, as I said, go beyond the original motivations.
But it also cuts in the other direction sometimes; it's a little challenging because we're providing free compute, essentially, to all of these 200,000 people on the Internet. There is really no requirement to come in; you can even use OpenTopography as a guest, without creating an account, although we incentivize you to create an account by giving you higher processing limits and a few other things.
But there is a little bit of tension constantly in our world around building a system that’s robust and can stand up to this variable use from all kinds of different levels of expertise on the Internet, free on-demand computation. We answer a lot of user emails, like user support is an ongoing activity that takes a fair amount of time. I answer a lot of emails, several of the people that work for me answer a lot of emails and try to be helpful.
But there's a constant balancing act in a limited-resource environment: how do you best support this diversity of users? So, we had to make some restrictions, essentially. On a per-job basis there are restrictions on how big a job you can run. These are just practical considerations.
People try to do things like download every lidar point in the state of California, and that's sort of an unreasonable act through the browser. So, we put some checks and balances in the system to steer users towards rational job sizes and not to over-monopolize the resource.
And then, in the last several years, we've started to pull datasets in from other sources. So, we have data that we host, that sit on our servers in San Diego, where we're the authoritative archive.
But with this migration towards cloud-native data, and datasets being published into things like AWS Public Datasets and other open repositories, we can now federate these datasets into OpenTopography, so we don't actually have to store a copy of them on our servers; we can grab them dynamically.
And that's really exciting because we can do things like deliver every point in the USGS 3DEP (3D Elevation Program) datasets; this is nearly the whole United States, or at least the lower 48, and you can get all that data from OpenTopography.
But that kind of spatial coverage also drives a lot of demand and so the number of users goes way up when you can get that kind of data. And so we have applied some restrictions. So, for example, some of these restricted access datasets, things like USGS 3DEP are only available to academic users. So, if you’ve got a .edu email address, you can log in and get access to these datasets.
We also support, through a simple application, K-12 educators, international researchers, people who are inside of that mandate from the National Science Foundation. But if you're a commercial .com user, up until recently you did not have access to these datasets. We've actually just launched a pilot program that we're calling OpenTopography Plus, which is a subscription-based service, to see if there's enough interest from outside of academia, whether people would actually pay a small fee per month to use OpenTopography to access these datasets.
So this is part of our sustainability planning, trying to figure out how to keep this facility functional and resourced well enough that we can serve this diverse user community.
—
A note from our sponsor: The LIDAR Magazine Podcast is brought to you by rapidlasso. Our LAStools software suite offers the fastest and most memory efficient solution for batch-scripted multi-core lidar processing. Watch as we turn billions of lidar points into useful products at blazing speeds with impossibly low memory requirements. For seamless processing of the largest datasets, we also offer our BLAST extension. Visit rapidlasso.de for details.
—
Dr. Austin Madson: It sounds like there are some upcoming changes going on at OpenTopo with OpenTopo Plus, and this brings me to my next question: how can users outside of academia best take advantage of OpenTopo?
And I had mentioned in a previous discussion, we were chatting before we started recording here about terrain mapping. So, I fly a lot of UAS for research and for my side projects and things, and I’ve used OpenTopo data for terrain mapping and the like.
So, can you talk a little bit about how users in the private sector in the .com world can utilize OpenTopo and maybe tie that in with OpenTopo Plus?
Chris Crosby: All these lower-resolution global datasets, of which I think there are a dozen or so different DEM products (the USGS 10-meter data, the USGS 30-meter data, SRTM, Copernicus, NASADEM, the ALOS DEM), are available through OpenTopography, and they're entirely unrestricted at no cost. You just need to create an API key if you want to use the API, or just go to the portal and download data.
And then there's a bunch of lidar point cloud datasets that are also freely accessible through OpenTopography for anybody to use. The only datasets that are currently restricted are the ones where we're putting our value-added, on-demand processing services on top of other federally funded datasets, so USGS 3DEP, and the NOAA coastal lidar collection is another big one; NOAA manages that, and you can get that data through OpenTopography.
And then we have some other more specialized datasets that are coming from the academic domain like ArcticDEM which is a two-meter resolution product that covers high latitudes and there’s an equivalent product over Antarctica.
Those datasets are currently restricted unless you sign up for OpenTopography Plus and so this was an attempt to provide a pathway for users; we’ve had hundreds of people ask, “Can we use OpenTopography to access these datasets?” and the answer has always been, “Well, we’re not particularly well-resourced to do that.”
And so this pilot program is an attempt to see if there’s enough interest out there to warrant a more sustained program of a monthly subscription-based access. So, this is sort of the equivalent to a freemium type of approach like you see in media or other services on the Internet.
We’re still testing this as we’ve only had this program running for a month and a half or so, so we’re still getting the word out that this is an option for people.
Dr. Austin Madson: I think you just got the word out.
Chris Crosby: Yes. Well, thanks for that.
Dr. Austin Madson: No, it’s all good. I think it could be really beneficial for a lot of our users.
You talked a little bit, Chris, about different data clearinghouses. So, NOAA has the data access for all the coastal lidar, and the USGS has that (well, I won't say clunky) National Map viewer that hosts a lot of this data as well, the 10-meter NED data. I think it might even have some 3DEP tiles on there; I can't remember. The 3DEP data is on AWS.
Can you talk a little bit about data access and management of topographic data, and why you think it's so varied across this wide-ranging hosting landscape?
Chris Crosby: Yes. So, this is one of the challenges in this space with these kinds of open geospatial data and I don’t think it exclusively applies to lidar or topographic data, it just applies to any kind of geospatial data. But there’s a lot of different organizations funding these kinds of collections.
So, in the United States, the USGS 3D Elevation Program is the USGS's effort to map the United States at roughly one-meter resolution, and they are publishing those datasets in a couple of different places. You can get them through the National Map viewer; all the 3DEP LAS files as well as the digital elevation models are available through the National Map.
But as you said, that interface can be challenging for folks, especially if you access a lot of data; you're basically getting tiles of data, those kinds of things.
They're also publishing all that data into Amazon Web Services, and it's available through the Microsoft Planetary Computer as well. That's a nice step in the right direction because it provides cloud-hosted datasets that people can then leverage. We grab data from the AWS Public Datasets buckets to make those data available through OpenTopography, and then we put our tools on top.
So, if you just go to that bucket, right now you're getting either LAZ or Entwine Point Tiles (EPT) formatted data, which is, again, more of a power-user type of data delivery mechanism; it requires that you have some understanding of software and data processing to utilize those data.
So, what we do is we grab the data from that resource, but then we put our processing stack on top of it so you can create DEMs and contours and all those sorts of things dynamically. The thing that makes OpenTopography special is that processing layer, and that's also the thing that costs real money; running on-demand jobs is pretty expensive on a per-user basis.
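For readers curious about that power-user path, a minimal sketch of pulling a spatial subset straight from a USGS 3DEP EPT resource with PDAL. The dataset name is a placeholder (real ones can be browsed in the usgs-lidar-public bucket), and these holdings are commonly stored in EPSG:3857, so the bounds below are Web Mercator coordinates:

```python
# Read a spatial subset directly from a USGS 3DEP Entwine Point Tiles (EPT)
# resource on AWS and write it to a local LAZ file. The dataset name is
# hypothetical; bounds are given in the data's native CRS (EPSG:3857 here).
import json
import pdal

pipeline_def = [
    {
        "type": "readers.ept",
        "filename": ("https://s3-us-west-2.amazonaws.com/usgs-lidar-public/"
                     "SomeDataset/ept.json"),          # placeholder dataset
        "bounds": "([-13050000, -13049000], [4030000, 4031000])",
    },
    {
        "type": "writers.las",
        "filename": "subset.laz",
        "compression": "laszip",
    },
]

pipeline = pdal.Pipeline(json.dumps(pipeline_def))
print(f"Fetched {pipeline.execute()} points into subset.laz")
```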
That’s one pathway, but then there are other organizations that are collecting these kinds of data in the public domain, so the states, counties, other nonprofit organizations, science entities like the National Science Foundation, NASA.
And so everybody is managing their own data in different ways; the states sometimes feel a mandate to manage data collected for their state, and so there are all kinds of different clearinghouses with different levels of functionality.
Sometimes it's just an FTP site where we grab data, in other cases it's some kind of web GIS-type interface where you can download tiles, and in some cases you've got to call up Joe at the local GIS office; he's got a hard drive in his office, and he'll make you a copy and FedEx it to you for the right amount of money.
So, data access is really heterogeneous, and that's just in the US. If you look at the global context, lidar is increasingly being collected in other parts of the world; Europe in particular has increasingly good coverage.
But again, in each of these different countries data are managed differently, they have different licensing constraints applied to them and so this idea of building a centralized place to access all these datasets is actually quite complicated.
I think OpenTopography is the most comprehensive source of topographic data on the Internet in terms of number of points you can grab if you’re looking at point cloud data or number of global type of DEMs that are available.
But we, by no means, cover the whole globe. There's tons of data out there that I know about that we don't have, and part of that's a funding issue, part of that's the challenge of centralizing access to these kinds of datasets. So, I think the move towards cloud-native data and publication into open repositories like, say, the AWS Public Datasets program is a really nice step in the right direction.
But it's a challenge to build a centralized clearinghouse that would be truly comprehensive. I'd love to do it, and we'd like to frame OpenTopography in the future as a global resource for these kinds of datasets, because right now when you land there, most of the high-resolution topographic data, the lidar, is US-centric, with the exception of New Zealand, actually.
Land Information New Zealand (LINZ), the national mapping agency there, is undertaking a national lidar program similar to the USGS's 3DEP program. They elected several years ago to use OpenTopography as their primary distribution pathway for their point cloud datasets. So, we have a nice relationship with LINZ where they fund us to publish their data, and so there's great lidar point cloud coverage in New Zealand, which is a cool country; there's a lot of interesting geology and surface processes happening there.
Dr. Austin Madson: So, speaking of hosting data and things, Chris, I think OpenTopo also hosts data that folks can self-upload. Can you talk a little bit about that and what that process looks like? Maybe some people listening now have a really great airborne lidar dataset that doesn't have particularly restrictive licensing, and maybe they want to upload it for the community and spread the word and all that good stuff.
Chris Crosby: So, we have this thing that we built several years ago now that we call the OpenTopography Community Dataspace, and the idea was that UAS, in particular, have become pretty cost-efficient: you can buy a DJI drone with a camera and go create photogrammetric surface models and point clouds, or put a relatively low-cost laser scanner on a UAS these days.
Increasingly in the academic world researchers were going out into the field with a laser scanner or a drone or whatever and collecting data over their sites. So, it’s relatively small datasets in terms of their spatial extent but they’re pretty high resolution and they’re interesting; they’re being collected for monitoring of landslides or mapping faults or studying volcanic processes or mapping for urban planning. There’s all kinds of different applications and those datasets didn’t have an easy place to be published; there was no centralized repository for those.
So, we built this thing that we call the Community Dataspace, which is essentially a drag-and-drop upload interface where you drop your LAS point clouds and your GeoTIFFs or IMGs, your raster derivatives, whether that's digital elevation models or orthoimagery. We do a programmatic validation of that dataset, and then it gets a quick review by one of our staff to make sure that the descriptions of the dataset are sufficient and all the boxes have been checked.
And then we mint a digital object identifier, which is a persistent identifier, and then we publish that dataset into OpenTopography so it becomes searchable and discoverable through our interfaces and our catalog services. And it's a really nice pathway for publishing these relatively small datasets and putting them someplace where people can find them and reuse them.
And from an academic perspective this is data publication; there's been a big push in academia overall to make data findable, accessible, interoperable and reusable (FAIR). This is a pathway for doing that for topographic data.
And so, right now we allow people to basically submit an application and say, "I want an account that has these Community Dataspace permissions," and then we allow you to upload and publish data.
Right now, again, funding imposes some limitations, and this is really mostly oriented towards research and education. So, if you flew your Phantom around your house and made a cool DSM of your backyard or whatever, that dataset's interesting, but it's not something the National Science Foundation is particularly funding persistent storage of.
And so we do have to tell some people no at the moment, that we cannot accept their data, just because once you put something on disk and associate a digital identifier with it, it's supposed to be persistent in perpetuity, and that becomes an expensive proposition, really, because in this case you're giving storage away for free forever. Again, we had to put some bounds on some of these things.
Dr. Austin Madson: So, I guess I won’t upload my mesh of my neighbor’s back deck, then.
Chris Crosby: Exactly. {Laughter} But if you've got data that were collected for something that is broadly research-oriented or relevant to earth science, we're certainly interested in hearing about it, and I would encourage you to send us an email, basically, and describe the data and say, "This is why I think it's interesting and why it should be on OpenTopography," and we'll work with you.
Dr. Austin Madson: Great, thanks for touching on that. I think you’re also affiliated, Chris, with a slightly newer project called OpenAltimetry. I checked out the paper; I think you’re maybe the fourth or fifth author, so I know you had something to do with that website.
Can you touch a little bit on what OpenAltimetry is? I know it's kind of a clearinghouse for ICESat and ICESat-2, and for our listeners who don't know what those are, they're NASA spaceborne laser altimeters. ICESat-2 is still orbiting our planet and collecting a lot of really fantastic data.
So, Chris, do you want to talk a little bit about OpenAltimetry and how that maybe spawned from OpenTopo or how it didn’t?
Chris Crosby: Yes, sure. So, that's sort of a sister project that was funded by NASA. We wrote a proposal several years ago now to give those two datasets—ICESat and ICESat-2—the OpenTopo treatment, in the sense of making these datasets, which are super interesting for all kinds of applications but were historically kind of hard to access, easier to find and visualize and work with.
And so, the team that built OpenAltimetry is essentially the same team that built OpenTopography from the technology side and then we had some collaborators who were cryospheric scientists with expertise in those two satellite platforms.
And so, yes, OpenAltimetry: these are orbital laser altimeters, and the data look a little different than an airborne laser scanner dataset because they're collected along tracks; the satellites are basically orbiting the earth and collecting tracks of data. So, the data tend to be quite accurate in terms of elevation measurements but pretty sparse.
But they're super powerful for things like measuring ice sheet change, so if you want to look at ice sheet change in Greenland or Antarctica or something like that, you can do that quite effectively.
And so, yes, those datasets are available through OpenAltimetry. It's got a really cool visualization frontend that allows you to grab a subset, plot the data and look at the tracks, and then you can download that data and do things with it outside of the system.
That system was built by our team, but it's since been adopted by NASA as a formal part of NASA's Earthdata access systems, so it's run by the National Snow and Ice Data Center in Boulder.
Dr. Austin Madson: Oh, I see.
Chris Crosby: And so we built it, and for a couple of years it was operated by us, and then over the last year or 18 months it's been transitioned to NSIDC. So, we're not receiving any funding to operate it anymore and we're not involved; those guys are running it now. But, yes, the technology was developed by us.
Dr. Austin Madson: Got it, yes. So, some of the back-end processing is similar to OpenTopo's; I don't think there are as many tools that you can utilize in OpenAltimetry as you can in OpenTopo, but feel free to correct me if I'm wrong there. But I guess…
Chris Crosby: Yes, it doesn’t have the same set of – because those data don’t look and behave the same way as like a laser scanner…
{Crosstalk}
Chris Crosby: …the idea of dynamically gridding them into DEMs is not really the way people work with those data. So, OpenAltimetry is more about easy discovery and visualization than it is about downstream processing. If you really want to use those data, you ultimately probably have to download them and build your own processing or pull them into some other package to do that.
Dr. Austin Madson: Well, data viz is really fun. Can you touch on what tools you're all using? Is it Plotly? I guess, what's on the backend for all the data viz stuff for the ICESat and ICESat-2 data?
Chris Crosby: For the ICESat-2 interface, it's a good question what exactly is running. There's some kind of fancy optimization going on to make the data render quickly because, again, I don't know how many points are in the ICESat-2 catalog now, but it's billions to trillions…
Dr. Austin Madson: Yes, it’s got to be.
Chris Crosby: …of photon-ranging measurements. So, there are some optimizations on the backend to make the retrieval of those data fast. You can actually look at the photons through the browser, which is pretty cool, and that's something that our San Diego Supercomputer Center colleagues developed.
In OpenTopo, our point cloud viewer is Potree, which is an open source tool developed by Markus Schütz in Austria, and that's a tool that a lot of people are using to display point clouds in the browser. There are a couple of other similar tools out there, but we've been using Potree for quite a few years.
Dr. Austin Madson: Potree, I think, is really popular. I use it in a lot of my side-project work and my research work, too, for viz stuff, because sometimes executable-based viewers don't work on my personal laptops, so it's nice to have those HTML viewers.
What about the new cloud-native point cloud formats, COPC and things, Howard Butler et al.? Have you all incorporated any of that?
Chris Crosby: We do a lot with cloud optimized GeoTIFFs, which is a cloud-native format for raster data; most of the digital elevation models that are accessible through OpenTopography you can also interact with as a COG on a cloud server.
And then on the point cloud side, as I said, we grab the USGS 3DEP data and the NOAA coastal lidar data. Both of those are in Entwine Point Tiles format inside of Amazon Web Services, and so we're interacting with those.
But we have not yet migrated our catalog into one of those cloud-native formats, mostly just because it's a space that's still evolving a little bit, I think. These formats are relatively new; COPC for sure is super interesting, but for us it's a pretty significant amount of work to migrate our holdings, and I don't think we want to be particularly early adopters in this space. If the community decides that the format is going to change again or something, we'd have to go back and do it again.
So we're a little conservative when it comes to these things, again because of the limited-resource environment; migrating a catalog of many trillions of points into a new format is not something we undertake lightly. But certainly I totally see the advantages.
For a lot of our workloads, our system is really optimized around an (inaudible) query and deriving products, and we can do that pretty effectively operating on data in LAZ format, which is our primary archival format right now.
Dr. Austin Madson: Yes, thanks for shedding light on that and to start to wrap things up, Chris, do you want to talk a little bit about what new things are in development at OpenTopo? I know you talked about OpenTopo Plus, but are there any really exciting tools that you all are currently developing?
I know the differencing tool came out three or four years ago, which was, I guess, kind of new but maybe not new anymore. Any new things going on at OpenTopo?
Chris Crosby: For us, the multitemporal topography piece is still pretty interesting. We've been working pretty hard to build workflows that allow users to do topographic change detection inside the browser, and that's been stepwise over the years. We did it first for datasets that we hosted, but in the last, oh, maybe less than a year, you can now do topographic differencing across the USGS 3DEP lidar datasets.
It's the same set of tools, but these large-scale datasets that are managed by others have a different set of complications associated with them, largely around coordinate systems and things like that. And so, we spent a lot of time trying to figure out how to do things like dynamic reprojection and backing out or applying geoid models so you can actually meaningfully difference across these datasets.
The USGS 3DEP catalog is great; it's relatively consistent, but things like coordinate systems are not standardized. If you've got two overlapping polygons and you want to compute the change across them over the five to ten years between those datasets, you really have to make sure you solve the coordinate system problem.
And so it took us a lot of time to figure that out; it was basically chasing metadata and trying to understand what's really going on with these datasets so we can correct them or align them such that you can actually compute the difference, because the difference calculation itself is relatively straightforward. It's basic raster math, right?
Dr. Austin Madson: Right.
Chris Crosby: But if you get the coordinate systems wrong, then you're going to get erroneous results, and so that's problematic.
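The raster math itself really is as simple as Chris suggests. A minimal sketch, assuming two hypothetical DEMs that have already been brought onto the same grid, horizontal CRS and vertical datum, which is exactly the hard part he describes:

```python
# DEM of difference (DoD): subtract an earlier DEM from a later one.
# Assumes both rasters are already co-registered on an identical grid
# with matching CRS and vertical datum; filenames are hypothetical.
import rasterio

with rasterio.open("dem_epoch1.tif") as a, rasterio.open("dem_epoch2.tif") as b:
    assert a.crs == b.crs and a.transform == b.transform  # must be aligned
    diff = b.read(1).astype("float32") - a.read(1).astype("float32")
    profile = a.profile
    profile.update(dtype="float32")

with rasterio.open("dod.tif", "w", **profile) as dst:
    dst.write(diff, 1)
```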
So, I think the topographic differencing piece is still one of the more interesting things going on inside of OpenTopography. The number of users that use those tools is relatively low; it's not driving a ton of users.
A lot of academic users, a lot of students, are using it, but to us it's one of the more interesting things because we're getting to the point now where, in a lot of parts of the United States but also other parts of the world, you've got multitemporal data. We think something like a third of the United States has at least two lidar collects over it now…
Dr. Austin Madson: Wow, that’s amazing.
Chris Crosby: …over the lower 48, so you can start computing topographic difference, especially at large scale.
We had a paper last year where we actually computed the topographic difference for the whole state of Indiana based on two public domain lidar collects at one-meter resolution, and just that exercise alone is super interesting; it's a big data problem.
So, it was super interesting to see what this reveals, what is changing. At the state scale you see all kinds of really interesting things: you see anthropogenic change, you see vegetation change, you see change in the earth's surface, like the bare earth topography. There's just a lot of interesting signal in these datasets.
Dr. Austin Madson: It brings something to mind: I'm kind of based in Wyoming and sometimes in L.A., and there are a lot of these self-driving, various level 3, 4 and 5 automobiles collecting an insane amount of point cloud data. I think it's amazing getting access to a whole state of multitemporal lidar data, but imagine getting access to daily street-level data in various urbanscapes; the questions that you could pose of that data are really mind-boggling.
Chris Crosby: Yes, for sure. Years ago we wrote an abstract called like “The Age of Ubiquitous Point Clouds” and you’ve got these really high-resolution datasets whether they’re coming from airborne laser scanners, UAS-based sensors, mobile mapping systems, self-driving vehicles.
There’s a ton of this data out there and those kinds of datasets are not particularly well-accessible in the public domain, there’s no place to go find all the data coming off of autonomous vehicles.
I'm not sure there ever will be, but there are some interesting applications for those kinds of things, especially if you have super high-frequency repeat in urban environments, along road corridors and other kinds of things.
Dr. Austin Madson: Yes, and to go back a few minutes, I think you made a really good point about coordinate reference systems and vertical datums. So, for listeners out there who are just getting into this space, make sure you really spend a lot of time on metadata and CRS and get a feel for that, because I think Chris and other folks can attest to dealing with some of these issues, and getting into the weeds of it is really important. Otherwise, like Chris said, you're going to get a lot of erroneous results when you're doing a lot of this processing.
Chris Crosby: Yes. And in particular we worry about this a lot because, again, OpenTopography is a web-facing tool that anybody on the Internet can utilize, and so we can't make assumptions about people's knowledge of some of these things. We get all kinds of people who are using these datasets and they certainly do not understand the difference between orthometric and ellipsoidal elevations, for example. So we get, "I just compared this dataset I downloaded from you guys to this other dataset and they're off by 30 meters." That's a common email that we get.
So, certainly for us, we spend a lot of time worrying about those kinds of things that do seem like they're in the weeds but ultimately are pretty impactful.
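For readers new to this, the 30-meter symptom comes from mixing height systems: an orthometric height H (above the geoid) and an ellipsoidal height h (from GNSS) differ by the geoid undulation N, which is on the order of tens of meters across much of the US. A tiny illustration with made-up numbers:

```python
# H = h - N: convert an ellipsoidal height to an orthometric height using
# the geoid undulation. Values are illustrative, not from a real geoid model.
h_ellipsoidal = 100.0    # m, hypothetical GNSS (ellipsoidal) height
N_geoid = -30.0          # m, illustrative geoid undulation for the area
H_orthometric = h_ellipsoidal - N_geoid
print(H_orthometric)     # 130.0: a ~30 m offset if the two systems are mixed
```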
Dr. Austin Madson: Yes, I agree. And to wrap things up, Chris, you've been in this space for many decades now. What are you most excited about? All this 3DEP stuff is coming out; you said one-third of the contiguous United States has multitemporal lidar data. What kinds of things that you're seeing in this space are really exciting for you and keep you going?
Chris Crosby: A couple of things. I think the multitemporal piece is really interesting: the fact that these large-scale, national-level mapping initiatives are creating at least one collect, and in many cases multiple collects, over the landscape.
But then also the fact that you can go out with a UAS with a laser scanner or a camera on it and collect other time slices over dynamic features is really interesting. So you can start nesting these datasets from different sensors, at different resolutions and different repeat times, to really start looking at processes. So, I think that's super exciting.
From more of a technology perspective, I think the move towards publication of these datasets in these more cloud-native types of environments, so the work that the USGS and NOAA and Howard Butler and other folks are doing to put these datasets into AWS Public Datasets, is really powerful because it enables groups like us to come along and apply our services on top of those datasets. It's a step in the right direction towards federation and truly building one-stop-shop access to these types of datasets.
So, I think that's super cool, and then there's just the emergence of these kinds of datasets in other parts of the world. There's a lot of the world where high-resolution data would be super powerful but is only slowly starting to become available, and seeing these datasets become accessible in the developing world is something that I hope we'll start to see more of.
It’s still quite expensive to fly a (inaudible) so it’s going to take a while but slowly but surely we’re starting to see these kinds of interesting datasets coming out of places that have not historically had access to this kind of data.
Dr. Austin Madson: Right. I really want to thank you, Chris, for taking the time out of your day and chatting with us.
Chris Crosby: Of course. Thanks, I appreciate the invitation.
Dr. Austin Madson: Yes. And so that’s all we have for this episode and thanks, everyone, for tuning in. I hope everyone was able to learn something new and, if you haven’t already, make sure to subscribe to receive episodes automatically via our website or Spotify or Apple podcasts. Stay tuned for other exciting podcast episodes in the coming week or weeks and take care out there.
Announcer: Thanks for tuning in. Be sure to visit lidarmag.com/podcast/ to arrange automated notification of new podcast episodes, subscribe to newsletters, our print publication and more. If you have a suggestion for a future episode, contact us. Thanks again for listening.
This edition of the LIDAR Magazine Podcast is brought to you by rapidlasso. Our flagship product, the LAStools software suite is a collection of highly efficient, multicore command line tools to classify, tile, convert, filter, raster, triangulate, contour, clip, and polygonize lidar data. Visit rapidlasso.de for details.
THE END