Open Topography is an online portal for imagery and lidar data that lets users create their own deliverables from that data.
Christopher Crosby, co-founder of Open Topography, explains this platform, which sits at the intersection of GIS, geoscience, and IT. He describes how different stakeholders are finding new applications for Open Topography and how the team aims to keep data accessible with a stable business model for the future.
Open Topography is an online portal, clearinghouse, or spatial data infrastructure for topographic data. Through a web-based interface, it provides access to everything from global DEM data, such as Shuttle Radar Topography Mission (SRTM) datasets, all the way through to state and national lidar collections.
Apart from a data archive, Open Topography offers tools that can be run dynamically on the hosted data to generate products.
One of the co-founders of Open Topography is Christopher Crosby, who was introduced to lidar as an intern at the USGS (United States Geological Survey) in the year 2000. This was still early days for lidar, as there was not much support in most mainstream geospatial packages.
For Crosby, working with hundreds of millions of lidar points in ASCII format was an eye-opening experience, but also challenging.
Collaborating with San Diego Supercomputer Center (SDSC)
In 2009, Crosby got to work at the University of California, San Diego, with National Science Foundation funding from an advisor from Arizona State University. This is where Open Topography started as a collaborative project at the intersection of GIS, geoscience and information technology, with the San Diego Supercomputer Center as the lead institution.
Crosby was able to build crude web-based lidar data processing systems, so users could upload an ASCII file and receive a Digital Elevation Model (DEM).
“We tried to build interactive web-based interfaces and put them behind things such as ArcIMS. The team at the San Diego Supercomputer Center (SDSC) at the University of California did lots of prototyping, iterating and building more mature systems. Over time, it became clear that we were going in the right direction. We also seemed to be building something that fit a critical need, which was delivering lidar collections for earth science applications to users, for example mapping landslides, volcanoes and the San Andreas fault”, says Crosby.
Open Topography datasets and raster and lidar file formats
Open Topography today offers global and very homogeneous raster datasets that cover most of the earth. This means that the user can make a selection anywhere on the earth and get back the underlying DEM data, as well as derivatives from that data.
However, lidar datasets tend to be more project-specific and are treated as such, says Crosby: “Things get messy when you try stitching datasets collected from different time periods. There are also issues with coordinate systems, datums and differences in processing and quality control, so that’s why we manage those datasets on a case-by-case basis.”
Open Topography recently adopted the Cloud Optimized GeoTIFF (COG) format. A COG is a regular GeoTIFF file, aimed at being hosted on an HTTP file server, with an internal organization that enables more efficient workflows in the cloud. Open Topography converted its whole raster archive to COGs, and its on-demand services now produce COGs as well.
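As a rough illustration of why that internal organization matters: a tiled TIFF numbers its tiles left to right and top to bottom, so a client that knows the image and tile dimensions can work out which tiles cover an area of interest and request only those byte ranges over HTTP. The sketch below shows just that index arithmetic. The function name and parameters are hypothetical, square tiles are assumed, and it does not describe how Open Topography's own services are implemented.

```python
import math


def tiles_for_window(image_width, image_height, tile_size,
                     col_off, row_off, win_width, win_height):
    """Return indices of the tiles that intersect a pixel window.

    Assumes square tiles numbered left to right, top to bottom, so that
    tile index = tile_row * tiles_across + tile_col.
    """
    if win_width <= 0 or win_height <= 0:
        return []
    tiles_across = math.ceil(image_width / tile_size)
    # Clamp the window to the image, then convert pixel bounds to tile bounds.
    first_col = max(col_off, 0) // tile_size
    first_row = max(row_off, 0) // tile_size
    last_col = min(col_off + win_width - 1, image_width - 1) // tile_size
    last_row = min(row_off + win_height - 1, image_height - 1) // tile_size
    return [
        tile_row * tiles_across + tile_col
        for tile_row in range(first_row, last_row + 1)
        for tile_col in range(first_col, last_col + 1)
    ]


# A 1024 x 1024 raster with 256-pixel tiles holds 16 tiles; a small
# window touches only a couple of them.
needed = tiles_for_window(1024, 1024, 256, col_off=200, row_off=300,
                          win_width=100, win_height=100)
```

In this example only 2 of the 16 tiles are needed, which is the kind of saving that makes partial reads from a static file server worthwhile.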
Something similar is being developed for point clouds called Cloud Optimized Point Cloud (COPC), defined by Howard Butler as a file format that allows for single-file storage of the most common container format with support for spatially accelerated incremental remote access. “The idea behind it is to be able to do dynamic streaming from a static set of point cloud files sitting on disk. It should enable a lot of interesting things once it’s complete, but it’s still early days”, says Crosby.
Open Topography’s data upload and data access policies
There are two ways to get data into Open Topography.
The first is to upload datasets through a web browser. These need to be relatively small, as you can’t upload hundreds of gigabytes of data through a web browser.
The second way is by pulling data off a remote resource or via hard drive, explains Crosby: “For both workflows, what happens next is that we run it through an ingestion process on the backend and do file-level quality control. But we cannot check everything, for example how good point cloud classifications are or if there are flight line misalignments. That level of quality control is very labor intensive, so we don’t do that. We take the data as it was delivered and hope someone else upstream of us did some of that quality control. However, every data set comes with a survey metadata report, so people can check how the GNSS was collected or what kind of ground control was used.”
Some of the data on Open Topography has restricted access, says Crosby. “We believe strongly in open data and open data access, but in specific cases, we’ve been asked by the research community to provide access on a case-by-case basis.”
An example of this is the archeological community, who have done lidar surveys in places in Central America and Southeast Asia. As these data were collected with funding from the NSF, they are inside of Open Topography, but access is restricted because of what is visible in the data and the potential for looting of such sites.
Another case where data access was restricted by Open Topography was where data was collected for academic purposes, but where a research group hadn’t yet submitted their publication. “In that case, we do a temporary embargo until the publication is released.”
Open Topography data deliverables
Users of Open Topography can create multiple products based on the data that has been uploaded to the platform. With regard to lidar data, the first thing that people typically do is create a raster from lidar points, such as a DEM or a DTM, for visualization purposes.
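To make the points-to-raster step concrete, here is a minimal sketch that bins points into grid cells and averages the elevations in each cell. It is one simple gridding approach among many, not necessarily the one the platform uses. The function name, the parameters and the nodata value of -9999 are assumptions for illustration. For a DTM, only ground-classified points would normally be passed in.

```python
def grid_points(points, x_min, y_max, cell_size, n_cols, n_rows,
                nodata=-9999.0):
    """Rasterize (x, y, z) points by averaging z within each grid cell.

    Row 0 is the northernmost row, so rows are counted down from y_max.
    Cells that receive no points are set to the nodata value.
    """
    sums = [[0.0] * n_cols for _ in range(n_rows)]
    counts = [[0] * n_cols for _ in range(n_rows)]
    for x, y, z in points:
        col = int((x - x_min) // cell_size)
        row = int((y_max - y) // cell_size)
        # Ignore points that fall outside the requested grid.
        if 0 <= col < n_cols and 0 <= row < n_rows:
            sums[row][col] += z
            counts[row][col] += 1
    return [
        [sums[r][c] / counts[r][c] if counts[r][c] else nodata
         for c in range(n_cols)]
        for r in range(n_rows)
    ]


# Three sample points on a 2 x 2 grid with 1 m cells.
sample_points = [(0.5, 1.5, 10.0), (0.7, 1.2, 12.0), (1.5, 0.5, 20.0)]
dem = grid_points(sample_points, x_min=0.0, y_max=2.0, cell_size=1.0,
                  n_cols=2, n_rows=2)
```

Averaging is the simplest choice. Minimum, maximum or interpolated values per cell are common alternatives, depending on whether the goal is a bare-earth or a surface model.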
The platform also provides raster-based visualizations, such as hillshades and slope maps. These are produced as browser-based images or KMZ-type files.
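Both products can be derived from the elevation gradient of a DEM. The sketch below is a simplified version that uses central differences on interior cells: slope is the angle of the gradient, and the hillshade value is the dot product of the surface normal with a unit vector pointing at the light source. The function name and the default sun position (azimuth 315°, altitude 45°) are assumptions. Production GIS tools typically use a 3 x 3 kernel such as Horn's method, so their values will differ slightly.

```python
import math


def slope_and_hillshade(dem, cell_size, azimuth_deg=315.0, altitude_deg=45.0):
    """Return (slope in degrees, hillshade in 0..1) for interior DEM cells.

    dem is a list of rows with row 0 in the north. Edge cells are left
    as None because central differences need neighbours on both sides.
    """
    n_rows, n_cols = len(dem), len(dem[0])
    az = math.radians(azimuth_deg)   # clockwise from north
    alt = math.radians(altitude_deg)
    # Unit vector toward the light source (x = east, y = north, z = up).
    light = (math.sin(az) * math.cos(alt),
             math.cos(az) * math.cos(alt),
             math.sin(alt))
    slope = [[None] * n_cols for _ in range(n_rows)]
    shade = [[None] * n_cols for _ in range(n_rows)]
    for r in range(1, n_rows - 1):
        for c in range(1, n_cols - 1):
            dzdx = (dem[r][c + 1] - dem[r][c - 1]) / (2 * cell_size)
            # Rows increase southward, so north minus south gives dz/dy.
            dzdy = (dem[r - 1][c] - dem[r + 1][c]) / (2 * cell_size)
            slope[r][c] = math.degrees(math.atan(math.hypot(dzdx, dzdy)))
            # Surface normal is (-dzdx, -dzdy, 1), normalized to unit length.
            norm = math.sqrt(dzdx ** 2 + dzdy ** 2 + 1.0)
            dot = (-dzdx * light[0] - dzdy * light[1] + light[2]) / norm
            shade[r][c] = max(0.0, dot)
    return slope, shade


# A plane rising 1 m per 1 m toward the east, so it faces west.
plane = [[0.0, 1.0, 2.0], [0.0, 1.0, 2.0], [0.0, 1.0, 2.0]]
plane_slope, plane_shade = slope_and_hillshade(plane, cell_size=1.0)
```

With the light in the northwest, this west-facing plane comes out brighter than flat ground, which is the effect that makes terrain features readable in a hillshade.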
Additionally, more sophisticated processing tools exist, such as hydrologic routing on top of topography. Crosby explains what such a workflow looks like: “You can select a bounding box to select an area of interest on the map, rasterize that data and then route water over the topography and calculate the contributing area, as well as other basic kinds of hydrologic metrics.”
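The source does not say which routing algorithm is used, so the following is only a sketch of one common scheme, D8 routing, in which each cell sends all of its flow to its steepest downslope neighbour. Contributing area is then accumulated by visiting cells from highest to lowest. The function name is hypothetical, and the sketch skips steps a real workflow would need, such as filling pits and handling flat areas.

```python
import math


def d8_contributing_area(dem, cell_size=1.0):
    """Return the contributing area per cell using simple D8 routing.

    Each cell drains to the neighbour with the steepest downward slope.
    Cells with no lower neighbour (pits or outlets) keep their flow.
    """
    n_rows, n_cols = len(dem), len(dem[0])
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
               (0, 1), (1, -1), (1, 0), (1, 1)]
    receiver = {}
    for r in range(n_rows):
        for c in range(n_cols):
            best, best_drop = None, 0.0
            for dr, dc in offsets:
                rr, cc = r + dr, c + dc
                if 0 <= rr < n_rows and 0 <= cc < n_cols:
                    # Diagonal neighbours are farther away than direct ones.
                    distance = cell_size * math.hypot(dr, dc)
                    drop = (dem[r][c] - dem[rr][cc]) / distance
                    if drop > best_drop:
                        best, best_drop = (rr, cc), drop
            receiver[(r, c)] = best
    # Every cell starts with its own area.
    area = [[cell_size ** 2] * n_cols for _ in range(n_rows)]
    # Receivers are strictly lower than their donors, so processing from
    # high to low guarantees a cell's total is final before it is passed on.
    order = sorted(receiver, key=lambda rc: dem[rc[0]][rc[1]], reverse=True)
    for r, c in order:
        target = receiver[(r, c)]
        if target is not None:
            area[target[0]][target[1]] += area[r][c]
    return area


# A small surface that drains toward the lower-right corner.
surface = [[9.0, 8.0, 7.0],
           [8.0, 6.0, 5.0],
           [7.0, 5.0, 1.0]]
contributing = d8_contributing_area(surface)
```

On this 3 x 3 surface the outlet cell collects the area of all nine cells, while ridge cells only count themselves.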
Another recent addition is a set of change detection tools, based on multi-temporal data analysis. “This is a new area for us, as we’re getting to the point where we see places where there are multiple data sets sitting on top of each other, so you can start looking at change. We now offer tools to do topographic differencing on-demand in the browser, which is pretty cool.”
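At its simplest, topographic differencing subtracts an older DEM from a newer one, cell by cell. The sketch below assumes the two grids are already co-registered, meaning the same extent, cell size, coordinate system and vertical datum. In practice that alignment is often the hard part. The function name and the nodata value of -9999 are assumptions for illustration.

```python
def difference_dems(older, newer, cell_size, nodata=-9999.0):
    """Return (difference grid, net volume change) for two aligned DEMs.

    Positive values mean the surface rose between the two surveys.
    Cells that are nodata in either input stay nodata in the output.
    """
    if len(older) != len(newer) or any(
            len(a) != len(b) for a, b in zip(older, newer)):
        raise ValueError("DEMs must have identical dimensions")
    difference = []
    net_volume = 0.0
    for row_old, row_new in zip(older, newer):
        out_row = []
        for z_old, z_new in zip(row_old, row_new):
            if z_old == nodata or z_new == nodata:
                out_row.append(nodata)
            else:
                dz = z_new - z_old
                out_row.append(dz)
                # Each cell contributes its height change times its area.
                net_volume += dz * cell_size ** 2
        difference.append(out_row)
    return difference, net_volume


survey_1 = [[10.0, 10.0], [10.0, -9999.0]]
survey_2 = [[11.0, 9.5], [10.0, 12.0]]
change, volume = difference_dems(survey_1, survey_2, cell_size=2.0)
```

A real analysis would also account for the vertical uncertainty of each survey before interpreting small differences as actual change.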
Open Topography is free to use
At the moment, Open Topography is entirely free to use, thanks to a generous sponsor, says Crosby: “The NSF and specifically the Geoscience Directorate of the NSF are covering the cost of putting data into the system as well as the cost of the user interfacing with it.”
There are two ways to access the platform: either as a totally anonymous user or through an account, which grants the user full access to the data in the form of data download and data processing.
Open Topography use cases
Looking at the different user groups and use cases of Open Topography, the scientific community represents roughly a third of its user base. Although Open Topography was funded for a specific academic application, it has taken on a life of its own because the datasets are useful for things beyond the original application.
For example, there are now a lot of commercial-sector users, such as consultants who are doing engineering or urban planning. Another interesting use case is the creation of immersive visualization elements, either by hobbyists or commercial video game developers.
Finally, there’s 3D printing: rock climbers, for example, are using Open Topography to pull out their favorite locations and 3D print them.
There are different ways in which the team behind Open Topography communicates with its user base and keeps track of individual use cases. First of all, there’s a power user opportunity where a user fills out a form and explains why higher limits for data access and processing are needed. Second, Crosby and his team receive email from groups or individuals who want to publish Open Topography data and want to clarify licensing terms or acknowledgement, as well as many one-off emails from people asking for technical support.
Partnerships with Open Topography
Crosby admits there’s a tension between Open Topography’s relatively narrow funding source with its academic emphasis and its many non-academic use cases: “Something we’re constantly trying to navigate is building a sustainable funding model around Open Topography that acknowledges the diversity of its user base, an example of this being a partnership with Land Information New Zealand, which is the public service department of New Zealand charged with geographical information and surveying functions.
The partnership means that they leverage our system, but they cover the cost of data storage and distribution for their datasets because they value the return on investment that Open Topography enables.”
Crosby adds that Open Topography has always avoided charging the end user, as they believe in open data and open data access. “However, we’re constantly trying to figure out ways to navigate the space between open data access and on-demand computing at a large scale, which can get relatively expensive.”
Partnering with groups who have a mandate to make data public, such as Land Information New Zealand, is a way to address the problem that “open” data doesn’t necessarily mean free, says Crosby: “We’ve emphasized with Open Topography that handing our users raw data doesn’t solve their problem. Our processing suite makes data easier to use and therefore drives greater value from the data, but that doesn’t come cheap. That’s why we’ve made the case of partnering with such groups. Also, it’s probably more cost-efficient to build such partnerships with a party that runs the platform, buy into it and leverage it, instead of building it yourself, which in the case of Open Topography took more than fifteen years of prototyping.”
Does Crosby consider Google Earth Engine a threat to Open Topography’s business model, in the sense that it’s free? Crosby replies that although there’s a subset of the academic community that uses Google Earth Engine quite heavily for photogrammetric data access and processing as an alternative to Open Topography, it doesn’t offer any of these capabilities for lidar. “The last time I checked Google Earth Engine, it was still very much raster centric. The point cloud access is what makes Open Topography unique in that respect.”
Point cloud challenges
With the anticipated increase in lidar collection in the coming years as a result of sensors becoming ubiquitous, which challenges does Crosby expect to see? He starts by distinguishing different types of point clouds, with on the one hand the consumer-type point clouds created with the latest iPads and iPhones, and on the other hand well-organized and controlled datasets from industry-standard national mapping programs: “Different datasets probably need to be managed in a different way. Also, there’s a lot more data coming, but whether all that data needs to end up in a big community archive or not is an interesting question.”
The number one research question for Crosby is how the industry will do data fusion and analysis across different kinds of datasets: “When all those data sets look and feel different and are collected with different sensors and technologies, they will have different accuracies and errors associated with them. We’ve spent a fair amount of time on doing change detection between two airborne lidar collections, for example, which is non-trivial. But when you start introducing different datasets collected with different platforms, things become even more complicated.”
Big data problems are another challenge waiting to be solved, such as computing the change across a whole state that has been flown with lidar twice. “We now have the data access and the workflows to do it, which is definitely a growth area for point cloud data analysis.”
Looking back and forward
Looking back at the early days of the initiative, two things stand out that could have been managed better, says Crosby. “In hindsight, there was some low-hanging fruit that was easy to pick off that we didn’t go after right away that we probably could have, such as really good global topographic datasets with a 10-30m resolution. These really benefit from an Open Topography treatment, even though they’re less challenging to deal with compared to lidar point cloud data.”
Another lesson that the team behind Open Topography learned over time is the importance of providing good user support and being responsive to users. This means that, apart from answering individual user requests, the team provides short courses on everything from working with the data downstream of Open Topography to using all kinds of software packages to do data analysis. “We’ve been around long enough that we’ve had graduate students come through Open Topography and take a short course ten years ago. Now they’re faculty members at a university and write papers about lidar data processing. It’s rewarding to see that full life cycle of training and seeing them go and do interesting things with it.”
To be able to continue with Open Topography’s mission of providing open data access, processing and creation of data deliverables, there are a couple of challenges to be met. Global data access is an important one, as are building new tools and sustainable business models to keep doing things at scale. “Because data storage is a primary cost, decentralization of data into an Amazon S3 bucket enables us to do federated access in a way that enables scalability. For example, we are now able to grab data from USGS’ 3D elevation data and pipe it through our processing tools without having to copy all that data”, concludes Crosby.