LiDAR and Data Compression

Have you thought about how much LiDAR data you are going to collect next year?

Yes, I know: the cost of hard drive storage keeps going down and down, but the amount of data you collect keeps going up and up. Point cloud densities are increasing, and so is the number of fields stored for each point (RGB, anyone?).

Sooner or later, most people consider using some sort of compression scheme to cram more xyz points onto their servers, and so this article is about LiDAR data compression. This column is about the open source world, so, as you might imagine, we're going to talk about an open source library for LiDAR compression.

But first, let's look at some of the factors to consider when choosing a compression technology.

Trade-offs & Use Cases

Compression usually just means storing a bunch of data in some fiendishly clever packed format that takes up less space than it would if it were left unpacked. Some compression algorithms are lossless, meaning that you get back exactly the original data you started with (think WinZip). Other algorithms are lossy, meaning you get back only an approximation of the original data (think JPEG). Sometimes it is a very good approximation, admittedly, but it is still not identical to the original bits. The trade-off is clear: if you're willing to sacrifice some fidelity, you can achieve much higher compression rates.
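The difference is easy to see in a few lines of Python. The sketch below uses the standard-library `zlib` module as a stand-in for any lossless compressor, and a hypothetical round-to-the-centimetre step as a stand-in for a lossy one; the sample elevations are made up.

```python
import struct
import zlib

# Made-up elevations in metres, standing in for real LiDAR Z values.
elevations = [101.2345, 101.2391, 101.2410, 101.2502, 101.2488]
count = len(elevations)

# Pack as 64-bit little-endian floats: 8 bytes per value.
raw = struct.pack(f"<{count}d", *elevations)

# Lossless: zlib hands back exactly the bytes it was given.
restored_raw = zlib.decompress(zlib.compress(raw))
print("lossless round trip identical:", restored_raw == raw)

# Lossy: keep only whole centimetres, stored as 32-bit integers (4 bytes per value).
centimetres = [round(z * 100) for z in elevations]
lossy = struct.pack(f"<{count}i", *centimetres)
restored_lossy = [c / 100 for c in centimetres]

# Half the bytes, but the values are no longer the originals.
max_error = max(abs(a - b) for a, b in zip(elevations, restored_lossy))
print(f"lossy: {len(lossy)} bytes instead of {len(raw)}, max error {max_error:.4f} m")
```

The lossy version halves the storage here, but no amount of cleverness will recover the lost millimetres afterwards.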

Modern compression algorithms can be very complicated, requiring a noticeable amount of CPU power. This trade-off is clear, too: the more work (computation) you're willing to do, the better the compression rates you can generally achieve.
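One way to see the CPU-versus-size trade-off for yourself is to compress the same buffer at different `zlib` levels and time each run. The payload here is synthetic (a random walk of integer "elevations"), not real LiDAR data, and the exact timings and ratios will vary by machine and by dataset.

```python
import random
import struct
import time
import zlib

# Synthetic payload: a slowly varying random walk stored as 32-bit integers.
random.seed(42)
z = 100_000
values = []
for _ in range(200_000):
    z += random.randint(-3, 3)
    values.append(z)
payload = struct.pack(f"<{len(values)}i", *values)

results = {}
for level in (1, 6, 9):  # 1 = fastest, 9 = most thorough
    start = time.perf_counter()
    packed = zlib.compress(payload, level)
    elapsed = time.perf_counter() - start
    assert zlib.decompress(packed) == payload  # every level is still lossless
    results[level] = (len(packed), elapsed)
    print(f"level {level}: {len(packed) / len(payload):.1%} of original size, "
          f"{elapsed * 1000:.1f} ms")
```

Higher levels usually buy a somewhat smaller output for noticeably more time; whether that is worth it depends on how often the data will be read back.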

Let's consider two possible LiDAR data use cases: archiving and visualization. In the archival use case, you use your original LiDAR data to derive other products, such as DEMs, for your production workflows, and the original data is stored away somewhere for the long term. In this situation, you will likely want to store that original data losslessly: it is a permanent record that should never be modified. By the same token, you don't plan on using the original data very often, since you have the derived products, so you can afford to spend a little more time on compression and decompression.
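For the archival case, "lossless" is something you can check rather than take on faith. A minimal sketch, assuming `zlib` as a stand-in for whatever lossless compressor you actually use: record a SHA-256 digest of the original bytes when you archive them, and verify it every time you restore. The `archive` and `restore` helpers are hypothetical names, not part of any LiDAR library.

```python
from __future__ import annotations

import hashlib
import zlib


def archive(original: bytes) -> tuple[bytes, str]:
    """Compress losslessly and return the blob plus a SHA-256 digest of the input."""
    return zlib.compress(original, 9), hashlib.sha256(original).hexdigest()


def restore(blob: bytes, expected_digest: str) -> bytes:
    """Decompress, refusing to return data whose digest does not match the record."""
    data = zlib.decompress(blob)
    if hashlib.sha256(data).hexdigest() != expected_digest:
        raise ValueError("restored data does not match the archived checksum")
    return data


# Usage with made-up bytes standing in for an original LAS file.
original = b"x y z intensity\n" * 1000
blob, digest = archive(original)
assert restore(blob, digest) == original
print(f"archived {len(original)} bytes as {len(blob)}; checksum verified")
```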

On the other hand, if your point cloud data is to be used largely for visualization (to drape imagery over, to take rough measurements, to present graphical models to your customers), then you probably don't need bit-for-bit accuracy: a lossy representation will do. But you're going to be working with this data frequently, and maybe in an interactive 3D viewer, so performance is likely important.

You'll have to decide where your own use cases lie on the quality and performance continuum, but the trend seems to favor the archival situation for point cloud data. Visualization and similar workflows are more often done with derived products representing data that aren't really point clouds anymore. If that is true, then what the world needs is a lossless, reasonably efficient point cloud compression scheme for long-term storage.

Enter LASzip

The venerable LAS file format stores points in a raw, uncompressed form. Fortunately, there is now an alternative: LASzip is an open source library (www.laszip.org) that implements the same point formats and data fields as defined by LAS, but uses advanced compression techniques to store the data in only 10-20% of the space.
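LASzip's actual coder is considerably more elaborate than anything that fits here, but one intuition behind point-specific compression can be sketched simply. LAS stores X, Y and Z as 32-bit integers together with a scale factor and offset in the file header, and neighbouring points in a scan tend to have similar coordinates. Storing each point's difference from the previous one (delta encoding) turns large, noisy numbers into small, repetitive ones that a general-purpose compressor handles much better. The scan line, the 1 cm scale, the offset and the helper names below are all hypothetical; this is not the LASzip algorithm.

```python
import random
import struct
import zlib

SCALE = 0.01       # hypothetical 1 cm scale factor
OFFSET = 500000.0  # hypothetical offset


def to_scaled_ints(coords):
    """LAS-style storage: integer = round((value - offset) / scale)."""
    return [round((c - OFFSET) / SCALE) for c in coords]


def delta_encode(ints):
    """Keep the first value, then store each value's difference from its predecessor."""
    return [ints[0]] + [b - a for a, b in zip(ints, ints[1:])]


def delta_decode(deltas):
    """Undo delta_encode with a running sum."""
    out = [deltas[0]]
    for d in deltas[1:]:
        out.append(out[-1] + d)
    return out


def pack(ints):
    """Pack integers as little-endian 32-bit values, as LAS does for X, Y, Z."""
    return struct.pack(f"<{len(ints)}i", *ints)


# Synthetic scan line: X advances by 0.3 to 0.7 m per point.
random.seed(1)
x, xs = OFFSET, []
for _ in range(100_000):
    x += random.uniform(0.3, 0.7)
    xs.append(x)

ints = to_scaled_ints(xs)
deltas = delta_encode(ints)
raw_size = len(pack(ints))
plain = zlib.compress(pack(ints), 9)
delta = zlib.compress(pack(deltas), 9)

assert delta_decode(deltas) == ints  # delta coding itself loses nothing
print(f"zlib only:    {len(plain) / raw_size:.1%} of raw size")
print(f"delta + zlib: {len(delta) / raw_size:.1%} of raw size")
```

The rounding onto a 1 cm grid is where precision is given up; in a LAS file that step has already happened, so compressing the stored integers is lossless with respect to the file itself.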

LASzip is already being used in some production environments, saving significant storage space without slowing down existing workflows. In the open source world, LASzip files are supported by the LAStools utilities and the libLAS/PDAL point cloud libraries, and support from other vendors can be expected in the future.

Of course, LASzip is not an official part of the LAS standard maintained by ASPRS. This puts it in the realm of de facto, as opposed to de jure, standards. Open source developers are strong advocates of interoperability and do not lightly invent new versions of formats, but where the community's need is evident, as in this case, such a move is justified. De facto standards succeed if vendors adopt them and users find them helpful (often a chicken-and-egg situation). The open source path has historically proven to be an excellent way to develop and maintain bottom-up standards like this, especially as the geospatial world continues to move away from proprietary file formats and vendor lock-in.

The LASzip library is released under the LGPL license. This requires that any changes you make to the LASzip code itself be released publicly if you distribute them, but as long as you use LASzip as a shared library, those licensing terms do not extend to the rest of your application code. Using the LGPL for implementing a file format like this is a wise choice: it discourages unscrupulous parties from modifying the underlying algorithms to create incompatible versions of the format.

Looking Forward

There are lots of other file formats out there besides LAS. ASCII/XYZ is widely used but is notoriously uncompressed; the new E57 file format was introduced earlier this year with a little compression support, but not much. The lossless compression algorithms used in LASzip could be used to add compression to those two formats as well.

Some work has been done toward supporting lossy compression of point clouds, but to date the results have been computationally inefficient relative to the compression rates achieved, have incurred an unacceptable loss of quality, or have not been released as open, nonproprietary specifications.

It is certain that, despite dropping disk prices, the need for compression of LiDAR data will continue, and the open standards and open source communities will continue to work together to meet that need.