Automated Feature Extraction: The Quest for the Holy Grail

A 688Kb PDF of this article as it appeared in the magazine complete with images is available by clicking HERE

It’s clear that we are in the midst of a geospatial revolution. Commercial high-resolution satellites, such as WorldView-2, can image nearly 1 million square kilometers per day. Airborne LiDAR collections over major cities yield data sets consisting of billions of points. Mobile LiDAR sensors are now capable of collecting millions of points in a single second. What’s less clear is how much of the data driving this revolution is turned into meaningful information. Automated feature extraction has long been considered the Holy Grail of remote sensing, but for decades there has been relatively little to show for the untold millions, perhaps even billions, of dollars that were invested in this technology. Some of the failings can be attributed to the limitations of the sensors at the time, but equal responsibility lies with the methods employed. LiDAR, particularly when combined with multispectral imagery, has the greatest potential to advance automated feature extraction. Unfortunately, the pixel-based digital image processing techniques most of us learnt in our college remote sensing courses are not effective for extracting information from a combination of imagery and LiDAR.

Humans are extraordinarily adept at identifying features in remotely sensed data sets thanks to our cognitive abilities. Dr. Charles Olson first identified what are now commonly known as the "elements of image interpretation" in 1960. Olson proved to be far ahead of his time; numerous studies in the cognitive sciences now support his conclusion that humans rely on a combination of spectral, geometric, and contextual properties to derive information from remotely sensed data. For much of the past four decades, approaches to the automated classification of images have focused almost solely on the spectral properties of individual pixels. Initially, this approach made sense; processing capabilities were limited, and pixels in the early satellite images were relatively large and contained a considerable amount of spectral information. Yet pixel-based approaches were only marginally successful, and over time 80% became an accepted accuracy standard in the peer-reviewed literature. Of course, this dismayed decision makers, who were left with land cover maps that were 20% wrong. A good many of us in the remote sensing community, myself included, continually said that more accurate classification techniques were just around the corner in the form of future sensors with more spectral bands. We thought that using hyperspectral imagery, in conjunction with comprehensive spectral libraries, would enable us to classify any material with an extraordinarily high degree of accuracy. The hyperspectral revolution never took off; the technology, while valuable for certain applications, is costly and challenging to work with, and thus largely remains a niche tool. I believe the critical mistake we made was thinking that advances in sensor technology would solve all of our problems instead of questioning our approach to automated classification. Although Olson's elements of image interpretation were taught in every introductory remote sensing course, somehow we never thought to ask, "Why are humans so successful at this?"

One of the great strengths of the human vision system is that we can perceive depth in 2D imagery. Fusing LiDAR with imagery effectively exposes depth, and over the past few years a considerable amount of attention has been paid to point and pixel fusion, in which the spectral information from the image pixel is combined with the structural information from the LiDAR point. However, using height as nothing more than an additional digital number in the same old image classification routines is short sighted. Such an approach is completely counter to human cognition, which relies more on the spatial arrangement of pixels/points to extract features than on individual pixel/point values.
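In its simplest form, point-and-pixel fusion amounts to looking up the image pixel beneath each LiDAR return and carrying that pixel's spectral value alongside the point's height. Here is a minimal sketch in Python/NumPy using synthetic data; the function name, grid origin, and cell size are illustrative assumptions, not the API of any particular package:

```python
import numpy as np

def fuse_points_with_image(points, image, origin, cell_size):
    """Attach the spectral value of the underlying image pixel to each
    LiDAR point (x, y, z). Hypothetical helper for illustration only."""
    # Map each point's map coordinates to pixel row/column indices,
    # assuming `origin` is the image's upper-left corner.
    cols = ((points[:, 0] - origin[0]) / cell_size).astype(int)
    rows = ((origin[1] - points[:, 1]) / cell_size).astype(int)
    spectral = image[rows, cols]
    # Each fused record is (x, y, z, pixel_value).
    return np.column_stack([points, spectral])

# Tiny synthetic example: a 4x4 single-band image and three points.
image = np.arange(16, dtype=float).reshape(4, 4)
points = np.array([[0.5, 3.5, 10.0],
                   [2.5, 2.5, 12.0],
                   [3.5, 0.5,  8.0]])
fused = fuse_points_with_image(points, image, origin=(0.0, 4.0), cell_size=1.0)
print(fused)
```

The point of the sketch is that the fused record is still just a per-point attribute vector; nothing about the spatial arrangement of neighboring points or pixels has been captured, which is exactly the limitation described above.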

In a very short time the LiDAR community has succeeded in harnessing the spatial information in the point cloud. Algorithms, such as those used in the Terrasolid and TIFFS software packages, are extraordinarily effective at separating ground from above-ground returns. Grid statistical approaches, such as those available within Quick Terrain Modeler, can serve as a proxy for the type of complex contextual relationships the human vision system is capable of discerning (Figure 1). Years of research into LiDAR point segmentation are finally bearing fruit in the form of Autodesk Labs’ shape extraction technology. Perhaps the greatest breakthrough from a data fusion perspective is the ability to incorporate LiDAR into object-based image analysis (OBIA) workflows. OBIA techniques are widely considered to be vastly superior to pixel-based approaches. By grouping pixels and points together into a connected network of objects, spectral, geometric, and contextual information can all be harnessed during the classification process. eCognition, which in 1999 became the first commercial OBIA software package, added support for LiDAR point clouds in 2009, and object-based data fusion approaches to feature extraction now yield accuracies almost on par with manual methods, even in complex urban environments (Figure 2).
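A grid statistical approach of the kind described above can be as simple as binning returns into cells and computing a per-cell statistic such as height range, which is near zero over pavement but large under tree canopy or along building edges. The sketch below uses synthetic points; the function and its parameters are hypothetical stand-ins, not the actual interface of Quick Terrain Modeler or any other package:

```python
import numpy as np

def grid_height_range(points, cell_size, nrows, ncols):
    """Bin LiDAR points (x, y, z) into a grid and return each cell's
    max-minus-min height, a simple proxy for vertical structure.
    Illustrative only; real packages offer many more per-cell statistics."""
    zmin = np.full((nrows, ncols), np.inf)
    zmax = np.full((nrows, ncols), -np.inf)
    cols = (points[:, 0] // cell_size).astype(int)
    rows = (points[:, 1] // cell_size).astype(int)
    # Unbuffered in-place min/max so repeated hits on a cell accumulate.
    np.minimum.at(zmin, (rows, cols), points[:, 2])
    np.maximum.at(zmax, (rows, cols), points[:, 2])
    hrange = zmax - zmin
    hrange[np.isinf(hrange)] = 0.0   # cells with no returns
    return hrange

# Two cells: flat ground returns vs. a tree with ground and canopy returns.
pts = np.array([[0.2, 0.2, 100.0], [0.8, 0.8, 100.1],   # cell (0,0): flat
                [1.5, 0.5, 100.0], [1.6, 0.4, 112.0]])  # cell (0,1): tree
hr = grid_height_range(pts, cell_size=1.0, nrows=1, ncols=2)
print(hr)
```

Statistics like this describe a neighborhood of points rather than any single return, which is why they serve as a crude stand-in for the contextual cues a human interpreter uses.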

While LiDAR adds tremendous value from a feature extraction standpoint, the methods are just as important as the data. There is no single software package that does it all, but the experienced analyst now has the tools to automate the extraction of features using a combination of LiDAR and imagery with accuracies that far surpass those achieved using pixel-based approaches on imagery alone. These techniques, which offer substantial cost savings when compared to manual methods, will help bridge the gap between data and information.

About the Author

Jarlath O'Neil-Dunne

Jarlath O'Neil-Dunne is a researcher with the University of Vermont's (UVM) Spatial Analysis Laboratory (SAL) and also holds a joint appointment with the USDA Forest Service's Northern Research Station. He has over 15 years of experience with GIS and remote sensing and is recognized as a leading expert on the design and application of object-based image analysis (OBIA) systems for automated land cover mapping. His team at the SAL has generated billions of pixels' worth of high-resolution land cover data from a variety of aerial, satellite, and LiDAR sensors in support of urban forestry planning, ecosystem service estimation, and water quality modeling. In addition to his research duties he teaches introductory and advanced courses in GIS and remote sensing using ArcGIS, ERDAS IMAGINE, eCognition, and QT Modeler. He earned a Bachelor of Science in Forestry from the University of New Hampshire, a Master of Science in Water Resources from the University of Vermont, and certificates in hyperspectral image exploitation and joint GIS operations from the National Geospatial Intelligence College. He is a former officer in the United States Marine Corps, where he commanded infantry, counter-terrorism, and geospatial intelligence units.