Evaluation of Structure from Motion (SfM) in Compact, Long Hallways

A 1.948Mb PDF of this article as it appeared in the magazine complete with images is available by clicking HERE

Structure from Motion (SfM) is an emerging technology which can generate 3D point clouds from a series of overlapping 2D images. Research and commercial interests in SfM technology are due to the minimal cost associated with 3D cloud generation from a personal computer and a consumer-grade camera. Competing technologies such as LIDAR have a high barrier to data acquisition due to the expense of calibrated sensors, knowledgeable technicians and processing software. However, LIDAR systems are more robustly calibrated and have other capabilities such as full-waveform diagnostics, multiple returns, and intensity measurements.

While SfM has the potential for rapid 3D data acquisition and modeling, it is important to note that for quality results, it is not simply a matter of snapping a series of pictures. Significant overlap is required and care must be taken to ensure that the scene is adequately covered.

To this end, the purpose of the study is to expand the body of knowledge regarding the strengths and limitations of SfM while contrasting the performance of open-source SfM software with that of a commercial solution. We intentionally chose a challenging environment (the basement of Peavy Hall at Oregon State University, Corvallis, OR) to compare the geometric accuracy of SfM to LIDAR.

There are several key features of this basement that present a challenge to SfM algorithms. (1) The halls are too narrow for the traditional hierarchical photo sampling regime, which complicates photo pairing. (2) The walls are the same color, with a repeating texture pattern resulting from cinder block construction, making unique feature detection difficult. (3) Hallways intersect at 90 degrees, which results in very few images from which to draw key points at this transition.

Equipment and Software
The equipment used in the study includes a tripod-mounted FARO Focus 3D laser scanner and a Canon S100 point-and-shoot camera. We utilized FARO Scene ver. 4.8 for registration of the LIDAR scans, and SfM processing occurred in both VisualSFM (open source) and Agisoft's PhotoScan (ver. 0.9). Model rendering and evaluation were completed in Leica Cyclone ver. 8.0.

Field Sampling
Fifty-four black and white pattern targets were placed on the walls of the Peavy Hall basement two to four meters apart. Targets at corners and ends of hallways were more numerous and placed semi-randomly to aid in the transition. The targets were used both for registering the LIDAR scans and for comparing the fit quality between the SfM- and LIDAR-derived point clouds.

Twenty-one scans were taken using the LIDAR to ensure adequate coverage of the four hallways. The horizontal angular sampling step of the LIDAR was 0.036°. We took 700 images of the basement hallways with the Canon camera. Orthogonal images of the walls (on both sides) were taken every 0.5 m, and images looking down the hallways were taken every 1 m. The camera was set to automatic mode, which controls autofocus, white balance, aperture and shutter speed. Output images were 3000 x 4000 pixels, and JPEG compression was activated.
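
To give a sense of what the angular sampling step means on the wall, the following minimal Python sketch converts an angular step into an approximate point spacing at a given range using the small-angle relation (spacing = range x angle in radians). It assumes the step is 0.036 degrees and that the surface faces the scanner; the ranges and the function name are illustrative and were not part of the study.

```python
import math


def point_spacing(range_m, angular_step_deg):
    """Approximate spacing (m) between neighboring laser points on a surface
    facing the scanner, from the small-angle relation s = r * dtheta."""
    return range_m * math.radians(angular_step_deg)


# Assumed angular step of 0.036 degrees; the ranges below are illustrative only.
for r in (2.0, 5.0, 10.0):
    spacing_mm = point_spacing(r, 0.036) * 1000.0
    print(f"{r:4.1f} m range -> {spacing_mm:.2f} mm spacing")
```

On walls seen at a grazing angle, as happens when scanning down a long hallway, the actual spacing is larger than this estimate.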

Data Processing
Initial LIDAR registration was performed using FARO Scene. The software automatically identified the black and white targets and used them to register the scans into a single point cloud, with an overall RMS error of 0.002 m. Figure 1 shows the resulting point cloud and model of the Peavy Hall basement floor plan. Image processing utilized the commercial solution, Agisoft PhotoScan, and a combination of open-source programs bundled in VisualSFM (VSFM). Targets were identified in the LIDAR point cloud within Leica Cyclone, and their local coordinates were exported to facilitate a seven-parameter transformation (translation, rotation and scale) of the SfM models in their respective software packages.
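
The seven parameters of such a transformation are three translations, three rotation angles and one uniform scale factor. The sketch below, in plain Python, shows how a single SfM model point would be mapped into the LIDAR coordinate system once those parameters are known. It is an illustration only: the function names, the rotation order (about x, then y, then z) and the sample values are assumptions, and it does not reproduce how the software packages estimate the parameters from the targets.

```python
import math


def rotation_matrix(rx, ry, rz):
    """3x3 rotation matrix R = Rz * Ry * Rx for angles given in radians."""
    cx, sx = math.cos(rx), math.sin(rx)
    cy, sy = math.cos(ry), math.sin(ry)
    cz, sz = math.cos(rz), math.sin(rz)
    return [
        [cz * cy, cz * sy * sx - sz * cx, cz * sy * cx + sz * sx],
        [sz * cy, sz * sy * sx + cz * cx, sz * sy * cx - cz * sx],
        [-sy, cy * sx, cy * cx],
    ]


def similarity_transform(point, scale, angles, translation):
    """Apply x' = scale * R * x + t to one 3D point.

    scale (1 value) + angles (3) + translation (3) = seven parameters.
    """
    rot = rotation_matrix(*angles)
    return tuple(
        scale * sum(rot[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    )


# Hypothetical values: double the model size, rotate 90 degrees about the
# vertical (z) axis, then shift 10 m along x.
moved = similarity_transform(
    (1.0, 0.0, 0.0), 2.0, (0.0, 0.0, math.pi / 2), (10.0, 0.0, 0.0)
)
print(moved)
```

Because an SfM model has no inherent scale or orientation, the scale factor and rotation are as important here as the translation.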

SfM software settings were adjusted after initial testing with the photo dataset. In the Agisoft PhotoScan preferences menu, alignment accuracy was set to "high" to ensure the best possible feature matching between images. Depth filtering was set to "moderate" during sparse point cloud computation to prevent over-filtering. Dense point cloud outputs utilized the "low" density setting because "high" produced point clouds and file sizes that were unwieldy for a project of this scope.

Initial testing with VSFM revealed limitations in the automated pairwise matching routine. Instead, we used the four-photo sequential matching routine, which scans for feature matches between a selected photo and the next four photos before moving to the next photo in the sequence.
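
The effect of sequential matching on the amount of work can be sketched in a few lines of Python. The example below lists the candidate photo pairs when each photo is compared only with the next four in the sequence, and compares that count with exhaustive pairwise matching. It is a simplified illustration of the idea, not VSFM's implementation; the function name and the zero-based photo indices are assumptions.

```python
def sequential_pairs(num_photos, window=4):
    """Candidate pairs (i, j) where photo i is matched only against the
    next `window` photos in capture order."""
    pairs = []
    for i in range(num_photos):
        for j in range(i + 1, min(i + window + 1, num_photos)):
            pairs.append((i, j))
    return pairs


num_photos = 700  # size of the full photo set in this study
sequential_count = len(sequential_pairs(num_photos))
exhaustive_count = num_photos * (num_photos - 1) // 2

print("sequential pairs:", sequential_count)
print("exhaustive pairs:", exhaustive_count)
```

For 700 photos this gives 2,790 candidate pairs instead of 244,650. The trade-off is that a photo is never compared with one taken more than four frames away, so the capture order has to follow the scene.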

In both software packages, the SfM workflow was nearly identical, producing similar sparse cloud visualizations. A complete explanation of how SfM produces point clouds is found in Turner et al. (2012) and Lowe (2004). Due to the limitations described below, only the interior walls of the four hallways were modeled in SfM. After modeling, targets were marked on each wall, then referenced and transformed to the local coordinate system established by the LIDAR scans.

Limitations of SfM Software
Simply pushing 700 images through SfM software and expecting a 3D model of the four hallways proved to be a lofty goal. The complete dataset was too complex for either SfM program to interpret, and the SIFT routine took 53 hours in PhotoScan. Processing times in VSFM were not recorded but were substantially shorter than in PhotoScan due to the use of a sequential matching routine. The initial result was a model with no discernible structure.

Expectations were reduced, and models of individual hallways were attempted. This reduced task also proved difficult for the SfM packages we tested and took more than three hours. PhotoScan performed significantly better than VSFM, producing an incomplete but coherent model with two parallel hallway walls.

The most likely explanation was the homogeneity of the wall surface. Homogeneous surfaces without distinct features are a known source of difficulty for SfM algorithms (Dandois and Ellis 2010). We further reduced our testing objective to independently model the four interior hallway walls using 217 of the 700 photos. The result was the best balance between manual intervention, processing time and the production of comparable models.

Models produced by PhotoScan were nearly complete for each hallway, and processing took less than one hour apiece. Models produced by VSFM were significantly lower in quality, especially in the north and east hallways, where the similarity between images was high due to scene homogeneity (Figure 2). One intriguing contrast between the two modeling programs is that VSFM seems to have a bias for detecting the mortar joints in the cinder block wall. Both models suffered from the effects of error propagation, as evidenced by the curvature in Figure 2. This curvature is partially corrected by transformation to a local coordinate system (not shown).

Performance Comparison
Point Density: A comparison of point densities from both sparse and dense clouds revealed significant differences between software packages (Table 1). For example, both packages modeled fewer points in the east hallway than the west, although the west hallway is the same length. This disparity is due to the east hallway interior wall being bare for half of its traverse, whereas the west hallway contained varying structure in the form of wooden baseboards down its length (Figure 2).

PhotoScan modeled the full length of the north hallway, but point density decreases halfway down, where the wooden baseboards stop. Similarly, VSFM cuts off the north hallway after the baseboards terminate, resulting in a shortened model.

Accuracy: A seven-parameter transformation was conducted by referencing four to five targets on each modeled wall to the LIDAR coordinate system within the respective modeling packages. The resulting RMS errors of the SfM target locations relative to LIDAR are presented in Table 1. PhotoScan consistently produced better fits than VSFM. Insufficient targets were visible in the point cloud for the east hallway interior wall model produced by VSFM, so the RMS error could not be accurately computed. The north hallway RMS error for VSFM is high for similar reasons. Although a larger expanse of the wall was modeled, the image sequence was not reproduced properly, resulting in out-of-place control points.
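
For readers unfamiliar with the metric, the following Python sketch shows one common way to compute an RMS error from matched target coordinates: the square root of the mean squared 3D distance between each transformed SfM target and its LIDAR counterpart. The coordinates are hypothetical values chosen for illustration, not data from this study, and the software packages may define their reported RMS slightly differently (for example, per axis).

```python
import math


def rms_error(model_points, reference_points):
    """Root mean square of the 3D distances between matched point pairs."""
    if not model_points or len(model_points) != len(reference_points):
        raise ValueError("need two non-empty lists of equal length")
    squared_distances = [
        sum((m - r) ** 2 for m, r in zip(model_pt, ref_pt))
        for model_pt, ref_pt in zip(model_points, reference_points)
    ]
    return math.sqrt(sum(squared_distances) / len(squared_distances))


# Hypothetical target coordinates in meters (not measurements from the study).
lidar_targets = [(0.0, 0.0, 1.0), (3.0, 0.0, 1.2), (6.0, 0.0, 0.9), (9.0, 0.0, 1.1)]
sfm_targets = [(0.1, 0.0, 1.0), (3.0, 0.2, 1.2), (6.3, 0.0, 0.9), (9.0, 0.0, 1.5)]

print(f"RMS error: {rms_error(sfm_targets, lidar_targets):.3f} m")
```

With only four or five targets per wall, a single misplaced target can dominate the result, which is consistent with the sensitivity described above.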

As found in a current study (Raugust and Olsen, in preparation), SfM has a seemingly natural niche augmenting LIDAR models due to its ability to fill in gaps in less-dense LIDAR clouds for improved efficiency and cost. Figure 3 shows a rendered sample from the south wall of the LIDAR cloud and both SfM models; the comparison demonstrates the potential quality gain that could be realized by augmenting LIDAR with SfM. PhotoScan would have been capable of producing an even clearer reproduction; however, we experienced hardware limitations handling the highly dense output (over 150 million points) for the south wall.

In this evaluation, it is important to consider that SfM and associated software algorithms are experimental, developing technologies so results can vary significantly between datasets and packages used.

The resulting accuracy of our VSFM models was unimpressive but PhotoScan performed admirably enough to warrant further inquiry regarding optimal application scenarios. Additional testing would be required to assess the accuracy to cost per pixel ratio between LIDAR and SfM.

However, 0.5 to 1 m RMS errors are unacceptable for most engineering applications, considering that the distances in the study were only 20 to 40 meters in length. It is very likely that both VSFM and PhotoScan results could be improved through manual interventions that include rigorous point filtering and manual photo matching; however, more intervention means more cost in time and training.

Despite the increased popularity of using SfM in terrestrial and indoor applications, our study highlighted a few limitations of SfM technology in indoor environments. The low and variable accuracy compared to LIDAR and the wide variability in quality between SfM software packages suggest that it is still an immature technology. However, just like LIDAR, SfM is ever-evolving and will no doubt continue to improve in the years to come.

We thank Leica Geosystems for providing licenses for the use of Cyclone v8.0 software for this study and in the 3D laser scanning courses at Oregon State University. We also extend thanks to Agisoft for providing an academic license of PhotoScan to Oregon State University.

Dandois, Jonathan P., and Erle C. Ellis. "Remote Sensing of Vegetation Structure using Computer Vision." Remote Sensing 2, no. 4 (2010): 1157-1176.
Lowe, David G. "Distinctive image features from scale-invariant keypoints." International Journal of Computer Vision 60, no. 2 (2004): 91-110.
Turner, Darren, Arko Lucieer, and Christopher Watson. "An Automated Technique for Generating Georectified Mosaics from Ultrahigh Resolution Unmanned Aerial Vehicle (UAV) Imagery, Based on Structure from Motion (SfM) Point Clouds." Remote Sensing 4, no. 5 (2012): 1392-1410.

Jonathan Burnett is a graduate student at Oregon State University’s College of Forestry Department of Forest Engineering, Resources and Management.
Richard Gabriel is a graduate student at Oregon State University’s College of Forestry Department of Forest Engineering, Resources and Management.
Michael J. Olsen is the Inaugural Eric H.I. and Janice Hoffman Faculty Scholar in the Geomatics program in the School of Civil and Construction Engineering at Oregon State University.
Michael Wing is an assistant professor at Oregon State University’s College of Forestry Department of Forest Engineering, Resources and Management.
