As surveyors establish laser scanning as a new method of data collection, its applications continue to grow. In many projects, laser scanning data serves as a platform linking data and information for decision making. Because of the diverse range of data needs, surveyors are often asked to deliver information in the specific formats their clients request so that it can be easily integrated into downstream workflows.
The phase between the acquisition of a scan and delivery of the primary data products can be considered pre-processing. Unfortunately, the pre-processing phase contains several tedious and time-consuming operations. Typically, surveying managers have a general idea of the workflow of the overall project but are not directly running the software. They are, however, directly responsible for overseeing the execution of the pre-processing workflow to ensure the desired output is produced in the correct format, like putting together the pieces of a puzzle. Hence, a timesaving tool that performs all necessary pre-processing operations automatically can be highly beneficial. Batch programming is a solution that steps through the pre-processing automatically, dramatically reducing processing time and eliminating user mistakes.
Many users of the powerful, widely used LAStools can testify how scripting has improved their workflows and reduced bottlenecks in processing airborne lidar data.
Lidar workflows
As you are likely aware, lidar data is a powerful tool for the production of high-resolution, accurate geospatial information for 3D modeling of an environment. As a result, lidar has been used for a variety of applications, including industrial plant management, transportation asset management, cultural heritage, coastal erosion, rockfalls, landslides, and seismic displacements. Although workflows vary depending on the application, the most common, overarching pre-processing steps for an example slope stability application of terrestrial lidar data are presented as follows:
1. Register/Geo-reference
Individual scans need to be linked into a common coordinate system through registration, which can be done through a variety of techniques. Geo-referencing is often done to provide real-world coordinates in the coordinate system of choice. In this example, we will use the PointReg program (Olsen et al., 2011), which was developed at the University of California, San Diego and is continually improved by the Oregon State University (OSU) Civil and Construction Engineering Geomatics group. This program is highly automated, requiring minimal user intervention.
2. Filter (Artifact Removal)
Scan data must be cleaned of points that carry no meaningful information, such as noise from atmospheric moisture in the open sky and undesirable moving objects (passing cars or people). In some civil engineering projects the desired goal is to produce digital terrain models representing the bare ground, so a vegetation filter can also be applied to remove vegetation and other artifacts from multiple-return signals. In some cases, a combination of manual and automated vegetation removal is needed, particularly for steep slopes.
3. Produce DTMs
Digital Terrain Models (DTMs) are created from the point cloud data by gridding or triangulating the point cloud. DTMs provide reference models for calculations and analysis that can be difficult to perform on the point cloud alone. The OSU Geomatics custom computer program for DTM generation is named "BinNGrid", which employs a vegetation filter (Olsen, 2011). This code inputs a text file and converts it to binary for fast processing. It then assigns points into cells based on a desired cell size, with different modes for estimating a Z value for each cell. The program outputs two files (a header file and a binary raster file) suitable for other spatial software packages such as Esri's ArcGIS.
The above three steps appear relatively simple to implement. However, many substeps are required for those processes to occur correctly. Because lidar datasets can be difficult to work with due to their size, the data are often divided into tiles for processing. A more detailed lidar data pre-processing procedure for a slope stability study within a tile termed "LL01" is presented as follows (a short batch sketch of the first few steps appears after the list):
Transfer raw data from USB drive to a secure backup
Create a folder for the desired area (e.g. figure 1, LL01)
Create subfolders for raw data (lidar points, GPS observations, …)
Transfer raw data to a hard drive (preferably solid state)
Filter by effective laser range (usually between 2 m and 250 m) while converting to binary
Align and merge the scans (PointReg)
Verify the outputs and ensure the data are satisfactorily geo-referenced
Import data into software of choice
Remove artifacts and clean noise effects (e.g., vegetation filter)
Export the scans as ASCII files (when possible, ASTM E57 is a preferred format to work with)
Create DTMs at different resolutions (run BinNGrid with different modes and cell sizes)
Create a new folder for each increment (e.g., LL01grids_0p20m for the 20 cm grid files for LL01)
Fill holes in the created DTMs
Generate derivative products (slope, curvature, hillshade, or other requested grids)
Determine surface roughness for different window sizes
Zip the resulting files and send them, with a report, to the client
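To preview how a few of these steps translate into commands, here is a minimal batch sketch of the folder scaffolding and data transfer; all paths are placeholders, and the registration, filtering, and gridding programs themselves would be called afterward:

    @echo off
    rem stage_tile.bat -- scaffold folders for one tile and stage its raw data
    rem (all paths below are placeholders)
    set TILE=LL01
    if not exist "D:\project\%TILE%\raw" mkdir "D:\project\%TILE%\raw"
    if not exist "D:\project\%TILE%\grids" mkdir "D:\project\%TILE%\grids"
    rem copy everything from the field USB drive, preserving subfolders
    robocopy "F:\usb\%TILE%" "D:\project\%TILE%\raw" /e
    rem ...registration, filtering, and gridding programs would be called here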
The process is then completed for each of the tiles. Obviously, steps and procedures can be modified depending on the required applications and client report type. Figure 1 shows a sketch of the generated files and folders from the above lidar pre-processing workflow.
Given all of these steps and files to keep track of, one of the most challenging parts of this workflow from a manager's point of view is arranging the executable programs in the correct order and consistently following a meaningful naming convention. These steps are usually tedious and prone to user mistakes. Often, different team members perform individual steps with no view of the overall file arrangement, seeing just a small portion of the project rather than the whole picture. In other words, most of the time only the project manager can see the logic and the connections between data and folders.
Now imagine the manager has to lead a crew to perform the operations of Figure 1 on a large number of files. Think of the time and burden of operations required to perform this task. We leave it as an exercise to the reader to determine how many times an operator has to click to create a folder, run a program, rename, copy, paste, etc., for this example.
"Batch programming" is a solution to perform the operations of the pre-processing step automatically with limited user input or feedback.
Batch Programming
Sometimes working through a fancy, user-friendly interface to perform a function, by way of shells, menus, or windows, costs a lot of time and freedom. It is also highly prone to user mistakes. In these cases, the user ends up leaving the friendly window environment to work with the more efficient, less-forgiving command line (COMMAND.COM or cmd.exe) shell. Using the command line shell, the user types commands to tell the computer what to do. These commands can be pre-written in a file so the user does not have to type repetitive commands; in effect, this is a simple form of programming the computer rather than communicating with it one command at a time.
A text file containing command lines with a ".bat" extension is called a batch file. Batch files are executable by the command interpreter. The ability of a batch file to combine multiple commands into one batch "program" and customize how each command operates makes it a powerful tool for system administration. In fact, batch programming is the native scripting offered by the Microsoft Windows operating system. The comparable facility on Unix-based operating systems is the shell script.
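As a minimal sketch, a first batch file might do nothing more than report the run time and list the scan files in the current folder:

    @echo off
    rem hello.bat -- a first batch file: report the run time and list scan files
    echo Processing run started on %date% at %time%.
    dir /b *.txt
    pause

Saving those lines as hello.bat and double-clicking it runs every command in sequence, exactly as if they had been typed at the prompt.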
Like other scripting languages, batch programming can perform arithmetic and logical operations (AND, OR, NOT, bit shifting), as well as redirection, command separators, and grouping operators. It also understands statements for looping (FOR) and decision branching (IF).
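A brief sketch of those constructs in action, counting the ASCII scan files in the current folder and branching on the result:

    @echo off
    setlocal enabledelayedexpansion
    rem count the ASCII scan files here, then branch on the result
    set /a count=0
    for %%f in (*.txt) do set /a count+=1
    if !count! gtr 0 (
        echo Found !count! scan files to process.
    ) else (
        echo No scan files found.
    )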
Benefits
Some of the benefits of batch programming are listed below:
Batch files can process and store large amounts of input data (perhaps terabytes or more), access a large number of records, and produce a large volume of outputs
Batch files themselves are tiny (often only a few hundred bytes)
Batch files can be prepared off-line. You can review them over and over until you are sure they are exactly the way you want, enabling you to correct errors.
It is simple to enhance a script or change options.
Batch files are easy to create, and only as complex as you want them to be, but they can perform many useful operations ranging from file backups to system configuration, quickly and automatically.
Batch files can call other external batch files, enabling one to mix and match processes without having to substantially rewrite scripts.
They can shift the time of job processing to when the computing resources are less busy.
Batch files can run over networks, so a main computer (e.g., a manager's laptop) can run heavyweight programs installed on office workstations.
Batch programs are designed to process data with little or no operator intervention in contrast to interactive computing.
They make computation time very efficient by avoiding idling the computing resources with minute-by-minute manual intervention and supervision (in other words, you can let a program run overnight to do the bulk of the processing, leaving the system free while you are at work).
Batch files can be executed quickly so you can login and run them remotely at any time the system is free.
They reduce system overhead by running a program once for many transactions rather than running it multiple times to process one transaction each time.
One does not need to store large volumes of data for the various derivative products. One can simply archive the primary data source and the batch file; the derivative products can be quickly created again if they are needed.
Procedure
The OSU CCE Geomatics group acquires and processes data to support a wide assortment of collaborative research projects, including transportation, geotechnical, geological, environmental, hydrological, forestry, and ecology work. In a significant portion of these projects, one of the desired outcomes of the lidar data is a set of maps, DTMs, and derivative products (e.g., slope) at different resolutions to enable further scientific investigation and analysis. The scientific questions are the primary scope of the research, not the data collection and pre-processing.
Needless to say, the magnitude of tasks necessary for pre-processing can give graduate students nightmares when the procedure needs to be completed systematically for large areas, multiple sites, or repeat monitoring surveys (Figure 1). Most wish for a "magic" red button to perform the whole procedure with one click so they can focus on their research questions rather than data processing!
After evaluating commonalities between research projects, and after the frustration of redoing processing because of inconsistencies resulting from user error, we decided to produce our "red button": a batch file that saves time by automating the actions down to one simple click.
As Figure 2 shows, the red button can do all the pre-processing and data storage off-line on one computer or on-line over an office network on other workstations.
Figure 3 shows the same listed procedure producing the same outputs (Figure 1) by batch programming. If you do not need to vary your input parameters, the only task is to drag and drop the point cloud file onto the red button. Based on the purpose of the project, the program takes a set of data files as input, processes the data, produces a set of output data files, and copies the output files to their respective folders.
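In batch terms, a file dropped onto a .bat script arrives as its first argument, %1. A minimal sketch of such an entry point follows; the names of the two called batch files are hypothetical stand-ins for the scripts described below:

    @echo off
    rem redbutton.bat -- entry point; %1 is the point cloud dropped onto the script
    if "%~1"=="" (
        echo Drag and drop a point cloud file onto this script.
        exit /b 1
    )
    set CLOUD=%~f1
    set TILE=%~n1
    rem build the folder tree, then run the processing chain (names hypothetical)
    call make_folders.bat %TILE%
    call process_tile.bat "%CLOUD%" %TILE%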
All input parameters can be predefined through simple control files. The control files are editable and ensure the parameters are consistent. In this example project we used two control files, one for defining the desired naming convention and another one for input parameters such as cell and window sizes (CSV files 1 and 2). CSV files were preferred because we can visualize the hierarchical structure of the resulting data.
The first batch file performs the file organization. It reads the first control file line by line and creates the corresponding folders and subfolders in the appropriate directory. This batch process takes less than a second to complete, whereas it can be very time consuming, and inconsistent, when done manually. The first batch file then calls the second batch file, which reads the input parameters from the second control file and runs the executable files in order to generate the desired data.
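A hedged sketch of that first batch file, assuming the control file is a simple two-column CSV of parent folder and subfolder; the file name and layout here are illustrative, not our exact control file:

    @echo off
    rem make_folders.bat -- build the folder tree listed in a control file
    rem each line of folders.csv holds parent,subfolder (e.g. LL01,LL01grids_0p20m)
    for /f "usebackq tokens=1,2 delims=," %%a in ("folders.csv") do (
        if not exist "%%a\%%b" mkdir "%%a\%%b"
    )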
The second batch file is the heart of the Red Button. It runs the various standalone programs in order and defines their inputs and the types of their outputs. It supports loops for when a program needs to be called multiple times as parameters vary. One can easily add or remove steps based on the desired outputs for a certain project; just be careful not to eliminate steps whose results are needed for your final output data!
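A hedged sketch of that core loop; grid_tool.exe is a hypothetical stand-in (we do not reproduce BinNGrid's actual command-line options here), and the parameter file layout is likewise illustrative:

    @echo off
    rem process_tile.bat -- run the gridding tool once per requested cell size
    rem each line of params.csv holds suffix,cellsize (e.g. 0p20m,0.20)
    set CLOUD=%~1
    set TILE=%~2
    for /f "usebackq tokens=1,2 delims=," %%s in ("params.csv") do (
        echo Gridding %TILE% at %%t m cell size...
        grid_tool.exe "%CLOUD%" "%TILE%grids_%%s\%TILE%_%%s.grd" %%t
    )

Adding a resolution then means adding one line to params.csv, with no change to the script itself.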
Much more can be done at this stage to make the process even faster or more "magical". For example, developing a parallel version of the procedure that runs the programs on different workstations simultaneously is one interesting improvement that could be added to the "Red Button".
Finally, it is worth mentioning that running the "Red Button" for each section takes less than five seconds, compared to the past, when this chore gave graduate students headaches for days!
Conclusions
Given the advantages of batch programming, it can clearly be used widely in lidar computation that involves bulk data processing, tedious series of operations, and very complex hierarchical folder structures. Much of this work can be completed without user interaction through batch scripting. Often, someone who is proficient can write a batch script in the same time, or less, than it would take to manually complete the processing on a single dataset.
Rather than work through a series of tutorials and classes, the easiest way to learn a programming language is to have a purpose and end product in mind. Then you can use resources such as en.wikibooks.org/wiki/Windows_Batch_Scripting to help guide you through developing a script. It is often best to start simple and then add complexity.
Think of a possible application in your business, and say you want to develop your own procedure. After you write a batch file to perform a function for a specific project, you can reuse it on future projects with the same purpose as many times as you want, without additional work, and it can be adapted quickly for other applications. This again shows that a batch file is not merely input redirected to the command prompt; it is a powerful yet simple programming language.
You will quickly realize that writing a batch file is simpler than it appears. Batch programming gives you a satisfying sense of power over computers, and you will discover how much control you have over your machine once you write one!
Acknowledgements
Development of this tool was partially funded by PACTRANS and Alaska DOT. We also acknowledge the support of Leica Geosystems, David Evans and Associates, and Maptek I-Site in providing software used in various research projects.
References
Olsen, M.J., Johnstone, E., Kuester, F., Ashford, S.A., & Driscoll, N. (2011). "New automated point-cloud alignment for ground based lidar data of long coastal sections," Journal of Surveying Engineering, ASCE, 137(1), 14-25.
Olsen, M.J. (2011). "Bin 'N' Grid: A simple program for statistical filtering of point cloud data," Lidar News, 1(10), lidarmag.com, posted online May 29, 2011.
Hamid Mahmoudabadi is in the final year of his Ph.D. in Geomatics in the School of Civil and Construction Engineering at Oregon State University. His research interests merge areas of science such as geostatistics, machine learning, and computer vision for geospatial analysis of terrestrial laser scanning data and GPS optimization.