Random Points: Generalized Data

In the last Random Points article, I discussed, at an overview level, the concepts of data and sensor fusion. I promised at the conclusion to continue with a fusion technique I call “Just in Time” (JIT) product generation. However, I thought it would be a useful exercise to first discuss data generalization.

In the mid-1980s I had the privilege of participating in a massive Defense Mapping Agency (DMA) development called the Mark 90 program. The DMA was producing a wide range of mapping products (e.g. Nautical Charts, Harbor and Approach Charts, Topographic Line Maps and more) essentially as one-off products. Data were collected from a wide variety of sources to create thematic map separates. These separates were transferred to film at a one-to-one scale (relative to the printed map size) and used to print physical maps; there were virtually no digital maps at the time. You can imagine how difficult and time-consuming this process could be. Additionally, the DMA had to stockpile literally pallets of maps in warehouses for dissemination to the user community. If a change was deemed necessary, the map had to either be reprinted or a “red-line” separate produced for overprinting of the map.

The DMA embarked on a huge series of modernization programs aimed at creating a “map factory” that could produce maps in a JIT fashion. The idea was to build a world-wide database of base information and use this database in a series of software processes to create map products. The database was called the Mapping, Charting and Geodesy (MC&G) database, reflecting its wide purpose.

Imagine the machinery of such a system. Given a multitude of data sources, the mapping algorithms must pull together the sources needed to create a specific product, accurately combine them, create labels, render color schemes, and so forth. The technologies involved include conflation, conflict detection and resolution, label placement, names resolution and generalization, to name just a few. The general idea is depicted in Figure 1.

This is an immensely complex task that even to this day (nearly 30 years later) has not been fully accomplished. However, the various attempts (beginning with the DMA programs) have created thousands of algorithms that take us ever closer to the generalized solution.

All of this is directly related to some difficult problems in representing the real world in point clouds. We are particularly interested in the “generalization” problem–that is, how do we accurately represent data as we “zoom out” on that data? Consider a large-scale map where a highway runs very close to a parallel railroad. As we zoom out on the map, the road and railroad would normally merge into a single line. However, if the desired map product specifies that these two features must remain distinct, a cartographic displacement must occur as smaller-scale products are produced–this is the process of generalization.

Consider the point cloud “zoom out” problem. To represent features in a point cloud, the data must be at least twice as dense as the smallest desired feature (this is the Nyquist criterion to which I have referred in previous articles). For example, if you want to be able to measure a 4 cm diameter walk button at a crosswalk, the LIDAR data point spacing must be 2 cm or closer. As one zooms out on a view, more and more data must be accessed to render the view. For example, if the view volume contains an area 5 meters high, 20 meters wide and 10 meters deep, the volume encompassed is 1,000 m³! At a point density of 1 point per 8 cm³ (one point per cube measuring 2 cm on a side), we have 125,000 points per cubic meter. Thus we would need to access 125 million points to render the scene in our view volume. For this viewing to scale, the data must be decimated. We routinely do this in image processing by low-pass filtering the data and then decimating. This is a natural way of decimating images, since this is what happens in an optical system such as our eyes.
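
As a back-of-the-envelope check of that arithmetic, the short Python sketch below reproduces the numbers; the function name and parameters are my own, purely for illustration.

```python
def points_in_view(width_m, height_m, depth_m, spacing_m):
    """Points needed to fill a view volume at a uniform 3-D point spacing."""
    volume_m3 = width_m * height_m * depth_m
    points_per_m3 = (1.0 / spacing_m) ** 3      # one point per spacing-sized cube
    return volume_m3 * points_per_m3

# Nyquist: to resolve a 4 cm feature, the spacing must be 2 cm (0.02 m) or finer.
n = points_in_view(width_m=20, height_m=5, depth_m=10, spacing_m=0.02)
print(f"{n:,.0f} points")                       # 125,000,000 for the 1,000 m³ volume
```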

However, we have adopted a system of tagging points within a point cloud, anointing each point with “intelligent” content. For example, a single point may be tagged as a “Vertical Obstruction” such as the top of a radio tower. If we were to simply treat the point data as we do an image, low pass filtering and decimating, we most likely would lose this vertical obstruction point. In most uses of the data, this would be completely unacceptable. Enter Cartographic Generalization (or a similar technique that we apply to point clouds).
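
To make the idea concrete, here is a minimal sketch of a decimation step that never drops tagged points. It assumes the points arrive as a NumPy array with a boolean flag per point; the function name, the voxel-grid decimation scheme and the flag field are illustrative choices of mine, not a prescription for how such an algorithm must work.

```python
import numpy as np

def decimate_preserving_flags(xyz, keep_flag, cell_size):
    """Voxel-grid decimation that never discards flagged points.

    xyz       : (N, 3) array of point coordinates, in meters
    keep_flag : (N,) boolean array, True for points that must survive
                (e.g. a point tagged as a vertical obstruction)
    cell_size : edge length of the decimation voxel, in meters
    """
    # Assign every point to a voxel of the decimation grid.
    voxel = np.floor(xyz / cell_size).astype(np.int64)
    # Keep one representative per occupied voxel (the first point found).
    _, first_idx = np.unique(voxel, axis=0, return_index=True)
    selected = np.zeros(len(xyz), dtype=bool)
    selected[first_idx] = True
    # The override that matters: flagged points survive no matter what.
    selected |= keep_flag
    return xyz[selected], selected

# Illustration: one million synthetic points, ten of them flagged.
rng = np.random.default_rng(0)
pts = rng.uniform(0.0, 100.0, size=(1_000_000, 3))
flags = np.zeros(len(pts), dtype=bool)
flags[rng.choice(len(pts), size=10, replace=False)] = True

thinned, mask = decimate_preserving_flags(pts, flags, cell_size=1.0)
assert mask[flags].all()   # no flagged point was lost in the thinning
```

The essential step is the one-line override: whatever the decimation rule keeps or discards, the tagged points are forced back into the selection.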

A search of the literature will reveal very little on the topic of point cloud generalization. Thus we seem to be at the forefront of a technology that will prove critical as point clouds begin to replace other sources of base intelligence.

There has been some initial work on a very specific case of generalization–terrain thinning. Consider a ground model comprising point cloud data with a nominal point spacing (NPS) of 50 cm. This would yield an average of 8 triangles per m² when the point cloud of the ground surface is rendered as a triangulated irregular network (TIN)–see Figure 2.
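
For the record, that triangle count falls out of simple arithmetic: a grid of points at spacing s triangulates into roughly 2/s² triangles per square meter. The tiny computation below is just that, with names of my own choosing.

```python
spacing_m = 0.5                              # 50 cm nominal point spacing
points_per_m2 = 1.0 / spacing_m ** 2         # 4 points per square meter
triangles_per_m2 = 2.0 / spacing_m ** 2      # each grid cell splits into 2 triangles
print(points_per_m2, triangles_per_m2)       # 4.0 points, 8.0 triangles per m²
```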

Most CAD software would be overwhelmed if it were faced with loading a TIN of such high density. For this reason, thinning algorithms are included with most LIDAR processing software. This thinning operation for ground data is typically called a “Model Key Point” algorithm. In fact, the LAS point cloud specification includes a special bit flag to mark points as “MKP” points.

This thinning method functions as a simple error-bracketing algorithm. The user specifies the maximum vertical error that is permissible for the model (treating the original data as “truth”). The original data are modeled as a TIN and points are iteratively removed. As each point is removed, the modified TIN is generated and compared (vertically) to the original TIN. If the error is below the user-specified threshold, the point is left out of the model and the next point is removed. If, on the other hand, the model deviates from “truth” by more than the user-specified vertical limit, the test point is restored to the model.

This process repeats until no further points can be removed from the model without violating the accuracy criteria. The remaining points are flagged as Model Key Points. It should be intuitively obvious that very few MKPs will be required in flat terrain whereas a fairly dense collection of MKPs will be required where the terrain sharply changes. This is illustrated in Figure 3 where the nodes of the TIN represent the model key points and all points (orange points and TIN nodes) represent the ground.
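
For readers who like to see the logic spelled out, the following is a deliberately simplified, one-dimensional sketch of that error-bracketing loop. Production implementations work on the full 2.5D TIN; the function and variable names here are mine rather than those of any particular LIDAR package.

```python
import numpy as np

def thin_profile(x, z, tol):
    """Greedy "model key point" style thinning of a 1-D terrain profile.

    The surviving points, linearly interpolated, must stay within `tol`
    (vertically) of the original profile, which is treated as truth.
    Endpoints are always retained.
    """
    keep = np.ones(len(x), dtype=bool)
    changed = True
    while changed:                            # repeat until no point can be removed
        changed = False
        for i in range(1, len(x) - 1):        # never drop the endpoints
            if not keep[i]:
                continue
            keep[i] = False                   # tentatively remove the point
            trial = np.interp(x, x[keep], z[keep])
            if np.max(np.abs(trial - z)) <= tol:
                changed = True                # removal accepted
            else:
                keep[i] = True                # restore: error budget exceeded
    return keep                               # True = model key point

# Illustration: flat ground with a single smooth bump, 5 cm tolerance.
x = np.linspace(0.0, 100.0, 201)
z = 5.0 * np.exp(-((x - 50.0) / 5.0) ** 2)
mkp = thin_profile(x, z, tol=0.05)
print(int(mkp.sum()), "of", len(x), "points retained as model key points")
```

Run on a flat profile with a single bump, the loop should keep almost nothing on the flats and cluster the surviving points where the surface curves sharply, which is exactly the behavior described above.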

So now the question is “how do we extend this MKP idea to other features?” Can we flag important points (literally) in the data set and preserve these point features as we generalize the data? Obviously this requires a new class of algorithm. Much can be learned from the old DMA generalization efforts as well as other developments in these areas.

This is a very important preamble to the idea of JIT product generation from LIDAR data. If we are to store LIDAR data at full resolution and then generate ad hoc products (using server-side processing, for example), we will have to develop intelligent (and fast!) algorithms that can perform the requisite processing. We will pick up on this theme in the next edition of Random Points.

About the Author

Lewis Graham

Lewis Graham is the President and CTO of GeoCue Corporation. GeoCue is North America’s largest supplier of lidar production and workflow tools and consulting services for airborne and mobile laser scanning.