Random Points: What I Mean Is….


So the question this month relates to the so-called "Z Bump." When processing point clouds from both LIDAR systems and photogrammetric dense image matching (e.g., Structure from Motion, SfM), we are often faced with the question of vertical (and horizontal, for that matter) bias. It shows up as the mean of the residuals, but is it truly a bias (a form of systematic error)? If so, can we safely remove it? I am going to outline the problem this month, but I am not giving a definitive answer; I am still thinking about that aspect of the problem!

A quick illustration of how this problem appears in analytic software is shown in Figure 1. Here we see (in GeoCue’s LP360 point cloud software) a check point (red symbol in the profile view) that appears above the point cloud. In other words, the point cloud appears depressed. The question is, can we shift the point cloud up by a constant amount to reduce or eliminate this offset?

Let’s meander around and give some thought to this very common processing problem. Consider the targets shown in Figure 2 (from the book I cannot over-recommend: "An Introduction to Error Analysis" by John R. Taylor). As Taylor points out, the misleading thing about figures like this is that we can see whether or not we are achieving accuracy by virtue of the target rings. In the process of assessing point clouds, we do not have these rings. In other words, we do not know what "truth" is.

Our "targets" tend to be image identifiable ground check points that we survey in to measure residuals. Thus we have a good idea of accuracy at these check points but nowhere in between. For those from the signal processing community, you will immediately recognize that when we assess point cloud accuracy, we are very seriously violating the Nyquist sampling criteria!

I call the measure of how well a point cloud fits the true object space "conformance." As far as I know, very little has been published to date on conformance or how it can be measured. More on that in a future article.

A ubiquitous problem in the mapping industry is that we tend to misapply Gaussian ("Normal") statistics. One of the most important theorems in all of statistics is the Central Limit Theorem (CLT). This theorem basically states that metrics (such as the mean) computed from ensembles of independent random samples tend toward a Normal distribution regardless of the true distribution from which the samples are drawn. The CLT is often misapplied when analyzing error.
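To make that distinction concrete, here is a minimal sketch (my own illustration, in Python with NumPy, neither of which the column references) showing that the means of repeated ensembles drawn from a decidedly non-Normal distribution do tend toward Normal, while the individual samples themselves do not:

```python
import numpy as np

rng = np.random.default_rng(42)

# A decidedly non-Normal "parent" distribution: exponentially distributed values.
parent = rng.exponential(scale=1.0, size=100_000)

# The CLT applies to means of repeated ensembles drawn from the parent...
ensemble_means = rng.exponential(scale=1.0, size=(10_000, 50)).mean(axis=1)

# ...not to the individual samples themselves. Excess kurtosis is ~0 for a
# Normal distribution and +6 for an exponential one.
def excess_kurtosis(x):
    z = (x - x.mean()) / x.std()
    return (z ** 4).mean() - 3.0

print(f"parent excess kurtosis:        {excess_kurtosis(parent):+.2f}")          # roughly +6
print(f"ensemble-mean excess kurtosis: {excess_kurtosis(ensemble_means):+.2f}")  # close to 0
```

The point is that the CLT is a statement about ensemble statistics such as the mean, not about the individual residuals we measure against a point cloud.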

When we test the vertical accuracy of a point cloud, we do so by measuring the vertical distance from an independent check point to the point cloud. We call this measurement a residual. If we had perfect agreement, the residual would be zero. The residual comprises both a systematic shift (bias) and a random deviation. We correctly express the accuracy of a point cloud with respect to these independent check points by a metric called the Root Mean Square Error (RMSE). This is given by:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} r_i^{2}}$$

where $r_i$ is the vertical residual at the $i$-th check point and $n$ is the number of check points.

We do this square root of the residuals squared trick because we do not want negative residuals canceling positive ones. There is an incredibly important but sometimes neglected relationship between the mean squared error, the bias, and the variance that needs to be considered when thinking about error analysis:

$$\mathrm{MSE} = \mathrm{RMSE}^{2} = \bar{r}^{\,2} + \sigma_r^{2}$$

where $\bar{r}$ is the mean of the residuals (the bias estimate) and $\sigma_r^{2}$ is their variance.

This says that, across all residuals we have measured, the Mean Squared Error is equal to the square of the bias plus the variance of the residuals. This is all fine and perfectly accurate. Where the problem comes in is the next step: it is often assumed that the distribution of the residuals is Normal. We have no reason to believe this; the Central Limit Theorem does not apply because we are not repeatedly measuring ensembles of the same thing!
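As a quick sanity check on that relationship, here is a short sketch (again my own, in Python with NumPy; the residual values are invented purely for illustration) that computes the RMSE, the mean error, and the standard deviation of a set of residuals and verifies that the MSE equals the squared mean plus the variance:

```python
import numpy as np

# Invented vertical residuals (check point minus point cloud), in meters.
residuals = np.array([0.12, 0.05, 0.09, -0.02, 0.15, 0.07, 0.11, 0.03])

rmse = np.sqrt(np.mean(residuals ** 2))   # Root Mean Square Error
bias = residuals.mean()                   # mean error, our estimate of the bias
sigma = residuals.std()                   # population standard deviation (ddof=0)

print(f"RMSE  = {rmse:.3f} m")
print(f"bias  = {bias:.3f} m")
print(f"sigma = {sigma:.3f} m")

# The decomposition holds exactly when the population variance (ddof=0) is used.
assert np.isclose(rmse ** 2, bias ** 2 + sigma ** 2)
```

Note that nothing in this decomposition assumes a Normal distribution; it is a simple algebraic identity.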

Still, we are OK because the definition of variance (and its square root, the standard deviation) has nothing to do with the parent distribution of the data. It is only when we start to try to bracket errors based on multiples of the standard deviation that we get into trouble – for example, saying that 95% of the data lie within 2 standard deviations of the mean. This is not true, in general, when speaking of residuals!
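To see how distribution-dependent that 95% figure is, here is a small sketch (my own, in Python with NumPy; the parent distributions are chosen arbitrarily) that computes the fraction of samples falling within two standard deviations of the mean for a few different distributions:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 1_000_000

def coverage_2sigma(x):
    """Fraction of samples within two standard deviations of the mean."""
    return np.mean(np.abs(x - x.mean()) <= 2.0 * x.std())

samples = {
    "Normal ": rng.normal(size=n),              # about 95.4%
    "Uniform": rng.uniform(-1.0, 1.0, size=n),  # 100%
    "Laplace": rng.laplace(size=n),             # about 94.1%
}

for name, x in samples.items():
    print(f"{name}: {coverage_2sigma(x):.3f}")
```

The familiar 95% rule is a property of the Normal distribution, not of standard deviations in general.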

So finally, the rub. We want to remove the systematic error in our data (the point cloud) by adding (or subtracting) the mean of the residuals. But is this a valid operation? What is our confidence that the mean of the residuals truly represents the systematic error (bias) in our data?

Let me be a bit more concrete. Suppose we measure a bunch of check points and find a vertical mean error of 8.5 cm and an RMSE of 16.0 cm. This tells us that the standard deviation of the set of samples is 13.6 cm (based on the formula above). Are we justified in removing the 8.5 cm of "bias" and recomputing the RMSE (which, if you are following all of this, would reduce to 13.6 cm)?
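For the record, the 13.6 cm comes straight from the relationship above:

$$\sigma_r = \sqrt{\mathrm{RMSE}^{2} - \bar{r}^{\,2}} = \sqrt{16.0^{2} - 8.5^{2}}\ \text{cm} = \sqrt{256.0 - 72.25}\ \text{cm} \approx 13.6\ \text{cm}$$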

The answer is probably no. You can address this with a simple thought experiment. Imagine you start with a single check point and then keep adding check points. Suppose with one point you have a residual of 10 cm (with a single point, the standard deviation is undefined). Obviously you cannot remove this since you don’t have any idea how much is bias (mean) and how much is noise. Suppose you add a second point, also with a residual of 10 cm. Now you start thinking, "wow, I have perfect data with a 10 cm bias!" But what if the second residual is instead 5 cm? Do you then think, "Oh, I have 7.5 cm of bias with some superimposed noise"? You can see how this goes. You are forming qualitative ideas about the uncertainty of the mean but not quantitative ones. Most folks at this point latch on to the Normal distribution and make assumptions about the randomness of the data based on a multiple of the standard deviation. This is generally wrong since, as we have already discovered, the residuals do not necessarily follow a Normal distribution.
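One way to make that uncertainty quantitative without leaning on the Normal distribution (and I am not claiming this is where a future column will end up) is to bootstrap the mean of the residuals: resample them with replacement many times and look at the spread of the resulting means. A minimal sketch, again in Python with NumPy and with invented residuals:

```python
import numpy as np

rng = np.random.default_rng(1)

# Invented vertical residuals at a handful of check points, in meters.
residuals = np.array([0.10, 0.10, 0.05, 0.14, 0.02, 0.09, 0.12, 0.06])

# Bootstrap the mean: resample with replacement, recompute the mean each time.
# No assumption about the parent distribution of the residuals is required.
boot_means = np.array([
    rng.choice(residuals, size=residuals.size, replace=True).mean()
    for _ in range(10_000)
])

lo, hi = np.percentile(boot_means, [2.5, 97.5])
print(f"mean residual: {residuals.mean():.3f} m")
print(f"95% bootstrap interval for the mean: [{lo:.3f} m, {hi:.3f} m]")
```

With only a handful of check points the interval is wide, which is exactly the point: the mean of the residuals is itself an uncertain estimate of the bias.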

So here I have left you with a problem. Fortunately, I have run out of room in this column to give the answer. Seriously though, we will continue this in a future column. In the meantime, not all is normal!

Lewis Graham is the President and CTO of GeoCue Corporation. GeoCue is North America’s largest supplier of LIDAR production and workflow tools and consulting services for airborne and mobile laser scanning.
