We are hearing a lot about “deep learning” these days, with even Amazon Web Services offering hosted deep learning networks for rent. Recently Pix4D announced a prerelease of a deep learning classifier for point cloud data. I thought it might be fun to review deep learning and see what all this hype is about.

Editor’s note: A 530Kb PDF of this article as it appeared in the magazine is available **HERE**

Way back in 1957, a new binary classifier algorithm called the Perceptron was invented by Dr. Frank Rosenblatt at the Cornell Aeronautical Laboratory. A binary classifier assigns each input to one of two categories. For example, you might have a bunch of geometric shapes feeding a binary classifier that decides if each shape most closely resembles a rectangle or a circle. The novel thing about the perceptron was that it was not a preprogrammed analytic algorithm with intrinsic knowledge of the characteristics of a rectangle or a circle. Rather, it was a generic “filter” that “learned” the classes by being fed known examples and having a set of weights adjusted to move the network toward the correct response. This is shown in Figure 1.

In this figure, you can imagine the inputs as cells on a sampling grid laid over the shape. If our sampling grid were 16 x 16 cells, we would have 256 individual inputs. Each of these inputs is conditioned at the initial input layer; for our example, the conditioning makes the input 1 for a cell the shape covers and -1 for an uncovered one. Each input is then multiplied by an adjustable weight and fed to a summer. Finally, the output of the summer is fed to a discriminator that outputs one of two values (say 1 or 0, which represent circle or rectangle respectively). The discriminator might be a simple threshold: if the output of the summer exceeds 18.7, output a 1; otherwise, output a 0.
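To make the data flow concrete, here is a minimal sketch of that forward pass in Python. The function names, the example shape and the uniform weight of 0.1 are my own illustrative assumptions; only the 16 x 16 grid, the 1 and -1 conditioning and the 18.7 threshold come from the description above.

```python
GRID = 16          # 16 x 16 sampling grid -> 256 inputs, as in the text
THRESHOLD = 18.7   # the example discriminator threshold from the text

def condition(covered):
    """Flatten a grid of booleans into +1 (covered) / -1 (uncovered) inputs."""
    return [1.0 if cell else -1.0 for row in covered for cell in row]

def perceptron_output(inputs, weights, threshold=THRESHOLD):
    """Weighted sum followed by a threshold: 1 = circle, 0 = rectangle."""
    total = sum(w * x for w, x in zip(weights, inputs))  # the "summer"
    return 1 if total > threshold else 0

# Hypothetical shape: a filled square covering the central 8 x 8 cells.
covered = [[4 <= r < 12 and 4 <= c < 12 for c in range(GRID)]
           for r in range(GRID)]
inputs = condition(covered)

# Illustrative, untrained weights; training would adjust these.
weights = [0.1] * (GRID * GRID)
print(perceptron_output(inputs, weights))
```

With these made-up weights, the 64 covered cells and 192 uncovered cells sum to about -12.8, which is below the threshold, so the sketch outputs 0 (rectangle). That it happens to be right is luck; nothing has been learned yet.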

A training set is presented to the network and the error is fed back into the system to adjust the weights. For example, if we feed the system a circle and it outputs a zero, we have an error, since we said a circle is represented by an output of 1. We feed numerous examples to the system and tweak the weights based on whether the output is correct or not. We then test the efficacy of our training by feeding the system shapes that were not used in the training process and seeing how it does. We typically express the success rate as a fraction or percentage. Thus a score of 97.5% means that our system is correctly classifying all but 2.5% of the test samples.
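The train-then-test loop can be sketched in a few lines of Python. This uses the classic perceptron learning rule (add the error times the input to each weight), which is one common way to do the “tweaking” and not necessarily Rosenblatt's exact procedure. The toy task, the function names and the 24/8 split are assumptions chosen to keep the example tiny.

```python
import itertools
import random

def predict(weights, bias, inputs):
    total = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > 0 else 0

def train(samples, epochs=100, rate=1.0):
    """Perceptron learning rule: nudge weights by the error on each sample."""
    weights, bias = [0.0] * len(samples[0][0]), 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)  # -1, 0 or +1
            bias += rate * error
            weights = [w + rate * error * x for w, x in zip(weights, inputs)]
    return weights, bias

def accuracy(weights, bias, samples):
    hits = sum(predict(weights, bias, x) == t for x, t in samples)
    return hits / len(samples)

# Hypothetical toy task (not from the article): five +1/-1 cells, class 1
# when most cells are covered. It is small and linearly separable.
patterns = list(itertools.product([-1, 1], repeat=5))
data = [(p, 1 if sum(p) > 0 else 0) for p in patterns]
random.Random(0).shuffle(data)
train_set, test_set = data[:24], data[24:]   # hold out 8 unseen samples

weights, bias = train(train_set)
print(f"held-out accuracy: {accuracy(weights, bias, test_set):.1%}")
```

The score is computed only on the eight held-out patterns, which the training loop never saw. That is the point of keeping a separate test set.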

So what is the big deal about this? Well, if I had hand-coded an algorithm to detect a circle or a rectangle, I might code up an edge detector and look at the curvature of edges, the number of sides and so forth. It would be very “hard coded” to detect circles and rectangles. If I wanted to switch to detecting circles and triangles, I would have to go in and rewrite my core logic. Not so with the perceptron. I just reset the weights and train with the new set of training data. Thus, as long as I have sufficient training samples, my classifier is trainable. This was a very novel concept for its time, becoming one of the pillars of the birth of computer-implemented artificial intelligence (AI).

Unfortunately, a single perceptron can only draw a straight-line (linear) boundary between its two classes, so it works only with inputs that are linearly separable. In 1969 Marvin Minsky (and Seymour Papert) pointed out that the perceptron could not solve a simple XOR classification. Minsky went on to say (without proof) that while this could potentially be solved by adding more layers to the network, it would not be possible to train. Since Minsky was such a force to be reckoned with in AI at the time, the perceptron and its variants were effectively dead in mainstream AI research.
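You can see the XOR problem for yourself with a brute-force search. The sketch below tries every combination of two weights and a bias on a coarse grid (the grid range and step are arbitrary choices of mine) and reports the best score any single threshold unit achieves on the four XOR cases.

```python
import itertools

XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def predict(w1, w2, bias, inputs):
    return 1 if w1 * inputs[0] + w2 * inputs[1] + bias > 0 else 0

def count_correct(w1, w2, bias):
    return sum(predict(w1, w2, bias, x) == t for x, t in XOR)

# Candidate values from -2.0 to 2.0 in steps of 0.5 (an illustrative grid).
values = [v / 2 for v in range(-4, 5)]
best = max(count_correct(w1, w2, b)
           for w1, w2, b in itertools.product(values, repeat=3))
print(f"best single-perceptron score on XOR: {best} of 4")
# prints: best single-perceptron score on XOR: 3 of 4
```

No finer grid would help. Getting both (0,1) and (1,0) above the threshold forces (1,1) above it as well whenever (0,0) is below it, so three of four is the ceiling.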

During this same period, adjusting parameters in a systematic fashion was being explored by electrical engineers in control systems design. The general technique is called backpropagation: the error of the output is fed in reverse order through the system, and the parameters are adjusted using an algorithm called gradient descent. In 1974, Dr. Paul Werbos applied backpropagation to multilayer perceptrons and the artificial neural network (ANN) was born (see Figure 2). ANNs were very popular in research circles in the late 1980s, but the compute power needed to solve for the weights of a large network made them impractical for real-world problems.
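Here is a minimal sketch of backpropagation on a tiny multilayer network, trained on the XOR cases that defeat a single perceptron. The details are my assumptions, not anything from the historical work: three hidden neurons, sigmoid activations, a squared-error loss, a learning rate of 0.5 and a fixed random seed.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(net, x):
    """Return the hidden activations and the output for one input vector."""
    hidden = [sigmoid(b + sum(w * xi for w, xi in zip(ws, x)))
              for ws, b in zip(net["w_hidden"], net["b_hidden"])]
    output = sigmoid(net["b_out"]
                     + sum(w * h for w, h in zip(net["w_out"], hidden)))
    return hidden, output

def train_step(net, x, target, rate):
    """One gradient-descent update, with the output error fed backward."""
    hidden, output = forward(net, x)
    # Error signal at the output (squared-error loss, sigmoid derivative).
    delta_out = (output - target) * output * (1.0 - output)
    # Feed the error back through the output weights to each hidden neuron.
    delta_hidden = [delta_out * w * h * (1.0 - h)
                    for w, h in zip(net["w_out"], hidden)]
    # Step every weight a little way against its gradient.
    net["w_out"] = [w - rate * delta_out * h
                    for w, h in zip(net["w_out"], hidden)]
    net["b_out"] -= rate * delta_out
    for j, d in enumerate(delta_hidden):
        net["w_hidden"][j] = [w - rate * d * xi
                              for w, xi in zip(net["w_hidden"][j], x)]
        net["b_hidden"][j] -= rate * d

def loss(net, samples):
    return sum(0.5 * (forward(net, x)[1] - t) ** 2 for x, t in samples)

XOR = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0),
       ((1.0, 0.0), 1.0), ((1.0, 1.0), 0.0)]
HIDDEN = 3
rng = random.Random(1)  # fixed seed so runs are repeatable
net = {
    "w_hidden": [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(HIDDEN)],
    "b_hidden": [rng.uniform(-1, 1) for _ in range(HIDDEN)],
    "w_out": [rng.uniform(-1, 1) for _ in range(HIDDEN)],
    "b_out": rng.uniform(-1, 1),
}

before = loss(net, XOR)
for _ in range(5000):
    for x, t in XOR:
        train_step(net, x, t, rate=0.5)
print(f"loss before: {before:.4f}, after: {loss(net, XOR):.4f}")
```

The loss usually falls as training proceeds, but how far it falls depends on the random starting weights and the number of passes, so treat the printed numbers as something to experiment with. Even at this toy scale the network makes 20,000 weight updates, which hints at why large networks were out of reach for the computers of the day.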

Perhaps five years ago, ANNs were once again taken out of the closet, dusted off and programmed on new, low-cost parallel processors such as Nvidia GPUs. Suddenly, training the weights of large ANNs via backpropagation looked doable at a reasonable cost. Rapid advances were made in specific problem spaces, particularly natural language parsing. The expression Deep Learning that we now so often hear refers to ANNs with one or more hidden layers of neurons (hence, the deep part). The great thing about ANNs today is that you really do not have to program anything; you can just use ready-made application programming interfaces from a number of providers (including the aforementioned hosted system in AWS).

In next month’s edition of Random Points, we’ll explore the value of ANNs and examine the sorts of problems to which these algorithms might be applicable. If you want to experiment on your own before then, I highly recommend the book “Make Your Own Neural Network” by Tariq Rashid, available as a Kindle book for $3.98. In the meantime, keep those neurons firing!

*Recommended reading:* “Convoluted Thinking” – Neural Networks, Part II

*Lewis Graham is the President and CTO of GeoCue Corporation. GeoCue is North America’s largest supplier of LIDAR production and workflow tools and consulting services for airborne and mobile laser scanning.*