Editor's note: When I recently saw the new iMac with the iSight built in, it reminded me of a project we've been working on. In a nutshell, Matthew Russell and I have been talking about using the iSight to take and classify images, such as those of a user sitting at the iMac, so it knows who's using it. (Face-sensing engines have been in the news lately.) Aside from being a cool hack, this could possibly be used in addition to your user password for authentication.
It became clear as we worked on this project that an introduction to the technology behind this operation was in order. So as we chip away at the actual tutorial for enabling the iSight, Matthew has put together this introduction to artificial intelligence. Call it weekend reading. I hope you enjoy it.
Artificial intelligence (AI) is gradually weaving itself into virtually every aspect of our daily lives. It already turns up in places as diverse as the transmission systems of cars, vacuum cleaners, computer games, hospital equipment, Google Labs, and postal equipment. The richest man in the world is rumored to have AI systems integrated into his home that adjust room temperature, lighting, background music, etc., depending on who walks into the room. In this installment, we'll investigate artificial neural networks, a powerful AI learning technique that can be used to accomplish some of these interesting things.
A conceptual representation of a synapse (from Wikipedia). If enough neurotransmitters are released into the synapse, adjoining neurons may fire an electrical impulse that continues the signal transmission.

In the blockbuster hit Terminator 2: Judgment Day, a Terminator robot divulges the inner workings of its intellect when it says, "My CPU is a neural net processor--a learning computer." Although the jargon "neural network" sounds intimidating, it's actually pretty simple. From its etymology, we already know that "neural" refers to an association with the nervous system, and a "network" is a connected group or system. Thus, neural networks are just a learning model based on the way that neurons (nerve cells) store information. For the purpose of employing neural networks to solve problems, it's helpful to quickly review the basics of how biological neurons work.
Neural networks found in living organisms are highly interconnected but physically separated from one another by a tiny gap called a synapse. As an electrical impulse travels through a neuron, it eventually reaches the physical end of the cell and may cause a chemical signal to be released into the synapse. Adjoining neurons, sometimes as many as 10,000 or so, detect the chemical signal and may transform it back into an electrical impulse if a certain chemical threshold is reached. This process repeats over and over, eventually reaching out to such things as muscle fibers, and enables us to walk, run, swim, etc. These signals travel incredibly quickly and massively in parallel. Although biologists would gasp at this gross oversimplification, it provides sufficient background to develop a neural network learning model. Let's get on with the geekery.
Neural networks are usually categorized under a specific division of artificial intelligence known as machine learning. While general AI is interested in systems that can demonstrate a perceived intelligence, machine learning narrows in on learning models that primarily use statistical analysis to accomplish the learning process. In the case of neural networks, we provide the network with sets of input and output values for "training" purposes, and then expect the network to perform well on previously unseen data sets. This isn't all that different from the way we'd expect our own brains to learn.
For each of the input values, the network outputs a value that is compared to the expected training output value, and an error value is computed based on the difference. Internal network parameters are adjusted by an amount proportional to the error and the network's learning rate, so that the network will perform with higher accuracy the next time it classifies the example. This process is repeated many times for each of the distinct training values. Once a network has learned a training set with sufficient accuracy, it can often classify "new" data sets surprisingly well.
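In code, that adjustment boils down to a single line per weight. Here's a minimal sketch of the update step; the variable names are mine rather than anything from a particular library:

// Nudge each weight in proportion to the error, the learning rate,
// and the input value that contributed to the output.
double error = expected - actual;
for (int i = 0; i < weights.length; i++)
    weights[i] += learningRate * error * inputs[i];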
Given some biological inspiration and some object-oriented programming, let's model a perceptron--the basic object in a feed-forward neural network that is analogous to a neuron in a real biological model. A good starting point is to represent a simple Boolean logic function like the AND function. Recall that the binary AND function's output is activated if and only if both inputs are activated.
From our uber-simplified model of a neuron, we know that each neuron in our model has multiple input units and a single output unit, so we'll model a perceptron the same way. We'll provide it with two binary input values and a single binary output value. An additional bias value is used to stabilize the perceptron's learning and is equal to a constant value of 1.0. The inputs are each multiplied by a corresponding weighting, summed up, and then passed through a step function that produces only 0 or 1 for output values, since we want a strictly binary output. For values less than 1.0, the step function outputs a 0; otherwise, it outputs a 1. Since the AND function has linearly separable output when put on a grid, it can be represented using a single perceptron.

On the left: A perceptron. It accepts two input values and produces a single binary output value. Each input value is multiplied by a corresponding weighting, and the products are added together to form a weighted sum. The weighted sum is input into the step function to produce a final output.
On the right: The AND function has output that is linearly separable.
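To make the perceptron concrete before we bring in the Java package, here's a sketch of the forward pass with weights I've picked by hand so that the output matches the AND function. A trained perceptron would arrive at its own (different) weights:

// A perceptron forward pass for the AND function. The bias input is a
// constant 1.0, and the step function fires only when the weighted sum
// reaches 1.0, as described above.
public class AndPerceptron {
    // Hand-picked weights: {bias weight, weight for x1, weight for x2}.
    // With these values, the weighted sum reaches 1.0 only when both inputs are 1.
    static double[] weights = {0.0, 0.6, 0.6};

    static int classify(int x1, int x2) {
        double[] inputs = {1.0, x1, x2};       // index 0 is the bias unit
        double sum = 0.0;
        for (int i = 0; i < weights.length; i++)
            sum += weights[i] * inputs[i];     // form the weighted sum
        return (sum < 1.0) ? 0 : 1;            // step function
    }

    public static void main(String[] args) {
        for (int x1 = 0; x1 <= 1; x1++)
            for (int x2 = 0; x2 <= 1; x2++)
                System.out.println(x1 + " AND " + x2 + " = " + classify(x1, x2));
    }
}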
Let's look at the algorithm neural networks use to learn, and then run through an example using a Java implementation. The following presents the basic idea:

1. Initialize the network's weights to small random values (between +/- 0.05).
2. For each training example, feed the input values forward through the network to produce an output.
3. Compare the network's output to the expected training output and compute an error value.
4. Adjust each weight by an amount proportional to the error and the learning rate.
5. Repeat for as many training epochs as it takes for the network to learn the training set with sufficient accuracy.
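Here's a rough sketch of those steps applied to the perceptron from the previous listing, training it on the AND truth table. Again, this is illustrative code of my own, not the internals of the nnet package:

// Train a single perceptron on the AND function using the steps above.
public class AndTrainer {
    public static void main(String[] args) {
        double[][] inputs  = {{1,0,0}, {1,0,1}, {1,1,0}, {1,1,1}}; // bias input first
        double[]   targets = {0, 0, 0, 1};                         // AND truth table
        double eta = 0.05;                                         // learning rate
        double[] w = new double[3];
        java.util.Random rand = new java.util.Random();
        for (int i = 0; i < w.length; i++)
            w[i] = rand.nextDouble() * 0.1 - 0.05;   // random weights between +/- 0.05

        for (int epoch = 0; epoch < 100; epoch++) {  // 100 training epochs
            for (int n = 0; n < inputs.length; n++) {
                double sum = 0.0;
                for (int i = 0; i < w.length; i++)
                    sum += w[i] * inputs[n][i];      // feed the inputs forward
                int output = (sum < 1.0) ? 0 : 1;    // step function
                double error = targets[n] - output;  // compare to the expected output
                for (int i = 0; i < w.length; i++)
                    w[i] += eta * error * inputs[n][i];  // adjust the weights
            }
        }

        for (double[] in : inputs) {                 // show what the network learned
            double sum = 0.0;
            for (int i = 0; i < w.length; i++)
                sum += w[i] * in[i];
            System.out.println((int) in[1] + " AND " + (int) in[2] + " = "
                    + ((sum < 1.0) ? 0 : 1));
        }
    }
}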
You can download a flexible and easy to use neural network written in Java and released under the GNU GPL, complete with an accompanying test harness, here. Follow these steps to get it running:
1. Download nnet.jar and testHarness.java, and add nnet.jar to your CLASSPATH. For example: export CLASSPATH=$CLASSPATH:/path/to/file/nnet.jar
2. Compile the test harness with javac testHarness.java.
3. Type java testHarness 100 0.05 (100 training epochs and a learning rate of 0.05) to run it.
4. Take a look at the testHarness.java source code.

As you experiment with the AND and OR examples, you'll notice that lower learning rates require more training epochs and vice versa. As you try to approximate more advanced functions with partial data sets, you generally want to keep the learning rate as low as possible--low enough to do the job, but no higher than it needs to be. This allows your network to more easily adapt to future training examples that may be unknown at the present time.
The XOR function does not have linearly separable output.

Let's take a look at a more complex function, the exclusive-or (XOR) function. This function produces an output if and only if exactly one of its two input values is equal to 1.0. If you try to use the same tactics that you used with the AND and OR functions, you'll frustrate yourself and eventually become very irritated. To approximate this nonlinear function (and avoid irritation), we'll need to add an additional layer to the network between the input and output layers. This layer is often referred to as a "hidden" layer. To add this hidden layer to the network, make the following changes to testHarness.java:
1. Change int[] networkTopology = {3,1}; to int[] networkTopology = {3,2,1};
2. Change int[] thresholdTopology = {TANH,STEP}; to int[] thresholdTopology = {TANH,TANH,TANH};
3. Recompile with javac testHarness.java.
4. Type java testHarness 1000 0.06 to run it, and note the effect of using TANH as the output layer's thresholding function.

Although the neural net package is flexible enough to accommodate more than one hidden layer and various thresholding functions, these parameters work fine. Feel free to try comparing the sigmoid and hyperbolic tangent functions if you want a better feel for how they work. Be advised that expected output values can vary considerably between the two.

To represent the XOR function, the network needs a hidden layer.
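To build some intuition for what the hidden layer buys you, here's a hand-wired sketch of a 2-2-1 network that computes XOR, using the same 1.0-threshold step units described earlier. The weights are mine, chosen for illustration rather than learned by training: one hidden unit computes OR, the other computes NAND, and the output unit ANDs the two together, which is exactly XOR.

// A hand-wired two-layer network for XOR built from step units.
// Hidden unit h1 computes OR, hidden unit h2 computes NAND, and the
// output unit computes h1 AND h2, which is equivalent to x1 XOR x2.
public class XorNetwork {
    static int step(double sum) { return (sum < 1.0) ? 0 : 1; }

    static int classify(int x1, int x2) {
        int h1 = step(1.0 * x1 + 1.0 * x2);        // OR
        int h2 = step(2.0 - 1.0 * x1 - 1.0 * x2);  // NAND (2.0 is the bias contribution)
        return step(0.6 * h1 + 0.6 * h2);          // AND of the hidden outputs
    }

    public static void main(String[] args) {
        for (int x1 = 0; x1 <= 1; x1++)
            for (int x2 = 0; x2 <= 1; x2++)
                System.out.println(x1 + " XOR " + x2 + " = " + classify(x1, x2));
    }
}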
Before we discuss a more involved example that applies pattern recognition to images, let's look at one more example that really showcases the function approximation capability of neural networks. Consider the problem where we want to provide a network with values ranging from 0 to 7 and have it output the same value it received as input. For example, if we give it a 2 as input, we want a 2 back out as output.
To model such a network, we'll have an input layer with eight input units (not including the bias) that can denote the full range of input values and give it eight output units to represent the full range of output values. So if we wanted to input a 2 to the network, we'd supply the input set {0, 0, 1, 0, 0, 0, 0, 0} and expect the same output set. At this point, we need to determine how many hidden layers to introduce and how many units have to be in each layer. Although we could use an arbitrarily large number of layers and units, we want to keep these values minimal in order to make the network efficient. Take a moment to ponder the minimal number of layers and units in each layer the network could use. Hint: think in terms of binary representation.
If you realized that you can represent eight distinct values with only three bits, it might have occurred to you that a neural network should be able to do the same. This problem is a classic one for neural networks and is called the "8-3-8 problem." Take a moment to try completing this task using various network topologies and thresholding functions. The training data is included in the test harness.

The minimal topology for the classic 8-3-8 problem. A single hidden layer with three units is the minimum needed to represent the range of inputs, since the range of 0-7 can be represented with three bits. (To clarify, the bias unit is not shown.)
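The training data for the 8-3-8 problem is bundled with the test harness, but if you'd like to generate input and output sets yourself, the encoding is straightforward. Here's a minimal sketch, with helper names of my own invention:

// Encode a value from 0-7 as an eight-unit input set, and decode an
// eight-unit output set back into a value by picking the strongest unit.
static double[] encode(int value) {
    double[] units = new double[8];
    units[value] = 1.0;                        // e.g., 2 becomes {0, 0, 1, 0, 0, 0, 0, 0}
    return units;
}

static int decode(double[] units) {
    int best = 0;
    for (int i = 1; i < units.length; i++)
        if (units[i] > units[best]) best = i;  // index of the largest output value
    return best;
}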
One particular configuration that works uses the following topology, thresholding functions, and network constructor:
int[] networkTopology = {9,3,8}; //remember extra bias unit in input layer
int[] thresholdTopology = {TANH,TANH,STEP};
//...
network myNet = new network(networkTopology, thresholdTopology, eta, 2.0);
The network constructor uses a parameter value of 2.0 to provide what is known as "momentum" to the initial random weights. Recall from earlier that weights are initialized to random values between +/- 0.05. A momentum of 2.0 simply extends this initial range to +/- 0.10. The net effect is that it increases the initial spread of the weights and allows weightings to eventually settle on a wider range of values. Since the 8-3-8 problem is somewhat difficult, adding this momentum during initialization is important. Once you've made these changes, recompile, type java testHarness 7000 0.12, and you should get the expected results. Convince yourself that this function is fairly difficult to learn by adjusting the number of epochs and learning rate by small amounts and noticing the different output.
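In other words, the momentum parameter is plausibly just a multiplier applied to the initial weight range. The following snippet sketches the idea; it's my guess at the behavior described above, not the package's actual source:

// Initialize a weight in +/- 0.05, then scale by the momentum parameter;
// with momentum = 2.0 the effective range becomes +/- 0.10.
java.util.Random rand = new java.util.Random();
double momentum = 2.0;
double weight = (rand.nextDouble() * 0.1 - 0.05) * momentum;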
Now that we've covered the fundamentals and worked through some of the basics, let's consider a problem with widespread applicability: image classification. This is a problem that the postal service faces every day when routing mail, and neural networks excel at these types of tasks. Given that a zip code contains five (or nine) digits, the problem becomes a matter of producing a digital output value corresponding to a series of symbols on an image.
To show how neural networks can be used to automate mail processing, let's work through the subproblem of identifying individual numeric codes. We'll train the network with some samples of several different fonts that correspond to handwriting samples, and then test the network with a totally different set of data to see how it performs. After all, for AI to be useful, it must be able to perform well even when faced with new information.
In order to supply and train the network with input values, we'll scale a series of images to a reasonable size and provide each of the pixels in the image to the network as a discrete input value. A very simple file format to use for this task is the PNM format. Starting with files in other formats, we can use the mogrify command to convert to a PNM file very simply. Type which mogrify in Terminal to determine whether you already have mogrify installed on your machine (or try to find it using Spotlight). If you don't get a path back to the command, you'll need to install it with Fink or directly from ImageMagick's website.
To use mogrify to produce a 30 by 30 text-based file with 16 colors from a binary JPG file, for example, just type mogrify -format pnm -geometry 30x30 -colors 16 +compress image.jpg. Type mogrify -help for a listing of all of the options. Once this command completes, you can view the contents of the resulting PNM file in a text editor and should see a file with the following or similar PNM format.
P2
width height maxcolorvalue
color color color ...
...
color color color ...
If you want to view the resulting PNM file as an image and don't already have a capable viewer, the GIMP (also available through Fink) will happily display it for you if you open it via the GIMP's File -> Open command.
Try starting out with a 30 by 30 image format with 16 colors, and expand from there. You can modify testHarness.java to read the file format and train the network without much effort. Here are three different sets of numeric codes in JPG format that you can mogrify and use. Try using two of the sets for training purposes, and once the network performs well, try testing it with the third set. The number of input units you use is determined by the image dimensions, so for a 30 by 30 image, that's 900 input units (not including the bias). You'll want to use ten output units in an output scheme like that of the 8-3-8 problem. The number of hidden layers and units can vary. You should theoretically be able to use one hidden layer with as few as four units, but go ahead and experiment with other configurations. Also, try varying the learning rate and number of training epochs. For 900 units and a low learning rate, tens of thousands of training epochs is not unreasonable.
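Here's a sketch of how you might extend testHarness.java to turn one of those files into an input set, assuming the P2 (grayscale) variant shown above; the method and variable names are mine, and each pixel is normalized to the 0.0-1.0 range:

// Read an ASCII (P2) PNM file and return one normalized input value per pixel.
static double[] readPnm(String filename) throws java.io.IOException {
    java.io.BufferedReader in = new java.io.BufferedReader(new java.io.FileReader(filename));
    StringBuilder sb = new StringBuilder();
    String line;
    while ((line = in.readLine()) != null)
        if (!line.startsWith("#")) sb.append(line).append(' ');  // skip comment lines
    in.close();
    java.util.Scanner sc = new java.util.Scanner(sb.toString());
    sc.next();                                     // the "P2" magic number
    int width = sc.nextInt();
    int height = sc.nextInt();
    int maxColorValue = sc.nextInt();
    double[] inputs = new double[width * height];  // 900 input units for a 30 by 30 image
    for (int i = 0; i < inputs.length; i++)
        inputs[i] = sc.nextDouble() / maxColorValue;  // normalize each pixel to 0.0-1.0
    return inputs;
}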
I'm going to continue to play with the iSight to learn how we can use it to capture and classify images. In the meantime, check out ALVINN (Autonomous Land Vehicle In a Neural Network)--a project that used neural networks to drive unmanned land vehicles. If you're really hungry, Generation5 also has some good reading on neural networks.
Matthew Russell is a computer scientist from Middle Tennessee and serves Digital Reasoning Systems as the Director of Advanced Technology. Hacking and writing are two activities essential to his renaissance-man regimen.