**An Introduction to Artificial Intelligence**


### A More Involved Example

Before we discuss a more involved example that applies pattern recognition to images, let's look at one more example that really showcases the function-approximation capability of neural networks. Consider the problem where we want to provide a network with values ranging from 0 to 7 and have it output the same value it received as input. For example, if we give it a `2` as input, we want a `2` back out as output.

To model such a network, we'll have an input layer with eight input units (not including the bias) to denote the full range of input values, and eight output units to represent the full range of output values. So if we wanted to input a `2` to the network, we'd supply the input set {0, 0, 1, 0, 0, 0, 0, 0} and expect the same output set. At this point, we need to determine how many hidden layers to introduce and how many units each layer needs. Although we could use an arbitrarily large number of layers and units, we want to keep these values minimal in order to make the network efficient. Take a moment to ponder the minimal number of layers, and units per layer, the network could use. Hint: think in terms of binary representation.
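The mapping from a value to its input set can be sketched as a simple one-hot encoding. (The class and method names below are illustrative only and are not part of the article's test harness.)

```java
public class OneHot {
    // Encode a value in the range 0-7 as the eight-element input set
    // described above: all units off except the one matching the value.
    public static double[] encode(int value) {
        double[] inputs = new double[8]; // one unit per possible value
        inputs[value] = 1.0;             // activate only the matching unit
        return inputs;
    }
}
```

For example, `encode(2)` yields the set {0, 0, 1, 0, 0, 0, 0, 0} used above.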

If you realized that you can represent eight distinct values with only three bits, it might have occurred to you that a neural network should be able to do the same. This problem is a classic one for neural networks and is called the "8-3-8 problem." Take a moment to try completing this task using various network topologies and thresholding functions. The training data is included in the test harness.
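The hint about binary representation can be checked directly: the number of bits needed to distinguish n values is the smallest b with 2^b >= n. (This helper is illustrative, not part of the test harness.)

```java
public class BitCheck {
    // Smallest number of bits b such that 2^b can distinguish n values.
    public static int bitsNeeded(int n) {
        int bits = 0;
        while ((1 << bits) < n) bits++;
        return bits;
    }
}
```

For eight distinct values, `bitsNeeded(8)` returns 3, which is why three hidden units suffice.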

*The minimal topology for the classic 8-3-8 problem. A single hidden layer with three units is the minimal needed to represent the range of inputs, since the range of 0-7 can be represented with three bits. (To clarify, the bias unit is not shown.)*

One particular configuration that works uses the following topology and thresholding functions:

```
int[] networkTopology = {9,3,8};            // 8 inputs + 1 bias unit, 3 hidden, 8 outputs
int[] thresholdTopology = {TANH,TANH,STEP}; // thresholding function for each layer
//...
network myNet = new network(networkTopology, thresholdTopology, eta, 2.0);
```

The `network` constructor uses a parameter value of 2.0 to provide what is known as "momentum" to the initial random weights. Recall from earlier that the previous functions set weights to random values between +/-0.05; a momentum of 2.0 simply extends this initial range to +/-0.10. The net effect is that it increases the initial spread of the weights and allows them to eventually settle on a wider range of values. Since the 8-3-8 problem is somewhat difficult, adding this momentum during initialization is important. Once you've made these changes, recompile and run `java testHarness 7000 0.12`; you should get the expected results. Convince yourself that this function is fairly difficult to learn by adjusting the number of epochs and the learning rate by small amounts and observing how the output changes.
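The described initialization can be sketched as follows: draw each weight uniformly from +/-0.05, then scale by the momentum factor, so a momentum of 2.0 widens the range to +/-0.10. (This is a minimal sketch of the behavior described above; the article's `network` class performs this internally, and the names here are illustrative.)

```java
import java.util.Random;

public class WeightInit {
    // Initialize 'count' weights uniformly in +/-0.05, scaled by the
    // momentum factor. With momentum = 2.0 the range becomes +/-0.10.
    public static double[] init(int count, double momentum, long seed) {
        Random rng = new Random(seed);
        double[] weights = new double[count];
        for (int i = 0; i < count; i++) {
            // rng.nextDouble() is uniform in [0, 1); shift to +/-0.05, then scale
            weights[i] = (rng.nextDouble() * 0.1 - 0.05) * momentum;
        }
        return weights;
    }
}
```

The wider starting spread gives gradient descent a more varied set of initial weightings to work from, which helps on harder problems like 8-3-8.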